At one of our customers we build a synthetic monitoring platform on top of Amazon Web Services (AWS), because existing solutions do not meet our very specific requirements. To speed up development, the synthetic checks are deployed as Lambda functions. Check results and metrics, collected with OpenCensus, are stored in Prometheus and visualized with Grafana. This setup works pretty well, except for one little glitch: developers are not able to access the Lambda execution logs!
Logs were accessible only through the AWS CloudWatch Logs UI. Why is this problematic? First, giving all developers access to AWS CloudWatch Logs was not an option for us. Second, we have more than 300 Lambdas running, and anyone who has ever worked with the AWS CloudWatch UI will probably agree that it is not the most intuitive UI. That’s why we were looking for a more satisfactory solution.
So please welcome Loki – The new tool in the box
Loki, who are you?
Loki, an open-source project from Grafana Labs, is a Prometheus-inspired logging backend for cloud-native applications. It is built on top of Cortex and optimized for Grafana, Prometheus and Kubernetes. Unlike other logging solutions, Loki neither builds a full-text index nor parses the incoming log stream. Instead, it indexes and groups log streams using the same labels already used with Prometheus. Starting with version 6.0, Grafana offers a full-fledged data source for exploring and visualizing Loki logs, so Loki and Grafana are a natural match. However, Loki is still beta software and not yet considered production-ready.
After reading about Loki for the first time, we knew that we wanted to integrate it into our platform! If this worked out nicely, we would solve our logging challenge and have a single place of visualization for metrics as well as logs. Wouldn’t this be awesome?
As shown in the figure below, the idea was quite simple. Since almost all our components were implemented as Lambda functions, the Loki-Shipper should also be one of them. To forward logs from CloudWatch to Loki, a Lambda function registers for log events, then transforms, enriches and pushes them to Loki.
Our setup did not include a Kubernetes cluster, so we were not using Loki in the environment it was originally optimized for. This led to some challenges, which are covered in the following sections.
From here on, we assume that the reader is familiar with AWS CloudWatch Logs, especially with the difference between log groups and log streams.
A log stream is a sequence of log events that share the same source. Each separate source of logs into CloudWatch Logs makes up a separate log stream.
A log group is a group of log streams that share the same retention, monitoring, and access control settings. You can define log groups and specify which streams to put into each group. There is no limit on the number of log streams that can belong to one log group.
A basic Loki logging installation consists of three components:
- Promtail is the agent, responsible for gathering logs and sending them to Loki.
- Loki is the main server, responsible for storing logs and processing queries.
- Grafana for querying and displaying the logs.
To move forward quickly, we chose the simplest installation method: running Loki in Docker containers on EC2, as described here.
The lack of a Kubernetes cluster forced us to use an alternative method to move logs to Loki. Fortunately, Loki provides a simple REST API for pushing logs (the same one Promtail uses), whose documentation is available here.
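As a rough illustration of pushing logs over HTTP without a Promtail agent, the sketch below builds a push request by hand. It assumes the JSON endpoint /loki/api/v1/push documented for current Loki releases; the beta release available when this post was written exposed a different path and payload, so check both against your Loki version. The function names and the base URL are hypothetical and not taken from the Loki-Shipper source.

```python
import json
import urllib.request


def build_push_payload(labels, entries):
    """Build the JSON body for Loki's push endpoint.

    labels:  dict of label name -> value identifying one Loki stream
    entries: iterable of (timestamp_ms, line) tuples, e.g. from CloudWatch
    """
    values = [
        # Loki expects nanosecond timestamps encoded as strings.
        [str(int(ts_ms) * 1_000_000), line]
        for ts_ms, line in entries
    ]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push_to_loki(base_url, payload, timeout=5):
    """POST the payload to Loki; base_url is e.g. 'http://loki.internal:3100'
    (a placeholder host, not a real address)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/push",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status  # typically 204 (No Content) on success
```

Keeping payload construction separate from the HTTP call makes the transformation step easy to test without a running Loki instance.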
To provide the Loki-Shipper with log streams, an integration from CloudWatch Logs to Lambda was necessary. AWS provides so-called Lambda triggers for this purpose. To activate such a trigger, the Lambda function must subscribe to certain log groups. As soon as the subscription is completed, the Loki-Shipper is called with the last log entries written to the subscribed log group.
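To make the trigger mechanism more concrete: CloudWatch Logs hands the subscribed Lambda an event whose awslogs.data field is a base64-encoded, gzip-compressed JSON document. The following is a minimal sketch of a handler that unpacks it. The handler body is illustrative only and is not the actual Loki-Shipper implementation.

```python
import base64
import gzip
import json


def decode_cloudwatch_event(event):
    """Return the decoded CloudWatch Logs payload from a Lambda trigger event."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    payload = decode_cloudwatch_event(event)
    # CONTROL_MESSAGE events only check that the destination is reachable.
    if payload.get("messageType") != "DATA_MESSAGE":
        return 0
    for log_event in payload["logEvents"]:
        # Each log event carries an id, a millisecond timestamp and the message.
        print(payload["logGroup"], payload["logStream"], log_event["message"])
    return len(payload["logEvents"])
```

The decoded payload contains the log group, the log stream and a list of log events, which is all the information a shipper has to work with.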
To ensure the Loki-Shipper subscribes for all deployed Lambdas, a Lambda’s log group is automatically registered as trigger for the Loki-Shipper during deployment.
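A deployment step of this kind could look roughly like the sketch below, which registers a subscription filter for one Lambda's log group. It assumes logs_client is a boto3 CloudWatch Logs client (boto3.client("logs")) and that the Lambda logs to the default /aws/lambda/<function-name> group. The filter name and the helper function are hypothetical.

```python
def subscribe_log_group(logs_client, function_name, shipper_arn):
    """Register the Loki-Shipper as subscriber of a Lambda's log group.

    logs_client is expected to be boto3.client("logs"); it is passed in so the
    function can be exercised without AWS credentials.
    """
    log_group = f"/aws/lambda/{function_name}"  # default Lambda log group name
    logs_client.put_subscription_filter(
        logGroupName=log_group,
        filterName="loki-shipper",  # hypothetical filter name
        filterPattern="",           # empty pattern forwards every log event
        destinationArn=shipper_arn,
    )
    return log_group
```

Note that the shipper function additionally needs a resource policy that allows CloudWatch Logs to invoke it; that step is omitted here.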
Label Loki Log Streams
As described earlier, Loki is Prometheus-inspired and thus uses labels to organize log streams. Again, we faced a challenge because we were not using Loki the way it was intended to be used. Ideally, Promtail fetches labels from Kubernetes and forwards them to Loki. For us this meant: no Kubernetes, no labels and therefore no structured logs!
In our scenario, proper labeling was only possible if the Loki-Shipper knew which labels to attach to the log event currently being processed. An example of the messages consumed by the Loki-Shipper is shown below.
'message': '[INFO]\t2019-09-03T08:21:13.641Z\t5a510c45-49ca-47a6-bd8f-7ee19f4e1b0f\tfunc1 started...\n'
'message': '\n[INFO]\t2019-09-03T08:21:13.642Z\t5a510c45-49ca-47a6-bd8f-7ee19f4e1b0f\tHey there! This should go to Loki please!\n'
As shown, only information about the log group, the log stream and the log events is available. Consequently, a mechanism is needed to derive Loki labels from log groups.
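The sample messages above appear to follow the tab-separated layout of the Python Lambda runtime: level, timestamp, request id, text. Under that assumption, a shipper could split them into fields as sketched below. The function name and the returned keys are hypothetical.

```python
def parse_lambda_message(message):
    """Split a log line into level, timestamp, request id and text."""
    parts = message.strip().split("\t", 3)
    looks_structured = (
        len(parts) == 4 and parts[0].startswith("[") and parts[0].endswith("]")
    )
    if not looks_structured:
        # Platform lines such as START, END or REPORT: keep the raw text.
        return {"text": message.strip()}
    level, timestamp, request_id, text = parts
    return {
        "level": level.strip("[]"),
        "timestamp": timestamp,
        "request_id": request_id,
        "text": text,
    }
```

Whether such fields should become Loki labels is a separate design decision; high-cardinality values like the request id are usually better left in the log line itself.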
To bypass this limitation, several solutions were conceivable:
- Loki-Shipper queries a configuration database and resolves a log group to a set of labels
  - Requires an additional component and implementation effort
- Lambdas encode Loki labels in the log format and Loki-Shipper parses the labels
  - Requires code changes in all Lambda implementations
- AWS tagging functionality to enrich log groups with tags
  - Requires enhanced Lambda deployment scripts
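To illustrate the third option, the sketch below reads the tags of a log group and converts them into label names that Loki accepts (letters, digits and underscores, not starting with a digit). It assumes logs_client is a boto3 CloudWatch Logs client and uses its list_tags_log_group call, which newer boto3 versions mark as deprecated in favor of list_tags_for_resource. The helper names are hypothetical.

```python
import re

_INVALID = re.compile(r"[^a-zA-Z0-9_]")


def tags_to_labels(tags):
    """Convert AWS tag key/value pairs into Loki-compatible labels."""
    labels = {}
    for key, value in tags.items():
        name = _INVALID.sub("_", key)  # replace characters Loki does not allow
        if name[:1].isdigit():
            name = "_" + name          # label names must not start with a digit
        labels[name] = value
    return labels


def labels_for_log_group(logs_client, log_group):
    """Look up the tags of a log group and return them as labels."""
    response = logs_client.list_tags_log_group(logGroupName=log_group)
    return tags_to_labels(response.get("tags", {}))
```

Since tags rarely change, a real shipper would probably cache the result per log group instead of calling the AWS API for every batch of log events.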
Loki-Shipper in Action
If you expect to see a lot of code at this point, we have to disappoint you. This blog post is meant to give you a general overview of the idea behind the implementation. However, in this GitHub repository you can find the complete source code, including detailed instructions on how to install a demo in the Amazon cloud.
If you already have applications running in AWS, you can install the Loki-Shipper using the provided CLI and link it to existing CloudWatch log groups.
When you install the demo, the Loki-Shipper and two sample Lambda functions are started. These sample Lambdas are invoked every minute, and the corresponding logs are transferred to Loki via the Loki-Shipper.
The following figure illustrates how to use Loki to check the configurations of all Lambda functions which are implemented in Python, are executed in the AWS region eu-central-1 and whose name starts with fun. This simple query is also an example of what would not be possible with CloudWatch out of the box!
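In text form, such a query is a LogQL stream selector. The sketch below assembles one from label matchers. The label names runtime, region and function_name are assumptions for illustration, since the actual names depend on how the log groups were labeled.

```python
def build_selector(equals, regex=None):
    """Build a LogQL stream selector from label matchers."""
    matchers = [f'{name}="{value}"' for name, value in equals.items()]
    matchers += [f'{name}=~"{pattern}"' for name, pattern in (regex or {}).items()]
    return "{" + ", ".join(matchers) + "}"


query = build_selector(
    {"runtime": "python", "region": "eu-central-1"},  # exact matches
    regex={"function_name": "fun.*"},                 # name starts with "fun"
)
print(query)
# {runtime="python", region="eu-central-1", function_name=~"fun.*"}
```

Loki regex matchers are fully anchored, so fun.* matches only names that begin with fun.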
To sum up, with Loki we were able to improve our infrastructure so that metrics and logs are available in one tool. This was the last building block missing from our monitoring environment. We are aware that Loki is still beta software and not ready for production. Still, Loki is promising and helps us complete our desired monitoring environment, as illustrated below. We will stay tuned to see how Loki evolves!