Ocelot meets Lightstep - Enhanced Tracing with Lightstep
This is the second part of the blog series “Ocelot meets friends”. The introductory blog post explains, in general terms, how advanced data collection can enhance the features provided by modern observability platforms. In this blog post we take a closer look at how Lightstep's analytics capabilities benefit from advanced data collection with the Java agent inspectIT Ocelot.
Lightstep is an observability platform for quick analysis of distributed tracing data. Lightstep was founded by the people behind Dapper, Google’s research project from which distributed tracing and OpenTracing emerged. Besides a highly scalable tracing backend, Lightstep provides analysis and correlation features that help identify problems, anomalies and outliers. With filtering and grouping on any tag and meta-information within trace and span data, Lightstep lets you efficiently pinpoint the root causes of performance problems.
In this article we will enhance trace collection for Lightstep. For illustration, we will use the running example introduced in the initial blog post. We recommend taking a quick look at the introduction of the running example. The demo application represents a pet clinic where pet owners can schedule visits to the vets. To recap, we aim for the following observability aspects:
- Automate Data Collection
- Flexible Customization of Data Collection
- Increasing Trace Transparency
- Powerful Context Propagation
- Flexible Collection of Business Data
To achieve all of this, we will integrate inspectIT Ocelot (short: Ocelot) with Lightstep using the following monitoring setup (created using openapm.io):
As Ocelot uses OpenCensus (and, in the future, OpenTelemetry) internally, we simply use an OpenCensus Lightstep plugin to report traces to the Lightstep platform. This keeps the setup very simple.
You can easily try this setup yourself! Just check out this demo scenario and follow the corresponding instructions in the readme.
For simplicity, we will show just one configuration example to demonstrate the flexibility and power of configuring Ocelot. The remaining configuration for this running example is available in the GitHub repository of this demo example.
In our demo example, whenever a pet owner visits a vet, we want to track the pet type so that we can distinguish the system behaviour by pet type. Of course, we could achieve this by using an instrumentation SDK and changing the source code of the target application to add the instrumentation code. However, as mentioned above, we want to collect the data automatically, without changing the source code. Let’s look at how we can achieve this with Ocelot.
With the following configuration we tell Ocelot to extract the pet type from a method called ‘validatePet’ and attach this information to the tracing context, so that it can be used anywhere along the request flow.
- name: validatePet
First, the scope ‘s_gateway_controller_validate_pet’ describes where (on which methods) to apply an instrumentation snippet. In this case, the scope covers all methods starting with ‘validatePet’ within the ‘ApiGatewayController’ class. The action ‘a_extract_pet_type’ defines how to extract the desired information (the pet type) from the instrumented method: we retrieve it from the object passed as the first argument (‘_arg0’). Finally, the rule ‘r_extract_pet_type’ combines the scope and the action and tells Ocelot to attach the retrieved pet type as an attribute to the span collected on that method. The final block (up- and down-propagation) may look unimpressive, but it is a very powerful mechanism: it defines that the pet_type value is propagated up and down along the request flow. As a result, this information can be attached to any span within the corresponding trace, even across service boundaries.
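To make the interplay of action, rule and propagation more tangible, here is a plain-Java sketch that mimics what the configuration achieves conceptually. It is not Ocelot's implementation or API: the classes `Pet`, `Span` and `PetTypePropagationSketch`, the `requestContext` map and the `handleRequest` helper are all assumptions made purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PetTypePropagationSketch {

    /** Hypothetical stand-in for the object passed as first argument (_arg0) of validatePet. */
    static final class Pet {
        final String type;
        Pet(String type) { this.type = type; }
    }

    /** A span reduced to its name and its tags. */
    static final class Span {
        final String name;
        final Map<String, String> tags = new LinkedHashMap<>();
        Span(String name) { this.name = name; }

        /** When the span ends, copy the propagated request data onto it as tags. */
        void finish(Map<String, String> requestContext) {
            tags.putAll(requestContext);
        }
    }

    /** Models the action: read the pet type from the first method argument. */
    static String extractPetType(Object arg0) {
        return ((Pet) arg0).type;
    }

    /** Models the instrumented method: the rule runs the action and stores the result. */
    static Span validatePet(Pet pet, Map<String, String> requestContext) {
        Span span = new Span("ApiGatewayController.validatePet");
        requestContext.put("pet_type", extractPetType(pet));
        span.finish(requestContext);
        return span;
    }

    /** Simulates one request: the root span starts first, validatePet runs downstream. */
    static Span[] handleRequest(Pet pet) {
        // One shared map per request models up- and down-propagation in a single process.
        Map<String, String> requestContext = new LinkedHashMap<>();
        Span root = new Span("root");
        Span child = validatePet(pet, requestContext);
        root.finish(requestContext); // the root ends last, so it sees the downstream value
        return new Span[] {root, child};
    }

    public static void main(String[] args) {
        for (Span span : handleRequest(new Pet("dog"))) {
            System.out.println(span.name + " " + span.tags);
        }
    }
}
```

In this sketch both the root span and the ‘validatePet’ span end up with the tag pet_type=dog, although the value is only extracted downstream. A real agent additionally has to carry this data across thread and service boundaries, which the sketch leaves out.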
Enriched Data in Lightstep
Now, let’s see how we can use the data collected with Ocelot in the Lightstep platform.
First of all, with Lightstep’s Service Diagram we see that our target application comprises four microservices: api-gateway, visits-service, customers-service and vets-service.
The endpoint overview dashboard shows the main metrics (latency, error rate and load) for each logical endpoint that Ocelot instrumented automatically.
Looking into the details of one of the traces, we see that the method we covered in the instrumentation configuration above (‘ApiGatewayController.validatePet’) is part of the trace. Moreover, the extracted pet type information has been attached as a tag to the span.
Do you remember our configuration for the propagation of the pet type? Since this information is available across the entire request flow, we can use it as a tag on other spans as well. The following screenshot shows the root span of the trace being selected.
As you can see, the pet type tag has been attached to this root span, although the information was extracted further downstream in the trace.
Now, we can use (in my opinion) one of the best Lightstep features: Correlations. As you can see in the following screenshot, the latency correlates significantly with the values of the tag ‘pet-type’. When selecting the corresponding correlation item, the Latency Histogram shows that traces with pet-type ‘dog’ are significantly slower than all other traces. This feature is extremely useful when analysing problems and high latencies.
We can now group the corresponding traces by the pet type as proposed by Lightstep when clicking on the correlation item. The resulting table is depicted in the following screenshot:
Now we can draw some conclusions from this analysis: First, scheduling a visit for a dog seems to take much longer than for any other type of pet. Second, snake owners seem to visit a vet more often than owners of other types of pets. Of course, this example is just a demo case, but you get an idea of how powerful flexible and custom data propagation is, especially if you don’t need to change a single line of your original source code to extract, propagate and use the data.
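The grouping itself happens inside Lightstep, but the idea behind it can be sketched in a few lines of plain Java: once every trace carries the pet type as a tag, counting traces and averaging latencies per tag value is a simple aggregation. The class `GroupByTagSketch`, the `TraceSample` type and all numbers below are made up for illustration and are not taken from the demo.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupByTagSketch {

    /** Hypothetical, minimal view of a trace: its pet type tag and its latency. */
    static final class TraceSample {
        final String petType;
        final long latencyMs;
        TraceSample(String petType, long latencyMs) {
            this.petType = petType;
            this.latencyMs = latencyMs;
        }
    }

    /** Groups traces by tag value and summarizes count and latency per group. */
    static Map<String, LongSummaryStatistics> groupByPetType(List<TraceSample> traces) {
        return traces.stream().collect(Collectors.groupingBy(
                t -> t.petType,
                TreeMap::new, // sorted keys for stable output
                Collectors.summarizingLong(t -> t.latencyMs)));
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the shape of the result.
        List<TraceSample> traces = Arrays.asList(
                new TraceSample("dog", 900), new TraceSample("dog", 1100),
                new TraceSample("cat", 100),
                new TraceSample("snake", 100), new TraceSample("snake", 120),
                new TraceSample("snake", 140));
        groupByPetType(traces).forEach((petType, stats) -> System.out.println(
                petType + ": count=" + stats.getCount() + ", avgMs=" + stats.getAverage()));
    }
}
```

The count per group corresponds to the "how often" observation and the average latency to the "how slow" observation in the table.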
So far, all the traces we considered originated from instrumentation points covering solely the entry and exit points of the individual microservices (except for the method from which we extracted the pet type). This is a valid approach for “real”, small microservices. However, in large enterprises we often see hybrid scenarios where systems comprise both microservices and larger, monolithic components. In such cases you want to see what is happening within individual components, even if you don’t know in advance what you would need to instrument. Ocelot comes with an optional stack trace sampling feature that lets you increase trace granularity in an exploratory way. The following screenshot shows a trace enriched with spans / methods that were collected with stack trace sampling (spans starting with a ‘*’).
For instance, we now see the ‘*AbstractLoadPlanBasedLoader.executeLoad’ method as a span in the tree, which is a method from the Hibernate framework. On the right side we additionally see the parent frames of the method, so you can precisely break down the execution context of the span. With this Ocelot feature and the possibilities of Lightstep you can precisely analyse problems and their root causes.
Using the open-source Java agent inspectIT Ocelot, we enhanced the distributed trace collection for Lightstep to enable advanced analysis. Without changing a single line of code in the target Java services, we extracted business-relevant information to annotate the corresponding spans and traces, and increased the trace granularity using stack trace sampling. The simple examples in this article demonstrate the possibilities and the potential of combining the open, flexible and powerful data collection utility inspectIT Ocelot with the observability platform Lightstep.
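The general idea of stack trace sampling can be sketched in plain Java: capture the stack of a thread at a fixed interval and attribute one interval of time to every method that appears in a sample. This is a simplified illustration, not Ocelot's actual algorithm; the class `StackSamplingSketch`, its method names and the sample data are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class StackSamplingSketch {

    /** Takes one sample: the current stack of the given thread as "Class.method" strings. */
    static List<String> sample(Thread thread) {
        List<String> frames = new ArrayList<>();
        for (StackTraceElement element : thread.getStackTrace()) {
            frames.add(element.getClassName() + "." + element.getMethodName());
        }
        return frames;
    }

    /**
     * Estimates the time spent per method: every sample in which a method
     * appears contributes one sampling interval to that method.
     */
    static Map<String, Long> estimateTimeMs(List<List<String>> samples, long intervalMs) {
        Map<String, Long> estimate = new LinkedHashMap<>();
        for (List<String> frames : samples) {
            // Count each method at most once per sample, even for recursive calls.
            for (String method : new LinkedHashSet<>(frames)) {
                estimate.merge(method, intervalMs, Long::sum);
            }
        }
        return estimate;
    }

    public static void main(String[] args) {
        // A real sample of the current thread.
        System.out.println(sample(Thread.currentThread()));

        // Made-up samples taken every 10 ms, only to show the estimation step.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("Controller.handle", "Loader.executeLoad"),
                Arrays.asList("Controller.handle", "Loader.executeLoad"),
                Arrays.asList("Controller.handle"));
        System.out.println(estimateTimeMs(samples, 10));
    }
}
```

The shorter the sampling interval, the finer the resulting picture, at the cost of more overhead. Methods that finish between two samples are not observed at all, which is why such spans are estimates rather than exact measurements.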
Want to try out this promising combination as well? What do you think about it?
Get in touch with us, leave a comment below, or simply try Ocelot and the demo scenario described above yourself in combination with Lightstep.
Posts in this Series
- Introductory Post: Ocelot meets Friends – Enhancing Modern Observability Platforms
- Part 1 – Ocelot meets Bits – Enhanced Observability for Datadog
- Part 3 – Ocelot meets Wavefront – Enhanced Tracing with Wavefront
- Part 4 – Ocelot meets Elastic – Better Java Instrumentation for Elastic APM via Jaeger
- Part 5 – Ocelot meets SignalFX
- Part 6 – Ocelot meets Instana
- Part 7 – Ocelot meets NewRelic