19. March 2020

Ocelot meets Bits - Enhanced Observability for Datadog

In this blog post you will learn how to get the full potential out of Datadog in Java-based environments by combining advanced data collection with Datadog's sophisticated features.

This article is the first part of the blog series “Ocelot meets friends”. The introductory blog post explains in a generic way how advanced data collection can enhance the features provided by modern observability platforms. Let’s see how this relates to Datadog in particular!

About Datadog

Datadog is a powerful data platform that covers the three pillars of observability by processing and correlating metrics, traces and log data. Through its open and extensible architecture, Datadog covers a huge set of target technologies, frameworks and environments to be monitored. Moreover, Datadog comes with a consistent tagging concept that allows very flexible and powerful correlation, filtering and grouping of data. Combining Datadog’s concepts with advanced data collection using inspectIT Ocelot makes the features of Datadog even more powerful and opens up new perspectives and possibilities.

Demo Setup and Goals

In this article I will show how to enhance data collection for Datadog. To do so, I will use the running example introduced in the initial blog post. We recommend taking a short look at the introduction of the running example. To recap, we will aim for the following aspects:

  1. Automate Data Collection
  2. Flexible Customization of Data Collection
  3. Increasing Trace Transparency
  4. Powerful Context Propagation
  5. Flexible Collection of Business Data

… and our goals are:

  1. G-1: We want to count the number of successfully scheduled visits at the Visits Service, grouped by the pet type (cat, dog, etc.).
  2. G-2: We want to understand what is happening inside our services.
  3. G-3: We want to annotate the root span of the collected trace with the pet type tag to be able to filter traces by the type of the pet.

To achieve all of this, we will integrate inspectIT Ocelot (short: Ocelot) with Datadog using the following monitoring setup (created using openapm.io):

You can easily play with this setup yourself! Just check out this demo scenario and follow the corresponding instructions in the readme.

Configuring Ocelot

To achieve the above-mentioned goals, we need to enhance data collection. Ocelot provides flexible means to adapt instrumentation through configuration (even at runtime). In Ocelot, instrumentation and configuration are managed as YAML. Agents can retrieve the configuration from local files or dynamically by connecting to a central configuration server. In this example we are using a configuration server.

So, let’s roll up our sleeves and see what we need to configure to achieve the above-mentioned goals. For the sake of clarity, we will illustrate what is needed to achieve goal G-1. The rest works analogously and can be looked up in the corresponding GitHub repository. We have to define a metric that counts the visits, grouped by the type of the pet. This metric will be exposed in the Prometheus exposition format so that the Datadog Agent can scrape it and report it to the Datadog platform. The following configuration snippet defines such a metric:
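A minimal sketch of such a definition, assuming Ocelot's YAML layout for metric definitions. The metric name `[petclinic/visits]` and the view name are hypothetical and only chosen for illustration; the authoritative configuration is the one in the demo repository, and key names should be checked against the Ocelot version in use.

```yaml
inspectit:
  metrics:
    definitions:
      # hypothetical metric name, chosen for this sketch
      '[petclinic/visits]':
        unit: visits
        description: 'Number of scheduled visits'
        views:
          # hypothetical view name; the view is what gets exported
          '[petclinic/visits/count]':
            aggregation: SUM   # sums the recorded values (1 per visit)
            tags:
              pet_type: true   # group the view by the pet type tag
```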


To be able to differentiate by pet type, we need to extract the pet-type information and attach it to the execution context to propagate it to the point when the metric is recorded.
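Such an extraction could look roughly like the following sketch. The scope, action and rule names are the ones used in this article. The matcher settings and the way the pet type is read from the argument (`String.valueOf(_arg0)`) are assumptions, since the actual signature of validatePet is not shown here.

```yaml
inspectit:
  instrumentation:
    scopes:
      s_gateway_controller_validate_pet:
        type:
          name: 'ApiGatewayController'
          matcher-mode: ENDS_WITH   # assumption: match by class name suffix
        methods:
          - name: validatePet
    actions:
      a_extract_pet_type:
        input:
          _arg0: Object             # first parameter of the instrumented method
        # assumption: the string form of the first argument is the pet type
        value: 'String.valueOf(_arg0)'
    rules:
      r_extract_pet_type:
        scopes:
          s_gateway_controller_validate_pet: true
        entry:
          pet_type:                 # data key written to the propagation context
            action: a_extract_pet_type
        tracing:
          start-span: true
          attributes:
            pet_type: pet_type      # span attribute name: data key
```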


So, first we define where to extract the pet type by defining the scope s_gateway_controller_validate_pet. The pet type is extracted in the API-Gateway service within the validatePet method. Using the custom action a_extract_pet_type, we retrieve the pet type from the first method parameter. Finally, the instrumentation rule r_extract_pet_type applies our action to the defined scope and attaches the pet-type information to the corresponding trace as well as to the propagation context. Now we need to tell Ocelot that the pet-type attribute in the propagation context should be propagated across JVM boundaries, since we want to use that information in the visits-service, where the visits metric is recorded. To do so, we use the GLOBAL propagation strategy for both up- and down-propagation:
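A sketch of the corresponding data definition, assuming the data key is named `pet_type` (the tag name used in this article) and Ocelot's `data` section for propagation settings:

```yaml
inspectit:
  instrumentation:
    data:
      pet_type:
        down-propagation: GLOBAL   # pass the value on to callees, also across JVM boundaries
        up-propagation: GLOBAL     # pass the value back to callers, also across JVM boundaries
```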


Finally, we instrument the visit creation method in the visits-service to report to the visits metric using the propagated pet_type as a tag.
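The following sketch shows how this rule might look. The scope and rule names as well as the metric name `[petclinic/visits]` are hypothetical. The syntax for recording a metric value has changed between Ocelot versions, so treat the `metrics` part in particular as an assumption and compare it with the demo repository. Note that this simple form counts every invocation of the method and does not check whether scheduling succeeded.

```yaml
inspectit:
  instrumentation:
    scopes:
      s_visits_resource_create:       # hypothetical scope name
        type:
          name: 'VisitsResource'
          matcher-mode: ENDS_WITH     # assumption: match by class name suffix
        methods:
          - name: create
    rules:
      r_count_visits:                 # hypothetical rule name
        scopes:
          s_visits_resource_create: true
        metrics:
          '[petclinic/visits]':       # hypothetical metric name
            value: 1                  # record one visit per invocation
            data-tags:
              pet_type: pet_type      # tag name: propagated data key
```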


Save the config, wait just a few seconds, and you can see the results in your Datadog account.

Enhanced Observability in Datadog

First of all, all the normal Datadog functionality can also be used with the traces reported by Ocelot. For instance, you can view the service map of our target application Pet Clinic, with all its services:

Now that we have enhanced our data collection with Ocelot, let’s see how we can benefit from it in Datadog. First of all, we now have a metric that shows the number of customer visits at the vets, grouped by the pet type. We can now build corresponding dashboards, for example using a top 10 panel in Datadog:

Data (such as the pet type in our example) can be extracted anywhere and used in any other span or metric. The following screenshot shows the propagation of the pet type from the api-gateway (Span: ApiGatewayController.validatePet) down the call chain to the downstream visits-service (Span: VisitsResource.create), where it is attached as a tag to the span:

Custom data propagation is extremely powerful, as it allows you to annotate metrics and traces with business information that may be derived from any place in the system.

Using Ocelot’s (optional) Stack Trace Sampling capability we get deeper insights into the internals of a service. The following trace is enriched by additional spans from stack trace sampling. Individual spans even show the parent frames they are recorded in.


And note that we achieved all of this without touching the code of the target application!

Conclusion

In my opinion, the result is awesome, because we combined a commercial observability platform with an open source observability tool to create additional value. In the past I never thought this would be possible, because commercial tools were often rather closed: closed in terms of how to add data (metrics and traces) and how to extend the solution. Datadog is different; it is open. Open because, among other things, it relies on open standards such as OpenCensus (in the future OpenTelemetry) and provides different ways to ingest data. In this way, the flexibility of open source utilities can be combined with the powerful features of the observability platform to get even more value out of monitoring.

With this setup it is possible to get not only technical metrics and traces but also business data, such as the pet type in our example. Let’s consider another example: a connected car system. Can you imagine how powerful and helpful it would be to track response times, load, and any other metrics and traces while being able to filter and group all of them by the car model, the customer category, the car location, etc.? You get the idea; there are endless examples for this.

Of course, nearly all of these things can also be achieved using the Datadog SDK, i.e. defining and collecting business metrics, enriching traces, propagating data, etc. However, using the Datadog SDK implies code changes and complicated manual implementations and extensions, especially when it comes to custom data propagation or stack trace sampling. Ocelot simply relieves you of the effort and burden of implementing all of this manually.

In the end, the question is not whether commercial or open source is the right choice, but which standards are used, so that more possibilities in monitoring arise.

Posts in this Series

Coming Soon:

  • Part 5 – Ocelot meets SignalFX
  • Part 6 – Ocelot meets Instana
  • Part 7 – Ocelot meets NewRelic
