Modern observability and monitoring platforms like Datadog, Instana, Lightstep and others provide great features for gaining valuable insights from collected data. In this blog post series we will look at several of these observability, monitoring and APM solutions. Using concrete examples, we will demonstrate how to improve observability for these platforms with automated and flexible data collection using inspectIT Ocelot.
These platforms provide different, sophisticated means to analyze, visualize, automate and act on observability data such as traces, metrics and logs. Yet, data analysis can only be as powerful as the quality of the collected data allows. Although most of these tools offer several ways of collecting data through APIs, SDKs and agents, there is room for improvement in data collection for Java systems. Automatic, flexible and powerful data collection opens up new potential for data analysis and thus strengthens the capabilities of the observability platforms even further.
Potential for Improving Observability
Depending on the observability tool in question, there are several aspects in which observability and data collection can be improved.
1. Automate Data Collection
Many tools follow an “instrumentation as code” approach, which means that source code needs to be enriched manually at development time using SDKs in order to retrieve monitoring data such as traces and metrics. This is an absolutely valid approach and should, in fact, be the preferred one in most DevOps-based software projects. However, sometimes this approach is just not feasible, due to third-party software components or due to the development practices within an organisation. From our consulting experience in enterprises from different industries, we often see the requirement that instrumentation of Java-based and .NET systems must be automated without touching the actual source code. For this purpose, most of the mentioned tools provide Java agents that automate basic instrumentation.
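To make the contrast concrete, here is a minimal Java sketch of what “instrumentation as code” means in practice. `ManualTracing` and `VisitScheduler` are hypothetical names, and `ManualTracing` merely stands in for a vendor SDK; it is not any real library's API. The point is that every method you want to see in a trace has to be wrapped by hand in timing and error-handling code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, minimal stand-in for a vendor tracing SDK; not a real library API.
final class ManualTracing {

    // A finished span: its name, its duration and whether the traced code threw.
    static final class Span {
        final String name;
        final long durationNanos;
        final boolean error;

        Span(String name, long durationNanos, boolean error) {
            this.name = name;
            this.durationNanos = durationNanos;
            this.error = error;
        }
    }

    // A real SDK would export spans to a backend; this sketch keeps them in memory.
    static final List<Span> FINISHED = Collections.synchronizedList(new ArrayList<>());

    // Runs `work` inside a span named `spanName`, recording duration and outcome.
    static <T> T inSpan(String spanName, Supplier<T> work) {
        long start = System.nanoTime();
        boolean error = false;
        try {
            return work.get();
        } catch (RuntimeException e) {
            error = true;
            throw e;
        } finally {
            FINISHED.add(new Span(spanName, System.nanoTime() - start, error));
        }
    }
}

// Application code that had to be edited by hand to show up in traces.
final class VisitScheduler {
    String scheduleVisit(int petId) {
        return ManualTracing.inSpan("VisitScheduler.scheduleVisit",
                () -> "visit-for-pet-" + petId); // the actual business logic
    }
}
```

A Java agent performs roughly the same wrapping by modifying the bytecode of a method like `scheduleVisit` when its class is loaded, so the application source can stay a plain method.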
2. Flexible Customization of Data Collection
All of the Java agents provided by the tools listed above support a broad set of frameworks. So, as long as you use common frameworks and technologies, you will get basic visibility into your services. This typically covers the entry and exit points of your services, and in many cases that is enough, especially when the services are small (as microservices should be). However, what if you would like to include some important custom methods in your traces, or generate custom metrics from the internals of your services? With the means provided by the above-mentioned observability tools, you would need to use an SDK and adapt your source code manually. You see, we are back to our first aspect.
3. Increasing Trace Transparency
Sometimes you would like to enrich your traces to get better transparency into what is happening within your services (not only at their borders), but you often do not know what exactly you need to instrument for this. Explorative stack trace sampling addresses this, as it enriches traces without the need to explicitly instrument methods. In many cases this feature would provide very valuable insights; however, it is not supported by common data collection utilities.
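The idea behind stack trace sampling can be illustrated with a few lines of plain Java. The following is our own simplified sketch, not Ocelot's implementation: it periodically captures the stack of a thread that is serving a request via `Thread.getStackTrace()` and counts in how many samples each method appears. `StackTraceSampler` and `slowInternalMethod` are hypothetical names, and the 10 ms interval is an arbitrary choice.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Simplified sketch of stack trace sampling (not Ocelot's implementation):
// a method that appears in most samples was active for most of the request,
// even though nobody instrumented it explicitly.
final class StackTraceSampler {

    static Map<String, Integer> sample(Thread target, long intervalMillis, int maxSamples) {
        Map<String, Integer> samplesPerMethod = new TreeMap<>();
        for (int i = 0; i < maxSamples && target.isAlive(); i++) {
            // Count each method at most once per sample, even if it appears recursively.
            Set<String> methodsInSample = new LinkedHashSet<>();
            for (StackTraceElement frame : target.getStackTrace()) {
                methodsInSample.add(frame.getClassName() + "." + frame.getMethodName());
            }
            for (String method : methodsInSample) {
                samplesPerMethod.merge(method, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop sampling
                break;
            }
        }
        return samplesPerMethod;
    }

    // Hypothetical internal method that nobody thought of instrumenting.
    static void slowInternalMethod() {
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread request = new Thread(StackTraceSampler::slowInternalMethod, "request-thread");
        request.start();
        Map<String, Integer> samples = sample(request, 10, 20);
        request.join();
        samples.forEach((method, count) -> System.out.println(count + " samples: " + method));
    }
}
```

To actually enrich traces, a sampler would presumably also need to know which trace the sampled thread is currently working on, so that the sampled methods can be attached to that trace.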
4. Powerful Context Propagation
Distributed tracing is based on the concept of context propagation, which means that additional information is propagated along with the request flow. This makes it possible to correlate different actions along a request flow. In most cases, propagation is limited to the tracing context (span and trace IDs). There is no way to extend the propagation context with custom tags and business context other than using an SDK and doing it manually. However, context propagation is an extremely powerful feature, as you will see in the example below. With most of the mentioned tools, propagating custom context would involve a lot of manual effort and code adaptation.
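Mechanically, propagating custom context boils down to serializing key/value pairs into a request header on the caller side and restoring them on the callee side. As an assumption, the following sketch uses the wire format of the W3C Baggage header (`key1=value1,key2=value2` with percent-encoded values); the format Ocelot uses may differ, and `BaggageCodec` as well as the `pet_type` key are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: carry custom key/value context alongside a request in an HTTP header,
// using the W3C Baggage format ("key1=value1,key2=value2") as an assumed wire format.
final class BaggageCodec {
    static final String HEADER = "baggage";

    // Caller side: serialize the context into the header value.
    static String encode(Map<String, String> context) {
        StringJoiner header = new StringJoiner(",");
        for (Map.Entry<String, String> e : context.entrySet()) {
            // Keys are assumed to be valid header tokens; values are percent-encoded
            // (URLEncoder writes spaces as '+', so they are rewritten to %20).
            String value = URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8).replace("+", "%20");
            header.add(e.getKey() + "=" + value);
        }
        return header.toString();
    }

    // Callee side: restore the context from the header value.
    static Map<String, String> decode(String headerValue) {
        Map<String, String> context = new LinkedHashMap<>();
        if (headerValue == null || headerValue.isBlank()) {
            return context;
        }
        for (String member : headerValue.split(",")) {
            String keyValue = member.split(";", 2)[0].trim(); // drop optional ";property" parts
            int eq = keyValue.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed members
            }
            String key = keyValue.substring(0, eq).trim();
            // Note: URLDecoder also turns a literal '+' into a space; acceptable for this sketch.
            String value = URLDecoder.decode(keyValue.substring(eq + 1).trim(), StandardCharsets.UTF_8);
            context.put(key, value);
        }
        return context;
    }
}
```

In a scenario like the one described below, a calling service would add `BaggageCodec.HEADER` with the encoded pet type to its outgoing request, and the called service would decode it to tag its own traces and metrics.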
5. Flexible Collection of Business Data
With their generic way of handling metrics, observability platforms are not only a great means of gaining insights into the performance behaviour of a system, but can also be used to cover business monitoring aspects. You could ingest any type of business metric into the observability platform and then use the generic dashboarding, calculation, alerting and other utilities just as you would with any performance metric. And again, ingesting custom (business) metrics in an automated, flexible way, without touching the source code, is not possible with the existing data collection utilities provided by the mentioned observability tools.
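From the platform's point of view, a business metric is simply a tagged metric like any other, for example scheduled vet visits counted per pet type. The following sketch shows such a counter with one series per tag value. `TaggedCounter` and the metric name `visits_scheduled` are hypothetical, and a real setup would export the values through a metrics library instead of keeping them in memory.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of a business metric: a counter with one time series per tag value.
final class TaggedCounter {
    private final String name;
    private final ConcurrentHashMap<String, LongAdder> series = new ConcurrentHashMap<>();

    TaggedCounter(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Thread-safe increment; a missing tag value is counted as "unknown".
    void increment(String tagValue) {
        String key = (tagValue == null) ? "unknown" : tagValue;
        series.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Point-in-time view of all series, sorted by tag value.
    Map<String, Long> snapshot() {
        Map<String, Long> result = new TreeMap<>();
        series.forEach((tag, adder) -> result.put(tag, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        TaggedCounter visits = new TaggedCounter("visits_scheduled");
        visits.increment("cat");
        visits.increment("dog");
        visits.increment("cat");
        System.out.println(visits.name() + " " + visits.snapshot()); // prints "visits_scheduled {cat=2, dog=1}"
    }
}
```

The counting itself is trivial; the hard part, and the one the platforms' own agents do not automate, is getting the business value (here, the pet type) to the place where the counter is incremented without changing the application's source code.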
In this blog post series we will use a running example to illustrate enhanced data collection in combination with the different observability platforms. For simplicity, we will use the microservice version of Spring’s Pet Clinic blueprint application. The Pet Clinic manages pets and their owners, as well as visits to the vet. Among other services, it consists of the API Gateway, the Visits Service and the Customers Service, which are the ones relevant for our running example.
We will consider the following scenario in all the blog posts in this series: whenever a pet owner schedules a new visit at a vet for one of their pets, the request hits the API Gateway. Internally, the API Gateway calls the Customers Service to retrieve pet information (pet type, birth date, etc.) for pet validation. Finally, the API Gateway calls the Visits Service (with the pet ID) to create a visit for the corresponding pet.
Our goals are the following:
- We want to count the number of successfully scheduled visits at the Visits Service, grouped by pet type (cat, dog, etc.).
- We want to understand what is happening inside our services.
- We want to annotate the root span of the collected trace with a pet type tag, so that traces can be filtered by the type of the pet.
Achieving these goals in an automated way addresses all the aspects described above. As the pet information is processed in an internal method of the API Gateway, and we want to extract the pet type there, we need to instrument a custom method and want to see it in the resulting trace (Aspect 2). The requirement to understand the internal behaviour of the services addresses Aspect 3. Moreover, as the Visits Service does not have any detailed information about the pets (apart from the pet ID), we need to propagate the pet type from the API Gateway to the Visits Service (Aspect 4) to be able to derive a corresponding metric when visits are created there. Finally, the metric we derive is a business metric (Aspect 5). None of the data collection utilities provided by the above-mentioned observability vendors allows all of this to be done in an automated way, without changing the source code.
Let’s see how we can achieve this with inspectIT Ocelot!
Automated, Flexible and Open Data Collection in Java
Combining the openness and flexibility of open standards (such as OpenCensus and OpenTelemetry) with the possibilities of automated Java agents is the key to addressing the above-mentioned aspects of observability improvement. This is exactly the aim of inspectIT Ocelot (short: Ocelot). Ocelot is an open-source, flexible, automated and powerful agent for collecting metrics and distributed traces in Java-based systems. Moreover, as Ocelot is based on OpenCensus / OpenTelemetry, it can integrate with any tool that supports those standards or provides an open API for data submission.
The following features make Ocelot a good choice for addressing the above-mentioned observability challenges: Ocelot …
- … instruments Java applications dynamically at runtime
- … comes with a meaningful default instrumentation supporting common frameworks and technologies
- … provides a powerful configuration language that allows to flexibly extend instrumentation (even at runtime) to derive additional metrics and enrich traces
- … provides sophisticated means for custom up- and down-propagation of context
- … integrates with any metrics or tracing backend that supports OpenCensus / OpenTelemetry
- … supports explorative stack trace sampling (currently an experimental feature)
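Runtime instrumentation in Java builds on the JDK's `java.lang.instrument` API: the `premain` method of an agent jar receives an `Instrumentation` instance and can register a `ClassFileTransformer` that sees the bytes of every class as it is loaded. The following sketch of a hypothetical `LoggingAgent` only records which classes of the Pet Clinic pass through (the package prefix is an assumption) and leaves them unchanged; agents that actually rewrite bytecode typically do so with a library such as Byte Buddy or ASM.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of the JDK hook that Java agents build on. For a real agent jar, make this
// class public in its own file and name it in the manifest's Premain-Class attribute.
final class LoggingAgent {
    static final List<String> SEEN = new CopyOnWriteArrayList<>();

    // Invoked by the JVM before main() when started with -javaagent:<jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }

    static final class RecordingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // Class names arrive in internal form, e.g. "org/springframework/...".
            if (className != null && className.startsWith("org/springframework/samples/petclinic/")) {
                SEEN.add(className.replace('/', '.'));
            }
            return null; // null tells the JVM to keep the original class bytes
        }
    }
}
```

Packaged in a jar whose manifest contains `Premain-Class: LoggingAgent`, such an agent would be attached with `java -javaagent:logging-agent.jar -jar app.jar`, without any change to the application itself.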
This approach, combining established commercial observability tools with open source tools, means that you no longer have to choose between a closed and an open source tool. In fact, a combination of both can be a very good solution, or even the best one, in today’s monitoring landscapes!
Articles in this Blog Post Series
In the following blog posts we will provide concrete scenarios and examples of how we enhanced observability for the individual observability platforms:
- Part 1 – Ocelot meets Bits – Enhanced Observability for Datadog
- Part 2 – Ocelot meets Lightstep – Enhanced Tracing with Lightstep
- Part 3 – Ocelot meets Wavefront – Enhanced Tracing with Wavefront
- Part 4 – Ocelot meets Elastic – Better Java Instrumentation for Elastic APM via Jaeger
- Part 5 – Ocelot meets SignalFx
- Part 6 – Ocelot meets Instana
- Part 7 – Ocelot meets New Relic