Can the opentracing scene benefit from new tracer implementations?
Version 1.7.11 of the open-source APM tool inspectIT introduced support for remote tracing based on HTTP and JMS communication. inspectIT built the tracing functionality on the opentracing.io approach, fully implementing the opentracing.io Java API as part of its java-agent-sdk project. This way it became number 8 on the official opentracing.io list of supported tracers. As new tracer implementations join a list that Zipkin has led for a long time, the question arises whether the open-source tracing scene can benefit from new tracers. In this blog post I will try to answer that question with a short comparison of inspectIT and Zipkin.
If we stick to the Java universe, let's take a look at how you can integrate the tracers into your application. Zipkin has great support for Spring Boot applications: if you have such an application, it's enough to add a specific dependency and you are ready to go. If not, things get more complicated, as you need to add the proper Brave filters and interceptors to get automatic tracing for your HTTP endpoints. In any case, you'll need to change your code base in some way, either by altering your dependencies or by going deeper into the application's configuration.
inspectIT, on the other hand, uses the Java agent approach and byte code modification to add measurement and tracing points to your code. It's enough to start your Java app with the standard -javaagent JVM flag pointing at the inspectIT agent JAR,
and all the frameworks supported by inspectIT will be instrumented. This provides a clear benefit, as you don't need to change the source code: you can, for example, analyse applications that are not yours, or easily start your app without inspectIT.
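To illustrate the mechanism the agent approach builds on (this is a minimal sketch of the standard java.lang.instrument API, not inspectIT's actual implementation; the class names SketchAgent and TracingTransformer are mine):

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical sketch of the -javaagent hook that APM agents build on.
public class SketchAgent {

    // The JVM calls premain before the application's own main method,
    // when the agent JAR is passed via -javaagent on the command line.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new TracingTransformer());
    }

    // A real APM agent rewrites the class bytes here to weave in
    // measurement and tracing points; this sketch leaves classes untouched.
    static class TracingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
                byte[] classfileBuffer) {
            // returning null tells the JVM to keep the original class bytes
            return null;
        }
    }
}
```

Because the transformer sees every class as it is loaded, no change to the application's source or dependencies is needed.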
When you start your application with either of the tracers, you will start opening the black box(es) and gain insight into the way your application performs. Both Zipkin and inspectIT provide traces that show the complete time of each trace, how much time was spent on each node you are tracing, and who is communicating with whom, and that's pretty cool. However, inspectIT comes with one small benefit here: a set of default instrumentation profiles that additionally instrument other important parts of your application. For example, alongside your tracing data, inspectIT will also collect all SQL statements executed during the request, thus providing more detailed trace information out of the box.
Difference between trace details in Zipkin and inspectIT when calling the same use-case with default configurations
However, you will soon discover that, without additional work, the data provided by both tools is not enough for meaningful performance diagnosis. Hence, sooner or later you will want to add additional measurement points in order to make your traces more detailed. Sticking to the opentracing.io API, you can do this easily by creating spans inside your source code, explicitly declaring which parts of the application should be additionally traced. This usually looks like:
// get the tracer implementation (tool specific)
Tracer tracer = GlobalTracer.get();
// build new span with some optional custom tags for example
Span span = tracer.buildSpan("checkout").withTag("component", "shop").start();
// optionally: attach some baggage to the span which is propagated with next remote call
span.setBaggageItem("user.id", userId);
// then do some actual work (userId and doWork() are placeholders)
doWork();
// finish span at the end so it's reported
span.finish();
The idea behind the opentracing.io movement, that developers should know what should be traced in the same way as they know what should be logged, is really great. Still, if you already have a huge existing code base, it can be quite a hassle to define all the spans you would like to have. inspectIT can solve this problem in no time, because it offers a UI-based configuration interface that lets you quickly bind measurement points to any Java method, so you get the duration of that method's executions together with your tracing data. In addition, inspectIT provides a dynamic instrumentation feature which enables users to add and remove instrumentation points without restarting the JVM, similar to the hot-code deployment we are all used to.
Until now we have seen that both tools have their advantages, but what about the way they present the data to users? Zipkin has a web-based user interface, which nowadays seems much more acceptable to users than the fat, Eclipse-based UI client inspectIT runs. The impression is that the Zipkin web UI currently offers more filtering possibilities for searching traces than inspectIT, while inspectIT uses its fat-client features to present traces in a "fancier" way, showing more icons, a styled details box and multiple navigation options. One additional benefit of Zipkin is that it provides an overview of the node dependencies, so you can clearly see what the system being traced looks like.
Zipkin dependencies view showing the system layout
Sampling rate (!)
If you are running an application under high load, you will start wondering how much overhead the tracer will add to your production system. Zipkin has a great feature here: the sampling rate. With a sampling rate you can specify that not all user requests are traced, but only a portion (for example 1%). That can significantly decrease the overhead introduced by the tracing tool. The sampling approach is based on the Google Dapper paper, which concludes that if a problem exists in a high-throughput system, then the same problem will surface multiple times and will be part of one of the captured traces.
New Dapper users often wonder if low sampling probabilities – often as low as 0.01% for high-traffic services – will interfere with their analyses. Our experience at Google leads us to believe that, for high-throughput services, aggressive sampling does not hinder most important analyses. If a notable execution pattern surfaces once in such systems, it will surface thousands of times.
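The Dapper argument can be sanity-checked with a little arithmetic: if a problem surfaces N times and each request is sampled independently with probability p, the chance of capturing it at least once is 1 - (1 - p)^N. A quick sketch (the class name and the occurrence counts are mine, chosen purely for illustration):

```java
public class SamplingOdds {

    // probability that at least one of n occurrences lands in the sample
    static double captureProbability(double samplingRate, long occurrences) {
        return 1.0 - Math.pow(1.0 - samplingRate, occurrences);
    }

    public static void main(String[] args) {
        double rate = 0.0001; // Dapper-style 0.01% sampling

        // a problem surfacing 100,000 times in a high-throughput service
        // is almost certain to appear in at least one captured trace
        System.out.printf("frequent problem: %.4f%n",
                captureProbability(rate, 100_000));

        // a problem surfacing only 100 times will very likely be missed
        System.out.printf("rare problem:     %.4f%n",
                captureProbability(rate, 100));
    }
}
```

This is exactly why aggressive sampling is safe for high-throughput services but risky for low-volume ones: the math only works in your favour when problems recur many times.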
inspectIT at the moment does not provide a sampling rate feature; it collects all traces and is thus more suitable for services with lower volume.