Atlassian's Confluence is ubiquitous when it comes to documenting and elaborating all kinds of information. As such, it is a central component of day-to-day operations in many organizations.
However, in many cases this essential system is not monitored and may crash unexpectedly or perform poorly. You can prevent such disturbances by setting up proper monitoring to identify potential problems and gain insight into your Confluence usage.
These days, many commercial and open source monitoring solutions are available. Yet in many enterprises, central systems are not monitored at all. Although Atlassian’s Confluence is found in virtually every company, it is often operated completely blind in terms of performance, availability and usage. The reasons are manifold. Some companies have no application monitoring in use at all. Others do not consider an internal system like Confluence mission critical, even though an outage would block internal processes. Others again have a monitoring solution in use, but one that is not able to monitor Confluence. In fact, many agent-based solutions, both commercial and open source, technically fail to monitor Confluence.
In this blog post, we will demonstrate how easily you can set up a monitoring solution for Confluence based on open source tools. With the right monitoring you will be able to automatically identify performance and availability issues, analyse root causes of problems and get insights into the usage behaviour of your Confluence system.
To reach this goal, we want to gather data about the technical performance and status of our Confluence system, such as response times of requests, load metrics and the utilization of our servers and processes. Of course, we would like to have all the collected data displayed on good-looking dashboards. Besides the technical view, we would also like to receive some business information about the usage of the system, for example which pages or spaces are accessed the most, how many users are logging in to Confluence and how many logins are failing. As a side effect, this improves security, since attacks such as brute force attacks can be recognized.
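To make the security point concrete, here is a minimal sketch of how failed logins could be checked for a brute force pattern. It is not part of the setup described in this post: the function name, the `(timestamp, source)` event format and the default limits are assumptions chosen for illustration only.

```python
from collections import defaultdict, deque


def find_suspicious_sources(failed_logins, window_seconds=60, threshold=5):
    """Return sources with at least `threshold` failed logins inside any
    sliding window of `window_seconds` seconds.

    failed_logins: iterable of (timestamp_seconds, source) tuples,
    sorted by timestamp. Both fields are hypothetical event attributes.
    """
    recent = defaultdict(deque)  # source -> timestamps still inside the window
    suspicious = set()
    for timestamp, source in failed_logins:
        window = recent[source]
        window.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while timestamp - window[0] >= window_seconds:
            window.popleft()
        if len(window) >= threshold:
            suspicious.add(source)
    return suspicious
```

A source that fails five times within a minute would be reported, while one that fails five times spread over several minutes would not.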
Let’s see what you need for this…
The Tools You Need
The beauty of open source monitoring solutions is their flexibility: you can tailor a solution to your needs by integrating different open source components. You can read about this in one of our previous blog posts.
In our example, we will use the following components:
inspectIT Ocelot – This is a Java agent and it is used to collect behaviour and performance data from the Confluence application. For this purpose it is started together with Confluence and integrates itself into its JVM. It automatically injects “sensors” at certain points, which then collect the actual data – e.g. which pages are accessed, what is the flow through the system, response times, errors, etc.
Prometheus – This can be seen as a kind of database in the broadest sense. Prometheus is responsible for gathering and persisting the metrics (response times, throughput rates, …) collected by the agents.
Jaeger – Detailed tracing data (e.g. what exactly happens within an HTTP request) collected by the agents is sent to Jaeger. This data is then stored and can be analyzed, which is very helpful in problem situations.
Grafana – This is more or less the standard when it comes to visualizing data. With Grafana you can easily create beautiful dashboards and visualize data in a clear way.
It is also important to note that this choice of tools is only one possible solution. In the open source area there are many tools that allow you to achieve a similar result. If you would like more information about existing open source monitoring tools, the OpenAPM website is the right place for you, as it lists many of them and shows how the components can be integrated.
Connecting everything together
In order to obtain the monitoring data from Confluence, we have to install all necessary tools. All these tools are available as packages that can simply be downloaded, unpacked and run without having to install anything, so this should be straightforward. Of course, the individual components still have to be configured accordingly, but we won’t explain this in detail here, as there are enough tutorials for this on the Internet.
And that’s it. From now on, the agent monitors and collects data from Confluence.
Note: Many commercial and open-source Java agents have trouble instrumenting Confluence without special measures. Confluence is based on OSGi, and other Java agents may have problems with its special class loading strategy (in this case, additional configuration sometimes helps). The inspectIT Ocelot agent covers this use case out-of-the-box, without any additional workarounds.
What do we get out of it?
Now let’s take a look at the data we collect. When we integrate the inspectIT Ocelot agent with its standard configuration, we already get useful information about the state of the servers in use and about the JVMs of the Confluence application. For example, the following screenshot shows a Grafana dashboard which displays information about CPU (1) and memory (2) usage as well as running threads (3) and garbage collection (4) statistics of the involved JVMs.
Another useful feature provided by inspectIT Ocelot out-of-the-box is the ability to generate interactive service dependency graphs based on HTTP calls. In the graph we can see that Confluence brings another service alongside itself, namely “Synchrony”, which is responsible for the simultaneous editing of pages. In order to get a complete overview, we have added the inspectIT Ocelot agent to this process as well.
The system and JVM statistics dashboards we have seen so far and the service dependency graphs are all standard dashboards provided and maintained by the inspectIT team. You can either install them via the Grafana marketplace or download them directly from inspectIT Ocelot’s GitHub repository. The following dashboards have been created and customized especially for monitoring Confluence.
Now, we’ll look at metrics that are not collected out-of-the-box, but are manually configured and provide a business view. We configured the inspectIT Ocelot agent to derive metrics based on the application events sent by Confluence. The following screenshot shows a dashboard providing statistics about the pages visited by users (1) including their response times (2), as well as a top 10 list of the most visited pages in the selected time period (4). Furthermore, a usage statistic by space is displayed (3).
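In our setup this aggregation happens in Prometheus and Grafana, but the idea behind the top 10 list and the usage by space can be sketched in a few lines. The event structure below (dictionaries with `space` and `page` keys) is a hypothetical stand-in for the page view events, not the actual format used by Confluence or the agent.

```python
from collections import Counter


def summarize_page_views(events, top_n=10):
    """Aggregate page view events into a top-N page list and per-space totals.

    events: iterable of dicts with the hypothetical keys "space" and "page".
    Returns (top_pages, views_per_space), where top_pages is a list of
    ((space, page), count) tuples ordered by descending count.
    """
    page_counts = Counter()
    space_counts = Counter()
    for event in events:
        page_counts[(event["space"], event["page"])] += 1
        space_counts[event["space"]] += 1
    return page_counts.most_common(top_n), dict(space_counts)
```

Counting by the `(space, page)` pair keeps pages with the same title in different spaces apart.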
Another quite interesting metric, which is derived from events, is a statistic of logins and logouts. These are categorized according to the authentication method used (remember-me cookie, LDAP or via login form) or the reason for a logout (user logout or session timeout).
We have now seen many metrics that give us deeper insights into the state, performance and usage of the Confluence system. These metrics can also be used as a basis for alerts that notify you about problems or deviations from the normal behavior of the system.
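In practice you would typically define such alerts in your monitoring tools, for example in Grafana. Purely as an illustration of what "deviation from normal behavior" can mean, the following sketch flags a value that lies more than a given number of standard deviations away from its recent history. The function name and the default of three standard deviations are assumptions, not settings taken from our setup.

```python
from statistics import mean, stdev


def is_anomalous(history, current, max_deviations=3.0):
    """Return True if `current` deviates strongly from the values in `history`.

    history: recent samples of one metric, e.g. response times in ms.
    A value counts as anomalous when it is more than `max_deviations`
    sample standard deviations away from the mean of the history.
    """
    if len(history) < 2:
        return False  # not enough data to estimate normal behavior
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change is a deviation.
        return current != baseline
    return abs(current - baseline) > max_deviations * spread
```

With a history of response times around 100 ms, a sample of 400 ms would be flagged, while 108 ms would not.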
Of course, the metrics shown here are only a few examples of many. The inspectIT Ocelot Agent is very powerful when it comes to configuring metric collections. Basically, almost any technical or business metric can be pulled from the system as long as you configure the agents properly using the right settings. Some more examples of metrics that can be collected:
the number of active database connections,
the usage statistics of installed plugins,
or how many attachments and files exist in Confluence
However, if a problem does occur, it would also be helpful to have not only abstract metrics, but the ability to look deeper into the system to identify the cause of a problem.
Achieving this is a piece of cake! The inspectIT Ocelot Java Agent supports distributed tracing and, thus, can follow the flow of individual requests and operations down to the called methods – even beyond JVM and system borders. Should a problem occur, you can use the traces to find out the exact cause or location of the problem.
The screenshot above shows the user interface of Jaeger, which is used for saving and analyzing traces in our example. Here we can see a trace of a request to show a specific Confluence page. Furthermore, we can see that when collecting data, we also add context information to the corresponding traces, such as the page that is requested. This makes troubleshooting much easier. In the trace you can also clearly see the interaction between Confluence and Synchrony.
You have seen how easy it can be nowadays to monitor an application like Confluence with open source tools and what is possible. You get technical performance and availability monitoring, deep dive root cause analysis capabilities and business and usage insights.
Actually, there are no reasons not to monitor your Confluence system. 🙂
Using open source components often implies some configuration effort, but in return you gain a great deal of flexibility. In the end you have a system which you can customize and extend according to your needs, basically without any restrictions! Isn’t that awesome?
What do you use to monitor your Confluence system – or don’t you monitor it at all?
Do you want to discuss similar scenarios, or do you have any questions regarding this or similar topics?