Time is money.
This phrase, attributed to Benjamin Franklin, often holds in business process management as well. With the feature I implemented for our Camunda cockpit plugin, we aim to give you deeper insight into the time-related structure of your data, i.e. durations, start times and end times.
Now, what do I mean by underlying structure?
The feature addresses three main points:
- Estimating the probability distribution of the durations of your processes and activities with a histogram. This answers questions such as:
  - Which durations are most likely for my processes and activities?
  - What does the underlying probability distribution look like?
  - Do the durations vary strongly around the mean?
- Estimating a linear trend in the time series of the durations with a scatter plot and linear regression. This answers questions such as:
  - Have my processes and activities tended to take longer over the last few years?
  - Did a newly introduced measure really help to shorten the durations?
- Finding clusters of start and end times of processes and activities. This answers questions such as:
  - When do I need most of my capacity to be available?
  - When are the busiest times of the week? Of the day?
  - At what time of day does a specific activity or process mostly start or end?
Now let’s do a small data analysis together! Here is what the Analytics tab looks like when you start your analysis: no data has been selected yet and no plot is shown.
First, I choose some data to plot. For this purpose I designed a new menu, which we will explore now. Data is chosen in the “data” tab of the menu, where all known process definitions can be selected.
But I am not only interested in processes as a whole. Since I also want to plot some activities, I click on one of the processes to expand a list of all activity types specific to that process. Clicking on one of the types shows me all activities belonging to that type. I select all the activities I am interested in.
Specify a time window
Now one of the new features comes into play. Often only data from a certain time window is of interest for the analysis. This time window can be set in the “time-window” tab of the menu.
I choose the “started before” option and a date picker slides down. Only instances started before the chosen date will be considered in the plot. The end date of the instances can be restricted in the same way to narrow down the selection further.
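The post does not show how the plugin applies the time window internally. As a rough illustration only, the filter can be thought of as the following Python sketch, where the `Instance` record and the `filter_by_time_window` function are hypothetical names, not part of the plugin or of the Camunda API:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Instance:
    """Hypothetical record of a process or activity instance."""
    name: str
    start_time: datetime
    end_time: Optional[datetime]  # None while the instance is still running


def filter_by_time_window(
    instances: Iterable[Instance],
    started_before: Optional[datetime] = None,
    ended_before: Optional[datetime] = None,
) -> List[Instance]:
    """Keep only the instances that satisfy every bound that is set."""
    selected = []
    for inst in instances:
        if started_before is not None and inst.start_time >= started_before:
            continue  # started too late
        if ended_before is not None and (
            inst.end_time is None or inst.end_time >= ended_before
        ):
            continue  # still running, or ended too late
        selected.append(inst)
    return selected


instances = [
    Instance("order", datetime(2015, 3, 1, 9, 0), datetime(2015, 3, 1, 11, 0)),
    Instance("order", datetime(2015, 6, 1, 9, 0), datetime(2015, 6, 2, 9, 0)),
    Instance("order", datetime(2015, 4, 1, 9, 0), None),
]
before_may = filter_by_time_window(instances, started_before=datetime(2015, 5, 1))
print([i.start_time.month for i in before_may])  # [3, 4]
```

In this sketch, instances that are still running have no end time and are therefore dropped as soon as an end-date bound is set; whether the plugin treats running instances the same way is an assumption.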
Choose the property which should be plotted
The actual plotting is done in the “properties to be plotted” tab. Here you can choose the kind of plot and the kind of information to be displayed.
Let us first have a look at the distribution of the durations.
The resulting histogram shows that, for each of the chosen processes and activities, a duration is most likely to fall into the first bin. Beyond that, there is a small, roughly constant probability of longer durations. When a plot is shown, the menu is automatically replaced by a legend listing the chosen processes and activities and the color in which each is drawn.
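To make the idea of a histogram as a probability estimate concrete, here is a minimal Python sketch with equal-width bins. The bin count and the durations are made up for illustration; the post does not say how the plugin chooses its bins.

```python
from typing import List, Sequence, Tuple


def duration_histogram(
    durations: Sequence[float], bins: int = 10
) -> List[Tuple[float, float, float]]:
    """Return (bin_start, bin_end, relative_frequency) for equal-width bins.

    The relative frequencies sum to 1, so each bar estimates the probability
    that a duration falls into its bin.
    """
    if not durations:
        return []
    lo, hi = min(durations), max(durations)
    width = (hi - lo) / bins or 1.0  # all durations equal: use a dummy width
    counts = [0] * bins
    for d in durations:
        # Clamp so that the maximum lands in the last bin instead of past it.
        counts[min(int((d - lo) / width), bins - 1)] += 1
    n = len(durations)
    return [(lo + i * width, lo + (i + 1) * width, c / n) for i, c in enumerate(counts)]


# Illustrative durations in minutes: many short runs, a few long ones.
minutes = [5, 6, 7, 5, 6, 40, 80, 120]
for start, end, p in duration_histogram(minutes, bins=4):
    print(f"{start:6.2f} to {end:6.2f} min: {p:.3f}")
```

With these invented numbers the first bin receives 62.5 % of the probability mass and each of the other three bins 12.5 %, the same shape as the histogram described above.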
Let’s now take a look at the time series of the durations. Perhaps I can detect a trend that shows whether my instances have recently tended to take longer or shorter, and use it to predict how the durations will develop.
The time series is displayed in a scatter plot. A clear trend towards shorter durations can be detected for the chosen processes and activities in the time window I chose earlier. In less clear cases a linear regression can help to make a hidden trend more obvious.
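The post does not spell out the regression method; ordinary least squares is the usual choice and is assumed in the following Python sketch. Start times are passed as plain numbers (for example days since some reference date), which is also an assumption made for the example, and `linear_trend` is a hypothetical name.

```python
from typing import Sequence, Tuple


def linear_trend(times: Sequence[float], durations: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of durations ~ slope * time + intercept."""
    n = len(times)
    if n != len(durations) or n < 2:
        raise ValueError("need at least two (time, duration) pairs of equal length")
    mean_t = sum(times) / n
    mean_d = sum(durations) / n
    # Covariance-style sums around the means (the 1/n factors cancel).
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, durations))
    s_tt = sum((t - mean_t) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("all instances started at the same time; slope is undefined")
    slope = s_td / s_tt
    return slope, mean_d - slope * mean_t


days = [0, 1, 2, 3, 4]
print(linear_trend(days, [10, 9, 8, 7, 6]))  # (-1.0, 10.0): one minute shorter per day

# Replace the last duration with a single extreme value.
slope, _ = linear_trend(days, [10, 9, 8, 7, 60])
print(round(slope, 2))  # 9.8
```

The second call illustrates the sensitivity to outliers: one extreme value turns a clearly decreasing trend into an increasing one, because least squares penalizes large residuals quadratically.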
In this example we can see that the linear regression is strongly influenced by a few outliers. This would also be a good starting point for a future feature that detects outliers. Of course, linear regression can only detect linear trends, which again leaves room for new features.
Let’s now focus on something different. I want to know at which time of day most of my instances start, so that I can make sure all the capacity needed is available at the right time.
To find out, I choose the start/end time option.
Here I can see that most of my instances start between 10:00 am and 2:00 pm. I also want to know on which days of the week most of my instances start, so I set the time frame option to weekly, which produces the plot below. But why do the dots have different sizes?
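How the plugin places timestamps on the daily or weekly axis is not described in the post. One plausible approach, sketched here in Python under that assumption, is to convert each start time into hours since midnight (daily frame) or hours since Monday 00:00 (weekly frame) and count the starts per hour. The function names and frame strings are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def position_in_frame(ts: datetime, frame: str) -> float:
    """Hours since the start of the day ("daily") or of the week, Monday 00:00 ("weekly")."""
    hours_into_day = ts.hour + ts.minute / 60 + ts.second / 3600
    if frame == "daily":
        return hours_into_day
    if frame == "weekly":
        return ts.weekday() * 24 + hours_into_day  # weekday(): Monday is 0
    raise ValueError(f"unknown time frame: {frame!r}")


def starts_per_hour(starts: Iterable[datetime], frame: str) -> Counter:
    """Count instance starts per whole hour of the chosen time frame."""
    return Counter(int(position_in_frame(ts, frame)) for ts in starts)


starts = [
    datetime(2015, 6, 1, 10, 30),  # Monday
    datetime(2015, 6, 1, 11, 5),   # Monday
    datetime(2015, 6, 2, 10, 45),  # Tuesday
]
print(dict(sorted(starts_per_hour(starts, "daily").items())))   # {10: 2, 11: 1}
print(dict(sorted(starts_per_hour(starts, "weekly").items())))  # {10: 1, 11: 1, 34: 1}
```

In the weekly frame the Tuesday start lands at hour 34 (24 + 10), so starts on different days no longer pile up on the same hour.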
In this plot the k-means algorithm is used to cluster the data, which groups instances with similar start or end times and gives a better overview. Bigger clusters, i.e. clusters that contain more instances, are drawn with a larger radius than clusters with fewer instances. Hovering over a cluster shows how many instances it contains.
One disadvantage of the algorithm, though, is that it needs the number of clusters as input: before clustering the data, the algorithm must know how many clusters to form. Choosing an inadequate number of clusters can produce a misleading result that does not represent the structure of the data.
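As an illustration of the algorithm itself (not of the plugin's code), here is Lloyd's algorithm, the standard k-means procedure, for data treated as one-dimensional, such as hours of the week. Note that k has to be passed in, which is exactly the disadvantage described above. The evenly spread initial centers are a simplifying assumption for reproducibility; implementations often pick initial centers at random or with k-means++ instead. `kmeans_1d` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def _assign(data: Sequence[float], centers: Sequence[float]) -> List[List[float]]:
    """Assignment step: put every point into the group of its nearest center."""
    groups: List[List[float]] = [[] for _ in centers]
    for p in data:
        nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
        groups[nearest].append(p)
    return groups


def kmeans_1d(points: Sequence[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on 1-D data; returns (centers, number of points per cluster)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    data = sorted(points)
    # Deterministic start: k centers spread evenly over the sorted data.
    centers = [data[(i * (len(data) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        groups = _assign(data, centers)
        # Update step: move each center to the mean of its group (keep it if empty).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # converged: assignments no longer change
        centers = new_centers
    sizes = [len(g) for g in _assign(data, centers)]
    return centers, sizes


hours_of_week = [9, 10, 10.5, 11, 33, 34, 35, 130]
centers, sizes = kmeans_1d(hours_of_week, k=3)
print(centers, sizes)  # [10.125, 34.0, 130.0] [4, 3, 1]
```

The returned sizes are the kind of value that could drive the dot radius in the plot; how the plugin maps a cluster size to a radius is not stated in the post.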
Here the default number of clusters is chosen by a simple rule of thumb: k = sqrt(n/2), where n is the number of observations. Since giving the algorithm a “good” number of clusters is essential for a good result, the user can change the number of clusters and experiment with the results.
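The rule of thumb is easy to compute. The post does not say how a non-integer result is rounded; the Python sketch below assumes rounding to the nearest integer with a minimum of one cluster, and `default_cluster_count` is a hypothetical name.

```python
import math


def default_cluster_count(n: int) -> int:
    """Rule of thumb k = sqrt(n / 2), rounded to the nearest integer (at least 1)."""
    if n < 1:
        raise ValueError("need at least one observation")
    return max(1, round(math.sqrt(n / 2)))


print(default_cluster_count(200))   # 10
print(default_cluster_count(1000))  # 22
```

For example, 200 instances give k = sqrt(200/2) = 10 clusters, and 1,000 instances give sqrt(500) ≈ 22.4, i.e. 22 clusters under the rounding assumed here.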
Is something still unclear?
Clicking on the question mark in each of the plot tabs will expand the explanation for the plot.
If the explanation of one of the plots is still unclear, please leave a comment below or use our feedback tab. We put a lot of effort into making the plots easy to understand, and we are glad for any feedback we can get on that.
We have already mentioned a few new features that we are thinking of implementing. Here are some more:
As I have a mathematical background, one topic I am particularly interested in is finding a better way to determine a “good” number of clusters for the k-means algorithm. Another big step would be letting you save plot settings and keep your favorite plots on your dashboard, as mentioned by Eric, so that each morning you can see what is going on in your data. We are really looking forward to getting your feedback on the plots that are available now. How useful are they? Is the explanation clear enough? Do you have any ideas on how to improve them?
And last but not least, what do you think of our ideas for what to implement next?
More features of our Camunda cockpit plugin are introduced by Kerstin in the next part.