Intelligent Process Automation (IPA) brings together well-established practices used in digital transformation, such as Business Process Management (BPM) and Robotic Process Automation (RPA), and applies Artificial Intelligence (AI) not only to extend and improve task automation but also to optimize entire processes.
In this blog post, we show how even a simple process can benefit from AI.
At Novatec, employees follow a specific process each month to submit travel expenses to the Invoice Department. The process includes scanning documents, entering data into a system, exchanging emails, and even printing and physically handing over documents. On the one hand, the process is vital for the company, as it enables the Invoice Department to record and reimburse expenses, charge clients, and so on. On the other hand, this semi-automated process costs both the employees and the Invoice Department a lot of time.
As a first step towards automation, the complete lifecycle of the process can be handled by a process engine (e.g., Camunda), which offers a central point for orchestrating the different parties involved in the process. In the following image, we show the semi-automated and somewhat simplified cost submission process. After costs are incurred (payment for a service, completion of a business trip, etc.), the employee starts a cost submission process, enters all relevant cost data, and uploads the related documents. The data are then sent to the Invoice Department by email, and the process completes.
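Starting such a process instance is a single call to the process engine. The sketch below, which uses only the Python standard library, builds a request against Camunda's REST endpoint `POST /process-definition/key/{key}/start`. It assumes the default REST base URL of Camunda Platform Run (`http://localhost:8080/engine-rest`); the process key `cost-submission` and the `employee` variable are hypothetical names, not taken from the project.

```python
import json
import urllib.request

# Assumption: Camunda Platform Run's default REST base URL.
CAMUNDA_REST = "http://localhost:8080/engine-rest"
# Hypothetical key of the deployed cost submission BPMN model.
PROCESS_KEY = "cost-submission"


def build_start_request(base_url: str, process_key: str, employee: str) -> urllib.request.Request:
    """Build a POST request that starts a new instance of the process by its key."""
    url = f"{base_url}/process-definition/key/{process_key}/start"
    body = {
        # Process variables are sent as {name: {"value": ..., "type": ...}}.
        "variables": {
            "employee": {"value": employee, "type": "String"},  # hypothetical variable
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_process(req: urllib.request.Request) -> str:
    """Send the request and return the id of the new process instance."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["id"]


if __name__ == "__main__":
    request = build_start_request(CAMUNDA_REST, PROCESS_KEY, "jane.doe")
    print(start_process(request))
```

Separating request construction from sending keeps the payload easy to inspect and test without a running engine.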
Entering the cost-related data is the most time-consuming task for the employees; therefore, we will automate it. In simple terms, we need a service that extracts the required data from the invoice documents (e.g., name, amount, date, type of service), provides them to the Invoice Department, and thereby saves the employees’ time. The transformation of the manual process step into an automated, intelligent one is shown in the figure below. In the intelligent process, the employee only needs to upload the related documents; the process then proceeds autonomously, extracting the appropriate data and sending them to the Invoice Department.
Project Architecture and Implementation Details
The picture below shows the architecture of the project (code available here). The business process, modeled in the BPMN 2.0 standard, is deployed on a Camunda distribution. The process communicates with an Intelligent Document Analysis (IDA) service to extract the required data from an invoice. In turn, the IDA service calls a machine learning service to perform the data extraction; in our case, this is Amazon Comprehend.
- When the process starts, the user needs to complete the user task “Upload Invoice” and upload a PDF file with the invoice.
- Then, the process moves to the “Intelligent Document Analysis” step. This step is completed by an external task implemented in Python. The Intelligent Document Analysis service converts the invoice, i.e., the PDF file, to raw text, which is then passed to the Amazon Comprehend service for entity recognition.
- Finally, the process will send an e-mail with the extracted data to the Invoice Department.
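If the “Upload Invoice” user task were completed programmatically (for example, from a custom front end instead of a task list), a client could call `POST /task/{id}/complete` on the Camunda REST API and pass the PDF as a variable of type `File` with a Base64-encoded value. A minimal sketch of that request body follows; the variable name `invoice` is hypothetical, and the project may handle the upload differently.

```python
import base64


def complete_task_url(base_url: str, task_id: str) -> str:
    """URL of the Camunda REST endpoint that completes a user task."""
    return f"{base_url}/task/{task_id}/complete"


def build_upload_payload(pdf_bytes: bytes, filename: str) -> dict:
    """Request body that completes the task with the invoice as a File variable."""
    return {
        "variables": {
            # Hypothetical variable name that a later step would read.
            "invoice": {
                "value": base64.b64encode(pdf_bytes).decode("ascii"),
                "type": "File",
                "valueInfo": {"filename": filename, "mimetype": "application/pdf"},
            }
        }
    }


if __name__ == "__main__":
    import json
    import os
    import sys

    path = sys.argv[1]
    with open(path, "rb") as f:
        payload = build_upload_payload(f.read(), os.path.basename(path))
    print(complete_task_url("http://localhost:8080/engine-rest", "<task-id>"))
    print(json.dumps(payload)[:120])
```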
Intelligent Document Analysis
Intelligent Document Analysis (IDA) is the use of Natural Language Processing (NLP) and Machine Learning to gain insights from unstructured data (e.g., text documents, e-mails, letters, images). For this project, we apply an IDA technique called Named Entity Recognition (NER). The goal of NER is to locate entities in unstructured text and classify them into pre-defined categories such as person names, organizations, and locations. Here, we map invoice data to specific, pre-defined labels (e.g., first name, last name, date, costs).
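To make the idea concrete: an NER result is a list of text spans, each with a category and a confidence score. The sketch below uses the field names of Amazon Comprehend's `DetectEntities` response (`Type`, `Score`, `BeginOffset`, `EndOffset`) on a hand-written example, not real service output, and groups the recognized spans by category. The 0.8 score threshold is an arbitrary assumption.

```python
from collections import defaultdict


def group_entities(text: str, entities: list, min_score: float = 0.8) -> dict:
    """Group recognized entity spans by category, skipping low-confidence hits."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["Score"] < min_score:
            continue  # too uncertain to use
        # The offsets locate the entity; slicing recovers its surface text.
        grouped[ent["Type"]].append(text[ent["BeginOffset"]:ent["EndOffset"]])
    return dict(grouped)


if __name__ == "__main__":
    sample_text = "Jane Doe travelled to Berlin."
    # Hand-written example in the shape of a DetectEntities response.
    sample_entities = [
        {"Type": "PERSON", "Score": 0.99, "BeginOffset": 0, "EndOffset": 8},
        {"Type": "LOCATION", "Score": 0.97, "BeginOffset": 22, "EndOffset": 28},
    ]
    print(group_entities(sample_text, sample_entities))
    # → {'PERSON': ['Jane Doe'], 'LOCATION': ['Berlin']}
```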
Before applying NER, we need to convert the invoices to text. This can be done in many ways. Our first approach was to use Amazon Textract. Using the service was straightforward, and the conversion resulted in no loss of information. The service supports asynchronous API calls and the parallel analysis of a batch of files. However, we observed round-trip response times of approximately 30 seconds for Amazon Textract. As we only needed to analyze one file at a time, we aimed for better response times and implemented the conversion locally with the Python library PyPDF2. Here, too, the conversion lost no information.
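The local conversion step can be sketched as follows. The function only needs an object with a `pages` sequence whose items offer `extract_text()`, which matches `PdfReader` in PyPDF2 2.x and later (and its successor `pypdf`); older PyPDF2 1.x releases used `PdfFileReader` and `extractText()` instead. The post does not state which version was used, so the import below is an assumption.

```python
def pdf_to_text(reader) -> str:
    """Concatenate the text of all pages of a PDF reader object."""
    parts = []
    for page in reader.pages:
        # extract_text() can yield None or "" for pages without a text layer.
        parts.append(page.extract_text() or "")
    return "\n".join(parts)


if __name__ == "__main__":
    import sys
    import time

    from PyPDF2 import PdfReader  # assumption: PyPDF2 >= 2.0 is installed

    start = time.perf_counter()
    text = pdf_to_text(PdfReader(sys.argv[1]))
    print(f"{len(text)} characters in {time.perf_counter() - start:.3f}s")
```

Note that this only works for PDFs that carry a text layer, such as digitally issued tickets; image-only scans would need OCR, which is what a service like Textract provides.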
The next step is to apply NER to the resulting text. Again, there are many options, for example, the spaCy or NLTK Python libraries. We decided on the pre-trained models of Amazon Comprehend to detect default entities in invoice files. By default, Amazon Comprehend identifies the entity types COMMERCIAL_ITEM, DATE, EVENT, LOCATION, ORGANIZATION, OTHER, PERSON, QUANTITY, and TITLE, and returns a confidence score as well as the position of each detected entity. Using a set of rules, we mapped these entities to our use case, i.e., Deutsche Bahn tickets, and to our data model (FIRSTNAME, LASTNAME, COSTS, TICKET_TYPE, DATE).
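The rule set itself is not published in this post; the following is a hypothetical example of what such rules could look like. It takes the raw text and a list of Comprehend-style entities (`Type`, `Text`, `Score`) and fills the five fields of the data model. The ticket-type keywords and the amount format (no thousands separators, "€" or "EUR" after the number) are assumptions for illustration.

```python
import re
from decimal import Decimal

# Hypothetical keyword list, longest first so "Super Sparpreis" wins over "Sparpreis".
TICKET_TYPES = ["Super Sparpreis", "Sparpreis", "Flexpreis"]
# Assumed amount format: "49,90 €" or "49.90 EUR", no thousands separators.
AMOUNT = re.compile(r"(\d+(?:[.,]\d{2})?)\s*(?:€|EUR)")


def map_invoice(text: str, entities: list) -> dict:
    """Map Comprehend-style default entities onto the ticket data model."""
    result = dict.fromkeys(["FIRSTNAME", "LASTNAME", "COSTS", "TICKET_TYPE", "DATE"])

    # Rule 1: the most confident PERSON is the traveller; last token = last name.
    persons = [e for e in entities if e["Type"] == "PERSON"]
    if persons:
        tokens = max(persons, key=lambda e: e["Score"])["Text"].split()
        if tokens:
            result["FIRSTNAME"] = " ".join(tokens[:-1]) or None
            result["LASTNAME"] = tokens[-1]

    for e in entities:
        # Rule 2: the first QUANTITY/OTHER entity containing a euro amount is the cost.
        if e["Type"] in ("QUANTITY", "OTHER") and result["COSTS"] is None:
            match = AMOUNT.search(e["Text"])
            if match:
                result["COSTS"] = Decimal(match.group(1).replace(",", "."))
        # Rule 3: the first DATE entity is taken as the travel date.
        elif e["Type"] == "DATE" and result["DATE"] is None:
            result["DATE"] = e["Text"]

    # Rule 4: Comprehend has no ticket-type entity, so match keywords in the raw text.
    for ticket in TICKET_TYPES:
        if ticket in text:
            result["TICKET_TYPE"] = ticket
            break
    return result
```

Rule-based post-processing like this is a pragmatic way to bridge generic entities and a domain data model when there is too little data to train a custom recognizer.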
Amazon Comprehend can also be custom-trained to recognize additional entities. However, the training requires 10 documents per custom label. As our data set for this demo was limited, we did not pursue custom training; it remains an interesting option for future work. Unlike Textract, Comprehend supports synchronous calls, allowing for short response times (approximately 1 s for the complete round trip, including the PDF-to-text conversion). It is worth mentioning, however, that an asynchronous call with a single PDF file as input took approximately 6 minutes to respond. Although we did not test this in detail, we suspect that this time would remain roughly the same even for a larger batch of files.
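A synchronous call can be sketched with boto3's Comprehend client, whose `detect_entities` method takes `Text` and `LanguageCode` and returns the detected entities under the `Entities` key. The helper below also measures the round trip. The language code `de` (for German tickets) is an assumption, and the client is passed in so that it can be replaced by a stub when no AWS account is at hand.

```python
import time


def detect_entities_sync(client, text: str, language_code: str = "de"):
    """Call the synchronous DetectEntities API and measure the round trip."""
    start = time.perf_counter()
    response = client.detect_entities(Text=text, LanguageCode=language_code)
    elapsed = time.perf_counter() - start
    return response["Entities"], elapsed


if __name__ == "__main__":
    import boto3  # assumption: AWS credentials and region are configured

    comprehend = boto3.client("comprehend")
    entities, seconds = detect_entities_sync(comprehend, "Jane Doe fährt nach Berlin.")
    print(f"{len(entities)} entities in {seconds:.2f}s")
```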
Business Process Implementation and Camunda Architecture
In the following, we discuss the decisions we made during the implementation and deployment of the process.
The Intelligent Document Analysis service is implemented in Python as an external task worker. An external task is a service task that places a unit of work in a list that workers can poll. With this polling approach, Camunda does not push any work to the worker; instead, the worker asks (polls) Camunda for the work it needs. Once the worker has finished, it notifies Camunda that the task is complete, and Camunda then advances the process to the next step. For the final step, sending the e-mail to the Invoice Department, we used the camunda-bpm-mail plugin.
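The polling pattern can be sketched against Camunda's REST API: the worker calls `POST /external-task/fetchAndLock` for its topic, processes any tasks it receives, and reports back with `POST /external-task/{id}/complete`. The sketch uses only the Python standard library; the base URL, worker ID, topic name, and the `analyze` placeholder are hypothetical and stand in for the project's actual IDA logic.

```python
import json
import time
import urllib.request

BASE = "http://localhost:8080/engine-rest"  # assumption: Camunda Platform Run default
WORKER_ID = "ida-worker"                    # hypothetical worker id
TOPIC = "intelligent-document-analysis"     # hypothetical topic name


def fetch_and_lock_body(worker_id: str, topic: str, lock_ms: int = 60_000, max_tasks: int = 1) -> dict:
    """Body for fetchAndLock: lock up to max_tasks tasks of one topic for lock_ms."""
    return {
        "workerId": worker_id,
        "maxTasks": max_tasks,
        "topics": [{"topicName": topic, "lockDuration": lock_ms}],
    }


def complete_body(worker_id: str, variables: dict) -> dict:
    """Body for complete: return the extracted data as String process variables."""
    return {
        "workerId": worker_id,
        "variables": {k: {"value": str(v), "type": "String"} for k, v in variables.items()},
    }


def post(url: str, body: dict):
    """POST JSON and decode the response; complete answers 204 with an empty body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


def analyze(task: dict) -> dict:
    """Hypothetical placeholder for PDF-to-text conversion and entity extraction."""
    return {"COSTS": "0.00"}


def run_worker(poll_interval: float = 5.0) -> None:
    """Poll for work forever; sleep briefly whenever no task is available."""
    while True:
        tasks = post(f"{BASE}/external-task/fetchAndLock", fetch_and_lock_body(WORKER_ID, TOPIC)) or []
        for task in tasks:
            result = analyze(task)
            post(f"{BASE}/external-task/{task['id']}/complete", complete_body(WORKER_ID, result))
        if not tasks:
            time.sleep(poll_interval)


if __name__ == "__main__":
    run_worker()
```

In practice, a worker would also report errors via `POST /external-task/{id}/failure`, and it could use the `asyncResponseTimeout` option of fetchAndLock for long polling instead of a fixed sleep.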
Camunda can be flexibly combined with any architecture, so each use case should be examined individually. For this project, we tried out two state-of-the-art Camunda distributions, which offer quite different qualities:
Camunda Platform Run (launched in March 2020) is a pre-packaged Camunda distribution based on the Camunda Spring Boot Starter. It provides a simple but powerful configuration mechanism that anyone can operate, regardless of their knowledge of Java or application server configuration. We considered Camunda Platform Run optimal for this use case because we needed to orchestrate distributed services (Intelligent Document Analysis, Amazon Comprehend, and Camunda Connectors).
Camunda Micronaut is developed by Novatec Consulting GmbH with the support of the open-source community; its first release was launched in April 2020. It uses Micronaut, a JVM-based, full-stack framework for building modular, easily testable microservice and serverless applications. As Micronaut is very lightweight, the Camunda Micronaut distribution starts up much faster and consumes less memory than the Spring Boot distribution on which Camunda Platform Run is based. The Camunda Micronaut distribution will be our platform of choice in the future.
Intelligent Process Automation is an umbrella term for the use of Artificial Intelligence and Robotic Process Automation technologies to advance the automation of business processes. In this blog post, we orchestrated an Intelligent Document Analysis service based on Amazon Comprehend to automate our internal cost submission process. For the implementation of the business process, we tried two state-of-the-art Camunda distributions (Camunda Platform Run and Camunda Micronaut).
Although this project realized a simple use case, it still shed light on many important aspects, such as architectural decisions, performance issues, and limitations in training machine learning models. There are numerous ways to enrich this showcase, e.g., a chatbot-based user interface, more complex logic, or Robotic Process Automation solutions. The data model for documents should be extended with additional entities, and further machine learning models should be trained to support more invoice types. The complete project can now also be hosted on the recently launched Camunda Cloud.
Learn more about our Intelligent Process Automation offering here.