23. May 2019

Emoji analysis for a feelgood index

Emojis are widely used in chat tools, even in professional settings. They are also quite simple to analyze in terms of their intended connotation. Combine these two facts and you can build a simple analysis tool that checks whether your company has a positive communication culture, or whether your coworkers accept your new idea.

The Novatec SMILE Program is Novatec's employee satisfaction program. Its goal is to improve the well-being of the employees. Each employee is encouraged to submit new ideas, which the SMILE team then reviews with the corresponding experts. Every suggestion that passes this step can be voted on and commented on by any employee. If enough employees vote for an idea, it is implemented or proposed to management. With the project described here, we could assess how positively or negatively such an idea is received, based on the emojis in text messages.

Every employee at Novatec makes heavy use of a chat tool for internal communication. The data from this tool is therefore a good basis for evaluating the happiness of the employees. There is one restriction, though: we could not fetch direct messages or messages from private groups, only messages from public channels. These public channels account for about 10% of all written messages and 40% of all read messages. However, what is private should stay private, and as responsible data scientists we respect that.

To detect happiness, we analyzed the usage of emojis in all public conversations on a per-day basis. Furthermore, we looked at the distribution of positive and negative connotations of the emojis. In contrast to emojis, natural language can be hard for artificial intelligence to interpret, and fine-tuning the algorithms requires a lot of work. Emojis allow us to gather non-verbal feedback in a standardized form with a standardized meaning. The meaning of an emoji can depend on the context, but in most cases it is unambiguous.

There is an API from which we can retrieve all messages from all public channels. We transfer these messages to an Elasticsearch instance on a regular basis. With this data we can run some analyses to get an overview. We used Python and Pandas for the analysis.
To detect emojis, we created a Python module that finds the emojis in the messages, as these emojis were not stored in Unicode. Furthermore, we developed a conversion to display them in Unicode. With this emoji module we can feed in thousands of messages and get back just the emojis they contain, which we can then work on.
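To give an idea of what such a module might look like, here is a minimal sketch. It assumes that the chat tool stores emojis as colon-delimited shortcodes such as `:smile:`, which is common but is not stated above. The names `SHORTCODE_TO_UNICODE`, `extract_emojis` and `to_unicode` are made up for this illustration and are not the names used in our module.

```python
import re

# Assumption: emojis arrive as colon-delimited shortcodes, e.g. ":smile:".
SHORTCODE_PATTERN = re.compile(r":([a-z0-9_+\-]+):")

# Hypothetical excerpt of a shortcode-to-Unicode table; a real one is much longer.
SHORTCODE_TO_UNICODE = {
    "slightly_smiling_face": "\U0001F642",  # 🙂
    "smile": "\U0001F604",                  # 😄
    "bomb": "\U0001F4A3",                   # 💣
}


def extract_emojis(message: str) -> list[str]:
    """Return the shortcodes found in a message, in order of appearance."""
    return SHORTCODE_PATTERN.findall(message)


def to_unicode(shortcode: str) -> str:
    """Convert a shortcode to Unicode, falling back to the raw shortcode."""
    return SHORTCODE_TO_UNICODE.get(shortcode, f":{shortcode}:")


message = "Cake in the kitchen :smile: thanks Anna :slightly_smiling_face:"
codes = extract_emojis(message)
print(codes)
print([to_unicode(code) for code in codes])
```

A plain regular expression like this also matches things that merely look like shortcodes, for example the middle part of a time such as `10:30:45`, so a real module would probably check candidates against the list of known emoji names.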

We first looked at the general usage of emojis. By the end of February 2019 we had about 92,000 messages in public channels, 17.5 percent of which contain emojis. We also analyzed the distribution of the emojis used. The messages contain about 800 unique emojis in total, but only about 40 of them are used frequently, as the images below show. The majority of the emojis were used only a few times. We can also see that only a few emojis were used very frequently.
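The numbers above boil down to a few simple counts. The following sketch shows them on toy data, under the assumption that the emojis of each message have already been extracted into a list. It uses only the standard library to stay short, and the function names are illustrative.

```python
from collections import Counter

# Toy data: each entry is the list of emojis extracted from one message.
emojis_per_message = [
    ["slightly_smiling_face"],
    [],
    ["smile", "slightly_smiling_face"],
    [],
    ["tada"],
    [],
    [],
    ["slightly_smiling_face"],
]


def emoji_share(messages):
    """Fraction of messages that contain at least one emoji."""
    return sum(1 for emojis in messages if emojis) / len(messages)


def emoji_counts(messages):
    """Count how often each emoji occurs over all messages."""
    return Counter(emoji for emojis in messages for emoji in emojis)


def frequent_emojis(counts, min_occurrences):
    """Keep only the emojis that occur at least `min_occurrences` times."""
    return {emoji: n for emoji, n in counts.items() if n >= min_occurrences}


counts = emoji_counts(emojis_per_message)
print(emoji_share(emojis_per_message))
print(len(counts), "unique emojis")
print(frequent_emojis(counts, min_occurrences=2))
```

On our real data, the threshold for the second image was 81 occurrences, which left 39 emojis.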

occurrence of all used smileys

occurrence of emojis with at least 81 occurrences (39 emojis)

For example, the emoji 🙂, named slightly smiling face in the Common Locale Data Repository (CLDR), is the most used emoji with about 4100 occurrences and is used in about 25% of the cases. The second most used emoji is 😄, in CLDR grinning face with smiling eyes, with about 2000 occurrences. As we can see, the two most used emojis have a positive connotation. This leads us to the next analysis.

For the next step we annotated the 39 most used emojis, which account for about 90% of all emoji usage, with a connotation value ranging from negative through neutral to positive. Using these annotations, we determined the distribution of the emojis used and their connotations: we extracted the emojis from all messages and used our connotation table to annotate each message. Then we created a histogram of the number of positive and negative messages. In general, positive emojis are used much more often than negative ones, as the following image shows.
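The annotation and the monthly counts can be sketched with Pandas as follows. The connotation table here is a tiny hypothetical excerpt, and the rule for messages with several emojis (take the sign of the summed values) is an assumption made for this example, not necessarily the rule we applied.

```python
import pandas as pd

# Hypothetical excerpt of a connotation table: -1 negative, 0 neutral, +1 positive.
CONNOTATION = {
    "slightly_smiling_face": 1,
    "smile": 1,
    "thinking_face": 0,
    "warning": -1,
}


def annotate(emojis):
    """Label a message by the sign of the summed connotation of its emojis."""
    score = sum(CONNOTATION.get(emoji, 0) for emoji in emojis)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy messages; the real data would come from the Elasticsearch instance.
messages = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2018-04-03", "2018-04-17", "2018-05-02", "2018-05-09", "2018-05-21"]
        ),
        "emojis": [
            ["smile"],
            ["warning"],
            ["smile", "slightly_smiling_face"],
            ["thinking_face"],
            ["slightly_smiling_face"],
        ],
    }
)

messages["connotation"] = messages["emojis"].apply(annotate)
messages["month"] = messages["timestamp"].dt.to_period("M")

# Rows: months, columns: connotation labels, values: number of messages.
monthly = messages.groupby(["month", "connotation"]).size().unstack(fill_value=0)
print(monthly)
```

The resulting table has one row per month and one column per label, which is the shape a grouped bar chart needs.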

number of positive and negative messages across the months

This graph shows absolute numbers. We can therefore see a rise in April and May 2018, when the new chat tool was introduced into the company and its usage grew. There is also a big drop in the usage of positive emojis at the end of 2018. This can be fully explained by a massive drop in public messages during the Christmas and New Year holidays. We have a few channels that are mainly used by bots, for example to inform users about the status of something. Some of these bots regularly use emojis that we annotated as negative. As our bots do not take holidays, they keep posting negatively annotated messages during the holidays.
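One way to make such effects less visible would be to look at shares instead of absolute numbers and to leave out bot channels. We did not do this for the graph above. The following is only a sketch of the idea on toy data, and the channel name in `BOT_CHANNELS` is invented.

```python
from collections import defaultdict

# Hypothetical set of channels that are mainly fed by bots.
BOT_CHANNELS = frozenset({"build-status"})

# Toy records: (month, channel, connotation label of the message).
records = [
    ("2018-11", "general", "positive"),
    ("2018-11", "general", "positive"),
    ("2018-11", "general", "negative"),
    ("2018-11", "build-status", "negative"),
    ("2018-12", "general", "positive"),
    ("2018-12", "build-status", "negative"),
    ("2018-12", "build-status", "negative"),
]


def positive_share_per_month(records, excluded_channels=frozenset()):
    """Share of positive messages per month, skipping the excluded channels."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for month, channel, label in records:
        if channel in excluded_channels:
            continue
        totals[month] += 1
        if label == "positive":
            positives[month] += 1
    return {month: positives[month] / totals[month] for month in totals}


print(positive_share_per_month(records))
print(positive_share_per_month(records, excluded_channels=BOT_CHANNELS))
```

In the toy data, the December share of positive messages looks poor only as long as the bot channel is counted.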

In most cases, assigning an emoji to one connotation is quite an easy task. But sometimes even an emoji with a clear connotation is used in a very different manner. We have one example of this. In our chat tool we have an extra bot for praising a colleague for things like good work or bringing a cake to the office. This bot drops a praise bomb to praise the coworker, and the message is accompanied by a bomb emoji 💣. Without knowing this context, one would annotate the bomb as a negative emoji. So did we. However, this bomb is not meant negatively but very positively. Analyzing emojis is therefore not as easy as it seems, and the unwritten rules of communication have to be included in an analysis.

Interpreting these results is, as we have seen above, very complicated and a nearly impossible task even for a human. Still, the results can lead to several conclusions. First, the employees communicate with each other in a friendly and positive way. Second, if there are bad things to discuss, these topics are not discussed in public channels but face to face or in private chats and calls. In conclusion, there is a positive mood in our company.

We now want to detect changes in the communication. Therefore we looked at the five most used emojis of every day. The emoji 🙂 is not only the most used emoji overall but also almost the sole winner of the emoji-of-the-day competition. If we look at the days before Christmas (19 to 22 December 2018), we see wide use of Christmas-themed emojis; one of them took second place on several days. If we now detect an unexpected rise of a (negative) emoji, we (or our feelgood manager) can infer a serious threat to the company's feelgood.
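A daily top five and a very simple check for an unexpected rise could look like the sketch below. The rule in `unexpected_rise` (the latest day exceeds a multiple of the average of the earlier days) and its thresholds are assumptions chosen for illustration, not a tuned method.

```python
from collections import Counter, defaultdict


def top_emojis_per_day(events, n=5):
    """events: iterable of (day, emoji). Returns {day: [(emoji, count), ...]}."""
    per_day = defaultdict(Counter)
    for day, emoji in events:
        per_day[day][emoji] += 1
    return {day: counts.most_common(n) for day, counts in per_day.items()}


def unexpected_rise(daily_counts, factor=3.0, min_count=5):
    """Check the last entry of a non-empty list of daily counts of one emoji.

    Returns True if the latest count is at least `min_count` and larger than
    `factor` times the mean of all earlier days.
    """
    *history, today = daily_counts
    if not history:
        return False
    baseline = sum(history) / len(history)
    return today >= min_count and today > factor * baseline


events = [
    ("2018-12-19", "slightly_smiling_face"),
    ("2018-12-19", "slightly_smiling_face"),
    ("2018-12-19", "christmas_tree"),
    ("2018-12-20", "slightly_smiling_face"),
]
print(top_emojis_per_day(events))
print(unexpected_rise([1, 2, 1, 0, 12]))
```

The `min_count` guard keeps rarely used emojis from raising an alarm just because they appear twice on one day.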

In public channels we have a very good mood. But as public channels only contribute 10% of all written messages, there is a large number of messages we could not, and also do not want to, analyze. Still, given this vast number of positively annotated messages in public channels, we can infer the overall happiness at Novatec.

This was a short overview of what can be done to measure the feelgood in a company. Our approach could be extended to watch channels individually, which would give a clearer picture of the mood of the individual teams.
We also did not distinguish between the types of messages. Our chat tool supports normal messages, but also direct replies to a message. Reactions to a message in the form of an emoji are supported as well. We could use these reactions to create a bad information detector that identifies messages with mainly negative reactions.
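Such a detector does not exist yet. As a rough sketch of the idea, a message could be flagged when its negative reactions outnumber its positive ones. The connotation values, the reaction names and the `min_reactions` threshold below are all assumptions made for this example.

```python
# Hypothetical connotation values for reaction emojis.
REACTION_CONNOTATION = {
    "thumbsup": 1,
    "smile": 1,
    "thumbsdown": -1,
    "disappointed": -1,
}


def mainly_negative(reactions, connotation=REACTION_CONNOTATION, min_reactions=3):
    """reactions: dict mapping a reaction emoji to how often it was given.

    Returns True if at least `min_reactions` rated reactions exist and the
    negative ones outnumber the positive ones. Unknown emojis are ignored.
    """
    negative = sum(n for emoji, n in reactions.items() if connotation.get(emoji, 0) < 0)
    positive = sum(n for emoji, n in reactions.items() if connotation.get(emoji, 0) > 0)
    return negative + positive >= min_reactions and negative > positive


print(mainly_negative({"thumbsdown": 3, "thumbsup": 1}))
print(mainly_negative({"thumbsup": 5}))
```

As the praise bomb shows, the connotation table for reactions would need the same care with context as the one for emojis in messages.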