#natural language #predictive analytics

We created a solution that helps remove derogatory and inappropriate comments. Õhtuleht is the largest daily newspaper in Estonia and also runs a significant web portal. Õhtuleht brings the latest news to its readers honestly and directly seven days a week. It’s entertainment that people appreciate. Õhtuleht Kirjastus is owned in equal parts by Ekspress Grupp and Alexela Grupp.

The newspaper Õhtuleht began publication in October 1944, after the departure of German troops, when it was essential to provide daily consumer information to the war-devastated inhabitants of Tallinn. Õhtuleht was published as Tallinn’s city newspaper until March 1997, when it turned towards a tabloid format and began offering more sensational news, entertainment and consumer information aimed at individual buyers.

What did we solve?

The comment section of Õhtuleht’s web portal contained a number of discriminatory, derogatory or otherwise inappropriate comments. Such content could have led to accusations or even fines against Õhtuleht for hosting unethical content. We created a solution that helped filter out such comments, reducing the time Õhtuleht’s editors spent managing and cleaning up the comment section.

How did we solve it?

We built a web API (application programming interface) that identifies comments with unwanted content using a machine learning model, rules, and language resources. Õhtuleht’s articles and data were used to create the machine learning model. Additionally, we created lexicons for various kinds of unwanted content, for example violence, threats and racism. The input texts were processed using natural language processing best practices before being fed to the model. Our logistic regression model identified six different types of unwanted content.
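To give a feel for how lexicons and a logistic model can work together, here is a minimal sketch in Python. It is not the production system: the English lexicon entries, the two category names shown, the weights and the `classify` function are all hypothetical and chosen only for illustration, whereas the real model was trained on Õhtuleht’s data and covered six categories.

```python
import math
import re

# Hypothetical lexicons for illustration only; the real lexicons are not
# published in this case study.
LEXICONS = {
    "violence": {"beat", "punch"},
    "threat": {"kill", "hurt"},
}

# Hypothetical (bias, weight per lexicon hit) pairs, one logistic model per
# category. A real model would learn these from labelled comments.
WEIGHTS = {
    "violence": (-2.0, 3.0),
    "threat": (-2.0, 3.0),
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lexicon_hits(tokens):
    """Count, per category, how many tokens appear in that category's lexicon."""
    return {
        category: sum(token in words for token in tokens)
        for category, words in LEXICONS.items()
    }


def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(text, threshold=0.5):
    """Return the categories whose estimated probability reaches the threshold."""
    hits = lexicon_hits(tokenize(text))
    flagged = {}
    for category, (bias, weight) in WEIGHTS.items():
        probability = sigmoid(bias + weight * hits[category])
        if probability >= threshold:
            flagged[category] = probability
    return flagged
```

In this sketch a comment with no lexicon hits stays below the threshold in every category, while a single hit is enough to flag it. A moderation API could return such a result for each incoming comment and leave the final decision to an editor.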

What were the benefits?

Õhtuleht’s comment section now contains less inappropriate language, and the risk of being charged over inappropriate content has been reduced. In addition, managing the comment section became easier and faster, which freed up staff time for other work.

“The beginning of the cooperation between Õhtuleht and STACC took place during a 2-hour random train ride from Tallinn to Tartu, which shows that they can understand quite complex and specific systems and on the other hand explain how doing something with them adds value to the service buyer. After that conversation everything was just the question of formalization.

The renewal of the moderation system with STACC took place at a good pace and every mid-term meeting was well prepared from their side. Nevertheless, they didn’t hang on to their original ideas but were always up to manage with some unexpected solutions. All the work was completed on time as promised. After completing the tasks, it was always possible to offer opportunities for developments that would help to improve the system even further.”

Martin Šmutov, managing editor