Text Analytics

Statistical analysis of textual data

A large part of the data in the public sector is in the form of text. Unlike numbers, text can't be added, subtracted or multiplied. At the same time, statistical analysis of large text collections is necessary for finding trends, identifying contexts, classifying documents, etc. STACC, in collaboration with its spin-off company TEXTA, offers solutions for analyzing, classifying and visualizing text.

Identifying documents containing personal data

The public sector works with personal data on a daily basis, but this data exists in very different formats (SQL databases, Word, Excel, PDF, etc.). In the context of the revised General Data Protection Regulation, organizations must have a clear overview of which documents contain personal information. Over the years, STACC has developed an anonymization solution for texts in Estonian, which detects whether a text contains attributes that refer to a particular person (name, personal identification code, address, etc.).
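To make the idea of detecting a personal attribute concrete, here is a minimal sketch that looks for Estonian personal identification codes in plain text. It is not STACC's or TEXTA's implementation; the function names are hypothetical, and it assumes the commonly documented 11-digit format with a two-stage modulo-11 check digit. Names and addresses generally need language-specific named-entity recognition and are not covered here.

```python
import re

# Assumption: an ID code is 11 digits, starts with 1-8, and is not part of a longer number.
ID_CODE_RE = re.compile(r"(?<!\d)[1-8]\d{10}(?!\d)")

# Assumption: the commonly documented two-stage weights for the check digit.
WEIGHTS = (
    (1, 2, 3, 4, 5, 6, 7, 8, 9, 1),
    (3, 4, 5, 6, 7, 8, 9, 1, 2, 3),
)


def has_valid_checksum(code: str) -> bool:
    """Return True if the 11th digit matches the modulo-11 check digit."""
    digits = [int(c) for c in code]
    for weights in WEIGHTS:
        remainder = sum(d * w for d, w in zip(digits[:10], weights)) % 11
        if remainder < 10:
            return remainder == digits[10]
    # Both stages gave remainder 10, so the check digit is defined as 0.
    return digits[10] == 0


def find_id_codes(text: str) -> list[str]:
    """Return all substrings that look like valid personal identification codes."""
    return [
        m.group()
        for m in ID_CODE_RE.finditer(text)
        if has_valid_checksum(m.group())
    ]
```

A document could then be flagged for review whenever find_id_codes returns a non-empty list. The checksum step reduces false positives from other 11-digit numbers such as invoice or reference numbers.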

Pseudonymization of data for publishing open data

The public sector is steadily increasing its use of open data. However, a precondition for publishing open data is anonymization: no specific person may be identifiable from the data. Over the years, STACC has developed an anonymization solution for texts in Estonian, which detects whether a text contains attributes that refer to a particular person (name, personal identification code, address, etc.) and pseudonymizes the attributes it finds. The data processed by the tool can be published as open data.
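The pseudonymization step can be illustrated with a small sketch that replaces each detected identifier with a stable placeholder, so that repeated mentions of the same person remain linkable within the text. This is an illustration under stated assumptions, not the tool described above: the pseudonymize function, the placeholder format and the simplified 11-digit pattern (without checksum validation) are all hypothetical.

```python
import re

# Assumption: simplified pattern for an 11-digit personal identification code.
ID_CODE_RE = re.compile(r"(?<!\d)[1-8]\d{10}(?!\d)")


def pseudonymize(text, mapping=None):
    """Replace each detected ID code with a consistent placeholder.

    Returns the rewritten text and the mapping from original values to
    placeholders. Passing the same mapping to later calls keeps the
    placeholders consistent across several documents.
    """
    mapping = {} if mapping is None else mapping

    def replace(match):
        original = match.group()
        if original not in mapping:
            # Hypothetical placeholder format: <ID_1>, <ID_2>, ...
            mapping[original] = f"<ID_{len(mapping) + 1}>"
        return mapping[original]

    return ID_CODE_RE.sub(replace, text), mapping
```

For example, calling pseudonymize on a text that mentions the same code twice yields the same placeholder both times. The returned mapping allows re-identification, so it would need to be discarded or stored separately from anything that is published.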