The Estonian Gene Bank

The Estonian Gene Bank of the University of Tartu (often also called the UT Estonian Gene Bank or the Estonian Gene Bank) has been a research and development institution of the University of Tartu since 2007.

The gene bank manages a collection of health, family tree and genetic data on the Estonian population. As of 2020, it holds data on approximately 200,000 people (gene donors). It promotes genetic research and applies the results of that research to improve public health. The gene bank grew out of the Estonian Gene Center Foundation, which was established in 1999, and it has operated under the Institute of Genomics of the University of Tartu since 2018.

What problem did we solve?

The existing electronic health records of the Estonian population contain a great deal of information that is invaluable for research. Using this information is difficult, however: the data are recorded as free text, contain typos and repeated entries, report the results of similar analyses in different units of measurement, and so on. As a result, the data are very hard to use for research without prior processing.

How did we solve it?

We used various machine learning technologies to extract and structure information from the electronic health records of Estonian residents, bringing the data into an interpretable, high-quality form.
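
To make the idea of structuring free text concrete, here is a minimal Python sketch of one small step: pulling a laboratory value out of a free-text note and converting it to a single unit. It is deliberately rule-based (a regular expression), not the machine learning approach described above, and the pattern, the names (`extract_hemoglobin`, `Measurement`, `TO_GRAMS_PER_LITRE`) and the sample notes are hypothetical illustrations rather than the project's actual code.

```python
import re
from dataclasses import dataclass

# Hypothetical conversion table: factor that turns each unit into g/L.
TO_GRAMS_PER_LITRE = {"g/l": 1.0, "g/dl": 10.0}

# Hypothetical pattern: a spelling variant of the analyte name, an optional
# separator, a number (decimal point or decimal comma), and a unit.
PATTERN = re.compile(
    r"\b(?:ha?emoglobin|hgb|hb)\b\s*[:=]?\s*(\d+(?:[.,]\d+)?)\s*(g/dl|g/l)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Measurement:
    analyte: str
    value: float
    unit: str


def extract_hemoglobin(text: str) -> list[Measurement]:
    """Return every hemoglobin value found in the text, expressed in g/L."""
    results = []
    for number, unit in PATTERN.findall(text):
        value = float(number.replace(",", "."))  # accept decimal commas
        factor = TO_GRAMS_PER_LITRE[unit.lower()]
        results.append(Measurement("hemoglobin", round(value * factor, 1), "g/L"))
    return results


if __name__ == "__main__":
    note = "Hgb 13,5 g/dL; at follow-up hemoglobin: 135 g/l"
    for measurement in extract_hemoglobin(note):
        print(measurement)
```

Both values in the sample note end up as the same structured record in g/L, which is the kind of uniformity that makes the data comparable across records. A real pipeline would need to handle far more variation (misspellings, abbreviations, missing units) than a single hand-written pattern can cover.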

What were the benefits?

The data have been converted into a uniform, high-quality form suitable for research. As a result, researchers can make effective use of the health data of the Estonian population.

“The quality of our research has increased due to the work performed by STACC. The availability of high-quality health data and the structuring of free-text information have enabled us to participate in important international studies analyzing the links between genes and diseases or between genes and clinical indicators. Among other things, we led a recently published study that found an association between gene variants and penicillin allergy, a trait we identified through the structuring of gene donor histories by STACC.”

Team, Estonian Gene Bank