Are you looking for an internship or a thesis topic? Great!
To apply, send us your resume (or a link to your GitHub account) and your solution to at least one of the following puzzles.
1. The Tile Challenge
Data can be messy! The ability to organize pieces of information to get useful insights is essential to a data scientist.
In this challenge, your task is to reconstruct the correct image from the messy data provided in the link below. The solution reveals the habitat of the best data scientists in Estonia 😉
Download the image in the link below, use your favorite programming language, and good luck!
Image link: puzzle.png
Send us the final image (in PNG format) and the code you wrote to produce it (as plain text).
2. Text analysis with Estnltk toolkit
Your task is to download the article from http://www.sirp.ee/s1-artiklid/c21-teadus/kvantilm/ and answer the following questions:
- How many unique words and how many unique lemmas does the text contain?
- Which person names are mentioned most frequently?
- What is the distribution of parts of speech in the text?
To answer these questions, you will need to write a Python script that loads the HTML page, extracts the article body, and performs the necessary text analysis.
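As a starting point for the loading and extraction steps, here is a minimal sketch that uses only the Python standard library. It simply collects the text of every `<p>` element; which of those paragraphs actually belong to the article body depends on the page's markup, which you will need to inspect yourself. The names `ParagraphExtractor`, `extract_paragraphs`, and `fetch_html` are illustrative and are not part of any toolkit mentioned here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = []
        self._in_paragraph = False
        self._skip_depth = 0

    def _flush(self):
        # Join the collected fragments and normalise whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            if self._in_paragraph:
                self._flush()  # previous <p> was never closed explicitly
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_paragraph:
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph and self._skip_depth == 0:
            self._buffer.append(data)

    def close(self):
        super().close()
        if self._in_paragraph:
            self._flush()
            self._in_paragraph = False


def extract_paragraphs(html):
    """Return the text of all paragraphs found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_html(url):
    """Download a page and decode it (falls back to UTF-8 if no charset is sent)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hand-written sample markup, not the real article.
sample_html = (
    "<html><body><script>var x = 1;</script>"
    "<p>Tere, <b>maailm</b>!</p><p>Teine   lõik.</p></body></html>"
)
paragraphs = extract_paragraphs(sample_html)
print(paragraphs)
```

For the real page you would call `extract_paragraphs(fetch_html(url))` with the article URL and then keep only the paragraphs that form the article body.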
Resources:
- Estnltk — Open source tools for Estonian natural language processing: https://pypi.python.org/pypi/estnltk/1.4
- Estnltk documentation: http://estnltk.github.io/estnltk/1.4/
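Once the text has been tokenised and tagged, the three questions above reduce to counting. The sketch below assumes you already have a list of (word, lemma, part-of-speech) triples and a list of person names; how you obtain them from Estnltk is left to you and the documentation. The function name `summarize`, the sample data, and the tag letters are hand-written illustrations, not Estnltk output.

```python
from collections import Counter


def summarize(tokens, person_names, top_n=3):
    """Compute simple statistics from (word, lemma, pos) triples and a name list."""
    tokens = list(tokens)
    # Case-insensitive sets, so "Kass" and "kass" count as one word.
    unique_words = {word.lower() for word, _, _ in tokens}
    unique_lemmas = {lemma.lower() for _, lemma, _ in tokens}
    pos_counts = Counter(pos for _, _, pos in tokens)
    total = sum(pos_counts.values())
    # Relative frequency of each part-of-speech tag.
    pos_share = {pos: n / total for pos, n in pos_counts.items()} if total else {}
    return {
        "unique_words": len(unique_words),
        "unique_lemmas": len(unique_lemmas),
        "top_names": Counter(person_names).most_common(top_n),
        "pos_counts": dict(pos_counts),
        "pos_share": pos_share,
    }


# Hand-written sample data (illustrative tags and names only).
sample_tokens = [
    ("Kass", "kass", "S"),
    ("nägi", "nägema", "V"),
    ("kassi", "kass", "S"),
    ("ja", "ja", "J"),
]
sample_names = ["Mari Maasikas", "Jüri Juurikas", "Mari Maasikas"]

result = summarize(sample_tokens, sample_names)
print(result)
```

Whether to lowercase words before counting, and whether punctuation counts as a word, are design choices; state the ones you make in your answer.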