Tasks for a data scientist

You have to solve two data science tasks and send the results for evaluation.
Tools:

  • Python + Jupyter notebook + libraries of your choice
  • MS Word or Google Sheets

The results must be sent as two files:

  • R Markdown or Jupyter notebook file with code and comments
  • MS Word file (can also be converted from Google Docs) that includes the results with explanations

Evaluation criteria:

  • The correctness of the results
  • The cleanliness of the code and comments
  • The visual quality of the results and graphs

Task 1

You have purchase data from a retail store, and you have to make a graph that shows the performance of different producers by thirds of the year (a third is 4 months).
1. Download the data here.
2. Convert the variable THIRD to a factor (a categorical variable).
3. Keep only the rows where “THIRD” equals 1 or 3.
4. Answer the question: Which producer increased its sales the most between the 1st and the 3rd THIRD?
5. Create a chart of your own choice.

  • Show the top-7 producers by sales growth
  • On top of each bar, show the value of the bar
  • Color the bars by their values
  • Label the x and y axes meaningfully

6. Copy your results to MS Word or Google Sheets and analyze them in 100-200 words.
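
Steps 2-5 could be approached roughly as in the sketch below. It is only an illustration under stated assumptions: the column names PRODUCER and SALES are hypothetical (only THIRD is named in the task), THIRD is assumed to hold the integers 1, 2 and 3, and "growth" is taken to mean the absolute difference in total sales between the 3rd and the 1st third. A relative (percentage) growth would be an equally valid reading. The small table at the end is made-up toy data, not the real dataset.

```python
import pandas as pd


def sales_growth_by_producer(df, producer_col="PRODUCER", sales_col="SALES"):
    """Sales growth per producer (3rd third minus 1st third), largest first.

    PRODUCER and SALES are assumed column names; adjust them to the real data.
    """
    data = df.copy()
    # Step 2: treat THIRD as categorical (the pandas analogue of an R factor).
    data["THIRD"] = data["THIRD"].astype("category")
    # Step 3: keep only the rows from the 1st and 3rd thirds.
    data = data[data["THIRD"].isin([1, 3])]
    # Step 4: total sales per producer in each of the two thirds.
    first = data.loc[data["THIRD"] == 1].groupby(producer_col)[sales_col].sum()
    third = data.loc[data["THIRD"] == 3].groupby(producer_col)[sales_col].sum()
    # A producer missing from one of the thirds counts as 0 sales there.
    growth = third.sub(first, fill_value=0).rename("growth")
    return growth.sort_values(ascending=False)


def plot_top_growth(growth, top_n=7):
    """Step 5: bar chart of the top producers, colored and labeled by value."""
    # Imported here so the computation above also works without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib import colors

    top = growth.head(top_n)
    norm = colors.Normalize(vmin=top.min(), vmax=top.max())
    fig, ax = plt.subplots(figsize=(9, 5))
    bars = ax.bar(top.index.astype(str), top.values,
                  color=plt.cm.viridis(norm(top.values)))
    ax.bar_label(bars, fmt="%.0f")  # value on top of each bar
    ax.set_xlabel("Producer")
    ax.set_ylabel("Sales growth, 3rd third minus 1st third")
    ax.set_title(f"Top {top_n} producers by sales growth")
    return fig


# Made-up toy data, only to show the call; the 2nd-third row is filtered out.
toy = pd.DataFrame({
    "PRODUCER": ["A", "A", "B", "B", "C", "A"],
    "THIRD": [1, 3, 1, 3, 3, 2],
    "SALES": [100.0, 150.0, 80.0, 60.0, 40.0, 999.0],
})
growth = sales_growth_by_producer(toy)
```

With the real file loaded through pd.read_csv (or a similar reader), calling plot_top_growth(growth) would produce the chart for step 5.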

Task 2

You have data about the SP500 stock index and different variables that describe the current business climate. You have to create a machine learning model that is able to predict the “label” variable. The label shows whether we should currently be in the market or not. You’ll see that there are no labels for the year 2015. Your aim is to predict these labels.
1. Download the data here and the description of variables here.
2. Try to understand what data you have. Use Google if needed.
3. Create a machine learning model (or models) to predict the label based on the other variables. Try to select a model that allows you to reason about the importance of different variables and to interpret the results of the model. NB! Don’t use the value of SP500 for prediction.
4. Create a confusion matrix for your results. Copy it to the text file (your Google sheet or Word) and explain what you see.
5. In the text file, explain which variables are the most important ones.
6. Predict the labels for 2015. Copy the results to the text file and explain in 150-200 words whether we should have been in the market or not.
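
One possible shape for steps 3-6 is sketched below, using scikit-learn. It is a sketch under assumptions, not a prescribed solution: the column name SP500 and a 0/1 coding of “label” are assumed, rows without a label are assumed to be the 2015 rows, and a random forest is just one model that exposes variable importances (a logistic regression or a single decision tree would also be interpretable). Because the data is a time series, a chronological train/test split may be more appropriate than the random split shown here. The toy table at the end is randomly generated and only demonstrates the call.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def fit_and_evaluate(df, label_col="label", drop_cols=("SP500",), seed=0):
    """Fit a classifier, evaluate it, and predict the unlabeled rows.

    drop_cols lists columns that must not be used as predictors; "SP500" is
    an assumed name for the index value that the task forbids using.
    """
    labeled = df[df[label_col].notna()]
    unlabeled = df[df[label_col].isna()]  # assumed to be the 2015 rows
    features = [c for c in df.columns if c != label_col and c not in drop_cols]

    y = labeled[label_col].astype(int)  # assumes a 0/1 label
    X_train, X_test, y_train, y_test = train_test_split(
        labeled[features], y, test_size=0.25, random_state=seed, stratify=y
    )

    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)

    # Step 4: rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
    # Step 5: impurity-based importances, most important variable first.
    importances = pd.Series(
        model.feature_importances_, index=features
    ).sort_values(ascending=False)
    # Step 6: predicted labels for the rows that have none.
    predictions = pd.Series(
        model.predict(unlabeled[features]), index=unlabeled.index, name=label_col
    )
    return cm, importances, predictions


# Randomly generated toy data, only to show the call; x1 and x2 are
# hypothetical predictors and the last 20 rows play the role of 2015.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "SP500": rng.normal(2000, 100, n),
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
toy["label"] = (toy["x1"] > 0).astype(float)
toy.loc[toy.index[-20:], "label"] = np.nan
cm, importances, predictions = fit_and_evaluate(toy)
```

The confusion matrix and the sorted importances can then be copied into the text file and discussed there.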

When finished, send the results to kristjan.eljand@www.stacc.ee