Unit 5 Stretch Task: Can you validate me?
Canvas: U5: Stretch Task — Can You Validate Me?
Type: Stretch Task (1 pt, complete/incomplete)
Copilot: Allowed for syntax lookup; disallowed for answer generation.
Background
Survey data is notoriously difficult to munge. Even when the data is recorded cleanly, the options for ‘write in questions’, ‘choose from multiple answers’, ‘pick all that are right’, and ‘multiple choice questions’ make storing the data in a tidy format difficult.
In 2014, FiveThirtyEight surveyed over 1000 people to write the article titled America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters). They have provided the data on GitHub. We have done some initial cleaning and added fictitious email addresses to the dataset.
For this project, your client would like to use the Star Wars survey data in a new marketing initiative. They would like to email survey participants with product offers, but also use the responses to build a machine learning model to predict a person’s outcome based on a few responses about Star Wars movies.
Most of the data cleaning has been done for you, but there is still some data preparation that needs to take place.
Client Request
The client performed the survey but outsourced the analytics to a third party (you). They want you to clean up the data so you can:
a. Extract the names of respondents and their email info
b. Validate that the data provided on GitHub lines up with the article by recreating 2 of the visuals from the article
c. Predict if a person from the survey makes at least $50k
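As a starting point for request (a), here is a minimal sketch of a plausibility check on email values, using only the Python standard library. The column names `name` and `email`, the sample rows, and the helper `is_plausible_email` are all hypothetical; the real dataset may be laid out differently, and the pattern is intentionally loose rather than a full email validator.

```python
import re

# Deliberately simple pattern: exactly one "@", no whitespace, and a dot in the domain.
# This is a plausibility check, not a complete email-address validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_plausible_email(value):
    """Return True when value is a string that looks like an email address."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value.strip()) is not None


# Made-up rows for illustration; the column names are assumptions.
rows = [
    {"name": "Leia Organa", "email": "leia@example.com"},
    {"name": "Han Solo", "email": "han.solo(at)example.com"},
    {"name": "Luke Skywalker", "email": None},
]

# Keep only the (name, email) pairs whose email passes the check.
contacts = [
    (row["name"], row["email"].strip())
    for row in rows
    if is_plausible_email(row["email"])
]
print(contacts)
```

Rows that fail the check are better reported back to the client than silently dropped, since the client intends to email every participant.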
Data
URL: Star Wars data
Information: Article
Readings
This is a chance to use skills learned in this unit and practice skills learned in previous units.
Review pivoting and unpivoting.
- Python Polars: The Definitive Guide, Chapter 15 Reshaping (especially pivoting and unpivoting)
Optional Alternative Reading
Polars Cookbook, Chapter 8 Reshaping and Tidying Data
Stretch Questions
Skills: pivot / unpivot, chart replication, visual QA
- Validate that the data provided lines up with the article by recreating 2 of the visuals from the article. These visuals should be similar, including titles, subtitles, gridlines, order of categories, colors, etc., but they don’t have to be exact. They need to be close enough that we can validate that the values in the dataset match the values shown in the article’s charts. Though their charts were built using different plotting software, the more you push yourself for an exact replica, the more you will learn. Spend at least a couple of hours on this.
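Before styling a chart, it can help to check the numbers behind it. The sketch below, in plain Python, computes the percentage of respondents who have seen each film from long-format records and compares the result with values read off a chart. Every number here is made up, including the `published` dictionary, and the choice of denominator (all respondents) is an assumption you would need to confirm against the article.

```python
from collections import Counter

# Made-up long-format records: (respondent_id, film, seen).
records = [
    (1, "Episode I", True), (1, "Episode IV", True),
    (2, "Episode I", False), (2, "Episode IV", True),
    (3, "Episode I", True), (3, "Episode IV", True),
    (4, "Episode I", False), (4, "Episode IV", True),
]

respondents = {rid for rid, _, _ in records}
seen_counts = Counter(film for _, film, seen in records if seen)

# Whole-number percentages, which is how bar labels are commonly displayed.
percent_seen = {
    film: round(100 * count / len(respondents))
    for film, count in seen_counts.items()
}

# Hypothetical values read off a chart; replace with the article's real labels.
published = {"Episode I": 50, "Episode IV": 100}

# Flag any film whose computed value differs from the chart by more than 1 point.
mismatches = {
    film: (percent_seen.get(film), expected)
    for film, expected in published.items()
    if abs(percent_seen.get(film, 0) - expected) > 1
}
print(percent_seen, mismatches)
```

An empty `mismatches` dictionary suggests the reshaping and the denominator agree with the chart; a non-empty one usually points to a filtering step that differs from the article's.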
Submission / Deliverables:
Rather than submitting something for each task of this unit, you will submit one client report at the end of the unit that is the culmination of all the tasks. Use this template to create your Client Report. Answer the questions. Each answer should include a written description of your results, code cells with comments, charts and/or tables.
Your instructor will advise you — or it will be evident in Canvas — whether to submit a rendered .html file, or a link to the rendered file on GitHub Pages (gh-pages). Do not submit the URL to the GitHub .qmd file.
When you have completed the report and are ready to submit, render the project into HTML and publish it to GitHub Pages. Follow these steps:
- Have this assignment’s template/quarto file open in VS Code and nothing else
- Click the Preview button in VS Code (top right of the screen)
  - This renders the project so you can review it
  - Confirm everything displays as you would like it to
  - How you see it is how it is viewed for grading
  - If there is an error in any cell, the rendering stops and you will need to fix the error before rendering again (if you get stuck, post your error in Slack)
- Once the report is confirmed, close the preview and open the GitHub Desktop application
- Confirm you are in the correct repository (top left corner)
- Confirm you are on the Main branch (top left corner; never change off Main)
- Type a summary of the changes in the Summary box
- Click Commit to main (blue button, bottom left)
- Click Push origin (blue button, middle right)
  - This pushes your changes to GitHub
  - The publish.yml workflow renders the project into HTML files
  - The HTML files are published to the gh-pages branch
  - The URL of the published project is in the deployment section on GitHub
- In GitHub Desktop, click Open in GitHub to navigate to the repository
- Click the Actions tab and confirm there were no errors in rendering
- Open the deployment section on the main repo page to find the URL
- Navigate to the URL and confirm it displays as you intended
- Copy the URL and submit it in Canvas
- In