Unit 5 Core Task 3: Star Wars for Dummies

Published

October 29, 2025

Note

Canvas: U5: Core Task 3 — Star Wars for Dummies
Type: Core Task (1 pt, complete/incomplete)
Copilot: Allowed for syntax lookup; disallowed for answer generation.

Background

Survey data is notoriously difficult to munge. Even when the data is recorded cleanly, the mix of ‘write in’, ‘choose from multiple answers’, ‘pick all that are right’, and ‘multiple choice’ questions makes storing the data in a tidy format difficult.

In 2014, FiveThirtyEight surveyed over 1,000 people for the article America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters). They have provided the data on GitHub. We have done some initial data cleaning and added fictitious email addresses to the dataset.

For this project, your client would like to use the Star Wars survey data in a new marketing initiative. They would like to email survey participants with product offers, and also use the responses to build a machine learning model that predicts a person’s household income level from a few responses about Star Wars movies.

Most of the data cleaning has been done for you, but there is still some data preparation that needs to take place.

Client Request

The client performed the survey but outsourced the analytics to a third party (you). They want you to clean up the data so you can:

  • Extract the names of respondents and their email info
  • Validate that the data provided on GitHub lines up with the article by recreating 2 of the visuals from the article
  • Predict if a person from the survey makes at least $50k

Data

URL: Star Wars data

Information: Article

Readings

Core Questions

Skills: one-hot encoding, target creation, full ML pipeline

  1. Finish prepping the data for machine learning:

    1. Drop the RespondentID column, first and last name columns, and email column
    2. One-hot encode all remaining non-numeric columns
    3. Create your target (also known as “y” or “label”) column to indicate whether a person had a Household Income of $50,000 or greater
  2. Build a machine learning model that predicts whether a person had a Household Income of $50,000 or greater with at least 55% accuracy. Describe your model and report the accuracy.
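
The two steps above can be sketched roughly as follows. This is only an illustration on a tiny made-up table, not the course solution: every column name (`respondent_id`, `seen_any`, `films_seen`, `household_income`, and so on), the income bracket labels, and the helper `prep_for_ml` are assumptions, so adjust them to match the real dataset. The random forest is just one reasonable classifier choice. Note that the target is built from the income column before encoding, and the income column is then removed from the features so the model cannot simply read the answer.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Made-up stand-in for the survey data; all names and values are hypothetical.
survey = pd.DataFrame({
    "respondent_id": range(1, 10),
    "first_name": list("ABCDEFGHI"),
    "last_name": list("RSTUVWXYZ"),
    "email": [f"user{i}@example.com" for i in range(1, 10)],
    "seen_any": ["Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"],
    "films_seen": [6, 3, 0, 5, 0, 6, 2, 0, 4],
    "household_income": [
        "$0 - $24,999", "$50,000 - $99,999", "$25,000 - $49,999",
        "$100,000 - $149,999", None, "$150,000+", "$25,000 - $49,999",
        "$50,000 - $99,999", "$0 - $24,999",
    ],
})


def prep_for_ml(df, drop_cols, income_col):
    """Return (X, y): one-hot encoded features and a 0/1 income target."""
    # Drop identifying columns and rows where the income answer is missing.
    data = df.drop(columns=drop_cols).dropna(subset=[income_col])
    # Lower bound of each bracket, e.g. "$50,000 - $99,999" -> 50000.
    lower = (
        data[income_col]
        .str.extract(r"\$([\d,]+)")[0]
        .str.replace(",", "", regex=False)
        .astype(int)
    )
    y = (lower >= 50_000).astype(int).rename("income_50k_plus")
    # Remove the income column from the features, then one-hot encode
    # whatever non-numeric columns remain.
    X = pd.get_dummies(data.drop(columns=[income_col]), dtype=int)
    return X, y


X, y = prep_for_ml(
    survey,
    drop_cols=["respondent_id", "first_name", "last_name", "email"],
    income_col="household_income",
)

# Hold out part of the data so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.2f}")
```

With only a handful of fake rows the printed accuracy is meaningless; on the real survey data, compare the reported value against the 55% requirement and describe the model and its settings in your report.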

Submission / Deliverables:

Rather than submitting something for each task of this unit, you will submit one client report at the end of the unit that is the culmination of all the tasks. Use this template to create your Client Report. Answer the questions. Each answer should include a written description of your results, code cells with comments, charts and/or tables.

Your instructor will advise you — or it will be evident in Canvas — whether to submit a rendered .html file, or a link to the rendered file on GitHub Pages (gh-pages). Do not submit the URL to the GitHub .qmd file.

When you have completed the report and are ready to submit, render the project into HTML and publish it to GitHub Pages. Follow these steps:

  1. Open this assignment’s template/Quarto file in VS Code, with nothing else open
  2. Click the Preview button in VS Code (top right of the screen)
    1. This renders the project so you can review it
    2. Confirm everything displays as you would like it to
    3. What you see in the preview is what will be viewed for grading
    4. If there is an error in any cell, the rendering stops and you will need to fix the error before rendering again (if you get stuck, post your error in Slack)
  3. Once the report is confirmed, close the preview and open the GitHub Desktop application
  4. Confirm you are in the correct repository (top left corner)
  5. Confirm you are on the Main branch (top left corner — never change off Main)
  6. Type a summary of the changes in the Summary box
  7. Click Commit to main (blue button, bottom left)
  8. Click Push origin (blue button, middle right)
    1. This pushes your changes to GitHub
    2. The publish.yml workflow renders the project into HTML files
    3. The HTML files are published to the gh-pages branch
    4. The URL of the published project is in the deployment section on GitHub
      1. In GitHub Desktop, click Open in GitHub to navigate to the repository
      2. Click the Actions tab and confirm there were no errors in rendering
      3. Open the deployment section on the main repo page to find the URL
      4. Navigate to the URL and confirm it displays as you intended
      5. Copy the URL and submit it in Canvas