Project 5: Recoding Range Variables

Published: April 8, 2025

Background

Survey data is notoriously difficult to munge. Even when the data is recorded cleanly, the mix of question types (‘write-in questions’, ‘choose from multiple answers’, ‘pick all that apply’, and ‘multiple choice questions’) makes it difficult to store the data in a tidy format.

In 2014, FiveThirtyEight surveyed more than 1,000 people for the article America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters). FiveThirtyEight has published the data on GitHub.

For this project, your client would like to use the Star Wars survey data.

Client Request

The client conducted the survey but outsourced the analytics to a third party. They want you to clean the data so that it can ultimately be used to train a machine learning model. For this task, complete the items in the Questions section below.

Data

URL: StarWars.csv
Information: Article

Readings

No new readings. Review previous readings as needed:

Questions

  1. Clean and format the data so that it can be used in a machine learning model. As you format the data, complete each item listed below. In your final report, provide an excerpt of the reformatted data with a short description of the changes you made.

    1. Create a new column that converts the age ranges to a single number. Drop the age range categorical column.
    2. Create a new column that converts the education groupings to a single number. Drop the education categorical column.
    3. Create a new column that converts the income ranges to a single number. Drop the income range categorical column.
    4. Create your target (also known as “y” or “label”) column to indicate whether a person had a household income of $50,000 or greater.
    5. Encode the favorability ratings as numbers. Remove the favorability categorical columns.
    6. One-hot encode all remaining categorical columns.
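One way to handle items 1, 3, and 4 is to reduce each range label to its lower bound. The pandas sketch below is illustrative only: the sample rows are hypothetical, and the column names Age and Household Income, the label spellings ("18-29", "> 60", "$25,000 - $49,999", "$150,000+"), and the new column names age_min, income_min, and target are assumptions, so check the real labels with .unique() first. Midpoints or ordinal codes would also work; the lower bound has the advantage that the $50,000 cutoff becomes an exact comparison.

```python
import pandas as pd

# Hypothetical sample rows; column names and labels are assumed to mimic
# the age and income ranges in StarWars.csv (verify against the real file).
df = pd.DataFrame({
    "Age": ["18-29", "30-44", "45-60", "> 60", None],
    "Household Income": ["$0 - $24,999", "$25,000 - $49,999",
                         "$50,000 - $99,999", "$150,000+", None],
})

def range_lower_bound(s: pd.Series) -> pd.Series:
    """Return the first number in each range label as a float (NaN if missing)."""
    first = s.str.extract(r"(\d[\d,]*)", expand=False)  # e.g. "25,000"
    return pd.to_numeric(first.str.replace(",", "", regex=False))

df["age_min"] = range_lower_bound(df["Age"])
df["income_min"] = range_lower_bound(df["Household Income"])

# Target: 1 if the income range starts at $50,000 or more, else 0.
# Rows with no income answer are set to missing instead of being labeled 0.
df["target"] = df["income_min"].ge(50_000).astype("Int64")
df.loc[df["income_min"].isna(), "target"] = pd.NA

# Drop the original categorical range columns.
df = df.drop(columns=["Age", "Household Income"])
print(df)
```

If rows without an income answer should not be used for training, df.dropna(subset=["target"]) removes them before the model is fit.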
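For items 2 and 5, the education and favorability answers have a natural order, so an explicit dictionary mapping to integers preserves that order. This is a sketch on hypothetical rows: the education labels, the character column name Han Solo, and the favorability wording are assumptions to check against the real data, and treating "Unfamiliar (N/A)" as missing rather than neutral is a judgment call, not a requirement of the assignment.

```python
import pandas as pd

# Hypothetical rows; category spellings and column names are assumed.
df = pd.DataFrame({
    "Education": ["High school degree", "Bachelor degree", "Graduate degree", None],
    "Han Solo": ["Very favorably", "Somewhat unfavorably",
                 "Unfamiliar (N/A)", "Neither favorably nor unfavorably (neutral)"],
})

# Ordered education levels -> 0, 1, 2, ... (higher = more schooling).
education_order = [
    "Less than high school degree",
    "High school degree",
    "Some college or Associate degree",
    "Bachelor degree",
    "Graduate degree",
]
education_codes = {label: i for i, label in enumerate(education_order)}
df["education_num"] = df["Education"].map(education_codes)  # unknown/missing -> NaN

# Favorability on a -2..2 scale; labels not in the dict (e.g. "Unfamiliar") become NaN.
favorability_scale = {
    "Very unfavorably": -2,
    "Somewhat unfavorably": -1,
    "Neither favorably nor unfavorably (neutral)": 0,
    "Somewhat favorably": 1,
    "Very favorably": 2,
}
favor_cols = ["Han Solo"]  # hypothetical; list every character-rating column here
for col in favor_cols:
    df[col + "_fav"] = df[col].map(favorability_scale)

# Drop the original categorical columns.
df = df.drop(columns=["Education", *favor_cols])
print(df)
```

Because .map returns NaN for any label missing from the dictionary, a spelling mismatch shows up as unexpected missing values, which is a quick way to catch labels the mapping did not cover.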
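For item 6, pandas' get_dummies can expand each remaining text column into 0/1 indicator columns. The sketch below uses hypothetical leftover columns (Gender and Location (Census Region) are assumed names) and treats every non-numeric column as categorical, which assumes the ordered columns have already been converted to numbers.

```python
import pandas as pd

# Hypothetical leftover columns after the numeric recoding steps.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", None],
    "Location (Census Region)": ["Pacific", "Mountain", "Pacific", "New England"],
    "age_min": [18, 30, 45, 60],  # already numeric, passed through unchanged
})

# Every column that is not numeric is treated as categorical.
cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# drop_first=True drops one redundant dummy per column;
# dummy_na=True adds an explicit indicator column for missing answers.
encoded = pd.get_dummies(df, columns=cat_cols, drop_first=True,
                         dummy_na=True, dtype=int)
print(encoded.columns.tolist())
```

Dropping the first dummy avoids perfectly collinear columns in linear models; tree-based models usually do not need it, so drop_first=False is also reasonable. The dummy_na indicator keeps "did not answer" as its own signal instead of silently folding it into the dropped baseline category.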

Submission / Deliverables:

Use this unit5_task2_template to create your Client Report. Answer the questions. Each answer should include a written description of your results, code cells with comments, and charts and/or tables.

Your instructor will advise you, or it will be evident in Canvas, whether to submit a rendered .html file or a link to the rendered file published on GitHub Pages (gh-pages). (Do not submit the URL of the .qmd file on GitHub.)

Here are some reminder instructions if you are using GitHub:

When you have completed the report and are ready to submit it, you will need to render the project into HTML files and publish it to GitHub pages. Follow these steps:

  1. Have this assignment’s template (Quarto) file open in VS Code and nothing else.
  2. Click the Preview button in the top right of VS Code.
    1. This renders not only the project but also your entire coursework portfolio into HTML files for review.
    2. Confirm everything displays as you would like it to.
    3. What you see in the preview is how it will be viewed for grading.
    4. If any cell in the Quarto files has an error, rendering will stop and you will need to fix the error before rendering again (if you get stuck, post your error in Slack).
  3. Once you have confirmed the report, close the preview and open the GitHub Desktop application.
  4. Confirm you are in the correct repository in the top left corner of the screen.
  5. Confirm you are on the main branch in the top left corner of the screen (never switch off the main branch).
  6. Type a summary of your changes in the Summary box.
  7. Click the blue Commit to main button in the bottom left corner.
  8. Click the blue Push origin button in the middle right of the screen.
    1. This pushes all of your changes in the project .qmd file to GitHub.
    2. The publish.yml file then starts an automated process that renders the project into HTML files.
    3. The HTML files are published to GitHub Pages from the gh-pages branch.
    4. The URL of the published project appears in the deployments section on GitHub.
      1. In GitHub Desktop, click Open in GitHub to navigate to the repository.
      2. Click the Actions tab and make sure the rendering process finished without errors.
      3. Click the deployments section on the repository’s main page to find the URL.
      4. Navigate to the URL and confirm it displays as you intended.
      5. Copy the URL and submit it in Canvas.