Unit 4 Stretch: Regression ML
Background
The Clean Air Act of 1970 was the beginning of the end for the use of asbestos in home building. By 1976, the U.S. Environmental Protection Agency (EPA) was given authority to restrict the use of asbestos in paint. Homes built during and before this period are known to have materials with asbestos. You can read more about this ban.
The state of Colorado has a large portion of its residential dwelling data that is missing the year built, and the state would like you to build a predictive model that can classify whether a house was built before 1980.
Colorado gave you home sales data for the city of Denver from 2013 on which to train your model. They said all the column names should be descriptive enough for your modeling and that they would like you to use the latest machine learning methods.
Client Request
The Client is a state agency in Colorado that is responsible for the health and safety of its residents. A large portion of its residential dwelling data is missing the year built, and the agency would like you to build a predictive model that can classify whether a house was built before 1980.
Data
URL: dwellings_ml.csv (ml ready)
Optional URL: dwellings_neighborhoods_ml.csv (ml ready)
Informational URL: dwellings_denver.csv (not cleansed)
Information: Data description
Readings
- P4DS: CH22 Joins
- All regressor algorithms in scikit-learn (skim)
- How to choose a good evaluation metric for your Machine learning model (you can start in section 11 - Evaluating for regression problems)
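The regression metrics that the last reading introduces can be written out by hand to see what each one measures. The following is a minimal sketch, not part of the assignment materials: the function name `regression_metrics` and the sample years are made up for illustration. It computes mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²) for a set of predicted construction years.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired lists of actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average size of the miss, in the same units as the target (years)
    mae = sum(abs(e) for e in errors) / n

    # RMSE: like MAE, but squaring penalizes large misses more heavily
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the target's variance explained by the predictions
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up construction years, for illustration only
actual = [1950, 1975, 1990, 2005]
predicted = [1955, 1970, 1995, 2000]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Here every prediction misses by exactly 5 years, so MAE and RMSE are both 5.0; the two diverge once some misses are much larger than others. In scikit-learn, `mean_absolute_error`, `mean_squared_error`, and `r2_score` from `sklearn.metrics` compute the same quantities.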
Optional References
Questions
- Repeat the classification model using 3 different algorithms. Display their Feature Importance and Classification Report. Explain the differences between the models and which one you would recommend to the Client.
- Join the dwellings_neighborhoods_ml.csv data to the dwellings_ml.csv data on the parcel column to create a new dataset. Duplicate the code for the model you recommended in the stretch question above and update it to use this data. Explain the differences and whether this changes the model you recommend to the Client.
- Can you build a model that predicts the year a house was built? Note this is a regression ML model, not a classifier. Report appropriate evaluation metrics for the model. Explain the model and the evaluation metrics you used to determine if the model is good.
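The join and the model comparison asked for above can be sketched on synthetic data. This is an illustration under stated assumptions, not a solution: the two DataFrames are generated with `make_classification` rather than read from the course files, the feature names `f0` to `f5` and the target name `before1980` are hypothetical, and the three algorithms are simply one possible choice of tree-based classifiers that expose `feature_importances_`. Only the `parcel` key comes from the assignment text.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for the two files (all names except `parcel` are hypothetical)
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
cols = [f"f{i}" for i in range(6)]
parcels = [f"P{i:04d}" for i in range(400)]

dwellings = pd.DataFrame(X[:, :4], columns=cols[:4])
dwellings.insert(0, "parcel", parcels)
dwellings["before1980"] = y

neighborhoods = pd.DataFrame(X[:, 4:], columns=cols[4:])
neighborhoods.insert(0, "parcel", parcels)

# validate="one_to_one" raises MergeError if a parcel repeats on either side
joined = dwellings.merge(neighborhoods, on="parcel", how="inner", validate="one_to_one")

features = joined.drop(columns=["parcel", "before1980"])
target = joined["before1980"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=0, stratify=target
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

reports = {}
importances = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # output_dict=True returns the report as a nested dict instead of a string
    reports[name] = classification_report(y_test, pred, output_dict=True)
    importances[name] = pd.Series(
        model.feature_importances_, index=features.columns
    ).sort_values(ascending=False)
    print(name, "accuracy:", round(reports[name]["accuracy"], 3))
    print(importances[name].round(3).to_string())
```

If the real files contain repeated parcel values, the `validate="one_to_one"` check will raise an error; in that case the duplicates would need to be inspected and resolved before joining, since a many-to-many join silently multiplies rows.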
Submission / Deliverables:
No template is provided for this assignment. You must create your own file as part of the task. Answer the questions in this assignment. Each answer should include a written description of your results, code cells with comments, charts and/or tables.
Your instructor will advise you, or it will be evident in Canvas, whether to submit a rendered .html file or a link to the rendered file on GitHub on gh-pages. (Do not submit the URL to the GitHub .qmd file.)
Here are some reminder instructions if you are using GitHub:
When you have completed the report and are ready to submit it, you will need to render the project into HTML files and publish it to GitHub Pages. Follow these steps:
- Have this assignment’s template/Quarto file open in VS Code and nothing else
- Click the Preview button in the top right of the VS Code screen
  - This will render the project, along with the entire course work portfolio, into HTML files for review
  - Confirm everything displays as you would like it to
  - How you see it will be how it is viewed for grading
  - If there is an error in any cell of the Quarto files, the rendering will stop and you will need to fix the error before rendering again (if you get stuck, post your error in Slack)
- Once the report is confirmed, close the preview and open the GitHub Desktop application
  - Confirm you are in the correct repository in the top left corner of the screen
  - Confirm you are on the correct branch, Main, in the top left corner of the screen (never change off the Main branch)
  - Type a summary of the changes in the Summary box
  - Click the blue Commit to main button in the bottom left corner
  - Click the blue Push origin button in the middle right of the screen
    - This will push all your changes in the project .qmd file to GitHub
    - The publish.yml file will kick off an automated process to render the project into HTML files
    - The HTML files will be published to GitHub Pages in the gh-pages branch
    - The URL to the published project will be in the deployment section in GitHub
- In GitHub Desktop, click Open in GitHub to navigate to the repository
  - Click on the Actions tab and make sure there were no errors in the rendering process
  - Click on the deployment section of the main page of the repository to find the URL
  - Navigate to the URL and confirm it displays as you intended
  - Copy the URL and submit it in Canvas