Unit 3 Task 4: Longevity and Highest Paid Positions
Background
When you hear the word “relationship” what is the first thing that comes to mind? Probably not baseball. But a relationship is simply a way to describe how two or more objects are connected. There are many relationships in baseball such as those between teams and managers, players and salaries, even stadiums and concession prices.
The graphs on Data Visualizations from Best Tickets show many other relationships that exist in baseball.
Client Request
You will need to connect to the database to pull the data in from the correct tables, but all the filtering, calculating, joining, etc. must be done with Python polars.
Data
This project will use the Lahman Baseball Database. In order to complete this project, you will need to download the database and save it inside the DS250 folder (hopefully you have a folder dedicated to this class) next to the .qmd file you will be using for this assignment. Note: Right click the ‘Download’ link and select “Save Link As” to download the data to your computer.
Download: lahmansbaseballdb
Information: Lahman Data Dictionary
Setup Instructions: See SQL Setup
Readings
Complete these readings before we cover the material in class. This will help you retain the material and make the class period more engaging - not less.
In other words, if an assignment is due on Wednesday, we will cover the material needed to complete the assignment on Tuesday. Therefore, you should complete the readings on Monday (or anytime before class on Tuesday).
The table(s) you choose to solve the problem with may lead you down different code paths and learning new skills. In order not to bias/lead you down one path or the other, no reading links are provided, but you may need to research and learn new Polars commands.
Question 1
Identify the top 10 players with the longest Major League Baseball careers. Include their:
- playerID
- first_name
- last_name
- career_length
Question 2
- Write an SQL query that pulls the tables you need into a polars dataframe. Then, use python polars to create a summary table showing the average salary for each position (e.g., pitcher, catcher, outfielder). Position information can be found in the fielding table in the POS column.
Include the following columns:
- position
- average_salary
- total_players
- highest_salary
The highest_salary column should display the highest salary ever earned by a player in that position.
Additionally, add a new column to the summary table you just created to label the average salary:
- If the average salary is above $3 million, categorize it as “High Salary.”
- If the average salary is between $2 million and $3 million, categorize it as “Medium Salary.”
- Otherwise, categorize it as “Low Salary.”
Order the table by average salary in descending order.
Hint: Beware, it is common for a player to play multiple positions in a single year. For this analysis, each player’s salary should only be counted toward one position in a given year. You get to determine which position to count it towards.
Submission:
When you have completed the report, you will need to follow this process to submit your work:
- Have this assignment’s template/quarto file open in VS Code and nothing else
- Click
Preview Buttonin VS Code in the top right of the screen- This will render the project but also entire course work portfolio into
HTMLfiles for review - Confirm everything displas as you would like it to
- How you see it will be how it is viewed for grading
- If there is an error in any cell of the quarto files, the rendering will stop and you will need to fix the error before rendering again (if you get stuck post your error in Slack)
- This will render the project but also entire course work portfolio into
- Once the report is confirmed close the preview and open a VS Code
Terminal- click
Terminalin the top menu bar and thenNew Terminal
- click
- Type the following in the terminal
quarto render- Then drag and drop this assignment into the terminal. This will add the file path to the terminal command
- Press Enter
- This will render the project into a
HTMLfile in the same location as the.qmdfile - To locate the file in VS Code
- Right click on the file in the file explorer and select
Reveal in File Explorer(Win) orReveal in Finder(Mac)
- Right click on the file in the file explorer and select
- Upload
HTMLfile into Canvas
Deliverables:
No template is provided. Create a new quarto file for this assignment. It is part of stretching yourself. But don’t forget to include a 1 paragraph write-up / summary description of your results!
Your report should be written in quarto markdown files and rendered to an HTML File. Upload the HTML file in Canvas. (Do not submit the .qmd file)