NBA Players

A large dataset of NBA player records that spans decades; the year column runs from 1950 to 2017. It has some useful columns and a slew of columns of unknown purpose.
MATH221
sports
Author

MATH 221

Published

May 1, 2024

Data details

There are 24,691 rows and 61 columns. The data source (see footnote 1) is used to create our data, which is stored in our pins table. You can access this pin from a connection to posit.byui.edu using hathawayj/nba_players.

This data is available to all.

Variable description

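  • player A unique key for each row (the summary shows values from 0 to 24,690, one per record, so it identifies a row rather than a person; player2 has 3,921 unique names)
  • year Unknown (values range from 1950 to 2017, so likely the season year)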
  • player2 Player Name
  • traded (Stay, traded)
  • select (Hide, View)
  • pos Position on team
  • height Height (cm)
  • weight Weight (kg)
  • college College
  • born Birth year (YYYY)
  • birth_city City of birth
  • birth_state State of birth
  • age age (years)
  • tm Unknown
  • g Unknown
  • gs Unknown
  • mp Unknown
  • per Unknown
  • ts_ Unknown
  • x3par Unknown
  • ftr Unknown
  • orb_ Unknown
  • drb_ Unknown
  • trb_ Unknown
  • ast_ Unknown
  • stl_ Unknown
  • blk_ Unknown
  • tov_ Unknown
  • usg_ Unknown
  • blanl Empty column
  • ows Unknown
  • dws Unknown
  • ws Unknown
  • ws_48 Unknown
  • blank2 Empty column
  • obpm Unknown
  • dbpm Unknown
  • bpm Unknown
  • vorp Unknown
  • fg Unknown
  • fga Unknown
  • fg_ Unknown
  • x3p Unknown
  • x3pa Unknown
  • x3p_ Unknown
  • x2p Unknown
  • x2pa Unknown
  • x2p_ Unknown
  • efg_ Unknown
  • ft Unknown
  • fta Unknown
  • ft_ Unknown
  • orb Unknown
  • drb Unknown
  • trb Unknown
  • ast Unknown
  • stl Unknown
  • blk Unknown
  • tov Unknown
  • pf Unknown
  • pts Unknown
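Many of the columns above are marked Unknown. Their names resemble the abbreviations used in common basketball stat tables (for example fg, fga, and fg_ could be field goals made, field goals attempted, and field goal percentage), but the source does not confirm this. The following is a minimal sketch of how one might test such a guess against the data. The helper name ratio_matches, the tolerance, and the sample rows are all hypothetical and are not part of the dataset or its documentation.

```python
import pandas as pd


def ratio_matches(df, numerator, denominator, ratio, tol=0.001):
    """Return the share of usable rows where ratio is close to numerator / denominator.

    Hypothetical helper: it tests a guess about column meaning, it does not prove it.
    """
    # Keep only rows where all three values are present and the denominator is non-zero.
    rows = df[[numerator, denominator, ratio]].dropna()
    rows = rows[rows[denominator] != 0]
    if rows.empty:
        return float("nan")
    expected = rows[numerator] / rows[denominator]
    # Stored ratios may be rounded, so allow a small (assumed) tolerance.
    close = (expected - rows[ratio]).abs() <= tol
    return float(close.mean())


# Made-up rows for illustration only; these are not taken from the dataset.
sample = pd.DataFrame({
    "fg": [100, 50, 0, 30],
    "fga": [200, 125, 0, 60],
    "fg_": [0.500, 0.400, None, 0.750],
})

share = ratio_matches(sample, "fg", "fga", "fg_")
print(f"Share of rows consistent with fg_ = fg / fga: {share:.2f}")
```

A share near 1 on the real data would support the guess; a low share would suggest the column means something else.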

Variable summary

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
player 0 1.00 12345.00 7127.82 0.00 6172.50 12345.00 18517.50 24690.00 ▇▇▇▇▇
year 67 1.00 1992.59 17.43 1950.00 1981.00 1996.00 2007.00 2017.00 ▂▂▅▆▇
age 75 1.00 26.66 3.84 18.00 24.00 26.00 29.00 44.00 ▃▇▅▁▁
g 67 1.00 50.84 26.50 1.00 27.00 58.00 75.00 88.00 ▅▃▃▅▇
gs 6458 0.74 23.59 28.63 0.00 0.00 8.00 45.00 83.00 ▇▂▁▁▂
mp 553 0.98 1209.72 941.15 0.00 340.00 1053.00 1971.00 3882.00 ▇▅▃▃▁
per 590 0.98 12.48 6.04 -90.60 9.80 12.70 15.60 129.10 ▁▁▇▁▁
ts_ 153 0.99 0.49 0.09 0.00 0.46 0.51 0.54 1.14 ▁▂▇▁▁
x3par 5852 0.76 0.16 0.19 0.00 0.00 0.06 0.29 1.00 ▇▂▂▁▁
ftr 166 0.99 0.33 0.22 0.00 0.21 0.30 0.40 6.00 ▇▁▁▁▁
orb_ 3899 0.84 6.18 4.87 0.00 2.60 5.40 9.00 100.00 ▇▁▁▁▁
drb_ 3899 0.84 13.71 6.64 0.00 8.80 12.70 18.10 100.00 ▇▂▁▁▁
trb_ 3120 0.87 9.95 5.04 0.00 5.90 9.20 13.50 100.00 ▇▁▁▁▁
ast_ 2136 0.91 13.01 9.19 0.00 6.50 10.50 17.60 100.00 ▇▂▁▁▁
stl_ 3899 0.84 1.65 1.02 0.00 1.10 1.50 2.10 24.20 ▇▁▁▁▁
blk_ 3899 0.84 1.41 1.77 0.00 0.30 0.90 1.90 77.80 ▇▁▁▁▁
tov_ 5109 0.79 15.09 6.92 0.00 11.40 14.20 17.70 100.00 ▇▁▁▁▁
usg_ 5051 0.80 18.91 5.45 0.00 15.40 18.60 22.20 100.00 ▇▅▁▁▁
ows 106 1.00 1.26 2.14 -5.10 -0.10 0.40 1.90 18.30 ▁▇▁▁▁
dws 106 1.00 1.23 1.27 -1.00 0.20 0.80 1.80 16.00 ▇▂▁▁▁
ws 106 1.00 2.49 3.06 -2.80 0.20 1.40 3.80 25.40 ▇▃▁▁▁
ws_48 590 0.98 0.07 0.10 -2.52 0.03 0.07 0.12 2.12 ▁▁▇▁▁
obpm 3894 0.84 -1.78 3.79 -73.80 -3.40 -1.50 0.30 47.80 ▁▁▇▆▁
dbpm 3894 0.84 -0.55 2.25 -30.40 -1.70 -0.50 0.70 46.80 ▁▇▃▁▁
bpm 3894 0.84 -2.33 4.69 -86.70 -4.20 -1.80 0.30 36.20 ▁▁▁▇▁
vorp 3894 0.84 0.56 1.34 -2.60 -0.20 0.00 0.90 12.40 ▇▃▁▁▁
fg 67 1.00 195.33 188.11 0.00 41.00 141.00 299.00 1597.00 ▇▂▁▁▁
fga 67 1.00 430.65 397.62 0.00 99.00 321.00 661.00 3159.00 ▇▂▁▁▁
fg_ 166 0.99 0.43 0.10 0.00 0.39 0.44 0.48 1.00 ▁▃▇▁▁
x3p 5764 0.77 22.22 38.54 0.00 0.00 2.00 27.00 402.00 ▇▁▁▁▁
x3pa 5764 0.77 63.60 102.44 0.00 1.00 11.00 84.00 886.00 ▇▁▁▁▁
x3p_ 9275 0.62 0.25 0.18 0.00 0.10 0.29 0.36 1.00 ▅▇▂▁▁
x2p 67 1.00 178.25 179.48 0.00 35.00 122.00 268.00 1597.00 ▇▂▁▁▁
x2pa 67 1.00 381.76 371.26 0.00 82.00 270.00 579.25 3159.00 ▇▂▁▁▁
x2p_ 195 0.99 0.45 0.10 0.00 0.41 0.46 0.50 1.00 ▁▂▇▁▁
efg_ 166 0.99 0.45 0.10 0.00 0.41 0.46 0.50 1.50 ▁▇▁▁▁
ft 67 1.00 102.39 113.37 0.00 18.00 63.00 149.00 840.00 ▇▂▁▁▁
fta 67 1.00 136.78 146.08 0.00 27.00 88.00 201.00 1363.00 ▇▁▁▁▁
ft_ 925 0.96 0.72 0.14 0.00 0.66 0.74 0.81 1.00 ▁▁▂▇▃
orb 3894 0.84 62.19 67.32 0.00 12.00 38.00 91.00 587.00 ▇▂▁▁▁
drb 3894 0.84 147.20 145.92 0.00 33.00 106.00 212.00 1111.00 ▇▂▁▁▁
trb 379 0.98 224.64 228.19 0.00 51.00 159.00 322.00 2149.00 ▇▁▁▁▁
ast 67 1.00 114.85 135.86 0.00 19.00 68.00 160.00 1164.00 ▇▁▁▁▁
stl 3894 0.84 39.90 38.71 0.00 9.00 29.00 60.00 301.00 ▇▂▁▁▁
blk 3894 0.84 24.47 36.94 0.00 3.00 11.00 29.00 456.00 ▇▁▁▁▁
tov 5046 0.80 73.94 67.71 0.00 18.00 55.00 112.00 464.00 ▇▃▁▁▁
pf 67 1.00 116.34 84.79 0.00 39.00 109.00 182.00 386.00 ▇▆▅▂▁
pts 67 1.00 510.12 492.92 0.00 106.00 364.00 778.00 4029.00 ▇▂▁▁▁
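The n_missing and complete_rate columns in the table above can be recomputed in pandas. complete_rate is the share of rows that are not missing, shown to two decimals, which is why year has 67 missing values yet a complete_rate of 1.00: 1 - 67/24691 is about 0.9973. The sketch below is one way to do the calculation; the function name missing_summary and the sample rows are hypothetical.

```python
import pandas as pd


def missing_summary(df):
    """Count missing values per column and the share of rows that are present."""
    n_missing = df.isna().sum()
    complete_rate = 1 - n_missing / len(df)
    return pd.DataFrame({"n_missing": n_missing, "complete_rate": complete_rate})


# Made-up rows for illustration only; these are not taken from the dataset.
sample = pd.DataFrame({
    "g": [82, 10, None, 45],
    "gs": [None, 0, None, 45],
})

summary = missing_summary(sample)
print(summary.round(2))

# The same arithmetic using the counts reported in the table for year.
year_rate = 1 - 67 / 24691
print(round(year_rate, 4))  # rounds to 1.00 at two decimals
```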

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
player2 67 1 5 24 0 3921 0
traded 0 1 4 6 0 2 0
select 0 1 4 4 0 2 0
pos 67 1 1 5 0 23 0
height 0 1 3 4 0 29 0
weight 0 1 2 4 0 77 0
college 0 1 1 58 0 424 0
born 0 1 4 4 0 85 0
birth_city 0 1 1 25 0 1266 0
birth_state 0 1 1 32 0 130 0
tm 67 1 3 3 0 69 0

Explore generating code using R

library(tidyverse)
library(pins)
library(connectapi)

nba_players <- read_csv('https://github.com/byuistats/data/raw/master/NBA_players/NBA_players.csv')


# Publish the data to the server with Bro. Hathaway as the owner.
board <- board_connect()
pin_write(board, nba_players, type = "parquet", access_type = "all")

pin_name <- "nba_players"
meta <- pin_meta(board, paste0("hathawayj/", pin_name))
client <- connect()
my_app <- content_item(client, meta$local$content_id)
set_vanity_url(my_app, paste0("data/", pin_name))

Access data

This data is available to all.

Direct Download: nba_players.parquet

R and Python Download:

URL Connections:

For public data, any user can connect and read the data using pins::board_connect_url() in R.

library(pins)
url_data <- "https://posit.byui.edu/data/nba_players/"
board_url <- board_connect_url(c("dat" = url_data))
dat <- pin_read(board_url, "dat")

Use this custom function in Python to load the data into a pandas DataFrame.

import pandas as pd
import requests
from io import BytesIO

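def read_url_pin(name):
    """Download a public pin as parquet and return it as a pandas DataFrame."""
    # Public pins are served at <base>/<name>/<name>.parquet.
    url = "https://posit.byui.edu/data/" + name + "/" + name + ".parquet"
    response = requests.get(url)
    if response.status_code == 200:
        # Wrap the raw bytes in a file-like object so pandas can parse them.
        parquet_content = BytesIO(response.content)
        pandas_dataframe = pd.read_parquet(parquet_content)
        return pandas_dataframe
    else:
        # Report the failure and return None so callers can check the result.
        print(f"Failed to retrieve data. Status code: {response.status_code}")
        return None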

# Example usage:
pandas_df = read_url_pin("nba_players")
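The variable summary lists height, weight, and born as character columns even though they hold numbers, and blanl and blank2 are described as empty. A tidying step along the following lines could be run on the loaded DataFrame. This is only a sketch: the function name tidy_players is hypothetical, the sample rows are made up, and it assumes that any text that cannot be parsed as a number should become a missing value.

```python
import pandas as pd


def tidy_players(df):
    """Drop the empty columns and convert numeric-looking text columns to numbers."""
    # errors="ignore" lets this run even if the empty columns were already removed.
    out = df.drop(columns=["blanl", "blank2"], errors="ignore")
    for col in ["height", "weight", "born"]:
        if col in out.columns:
            # errors="coerce" turns text that is not a number into NaN
            # (an assumption about how unparseable entries should be handled).
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Made-up rows for illustration only; these are not taken from the dataset.
sample = pd.DataFrame({
    "player2": ["Player A", "Player B"],
    "height": ["198", "NA"],
    "weight": ["95", "102"],
    "born": ["1970", "1985"],
    "blanl": [None, None],
})

clean = tidy_players(sample)
print(clean.dtypes)
```

The function returns a new DataFrame and leaves its input unchanged, so the raw data stays available for comparison.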

Authenticated Connection:

Our Connect server is https://posit.byui.edu, which you assign to your CONNECT_SERVER environment variable. You must also create an API key and store it in your environment as CONNECT_API_KEY.

Read more about environment variables and the pins package to understand how these environment variables are stored and accessed in R and Python with pins.

library(pins)
board <- board_connect(auth = "auto")
dat <- pin_read(board, "hathawayj/nba_players")

import os
from pins import board_rsconnect
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv('CONNECT_API_KEY')
SERVER = os.getenv('CONNECT_SERVER')

board = board_rsconnect(server_url=SERVER, api_key=API_KEY)
dat = board.pin_read("hathawayj/nba_players")
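An authenticated connection fails in unhelpful ways when either environment variable is missing, so it can be worth checking for both before creating the board. The following is a minimal sketch of such a check; the helper name missing_connect_vars is hypothetical, and the example uses a plain dictionary with placeholder values in place of the real environment.

```python
import os

# The two variables named in the text above.
REQUIRED_VARS = ("CONNECT_SERVER", "CONNECT_API_KEY")


def missing_connect_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# A plain dict stands in for the environment here; the empty key is a placeholder.
example_env = {"CONNECT_SERVER": "https://posit.byui.edu", "CONNECT_API_KEY": ""}
missing = missing_connect_vars(example_env)
print(missing)  # ['CONNECT_API_KEY']
```

Calling missing_connect_vars() with no argument checks the real environment, for example right after load_dotenv().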

Footnotes

  1. Unknown