2015 Census

Documentation on this dataset is scarce, so interpret the variables with care. The dataset contains summary statistics from a 2015 census in the U.S., summarized at the county level.
MATH221
population
Author

MATH 221

Published

May 2, 2024

Data details

There are 3,220 rows and 37 columns. Our data, which is stored in our pins table, was created from the data source (see footnote 1). You can access this pin from a connection to posit.byui.edu using hathawayj/census_2015.

This data is available to all.

Variable description

  • censusid: Unique ID for each county
  • state: State that county is in
  • county: County name
  • totalpop: Total population of county
  • men: Total population of men
  • women: Total population of women
  • hispanic: Percent of population that is marked as Hispanic
  • white: Percent of population that is marked as White
  • black: Percent of population that is marked as Black
  • native: Percent of population that is marked as Native
  • asian: Percent of population that is marked as Asian
  • pacific: Percent of population that is marked as Pacific
  • citizen: Unknown, could refer to total population of citizens
  • income: Unknown, appears to be a summary statistic of some kind involving income
  • incomeerr: Unknown
  • incomepercap: Unknown, appears to relate to income per capita
  • incomepercaperr: Unknown
  • poverty: Unknown, possibly a percent of people in poverty
  • childpoverty: Unknown, possibly a percent of children in poverty
  • professional: Percent of people working in the category “professional”
  • service: Percent of people working in the category “service”
  • office: Percent of people working in the category “office”
  • construction: Percent of people working in the category “construction”
  • production: Percent of people working in the category “production”
  • drive: Percent of people who fit into the category “drive”
  • carpool: Percent of people who fit into the category “carpool”
  • transit: Percent of people who fit into the category “transit”
  • walk: Percent of people who fit into the category “walk”
  • othertransp: Percent of people who fit into the category “other transport”
  • workathome: Percent of people who fit into the category “work at home”
  • meancommute: Most likely the mean duration of commutes (min)
  • employed: Total people employed
  • privatework: Percent of people who fit into the category “private work”
  • publicwork: Percent of people who fit into the category “public work”
  • selfemployed: Percent of people who fit into the category “self employed”
  • familywork: Percent of people who fit into the category “family work”
  • unemployment: Unemployment percentage (not part of the percentages in the previous 4 columns)
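
Because several descriptions above are uncertain, it may help to check a few relationships the column names suggest before relying on them. The following is a minimal sketch, assuming each county is available as a plain dict keyed by the column names above (for example, one element of `df.to_dict("records")` from a pandas DataFrame). The function name `check_county`, the grouping of percent columns, the tolerance, and the sample values are all assumptions made for illustration, not part of the dataset's documentation.

```python
# Groups of percent columns that plausibly describe the same population.
# The grouping is an assumption inferred from the column names.
PERCENT_GROUPS = {
    "occupation": ["professional", "service", "office", "construction", "production"],
    "commute": ["drive", "carpool", "transit", "walk", "othertransp", "workathome"],
    "work_class": ["privatework", "publicwork", "selfemployed", "familywork"],
}


def check_county(row, tolerance=1.0):
    """Return a list of human-readable problems found in one county record."""
    problems = []

    # men and women are counts, so they should add up to totalpop.
    if row["men"] + row["women"] != row["totalpop"]:
        problems.append("men + women != totalpop")

    # Percent columns appear to be rounded, so allow a small tolerance.
    for group, columns in PERCENT_GROUPS.items():
        total = sum(row[c] for c in columns)
        if abs(total - 100.0) > tolerance:
            problems.append(f"{group} percentages sum to {total:.1f}")

    return problems


# Hypothetical county, invented for illustration only.
example = {
    "totalpop": 1000, "men": 480, "women": 520,
    "professional": 30.0, "service": 18.0, "office": 22.0,
    "construction": 13.0, "production": 17.0,
    "drive": 80.0, "carpool": 10.0, "transit": 1.0,
    "walk": 3.0, "othertransp": 1.5, "workathome": 4.5,
    "privatework": 74.0, "publicwork": 18.0,
    "selfemployed": 7.5, "familywork": 0.5,
}
print(check_county(example))  # an empty list means no problems were found
```

A county that fails one of these checks is not necessarily wrong; it may simply mean the assumed grouping does not match how the column was defined.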

Variable summary

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
censusid 0 1 31393.61 16292.08 1001.0 19032.50 30024.00 46105.50 72153.0 ▅▇▆▆▁
totalpop 0 1 99409.35 319305.45 85.0 11218.00 26035.00 66430.50 10038388.0 ▇▁▁▁▁
men 0 1 48896.94 156681.28 42.0 5637.25 12932.00 32992.75 4945351.0 ▇▁▁▁▁
women 0 1 50512.41 162661.95 43.0 5572.00 13057.00 33487.50 5093037.0 ▇▁▁▁▁
hispanic 0 1 11.01 19.24 0.0 1.90 3.90 9.83 99.9 ▇▁▁▁▁
white 0 1 75.43 22.93 0.0 64.10 84.10 93.20 99.8 ▁▁▂▃▇
black 0 1 8.67 14.28 0.0 0.50 1.90 9.60 85.9 ▇▁▁▁▁
native 0 1 1.72 7.25 0.0 0.10 0.30 0.60 92.1 ▇▁▁▁▁
asian 0 1 1.23 2.63 0.0 0.20 0.50 1.20 41.6 ▇▁▁▁▁
pacific 0 1 0.08 0.73 0.0 0.00 0.00 0.00 35.3 ▇▁▁▁▁
citizen 0 1 69935.07 205118.91 80.0 8450.50 19643.00 49920.50 6046749.0 ▇▁▁▁▁
income 1 1 46129.87 12911.30 10499.0 38191.50 44749.00 52074.00 123453.0 ▁▇▂▁▁
incomeerr 1 1 2850.40 1918.94 270.0 1635.00 2406.00 3446.00 21355.0 ▇▁▁▁▁
incomepercap 0 1 23981.77 6204.34 5878.0 20238.50 23460.00 27053.25 65600.0 ▁▇▁▁▁
incomepercaperr 0 1 1362.52 1049.88 113.0 755.00 1096.50 1631.00 15266.0 ▇▁▁▁▁
poverty 0 1 17.49 8.32 1.4 12.10 16.15 20.70 64.2 ▆▇▁▁▁
childpoverty 1 1 24.18 11.70 0.0 16.30 22.70 30.00 81.6 ▃▇▂▁▁
professional 0 1 30.99 6.37 13.5 26.70 29.90 34.40 74.0 ▂▇▂▁▁
service 0 1 18.35 3.64 5.0 16.00 18.10 20.30 38.2 ▁▇▇▁▁
office 0 1 22.22 3.20 4.1 20.20 22.40 24.40 35.4 ▁▁▇▇▁
construction 0 1 12.71 4.22 1.7 9.80 12.10 14.90 40.3 ▃▇▂▁▁
production 0 1 15.73 5.74 0.0 11.50 15.25 19.33 55.6 ▃▇▂▁▁
drive 0 1 79.18 7.66 5.2 76.60 80.70 83.70 94.6 ▁▁▁▂▇
carpool 0 1 10.28 2.91 0.0 8.40 9.90 11.80 29.9 ▁▇▂▁▁
transit 0 1 0.97 3.06 0.0 0.10 0.40 0.80 61.7 ▇▁▁▁▁
walk 0 1 3.32 3.76 0.0 1.40 2.40 4.00 71.2 ▇▁▁▁▁
othertransp 0 1 1.61 1.67 0.0 0.90 1.30 1.90 39.1 ▇▁▁▁▁
workathome 0 1 4.63 3.18 0.0 2.70 3.90 5.60 37.2 ▇▁▁▁▁
meancommute 0 1 23.28 5.60 4.9 19.50 23.00 26.80 44.0 ▁▅▇▂▁
employed 0 1 45593.52 149699.50 62.0 4550.75 10508.00 28632.75 4635465.0 ▇▁▁▁▁
privatework 0 1 74.22 7.86 25.0 70.50 75.70 79.70 88.3 ▁▁▁▆▇
publicwork 0 1 17.56 6.51 5.8 13.10 16.20 20.50 66.2 ▇▅▁▁▁
selfemployed 0 1 7.93 3.91 0.0 5.40 6.90 9.40 36.6 ▇▆▁▁▁
familywork 0 1 0.29 0.46 0.0 0.10 0.20 0.30 9.8 ▇▁▁▁▁
unemployment 0 1 8.09 4.10 0.0 5.50 7.60 9.90 36.5 ▇▇▁▁▁
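
The means in the table above are unweighted averages over counties, so the smallest county (85 people) counts as much as the largest (about 10 million). A population-weighted mean may be closer to a national figure. The sketch below shows the difference on two invented rows. `weighted_mean` is a hypothetical helper, missing values are assumed to be represented as `None`, and weighting by `totalpop` is only an approximation because the true denominator of each percent column is not documented.

```python
def weighted_mean(rows, value, weight="totalpop"):
    """Mean of `value` across rows, weighting each row by `weight`.

    Rows whose value is missing (None) are skipped.
    """
    pairs = [(r[value], r[weight]) for r in rows if r[value] is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no rows with a usable weight")
    return sum(v * w for v, w in pairs) / total_weight


# Two invented counties: a small one with high unemployment, a large one with low.
rows = [
    {"totalpop": 100, "unemployment": 10.0},
    {"totalpop": 900, "unemployment": 5.0},
]

unweighted = sum(r["unemployment"] for r in rows) / len(rows)
print(unweighted)                           # 7.5
print(weighted_mean(rows, "unemployment"))  # 5.5
```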

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
state 0 1 4 20 0 52 0
county 0 1 3 33 0 1928 0
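
With 1,928 unique county names across 3,220 rows, county alone does not identify a row, because the same name appears in more than one state. The sketch below shows one way to find repeated keys, again assuming rows are plain dicts. `repeated_values` is a hypothetical helper and the sample rows, including their censusid values, are invented. Whether state and county together are unique in the real data is an assumption worth checking; censusid is described above as the unique ID.

```python
from collections import Counter


def repeated_values(rows, *columns):
    """Return {key: count} for column-value combinations seen more than once."""
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Invented rows for illustration; the censusid values are placeholders.
rows = [
    {"censusid": 1, "state": "Alabama", "county": "Washington"},
    {"censusid": 2, "state": "Idaho", "county": "Washington"},
    {"censusid": 3, "state": "Idaho", "county": "Ada"},
]

print(repeated_values(rows, "county"))           # {('Washington',): 2}
print(repeated_values(rows, "state", "county"))  # {}
```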

Explore generating code using R

library(tidyverse)
library(pins)
library(connectapi)

# Read the source CSV from the byuistats GitHub repository.
census_2015 <- read_csv('https://github.com/byuistats/data/raw/master/census2015/census2015.csv')

# Publish the data to the server with Bro. Hathaway as the owner.
board <- board_connect()
pin_write(board, census_2015, type = "parquet", access_type = "all")

# Look up the published pin and give it a vanity URL under data/.
pin_name <- "census_2015"
meta <- pin_meta(board, paste0("hathawayj/", pin_name))
client <- connect()
my_app <- content_item(client, meta$local$content_id)
set_vanity_url(my_app, paste0("data/", pin_name))

Access data

This data is available to all.

Direct Download: census_2015.parquet

R and Python Download:

URL Connections:

For public data, any user can connect and read the data using pins::board_connect_url() in R.

library(pins)
url_data <- "https://posit.byui.edu/data/census_2015/"
board_url <- board_connect_url(c("dat" = url_data))
dat <- pin_read(board_url, "dat")

In Python, use this custom function to load the data into a pandas DataFrame.

import pandas as pd
import requests
from io import BytesIO

def read_url_pin(name):
  url = "https://posit.byui.edu/data/" + name + "/" + name + ".parquet"
  response = requests.get(url)
  if response.status_code == 200:
    parquet_content = BytesIO(response.content)
    pandas_dataframe = pd.read_parquet(parquet_content)
    return pandas_dataframe
  else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    return None

# Example usage:
pandas_df = read_url_pin("census_2015")
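
The function above downloads the file on every call. For repeated use, one option is to keep a local copy. The sketch below reuses the URL pattern from `read_url_pin` and relies only on the standard library. The names `pin_url` and `cached_pin` and the `pin_cache` directory are hypothetical, chosen for this example.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://posit.byui.edu/data/"


def pin_url(name):
    """Build the parquet URL using the same pattern as read_url_pin."""
    return f"{BASE_URL}{name}/{name}.parquet"


def cached_pin(name, cache_dir="pin_cache"):
    """Download the pin once and return the path of the local copy."""
    path = Path(cache_dir) / f"{name}.parquet"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Download to a temporary name first so that an interrupted
        # download is never mistaken for a complete file.
        tmp = path.with_suffix(".part")
        urlretrieve(pin_url(name), tmp)  # raises an error on HTTP failure
        tmp.replace(path)
    return path


print(pin_url("census_2015"))
```

The returned path can be passed to `pd.read_parquet`, for example `pd.read_parquet(cached_pin("census_2015"))`. Delete the cached file to force a fresh download.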

Authenticated Connection:

Our Connect server is https://posit.byui.edu. Assign this URL to your CONNECT_SERVER environment variable. You must also create an API key and store it in your environment under CONNECT_API_KEY.

Read more about environment variables and the pins package to understand how these environment variables are stored and accessed in R and Python with pins.

library(pins)
board <- board_connect(auth = "auto")
dat <- pin_read(board, "hathawayj/census_2015")

import os
from pins import board_rsconnect
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv('CONNECT_API_KEY')
SERVER = os.getenv('CONNECT_SERVER')

board = board_rsconnect(server_url=SERVER, api_key=API_KEY)
dat = board.pin_read("hathawayj/census_2015")
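
If either environment variable is missing, the authenticated snippets above are likely to fail with an error that does not point at the cause. A small check beforehand can make the problem clearer. This is a minimal sketch; `connect_settings` is a hypothetical helper and the key shown is a placeholder, not a real credential.

```python
import os


def connect_settings(env=os.environ):
    """Return (server, api_key), or raise if either variable is missing or empty."""
    names = ("CONNECT_SERVER", "CONNECT_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CONNECT_SERVER"], env["CONNECT_API_KEY"]


# Example with a plain dict standing in for the real environment.
server, api_key = connect_settings(
    {"CONNECT_SERVER": "https://posit.byui.edu", "CONNECT_API_KEY": "example-key"}
)
print(server)
```

Called with no arguments, the function reads from the process environment, so it can run right after `load_dotenv()`.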

Footnotes

  1. Unknown