Denver residential dwelling sales for 2013

Attributes of each dwelling, along with its selling price, for homes sold in Denver in 2013
housing
DS250

Author: DS 250

Published: February 1, 2024

Data details

There are 22,913 rows and 26 columns. The source data is used to create the dataset stored in our pins table. You can access this pin from a connection to posit.byui.edu using hathawayj/dwellings_denver.

This data is available to all.

Variable description

  • parcel: The parcel ID
  • nbhd: Neighborhood of the home
  • abstrprd: Not documented in the source
  • livearea: Square footage of livable space
  • finbsmnt: Finished square footage of the basement
  • basement: Total square footage of the basement
  • yrbuilt: Year the home was built
  • condition: Condition of the home (6 levels provided)
  • quality: A letter rating
  • totunits: Number of dwelling units in the building
  • stories: The number of stories
  • gartype: Details on the garage type
  • nocars: Size of the garage in cars
  • xtraffic: Empty
  • floorlvl: Mostly for apartments; the level the living unit is on
  • numbdrm: Number of bedrooms
  • numbaths: Number of bathrooms
  • arcstyle: Type of home
  • sprice: Selling price
  • deduct: Deduction from the selling price
  • netprice: Net price of the home
  • tasp: Tax-assessed selling price
  • smonth: Month sold
  • syear: Year sold
  • qualified: Q or U, with 66 percent Q
  • status: I or V, with over 90 percent I
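As a quick illustration of how these columns are used together, here is a minimal sketch computing price per livable square foot with pandas. The three-row frame is made up for the example; it is not the real pin.

```python
import pandas as pd

# Small made-up frame mirroring three of the pin's columns (illustrative only).
homes = pd.DataFrame({
    "parcel": ["A1", "A2", "A3"],
    "livearea": [1000, 1500, 2000],
    "sprice": [200000, 330000, 500000],
})

# Price per livable square foot, rounded to the dollar.
homes["ppsf"] = (homes["sprice"] / homes["livearea"]).round(0)
print(homes[["parcel", "ppsf"]])
```

The same two columns drive most price-per-area comparisons across neighborhoods (nbhd) in this data.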

Variable summary

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
abstrprd 0 1.00 1118.98 152.60 100 1112 1112 1114 9279 ▇▁▁▁▁
livearea 0 1.00 1510.78 787.02 293 975 1302 1823 21503 ▇▁▁▁▁
finbsmnt 0 1.00 330.51 469.91 0 0 0 716 4320 ▇▂▁▁▁
basement 0 1.00 524.50 567.65 0 0 401 959 9025 ▇▁▁▁▁
yrbuilt 0 1.00 1963.69 36.93 1873 1940 1963 2002 2013 ▂▃▆▅▇
totunits 0 1.00 1.02 0.26 0 1 1 1 10 ▇▁▁▁▁
stories 0 1.00 1.42 0.57 1 1 1 2 4 ▇▅▁▁▁
nocars 0 1.00 1.36 1.10 0 0 2 2 17 ▇▁▁▁▁
floorlvl 18025 0.21 5.20 7.04 1 1 2 5 44 ▇▁▁▁▁
numbdrm 0 1.00 2.52 0.88 0 2 2 3 9 ▁▇▁▁▁
numbaths 0 1.00 2.35 1.12 0 2 2 3 11 ▇▅▁▁▁
sprice 0 1.00 561859.64 2135231.25 1000 134000 230000 378400 21750000 ▇▁▁▁▁
deduct 0 1.00 1336.95 3061.79 0 0 0 2383 101809 ▇▁▁▁▁
netprice 0 1.00 560522.69 2135373.06 1000 132000 229000 375500 21750000 ▇▁▁▁▁
tasp 0 1.00 563820.73 2135272.35 1000 132890 231000 382184 21750000 ▇▁▁▁▁
smonth 0 1.00 6.54 3.27 1 4 6 9 12 ▇▇▆▆▇
syear 0 1.00 2011.06 0.71 2010 2011 2011 2012 2012 ▃▁▇▁▅

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
parcel 0 1.00 16 16 0 20465 0
condition 0 1.00 3 5 0 5 0
quality 0 1.00 1 1 0 5 0
gartype 0 1.00 2 7 0 7 0
xtraffic 22053 0.04 1 1 0 1 0
arcstyle 0 1.00 8 23 0 12 0
qualified 0 1.00 1 1 0 2 0
status 0 1.00 1 1 0 2 0
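The n_missing and complete_rate columns in the tables above are straightforward to recompute. A minimal sketch on a made-up series standing in for a partially missing variable such as floorlvl:

```python
import pandas as pd

# Made-up series with missing values (illustrative, not the real floorlvl data).
col = pd.Series([1, None, 2, None, 5, 1])

# skim-style summary quantities: count of missing values and share complete.
n_missing = int(col.isna().sum())
complete_rate = round(1 - n_missing / len(col), 2)
print(n_missing, complete_rate)  # 2 0.67
```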
Explore generating code using Python
# https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-real-property-sales-book-2013
# https://github.com/abresler/realtR

# %%
# packages
import os

import pandas as pd
import polars as pl
from dotenv import load_dotenv
from pins import board_connect
from oauth2client.service_account import ServiceAccountCredentials
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

load_dotenv("../../../../.env")
API_KEY = os.getenv('CONNECT_API_KEY')
SERVER = os.getenv('CONNECT_SERVER')
# %%
SCOPES = ['https://www.googleapis.com/auth/drive.readonly']
gauth = GoogleAuth()
gauth.credentials = ServiceAccountCredentials.from_json_keyfile_name(
    "client_secrets.json", SCOPES)

# %%
# gauth.LocalWebserverAuth() is only needed for an interactive user OAuth
# flow; with service-account credentials it can be skipped.
drive = GoogleDrive(gauth)


# %%
# `url` should point to the source CSV from the Denver open data portal
# (see the link at the top of this script).
dat = pl.read_csv(url, ignore_errors=True, truncate_ragged_lines=True)\
  .filter((pl.col("COMMUSE").is_null()) & (pl.col("LIVEAREA") > 0))\
  .drop(['Model', 'Cluster', 'Group', 'CondoComplex', 'CondoComplexName', 'COMMUSE'])\
  .with_columns(pl.col("NBHD").cast(pl.String()).cast(pl.Categorical()))\
  .select(pl.all().name.to_lowercase())

#%%
# pandas alternative to the polars pipeline above
datp = pd.read_csv(url)\
    .query('COMMUSE.isna()')\
    .drop(columns = ['Model', 'Cluster', 'Group', 'CondoComplex', 'CondoComplexName', 'COMMUSE'])\
    .assign(NBHD = lambda x: x.NBHD.astype('object'))\
    .query('LIVEAREA > 0')

# The vanity URL can be changed through the Connect server API:
# https://posit.co/blog/rstudio-connect-1-8-6-server-api/
# rename columns to lower case
datp.columns = datp.columns.str.lower()

# %%

# Publish the data to the server with Bro. Hathaway as the owner.
pin_name = "dwellings_denver"
board = board_connect(server_url=SERVER, api_key=API_KEY)
board.pin_write(dat.to_pandas(), "hathawayj/" + pin_name, type="parquet")

# %%
meta = board.pin_meta("hathawayj/" + pin_name)
# https://docs.posit.co/connect/user/python-pins/
# https://rstudio.github.io/pins-python/
meta.local.get("content_id")



#%% R code
# Publish the data to the server with Bro. Hathaway as the owner.
library(pins)
library(connectapi)

pin_name <- "dwellings_denver"
board <- board_connect()
pin_write(board, dat, name = pin_name, type = "parquet")

meta <- pin_meta(board, paste0("hathawayj/", pin_name))
client <- connect()
my_app <- content_item(client, meta$local$content_id)
set_vanity_url(my_app, paste0("data/", pin_name))

Access data

This data is available to all.

Direct Download: dwellings_denver.parquet

R and Python Download:

URL Connections:

For public data, any user can connect and read the data using pins::board_connect_url() in R.

library(pins)
url_data <- "https://posit.byui.edu/data/dwellings_denver/"
board_url <- board_connect_url(c("dat" = url_data))
dat <- pin_read(board_url, "dat")

Use this custom function in Python to have the data in a Pandas DataFrame.

import pandas as pd
import requests
from io import BytesIO

def read_url_pin(name):
  url = "https://posit.byui.edu/data/" + name + "/" + name + ".parquet"
  response = requests.get(url)
  if response.status_code == 200:
    parquet_content = BytesIO(response.content)
    pandas_dataframe = pd.read_parquet(parquet_content)
    return pandas_dataframe
  else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    return None

# Example usage:
pandas_df = read_url_pin("dwellings_denver")

Authenticated Connection:

Our connect server is https://posit.byui.edu which you assign to your CONNECT_SERVER environment variable. You must create an API key and store it in your environment under CONNECT_API_KEY.

Read more about environment variables and the pins package to understand how these environment variables are stored and accessed in R and Python with pins.
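A minimal sketch of how those two environment variables are read in Python. The values below are placeholders, not real credentials; calling load_dotenv() would populate os.environ the same way from a .env file.

```python
import os

# Placeholder values standing in for a real .env file or shell export.
os.environ["CONNECT_SERVER"] = "https://posit.byui.edu"
os.environ["CONNECT_API_KEY"] = "not-a-real-key"

# Read them back the same way the access code below does.
SERVER = os.getenv("CONNECT_SERVER")
API_KEY = os.getenv("CONNECT_API_KEY")
print(SERVER)  # https://posit.byui.edu
```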

library(pins)
board <- board_connect(auth = "auto")
dat <- pin_read(board, "hathawayj/dwellings_denver")

import os
from pins import board_connect
from dotenv import load_dotenv

load_dotenv()
API_KEY = os.getenv('CONNECT_API_KEY')
SERVER = os.getenv('CONNECT_SERVER')

board = board_connect(server_url=SERVER, api_key=API_KEY)
dat = board.pin_read("hathawayj/dwellings_denver")