World Health Organization (WHO) Tuberculosis case notifications by country
See source for description of the data. tb_dictionary describes the column names.
health
Author: DS 150
Published: February 20, 2024
Data details
There are 107,875 rows and 8 columns. The data source is used to create our data, which is stored in our pins table. You can access this pin from a connection to posit.byui.edu using hathawayj/tb_cases.
This data is available to all.
Variable description
See source for description of the data. tb_dictionary describes the column names.
Variable summary
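Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1.00 | 2010.01 | 6.87 | 1980 | 2006 | 2010 | 2015 | 2022 | ▁▁▃▇▅ |
| age_middle | 3224 | 0.97 | 38.05 | 22.63 | 2 | 19 | 39 | 59 | 75 | ▅▇▅▇▃ |
| cases | 0 | 1.00 | 945.02 | 6661.40 | 0 | 3 | 31 | 234 | 253232 | ▇▁▁▁▁ |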
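Variable type: character

| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| country | 0 | 1.00 | 4 | 56 | 0 | 217 | 0 |
| g_whoregion | 0 | 1.00 | 3 | 3 | 0 | 6 | 0 |
| sex | 808 | 0.99 | 1 | 1 | 0 | 2 | 0 |
| age | 808 | 0.99 | 2 | 4 | 0 | 11 | 0 |
| var | 808 | 0.99 | 7 | 14 | 0 | 4 | 0 |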
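The n_missing and complete_rate columns in the summary above can be approximated in pandas. The following is a minimal sketch, not the code that produced the tables: the helper name missing_summary and the toy rows are invented for illustration and are not real WHO figures.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return n_missing and complete_rate for every column of df."""
    n_missing = df.isna().sum()
    # complete_rate is the share of non-missing values, rounded to 2 decimals.
    complete_rate = (1 - n_missing / len(df)).round(2)
    return pd.DataFrame({"n_missing": n_missing, "complete_rate": complete_rate})


# Toy rows (hypothetical values, used only to exercise the helper).
toy = pd.DataFrame(
    {
        "year": [1980, 1995, 2010, 2022],
        "sex": ["m", "f", None, "f"],
        "cases": [3, 31, 234, 0],
    }
)

summary = missing_summary(toy)
print(summary)
```

Applied to the full tb_cases DataFrame, the same helper should give counts comparable to the n_missing column reported above.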
Explore generating code using R
    library(tidyverse)
    library(pins)
    library(connectapi)

    tb_cases <- read_csv("https://extranet.who.int/tme/generateCSV.asp?ds=notifications")

    # Wrangling
    tb_cases <- tb_cases %>%
      select(-(new_sp:c_newinc), -contains('_fu'), -contains('_mu'),
             -contains('_sexunk'), -contains('gesex'), -contains('15plus'),
             -contains('014'), -(rdx_data_available:hiv_reg_new2)) %>%
      pivot_longer(
        cols = new_sp_m04:newrel_f65,
        names_to = "key",
        values_to = "cases",
        values_drop_na = TRUE
      ) %>%
      mutate(key = stringr::str_replace(key, "newrel", "new_rel")) %>%
      separate(key, c("new", "var", "sexage")) %>%
      select(-new, -iso2, -iso3, -iso_numeric) %>%
      separate(sexage, c("sex", "age"), sep = 1) %>%
      mutate(
        age_middle = case_when(
          age == '04' ~ 2,
          age == '514' ~ 9,
          age == '1524' ~ 19,
          age == '2534' ~ 29,
          age == '3544' ~ 39,
          age == '4554' ~ 49,
          age == '5564' ~ 59,
          age == '65' ~ 75
        ),
        var = case_when(
          var == 'sp' ~ 'smear positive',
          var == 'sn' ~ 'smear negative',
          var == 'rel' ~ 'relapse',
          var == 'ep' ~ 'extrapulmonary'
        )
      ) %>%
      select(country, g_whoregion, year, sex, age, age_middle, var, cases)

    # Publish the data to the server with Bro. Hathaway as the owner.
    board <- board_connect()
    pin_write(board, tb_cases, type = "parquet")

    pin_name <- "tb_cases"
    meta <- pin_meta(board, paste0("hathawayj/", pin_name))
    client <- connect()
    my_app <- content_item(client, meta$local$content_id)
    set_vanity_url(my_app, paste0("data/", pin_name))
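To make the key-splitting step of the R wrangling easier to follow, here is a small Python sketch of how a single WHO column name such as new_sp_m04 ends up as the var, sex, age, and age_middle values. The function name parse_key and the two lookup dictionaries are illustrative names, not part of the original code, and the sketch assumes every key has exactly three underscore-separated parts once "newrel" is rewritten to "new_rel".

```python
# Lookup tables copied from the two case_when() calls in the R code.
AGE_MIDDLE = {
    "04": 2, "514": 9, "1524": 19, "2534": 29,
    "3544": 39, "4554": 49, "5564": 59, "65": 75,
}
VAR_LABELS = {
    "sp": "smear positive",
    "sn": "smear negative",
    "rel": "relapse",
    "ep": "extrapulmonary",
}


def parse_key(key: str) -> dict:
    """Split one WHO column name into var, sex, age, and age_middle."""
    # Mirror str_replace(): rewrite only the first "newrel" to "new_rel".
    key = key.replace("newrel", "new_rel", 1)
    # Mirror separate(key, c("new", "var", "sexage")).
    _new, var, sexage = key.split("_")
    # Mirror separate(sexage, c("sex", "age"), sep = 1).
    sex, age = sexage[0], sexage[1:]
    # Codes missing from the lookups become None, like NA from case_when().
    return {
        "var": VAR_LABELS.get(var),
        "sex": sex,
        "age": age,
        "age_middle": AGE_MIDDLE.get(age),
    }


print(parse_key("new_sp_m04"))
print(parse_key("newrel_f65"))
```

Age codes that are not in the lookup map to None here, which corresponds to the missing age_middle values that case_when() leaves for unmatched ages. This closes the parsing sketch; it is separate from the data-loading code elsewhere on this page.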
Use the following custom function, read_url_pin, in Python to load the data into a pandas DataFrame.
    import pandas as pd
    import requests
    from io import BytesIO

    def read_url_pin(name):
        url = "https://posit.byui.edu/data/" + name + "/" + name + ".parquet"
        response = requests.get(url)
        if response.status_code == 200:
            parquet_content = BytesIO(response.content)
            pandas_dataframe = pd.read_parquet(parquet_content)
            return pandas_dataframe
        else:
            print(f"Failed to retrieve data. Status code: {response.status_code}")
            return None

    # Example usage:
    pandas_df = read_url_pin("tb_cases")
Authenticated Connection:
Our Connect server is https://posit.byui.edu; assign this URL to your CONNECT_SERVER environment variable. You must also create an API key and store it in your environment as CONNECT_API_KEY.
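As a rough illustration of the authenticated route, the sketch below reads the two environment variables and passes them to board_connect from the Python pins package. It assumes that package is installed; the helper names connect_settings and read_tb_cases are hypothetical, and the pin name hathawayj/tb_cases is the one given in the data details.

```python
import os


def connect_settings(env=None):
    """Collect the Connect server URL and API key from the environment."""
    env = os.environ if env is None else env
    required = ("CONNECT_SERVER", "CONNECT_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return {"server_url": env["CONNECT_SERVER"], "api_key": env["CONNECT_API_KEY"]}


def read_tb_cases():
    """Read the pin over an authenticated connection (needs network access)."""
    # Imported here so the settings helper works without pins installed.
    from pins import board_connect  # assumption: Python `pins` package is available

    board = board_connect(**connect_settings())
    return board.pin_read("hathawayj/tb_cases")
```

Calling read_tb_cases() should return the same table as the URL-based function, provided both variables are set and the API key is valid.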