World Health Organization (WHO) Tuberculosis case notifications by country

See the source for a description of the data; tb_dictionary describes the column names.
health
Author

DS 150

Published

February 20, 2024

Data details

There are 107,875 rows and 8 columns. The data source [1] is used to create the data stored in our pins table. You can access this pin from a connection to posit.byui.edu using hathawayj/tb_cases.

This data is available to all.

Variable description

See the source for a description of the data; tb_dictionary describes the column names.

Variable summary

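Variable type: numeric

skim_variable  n_missing  complete_rate     mean       sd    p0   p25   p50   p75    p100  hist
year                   0           1.00  2010.01     6.87  1980  2006  2010  2015    2022  ▁▁▃▇▅
age_middle          3224           0.97    38.05    22.63     2    19    39    59      75  ▅▇▅▇▃
cases                  0           1.00   945.02  6661.40     0     3    31   234  253232  ▇▁▁▁▁

p0, p25, p50, p75, and p100 are the minimum, quartiles, and maximum; hist is a small histogram of the values.

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
country                0           1.00    4   56      0       217           0
g_whoregion            0           1.00    3    3      0         6           0
sex                  808           0.99    1    1      0         2           0
age                  808           0.99    2    4      0        11           0
var                  808           0.99    7   14      0         4           0

For character variables, min and max are string lengths (number of characters), empty counts empty strings, and whitespace counts whitespace-only strings.
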
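The summary tables above look like skimr output from R. If you work in Python, a rough pandas analogue of a few of those columns (n_missing, complete_rate, mean, median, string-length min/max, n_unique) might look like the sketch below. The function skim_like and the toy data are hypothetical and only approximate skimr; sd, the other quantiles, and the histogram are left out.

```python
import pandas as pd


def skim_like(df):
    """Rough pandas analogue of a few skimr summary columns (not skimr itself)."""
    n = len(df)
    rows = []
    for col in df.columns:
        s = df[col]
        n_missing = int(s.isna().sum())
        row = {
            "skim_variable": col,
            "n_missing": n_missing,
            # Share of non-missing values, rounded to two decimals like the tables
            "complete_rate": round(1 - n_missing / n, 2) if n else float("nan"),
        }
        if pd.api.types.is_numeric_dtype(s):
            row["mean"] = s.mean()    # missing values are skipped
            row["p50"] = s.median()
        else:
            lengths = s.dropna().astype(str).str.len()
            row["min"] = lengths.min()  # shortest string, in characters
            row["max"] = lengths.max()  # longest string, in characters
            row["n_unique"] = s.nunique(dropna=True)
        rows.append(row)
    return pd.DataFrame(rows)


# Made-up toy data with one missing value in each of two columns
toy = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013],
    "age_middle": [2.0, None, 19.0, 39.0],
    "sex": ["m", "f", None, "m"],
})
print(skim_like(toy))
```

Running it on the downloaded pin instead of toy should give numbers comparable to the tables, though skimr's exact rounding and quantile choices may differ.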
Explore generating code using R
library(tidyverse)
library(pins)
library(connectapi)

# Download the WHO case notifications (wide format: one row per country and year)
tb_cases <- read_csv("https://extranet.who.int/tme/generateCSV.asp?ds=notifications")

# Wrangling
tb_cases <- tb_cases %>%
  # Drop columns that are not part of the sex-by-age breakdown before reshaping
  select(-(new_sp:c_newinc),
         -contains('_fu'), -contains('_mu'), -contains('_sexunk'),
         -contains('gesex'), -contains('15plus'), -contains('014'),
         -(rdx_data_available:hiv_reg_new2)) %>%
  # Stack the sex/age columns into key/value pairs, dropping missing counts
  pivot_longer(
    cols = new_sp_m04:newrel_f65,
    names_to = "key",
    values_to = "cases",
    values_drop_na = TRUE
  ) %>%
  # "newrel_m04" lacks the separator that "new_sp_m04" has; add it
  mutate(
    key = stringr::str_replace(key, "newrel", "new_rel")
  ) %>%
  # Split e.g. "new_sp_m04" into "new", "sp", and "m04"
  separate(key, c("new", "var", "sexage")) %>%
  select(-new, -iso2, -iso3, -iso_numeric) %>%
  # First character is sex (m/f); the rest is the age-group code
  separate(sexage, c("sex", "age"), sep = 1) %>%
  mutate(
    # Approximate midpoint of each age group (65+ is assigned 75)
    age_middle = case_when(
      age == '04' ~ 2,
      age == '514' ~ 9,
      age == '1524' ~ 19,
      age == '2534' ~ 29,
      age == '3544' ~ 39,
      age == '4554' ~ 49,
      age == '5564' ~ 59,
      age == '65' ~ 75),
    # Spell out the diagnosis type
    var = case_when(
      var == 'sp' ~ 'smear positive',
      var == 'sn' ~ 'smear negative',
      var == 'rel' ~ 'relapse',
      var == 'ep' ~ 'extrapulmonary'
    )) %>%
  select(country, g_whoregion, year, sex, age, age_middle, var, cases)



# Publish the data to the server with Bro. Hathaway as the owner.
pin_name <- "tb_cases"
board <- board_connect()
pin_write(board, tb_cases, name = pin_name, type = "parquet")

# Give the pin a stable vanity URL: https://posit.byui.edu/data/tb_cases/
meta <- pin_meta(board, paste0("hathawayj/", pin_name))
client <- connect()
my_app <- content_item(client, meta$local$content_id)
set_vanity_url(my_app, paste0("data/", pin_name))
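For readers more at home in Python, here is a minimal pandas sketch of the reshaping step in the R code above, applied to a tiny made-up wide table. It mirrors the R logic (melt, rewrite "newrel" as "new_rel", split the key, map age codes to midpoints) but it is not the code used to build the pin; tidy_notifications, the placeholder country names, and the sample values are hypothetical.

```python
import pandas as pd

# Age-group codes and the midpoints the R code assigns to them (65+ -> 75)
AGE_MIDDLE = {"04": 2, "514": 9, "1524": 19, "2534": 29,
              "3544": 39, "4554": 49, "5564": 59, "65": 75}
# Diagnosis-type codes and their labels, as in the R case_when()
VAR_LABELS = {"sp": "smear positive", "sn": "smear negative",
              "rel": "relapse", "ep": "extrapulmonary"}


def tidy_notifications(wide, id_cols):
    """Reshape WHO-style wide columns (e.g. new_sp_m04) to one row per count."""
    long = wide.melt(id_vars=id_cols, var_name="key", value_name="cases")
    long = long.dropna(subset=["cases"]).copy()  # like values_drop_na = TRUE
    # "newrel_f65" -> "new_rel_f65" so every key has three underscore-separated parts
    key = long["key"].str.replace("newrel", "new_rel", regex=False)
    parts = key.str.split("_", expand=True)  # columns 0, 1, 2 = new, var, sexage
    long["var"] = parts[1].map(VAR_LABELS)
    long["sex"] = parts[2].str[0]   # first character: m or f
    long["age"] = parts[2].str[1:]  # remainder: age-group code
    long["age_middle"] = long["age"].map(AGE_MIDDLE)  # unmatched codes become NaN
    cols = id_cols + ["sex", "age", "age_middle", "var", "cases"]
    return long[cols].reset_index(drop=True)


# Made-up example with two countries and two notification columns
wide = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "year": [2010, 2010],
    "new_sp_m04": [5, None],
    "newrel_f65": [3, 1],
})
print(tidy_notifications(wide, ["country", "year"]))
```

As in the R version, an age code missing from the mapping (for example an aggregate group) ends up with a missing age_middle rather than an error.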

Access data

This data is available to all.

Direct Download: tb_cases.parquet

R and Python Download:

URL Connections:

For public data, any user can connect and read the data using pins::board_connect_url() in R.

library(pins)
url_data <- "https://posit.byui.edu/data/tb_cases/"
board_url <- board_connect_url(c("dat" = url_data))
dat <- pin_read(board_url, "dat")

Use this custom function in Python to load the data into a pandas DataFrame.

import pandas as pd  # pd.read_parquet also needs pyarrow or fastparquet installed
import requests
from io import BytesIO

def read_url_pin(name):
    """Download a public pin's Parquet file from posit.byui.edu as a pandas DataFrame."""
    url = f"https://posit.byui.edu/data/{name}/{name}.parquet"
    response = requests.get(url, timeout=60)
    if response.status_code == 200:
        # Wrap the downloaded bytes in a file-like object for pandas
        return pd.read_parquet(BytesIO(response.content))
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    return None

# Example usage:
pandas_df = read_url_pin("tb_cases")
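Because the pin is in long format (one row per country, year, sex, age group, and diagnosis type), grouped summaries need only a groupby. The helper below is a hypothetical sketch, shown on made-up rows with the pin's column names so it runs without a download. It sums only the sex-by-age rows kept in the pin, so its totals may differ from figures reported elsewhere.

```python
import pandas as pd


def cases_by_region_year(df):
    """Total notified cases per WHO region and year (hypothetical helper)."""
    return (
        df.groupby(["g_whoregion", "year"], as_index=False)["cases"]
        .sum()
        .sort_values(["g_whoregion", "year"], ignore_index=True)
    )


# Made-up rows with the pin's column names, so the sketch runs offline
rows = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country C"],
    "g_whoregion": ["AFR", "AFR", "AFR", "EUR"],
    "year": [2010, 2010, 2010, 2011],
    "cases": [5, 3, 2, 7],
})
print(cases_by_region_year(rows))

# With the real pin (read_url_pin returns None if the download fails):
# cases_by_region_year(read_url_pin("tb_cases"))
```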

Authenticated Connection:

Our Connect server is https://posit.byui.edu; assign that URL to your CONNECT_SERVER environment variable. You must also create an API key and store it in the CONNECT_API_KEY environment variable.

Read more about environment variables and the pins package to see how R and Python store and read these variables.

library(pins)
board <- board_connect(auth = "auto")
dat <- pin_read(board, "hathawayj/tb_cases")

import os
from pins import board_rsconnect
from dotenv import load_dotenv
load_dotenv()
API_KEY = os.getenv('CONNECT_API_KEY')
SERVER = os.getenv('CONNECT_SERVER')

board = board_rsconnect(server_url=SERVER, api_key=API_KEY)
dat = board.pin_read("hathawayj/tb_cases")
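If the authenticated read fails, one thing worth ruling out is an unset or empty CONNECT_SERVER or CONNECT_API_KEY. The helper below is a hypothetical sketch that checks both before you build the board; it only reads the environment (or a mapping you pass in), and the board_rsconnect call is left as a comment so the sketch runs without a server.

```python
import os
from typing import Mapping, Optional, Tuple

REQUIRED = ("CONNECT_SERVER", "CONNECT_API_KEY")


def connect_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (server_url, api_key), raising a clear error if either is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["CONNECT_SERVER"], env["CONNECT_API_KEY"]


# Usage with an explicit mapping; call connect_credentials() to read os.environ
server, api_key = connect_credentials(
    {"CONNECT_SERVER": "https://posit.byui.edu", "CONNECT_API_KEY": "not-a-real-key"}
)
print(server)
# board = board_rsconnect(server_url=server, api_key=api_key)
```

Calling it after load_dotenv() surfaces a missing variable as a readable error instead of a failed request later on.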