Health Index
Overview
This document walks through building an index for the health dimension. The data are life expectancy at birth estimates from the IHME Global Burden of Disease Study 2021 (GBD 2021) Mortality and Life Expectancy Forecasts 2022-2050 dataset.

Source: https://ghdx.healthdata.org/record/ihme-data/global-life-expectancy-all-cause-mortality-and-cause-specific-mortality-forecasts-2022-2050 (you need to log in to access the data).
Load Libraries and Data
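# packages used below: tidyverse supplies dplyr, stringr and tidyr; readxl supplies read_excel()
library(tidyverse)
library(readxl)
raw_le2022 <- read_excel("data/IHME_GBD_2021_MORT_LE_FORECASTS_2022_2050_TABLES_0/le.XLSX", skip = 1)
head(raw_le2022)
Reference scenario life expectancy at birth 2022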
Female le data
female_le2022 <- raw_le2022$`Reference scenario life expectancy at birth` %>%
  str_replace_all("·", ".") %>%
  str_extract_all("\\d+\\.?\\d*", simplify = TRUE)
Locations
location_name <- raw_le2022$...1
Build data frame
female_le2022 <- data.frame(location_name, female_le2022) %>%
  select(location_name, le = X1) %>%
  mutate(le = as.numeric(le),
         sex = "female") %>%
  drop_na()
female_le2022
Male le data
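The female and male columns go through the same two stringr calls. Here is a minimal sketch of what those calls do to a single cell, assuming the cells follow a "mean (lower–upper)" layout with a middle dot as the decimal separator. The sample value is made up for illustration and is not taken from the dataset.

```r
library(stringr)

# Hypothetical cell; not a value from the dataset.
cell <- "73·6 (72·1–75·0)"

# Swap the middle dot for a regular decimal point.
cleaned <- str_replace_all(cell, "·", ".")

# Pull out every number; simplify = TRUE returns a character matrix
# with one row per input string and one column per match.
parts <- str_extract_all(cleaned, "\\d+\\.?\\d*", simplify = TRUE)

# The first column holds the mean; the other two hold the interval bounds.
le_mean <- as.numeric(parts[, 1])
```

Passing such a matrix to data.frame() names its columns X1, X2, X3, which is why the pipelines select X1.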
male_le2022 <- raw_le2022$...5 %>%
  str_replace_all("·", ".") %>%
  str_extract_all("\\d+\\.?\\d*", simplify = TRUE)
male_le2022 <- data.frame(location_name, male_le2022) %>%
  select(location_name, le = X1) %>%
  mutate(le = as.numeric(le),
         sex = "male") %>%
  drop_na()
male_le2022
Combine female and male le
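The code for this step is not shown. Later sections refer to a le_avg2022 column, so one plausible reading is an unweighted mean of the female and male values per location. The sketch below makes that assumption on toy data; the toy data frames and the averaging rule are illustrative and are not the original code.

```r
library(dplyr)

# Toy stand-ins for female_le2022 and male_le2022 (hypothetical values).
female_toy <- data.frame(location_name = c("A", "B"), le = c(80, 70), sex = "female")
male_toy   <- data.frame(location_name = c("A", "B"), le = c(76, 66), sex = "male")

# Stack both sexes, then average le within each location.
# Assumption: a simple unweighted mean of the two sexes.
le_avg_toy <- bind_rows(female_toy, male_toy) %>%
  group_by(location_name) %>%
  summarise(le_avg2022 = mean(le), .groups = "drop")

le_avg_toy
```

A population-weighted mean would be an alternative if sex-specific population counts were available.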
HALE data
raw_le_hale2022 <- read_excel("data/IHME_GBD_2021_MORT_LE_FORECASTS_2022_2050_TABLES_0/le_hale.XLSX")
head(raw_le_hale2022)
location_name <- raw_le_hale2022$`Supplemental Results Table S2. Life expectancy and healthy life expectancy (HALE) in 2022 and 2050 (reference scenario) by location for both sexes. Estimates are listed as means with 95% uncertainty intervals in parentheses. Highlighted rows indicate region and super region results from the GBD location hierarchy.`
hale2022 <- raw_le_hale2022$...5 %>%
  str_replace_all("·", ".") %>%
  str_extract_all("\\d+\\.?\\d*", simplify = TRUE)
hale2022 <- data.frame(location_name, hale2022) %>%
  select(location_name, hale = X1) %>%
  mutate(hale = as.numeric(hale)) %>%
  drop_na() %>%
  distinct()
hale2022
Age-standardized YLL and YLD from the healthdata results website
yld2022_data <- yll_yld2022 %>%
  filter(measure == "YLDs (Years Lived with Disability)") %>%
  rename(yld = val) %>%
  select(-measure) %>%
  distinct()
yll2022_data <- yll_yld2022 %>%
  filter(measure == "YLLs (Years of Life Lost)") %>%
  rename(yll = val) %>%
  select(-measure) %>%
  distinct()
yll_yld2022_data <- merge(yld2022_data, yll2022_data)
Combine all data
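The step that produces index_data2022 is not shown. As a rough sketch, assuming every table carries a location_name key, the pieces could be joined with successive merge() calls. The toy tables and their values below are hypothetical.

```r
# Toy stand-ins for the three inputs (hypothetical values).
le_toy     <- data.frame(location_name = c("A", "B", "C"), le_avg2022 = c(78, 68, 72))
hale_toy   <- data.frame(location_name = c("A", "B"), hale = c(68, 59))
burden_toy <- data.frame(location_name = c("A", "B", "C"),
                         yld = c(9000, 11000, 10000),
                         yll = c(15000, 30000, 22000))

# merge() keeps only the locations present in both inputs (an inner join),
# so location "C", which has no hale value, is dropped.
index_toy <- Reduce(function(x, y) merge(x, y, by = "location_name"),
                    list(le_toy, hale_toy, burden_toy))

index_toy
```

Because the join is an inner join, any mismatch in location spelling between tables silently removes rows, so comparing row counts before and after the merge is a cheap check.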
The dimension index is calculated as:
\[ \text{dimension index} = \frac{\alpha\,\text{le} + (1-\alpha)\,\text{hale}}{(1-\widehat{\text{yll}}) + (1-\widehat{\text{yld}})} \times 100 \] where \(\widehat{\text{yll}}\) and \(\widehat{\text{yld}}\) are the scaled values of yll and yld, respectively, and \(\alpha\) weights le against hale.
In the literature, a dimension index is used to measure quality of life in a location, taking into account not only life expectancy but also the burden of disease and disability.
In particular, the dimension index combines life expectancy at birth (le_avg2022) and healthy life expectancy (hale) in the numerator, reflecting both the quantity and quality of life. The denominator incorporates the scaled values of years of life lost (yll) and years lived with disability (yld), which represent the burden of disease and disability in the population.
Particular attention is paid to the values of yll and yld, which are scaled to ensure comparability across locations. Scaling standardizes these values, allowing for a more accurate assessment of the overall health status of the population.
For reference, see:
- https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1566469/full
- https://pmc.ncbi.nlm.nih.gov/articles/PMC4140376/
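As a worked illustration of the dimension index formula above, the sketch below evaluates it for a single location. The function name, the default alpha = 0.5, and the input values are assumptions chosen for the example.

```r
# Hypothetical helper implementing the formula as written:
# (alpha * le + (1 - alpha) * hale) / ((1 - yll_hat) + (1 - yld_hat)) * 100
dimension_index <- function(le, hale, yll_hat, yld_hat, alpha = 0.5) {
  numerator   <- alpha * le + (1 - alpha) * hale
  denominator <- (1 - yll_hat) + (1 - yld_hat)
  numerator / denominator * 100
}

# Toy inputs: le = 80, hale = 70, scaled yll = 0.3, scaled yld = 0.2.
# Numerator: 0.5 * 80 + 0.5 * 70 = 75; denominator: 0.7 + 0.8 = 1.5.
di <- dimension_index(le = 80, hale = 70, yll_hat = 0.3, yld_hat = 0.2)

# Same location with a heavier scaled yll burden.
di_higher_burden <- dimension_index(le = 80, hale = 70, yll_hat = 0.5, yld_hat = 0.2)
```

With this form, a larger scaled burden makes the denominator smaller and the value larger, which is worth keeping in mind when interpreting the result.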
Scale between 0 and 1
# scale <- function(x){
# (x - min(x)) / (max(x) - min(x))
# }
# ?scale
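The commented-out function is a min-max scaler, while the pipeline further down calls base R's scale() with center = FALSE. The two behave differently: per ?scale, the uncentered variant divides by the root mean square, so results are not confined to the 0 to 1 range. A small comparison on a toy vector (the name min_max is chosen here to avoid masking base::scale):

```r
# Min-max scaling, as in the commented-out function, under a different name.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

x <- c(1, 2, 3, 4)

# Min-max result spans exactly 0 to 1.
mm <- min_max(x)

# Base scale() without centering, converted from a matrix to a plain vector.
rms_scaled <- as.numeric(scale(x, center = FALSE))

# Root mean square as defined in ?scale: sqrt(sum(x^2) / (n - 1)).
rms <- sqrt(sum(x^2) / (length(x) - 1))

all.equal(rms_scaled, x / rms)  # the two computations agree
range(mm)                       # 0 and 1
max(rms_scaled) > 1             # TRUE for this vector
```

If values above 1 are not wanted in terms such as (1 - yll_scaled), the min-max version keeps every term between 0 and 1.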
\[ \mathrm{health\_dim\_index} = \frac{\mathrm{LE}_{\mathrm{scaled}} + \mathrm{HALE}_{\mathrm{scaled}} + (1 - \mathrm{YLL}_{\mathrm{scaled}}) + (1 - \mathrm{YLD}_{\mathrm{scaled}})}{4} \]
index_data2022 %>%
  # add a *_scaled copy of every numeric column; scale() with center = FALSE divides by the root mean square
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x, center = FALSE)),
                .names = "{.col}_scaled")) %>%
  mutate(dimension_index = round((le_avg2022 + hale) / ((1 - yll_scaled) + (1 - yld_scaled)) * 100, 2),
         dimension_index2 = round((le_avg2022_scaled + hale_scaled + (1 - yll_scaled) + (1 - yld_scaled)) / 4, 2),
         dimension_index3 = round(((le_avg2022_scaled + hale_scaled) / ((1 - yll_scaled) + (1 - yld_scaled)) / 2) * 100, 2),
         dimension_index3_geo = ((le_avg2022 + hale) / (((1 - yll_scaled) + (1 - yld_scaled)) / 2))^(1/4) * 100,
         .after = location_name) %>%
  # keep only the rows where the geometric variant could not be computed
  filter(is.na(dimension_index3_geo))
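The final filter() lists the rows where dimension_index3_geo is missing. One way this can happen, assuming the scaled burdens exceed 1 for some locations, is that the denominator turns negative and a negative base raised to the power 1/4 is NaN in R. A toy check with hypothetical values:

```r
# Toy scaled burdens above 1 (hypothetical values).
yll_scaled <- 1.3
yld_scaled <- 1.2
le_avg2022 <- 80
hale       <- 70

# Both (1 - scaled) terms are negative, so their average is negative too.
denominator <- ((1 - yll_scaled) + (1 - yld_scaled)) / 2

# A positive numerator over a negative denominator gives a negative base.
base_value <- (le_avg2022 + hale) / denominator

# A fractional power of a negative number is NaN in R.
geo <- base_value^(1/4) * 100

is.na(geo)  # TRUE, so such a row would pass filter(is.na(dimension_index3_geo))
```

Rows returned by that filter are therefore worth inspecting for scaled yll or yld values above 1.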