wpgp/get_wp_global

Get WorldPop Global Demographic Data

This repository contains Python functions for two purposes:

  • to locate and download WorldPop Global Demographic Data in raster format, either country by country or as a global mosaic.
  • to acquire and summarise population counts from the WorldPop Global Demographic Data.

Download rasters

usage: get_raster [-h] [-l LAYER] [-t TLC]
                  [-d DATASET] [-v VERSION] [-y YEAR]
                  [-ar AGE_RANGE] [-dst DESTINATION]
                  [-c | --check | --no-check]

Simple program to download the Worldpop GlobalDemographic Data to local storage.

optional arguments:
  -h, --help            show this help message and exit
  -l LAYER, --layer LAYER
                        selected layer to download [pop,
                        female, male, zip]
  -t TLC, --tlc TLC     three letter code of the country to
                        download
  -d DATASET, --dataset DATASET
                        dataset number
  -v VERSION, --version VERSION
                        version number
  -y YEAR, --year YEAR  year
  -ar AGE_RANGE, --age_range AGE_RANGE
                        min and max age group to download,
                        separated by comma
  -dst DESTINATION, --destination DESTINATION
                        destination folder
  -c, --check, --no-check
                        list urls without downloading
                        (default: False)

Example commands for different raster targets:

  • A global mosaic of population in 2020 (1-km resolution): python get_raster.py -l pop -t MOS -y 2020 -dst output
  • A raster of 100-m resolution population of Aruba (tlc=ABW) in 2020: python get_raster.py -l pop -t ABW -y 2020 -dst output -res 100m
  • Rasters of 1-km resolution female population (age 0-20) of Aruba in 2020: python get_raster.py -t ABW -l female -ar 0,20 -y 2020 -res 1km -dst output
  • A zip file containing rasters of 1-km resolution male and female population (all age groups) of Aruba in 2020: python get_raster.py -l zip -t ABW -y 2020 -res 1km -dst output
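
When the same download is needed for several countries, the commands above can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper name build_get_raster_command is hypothetical, and it only assembles the flags shown in the examples above (-l, -t, -y, -dst, -res, -ar).

```python
def build_get_raster_command(layer, tlc, year, destination,
                             res=None, age_range=None):
    """Return the argument list for one get_raster.py call.

    Hypothetical helper: it only mirrors the flags used in the README examples.
    """
    cmd = ["python", "get_raster.py", "-l", layer, "-t", tlc,
           "-y", str(year), "-dst", destination]
    if res is not None:
        cmd += ["-res", res]
    if age_range is not None:
        min_age, max_age = age_range
        # -ar expects "min,max" separated by a comma
        cmd += ["-ar", f"{min_age},{max_age}"]
    return cmd


# One command per country code (Aruba, Ghana, Togo)
commands = [build_get_raster_command("pop", tlc, 2020, "output", res="1km")
            for tlc in ("ABW", "GHA", "TGO")]

for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True). Appending "-c" to a command lists the URLs without downloading, which is a cheap way to check the selection first.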

Notes

Acceptable values for the options:

option     possible values
layer      pop, female, male, zip
tlc        valid TLC/Alpha-3 or MOS for global mosaic or ALL for all countries
year       2015 to 2030
res        100m or 1km
age_range  0 to 90
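
The table above can be turned into a quick pre-flight check before a long batch of downloads. This is a sketch under stated assumptions: check_options is a hypothetical helper, not a function of this repository, and the tlc test only checks the shape of the code (three uppercase letters), not whether the country exists.

```python
VALID_LAYERS = {"pop", "female", "male", "zip"}
VALID_RES = {"100m", "1km"}


def check_options(layer, tlc, year, res=None, age_range=None):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if layer not in VALID_LAYERS:
        problems.append(f"layer must be one of {sorted(VALID_LAYERS)}")
    # Shape check only: three uppercase ASCII letters (covers MOS and ALL too)
    if not (len(tlc) == 3 and tlc.isascii() and tlc.isalpha() and tlc.isupper()):
        problems.append("tlc must be a three-letter code, MOS, or ALL")
    if not 2015 <= year <= 2030:
        problems.append("year must be between 2015 and 2030")
    if res is not None and res not in VALID_RES:
        problems.append("res must be 100m or 1km")
    if age_range is not None:
        min_age, max_age = age_range
        if not 0 <= min_age <= max_age <= 90:
            problems.append("age_range must satisfy 0 <= min <= max <= 90")
    return problems


print(check_options("pop", "ABW", 2020, res="100m"))
print(check_options("zip", "abw", 2031, res="10m", age_range=(20, 0)))
```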

Acquire and summarise

The idea is to obtain the population count in every unit whose boundary is defined in the input. To achieve this, the function reads vector data defining the regions of interest and performs zonal statistics on the relevant raster.
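
To make the term concrete, a zonal sum adds up all raster cells that fall inside each zone. The toy sketch below uses plain Python lists in place of a raster and a rasterised zone layer; the values are made up for illustration and the function is not the implementation used in get_table.

```python
from collections import defaultdict


def zonal_sum(values, zones, nodata=None):
    """Sum cell values per zone label, skipping nodata cells and unlabelled cells."""
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue
            totals[zone] += value
    return dict(totals)


# Toy 3x3 population grid; -1 marks a nodata cell
population = [
    [10, 20, 30],
    [40, -1, 60],
    [70, 80, 90],
]
# Zone label of each cell; None means the cell lies outside every zone
zones = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    [None, "B", "B"],
]

result = zonal_sum(population, zones, nodata=-1)
print(result)
```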

Preparation

A VRT file serves as a single reference to the multiple rasters in a dataset. Use prep_script.py to create the relevant VRT files and place them in the vrt folder. Modify the dataset value when needed.

In a Python console, run:

exec(open('prep_script.py').read())

Usage

Population count

Obtain the population count inside non-overlapping circular buffers around the points defined in adm.gpkg:

import get_table as wp

vrt_path = 'vrt/R2024B/mosaic_2020_100m_constrained.vrt'
result = wp.extract('adm.gpkg', vrt_path=vrt_path,
  rad=10, clip_buffer=True,
  return_gdf=True)

# Alternative usage
result1 = wp.get_data('adm.gpkg', dataset='R2024B', 
  year=2020, resolution='1km', vrt_dir='vrt',
  return_gdf=True, rad=5, clip_buffer=False)
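
One way to picture non-overlapping buffers is to give every raster cell to its nearest point, and to count it only if that point is within the radius. The sketch below illustrates this idea in plain Python. It is an assumption about what clipping achieves, not the geometry operation performed by clip_buffer=True, and the function name and toy values are hypothetical.

```python
import math


def clipped_buffer_sums(cells, centres, rad):
    """Sum cell values per centre, assigning each cell to its nearest centre only.

    cells   : list of (x, y, value)
    centres : dict mapping an id to (x, y)
    rad     : buffer radius, in the same units as the coordinates
    """
    totals = {cid: 0.0 for cid in centres}
    for x, y, value in cells:
        nearest_id, nearest_dist = None, math.inf
        for cid, (cx, cy) in centres.items():
            dist = math.hypot(x - cx, y - cy)
            if dist < nearest_dist:
                nearest_id, nearest_dist = cid, dist
        # Cells farther than rad from every centre belong to no buffer
        if nearest_dist <= rad:
            totals[nearest_id] += value
    return totals


# Two centres 6 units apart with rad=5, so the plain buffers would overlap
centres = {"a": (0.0, 0.0), "b": (6.0, 0.0)}
cells = [(1.0, 0.0, 5), (2.9, 0.0, 7), (3.1, 0.0, 11), (5.0, 0.0, 13), (12.0, 0.0, 17)]

sums = clipped_buffer_sums(cells, centres, rad=5)
print(sums)
```

Without the nearest-centre rule, the cells at x = 1.0 to x = 5.0 would be counted for both buffers.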

Age-sex structure

Female population counts for a specified age range can be extracted with get_data_agesex(). The output contains population counts at 5-year age intervals. The total population count can also be extracted; this total covers the whole population, both sexes and all age intervals.

import get_table as wp

result2 = wp.get_data_agesex('adm.geojson', dataset='R2024B', 
  year=2020, resolution='1km', 
  vrt_dir='vrt', sex='female', get_total=True,
  return_gdf=False)

result2.head()
   id  f_00   f_05   f_10     pop  count
0   1  2291   9623   8266  137168    136
1   2  1428   5996   5151   85479     85
2   3  4200  17637  15150  251403    250
3   4   296   1244   1068   17732     18
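
The age columns can be combined after extraction. As a minimal sketch, the rows printed above are copied into plain dictionaries and the three visible female columns are summed and compared with the total in pop. Only the columns shown in the printed output are used, and the helper name summarise is hypothetical.

```python
# Rows copied from the printed output above (visible columns only)
rows = [
    {"id": 1, "f_00": 2291, "f_05": 9623, "f_10": 8266, "pop": 137168},
    {"id": 2, "f_00": 1428, "f_05": 5996, "f_10": 5151, "pop": 85479},
    {"id": 3, "f_00": 4200, "f_05": 17637, "f_10": 15150, "pop": 251403},
    {"id": 4, "f_00": 296, "f_05": 1244, "f_10": 1068, "pop": 17732},
]


def summarise(rows, columns):
    """Add up the chosen age columns per row and express them as a share of pop."""
    summary = []
    for row in rows:
        subtotal = sum(row[col] for col in columns)
        summary.append({"id": row["id"],
                        "subtotal": subtotal,
                        "share": subtotal / row["pop"]})
    return summary


summary = summarise(rows, ["f_00", "f_05", "f_10"])
for item in summary:
    print(item["id"], item["subtotal"], round(item["share"], 3))
```

If the function returns a pandas DataFrame, as the head() call suggests, the same subtotal could presumably be obtained with result2[['f_00', 'f_05', 'f_10']].sum(axis=1).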

Some visualisations

Extracting gridded population count based on level-2 administrative boundaries covering some parts of Ghana, Benin, and Togo. Zonal statistics can be performed to obtain total population inside each administrative unit.

map-1

Extraction of total population using an admin boundary (a) and a circular buffer (b). The circular buffer is generated from the centroid of each administrative unit and is then clipped to avoid overlap.

map-2

Weighted sum

Suppose we want to estimate the number of people affected by a particular event occurring at a certain coordinate. We can define a circular buffer around that point and apply a zonal sum over that buffer. The following code estimates the total population (based on test.tif) over the areas of interest, which are 4-km circular buffers around the points defined in test_point.gpkg.

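import geopandas as gpd
import get_table as wp

# Points of interest, loaded here for inspection or plotting
# (path is assumed to be the same file that is passed to wp.extract)
path = 'data/test_point.gpkg'
pts = gpd.read_file(path)

# Zonal sum of test.tif inside the 4-km buffer around each point
res, outpt = wp.extract(
  path,
  'output/test.tif',
  resolution='1km',
  return_all=True,
  rad=4)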

fig-3

In cases where the impact of the event declines with distance from the epicentre, a radial weighting function can be applied during aggregation: $$w(r) = \dfrac{1-\exp(-(1-r)^p)}{1-\exp(-1)}$$ The weight equals 1 at r = 0 and falls to 0 at r = 1, so r is presumably the distance from the point scaled by the buffer radius, while p controls the shape of the decline.

fig-4
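
The weighting function can be evaluated directly to see how p changes the decline. This is a minimal sketch of the formula above; the function name radial_weight is hypothetical, and returning 0 for r outside [0, 1] is an assumption made here, since the formula itself is only meaningful on that interval.

```python
import math


def radial_weight(r, p=1.0):
    """Evaluate w(r) = (1 - exp(-(1 - r)**p)) / (1 - exp(-1)) for 0 <= r <= 1.

    Values of r outside [0, 1] get zero weight (an assumption of this sketch).
    """
    if not 0.0 <= r <= 1.0:
        return 0.0
    return (1.0 - math.exp(-((1.0 - r) ** p))) / (1.0 - math.exp(-1.0))


# Compare the decline for two shape parameters
for r in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r, round(radial_weight(r, p=1), 3), round(radial_weight(r, p=2), 3))
```

For any r strictly between 0 and 1, a larger p gives a smaller weight, so the influence is concentrated closer to the centre.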

The following code applies this weighting to the 4-km buffers from the previous example:

res,outpt = wp.extract(
  'data/test_point.gpkg',
  'output/test.tif',
  resolution='1km',
  return_all=True,
  rad=4, weight=True, p=1)

fig-4
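
The effect of weight=True can be illustrated on a handful of cells. The sketch below compares a plain sum with a weighted sum inside one buffer, using the weighting formula from this section. It assumes r is the distance divided by the radius, uses made-up cell values, and is not the implementation behind wp.extract.

```python
import math


def radial_weight(r, p=1.0):
    """w(r) from this section, for r in [0, 1]."""
    return (1.0 - math.exp(-((1.0 - r) ** p))) / (1.0 - math.exp(-1.0))


def buffer_sum(cells, centre, rad, weight=False, p=1.0):
    """Sum the values of cells within rad of centre, optionally weighted by distance."""
    cx, cy = centre
    total = 0.0
    for x, y, value in cells:
        dist = math.hypot(x - cx, y - cy)
        if dist > rad:
            continue  # the cell lies outside the buffer
        total += value * (radial_weight(dist / rad, p) if weight else 1.0)
    return total


# Four toy cells of 100 people at 0, 2, 4 and 5 km from the point
cells = [(0.0, 0.0, 100.0), (2.0, 0.0, 100.0), (4.0, 0.0, 100.0), (5.0, 0.0, 100.0)]

plain = buffer_sum(cells, (0.0, 0.0), rad=4)
weighted = buffer_sum(cells, (0.0, 0.0), rad=4, weight=True, p=1)
print(plain, round(weighted, 2))
```

The cell on the buffer edge counts fully in the plain sum but contributes nothing to the weighted one.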

The same radial weight can also be applied to other geometry types, such as a LineString defining a road or waterway, or the edges of a Polygon. The additional argument edge=True extracts the edges of the input Polygon. MultiPolygons can be broken into individual Polygons with explode=True, so that the population count in each row is associated with one of the separated Polygons.

res,outpt = wp.extract(
  'data/coastline.gpkg',
  'output/test.tif',
  resolution='1km',
  return_all=True,
  edge=True,
  explode=True,
  rad=4, weight=True, p=1)

fig-5

The above schemes can also be used in the high-level extraction functions. We just need to pass the additional arguments to the extraction function.

result3 = wp.get_data_agesex('test_point.gpkg', 
  dataset='R2024B', 
  year=2020, resolution='100m', 
  vrt_dir='vrt', sex='both',
  rad=5, weight=True, p=2)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use get_wp_global in your research, please cite:

@software{get_wp_global,
  author = {Priyatikanto R., Nosatiuk B., Zhang W., McKeen T., Vataga E., Tejedor-Garavito N, Bondarenko M.},
  title = {get_wp_global: Python package to locate and download Worldpop Global Demographic Data and acquire/summarise population count from the WorldPop Global Demographic Data v1.},
  year = {2025},
  publisher = {GitHub}, 
  url = {https://github.com/wpgp/get_wp_global}
}

Acknowledgments
