How to contribute to our vaccination data

We welcome contributions to our vaccination dataset! Note that, due to the nature of our pipeline, we cannot accept pull requests for countries whose data we collect manually. To see which countries have manual processes, check this file.

Content

About our vaccination dataset

Read this section to better understand the vaccination data that we are currently collecting.

We currently produce three vaccination datasets:

General dataset

| location | date | vaccine | source_url | total_vaccinations | people_vaccinated | people_fully_vaccinated | total_boosters |
|---|---|---|---|---|---|---|---|
| Cambodia | 2021-09-10 | Johnson&Johnson, Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac | https://www.facebook.com/MinistryofHealthofCambodia/photos/a.930887636950343/4376835072355565 | 20554497 | 11406989 | 9350408 | 742293 |

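The metrics total_vaccinations, people_vaccinated, people_fully_vaccinated and total_boosters are defined here. The remaining fields are location, date, vaccine and source_url.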

Note that for some countries, some metrics cannot be reported because they are not available. This is not ideal, but it is acceptable.
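As a rough illustration of the row format above, the following minimal sketch checks a single record of the general dataset. The helper name `check_general_row` and its validation rules are assumptions made for this example and are not part of the pipeline; only the column names and the Cambodia values come from the table above.

```python
from datetime import date

# Column names taken from the example table above.
METRICS = [
    "total_vaccinations",
    "people_vaccinated",
    "people_fully_vaccinated",
    "total_boosters",
]
OTHER_FIELDS = ["location", "date", "vaccine", "source_url"]


def check_general_row(row: dict) -> list:
    """Return a list of problems found in one row (hypothetical helper)."""
    problems = []
    for field in OTHER_FIELDS:
        if not row.get(field):
            problems.append(f"missing field: {field}")
    if row.get("date"):
        try:
            date.fromisoformat(row["date"])  # ISO date such as 2021-09-10
        except ValueError:
            problems.append("date is not an ISO date")
    for metric in METRICS:
        value = row.get(metric)
        # A metric may be unavailable for some countries; that is acceptable.
        if value is None:
            continue
        if not isinstance(value, int) or value < 0:
            problems.append(f"{metric} must be a non-negative integer")
    return problems


cambodia = {
    "location": "Cambodia",
    "date": "2021-09-10",
    "vaccine": "Johnson&Johnson, Oxford/AstraZeneca, Sinopharm/Beijing, Sinovac",
    "source_url": "https://www.facebook.com/MinistryofHealthofCambodia/photos/a.930887636950343/4376835072355565",
    "total_vaccinations": 20554497,
    "people_vaccinated": 11406989,
    "people_fully_vaccinated": 9350408,
    "total_boosters": 742293,
}
print(check_general_row(cambodia))  # prints []
```

A row that omits a metric, for example `total_boosters`, still passes this check, which mirrors the note above about unavailable metrics.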

Manufacturer dataset

Along with the main data, we include vaccine data broken down by manufacturer for some countries where this data is available.

Each row in the data gives the cumulative number of doses administered for a given date and vaccine manufacturer.

Fields

Example

| date | vaccine | location | total_vaccinations |
|---|---|---|---|
| ... | ... | ... | ... |
| 2021-06-01 | Moderna | Lithuania | 151261 |
| 2021-06-01 | Oxford/AstraZeneca | Lithuania | 333733 |
| 2021-06-01 | Johnson&Johnson | Lithuania | 34974 |
| 2021-06-01 | Pfizer/BioNTech | Lithuania | 1133371 |
| ... | ... | ... | ... |
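To make the meaning of a row more concrete, here is a minimal sketch that works on the Lithuania rows from the example. The function names are hypothetical, and the "never decreases" check is only a sanity check that follows from the values being cumulative; real sources may publish corrections that break it.

```python
from collections import defaultdict

# (date, vaccine, location, total_vaccinations), copied from the example.
rows = [
    ("2021-06-01", "Moderna", "Lithuania", 151261),
    ("2021-06-01", "Oxford/AstraZeneca", "Lithuania", 333733),
    ("2021-06-01", "Johnson&Johnson", "Lithuania", 34974),
    ("2021-06-01", "Pfizer/BioNTech", "Lithuania", 1133371),
]


def total_by_date(rows):
    """Sum the cumulative doses of all manufacturers per (location, date)."""
    totals = defaultdict(int)
    for day, _vaccine, location, doses in rows:
        totals[(location, day)] += doses
    return dict(totals)


def is_cumulative(rows):
    """True if doses never decrease over time for each (location, vaccine)."""
    last_seen = {}
    # ISO dates sort chronologically when compared as plain strings.
    for day, vaccine, location, doses in sorted(rows):
        key = (location, vaccine)
        if doses < last_seen.get(key, 0):
            return False
        last_seen[key] = doses
    return True


print(total_by_date(rows))  # prints {('Lithuania', '2021-06-01'): 1653339}
print(is_cumulative(rows))  # prints True
```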

Notes

We only include manufacturer data for countries for which the process can be automated. No manual reports are currently being accepted. This is to ensure scalability of the project.

Age group dataset

We include vaccine data broken down by age groups for some countries where the data is available.

Each row in the data gives the percentage of people within an age group who have received at least one dose. Note that there is currently no standard for which age groups are accepted, as each country may define its own. As a general rule, we try to use 10-year groups, but this is optional.

Note that the reported metric is relative, and not absolute.

Fields

Example

| location | date | age_group_min | age_group_max | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | people_with_booster_per_hundred |
|---|---|---|---|---|---|---|
| ... | ... | ... | ... | ... | ... | ... |
| Slovakia | 2021-12-03 | 18 | 24 | 50.41 | 46.65 | 1.3 |
| Slovakia | 2021-12-03 | 25 | 49 | 51.31 | 48.26 | 3.52 |
| Slovakia | 2021-12-03 | 50 | 59 | 60.24 | 57.6 | 6.14 |
| Slovakia | 2021-12-03 | 60 | 69 | 67.12 | 65.14 | 16.05 |
| Slovakia | 2021-12-03 | 70 | 79 | 77.86 | 75.99 | 36.14 |
| Slovakia | 2021-12-03 | 80 | | 63.5 | 61.1 | 27.39 |
| ... | ... | ... | ... | ... | ... | ... |
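The last Slovakia row has no upper bound, so a reader of this dataset has to handle an empty age_group_max. The sketch below shows one way to do that; it assumes the data is stored as CSV with an empty cell for an open-ended group, and the helper names are made up for this example.

```python
import csv
import io
from typing import Optional

# Two rows from the example above, written as CSV (assumed storage format).
SAMPLE = """location,date,age_group_min,age_group_max,people_vaccinated_per_hundred
Slovakia,2021-12-03,70,79,77.86
Slovakia,2021-12-03,80,,63.5
"""


def parse_age_bound(text: str) -> Optional[int]:
    """Convert an age bound to int; an empty cell means no bound."""
    text = text.strip()
    return int(text) if text else None


def age_group_label(age_min: int, age_max: Optional[int]) -> str:
    """Build a readable label such as '70-79' or '80+'."""
    if age_max is None:
        return f"{age_min}+"
    return f"{age_min}-{age_max}"


labels = [
    age_group_label(
        parse_age_bound(row["age_group_min"]),
        parse_age_bound(row["age_group_max"]),
    )
    for row in csv.DictReader(io.StringIO(SAMPLE))
]
print(labels)  # prints ['70-79', '80+']
```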

Notes

We only include age group data for countries for which the process can be automated. No manual reports are currently being accepted. This is to ensure scalability of the project.

Report new data values

To report new values for a country/location, first check whether the imports for that country/territory are automated. You can find this in the automated column of this file.
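If you prefer to do this check programmatically, a minimal sketch could look like the one below. It assumes the file is a CSV with a location column next to the automated column and that automated holds True/False values; the sample contents are made-up placeholders, so adapt the names to the actual file.

```python
import csv
import io

# Made-up placeholder contents; the real file may use other columns or values.
SAMPLE = """location,automated
Country A,True
Country B,False
"""


def is_automated(csv_text: str, location: str) -> bool:
    """Look up the automated flag for one location (hypothetical helper)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["location"] == location:
            return row["automated"].strip().lower() == "true"
    raise KeyError(f"{location} not found")


print(is_automated(SAMPLE, "Country A"))  # prints True
print(is_automated(SAMPLE, "Country B"))  # prints False
```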

Notes

Add new country automations

To automate the data import for a country, make sure that:

Contribute to general dataset

Next, follow the steps below:

  1. Decide whether the import is batch (i.e. the complete time series) or incremental (only the latest value). See the scripts in src/cowidev/vax/batch and src/cowidev/vax/incremental for more details. Note: batch is preferred over incremental.

  2. Create a script and, depending on your decision in step 1, place it in either src/cowidev/vax/batch or src/cowidev/vax/incremental. Note that each source is different and there is no single pattern that works for all sources.

  3. Feel free to add manufacturer/age data if you are automating a batch script and the data is available.

  4. Test that it is working and that it is stable. For this you need to have the library installed. Run

    cowid vax get [country-name]

  5. Issue a pull request and wait for a review.
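The difference between the two modes in step 1 can be sketched as follows. This is not how the scripts in src/cowidev/vax are structured; the function names, the location and the numbers are hypothetical placeholders that only illustrate what each mode returns.

```python
def read_source() -> list:
    """Hypothetical stand-in for downloading and parsing an official source."""
    # Placeholder values, not real data.
    return [
        {"date": "2021-06-01", "total_vaccinations": 100},
        {"date": "2021-06-02", "total_vaccinations": 180},
        {"date": "2021-06-03", "total_vaccinations": 250},
    ]


def batch_import(location: str) -> list:
    """Batch mode: return the complete time series on every run."""
    return [{"location": location, **record} for record in read_source()]


def incremental_import(location: str) -> dict:
    """Incremental mode: return only the most recent data point."""
    # ISO dates compare chronologically as strings.
    latest = max(read_source(), key=lambda record: record["date"])
    return {"location": location, **latest}


print(len(batch_import("Exampleland")))           # prints 3
print(incremental_import("Exampleland")["date"])  # prints 2021-06-03
```

A batch script can rebuild the whole history whenever the source revises past values, which is one plausible reason why batch is the preferred mode when the source publishes a full time series.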

Find below some scripts for reference based on the source file format and the mode (batch or incremental):

Mode CSV JSON API/JSON Excel PDF HTML HTML (news feed)
Batch Peru (+AM), Romania (+M) Hong Kong Lithuania, Israel (+A), Zimbabwe Luxembourg, New Zealand, South Korea (+A)
Incremental Finland Macao Argentina, Poland Spain Taiwan, Azerbaijan, Kenya Bulgaria, Equatorial Guinea Albania, Monaco

*(+M): Also collects manufacturer data, (+A): Also collects age group data, (+AM): Also collects both manufacturer and age group data.

Additionally, there are some special scripts which collect data from several countries:

More details: #230, #250

Contribute to manufacturer or age group dataset

We only accept scripts that collect the full time series (no support for incremental updates) when it comes to manufacturer and age group vaccination data.

Review all the steps in the previous section to better understand how to add this data. Also, refer to the section About our vaccination dataset for more details about the format of these datasets.

Criteria to accept pull requests

Due to how our pipeline operates at the moment, pull requests are only accepted under certain conditions. These include, but are not limited to, the following:

You are of course welcome to create pull requests for other cases, and we appreciate them very much.

Note that files in the public folder must not be modified manually.