Week 2 13 January 2026

The Languages of Africa

The first curated dataset for 2026 is African languages, including country, language family, and speaker count.

I took a lot of different stabs at this one before landing on an approach. I tried a scatter plot looking at total number of speakers versus number of countries for each language, across all… *(checks notes)*

select count(distinct language) from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"

…502 languages. That looked like an unreadable cluster in the lower left with a handful of dots scattered around the rest of the chart area (most notably Arabic, spoken natively by 1.8 billion people across 12 countries).
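
A minimal sketch of a query that would produce the two scatter-plot axes, assuming the language, country, and native_speakers columns that appear in the final query further down. This is a reconstruction for illustration, not necessarily the query behind the original chart.

```sql
-- One row per language: total native speakers (x) and number of countries (y)
select language,
       sum(native_speakers)    as total_speakers,
       count(distinct country) as countries
  from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
 group by language
 order by total_speakers desc
```

Sorting by total_speakers puts the handful of outliers at the top, which makes it easy to see why everything else collapses into the lower-left corner.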

After thinking about it for a minute, I realized I actually wanted to zoom in on the ones closest to (0, 0): the languages spoken by the fewest people. There’s no defined threshold for an endangered language, but a cutoff of 1000 native speakers left me with a manageable 26 languages.
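
One way to try out different cutoffs is to count the languages whose total falls at or under each one. The sketch below assumes the language and native_speakers columns from the dataset and totals speakers across countries before filtering; with the cutoff at 1000 it would be expected to line up with the 26 languages mentioned above.

```sql
-- Count languages whose speakers, summed across countries, stay within the cutoff
select count(*) as small_languages
  from (
        select language
          from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
         group by language
        having sum(native_speakers) <= 1000  -- swap in other cutoffs to compare
       )
```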

One in a Thousand
African languages with 1000 or fewer native speakers

[Chart. Legend, one series per country: Burkina Faso, Cameroon, Central African Republic, Chad, Ethiopia, Gabon, Ivory Coast, Mauritania, Nigeria, Somalia, South Sudan, Sudan]

Language: native speakers (no value is shown on the chart for the first four)

Kung
Goundo
Mbre
Boon
Defaka: 200
Kelo: 200
Mbuʼ: 200
Birri: 200
Paleni: 260
Áncá: 300
Aja: 400
Missong: 400
Shabo: 400
Osatu: 400
Nding: 400
Mbowe: 460
Mundabli: 500
Imraguen: 530
Geme: 550
Mbuk: 600
Eman: 800
Abon: 800
Ambo: 1000
Buru–Angwe: 1000
Sighu: 1000
Kwaʼ: 1000


A few things I noticed:

All of these are spoken within the borders of a single country, except Aja, split across the Central African Republic and South Sudan.
A lot of these are in Cameroon.
Twelve speakers?!
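
The Cameroon observation can be checked by counting the small languages per country. This is a sketch that reuses the same window-function filter as the final query in the Queries section, with the same column names assumed.

```sql
-- Languages with up to 1000 speakers in total, counted per country
with small as (
    select language,
           country,
           sum(native_speakers) over (partition by language) as total_speakers
      from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
   qualify total_speakers <= 1000
)

select country,
       count(distinct language) as languages
  from small
 group by country
 order by languages desc
```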

Queries

The final query was:

create or replace table africa as

-- Get languages with up to 1000 speakers, totaled across countries
with languages as (
   select language,
          country,
          native_speakers,
          sum(native_speakers) over (partition by language) as total_speakers

     from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
  qualify total_speakers <= 1000
),

-- Pivot on countries to create an individual series for each
pivot_countries as (
    pivot languages on country
    using sum(native_speakers)
 order by total_speakers
)

-- Drop the total_speakers column from the final table
select * exclude total_speakers from pivot_countries

I copied the resulting africa table to CSV.
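
DuckDB's COPY statement is one way to do that export. The output file name below is a placeholder, since the post does not say what the file was called.

```sql
-- Write the pivoted table to a CSV file with a header row
-- (africa.csv is a hypothetical output path)
copy africa to 'africa.csv' (header, delimiter ',');
```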