The Languages of Africa

View data on TidyTuesday

The first curated dataset for 2026 is African languages, including country, language family, and speaker count.

I took a lot of different stabs at this one before landing on an approach. I tried a scatter plot looking at total number of speakers versus number of countries for each language, across all… *(checks notes)*

select count(distinct language) from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"

…502 languages. That looked like an unreadable cluster in the lower left with a handful of dots scattered around the rest of the chart area (most notably Arabic, spoken natively by 1.8 billion people across 12 countries).
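
A rough sketch of the aggregation behind that scatter plot, assuming the same CSV columns the final query below relies on (language, country, native_speakers). The output column names total_speakers and countries are just illustrative, and this isn't necessarily the exact query I ran at the time:

```sql
-- One row per language: total native speakers and number of countries
select language,
       sum(native_speakers)    as total_speakers,
       count(distinct country) as countries
  from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
 group by language
 order by total_speakers desc;
```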

After thinking about it for a minute, I realized I actually wanted to zoom in on the languages closest to (0, 0): the ones spoken by the fewest people. There’s no defined threshold for an endangered language, but a cutoff of 1000 native speakers left me with a manageable 26 languages.
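
One way to try out a cutoff like this is to count the languages that fall under it. This is a minimal sketch, again assuming the language and native_speakers columns; the small_languages alias is made up for illustration:

```sql
-- Count languages whose native speakers, summed across countries, total 1000 or fewer
select count(*) as small_languages
  from (
        select language
          from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
         group by language
        having sum(native_speakers) <= 1000
       );
```

Swapping 1000 for another number shows how quickly the list grows or shrinks.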

One in a Thousand
African languages with 1000 or fewer native speakers

Chart legend (countries): Burkina Faso, Cameroon, Central African Republic, Chad, Ethiopia, Gabon, Ivory Coast, Mauritania, Nigeria, Somalia, South Sudan, Sudan

Native speakers by language:

  • Kung: 12
  • Goundo: 30
  • Mbre: 50
  • Boon: 60
  • Defaka: 200
  • Kelo: 200
  • Mbuʼ: 200
  • Birri: 200
  • Paleni: 260
  • Áncá: 300
  • Aja: 400
  • Missong: 400
  • Shabo: 400
  • Osatu: 400
  • Nding: 400
  • Mbowe: 460
  • Mundabli: 500
  • Imraguen: 530
  • Geme: 550
  • Mbuk: 600
  • Eman: 800
  • Abon: 800
  • Ambo: 1000
  • Buru–Angwe: 1000
  • Sighu: 1000
  • Kwaʼ: 1000

A few things I noticed:

  • All of these are spoken within the borders of a single country, except Aja, which is split across the Central African Republic and South Sudan.
  • A lot of these are in Cameroon.
  • Twelve speakers?!
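
The first two observations can be checked against the data rather than read off the chart. The sketch below assumes the same columns as before; the small_languages temp table is a hypothetical helper, not something from my original workflow:

```sql
-- Keep only languages with 1000 or fewer native speakers in total
create or replace temp table small_languages as
select language,
       country,
       native_speakers,
       sum(native_speakers) over (partition by language) as total_speakers
  from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
qualify total_speakers <= 1000;

-- Which of these languages appear in more than one country?
select language,
       count(distinct country) as countries
  from small_languages
 group by language
having count(distinct country) > 1;

-- How many of these languages does each country have?
select country,
       count(distinct language) as languages
  from small_languages
 group by country
 order by languages desc;
```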

Queries

The query I ended up using was:

create or replace table africa as

-- Get languages with up to 1000 speakers, totaled across countries
with languages as (
   select language,
          country,
          native_speakers,
          sum(native_speakers) over (partition by language) as total_speakers

     from "https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-13/africa.csv"
  qualify total_speakers <= 1000
),

-- Pivot on countries to create an individual series for each
pivot_countries as (
    pivot languages on country
    using sum(native_speakers)
 order by total_speakers
)

-- Drop the total_speakers column from the final table
select * exclude total_speakers from pivot_countries

I copied the resulting africa table to CSV.
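
In DuckDB, that export can be done with a COPY statement. This is a sketch, and the output file name africa.csv is an assumption:

```sql
-- Write the pivoted table to a CSV file with a header row
copy africa to 'africa.csv' (format csv, header);
```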