Brazilian Companies
This dataset lists companies in Brazil with their legal nature (sole proprietorship, publicly traded corporation etc.), size (small enterprise, micro enterprise, or other), and qualifications of the owner, as well as their capital stock.
I thought looking at the breakdown of size and type was interesting, as was a comparison by number of companies vs. capital making up each grouping.
Listing all the legal entities in the data overwhelmed the visual a bit, so I did some categorizing myself—all limited liability corporations, all types of partnerships, all co-ops, and then putting a bunch in an “Other” bucket.
Queries
This was my staging table of sorts, where I pulled in the CSV, formatted company_size, and added a column categorizing legal_nature.
create or replace table companies as
select * replace (replace(upper(company_size[1])||company_size[2:], '-', ' ') as company_size),
-- Turn `small-enterprise` into `Small enterprise`
-- Group legal entities into categories
case when legal_nature in (
-- pass-through
'Sole Proprietorship',
'Privately Held Corporation',
'Publicly Traded Corporation',
'State-Owned Enterprise') then legal_nature
-- group types containing key phrases
when legal_nature like '%Limited Liability%' then 'LLC'
when legal_nature like '%Partnership%' then 'Partnership'
when legal_nature like '%Cooperative%' then 'Cooperative'
else 'Other Type'
end as legal_category
from 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-01-27/companies.csv'
To get the multi-level Sankey diagram, I needed to union all two sets of data—one just grouping by the company size with ‘Brazil’ as the source, and one grouping by size and legal category to count “flow” between them.
create or replace view companies_sankey as
select 'Brazil' as source,
company_size as target,
count(*) as companies
from companies
group by all
union all
select company_size,
legal_category,
count(*) as companies
from companies
group by all
order by 3 desc, company_size, legal_category
I ordered by the third column because this translated between queries without having to change what that column was in the order by clause:
create or replace view capital_sankey as
select 'Brazilian capital' as source,
company_size as target,
sum(capital_stock) as capital_stock
from companies
group by all
union all
select company_size,
legal_category,
sum(capital_stock) as capital_stock
from companies
group by all
order by 3 desc, company_size, legal_category