ETL Notebooks

Interactive Marimo notebooks documenting the data pipeline that builds the NAICS database from Census Bureau source files.

These notebooks run in your browser via WebAssembly, so no Python installation is required. Code and outputs are pre-computed: each notebook is a read-only snapshot of the analysis.


Data Pipeline

Run these notebooks in order to build the complete NAICS database:


Exploration


Database Schema

After running the full pipeline, the database contains:

Table                    Rows     Description
naics_nodes              2,125    Codes with hierarchy and descriptions
naics_index_terms        20,398   Official search keywords
naics_cross_references   4,601    Exclusion/inclusion references
naics_embeddings         2,125    384-dim vectors for semantic search
naics_relationships      2,125    Pre-computed similarity graph (JSON)
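
One way to confirm that a pipeline run produced the tables listed above is to compare each table's row count against the expected figure. The following is a minimal sketch, not the project's own verification code: the helper names `check_row_counts` and `demo_connection` are hypothetical, and the in-memory SQLite database is only a stand-in, since this page does not say which database engine the pipeline uses. Only the table names and row counts come from the table above.

```python
import sqlite3

# Expected row counts, copied from the schema table on this page.
EXPECTED_ROWS = {
    "naics_nodes": 2125,
    "naics_index_terms": 20398,
    "naics_cross_references": 4601,
    "naics_embeddings": 2125,
    "naics_relationships": 2125,
}


def check_row_counts(conn, expected=EXPECTED_ROWS):
    """Return {table: (actual, expected)} for every table whose count differs.

    `conn` is assumed to be a DB-API style connection (hypothetical helper;
    the real pipeline may expose a different interface).
    """
    mismatches = {}
    cur = conn.cursor()
    for table, want in expected.items():
        # Table names come from the fixed dict above, never from user input.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        got = cur.fetchone()[0]
        if got != want:
            mismatches[table] = (got, want)
    return mismatches


def demo_connection():
    """Assumption for illustration: an in-memory SQLite database with empty tables."""
    conn = sqlite3.connect(":memory:")
    for table in EXPECTED_ROWS:
        conn.execute(f"CREATE TABLE {table} (id INTEGER)")
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # Every table is empty in the demo, so all five are reported as mismatches.
    for table, (got, want) in check_row_counts(conn).items():
        print(f"{table}: found {got} rows, expected {want}")
```

Against a real database, replace `demo_connection()` with a connection to the file the pipeline writes; an empty result from `check_row_counts` means every table matches the counts listed above.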

Source Code

The original Marimo notebooks are available in the naics-mcp-server repository.