Data Model
The NAICS MCP Server uses DuckDB with six tables supporting semantic search, hierarchical navigation, and classification workflows.
naics_nodes
The core table containing all 2,125 NAICS 2022 codes with hierarchical references and embedding source text.
| Column | Type | Description |
|---|---|---|
node_code | varchar | Primary key. 2-6 digit NAICS code |
level | varchar | sector, subsector, industry_group, naics_industry, national_industry |
title | varchar | Official classification title |
description | text | Full description (nullable) |
sector_code | varchar | FK to parent sector (2-digit) |
subsector_code | varchar | FK to parent subsector (3-digit) |
industry_group_code | varchar | FK to parent industry group (4-digit) |
naics_industry_code | varchar | FK to parent NAICS industry (5-digit) |
raw_embedding_text | text | Concatenated text used for embedding generation |
change_indicator | varchar | Changes from NAICS 2017 version |
is_trilateral | boolean | True if code is common across US, Canada, Mexico |
naics_embeddings
384-dimensional vector embeddings for semantic similarity search, generated using sentence-transformers (all-MiniLM-L6-v2).
| Column | Type | Description |
|---|---|---|
node_code | varchar | Primary key, FK to naics_nodes |
embedding | float[384] | 384-dimensional vector |
embedding_text | varchar | Text that was embedded |
naics_index_terms
20,398 official index terms from the NAICS specification, enabling broad search coverage.
| Column | Type | Description |
|---|---|---|
term_id | integer | Primary key |
naics_code | varchar | FK to naics_nodes |
index_term | varchar | Official index term (e.g., “Pizza delivery services”) |
term_normalized | varchar | Lowercase version for case-insensitive search |
naics_cross_references
4,601 cross-references linking related codes with exclusions, see-also links, and includes.
| Column | Type | Description |
|---|---|---|
ref_id | integer | Primary key |
source_code | varchar | FK to naics_nodes (code containing reference) |
reference_type | varchar | excludes, see_also, includes |
reference_text | text | Original reference text |
target_code | varchar | FK to naics_nodes (nullable) |
excluded_activity | varchar | Specific activity that is excluded |
sic_naics_crosswalk
Mapping between legacy SIC codes and NAICS codes for migration support.
| Column | Type | Description |
|---|---|---|
sic_code | varchar | Standard Industrial Classification code |
naics_code | varchar | FK to naics_nodes |
relationship_type | varchar | direct (1:1), partial, or split |
classification_workbook
Session-based storage for classification decisions, supporting audit trails and iterative refinement.
| Column | Type | Description |
|---|---|---|
entry_id | varchar | Primary key |
form_type | varchar | Type of classification form |
label | varchar | Human-readable label |
content | json | Structured form content |
metadata | json | Additional metadata |
session_id | varchar | Session identifier |
parent_entry_id | varchar | FK to parent entry for threading |
tags | json | Categorization tags |
confidence_score | float | Classification confidence (0-1) |
created_at | timestamp | Entry creation time |
search_text | varchar | Full-text search field |