
The Classification System Every American Business Uses Was Designed Before the Internet

Every business in America has a NAICS code.

If you’ve ever registered for a government contract, applied for an SBA loan, filed a Census survey, or set up a state tax account, you’ve been asked the question: What is your NAICS code?

And if you’re like most people, you Googled it. You landed on census.gov or naics.com. You typed in what your business does. And the results made no sense.

A fintech startup building payment APIs gets told it might be 522320 - “Financial Transactions Processing, Reserve, and Clearinghouse Activities.” Or maybe 511210 - “Software Publishers.” Two completely different sectors. Two different SBA size standards. Two different eligibility thresholds for government contracts. Pick wrong, and it’s not just an administrative inconvenience - it’s a material business consequence.

Here’s the thing nobody says out loud: the system that classifies every business in the United States was designed in the mid-1990s, published as a printed reference manual, and operates on the same conceptual architecture today. The lookup tools haven’t meaningfully changed since they went online. The descriptions are written for statisticians, not for the businesses being classified. And the whole thing updates once every five years - if the government doesn’t shut down first.


How We Got Here

A little history, because it matters.

The Standard Industrial Classification system - the SIC - was created in the 1930s. Manufacturing dominated the American economy. The SIC gave us 10 divisions, 4-digit codes, and a framework that worked well enough for a world where most economic activity involved making physical things in factories.

By the early 1990s, the SIC was under serious criticism. The U.S. economy had fundamentally shifted toward services, information technology, and new forms of healthcare delivery. The SIC couldn’t handle any of it. A 1991 international conference on economic classification essentially concluded that the system was broken, and in 1992 the Office of Management and Budget stood up the Economic Classification Policy Committee with a mandate for a “fresh slate” examination.

The result was NAICS - the North American Industry Classification System. Launched in 1997 as a collaboration between the U.S., Canada, and Mexico. Twenty sectors instead of ten. Six-digit codes instead of four. Over 150 new service-sector industries that the SIC simply didn’t have categories for. It was a genuine improvement. Credit where it’s due.

But here’s what matters: NAICS was designed for statistical agencies. Its purpose is to group establishments for the collection, analysis, and publication of economic data. It was not designed to help businesses classify themselves. It was not designed for search. It was not designed for programmatic access. And it was certainly not designed for a world where an AI assistant might need to look up a NAICS code on your behalf.

The last major conceptual rethinking of the system was 1997. Twenty-eight years ago. The revisions since then - 2002, 2007, 2012, 2017, 2022 - have added and rearranged codes, but they haven’t touched the architecture. The 2027 revision is already delayed because a government shutdown disrupted the ECPC’s timeline.

The foundation hasn’t moved. The world has.


Five Problems, One Root Cause

The root cause is simple: NAICS is a hierarchical taxonomy designed for printed reference materials, being used in a world that runs on search, APIs, and language models. Every problem downstream traces back to this mismatch.

1. Production-Oriented Classification in a Product-Oriented World

NAICS groups businesses by how they produce, not what they produce. This is a deliberate design choice - the idea is that establishments using similar raw material inputs, similar capital equipment, and similar labor should be classified together.

In practice, this creates some genuinely bizarre groupings.

Software publishers sit in the same subsector as magazine and newspaper publishers. Why? Because they all “issue copies of works for sale to the general public.” In 1997, when software came on CD-ROMs and was physically distributed like a magazine, this almost made sense. In 2025, a SaaS company building AI infrastructure has nothing in common with a newspaper - not the business model, not the customer, not the technology stack, not the labor, and frankly not the capital equipment either.

This isn’t an edge case. The production-oriented lens systematically struggles with the modern economy because the modern economy is defined by convergence. A company like Peloton is simultaneously a hardware manufacturer, a software platform, a content studio, and a fitness service. NAICS wants to pick one. The taxonomy forces multi-dimensional businesses into single-path hierarchies, and the result is a classification that feels arbitrary to anyone who actually runs one of these businesses.

You’re classifying a business, which is inherently fuzzier than classifying a physical object, and the production-oriented framework makes it fuzzier still.

2. The Five-Year Update Cycle Is Geologically Slow

NAICS revises in years ending in 2 or 7. That’s it. If a new industry emerges in 2023, the earliest it can get its own code is 2027 - and only if someone submits a comment to the Federal Register during the public comment period, the ECPC accepts it, Canada and Mexico agree (since the first five digits are harmonized across all three countries), and nothing delays the revision process.

Think about what’s emerged or fundamentally transformed since the last revision in 2022: generative AI companies, the commercial drone industry, EV charging networks, psychedelic-assisted therapy clinics, carbon credit marketplaces, creator economy platforms. All of these are either shoehorned into codes that don’t describe them or dumped into catch-all categories that lump them with unrelated businesses.

This isn’t just an aesthetic problem. NAICS codes drive economic statistics. If emerging sectors don’t have their own codes, they don’t show up in the data. If they don’t show up in the data, policymakers can’t measure them. If policymakers can’t measure them, they can’t make informed decisions about regulation, funding, or support. The measurement infrastructure shapes what gets seen, and what gets seen shapes what gets done.

Every SaaS company on earth updates its product taxonomy quarterly. NAICS moves at geological speed.

3. Self-Assignment and the “Which Code Am I?” Problem

Here’s something most people don’t realize: businesses self-report their NAICS code. There is no central authority that assigns, audits, or corrects NAICS codes across the government. The Census Bureau assigns one code per establishment based on survey responses. Other agencies maintain their own lists. A single business can have different NAICS codes across different government systems.

This matters enormously for government contracting. In the System for Award Management (SAM), your NAICS code determines your SBA size standard - which determines whether you qualify as a “small business” for set-aside contracts. Wrong code, wrong size standard, wrong eligibility. The GAO has flagged inconsistencies in how contracting officers assign NAICS codes to contracts, and successful appeals remain infrequent because the system makes it hard to challenge.

And what tools do businesses have to get it right? Keyword search. Literally keyword search over bureaucratic descriptions. You type a word, you get a flat list of codes that contain that word somewhere in the title or description. No semantic understanding. No guided disambiguation. No “did you mean this or that?” No explanation of what’s included versus excluded. You get a list of possibilities and a strong sense that you’re guessing.

The Census Bureau’s own lookup tool is a keyword match over a flat file. The third-party tools on naics.com are marginally better but fundamentally the same. In 2025, with embedding models that cost fractions of a cent per query and can handle semantic similarity with high accuracy, we’re still doing exact-match keyword search over government prose. It’s indefensible.

4. The SIC-to-NAICS Legacy Debt

NAICS replaced the SIC in 1997. Twenty-eight years later, the SIC still isn’t dead.

The SEC still uses SIC codes. Many private datasets were built on SIC and don’t cleanly map to NAICS - the crosswalk tables involve many-to-many relationships that can’t be resolved algorithmically without losing information. Historical time series break at the boundary: pre-1997 data is on SIC, post-1997 is on NAICS, and comparing across that line requires concordance tables that are incomplete and sometimes contradictory.

Academic researchers doing longitudinal studies - say, tracking the evolution of manufacturing employment over 40 years - have to navigate this crosswalk, and the results are sensitive to which concordance methodology they choose. A Dartmouth study found that the SIC-to-NAICS transition reclassified a significant share of economic activity not just into different industries, but into entirely different sectors. Manufacturing employment looked different overnight - not because the economy changed, but because the categories did.

This is infrastructure debt. The old system doesn’t die; it just accumulates workarounds. And every workaround adds friction, introduces error, and makes the whole ecosystem harder to build on.

5. Descriptions Written for Statisticians, Not Searchers

Pull up any NAICS code and read the description. It’s written in dense, legalistic, statistical-agency prose. Terms of art that don’t match how businesses describe themselves. Inclusions and exclusions buried in paragraphs that require careful reading to parse. Cross-references to other codes that you have to look up separately.

A cloud hosting company might describe itself as a “cloud infrastructure provider.” The relevant NAICS code talks about “data processing, hosting, and related services.” These are semantically similar - a human reading both would connect them - but keyword search won’t, because the words don’t match.

This is a textbook case of a system designed for one purpose being pressed into service for another. Classification descriptions written for statistical tabulation by trained government analysts are being used for self-classification by business owners using a search box. The descriptions aren’t wrong - they’re precise and carefully defined. They’re just not written for the way people actually look for information in 2025.

No existing NAICS lookup tool uses semantic search. No tool uses embeddings. No tool offers guided classification with disambiguating questions. The state of the art in NAICS lookup is, functionally, a CTRL+F over a PDF.


The Problem Nobody in NLP Wants to Talk About

Here’s where this gets technically interesting - and where the people building “AI-powered classification” tools are either being dishonest or haven’t done the work.

The NAICS is riddled with exclusionary language. Open any page of the manual and you’ll find it: “Commercial and Industrial Machinery and Equipment (except Automotive and Electronic) Repair and Maintenance.” “Supermarkets and Other Grocery (except Convenience) Stores.” “Research and Development in Biotechnology (except Nanobiotechnology).” “Metal Coating, Engraving (except Jewelry and Silverware), and Allied Services to Manufacturers.” “Other Services (except Public Administration).”

And this isn’t limited to the NAICS. Government classification systems in general - tariff schedules, occupation codes, product classifications - are built on exclusionary logic at a foundational level. “Other than” and “except” appear thousands of times across these corpora. Entire classification hierarchies are defined by what they’re not. The correct classification often depends not on what something is, but on confirming it isn’t something else.

Here’s why this matters for anyone building search over these corpora: embeddings don’t understand negation.

This isn’t a minor technical limitation. It’s a well-documented failure mode in the NLP literature. When you embed the sentence “Metal Coating, Engraving (except Jewelry and Silverware),” the embedding model doesn’t suppress “Jewelry and Silverware” from the vector representation. It amplifies it. The words “Jewelry” and “Silverware” contribute positively to the semantic representation - they pull the vector toward the jewelry/silverware region of the embedding space. The “except” is functionally invisible.

Recent research has quantified this. A 2025 paper studying negation awareness across state-of-the-art universal text embeddings found that most models score negated sentence pairs with similarity above 0.9 - meaning the models treat “The treatment improved patient outcomes” and “The treatment did not improve patient outcomes” as nearly identical. One researcher described searching for “laptops without touch screens” and getting results exclusively about touch-screen laptops. The embedding model completely missed the negation.

Now apply this to classification. Someone searches for “jewelry engraving services.” Naive vector search will return NAICS 332812 - “Metal Coating, Engraving (except Jewelry and Silverware)” - as a high-confidence match. The embedding sees “engraving,” sees “jewelry,” sees “silverware,” computes a high cosine similarity, and recommends the one code that explicitly excludes what the user is looking for. The correct answer is somewhere else entirely, but the exclusionary language in the description actively sabotages the search.
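You can make the failure concrete without downloading an embedding model. Here’s a toy stand-in - cosine similarity over bag-of-words counts - which shares the relevant blind spot: surface-term overlap with no notion of negation. The strings are paraphrases of the example above, not real index entries.

```python
from collections import Counter
from math import sqrt

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over lowercase word counts. A crude proxy for
    semantic similarity - but real embeddings fail the same way on
    'except', because excluded words still contribute positively."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

query = "jewelry engraving services"
excluding_code = "metal coating engraving except jewelry and silverware"
unrelated_code = "supermarkets and other grocery stores"

# The code that explicitly EXCLUDES jewelry outranks an unrelated code,
# because "jewelry" and "engraving" overlap with the query.
print(bow_cosine(query, excluding_code) > bow_cosine(query, unrelated_code))  # True
```

The “except” contributes nothing to the score; the excluded nouns contribute everything. Swap in a transformer embedding and the numbers change, but the ordering - the part that matters for retrieval - often doesn’t.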

This problem compounds at scale. Across the full NAICS corpus, hundreds of codes contain exclusionary terms. Every one of those exclusions is a potential false positive in vector search - a code that looks semantically close but is definitionally wrong.

And this is where most “AI classification” tools stop. They embed the descriptions, stand up a vector index, wrap a chat interface around it, and call it a product. The demo looks great. The benchmarks look fine. But in production, on the edge cases that actually matter - the ambiguous classifications, the codes defined by what they exclude, the queries where getting it wrong means the wrong SBA size standard or the wrong contract eligibility - the embeddings silently fail in exactly the cases where accuracy matters most.

Solving this requires more than better embeddings. You need to decompose the descriptions - separate the inclusive terms from the exclusive terms, embed them independently, and build scoring logic that penalizes matches on excluded concepts rather than rewarding them. You need to treat “except” not as a filler word but as a logical operator that inverts the relevance of everything that follows it. This is hard. It’s unglamorous. It requires actually reading the descriptions and understanding the domain. But it’s the difference between a demo and a tool.
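A minimal sketch of that decomposition, using only the parenthesized “(except …)” pattern (real descriptions also carry exclusions in prose and in cross-references, which need their own handling; the penalty weight here is illustrative, not tuned):

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def split_exceptions(description: str) -> tuple[str, str]:
    """Separate a NAICS-style description into its inclusive text and
    the text inside '(except ...)' clauses."""
    exclusive = " ".join(re.findall(r"\(except ([^)]*)\)", description, re.I))
    inclusive = re.sub(r"\(except [^)]*\)", " ", description, flags=re.I)
    return inclusive, exclusive

def exclusion_aware_score(query: str, description: str,
                          penalty: float = 2.0) -> float:
    """Overlap with inclusive terms, minus a penalty for overlap with
    excluded terms: 'except' acts as a sign flip, not a stopword."""
    inclusive, exclusive = split_exceptions(description)
    q = tokens(query)
    return len(q & tokens(inclusive)) - penalty * len(q & tokens(exclusive))

desc = "Metal Coating, Engraving (except Jewelry and Silverware), and Allied Services"
# "jewelry engraving services" matches on "engraving"/"services" but is
# penalized twice over for "jewelry" - the net score collapses to zero.
print(exclusion_aware_score("jewelry engraving services", desc))  # 0.0
```

The same idea carries over to vectors: embed the inclusive and exclusive spans separately, then subtract (rather than add) similarity to the exclusive embedding when ranking.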


What Modern Looks Like

So what does it look like to actually solve this? Here’s what I’m building.

The Data

NAICS 2022 has 20 sectors, roughly 100 subsectors, 300 industry groups, 700 five-digit industries, and about 1,000 six-digit national industries. That’s a manageable corpus - which means fast indexing, fast search, and room to enrich.

The raw source data comes from the Census Bureau’s reference files: Excel spreadsheets and a 900-page PDF manual. The first step is parsing these into a structured hierarchy with clean parent-child relationships, full descriptions, scope notes, and cross-references. Nothing exotic - just the unglamorous data engineering work of turning government publications into a usable schema.
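One convenient property of that schema work: the NAICS hierarchy is encoded in the code digits themselves, so parent-child relationships mostly fall out of prefixes. A sketch, with the caveat that a few sectors are ranges (“31-33” Manufacturing, “44-45” Retail Trade, “48-49” Transportation and Warehousing) and need a small lookup table rather than pure prefix logic:

```python
def ancestors(code: str) -> list[str]:
    """Return the parent chain of a NAICS code, nearest parent first.
    E.g. a six-digit national industry rolls up through its 5-, 4-, 3-,
    and 2-digit prefixes. Range sectors (31-33, 44-45, 48-49) are the
    exception and need explicit mapping."""
    return [code[:n] for n in range(len(code) - 1, 1, -1)]

print(ancestors("541511"))  # ['54151', '5415', '541', '54']
```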

The enrichment step is where it gets interesting. The official descriptions are precise but narrow. Augmenting them with common business language, concrete examples, and explicit exclusion notes transforms the corpus from something designed for statisticians into something designed for search. “Data processing, hosting, and related services” becomes a document that also contains “cloud hosting,” “AWS competitor,” “managed infrastructure,” and “IaaS/PaaS.” The official definition doesn’t change - you just give the embedding model more surface area to work with.
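As a sketch of what an enriched record might look like - the class name, code, and synonym list below are illustrative, and the official text stays untouched:

```python
from dataclasses import dataclass, field

@dataclass
class EnrichedCode:
    """One NAICS entry plus search-oriented enrichment (hypothetical
    schema). Enrichment fields exist only for retrieval; the official
    description is never altered."""
    code: str
    title: str
    official_description: str
    common_terms: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)

    def embedding_document(self) -> str:
        """Concatenate official text with enrichment so the embedding
        model has more surface area to match real-world phrasing."""
        return " | ".join([self.title, self.official_description,
                           *self.common_terms, *self.examples])

hosting = EnrichedCode(
    code="518210",
    title="Data Processing, Hosting, and Related Services",
    official_description=("Establishments providing infrastructure for "
                          "hosting or data processing services."),
    common_terms=["cloud hosting", "managed infrastructure", "IaaS", "PaaS"],
)
print("cloud hosting" in hosting.embedding_document())  # True
```

Now a query phrased as “cloud infrastructure provider” has plain-language terms to land on, not just statistical-agency prose.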

The Search Architecture

The search stack is straightforward. Sentence-transformers for embeddings - all-MiniLM-L6-v2 runs locally, generates 384-dimensional vectors, and handles the semantic similarity with embarrassing ease. DuckDB with an HNSW index for vector storage and retrieval. Hybrid scoring that combines vector similarity with keyword matching and hierarchical context.

On a corpus this size, the whole thing runs in under 50 milliseconds per query. Embed the query, hit the index, score the candidates, rank the results. Done before the user finishes reading the loading spinner - if you even need one.
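The hybrid scoring step can be sketched as a weighted blend - the weights and the sector-context signal below are illustrative, and in practice would be tuned against labeled query/code pairs:

```python
def hybrid_score(vector_sim: float, keyword_overlap: float,
                 same_sector_as_top_hit: bool,
                 w_vec: float = 0.6, w_kw: float = 0.3,
                 w_ctx: float = 0.1) -> float:
    """Blend vector similarity, keyword overlap, and one bit of
    hierarchical context into a single ranking score. All three weights
    are placeholders, not tuned values."""
    return (w_vec * vector_sim
            + w_kw * keyword_overlap
            + w_ctx * (1.0 if same_sector_as_top_hit else 0.0))

# A candidate with a strong semantic match but weak keyword overlap can
# still outrank a keyword-only match.
print(hybrid_score(0.85, 0.2, True) > hybrid_score(0.40, 0.9, False))  # True
```

The keyword term keeps exact code titles and rare terms of art from being washed out by the embedding; the context term nudges results toward coherent neighborhoods of the tree.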

The real performance differentiator isn’t speed, though. It’s relevance. When someone types “I run a company that builds AI tools for lawyers,” keyword search gives you noise. Semantic search gives you 511210 (Software Publishers), 541511 (Custom Computer Programming Services), and 541199 (All Other Legal Services) - ranked by similarity, with explanations for why each was suggested and what’s included versus excluded. That’s not a marginal improvement over keyword search. It’s a categorically different experience.

The Interface Problem

The most important thing I’m building isn’t the search engine. It’s the interface.

Current NAICS tools give you a list of codes and say “figure it out.” That’s fine if you’re a Census Bureau analyst who’s been classifying establishments for 20 years. It’s useless if you’re a startup founder filling out a SAM registration for the first time.

The better approach is guided classification. Ask the user what their business does in plain language. Run the semantic search. Present the top candidates with confidence scores. Then ask disambiguating questions: “Does your company primarily develop software for resale, or do you build custom solutions for specific clients?” The answer moves you from 511210 to 541511 - a meaningful distinction with real implications for size standards and contract eligibility.
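The disambiguation step reduces to a simple structure: a plain-language question whose answers map to different candidate codes. A hypothetical sketch (the `QUESTION` dict and answer keys are invented for illustration):

```python
# One disambiguation step: a question that splits two candidate codes.
QUESTION = {
    "text": ("Does your company primarily develop software for resale, "
             "or build custom solutions for specific clients?"),
    "answers": {
        "resale": "511210",  # Software Publishers
        "custom": "541511",  # Custom Computer Programming Services
    },
}

def resolve(question: dict, answer: str) -> str:
    """Map the user's answer to the NAICS code it implies."""
    return question["answers"][answer]

print(resolve(QUESTION, "custom"))  # 541511
```

In a real flow these questions would be chosen dynamically - pick whichever question best separates the current top candidates, ask it, re-rank, repeat until one code dominates.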

Layer on an interactive hierarchy visualization - a graph that lets users traverse the NAICS tree - and they can see where their business sits in the broader taxonomy, what’s above them, what’s adjacent, and what the system thinks is similar. This turns classification from a guessing game into an informed decision.

And document the decision. A classification workbook that records the query, the candidates considered, the disambiguating questions answered, the final selection, and the confidence score. An audit trail. Because when someone challenges your NAICS code - and in government contracting, they will - you want receipts.

MCP Integration

Building this as an MCP server means AI assistants can classify businesses programmatically.

The use case writes itself. A business owner asks Claude or ChatGPT: “I’m starting a company that manufactures custom orthopedic insoles using 3D printing. What NAICS code should I use?” The LLM calls the classification tool, gets ranked candidates with similarity scores and scope notes, and presents them conversationally - with enough context for the user to make an informed choice.
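For illustration, the tool declaration such a server might expose could look like the following - the tool name `classify_business` and its parameters are hypothetical, but the shape follows MCP’s JSON Schema-based tool definitions:

```json
{
  "name": "classify_business",
  "description": "Return ranked NAICS candidates for a plain-language business description.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "description": {
        "type": "string",
        "description": "What the business does, in the owner's own words"
      },
      "top_k": {
        "type": "integer",
        "description": "Number of ranked candidates to return",
        "default": 5
      }
    },
    "required": ["description"]
  }
}
```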

This is the future of classification. Not better forms. Not redesigned search pages. Conversational AI backed by purpose-built semantic search infrastructure. The user never needs to know what an embedding is or how HNSW indexing works. They just ask a question and get a good answer.


Who Cares

You should care if you’re in any of these categories:

Small business owners. You’ve been guessing your NAICS code. You’ve been picking the one that seems closest and hoping nobody audits it. Better search and guided classification means you can get a confident, explainable answer in seconds - and actually understand why it’s right.

Government contractors. Your NAICS code determines your SBA size standard, which determines your eligibility for small business set-asides. If you’re classified under a code with a $30 million revenue threshold instead of a $47.5 million threshold, that’s not a bureaucratic nuance - that’s money on the table. Getting the right code matters, and the current tools don’t help you get there.

Economic researchers and data scientists. If you’re building on NAICS data, you need to understand where the taxonomy breaks. The production-oriented classification, the five-year lag, the self-assignment noise - these are systemic biases in your data. A tool that makes the taxonomy’s structure and limitations visible helps you account for them.

Policymakers. The NAICS is the lens through which we measure the American economy. If the lens is distorted - if emerging sectors are undercounted because they don’t have codes, if businesses are misclassified because the lookup tools are inadequate - then the measurements are wrong and the policies built on those measurements are built on sand.

Contracting officers. You’re assigning NAICS codes to contracts and you’re probably not getting training on how to do it well. Better tooling isn’t just a convenience - it’s a risk mitigation.


The Bigger Picture

Here’s where I zoom out.

The NAICS classifies businesses. The SOC classifies occupations. NAPCS classifies products. The Harmonized System classifies commodities internationally. SITC classifies trade transactions. ICD classifies diseases. DSM classifies mental disorders.

All of these systems share the same fundamental design pattern: hierarchical taxonomies, bureaucratic prose, keyword-based lookup, slow update cycles. All of them were designed for a world of printed reference manuals and trained human classifiers. None of them were designed for semantic search, vector embeddings, LLM-based classification, or modern data infrastructure.

The private sector solved analogous problems a long time ago. Amazon has a product taxonomy that updates continuously and uses ML-driven classification. LinkedIn classifies jobs and skills with embeddings. Every social media platform on earth has a content categorization system that would make the Census Bureau weep.

Government classification systems don’t need to be rebuilt from scratch. The taxonomies themselves are carefully designed, rigorously maintained, and serve important purposes. What they need is a modern search and interaction layer - something that takes the existing structure and makes it accessible to the tools and workflows people actually use today.

That’s what open-source infrastructure provides. Transparent, fast, free. No black boxes. No enterprise pricing for commodity technology. No vendor lock-in. Just better tools for working with the classification systems we already have.

I’m building this for the NAICS. Someone should build it for the SOC, and NAPCS, and every other government classification system that’s stuck in 2003.

These systems are infrastructure. Infrastructure should be open. Infrastructure should be fast. And infrastructure should work with the tools people actually use - not the tools people used when the system was designed.


What’s Next

The NAICS search tool and MCP server are in active development. If you want to follow along or contribute, the repos will be public.

If you work with NAICS data and want to help build better tools, let’s connect. If you’re involved in the 2027 NAICS revision - especially if you’re at the Census Bureau or on the ECPC - I’d love to talk about how modern search infrastructure could complement the update process. The taxonomy is yours. The tooling can be ours.

The tools are there. Let’s build them.


Fahad Baig