AI Infrastructure·March 15, 2026·4 min read

Engineering RAG: Sourcing Parts with Precision

How VectorData.solutions ingests BOMs and datasheets into agentic RAG for accurate parts lookup. Architecture details, embedding choices, and real-world production lessons.

RAG · ChromaDB · Engineering · Parts Sourcing · FastAPI · B2B

The Document Search Struggle

Engineering firms generate massive quantities of structured and semi-structured data: bills of materials, component datasheets, spec sheets, RFQs, and compliance paperwork. Traditionally, finding the specific part or spec for a requirement relies on an employee with 15 years of experience remembering which PDF to open.

That reliance on tribal knowledge is exactly where RAG excels. Instead of scanning filenames, we embed the actual content and retrieve results based on meaning.

VectorData.solutions is the B2B platform we built to address this. Engineering teams upload their specs and BOMs; our system ingests, chunks, and embeds the data to make everything queryable through an agentic interface that understands part relationships.

The Architecture

PDF/Excel/CSV upload → Document parsing → Chunking strategy
                                         → Embedding (per chunk)
                                         → ChromaDB storage
                                         → Metadata extraction (part numbers, categories, tolerances)

User query → Intent detection → Retrieval (semantic + metadata filter)
           → Re-ranking → Answer generation with source attribution

The agentic layer operates above this stack. Rather than returning raw text chunks, the agent interprets the query to decide whether to search by exact part number, by specification semantics, or by category filters, then assembles a response with sourcing recommendations.
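As a rough sketch of that routing decision, here is how the first step might look. The regex, category set, and mode names are illustrative assumptions for this post, not our production rules:

```python
from dataclasses import dataclass, field
import re

# Illustrative pattern: 2-4 uppercase letters, optional hyphen, 3-6 digits.
PART_NUMBER_RE = re.compile(r"\b[A-Z]{2,4}-?\d{3,6}[A-Z]?\b")

@dataclass
class QueryPlan:
    mode: str                       # "part_number" | "category" | "semantic"
    part_numbers: list = field(default_factory=list)

def plan_query(query: str, known_categories: set) -> QueryPlan:
    """Decide how to retrieve: exact part number, category filter, or semantics."""
    hits = PART_NUMBER_RE.findall(query)
    if hits:
        return QueryPlan(mode="part_number", part_numbers=hits)
    tokens = {t.lower() for t in query.split()}
    if tokens & known_categories:
        return QueryPlan(mode="category")
    return QueryPlan(mode="semantic")
```

In production the agent makes this decision with more context (conversation history, uploaded BOMs), but the branching structure is the same: exact identifiers short-circuit semantic search.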

Chunking Strategy for Technical Documents

Generic chunking, like splitting every 500 tokens, destroys the integrity of technical documents. A tolerance specification split across two chunks loses its meaning entirely in both.

What works for engineering docs:

Section-aware chunking. We parse the document structure first (headers, tables, figure captions), then chunk within those sections. A "Materials" section stays intact even if it exceeds 800 tokens.

Table preservation. Engineering BOMs are essentially tables. Splitting a table row from its header renders both useless. We detect tables, serialize them as structured text (including column headers in every chunk), and embed the whole table as one unit if it fits.
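When a table is too large to embed as one unit, we fall back to row groups that each repeat the header. A minimal sketch of that serialization (the pipe-delimited format and row limit are arbitrary choices for illustration):

```python
def serialize_table_rows(headers, rows, max_rows_per_chunk=20):
    """Serialize a BOM table so every chunk repeats the column headers."""
    chunks = []
    for start in range(0, len(rows), max_rows_per_chunk):
        lines = [" | ".join(headers)]  # header travels with every chunk
        for row in rows[start:start + max_rows_per_chunk]:
            lines.append(" | ".join(str(cell) for cell in row))
        chunks.append("\n".join(lines))
    return chunks
```

Each chunk is now independently interpretable: a retrieved row group still says which column holds the quantity and which holds the material grade.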

Part number anchoring. Every chunk mentioning a part number gets that identifier added to its metadata. This enables hybrid retrieval: semantic search for "high-temperature resistant O-ring" filtered by category, then cross-referenced against exact part number matches.
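The anchoring step itself is simple. A sketch of building that metadata, assuming a hyphenated part-number convention (real extraction rules vary by manufacturer) and a vector store that filters on scalar fields:

```python
import re

# Illustrative convention: letters, hyphen, digits (e.g. "AB-1234").
PART_RE = re.compile(r"\b[A-Z]{2,4}-\d{3,6}\b")

def chunk_metadata(chunk_text: str, category: str) -> dict:
    """Attach part numbers found in the chunk as filterable metadata."""
    parts = sorted(set(PART_RE.findall(chunk_text)))
    return {
        "category": category,
        # Flat string because many vector DBs only filter on scalar values.
        "part_numbers": ",".join(parts),
        "has_part_number": bool(parts),
    }
```

At query time, the semantic search runs with a `category` filter, and any exact part-number hits are checked against the `part_numbers` field.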

Embedding Selection

We tested three approaches:

OpenAI ada-002 — Delivers solid general performance but struggles with alphanumeric part numbers and engineering abbreviations. Terms like "DN50 PN16 flanged gate valve" and "2-inch 150-class flanged gate valve" should be close in vector space, yet general-purpose embeddings fail to place them near enough.

Domain-tuned sentence-transformers — Better on technical jargon after fine-tuning on a corpus of engineering documents. However, this requires maintaining the model and running inference locally.

Hybrid approach (what we use) — General-purpose embeddings handle semantic content, combined with exact-match indexes on part numbers, manufacturer codes, and standardized specifications (ASTM, ISO numbers). The retrieval pipeline queries both systems and merges the results.
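The merge step can be as simple as deduplicating with exact matches ranked first. A minimal sketch (the result format and ordering policy are assumptions for illustration):

```python
def merge_results(semantic_hits, exact_hits, k=5):
    """Merge (doc_id, score) lists; exact matches always rank ahead of semantic ones."""
    seen, merged = set(), []
    for doc_id, score in exact_hits + semantic_hits:
        if doc_id not in seen:      # first occurrence wins, so exact hits dominate
            seen.add(doc_id)
            merged.append((doc_id, score))
    return merged[:k]
```

An exact part-number hit is near-certain relevance, so it never gets outranked by a merely similar chunk.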

The Re-ranking Challenge

First-pass retrieval by cosine similarity returns relevant chunks, but the ranking isn't production-ready for high-stakes part sourcing. A slight error in part selection can lead to a failed inspection.

We added a re-ranking step that verifies:

  1. Specification match — Do the retrieved parts meet the dimensional, material, and tolerance requirements in the query?
  2. Compliance alignment — If the query mentions a standard (ASME B16.5, API 6A), do the results comply?
  3. Availability signal — Parts appearing in recent BOMs from the same firm receive a boost, as they are likely in stock or on approved vendor lists.

This re-ranking involves an LLM call, which adds latency. For interactive queries, we display initial results immediately and refine them in the background.

Production Pitfalls

PDF parsing is the hardest hurdle. It is not the AI or the embeddings that struggle, but extracting clean text from engineering PDFs. Scanned documents require OCR. CAD drawings embedded in PDFs need special handling. Multi-column layouts break naive parsers. We spent more time on document ingestion than on the entire RAG pipeline.

Version control is critical. Engineering specs get revised. Rev B supersedes Rev A. If both reside in the index, the agent might source from an obsolete spec. We tag documents with revision metadata and default to the latest version unless the query specifies otherwise.
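The default-to-latest filter is a small piece of logic. A sketch, assuming single-letter revision tags where lexicographic order matches revision order (multi-character schemes like "AA" would need a proper comparator):

```python
def latest_revisions(docs):
    """Keep only the newest revision of each document (Rev A < Rev B < ...)."""
    best = {}
    for doc in docs:
        key = doc["doc_id"]
        if key not in best or doc["rev"] > best[key]["rev"]:
            best[key] = doc
    return list(best.values())
```

This runs as a post-retrieval filter unless the query explicitly names a revision, in which case that revision is pinned instead.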

Confidence thresholds block bad recommendations. If semantic similarity drops below a threshold, the agent states "I found related but not exact matches" rather than guessing. In parts sourcing, a confident wrong answer is far worse than no answer.
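The gate itself looks roughly like this. The threshold value and response strings here are placeholders, not our production copy:

```python
def gated_answer(matches, threshold=0.75):
    """Refuse to recommend a part when the best similarity is below threshold."""
    if not matches:
        return "No matching parts found."
    best_id, best_score = max(matches, key=lambda m: m[1])
    if best_score < threshold:
        return "I found related but not exact matches; please verify manually."
    return f"Recommended part: {best_id}"
```

The key design choice is that the threshold sits in the answer layer, not the retrieval layer, so the near-matches are still available for the engineer to inspect.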

Results

For firms using the system:

  • Part lookup time dropped from 15-30 minutes of manual searching to under 60 seconds
  • Cross-reference accuracy (correct part for a given spec requirement) exceeded 92% on validated test sets
  • Document ingestion handles PDFs, Excel BOMs, and CSV parts lists with automatic format detection

The current bottleneck isn't the AI — it is getting firms to digitize and upload their document libraries in the first place.

Stack

  • FastAPI — API layer
  • ChromaDB — Vector storage
  • Python — Document parsing, embedding pipeline
  • LLM (multi-provider) — Agentic query handling, re-ranking
  • Docker — Deployment

VectorData.solutions is a team project. Learn more about our approach at vectordata.solutions.