Full-Text Search with TF-IDF Ranking

OxiDB includes a built-in full-text search engine with TF-IDF ranking. Because search is part of the database, you do not need to run a separate search service such as Elasticsearch.

Creating a Text Index

Specify which fields to index for text search:

from oxidb import OxiDbClient
db = OxiDbClient("127.0.0.1", 4444)

# Create a text index on the title and body fields
db.create_text_index("articles", ["title", "body"])

# Insert some articles
db.insert("articles", {
    "title": "Introduction to Rust Programming",
    "body": "Rust is a systems programming language focused on safety and performance..."
})
db.insert("articles", {
    "title": "Building Web Services in Go",
    "body": "Go excels at building concurrent web services with its goroutine model..."
})

Searching

Search with natural language queries. Results are ranked by TF-IDF relevance:

# Basic text search
results = db.text_search("articles", "rust programming", limit=10)

for doc in results:
    print(f"{doc['title']} (score: {doc.get('_score', 0):.2f})")

# Example output (scores are illustrative):
# Introduction to Rust Programming (score: 0.85)
# Building Web Services in Go (score: 0.12)

How TF-IDF Works

TF-IDF (Term Frequency-Inverse Document Frequency) ranks results using two factors:

TF (Term Frequency) — how often the search term appears in a document. More occurrences = higher relevance.
IDF (Inverse Document Frequency) — how rare the term is across all documents. Rare terms are weighted higher than common ones.

The score is TF × IDF. A document that uses a rare, specific term frequently will rank highest.
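
To make the TF × IDF product concrete, here is a minimal sketch of the textbook calculation in Python. It is not OxiDB's implementation: the whitespace tokenizer, the length-normalized TF, the unsmoothed log(N / df) form of IDF, and the choice to sum scores across query terms are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (real engines also strip punctuation)."""
    return text.lower().split()


def tf_idf_scores(query, docs):
    """Return one score per document: the sum of TF x IDF over the query terms."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term absent from this document contributes nothing
            tf = counts[term] / len(tokens)     # share of the document's tokens
            idf = math.log(n_docs / df[term])   # rarer across documents = larger
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "rust is a systems programming language",
    "go is a language for web services",
    "rust rust rust everywhere",
]
scores = tf_idf_scores("rust language", docs)

# Print documents from most to least relevant
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

With this formula, a term that occurs in every document gets an IDF of log(1) = 0 and so contributes nothing to the ranking, which is the "common terms are weighted lower" behavior described above taken to its limit.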

Document Parsing

OxiDB's FTS engine can extract and index text from various document formats stored in blob storage:

Format	Support
Plain text	Built-in
HTML / XML	Built-in (strips tags, extracts text)
JSON	Built-in (extracts string values)
PDF	Built-in
DOCX (Word)	Built-in
XLSX (Excel)	Built-in
Images (OCR)	Requires `ocr` feature flag
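
As a rough illustration of what "strips tags" and "extracts string values" mean for the HTML and JSON rows, the sketch below does both with the Python standard library. It is only an approximation of the idea, not OxiDB's parser: the helper names are hypothetical, and details such as how script or style content, JSON keys, or nested structures are treated by the real engine are not stated here.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text found between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Note: this simple version also keeps <script>/<style> contents
        if data.strip():
            self.parts.append(data.strip())


def text_from_html(markup):
    """Return the visible text of an HTML/XML fragment as one string."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)


def strings_from_json(raw):
    """Collect every string value (not keys, not numbers) depth-first."""
    found = []

    def walk(node):
        if isinstance(node, str):
            found.append(node)
        elif isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(raw))
    return found


html_text = text_from_html("<h1>Rust</h1><p>Safe and <b>fast</b></p>")
json_strings = strings_from_json(
    '{"title": "Rust", "tags": ["systems", "safety"], "year": 2015}'
)
print(html_text)
print(json_strings)
```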

Background Indexing

Full-text indexing runs in a background worker thread that receives indexing jobs over a bounded channel (sync_channel(256), which buffers up to 256 pending jobs). This means:

Insert/update operations return without waiting for text indexing to finish
Text indexing happens asynchronously, so a newly written document may not appear in search results until its job has been processed
The index is persisted as _fts/index.json in the data directory
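
The worker-plus-bounded-queue pattern described above can be sketched in Python with queue.Queue(maxsize=256) standing in for the channel. This is an analogy for the design, not OxiDB's code: the BackgroundIndexer class, its method names, and the toy inverted index are hypothetical, and what OxiDB does when the channel is full is not specified here. In this sketch the writer simply blocks until a slot frees up.

```python
import queue
import threading


class BackgroundIndexer:
    """Toy inverted index fed by a worker thread through a bounded queue."""

    def __init__(self, capacity=256):
        # Bounded queue: at most `capacity` jobs can be waiting at once
        self.jobs = queue.Queue(maxsize=capacity)
        self.index = {}  # term -> set of document ids
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, doc_id, text):
        """Enqueue an indexing job; blocks only while the queue is full."""
        self.jobs.put((doc_id, text))

    def _run(self):
        # Worker loop: take one job at a time and update the index
        while True:
            doc_id, text = self.jobs.get()
            with self._lock:
                for term in text.lower().split():
                    self.index.setdefault(term, set()).add(doc_id)
            self.jobs.task_done()

    def wait_until_idle(self):
        """Block until every queued job has been indexed."""
        self.jobs.join()

    def lookup(self, term):
        """Return the ids of documents containing the term."""
        with self._lock:
            return set(self.index.get(term.lower(), ()))


indexer = BackgroundIndexer()
indexer.submit(1, "Rust is a systems programming language")
indexer.submit(2, "Go excels at building web services")

# Without this wait, a lookup could run before the worker has caught up
indexer.wait_until_idle()
print(indexer.lookup("rust"))
```

The explicit wait_until_idle call is there to show the trade-off of asynchronous indexing: writes stay fast, but a search issued right after a write may not yet see the new document.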

For document management applications that need to search across PDFs, Word documents, and scanned images (with the `ocr` feature flag enabled), OxiDB's built-in FTS removes the need for separate search infrastructure.