OxiDB includes a built-in full-text search (FTS) engine with TF-IDF ranking. Search is part of the database, so there is no need to run a separate search service such as Elasticsearch.
Creating a Text Index
Specify which fields to index for text search:
from oxidb import OxiDbClient
db = OxiDbClient("127.0.0.1", 4444)
# Create a text index on the title and body fields
db.create_text_index("articles", ["title", "body"])
# Insert some articles
db.insert("articles", {
    "title": "Introduction to Rust Programming",
    "body": "Rust is a systems programming language focused on safety and performance..."
})
db.insert("articles", {
    "title": "Building Web Services in Go",
    "body": "Go excels at building concurrent web services with its goroutine model..."
})
Searching
Search with natural language queries. Results are ranked by TF-IDF relevance:
# Basic text search
results = db.text_search("articles", "rust programming", limit=10)
for doc in results:
    print(f"{doc['title']} (score: {doc.get('_score', 0):.2f})")
# Example output:
# Introduction to Rust Programming (score: 0.85)
# Building Web Services in Go (score: 0.12)
How TF-IDF Works
TF-IDF (Term Frequency-Inverse Document Frequency) ranks results by combining two factors:
- TF (Term Frequency) — how often the search term appears in a document. More occurrences = higher relevance.
- IDF (Inverse Document Frequency) — how rare the term is across all documents. Rare terms are weighted higher than common ones.
The score is TF × IDF. A document that uses a rare, specific term frequently will rank highest.
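To make the TF × IDF product concrete, here is a minimal sketch of the textbook calculation in Python. It is not OxiDB's implementation: the whitespace tokenizer, the length-normalized TF, the unsmoothed logarithmic IDF, and the function names `tokenize` and `tf_idf_scores` are all assumptions made for illustration, and OxiDB's actual scores will likely differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document as the sum of TF x IDF over the query terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            # TF: share of this document's tokens that are the term.
            tf = counts[term] / len(tokens) if tokens else 0.0
            # IDF: log of (total documents / documents containing the term).
            containing = sum(1 for other in tokenized if term in other)
            idf = math.log(n_docs / containing) if containing else 0.0
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "rust is a systems programming language",
    "go excels at building concurrent web services",
    "programming in go is productive",
]
scores = tf_idf_scores("rust programming", docs)

# Print documents from most to least relevant.
for doc, score in sorted(zip(docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```

In this toy corpus, "rust" appears in only one document, so it carries a higher IDF than "programming", which appears in two. The first document matches both terms and ranks first; the second matches neither and scores zero.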
Document Parsing
OxiDB's FTS engine can extract and index text from various document formats stored in blob storage:
| Format | Support |
|---|---|
| Plain text | Built-in |
| HTML / XML | Built-in (strips tags, extracts text) |
| JSON | Built-in (extracts string values) |
| PDF | Built-in |
| DOCX (Word) | Built-in |
| XLSX (Excel) | Built-in |
| Images (OCR) | Requires ocr feature flag |
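As a rough illustration of what "strips tags, extracts text" and "extracts string values" can mean in practice, the sketch below does both with Python's standard library. It is only an approximation of the idea, not OxiDB's parser: the helper names `extract_html_text` and `extract_json_strings` are hypothetical, and details such as how whitespace, scripts, or JSON keys are handled are assumptions.

```python
import json
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Keep only non-empty text; tags never reach this callback.
        if data.strip():
            self.parts.append(data.strip())


def extract_html_text(markup):
    """Return the visible text of an HTML or XML fragment as one string."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return " ".join(collector.parts)


def extract_json_strings(value):
    """Recursively gather every string value (not keys) from parsed JSON."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [s for v in value.values() for s in extract_json_strings(v)]
    if isinstance(value, list):
        return [s for v in value for s in extract_json_strings(v)]
    return []  # numbers, booleans, and null carry no text


html_text = extract_html_text("<h1>Rust</h1><p>Safe <b>systems</b> code</p>")
json_text = " ".join(extract_json_strings(json.loads(
    '{"title": "Go", "tags": ["web", "concurrency"], "views": 42}'
)))
print(html_text)  # Rust Safe systems code
print(json_text)  # Go web concurrency
```

The extracted strings are what a text index would then tokenize; binary formats such as PDF, DOCX, and XLSX need dedicated parsers and are not covered by this sketch.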
Background Indexing
Full-text indexing runs in a background worker thread that receives indexing jobs via a bounded channel (sync_channel(256)). This means:
- Insert/update operations return immediately
- Text indexing happens asynchronously without blocking writes
- The index is persisted as _fts/index.json in the data directory
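The pattern described above, a writer handing jobs to a background worker over a bounded channel, can be sketched in Python with `queue.Queue(maxsize=256)` standing in for `sync_channel(256)`. This is an analogue for illustration only: the `insert` and `index_worker` functions and the in-memory `index` dictionary are hypothetical and do not reflect OxiDB's internals.

```python
import queue
import threading

jobs = queue.Queue(maxsize=256)  # bounded, analogous to sync_channel(256)
index = {}                       # term -> set of document ids
index_lock = threading.Lock()


def index_worker():
    """Drain indexing jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        doc_id, text = job
        with index_lock:
            for term in text.lower().split():
                index.setdefault(term, set()).add(doc_id)
        jobs.task_done()


def insert(doc_id, text):
    """Store the document (omitted here) and queue it for indexing."""
    jobs.put((doc_id, text))  # blocks only if 256 jobs are already pending


worker = threading.Thread(target=index_worker, daemon=True)
worker.start()

insert(1, "Rust is a systems programming language")
insert(2, "Go excels at building web services")

jobs.join()     # wait until the worker has processed every queued job
jobs.put(None)  # ask the worker to stop
worker.join()

print(sorted(index["rust"]))  # [1]
```

Two consequences of this design are worth keeping in mind. A freshly inserted document may not be searchable until the worker has processed its job, and with a bounded queue the producer waits once the queue is full, which limits how far indexing can fall behind writes.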
For document management applications that need to search across PDFs, Word documents, and scanned images, OxiDB's built-in FTS eliminates the need for separate search infrastructure.