Knowledge Base

Upload documents, ingest web content, and manage your document library — the foundation of your AI-powered search.

Who can access this?

editor, manager, superadmin

Upload Files

Upload documents that will be automatically processed, chunked, and indexed for both semantic (vector) and keyword (BM25) search.

How to Upload

  1. Navigate to Knowledge Base from the sidebar.
  2. In the Upload File card, either drag and drop a file onto the upload area, or click Browse to select a file.
  3. The file will be uploaded, processed, chunked, and indexed automatically.
  4. A progress indicator shows the processing status.
  5. Once complete, the document appears in the Document Library below.

Screenshot: Upload file card with drag-and-drop area and browse button

Supported File Types & Limits

Property | Value
Supported formats | .pdf, .txt
Maximum file size | 50 MB
Text encoding (TXT) | UTF-8, UTF-16, Latin-1, CP1252 (auto-detected)

File Requirements

  • PDF files must contain extractable text (scanned images without OCR will result in empty content).
  • Only .pdf and .txt extensions are accepted.
  • Files larger than 50 MB will be rejected with an error message.
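The requirements above can be sketched as a pre-upload check. This is a minimal illustration, not BABEH's actual code: the names `validate_upload` and `decode_text` are hypothetical, "50 MB" is assumed to mean 50 MiB, and the order in which encodings are tried is an assumption, since the source only says the encoding is auto-detected.

```python
import codecs
from pathlib import Path

# Limits taken from the table above; all names here are illustrative.
ALLOWED_EXTENSIONS = {".pdf", ".txt"}
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" means 50 MiB


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file is acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"Unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("File exceeds the 50 MB limit")
    return problems


def decode_text(raw: bytes) -> str:
    """Decode TXT bytes by trying the listed encodings in an assumed order."""
    # A UTF-16 byte-order mark is the most reliable signal, so check it first.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so it works as a last resort.
    return raw.decode("latin-1")
```

Latin-1 is tried last in this sketch because it never fails to decode; placing it earlier would prevent any later encoding from being attempted.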

How Chunking Works

When a file is uploaded, the text content is split into smaller pieces called chunks. Each chunk is then:

  • Embedded with Cohere into 1024-dimensional vectors and stored in ChromaDB for semantic search.
  • Indexed with FTS5 for BM25 keyword search and stored in SQLite.

Pipeline: Uploaded File → Text Extraction → Chunking Engine → Cohere Embedding (ChromaDB) and FTS5 Indexing (SQLite)
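To make the keyword side of this pipeline concrete, here is a small sketch of BM25 ranking with SQLite's FTS5 extension. The table name `chunks`, its columns, and the sample rows are invented for illustration; BABEH's real schema is not described in this guide. It also assumes your Python build of SQLite includes FTS5, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: doc_id is stored but not searched; content is indexed.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(doc_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO chunks (doc_id, content) VALUES (?, ?)",
    [
        ("policy.pdf", "Refund requests are accepted within 30 days of purchase."),
        ("faq.txt", "Shipping usually takes three to five business days."),
        ("policy.pdf", "Contact support to start a refund or an exchange."),
    ],
)


def keyword_search(db: sqlite3.Connection, query: str, limit: int = 5):
    """Return (doc_id, content, score) rows, best match first."""
    # bm25() returns lower (more negative) values for better matches,
    # so ascending order puts the most relevant chunk first.
    return db.execute(
        "SELECT doc_id, content, bm25(chunks) FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()


for doc_id, content, score in keyword_search(conn, "refund"):
    print(doc_id, round(score, 3), content)
```

The semantic side (Cohere embeddings stored in ChromaDB) is omitted here because it requires an API key and external services.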

Setting | Default | Range | Description
Chunk Size | 500 characters | 100 – 10,000 | Number of characters per chunk. Larger chunks provide more context but may reduce precision.
Chunk Overlap | 50 characters | 0 – 5,000 | Characters shared between consecutive chunks to prevent information loss at boundaries.

Tip: Chunk Settings

Chunk size and overlap are configured in Settings. Changing these values only affects newly uploaded documents. Use Re-index to re-chunk existing documents with new settings.
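As a rough illustration of how chunk size and overlap interact, the sketch below splits text into fixed-size character windows. It is a simplified model, not BABEH's chunking engine, which may also respect sentence or paragraph boundaries. The function name `chunk_text` and the rule that overlap must be smaller than chunk size are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows that share `overlap` characters."""
    if not 100 <= chunk_size <= 10_000:
        raise ValueError("chunk_size must be between 100 and 10,000")
    if not 0 <= overlap <= 5_000:
        raise ValueError("overlap must be between 0 and 5,000")
    if overlap >= chunk_size:
        # Assumption: without this rule the window would never advance.
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap  # how far each new chunk starts after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1000))
pieces = chunk_text(sample)
print([len(piece) for piece in pieces])  # [500, 500, 100]
```

With the defaults, each chunk starts 450 characters after the previous one, so the last 50 characters of one chunk are repeated at the start of the next.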

Ingest URLs

Scrape and ingest content from web pages. BABEH automatically extracts the main article content, strips navigation/ads, and processes it the same way as uploaded files.

How to Ingest URLs

  1. Navigate to Knowledge Base from the sidebar.
  2. In the Ingest URLs card, paste one or more URLs into the text area — one URL per line.
  3. Click Ingest. Each URL is processed sequentially with a progress bar.
  4. Successfully ingested pages appear in the Document Library with type "URL".

Screenshot: Ingest URLs card with multi-line textarea and progress bar
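The "one URL per line" input from step 2 could be parsed along these lines. This is only a sketch: the function name `parse_url_list` is hypothetical, and the rule that only http and https URLs are accepted is an assumption about how the input is validated.

```python
from urllib.parse import urlparse


def parse_url_list(textarea_value: str) -> tuple[list[str], list[str]]:
    """Split textarea content into (valid_urls, rejected_lines)."""
    valid, rejected = [], []
    for line in textarea_value.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines between URLs
        parsed = urlparse(candidate)
        # Assumption: only absolute http(s) URLs can be scraped.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected


pasted = "https://example.com/a\n\n  https://example.com/b  \nftp://example.com/c\nnot a url"
urls, skipped = parse_url_list(pasted)
print(urls)
print(skipped)
```

The valid URLs keep their original order, which matches the sequential processing described in step 3.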

Web Scraping Engine

BABEH uses trafilatura as the primary content extractor, with BeautifulSoup4 as a fallback. It extracts the main article content and filters out navigation bars, footers, ads, and other boilerplate.
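A primary-extractor-with-fallback arrangement like the one described might look roughly as follows. This is a sketch under assumptions, not BABEH's implementation: the function name `extract_main_text` and the list of tags removed in the fallback are illustrative, and both third-party libraries are treated as optional so the snippet still loads when they are not installed.

```python
from typing import Optional

try:  # third-party; optional in this sketch
    import trafilatura
except ImportError:
    trafilatura = None

try:  # third-party; optional in this sketch
    from bs4 import BeautifulSoup
except ImportError:
    BeautifulSoup = None

# Assumed set of boilerplate elements to drop in the fallback path.
BOILERPLATE_TAGS = ["script", "style", "nav", "header", "footer", "aside"]


def extract_main_text(html: str) -> Optional[str]:
    """Return the main text of a page, or None if nothing could be extracted."""
    if not html.strip():
        return None

    # Primary path: trafilatura returns None when it finds no main content.
    if trafilatura is not None:
        text = trafilatura.extract(html)
        if text:
            return text

    # Fallback path: remove obvious boilerplate tags and keep the visible text.
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(BOILERPLATE_TAGS):
            tag.decompose()
        text = soup.get_text(separator=" ", strip=True)
        return text or None

    return None
```

The fallback is deliberately cruder than the primary path: it keeps all remaining visible text rather than trying to identify the article body.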

Document Library

The Document Library is a searchable, sortable table of all uploaded files and ingested URLs.

Screenshot: Document Library table with search, filters, and action buttons

Table Fields

Column | Description
Filename | Original filename or URL title
Type | File type badge: PDF, TXT, or URL
Chunks | Number of text chunks created from this document
Size | Original file size (in KB or MB)
Upload Date | When the document was uploaded or ingested
Actions | Edit, Re-index, and Delete buttons

Filtering & Sorting

Document Actions

Edit

Opens the document in an edit modal where you can:

Re-index

Re-processes the document using the current chunk settings. Useful when you've changed chunk size or overlap in Settings and want existing documents to use the new values.

Stale Chunk Warning

When your current chunk settings differ from those used when a document was ingested, a yellow warning banner appears suggesting you re-index. This ensures all documents use consistent chunking for optimal search quality.
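The comparison behind this warning can be sketched as a simple equality check between the current settings and the settings recorded at ingestion time. The `ChunkSettings` class, the `find_stale_documents` function, and the idea that settings are stored per document are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkSettings:
    chunk_size: int = 500
    chunk_overlap: int = 50


def find_stale_documents(
    current: ChunkSettings,
    ingested_with: dict[str, ChunkSettings],
) -> list[str]:
    """Return names of documents whose chunk settings differ from the current ones."""
    return [name for name, used in ingested_with.items() if used != current]


library = {
    "handbook.pdf": ChunkSettings(500, 50),
    "faq.txt": ChunkSettings(500, 50),
    "release-notes.txt": ChunkSettings(1000, 100),
}
stale = find_stale_documents(ChunkSettings(1000, 100), library)
print(stale)  # ['handbook.pdf', 'faq.txt']
```

Any document returned by such a check would be a candidate for Re-index.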

Delete

Permanently removes the document from all stores:

Irreversible Action

Deleting a document cannot be undone. You will need to re-upload the original file if needed.

KBC Content Improvement (AI)

The KBC Content Improvement feature uses AI to analyze your document content and suggest rewrites that improve search retrieval quality.

How to Use

  1. Open a document via the Edit button.
  2. In the edit modal, find the KBC Improvement panel.
  3. Click "Generate Suggestion" — the AI analyzes the content and proposes improvements.
  4. Review the suggestion. Click "Use Suggestion" to apply it to the editor, or "Use & Save" to apply and save immediately.

Request flow:

  1. User → Edit Modal: Click "Generate Suggestion"
  2. Edit Modal → AI (LLM): Send current content (temperature: 0.3)
  3. AI (LLM) → Edit Modal: Return improved content
  4. Edit Modal → User: Display suggestion preview
  5. User → Edit Modal: Click "Use & Save"
  6. Edit Modal → Database: Save updated content
  7. Database → Edit Modal: Re-chunk + Re-embed
  8. Edit Modal → User: Document updated

How It Helps

The AI rewrites content to be more structured, keyword-rich, and retrieval-friendly, so searches return more relevant results. It uses a low temperature (0.3) to stay faithful to the original content while improving clarity.

Collection Info

Below the Document Library, a Collection Info section shows VectorDB statistics: