Appendix
Quick-reference tables for file types, keyword lists, environment variables,
database paths, ports, and troubleshooting FAQ.
File Types & Upload Limits
| Setting | Value |
| Allowed extensions | .pdf, .txt |
| Max upload size | 50 MB |
| Filename truncation | 50 characters (stem) |
| Unique suffix | 8-char hex UUID appended |
| TXT encoding detection | utf-8 → utf-16 → latin-1 → cp1252 |
| Nginx max body | 100M |
| URL ingestion | Stored with file_type = "url"; text is extracted with trafilatura, falling back to BeautifulSoup |
Product Specs additionally supports .json, .csv, and .xlsx (Excel) for bulk import.
Pricing supports .json and .csv.
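The filename and encoding rules in the table above can be sketched as follows. This is a minimal illustration rather than the backend's actual code: the function names `unique_filename` and `decode_text` and the underscore before the suffix are assumptions. One thing worth knowing is that Python's latin-1 codec accepts any byte sequence, so in a chain ordered like this the cp1252 step is never reached.

```python
import uuid
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".txt"}
MAX_STEM_LENGTH = 50
ENCODINGS = ("utf-8", "utf-16", "latin-1", "cp1252")


def unique_filename(original: str) -> str:
    """Truncate the stem to 50 characters and append an 8-char hex suffix."""
    path = Path(original)
    ext = path.suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {ext!r}")
    stem = path.stem[:MAX_STEM_LENGTH]
    suffix = uuid.uuid4().hex[:8]
    # Assumption: "_" as separator; the table only says a suffix is appended.
    return f"{stem}_{suffix}{ext}"


def decode_text(raw: bytes) -> tuple[str, str]:
    """Try each encoding in order and return (text, encoding_used)."""
    for encoding in ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeError:
            continue
    raise ValueError("could not decode file with any known encoding")
```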
Pricing Keywords
When any of these 13 keywords appear in a search query, the pricing database is
included in the search pipeline:
harga, price, berapa, biaya, cost, tarif, pricing,
promo, diskon, discount, budget, kisaran, sekitar
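As a rough sketch of how such a trigger could work, the check below tokenizes the query and tests each token against the keyword set. Whole-word matching and the name `needs_pricing_search` are assumptions; the real pipeline may match substrings instead.

```python
import re

PRICING_KEYWORDS = frozenset({
    "harga", "price", "berapa", "biaya", "cost", "tarif", "pricing",
    "promo", "diskon", "discount", "budget", "kisaran", "sekitar",
})


def needs_pricing_search(query: str) -> bool:
    """Return True if any pricing keyword appears as a whole word."""
    tokens = re.findall(r"\w+", query.lower())
    return any(token in PRICING_KEYWORDS for token in tokens)
```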
Price Range Trigger Words
| Pattern | Trigger Words | Behavior |
| Range | antara, sampai, s/d, hingga, to, dan, and | Sets both min and max price |
| Upper bound | budget, maksimal, maks, max, dibawah, di bawah, under, kurang dari | Sets max price only |
| Lower bound | minimal, min, diatas, di atas, above, lebih dari | Sets min price only |
| Approximate | sekitar, kisaran, around, kira-kira, approximately, approx | Sets ±20% range around value |
Indonesian Price Units
| Unit | Aliases | Multiplier |
| Miliar | miliar, milyar, billion | 1,000,000,000 |
| Juta | juta, jt, million | 1,000,000 |
| Ribu | ribu, rb, thousand | 1,000 |
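The trigger words and unit multipliers above combine into a price filter. The sketch below shows one plausible way to do that, under several assumptions that are not confirmed by the source: every amount carries its own unit, a comma or dot inside a number is a decimal mark, and patterns are checked in the order range, approximate, upper bound, lower bound. The name `parse_price_filter` is hypothetical.

```python
import re

UNIT_MULTIPLIERS = {
    "miliar": 1_000_000_000, "milyar": 1_000_000_000, "billion": 1_000_000_000,
    "juta": 1_000_000, "jt": 1_000_000, "million": 1_000_000,
    "ribu": 1_000, "rb": 1_000, "thousand": 1_000,
}
RANGE_WORDS = ("antara", "sampai", "s/d", "hingga", "to", "dan", "and")
UPPER_WORDS = ("budget", "maksimal", "maks", "max", "dibawah", "di bawah",
               "under", "kurang dari")
LOWER_WORDS = ("minimal", "min", "diatas", "di atas", "above", "lebih dari")
APPROX_WORDS = ("sekitar", "kisaran", "around", "kira-kira",
                "approximately", "approx")

# A number followed by a unit alias, e.g. "200 juta" or "1,5 miliar".
AMOUNT_RE = re.compile(
    r"(\d+(?:[.,]\d+)?)\s*(" + "|".join(UNIT_MULTIPLIERS) + r")\b"
)


def _has_any(text, words):
    """True if any word or phrase occurs in text as a standalone unit."""
    return any(
        re.search(r"(?<!\w)" + re.escape(w) + r"(?!\w)", text) for w in words
    )


def parse_price_filter(query):
    """Return (min_price, max_price); either side may be None."""
    text = query.lower()
    amounts = [
        round(float(num.replace(",", ".")) * UNIT_MULTIPLIERS[unit])
        for num, unit in AMOUNT_RE.findall(text)
    ]
    if not amounts:
        return None, None
    if len(amounts) >= 2 and _has_any(text, RANGE_WORDS):
        return min(amounts[:2]), max(amounts[:2])
    value = amounts[0]
    if _has_any(text, APPROX_WORDS):
        # Integer arithmetic keeps the ±20% bounds exact.
        return value * 80 // 100, value * 120 // 100
    if _has_any(text, UPPER_WORDS):
        return None, value
    if _has_any(text, LOWER_WORDS):
        return value, None
    return None, None
```

For example, `parse_price_filter("budget 300 jt")` yields an upper bound of 300,000,000 with no lower bound, and `parse_price_filter("sekitar 200 juta")` yields the range 160,000,000 to 240,000,000.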
Product Spec Keywords
When any of these 40 keywords/phrases appear in a search query, the product specs
database is included in the search pipeline:
spesifikasi, spec, fitur, feature, transmisi, transmission,
mesin, engine, cc, tenaga, hp, hybrid, electric, listrik,
bensin, petrol, 4wd, awd, fwd, sunroof, carplay, kamera,
kamera 360, 360 camera, wireless, kursi, kapasitas, seat,
cooling seat, heated seat, cruise control, lane assist,
android auto, keyless, push start, rear camera, electric seat,
punya, ada fitur, apakah ada
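Because this list contains multi-word phrases such as "kamera 360" and "apakah ada", splitting the query into single tokens is not enough. One option, sketched here as an assumption rather than the actual implementation, is a single regular expression that tries longer phrases first. The name `matched_spec_keywords` is hypothetical.

```python
import re

SPEC_KEYWORDS = [
    "spesifikasi", "spec", "fitur", "feature", "transmisi", "transmission",
    "mesin", "engine", "cc", "tenaga", "hp", "hybrid", "electric", "listrik",
    "bensin", "petrol", "4wd", "awd", "fwd", "sunroof", "carplay", "kamera",
    "kamera 360", "360 camera", "wireless", "kursi", "kapasitas", "seat",
    "cooling seat", "heated seat", "cruise control", "lane assist",
    "android auto", "keyless", "push start", "rear camera", "electric seat",
    "punya", "ada fitur", "apakah ada",
]

# Longest phrases first so "kamera 360" wins over the shorter "kamera".
_SPEC_RE = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(k) for k in sorted(SPEC_KEYWORDS, key=len, reverse=True))
    + r")(?!\w)"
)


def matched_spec_keywords(query: str) -> list[str]:
    """Return the distinct spec keywords/phrases found in the query, sorted."""
    return sorted(set(_SPEC_RE.findall(query.lower())))
```

A non-empty result would mean the product specs database joins the search pipeline.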
Stop Words
Stop words are filtered out in Search Insights analytics and in query cleaning
before database searches.
English Stop Words (~96 words)
the, a, an, and, or, but, in, on, at, to, for, of, with, by,
from, as, is, was, are, were, been, be, have, has, had, do, does,
did, will, would, could, should, may, might, can, this, that,
these, those, i, you, he, she, it, we, they, what, which, who,
when, where, why, how, my, your, his, her, its, our, their, me,
him, us, them, am, if, then, than, so, also, just, some, any,
all, each, every, both, few, more, most, other, such, no, nor,
not, only, own, same, too, very, much, many, about, above, after,
again, against, among, another, before, below, between, down,
during, into, off, once, out, over, through, under, until, up,
upon, within, without
Indonesian Stop Words (~65+ words)
yang, dan, untuk, atau, dalam, adalah, dengan, dari, pada, ke,
ini, itu, ada, juga, akan, saya, kamu, kami, mereka, dia, ia,
sudah, belum, masih, oleh, tersebut, dapat, bisa, harus, perlu,
seperti, karena, sebagai, antara, setiap, salah, satu, bagi,
agar, sekitar, hingga, namun, tetapi, bahwa, maka, bila, kalau,
jika, apakah, dimana, bagaimana, mengapa, kapan, siapa, apa,
mana, berapa, telah, sedang, lagi, pula, pun, saja, demi, mau,
biar, kok, sih, dong, lho, tapi, kalo, gak, nggak, engga, tak,
nah, yah, kan, deh, nih, tuh, nya, kah, lah, pasti, mungkin
Pricing & Spec Noise Words
Additional noise words are stripped from queries before searching the pricing
and product spec databases. These include common filler words like:
sekarang, terbaru, terkini, saat, ini, itu, yang, dan, atau,
untuk, dari, dengan, ada, apa, mana, bisa, beli, jual, cari,
mau, ingin, tolong, mohon, dong, ya, nih, deh, sih, kan, lah,
the, is, are, what, how, much, does, can, latest, current, now,
today, new, please, mobil, motor, kendaraan, vehicle, car, list,
daftar, info, informasi, detail, type, tipe, model, varian, variant
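Query cleaning of this kind usually amounts to tokenizing and dropping listed words. The sketch below uses the noise words shown above; since the text says the list only "includes" these words, the real set may be larger. The name `strip_noise` is hypothetical.

```python
import re

NOISE_WORDS = frozenset("""
sekarang terbaru terkini saat ini itu yang dan atau
untuk dari dengan ada apa mana bisa beli jual cari
mau ingin tolong mohon dong ya nih deh sih kan lah
the is are what how much does can latest current now
today new please mobil motor kendaraan vehicle car list
daftar info informasi detail type tipe model varian variant
""".split())


def strip_noise(query: str) -> str:
    """Lowercase the query and drop noise words, keeping word order."""
    tokens = re.findall(r"\w+", query.lower())
    return " ".join(t for t in tokens if t not in NOISE_WORDS)
```

With this sketch, "Tolong cari harga mobil Avanza terbaru dong" is reduced to "harga avanza".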
Environment Variables
All variables are loaded from a .env file in the backend/ directory.
LLM & AI Providers
| Variable | Default | Description |
| AWS_REGION | us-east-1 | AWS region for Bedrock |
| AWS_ACCESS_KEY_ID | None | AWS access key |
| AWS_SECRET_ACCESS_KEY | None | AWS secret key |
| BEDROCK_MODEL_ID | amazon.nova-lite-v1:0 | Bedrock model for LLM |
| GOOGLE_API_KEY | None | Google API key for Gemini |
| GEMINI_MODEL | gemini-2.0-flash-exp | Gemini model ID |
| COHERE_API_KEY | None | Cohere API key for embeddings |
| EMBED_MODEL | cohere.embed-multilingual-v3 | Embedding model name |
| EMBED_DIMENSIONS | 1024 | Embedding vector dimensions |
Authentication
| Variable | Default | Description |
| JWT_SECRET | change-this-secret-in-production-via-env | JWT signing secret (HS256) |
| JWT_EXPIRATION_HOURS | 8 | Token expiry in hours |
| API_KEY | empty string | Public API key (unset = no auth required) |
Database
| Variable | Default | Description |
| MYSQL_HOST | localhost | MySQL server hostname |
| MYSQL_USER | required | MySQL username |
| MYSQL_PASSWORD | empty | MySQL password |
| MYSQL_DB | kbaas_moladin | MySQL database name |
| MYSQL_PORT | 3306 | MySQL port |
Search & Chunking
| Variable | Default | Description |
| ACTIVITY_LOG_RETENTION_DAYS | 120 | Days to keep activity logs |
| DEFAULT_CHUNK_SIZE | 500 | Default text chunk size |
| DEFAULT_CHUNK_OVERLAP | 50 | Default chunk overlap |
| RELEVANCE_THRESHOLD | 0.5 | Minimum relevance score |
| LLM_TEMPERATURE | 0.7 | Default LLM temperature |
| MAX_TOKENS | 2000 | Max LLM output tokens |
| TOP_K | 50 | Top-K results for retrieval |
| VECTOR_WEIGHT | 0.5 | Weight for vector search in hybrid |
| BM25_WEIGHT | 0.5 | Weight for BM25 search in hybrid |
| PRICING_WEIGHT | 0.25 | Weight for pricing results in fusion |
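To illustrate how defaults like these are typically applied, the sketch below reads a few of the search variables from the environment and falls back to the documented defaults. It assumes the .env file has already been loaded into the process environment (for example by a library such as python-dotenv); the `SearchSettings` class is hypothetical and is not the backend's own configuration object.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SearchSettings:
    chunk_size: int
    chunk_overlap: int
    relevance_threshold: float
    top_k: int
    vector_weight: float
    bm25_weight: float
    pricing_weight: float

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SearchSettings":
        """Build settings from env, using the documented defaults when unset."""
        env = os.environ if env is None else env
        return cls(
            chunk_size=int(env.get("DEFAULT_CHUNK_SIZE", "500")),
            chunk_overlap=int(env.get("DEFAULT_CHUNK_OVERLAP", "50")),
            relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "0.5")),
            top_k=int(env.get("TOP_K", "50")),
            vector_weight=float(env.get("VECTOR_WEIGHT", "0.5")),
            bm25_weight=float(env.get("BM25_WEIGHT", "0.5")),
            pricing_weight=float(env.get("PRICING_WEIGHT", "0.25")),
        )
```

Passing an explicit mapping, as in `SearchSettings.from_env({"TOP_K": "10"})`, makes the defaults easy to check without touching the real environment.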
Default Port Numbers
| Service | Port | Notes |
| Backend (FastAPI) | 8000 | Uvicorn default |
| Frontend (dev) | 3000 | Python HTTP server |
| Docs Site | 8001 | Separate FastAPI instance |
| MySQL | 3306 | Standard MySQL port |
Database File Locations
All paths are relative to backend/data/:
| Database | Path | Engine |
| Knowledge Base | sqlite/knowledge.db | SQLite + FTS5 |
| Vector Store | chroma/ | ChromaDB (persistent) |
| Settings | sqlite/settings.db | SQLite |
| Pricing | sqlite/pricing.db | SQLite + FTS5 |
| Product Specs | sqlite/product_specs.db | SQLite + FTS5 |
| Admin Users | sqlite/users.db | SQLite |
| Query History | MySQL kbaas_moladin | MySQL |
| Activity Log | MySQL kbaas_moladin | MySQL |
| File Uploads | uploads/ | Filesystem |
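When diagnosing a fresh install it can help to check which of the local stores exist on disk. The helper below is a small sketch under the assumption that the process runs from the repository root, so that `backend/data` resolves correctly; the names `store_path` and `missing_stores` are hypothetical.

```python
from pathlib import Path

DATA_DIR = Path("backend") / "data"

# Local (non-MySQL) stores from the table above, relative to DATA_DIR.
LOCAL_STORES = {
    "knowledge": "sqlite/knowledge.db",
    "vector_store": "chroma",
    "settings": "sqlite/settings.db",
    "pricing": "sqlite/pricing.db",
    "product_specs": "sqlite/product_specs.db",
    "users": "sqlite/users.db",
    "uploads": "uploads",
}


def store_path(name: str, data_dir: Path = DATA_DIR) -> Path:
    """Resolve the on-disk location of a named store."""
    return data_dir / LOCAL_STORES[name]


def missing_stores(data_dir: Path = DATA_DIR) -> list[str]:
    """Return the names of stores whose path does not exist yet."""
    return [n for n in LOCAL_STORES if not store_path(n, data_dir).exists()]
```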
Keyboard Shortcuts
| Shortcut | Location | Action |
| Enter | Debug Search input | Run debug search |
| Ctrl + Enter | Widget textarea | Submit search query |
Query Topic Categories
The query decomposer maps ~107 keywords to search-hint categories for improved retrieval.
Below are the category groupings:
| Category | Keywords (EN & ID) |
| Price / Cost | harga, price, murah, mahal, cheap, expensive, cost, biaya, tarif, rate, pricing, budget, afford |
| Specs | spesifikasi, spec, specifications, specs, spek |
| Features | fitur, feature, features, keunggulan, advantage, capability, fungsi, function |
| Pros / Cons | kelebihan, kekurangan, pros, cons, pro, con, disadvantage, drawback |
| Reviews | review, ulasan, rating, penilaian, feedback, testimoni, testimonial |
| Performance | performa, performance, kinerja, benchmark, speed, kecepatan |
| Dimensions | dimensi, dimension, ukuran, size, berat, weight, tinggi, height, lebar, width, panjang, length |
| Consumption | konsumsi, consumption, efisiensi, efficiency, bbm, fuel, daya, power, usage |
| Design | interior, exterior, desain, design, tampilan, appearance, warna, color, colour, style |
| Technical | mesin, engine, motor, processor, prosesor, cpu, gpu, ram, memori, memory, storage, penyimpanan, disk, ssd, baterai, battery, kamera, camera, layar, screen, display, sensor, chip, kapasitas, capacity |
| Safety | keamanan, safety, security, aman, safe, protection, perlindungan |
| Warranty | garansi, warranty, servis, service, maintenance, perawatan, support, dukungan |
| Quality | kualitas, quality, terbaik, best, terburuk, worst, reliable, reliability, durability, ketahanan |
| Availability | tersedia, available, availability, ketersediaan, stok, stock |
| Compatibility | kompatibel, compatible, compatibility, support |
| General | bagus, good, bad, jelek, worth, value, recommend, rekomendasi |
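A keyword-to-category lookup like this can be sketched with an inverted index. The example uses only a subset of the table and hypothetical names (`CATEGORY_KEYWORDS`, `categorize`). Because "support" is listed under both Warranty and Compatibility, the lookup returns every matching category rather than a single one.

```python
import re
from collections import defaultdict

# Subset of the table above, for illustration only.
CATEGORY_KEYWORDS = {
    "Price / Cost": ["harga", "price", "murah", "mahal", "cost", "biaya"],
    "Safety": ["keamanan", "safety", "security", "aman", "safe"],
    "Warranty": ["garansi", "warranty", "servis", "service", "support"],
    "Compatibility": ["kompatibel", "compatible", "compatibility", "support"],
}

# Invert to keyword -> set of categories so shared keywords keep all owners.
KEYWORD_TO_CATEGORIES = defaultdict(set)
for category, keywords in CATEGORY_KEYWORDS.items():
    for keyword in keywords:
        KEYWORD_TO_CATEGORIES[keyword].add(category)


def categorize(query: str) -> list[str]:
    """Return the sorted categories hinted at by the query's tokens."""
    found = set()
    for token in re.findall(r"\w+", query.lower()):
        found |= KEYWORD_TO_CATEGORIES.get(token, set())
    return sorted(found)
```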
Security Configuration
| Mechanism | Details |
| CORS | All origins (*), all methods, all headers, credentials allowed |
| API Key | X-API-Key header for public endpoints; optional if not configured |
| JWT | HS256 in Authorization: Bearer header for admin endpoints |
| Password Hashing | bcrypt (72-byte limit) |
| Activity Logging | All POST/PUT/DELETE requests logged; login payloads redacted |
| Rate Limiting | None (must be enforced via reverse proxy) |
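The "optional if not configured" behavior of the API key can be expressed in a few lines. This is a sketch under the assumption that an empty API_KEY disables the check entirely; the function name `api_key_allowed` is hypothetical, and the constant-time comparison is a general good practice rather than something the source confirms.

```python
import hmac
from typing import Optional


def api_key_allowed(provided: Optional[str], configured: str) -> bool:
    """Check an X-API-Key header value against the configured key."""
    if not configured:
        # No key configured: public endpoints are open.
        return True
    if provided is None:
        return False
    # compare_digest avoids leaking match length through timing.
    return hmac.compare_digest(provided.encode(), configured.encode())
```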
Troubleshooting FAQ
Search returns no results
- Check that documents have been uploaded and indexed
- Verify the relevance threshold isn't too high (default 0.5)
- Try lowering the similarity threshold in Settings
- Check ChromaDB is running (VectorDB Viewer → collection info)
- Use Debug Search to inspect raw scores
Pricing data not found in search
- Ensure your query contains a pricing keyword (e.g., "harga", "price")
- Verify pricing records exist in the Pricing Database page
- Check that VectorDB sync has been run after adding pricing data
- Use Debug Search to see if pricing pipeline was triggered
Cannot log in
- Default credentials: admin / admin
- Check that the backend is running on port 8000
- Clear localStorage and try again
- Check browser console for CORS errors
LLM response fails or times out
- Verify LLM provider credentials (AWS for Bedrock, Google API key for Gemini)
- Check Settings → LLM Provider is set correctly
- Try switching to a different provider
- Check backend logs for detailed error messages
Widget does not appear
- Ensure a <div id="kbaas-widget"> element exists
- Verify the script src path to kbaas-widget.js
- Check browser console for initialization errors
- Try calling KBaaSWidget.init() manually in console
VectorDB shows 0 vectors
- Ensure Cohere API key is set in environment variables
- Upload documents and wait for processing to complete
- Run VectorDB Sync from the Pricing or Product Specs page
- Check backend logs for embedding errors