The Knowledge Base lets administrators upload internal documents — company financial policies, investment guides, product FAQs, or any reference material — and have SAFIA answer user questions using the content of those files. When a user asks something covered by an uploaded document, SAFIA searches the knowledge base first and grounds its reply in the actual text, citing the source naturally in the response. This is particularly useful for organisations deploying SAFIA to their employees or customers: you can upload your own investment policy, expense reimbursement guide, or financial product documentation once, and SAFIA will answer questions about it accurately without requiring users to read the files themselves.

Supported File Formats

PDF

Standard PDF documents. Text is extracted automatically — scanned image-only PDFs without embedded text are not supported.

TXT

Plain text files. Any .txt file with readable content is fully supported.

DOCX

Microsoft Word documents (.docx format). Legacy .doc files are not supported.

Uploading Documents

Only administrators can upload documents to the knowledge base. Uploads are managed through the Admin Dashboard — a local web interface served at http://127.0.0.1:5454 (available after running safia start).
1. Open the Admin Dashboard

Navigate to http://127.0.0.1:5454 in your browser and log in with your admin credentials (ADMIN_USERNAME and ADMIN_PASSWORD from your .env file).

2. Go to Knowledge Base

Select the Knowledge Base section from the admin sidebar.

3. Upload a file

Click Upload Document, choose your PDF, TXT, or DOCX file, optionally add a display title, and confirm. SAFIA will extract the text, split it into chunks, generate embeddings locally, and store everything in Qdrant.

The maximum upload size defaults to 200 MB and is controlled by the KB_MAX_UPLOAD_MB environment variable in your .env file.

4. Confirm indexing

Once the upload completes, the dashboard shows the document status as ready and displays the number of chunks indexed. The document is immediately searchable by users.
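
If you prepare many files for upload, a quick local check can catch format and size problems before you open the dashboard. The following is a minimal sketch, not part of SAFIA itself: the function name `check_upload` is hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which may differ from how SAFIA measures the limit.

```python
from pathlib import Path

# Formats listed as supported in this guide.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx"}


def check_upload(path: str, size_bytes: int, max_upload_mb: int = 200) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable.

    Hypothetical helper: SAFIA performs its own validation on upload.
    Assumes 1 MB = 1024 * 1024 bytes.
    """
    problems = []
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > max_upload_mb * 1024 * 1024:
        problems.append(f"file exceeds {max_upload_mb} MB")
    return problems


if __name__ == "__main__":
    print(check_upload("expense-guide.pdf", 3_500_000))
    print(check_upload("legacy-policy.doc", 120_000))
```

To mirror your own configuration, pass the value of KB_MAX_UPLOAD_MB from your .env as `max_upload_mb`. This check cannot tell whether a PDF contains embedded text, so a scanned PDF will pass here and still fail to index.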

How SAFIA Answers Questions from Documents

When a user asks a question, SAFIA automatically checks the knowledge base to see if any uploaded document is relevant before answering. If a match is found, SAFIA uses that content to ground its reply.
You:    What is the reimbursement limit for entertainment expenses?

SAFIA:  According to company policy (Panduan Keuangan Karyawan 2024),
        the reimbursement limit for client entertainment is Rp 1,500,000
        per event, with manager approval required for expenses above
        Rp 750,000. Receipts must be submitted within 7 working days.

You:    What does our investment policy say about crypto allocation?

SAFIA:  According to the Investment Policy Statement (uploaded March 2024),
        the maximum allocation to crypto assets is capped at 10% of total
        investable assets. Direct purchases of individual tokens are
        permitted only through OJK-registered exchanges.
SAFIA cites the document source naturally in its reply — users always know when an answer comes from an uploaded file versus SAFIA’s general knowledge.
The knowledge base is designed for internal documents and policies, not live market data. For real-time prices, news, or portfolio values, SAFIA uses its dedicated market data tools instead.
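
To make the grounding step more concrete, here is one way retrieved excerpts could be combined with the user's question so the model can name its source. This is an illustrative sketch only: SAFIA's actual prompt format is not documented here, and `build_grounded_prompt` and the `[Source: ...]` label are hypothetical.

```python
def build_grounded_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Combine retrieved excerpts with the user's question.

    `chunks` holds (document_title, chunk_text) pairs. Hypothetical format:
    SAFIA's real prompt layout may differ.
    """
    if not chunks:
        # Nothing relevant was found, so the question goes through unchanged.
        return question
    context = "\n\n".join(f"[Source: {title}]\n{text}" for title, text in chunks)
    return (
        "Answer using the excerpts below and name the source document.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    excerpt = ("Investment Policy Statement",
               "Crypto assets are capped at 10% of total investable assets.")
    print(build_grounded_prompt("What is our crypto allocation cap?", [excerpt]))
```

Keeping the document title next to each excerpt is what lets the reply mention where the answer came from.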

How Documents Are Processed

Understanding how SAFIA processes your files helps you get the best retrieval quality.
When you upload a file, SAFIA extracts all readable text from the document. For PDFs, this requires the PDF to have embedded text (not just scanned images). DOCX files are parsed directly from the Word XML structure. TXT files are read as-is.
The extracted text is split into overlapping chunks for search. Default settings:
| Setting | Default | Environment variable |
|---|---|---|
| Chunk size | 450 words | KB_CHUNK_WORDS |
| Chunk overlap | 70 words | KB_CHUNK_OVERLAP_WORDS |
The 70-word overlap ensures that sentences spanning chunk boundaries are still found by search queries. You can tune both values in your .env file.
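
The effect of these two settings is easier to see in code. The sketch below splits text on whitespace into overlapping word windows using the documented defaults. It is an approximation under stated assumptions: SAFIA's real splitter may count words or handle the final chunk differently, and `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(text: str, size: int = 450, overlap: int = 70) -> list[str]:
    """Split text into word windows of `size` words that share `overlap` words.

    Illustrative only; defaults mirror KB_CHUNK_WORDS and KB_CHUNK_OVERLAP_WORDS.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap  # each new chunk starts this many words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(1000))
    parts = split_into_chunks(sample)
    print(len(parts), [len(p.split()) for p in parts])
```

With the defaults, each chunk starts 380 words after the previous one, so a 1,000-word document produces three chunks, and the last 70 words of one chunk are repeated as the first 70 words of the next.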
Each chunk is converted into a vector embedding using a local ONNX model (paraphrase-multilingual-MiniLM-L12-v2, ~120 MB) that runs entirely on your server’s CPU. No text is sent to an external embedding API by default. To switch to a remote embedding provider, set EMBEDDING_LOCAL=false and configure EMBEDDING_BASE_URL, EMBEDDING_API_KEY, and EMBEDDING_MODEL in your .env.
Embeddings are stored in Qdrant, which by default runs as a local on-disk vector database (data/qdrant/). No separate Qdrant server or Docker container is required by default. To use a remote Qdrant instance instead, set QDRANT_URL in your .env. When a user asks a question, SAFIA embeds the query and performs a vector similarity search to find the most relevant chunks, then passes them to the LLM as context for a grounded reply.
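
The similarity search can be pictured with a small, dependency-free sketch. It ranks stored chunks by cosine similarity to the query vector and returns the best matches. The three-dimensional vectors are toy values chosen for readability, not real embeddings, and cosine similarity is an assumption here: this guide does not state which distance metric SAFIA configures in Qdrant.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, chunk_text) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy index: (chunk text, made-up embedding).
    index = [
        ("Entertainment reimbursement limits", [0.9, 0.1, 0.0]),
        ("Crypto allocation cap", [0.1, 0.9, 0.1]),
        ("Office opening hours", [0.0, 0.1, 0.9]),
    ]
    for score, text in top_k([0.8, 0.2, 0.0], index, k=2):
        print(f"{score:.3f}  {text}")
```

In a real deployment Qdrant performs this ranking over the stored embeddings; the linear scan above is only meant to show what "most relevant chunks" means.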

Configuration Reference

| Variable | Default | Description |
|---|---|---|
| KB_MAX_UPLOAD_MB | 200 | Maximum file size allowed for uploads (MB) |
| KB_CHUNK_WORDS | 450 | Words per document chunk |
| KB_CHUNK_OVERLAP_WORDS | 70 | Overlap words between consecutive chunks |
| KB_UPLOAD_DIR | (set in config) | Directory where uploaded files are stored |
| QDRANT_URL | Local on-disk | Remote Qdrant URL (optional) |
| EMBEDDING_LOCAL | true | Set to false to use a remote embedding API |
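
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them from an environment mapping. The `KBSettings` class and `load_kb_settings` function are hypothetical, and the boolean parsing (anything other than "false" counts as true) is an assumption; SAFIA's own loader may behave differently. KB_UPLOAD_DIR is omitted because its default is not stated in this reference.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class KBSettings:
    """Hypothetical container for the knowledge base settings listed above."""
    max_upload_mb: int
    chunk_words: int
    chunk_overlap_words: int
    qdrant_url: Optional[str]  # None means local on-disk storage
    embedding_local: bool


def load_kb_settings(env: Optional[Mapping[str, str]] = None) -> KBSettings:
    """Read settings from `env` (defaults to os.environ), applying documented defaults."""
    env = os.environ if env is None else env
    return KBSettings(
        max_upload_mb=int(env.get("KB_MAX_UPLOAD_MB", "200")),
        chunk_words=int(env.get("KB_CHUNK_WORDS", "450")),
        chunk_overlap_words=int(env.get("KB_CHUNK_OVERLAP_WORDS", "70")),
        qdrant_url=env.get("QDRANT_URL") or None,
        # Assumption: only the literal "false" (any case) disables local embeddings.
        embedding_local=env.get("EMBEDDING_LOCAL", "true").strip().lower() != "false",
    )


if __name__ == "__main__":
    print(load_kb_settings({}))
    print(load_kb_settings({"KB_CHUNK_WORDS": "300", "EMBEDDING_LOCAL": "false"}))
```

Passing a plain dictionary instead of the process environment makes it easy to try out different combinations before editing your .env file.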

Limitations

Keep the following constraints in mind when using the knowledge base:
  • Scanned PDFs (image-only, no embedded text) cannot be processed. Use OCR software to convert them to searchable PDFs first.
  • File size is limited by KB_MAX_UPLOAD_MB. Very large documents should be split before uploading.
  • Retrieval quality depends on chunking. If answers feel incomplete, try splitting large documents into smaller, topic-focused files.
  • The knowledge base answers questions from static uploaded content only — it cannot browse the web or access live market data.
  • Only administrators can add or remove documents. End users interact with the knowledge base only through chat questions.