SAFIA’s knowledge base (KB) lets you feed your own documents into the assistant so it can answer questions about their content. Once a document is indexed, you can ask SAFIA about it in plain conversation — just as you would ask any other question — and it retrieves the most relevant passages before composing a response. Documents are processed and stored entirely on your machine: no content leaves your server.

Supported formats and limits

Format         Extension
PDF            .pdf
Plain text     .txt
Word document  .docx
The maximum file size per upload is controlled by the KB_MAX_UPLOAD_MB setting in your .env (default: 200 MB). You can upload multiple files in a single operation.
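The format and size rules above can be checked before you upload. The following is a minimal sketch, not SAFIA's own validation code: the function name `check_upload` and the constant `ALLOWED_EXTENSIONS` are hypothetical, and it assumes 1 MB means 1024 × 1024 bytes, which the setting's documentation does not specify.

```python
from pathlib import Path

# Hypothetical names for illustration; they are not part of SAFIA's API.
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".docx"}
KB_MAX_UPLOAD_MB = 200  # mirrors the documented default


def check_upload(path: Path, max_mb: int = KB_MAX_UPLOAD_MB) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    # Assumption: 1 MB = 1024 * 1024 bytes.
    size_mb = path.stat().st_size / (1024 * 1024)
    if size_mb > max_mb:
        problems.append(f"file is {size_mb:.1f} MB, limit is {max_mb} MB")
    return problems
```

Calling `check_upload(Path("policy.pdf"))` on each file before a multi-file upload gives a quick way to spot files that are likely to be rejected.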

Uploading documents

1. Open the Knowledge Base page

In the dashboard sidebar, click Knowledge to navigate to /knowledge.

2. Choose your files

Under Upload Documents, click the file input and select one or more .pdf, .txt, or .docx files.

3. Add an optional title

If you are uploading a single file and want to give it a human-readable label (e.g. Internal privacy policy), enter it in the Title field. For multi-file uploads the Title field is ignored: each file is indexed under its own filename.

4. Submit

Click Index into Qdrant. SAFIA processes each file in sequence and shows a status message for each one. When all files finish, a summary banner shows how many succeeded and how many failed.

How documents are processed

Every uploaded document goes through a four-stage pipeline before it is searchable:
1. Parse

The file is read and converted to plain text. PDFs are parsed page by page; DOCX files are read paragraph by paragraph; TXT files are loaded as-is.

2. Chunk

The text is split into overlapping chunks of 450 words with a 70-word overlap between consecutive chunks. Overlap ensures that sentences spanning a chunk boundary are still retrieved correctly.

3. Embed

Each chunk is converted to a dense vector using the paraphrase-multilingual-MiniLM-L12-v2 model running locally via ONNX Runtime. No text is sent to a remote API unless you have configured remote embeddings in Settings.

4. Store

The vectors are written to your local vector store and a metadata record is saved to the database, so the document appears in the indexed list and is immediately searchable.
Chunking parameters (KB_CHUNK_WORDS, KB_CHUNK_OVERLAP_WORDS) and embedding settings are configurable in Settings, under the Knowledge Base and Vector & Embeddings sections.
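To make the chunking stage concrete, here is a minimal sketch of word-based splitting with overlap, using the documented defaults of 450 words per chunk and a 70-word overlap. The function name `split_into_chunks` is hypothetical, and splitting on whitespace is an assumption; SAFIA's actual splitter may tokenize words or handle the final chunk differently.

```python
def split_into_chunks(text: str, chunk_size: int = 450, overlap: int = 70) -> list[str]:
    """Split text into word chunks where each chunk repeats the last
    `overlap` words of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()  # assumption: words are whitespace-separated
    step = chunk_size - overlap  # each chunk starts this many words after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

With the defaults, each chunk starts 380 words after the previous one, so under these assumptions a 1,000-word document yields three chunks starting at words 0, 380, and 760. A larger overlap means more chunks and more storage for the same document.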

Asking SAFIA about uploaded content

After a document is indexed, simply ask SAFIA about it in your Telegram chat — no special command is needed:
“What does the privacy policy say about data retention?”
“Summarize the investment strategy document I uploaded.”
SAFIA automatically searches the knowledge base when your question appears to relate to a document topic, retrieves the most relevant chunks, and uses them to ground its answer.
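The retrieval step can be pictured as a nearest-neighbor search over chunk vectors. The sketch below ranks chunks by cosine similarity to a query vector. It is an illustration only: the function names are hypothetical, the three-dimensional vectors are toy values rather than real embeddings, and cosine similarity is an assumed metric, since this page does not state which distance SAFIA's vector store uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 indexed: list[tuple[str, list[float]]],
                 k: int = 3) -> list[str]:
    """Return the text of the k chunks whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: in practice each vector would come from the embedding model.
indexed = [
    ("Data is retained for 12 months.", [1.0, 0.0, 0.0]),
    ("The fund targets long-term growth.", [0.0, 1.0, 0.0]),
    ("Retention rules apply to fund records.", [0.7, 0.7, 0.0]),
]
best = top_k_chunks([0.9, 0.1, 0.0], indexed, k=2)
```

Here `best` contains the retention sentence first and the mixed sentence second, because their vectors point closest to the query. This is also why quoting a specific phrase from a document tends to improve retrieval: the query vector lands nearer the chunk that contains that phrase.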

Viewing indexed documents

The Indexed Documents table on the Knowledge page shows every document currently in the knowledge base:
Column    Description
Title     Optional label you provided at upload time, or the filename
File      The original filename
Chunks    The number of vector chunks stored for this document
Uploaded  Timestamp when the document was indexed

Deleting a document

To remove a document, click Delete in its row. SAFIA removes all of its stored vectors and deletes its metadata record from the database. The document is no longer searchable after deletion.
Deletion is permanent. There is no undo — you will need to re-upload the file if you want to restore it.

Troubleshooting

When the admin dashboard starts, it warms up the local ONNX embedding model (~120 MB). This one-time load can take 10–30 seconds on first run or after a cold start. Subsequent uploads are significantly faster because the model stays in memory.
Very large files require more memory and processing time. Try these steps:
  1. Check KB_MAX_UPLOAD_MB in Settings — increase it if your file exceeds the current limit.
  2. Split large PDFs into smaller files (under 50 MB each) and upload them separately.
  3. Reduce KB_EMBED_BATCH_SIZE (default: 32) in the Knowledge Base section of Settings if you are on a memory-constrained machine.
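To see why a smaller batch size lowers peak memory, consider how chunks might be grouped before embedding. This is a sketch under the assumption that chunks are embedded one batch at a time; the helper name `batched` is hypothetical and does not describe SAFIA's internals.

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], batch_size: int = 32) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 70 chunks with the default batch size produce batches of 32, 32, and 6.
chunks = [f"chunk {i}" for i in range(70)]
sizes = [len(batch) for batch in batched(chunks, batch_size=32)]
```

Only one batch needs to be held by the embedding model at a time, so halving the batch size roughly halves the working set per step, at the cost of more steps.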
If SAFIA does not draw on an indexed document when answering, check the following:
  • Confirm the document appears in the Indexed Documents table with a non-zero Chunks count.
  • Make sure your question is phrased in a way that matches the document’s content. Try quoting a specific phrase from the document.
  • If you recently uploaded the file, wait a few seconds for indexing to complete before querying.
A chunk count of zero means the parser could not extract any text. This is common with:
  • Scanned PDFs (image-only, no text layer) — SAFIA cannot OCR images.
  • Password-protected PDFs — remove the password before uploading.
  • Empty or corrupt files — verify the file opens correctly on your machine.