Supported formats and limits
| Format | Extension |
|---|---|
| PDF | .pdf |
| Plain text | .txt |
| Word document | .docx |
The maximum upload size is controlled by the KB_MAX_UPLOAD_MB setting in your .env (default: 200 MB). You can upload multiple files in a single operation.
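A pre-upload check against the supported formats and the size limit could look like the sketch below. It is illustrative only: the function names are hypothetical, and it assumes the limit applies per file and that 1 MB means 1024 × 1024 bytes, neither of which is confirmed here.

```python
import os
from pathlib import Path

# Extensions listed in the table above.
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".docx"}
DEFAULT_MAX_UPLOAD_MB = 200  # documented default for KB_MAX_UPLOAD_MB


def max_upload_bytes(env=os.environ):
    """Read KB_MAX_UPLOAD_MB from the environment, falling back to the default."""
    # Assumption: 1 MB = 1024 * 1024 bytes.
    return int(env.get("KB_MAX_UPLOAD_MB", DEFAULT_MAX_UPLOAD_MB)) * 1024 * 1024


def check_upload(filename, size_bytes, env=os.environ):
    """Return None if the file looks acceptable, otherwise a short reason."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported extension: {ext or '(none)'}"
    # Assumption: the limit is applied to each file individually.
    if size_bytes > max_upload_bytes(env):
        return "file exceeds KB_MAX_UPLOAD_MB"
    return None
```

Running a check like this before uploading helps separate a rejected file from a file that was accepted but could not be parsed.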
Uploading documents
Choose your files
Under Upload Documents, click the file input and select one or more .pdf, .txt, or .docx files. You can select multiple files at once.
Add an optional title
If you are uploading a single file and want to give it a human-readable label (e.g. Internal privacy policy), enter it in the Title field. For multi-file uploads the Title field is ignored, and each file is indexed under its own filename.
How documents are processed
Every uploaded document goes through a four-stage pipeline before it is searchable:
Parse
The file is read and converted to plain text. PDFs are parsed page by page; DOCX files are read paragraph by paragraph; TXT files are loaded as-is.
Chunk
The text is split into overlapping chunks of 450 words with a 70-word overlap between consecutive chunks. Overlap ensures that sentences spanning a chunk boundary are still retrieved correctly.
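The overlap can be made concrete with a minimal sketch. This is not SAFIA's actual chunker: the function name is hypothetical, and it assumes words are separated by whitespace. With the documented values, each chunk starts 450 − 70 = 380 words after the previous one.

```python
def split_into_chunks(text, size=450, overlap=70):
    """Split text into word chunks where consecutive chunks share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    words = text.split()   # assumption: whitespace-delimited words
    step = size - overlap  # 380 with the documented defaults
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

For a 1,000-word document this yields three chunks covering words 0–449, 380–829, and 760–999, so a sentence that straddles word 450 appears in full in the second chunk.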
Embed
Each chunk is converted to a dense vector using the paraphrase-multilingual-MiniLM-L12-v2 model running locally via ONNX Runtime. No text is sent to a remote API unless you have configured remote embeddings in Settings.
Chunking parameters (KB_CHUNK_WORDS, KB_CHUNK_OVERLAP_WORDS) and embedding settings are configurable in the Knowledge Base and Vector & Embeddings sections of Settings.
Asking SAFIA about uploaded content
After a document is indexed, ask SAFIA about it in your Telegram chat. No special command is needed:
“What does the privacy policy say about data retention?”
“Summarize the investment strategy document I uploaded.”
SAFIA automatically searches the knowledge base when your question appears to relate to a document topic, retrieves the most relevant chunks, and uses them to ground its answer.
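Retrieving "the most relevant chunks" usually means ranking stored chunk vectors by their similarity to the question's vector. The sketch below assumes cosine similarity and a top-k cutoff. Both are assumptions made for illustration, since the scoring method and the number of chunks SAFIA retrieves are not stated here, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, chunks, k=3):
    """Return the texts of the k chunks most similar to the query.

    `chunks` is a list of (text, vector) pairs; k=3 is an illustrative value.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Under this model, a question that shares wording or meaning with a passage in the document produces a closer vector, so that passage is more likely to be retrieved.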
Viewing indexed documents
The Indexed Documents table on the Knowledge page shows every document currently in the knowledge base:

| Column | Description |
|---|---|
| Title | Optional label you provided at upload time, or the filename |
| File | The original filename |
| Chunks | The number of vector chunks stored for this document |
| Uploaded | Timestamp when the document was indexed |
Deleting a document
To remove a document, click Delete in its row. SAFIA removes all of its stored vectors and deletes its metadata record from the database. The document is no longer searchable after deletion.
Troubleshooting
The first upload after starting the dashboard is slow
When the admin dashboard starts, it warms up the local ONNX embedding model (~120 MB). This one-time load can take 10–30 seconds on first run or after a cold start. Subsequent uploads are significantly faster because the model stays in memory.
A large PDF times out or fails partway through
Very large files require more memory and processing time. Try these steps:
- Check KB_MAX_UPLOAD_MB in Settings and increase it if your file exceeds the current limit.
- Split large PDFs into smaller files (under 50 MB each) and upload them separately.
- Reduce KB_EMBED_BATCH_SIZE (default: 32) in Settings → Knowledge Base if you are on a memory-constrained machine.
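A smaller batch size helps on constrained machines because only one batch of chunks is passed to the embedding model at a time. The sketch below shows the general pattern. The `embed_batch` callable is a hypothetical stand-in for the embedding model, not SAFIA's actual interface.

```python
def batched(items, batch_size=32):
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(chunks, embed_batch, batch_size=32):
    """Embed chunks one batch at a time.

    `embed_batch` is a hypothetical callable that takes a list of texts
    and returns one vector per text.
    """
    vectors = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

With 70 chunks and a batch size of 32, the model is called three times, on 32, 32, and 6 chunks. Halving the batch size roughly halves the input held in the model at once, at the cost of more calls.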
SAFIA does not seem to use the uploaded document
- Confirm the document appears in the Indexed Documents table with a non-zero Chunks count.
- Make sure your question is phrased in a way that matches the document’s content. Try quoting a specific phrase from the document.
- If you recently uploaded the file, wait a few seconds for indexing to complete before querying.
Uploaded a file but it shows 0 chunks
A chunk count of zero means the parser could not extract any text. This is common with:
- Scanned PDFs (image-only, no text layer) — SAFIA cannot OCR images.
- Password-protected PDFs — remove the password before uploading.
- Empty or corrupt files — verify the file opens correctly on your machine.