SAFIA understands more than just text. You can record a quick voice note to capture an expense hands-free, or snap a photo of a receipt and let SAFIA read it for you. Both features count as regular messages, so you never have to retype information that’s already in front of you.

Voice messages

When you send a voice message, SAFIA transcribes the audio using Whisper (via Groq) and then processes the transcript exactly as if you had typed it. The response always comes back as text.
1. Record and send

Hold the microphone button in Telegram and speak your message naturally. SAFIA accepts any length, but shorter messages tend to transcribe more accurately.
🎙 “I just paid 120 thousand for a taxi to the airport”
2. SAFIA downloads the file

SAFIA shows Listening… while it retrieves the audio file from Telegram’s servers. The file is saved temporarily on disk.
3. Transcription runs

The audio is sent to Whisper via Groq for speech-to-text. SAFIA’s status message updates to Thinking… once transcription completes. The local file is deleted immediately after.
4. Transcript is processed

The transcribed text enters SAFIA’s conversation pipeline the same way a typed message would. SAFIA calls any relevant tools — for example, recording the expense — and replies in text.
Noted! I’ve recorded Rp 120.000 for Transportasi 🚕. Your transport spending this week is now Rp 340.000.
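
The four steps above could be sketched roughly as follows. This is a minimal illustration, not SAFIA's actual code: `handle_voice_message`, `transcribe`, `process_text`, and `on_status` are hypothetical names, the `.ogg` suffix is an assumption about the audio format, and the transcription and conversation steps are passed in as plain functions so the sketch runs without Telegram or Groq.

```python
import os
import tempfile
from typing import Callable


def handle_voice_message(
    audio_bytes: bytes,
    transcribe: Callable[[str], str],
    process_text: Callable[[str], str],
    on_status: Callable[[str], None] = lambda status: None,
) -> str:
    """Save the audio, transcribe it, delete the file, then treat the text as typed."""
    on_status("Listening…")
    # Step 2: keep the downloaded audio in a temporary file on disk.
    fd, path = tempfile.mkstemp(suffix=".ogg")  # suffix is an assumption
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        # Step 3: speech-to-text (hypothetical callable standing in for Whisper via Groq).
        transcript = transcribe(path)
    finally:
        # Delete the local file even if transcription raised an error.
        if os.path.exists(path):
            os.remove(path)
    on_status("Thinking…")
    # Step 4: the transcript follows the same path as a typed message.
    return process_text(transcript)
```

Deleting the file in a `finally` block is a design choice in this sketch: it keeps audio from lingering on disk when transcription fails, which matches the intent of removing the file as soon as it is no longer needed.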

Requirements

Voice transcription requires a Groq API key to be configured by the bot administrator. If voice messages are not being processed, contact the person who set up the bot to verify the transcription service is enabled.
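
For administrators, a start-up check along these lines can make a missing key obvious. This is only a sketch: the environment variable name `GROQ_API_KEY` and the helper `voice_enabled` are assumptions, and SAFIA may read its configuration differently.

```python
import os
from typing import Mapping, Optional


def voice_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a non-empty Groq API key is present in the environment."""
    env = os.environ if env is None else env
    # GROQ_API_KEY is an assumed variable name, not confirmed by this page.
    return bool(env.get("GROQ_API_KEY", "").strip())
```

If the check returns `False`, the bot could skip voice handling and tell the user that transcription is not configured.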

Tips for clear transcription

Speak the amounts clearly

Say “seratus dua puluh ribu” or “one hundred twenty thousand” — Whisper handles both Indonesian and English well.

Mention the category

“Taxi fare” or “lunch at the canteen” gives SAFIA enough context to pick the right expense category automatically.

Keep it focused

One topic per voice note produces the cleanest results. If you want to log three separate expenses, three short notes beat one long ramble.

Avoid heavy background noise

Transcription accuracy drops significantly in loud environments like traffic or crowded restaurants.

What SAFIA cannot do with voice

  • Identify speakers — if multiple people speak in one recording, the result may be jumbled.
  • Play audio back — SAFIA always responds in text, never in voice.
  • Process non-financial speech — SAFIA will still reply, but it’s optimised for financial tasks and may not be the right tool for general conversation.

Photo scanning

Send a photo of a receipt, invoice, or payslip and SAFIA will extract the key financial details and record the transaction for you — no typing required.
1. Take or choose a photo

Open Telegram’s attachment menu and send a photo of your document. A direct camera shot works best; screenshots of digital receipts are also fine.
📸 [Photo of a café receipt — subtotal Rp 75.000, discount voucher –Rp 10.000, total Rp 65.000]
2. SAFIA scans the document

SAFIA shows Scanning document… while it sends the image to a vision model. The model reads the text, identifies amounts, and calculates the correct final figure.
For receipts with discounts or vouchers, SAFIA uses the net amount you actually paid — not the pre-discount subtotal. For payslips, it uses your take-home (net) salary, not the gross figure.
3. Extracted data enters the conversation

The structured text from the scan is injected into the conversation as if you had typed it. SAFIA then records the transaction and summarises what it found.
I’ve recorded an expense of Rp 65.000 for Makanan ☕. (Discount of Rp 10.000 applied.)
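
The net-amount rule in the example above comes down to simple arithmetic: Rp 75.000 minus the Rp 10.000 voucher leaves Rp 65.000. The sketch below shows that calculation and the dot-separated rupiah formatting used in SAFIA's replies. The helpers `net_amount` and `format_rupiah` are hypothetical names for illustration; in SAFIA itself the vision model works out the figure.

```python
from typing import Sequence


def net_amount(subtotal: int, discounts: Sequence[int]) -> int:
    """Subtract every discount from the subtotal, whatever sign it was printed with."""
    return subtotal - sum(abs(discount) for discount in discounts)


def format_rupiah(amount: int) -> str:
    """Format an integer amount with dots as thousands separators."""
    return "Rp " + f"{amount:,}".replace(",", ".")


paid = net_amount(75_000, [-10_000])
print(format_rupiah(paid))  # Rp 65.000
```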

What makes a good photo

Good lighting

Shoot in daylight or a well-lit room. Shadows across printed text are the most common cause of extraction errors.

Flat and straight

Lay the receipt on a flat surface and shoot from directly above. Angled shots cause the vision model to misread digits.

Full receipt in frame

Make sure the total line at the bottom is visible. Cropped receipts may cause SAFIA to use a subtotal instead of the final amount.

Avoid glare

Shiny thermal paper (the kind used by most point-of-sale printers) reflects light. Tilt the paper slightly or use diffused light.

Supported document types

Document | What SAFIA extracts
Supermarket / café receipt | Items, discounts, final total
Restaurant bill | Total charged, any service fee
Online-shopping invoice | Order total after vouchers
Payslip / salary slip | Net take-home amount, pay period
Utility or phone bill | Amount due

What SAFIA cannot process

SAFIA’s photo scanner is built exclusively for financial documents. If you send a non-financial image — a selfie, a landscape, a meme — SAFIA will respond with:
Couldn’t read a document from this photo. Send a clear photo of an invoice, payslip, or receipt.
It will not describe or comment on non-financial images.

Even with financial documents, the scanner can struggle with:
  • Very blurry or dark photos — the vision model needs legible text. Re-take the photo in better conditions.
  • Handwritten totals with unclear handwriting — printed text extracts reliably; messy handwriting may be misread.
  • Multi-page documents sent as a single photo — send each page separately if the relevant total is on a different page.
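
The fallback behaviour described in this section might look something like the sketch below. It assumes the scan step returns either extracted text or nothing at all when no financial document is found; that return convention and the name `reply_for_scan` are hypothetical, not taken from SAFIA's code.

```python
from typing import Optional

FALLBACK_REPLY = (
    "Couldn’t read a document from this photo. "
    "Send a clear photo of an invoice, payslip, or receipt."
)


def reply_for_scan(extracted_text: Optional[str]) -> Optional[str]:
    """Return the fallback reply when the scan found nothing usable, else None."""
    if extracted_text is None or not extracted_text.strip():
        return FALLBACK_REPLY
    # Usable text continues into the normal conversation pipeline instead.
    return None
```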
After SAFIA records an expense from a photo, you can immediately follow up with a text message to correct any detail — for example, “actually that was for work travel, change the category to Transport.” SAFIA will update the record in the same conversation turn.