Telugu voice AI (prepaid, per-second billing)
A Telugu-first voice assistant over a phone call, metered and billed by the second from a prepaid balance.
- Python
- FastAPI
- Telephony (Exotel)
- STT/TTS
- Anthropic API
- Postgres
This is the most ambitious project in the track: a voice assistant a caller talks to in Telugu over a normal phone call, where every second of the call is metered against a prepaid balance. It combines telephony, speech-to-text, an LLM, text-to-speech, and a billing ledger that must never let a caller spend money they do not have.
Overview
A caller dials a number, speaks in Telugu, and hears a spoken reply. Behind the call, audio is transcribed to text, the text goes to an LLM, the reply is synthesised back to Telugu speech, and the whole exchange is metered per second against the caller's prepaid wallet. When the balance runs out, the call ends cleanly. The hard parts are latency, language handling, and getting the billing exactly right.
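The flow above implies two things running at once: the conversation and the meter. A minimal sketch of that shape, assuming plain asyncio tasks, is shown here. `run_call`, `fake_conversation`, and `fake_billing` are hypothetical names used only for illustration; the two stand-ins sleep instead of doing real speech or billing work.

```python
import asyncio

async def run_call(conversation, billing) -> str:
    """Run both coroutines; when either finishes, cancel the other."""
    tasks = {
        asyncio.create_task(conversation, name="conversation"),
        asyncio.create_task(billing, name="billing"),
    }
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancellations to settle so no task outlives the call.
    await asyncio.gather(*pending, return_exceptions=True)
    first = done.pop()
    first.result()  # re-raise if the finished task failed
    return first.get_name()

async def fake_conversation(seconds: float) -> None:
    await asyncio.sleep(seconds)  # stands in for the STT -> LLM -> TTS turns

async def fake_billing(seconds: float) -> None:
    await asyncio.sleep(seconds)  # stands in for "balance ran out after N seconds"

# The billing stand-in finishes first, so the conversation is cancelled.
print(asyncio.run(run_call(fake_conversation(0.2), fake_billing(0.01))))
```

Whichever side ends first decides the call: a hangup by the caller stops the meter, and an empty wallet stops the conversation.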
The problem
Many users are more comfortable speaking Telugu than typing English, and prefer a phone call to an app. Per-minute postpaid billing surprises people with bills; prepaid per-second billing is fair and predictable, but it demands a ledger that meters live and cuts off precisely at zero. Overrun means you lose money on every call; cutting off early means you cheat the customer.
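To make the stakes concrete, here is a small worked example of the per-second arithmetic. It assumes the illustrative rate of 14 paise per second used in the build steps; `talk_time` and `overrun_loss` are hypothetical helper names, not part of any library.

```python
PER_SECOND_PAISE = 14  # illustrative rate, the same figure as the build steps

def talk_time(balance_paise: int, rate_paise: int = PER_SECOND_PAISE) -> tuple[int, int]:
    """Return (whole seconds the balance covers, paise left over)."""
    if rate_paise <= 0:
        raise ValueError("rate must be positive")
    return divmod(max(balance_paise, 0), rate_paise)

def overrun_loss(extra_seconds: int, rate_paise: int = PER_SECOND_PAISE) -> int:
    """Unbilled paise if a call runs past the point where the balance is spent."""
    return extra_seconds * rate_paise

seconds, leftover = talk_time(10_000)  # a Rs 100 top-up is 10,000 paise
print(seconds, leftover)               # 714 4
print(overrun_loss(3))                 # 42
```

Amounts stay in integer paise throughout, which avoids floating-point rounding in money calculations.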
Architecture
Telephony bridges the call audio to your server. A pipeline transcribes, reasons, and synthesises, while a billing loop debits the wallet each second and ends the call when it hits zero.
Caller <--> Telephony (Exotel) <--> media bridge (FastAPI + websocket)
                                          |
                          STT (Telugu) --> LLM --> TTS (Telugu)
                                          |
                   per-second billing loop --> Postgres wallet ledger
                                          |
                              balance <= 0 --> end call
Tech stack
- Python + FastAPI for the media/control server
- A telephony provider (Exotel or similar) for the phone leg
- Telugu speech-to-text and text-to-speech (a provider such as Sarvam)
- Anthropic API for the conversation
- Postgres for the wallet ledger and call records
Build steps
- Model the wallet as an append-only ledger, never a single mutable balance.
CREATE TABLE wallet_entries (
    id BIGSERIAL PRIMARY KEY,
    user_id TEXT NOT NULL,
    delta_paise INTEGER NOT NULL, -- positive = top-up, negative = usage
    reason TEXT NOT NULL,
    call_id TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);
-- Balance is always the sum, computed, not stored:
-- SELECT COALESCE(SUM(delta_paise), 0) FROM wallet_entries WHERE user_id = $1;
- Gate the call at answer time. No balance, no call.
PER_SECOND_PAISE = 14  # set from your cost + margin

async def can_start_call(user_id: str) -> bool:
    balance = await wallet_balance(user_id)
    return balance >= PER_SECOND_PAISE * 10  # require at least ~10 seconds
- Run the billing loop alongside the conversation, debiting each second.
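import asyncio

async def billing_loop(user_id: str, call_id: str, hangup):
    loop = asyncio.get_running_loop()
    next_tick = loop.time()
    while True:
        # Sleep to an absolute deadline so debit latency does not stretch each billed second.
        next_tick += 1
        await asyncio.sleep(max(0.0, next_tick - loop.time()))
        await debit(user_id, PER_SECOND_PAISE, reason="voice_second", call_id=call_id)
        # Stop as soon as the wallet cannot cover another full second.
        if await wallet_balance(user_id) < PER_SECOND_PAISE:
            # Spoken message: "Your balance has run out. The call is ending."
            await speak_telugu("మీ బ్యాలెన్స్ అయిపోయింది. కాల్ ముగుస్తోంది.")
            await hangup()
            return
- Wire the speech pipeline for one turn.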
from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def handle_turn(audio_chunk: bytes) -> bytes:
    text_te = await stt_telugu(audio_chunk)    # speech -> Telugu text
    response = await client.messages.create(   # reason; awaited so the event loop is not blocked
        model="claude-sonnet-4-6", max_tokens=300,
        system="You are a helpful Telugu voice assistant. Keep replies short and spoken.",
        messages=[{"role": "user", "content": text_te}],
    )
    reply = response.content[0].text
    return await tts_telugu(reply)             # text -> Telugu speech
Source code
Reference scaffold with the ledger, billing loop, and a mock telephony harness for local testing: AnythingWithSandy/telugu-voice-ai.
Demo
There is no public demo, because a live demo needs a provisioned phone number and KYC-approved telephony and speech accounts. The repo ships a local simulator that runs the full pipeline over text and fake audio so you can test billing without a phone line.
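To show what testing billing without a phone line can look like, here is a minimal in-memory sketch. It is not the simulator from the repo: `MemoryLedger` and `simulate_call` are hypothetical names, the rate is the illustrative 14 paise from the build steps, and each "second" is a zero-length tick so the run finishes instantly.

```python
import asyncio

PER_SECOND_PAISE = 14  # same illustrative rate as the build steps

class MemoryLedger:
    """Append-only list of (delta_paise, reason); balance is always the sum."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, str]] = []

    def add(self, delta_paise: int, reason: str) -> None:
        self.entries.append((delta_paise, reason))

    def balance(self) -> int:
        return sum(delta for delta, _ in self.entries)

async def simulate_call(ledger: MemoryLedger, max_seconds: int) -> int:
    """Bill one simulated second at a time; return the seconds billed."""
    billed = 0
    while billed < max_seconds and ledger.balance() >= PER_SECOND_PAISE:
        await asyncio.sleep(0)  # zero-length tick instead of a real second
        ledger.add(-PER_SECOND_PAISE, "voice_second")
        billed += 1
    return billed

ledger = MemoryLedger()
ledger.add(100, "top_up")  # Rs 1 of credit
billed = asyncio.run(simulate_call(ledger, max_seconds=60))
print(billed, ledger.balance())  # 7 2
```

The call stops after 7 seconds with 2 paise left, because the remaining balance cannot cover an eighth second.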
Deployment
Provision the telephony number and point its voice webhook at your FastAPI server. Keep all provider keys server-side. Run the speech and LLM calls close to your users to cut latency, and load-test the billing loop to confirm it debits accurately under concurrency and ends calls exactly at zero balance.
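One way to reason about accuracy under concurrency is to make "check the balance, then debit" a single step, so two debits can never both pass the check on the same paise. The sketch below illustrates the idea in memory with an `asyncio.Lock`. The `Ledger` class and `load_test` function are hypothetical. In Postgres, the equivalent would presumably be a transaction that serialises writers per user, for example with a per-user advisory lock, which is an assumption about your schema rather than something the scaffold is known to do.

```python
import asyncio

class Ledger:
    """In-memory append-only ledger; the lock makes check-and-debit one step."""

    def __init__(self, opening_paise: int) -> None:
        self.entries = [opening_paise]
        self._lock = asyncio.Lock()

    def balance(self) -> int:
        return sum(self.entries)

    async def try_debit(self, amount_paise: int) -> bool:
        async with self._lock:
            if self.balance() < amount_paise:
                return False        # refuse rather than go negative
            await asyncio.sleep(0)  # yield, as a real database round trip would
            self.entries.append(-amount_paise)
            return True

async def load_test(start_paise: int, amount_paise: int, attempts: int) -> tuple[int, int]:
    """Fire many debits at once; return (successful debits, final balance)."""
    ledger = Ledger(start_paise)
    results = await asyncio.gather(
        *(ledger.try_debit(amount_paise) for _ in range(attempts))
    )
    return sum(results), ledger.balance()

succeeded, remaining = asyncio.run(load_test(start_paise=140, amount_paise=14, attempts=50))
print(succeeded, remaining)  # 10 0
```

Fifty concurrent attempts against a balance that covers ten seconds yield exactly ten debits and a final balance of zero. Without the lock, the yield between the check and the append would let extra debits through.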
Bonus challenge
Add barge-in (let the caller interrupt the assistant mid-sentence), a daily reconciliation job that checks ledger sums against provider invoices, and a fraud guard that flags abnormal call patterns. Then add an SMS top-up flow so callers can recharge mid-conversation.
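As a starting point for the reconciliation job, here is a minimal sketch that compares billed seconds per call against the provider's reported seconds. It assumes the invoice has already been parsed into a dict of call ID to seconds and that every usage entry is one second at a flat rate. Both are simplifications, and `reconcile` is a hypothetical name.

```python
from collections import defaultdict

def reconcile(ledger_rows, provider_seconds, rate_paise=14):
    """Return {call_id: (billed_seconds, provider_seconds)} for mismatches only.

    ledger_rows: iterable of (call_id, delta_paise); only negative deltas count as usage.
    provider_seconds: dict of call_id -> seconds taken from the provider invoice.
    """
    billed_paise = defaultdict(int)
    for call_id, delta_paise in ledger_rows:
        if delta_paise < 0:
            billed_paise[call_id] += -delta_paise

    mismatches = {}
    for call_id in set(billed_paise) | set(provider_seconds):
        billed = billed_paise.get(call_id, 0) // rate_paise
        reported = provider_seconds.get(call_id, 0)
        if billed != reported:
            mismatches[call_id] = (billed, reported)
    return mismatches

rows = [("call-a", -14), ("call-a", -14), ("call-b", -14)]
invoice = {"call-a": 2, "call-b": 3}
print(reconcile(rows, invoice))  # {'call-b': (1, 3)}
```

A call that appears on only one side is reported with 0 on the other, which surfaces both unbilled provider usage and ledger entries with no matching call.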