Speech Recognition (ASR)
Transcribe spoken audio to text using transcribe() or atranscribe().
Basic usage
from khaya import KhayaClient

with KhayaClient(api_key) as khaya:
    result = khaya.transcribe("recording.wav", "tw")
    print(result.text)  # "me ho yɛ"
The second argument is the language code of the spoken language in the audio.
transcribe() returns a TranscriptionResult with:
| Attribute | Type | Description |
|---|---|---|
| text | str | The transcribed string |
| language | str | Language code of the audio (e.g. "tw") |
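For asynchronous code, atranscribe() can presumably be awaited in the same way. The sketch below assumes that atranscribe() is a coroutine method taking the same arguments as transcribe() and returning the same TranscriptionResult. That signature is an assumption rather than something stated here, and transcribe_many is a hypothetical helper, not part of the khaya package.

```python
import asyncio


async def transcribe_many(client, paths, language):
    """Transcribe several files concurrently and return their texts in order.

    Assumption: client.atranscribe(path, language) is awaitable and returns
    an object with a .text attribute, mirroring transcribe().
    """
    tasks = [client.atranscribe(path, language) for path in paths]
    # gather() preserves the order of its arguments in the returned list
    results = await asyncio.gather(*tasks)
    return [result.text for result in results]
```

From synchronous code this could be driven with asyncio.run(transcribe_many(client, ["a.wav", "b.wav"], "tw")), provided the client supports being used that way.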
Supported languages
| Code | Language |
|---|---|
| ada | Adangme |
| en_gh | African English |
| atw | Akuapem Twi |
| tw | Asante Twi |
| dga | Dagaare |
| dag | Dagbani |
| ee | Ewe |
| fat | Fante |
| fra | French |
| gaa | Ga |
| gon | Gonja |
| gur | Gurene |
| ha | Hausa |
| ig | Igbo |
| kas | Kasem |
| ki | Kikuyu |
| kon_k | Konkomba (Likoonli) |
| kon_l | Konkomba (Likpakpaanl) |
| kri | Krio |
| kus | Kusaal |
| luo | Luo |
| mam | Mampruli |
| men | Mende |
| mer | Meru/Kimeru |
| nzi | Nzema |
| pid | Pidgin |
| sn | Shona |
| sw | Swahili |
| tem | Temne |
| wal | Wali |
| wo | Wolof |
| yo | Yoruba |
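A language code can be checked locally before any audio is sent. This is a minimal sketch: the set is copied from the table above, and both SUPPORTED_ASR_LANGUAGES and check_language are hypothetical names, not part of the khaya package. The service remains the authority on which codes it accepts.

```python
# Codes copied from the supported-languages table; update if the table changes.
SUPPORTED_ASR_LANGUAGES = frozenset({
    "ada", "en_gh", "atw", "tw", "dga", "dag", "ee", "fat",
    "fra", "gaa", "gon", "gur", "ha", "ig", "kas", "ki",
    "kon_k", "kon_l", "kri", "kus", "luo", "mam", "men", "mer",
    "nzi", "pid", "sn", "sw", "tem", "wal", "wo", "yo",
})


def check_language(code):
    """Return the code unchanged, or raise ValueError if it is not listed."""
    if code not in SUPPORTED_ASR_LANGUAGES:
        raise ValueError(f"Unsupported ASR language code: {code!r}")
    return code
```

A call such as khaya.transcribe("recording.wav", check_language("tw")) then fails fast on a typo like "twi" instead of making a request.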
Audio requirements
- Format: WAV (.wav)
- Encoding: PCM (uncompressed)
- Sample rate: 16 kHz recommended
- Channels: Mono
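These requirements can be checked before uploading with Python's standard wave module, which only opens uncompressed PCM WAV files. The helper below is a hypothetical sketch, not part of the khaya package; it treats 16 kHz as the expected rate even though the requirement above lists it as recommended rather than mandatory.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of problems found in a WAV header (empty if none)."""
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # wave rejects non-WAV data and compressed encodings
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

An empty list means the header matches the requirements; anything else is a human-readable reason to convert the file first.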
Convert to the correct format with ffmpeg if needed:
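One way to run the conversion from Python is to call ffmpeg through subprocess. This sketch assumes ffmpeg is installed and on PATH; the helper names are hypothetical, and 16-bit samples (pcm_s16le) are an assumption, since the requirements above specify PCM but not a bit depth.

```python
import subprocess


def ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg argument list that writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file in any format ffmpeg can read
        "-ac", "1",               # downmix to one channel (mono)
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # uncompressed 16-bit PCM (assumed bit depth)
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(ffmpeg_command(src, dst, sample_rate), check=True)
```

The equivalent shell command is the same argument list joined by spaces, for example ffmpeg -y -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le output.wav.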
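Saving the transcript
with KhayaClient(api_key) as khaya:
    result = khaya.transcribe("speech.wav", "tw")

# Write as UTF-8 so characters such as ɛ and ɔ are saved correctly on every platform
with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(result.text)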
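The same pattern extends to a folder of recordings. The sketch below takes an already-open client as a parameter and relies only on transcribe(path, language) returning an object with a .text attribute, as described above; save_transcripts is a hypothetical helper, not part of the khaya package.

```python
from pathlib import Path


def save_transcripts(client, audio_dir, language):
    """Transcribe every .wav file in audio_dir and write a .txt next to it."""
    written = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        result = client.transcribe(str(wav_path), language)
        txt_path = wav_path.with_suffix(".txt")
        # UTF-8 keeps non-ASCII letters intact regardless of the OS default
        txt_path.write_text(result.text, encoding="utf-8")
        written.append(txt_path)
    return written
```

It returns the list of transcript paths it wrote, so the caller can log or post-process them.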
Error handling
from khaya.exceptions import ASRTranscriptionError, AuthenticationError, APIError

try:
    result = khaya.transcribe("speech.wav", "tw")
except ASRTranscriptionError as e:
    # Raised when the file is not found or input is invalid
    print(f"Transcription error: {e.message}")
except AuthenticationError:
    print("Check your API key.")
except APIError as e:
    print(f"API error {e.status_code}: {e.message}")
See Error Handling for the full exception reference.