Speech Recognition (ASR)

Transcribe spoken audio to text using transcribe() or atranscribe().

Basic usage

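from khaya import KhayaClient

api_key = "YOUR_API_KEY"  # placeholder; substitute your own key

with KhayaClient(api_key) as khaya:
    # Arguments: path to the audio file, then the language code
    result = khaya.transcribe("recording.wav", "tw")
    print(result.text)  # "me ho yɛ"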

The second argument is the language code of the spoken language in the audio.

transcribe() returns a TranscriptionResult with:

Attribute   Type   Description
text        str    The transcribed string
language    str    Language code of the audio (e.g. "tw")
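
The page names atranscribe() but does not show it in use. The sketch below transcribes several files concurrently. It assumes, without confirmation from this page, that atranscribe() is awaitable and takes the same arguments and returns the same TranscriptionResult as transcribe(). The helper name transcribe_all is hypothetical.

```python
import asyncio


async def transcribe_all(client, paths, language):
    """Transcribe several files concurrently; return their texts in input order.

    ASSUMPTION: client.atranscribe(path, language) is awaitable and mirrors
    the arguments and return type of transcribe().
    """
    tasks = [client.atranscribe(path, language) for path in paths]
    # gather() preserves the order of the awaitables it was given.
    results = await asyncio.gather(*tasks)
    return [result.text for result in results]
```

If that assumption holds, calling asyncio.run(transcribe_all(khaya, ["a.wav", "b.wav"], "tw")) would return one string per file.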

Supported languages

Code    Language
ada     Adangme
en_gh   African English
atw     Akuapem Twi
tw      Asante Twi
dga     Dagaare
dag     Dagbani
ee      Ewe
fat     Fante
fra     French
gaa     Ga
gon     Gonja
gur     Gurene
ha      Hausa
ig      Igbo
kas     Kasem
ki      Kikuyu
kon_k   Konkomba (Likoonli)
kon_l   Konkomba (Likpakpaanl)
kri     Krio
kus     Kusaal
luo     Luo
mam     Mampruli
men     Mende
mer     Meru/Kimeru
nzi     Nzema
pid     Pidgin
sn      Shona
sw      Swahili
tem     Temne
wal     Wali
wo      Wolof
yo      Yoruba
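
A mistyped language code is cheaper to catch before any audio is uploaded. The sketch below is a local check built from the codes in the table above. The names SUPPORTED_ASR_LANGUAGES and require_supported_language are hypothetical and not part of the library, and the set would need updating by hand if the table changes.

```python
# Codes copied from the supported-languages table on this page.
SUPPORTED_ASR_LANGUAGES = frozenset({
    "ada", "en_gh", "atw", "tw", "dga", "dag", "ee", "fat", "fra", "gaa", "gon",
    "gur", "ha", "ig", "kas", "ki", "kon_k", "kon_l", "kri", "kus", "luo", "mam",
    "men", "mer", "nzi", "pid", "sn", "sw", "tem", "wal", "wo", "yo",
})


def require_supported_language(code: str) -> str:
    """Return code unchanged, or raise ValueError if it is not in the table."""
    if code not in SUPPORTED_ASR_LANGUAGES:
        raise ValueError(f"Unsupported ASR language code: {code!r}")
    return code
```

It can wrap the second argument of a call, for example khaya.transcribe("recording.wav", require_supported_language("tw")).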

Audio requirements

  • Format: WAV (.wav)
  • Encoding: PCM (uncompressed)
  • Sample rate: 16 kHz recommended
  • Channels: Mono

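If your audio is in another format, convert it with ffmpeg. In the command below, -ar 16000 sets the sample rate to 16 kHz, -ac 1 downmixes to mono, and -c:a pcm_s16le selects uncompressed 16-bit PCM (already ffmpeg's default for .wav output, stated here explicitly):

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav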
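
The requirements above can also be checked locally before uploading. This is a minimal sketch using Python's standard wave module, which only reads uncompressed PCM WAV files. The function name check_wav is hypothetical, and the checks reflect the list on this page rather than the service's own validation.

```python
import wave


def check_wav(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks fine."""
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # The wave module rejects compressed, truncated, or non-WAV input.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz; 16000 Hz is recommended")
    return problems
```

A missing file still raises FileNotFoundError, so path mistakes stay distinct from format problems.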

Saving the transcript

with KhayaClient(api_key) as khaya:
    result = khaya.transcribe("speech.wav", "tw")
    # Write as UTF-8 so characters such as "ɛ" survive on every platform
    with open("transcript.txt", "w", encoding="utf-8") as f:
        f.write(result.text)
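
The same pattern extends to a folder of recordings. The sketch below relies only on transcribe(path, language) and result.text as shown on this page. The helper save_transcripts and its parameters are hypothetical, and it assumes every file in the folder is spoken in the same language.

```python
from pathlib import Path


def save_transcripts(client, audio_dir, language, out_dir):
    """Transcribe every .wav in audio_dir and write one .txt per file to out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for wav_file in sorted(Path(audio_dir).glob("*.wav")):
        result = client.transcribe(str(wav_file), language)
        target = out_path / f"{wav_file.stem}.txt"
        # UTF-8 keeps non-ASCII characters in the transcript intact.
        target.write_text(result.text, encoding="utf-8")
        written.append(target)
    return written
```

Inside a with KhayaClient(api_key) as khaya: block, save_transcripts(khaya, "recordings", "tw", "transcripts") would return the list of files it wrote.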

Error handling

from khaya.exceptions import ASRTranscriptionError, AuthenticationError, APIError

try:
    # khaya is an open KhayaClient, created as in Basic usage
    result = khaya.transcribe("speech.wav", "tw")
except ASRTranscriptionError as e:
    # Raised when the file is not found or the input is invalid
    print(f"Transcription error: {e.message}")
except AuthenticationError:
    print("Check your API key.")
except APIError as e:
    print(f"API error {e.status_code}: {e.message}")

See Error Handling for the full exception reference.