Text-to-Speech (TTS)
Convert text to spoken audio using synthesize() or asynthesize().
Basic usage
from khaya import KhayaClient

with KhayaClient(api_key) as khaya:
    result = khaya.synthesize("Maakye", "twi")
    result.save("output.wav")
Or access the raw bytes directly via result.audio:
with KhayaClient(api_key) as khaya:
    result = khaya.synthesize("Maakye", "twi")

with open("output.wav", "wb") as f:
    f.write(result.audio)
synthesize() returns a SynthesisResult with:
| Attribute | Type | Description |
|---|---|---|
| audio | bytes | Raw audio bytes |
| save(path) | method | Write audio to a file |
Note
TTS language codes differ from ASR codes for the same language.
For example, Asante Twi is "tw" in ASR but "twi" in TTS.
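Because the two code sets differ, an application that uses both ASR and TTS may prefer an explicit mapping over reusing one code for both. The following is a minimal sketch: `ASR_TO_TTS` and `tts_code_for` are hypothetical names, not part of the library, and the table contains only the one pair documented in the note above. Any further entries would need to be taken from the ASR and TTS language tables.

```python
# Hypothetical lookup table; only the "tw" -> "twi" pair is documented above.
ASR_TO_TTS = {
    "tw": "twi",  # Asante Twi
}


def tts_code_for(asr_code: str) -> str:
    """Return the TTS code for an ASR code, failing loudly on unmapped codes."""
    try:
        return ASR_TO_TTS[asr_code]
    except KeyError:
        raise ValueError(f"No TTS code mapped for ASR code {asr_code!r}") from None
```

Raising on an unmapped code, rather than passing it through unchanged, avoids silently sending an ASR code to the TTS endpoint.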
Supported languages
| Code | Language |
|---|---|
| ada | Adangme |
| atw | Akuapem Twi |
| twi | Asante Twi |
| dag | Dagbani |
| dga | Dagaare |
| ewe | Ewe |
| fat | Fante |
| fra | French |
| gaa | Ga |
| gjn | Gonja |
| gur | Gurene |
| hau | Hausa |
| ibo | Igbo |
| xsm | Kasem |
| kik | Kikuyu |
| xon | Konkomba (Likpakpaanl) |
| lxn | Konkomba (Likoonli) |
| kri | Krio |
| kus | Kusaal |
| luo | Luo |
| maw | Mampruli |
| men | Mende |
| mer | Meru/Kimeru |
| nzi | Nzema |
| pcm | Pidgin |
| sna | Shona |
| swa | Swahili |
| tem | Temne |
| wlx | Wali |
| wol | Wolof |
| yor | Yoruba |
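To catch a mistyped code before making a request, the list above can be kept as a set and checked locally. This is a sketch, not library behavior: `TTS_LANGUAGES` and `check_tts_language` are hypothetical names, the codes are copied from the table above, and the client may already perform its own validation.

```python
# Codes copied from the supported-languages table; update if the API adds more.
TTS_LANGUAGES = frozenset({
    "ada", "atw", "twi", "dag", "dga", "ewe", "fat", "fra", "gaa", "gjn",
    "gur", "hau", "ibo", "xsm", "kik", "xon", "lxn", "kri", "kus", "luo",
    "maw", "men", "mer", "nzi", "pcm", "sna", "swa", "tem", "wlx", "wol",
    "yor",
})


def check_tts_language(code: str) -> str:
    """Return the code unchanged if it is listed, otherwise raise ValueError."""
    if code not in TTS_LANGUAGES:
        raise ValueError(f"Unsupported TTS language code: {code!r}")
    return code
```

For example, `check_tts_language("tw")` raises, which flags the ASR code for Asante Twi being used where the TTS code `"twi"` is expected.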
Speaker voices
All languages share the same multilingual speaker pool. Pass a speaker to control the voice:
| Speaker | Description |
|---|---|
| "male_low" | Male, lower pitch |
| "male_high" | Male, higher pitch |
| "female" | Female |
with KhayaClient(api_key) as khaya:
    result = khaya.synthesize("Maakye", "twi", speaker="female")
    result.save("output.wav")
The speaker argument is optional; the API uses a default voice when it is omitted.
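To compare the voices side by side, the same text can be synthesized once per speaker. The sketch below relies only on the `synthesize(text, language, speaker=...)` call and the `save(path)` method shown above. `save_all_voices` and `SPEAKERS` are hypothetical names, and `client` is assumed to be an open `KhayaClient`.

```python
from pathlib import Path

# Speaker names as listed in the speaker table above.
SPEAKERS = ("male_low", "male_high", "female")


def save_all_voices(client, text: str, language: str, out_dir: str) -> list[Path]:
    """Synthesize `text` once per speaker and save each result as <speaker>.wav."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for speaker in SPEAKERS:
        result = client.synthesize(text, language, speaker=speaker)
        path = target / f"{speaker}.wav"
        result.save(str(path))
        paths.append(path)
    return paths
```

Inside a `with KhayaClient(api_key) as khaya:` block, a call such as `save_all_voices(khaya, "Maakye", "twi", "voices")` would write one file per voice into the `voices` directory.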
Playing audio directly
Use any audio library to play back without saving to disk:
# with sounddevice + soundfile
import io

import soundfile as sf
import sounddevice as sd

with KhayaClient(api_key) as khaya:
    result = khaya.synthesize("Maakye", "twi")

data, samplerate = sf.read(io.BytesIO(result.audio))
sd.play(data, samplerate)
sd.wait()
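Before playback it can be useful to check how long the returned clip is. The sketch below uses only the standard-library `wave` module and assumes that `result.audio` holds a complete PCM WAV file, which the `.wav` file names in these examples suggest but this page does not confirm. `wav_duration_seconds` is a hypothetical helper name.

```python
import io
import wave


def wav_duration_seconds(audio: bytes) -> float:
    """Return the duration of in-memory WAV data (assumes PCM WAV bytes)."""
    with wave.open(io.BytesIO(audio), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

If the bytes are not PCM WAV, `wave.open` raises `wave.Error`, which is itself a quick way to test the format assumption.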
Synthesizing longer text
The API has a per-request character limit. For longer content, split the text into sentences, synthesize each one, and combine the audio:
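import io
import re
import wave

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

with KhayaClient(api_key) as khaya:
    chunks = split_sentences(long_text)
    audio_parts = [khaya.synthesize(chunk, "twi").audio for chunk in chunks]

# Joining the raw bytes would leave one WAV header per chunk in the file,
# so merge the frames instead. Assumes every chunk is PCM WAV with the same
# channel count, sample width, and sample rate.
with wave.open("output.wav", "wb") as out:
    for i, part in enumerate(audio_parts):
        with wave.open(io.BytesIO(part), "rb") as src:
            if i == 0:
                out.setparams(src.getparams())
            out.writeframes(src.readframes(src.getnframes()))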
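Splitting on sentence boundaries does not by itself guarantee that each piece fits the limit, and it issues one request per sentence. A possible refinement is to pack consecutive sentences into chunks up to a size budget. In the sketch below, `pack_chunks` is a hypothetical helper and `max_chars` is an assumed parameter: the actual limit is not stated on this page and should be taken from the API documentation.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pack_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk,
    so the caller still has to handle that case.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to `khaya.synthesize(chunk, "twi")` in place of a single sentence.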
Error handling
from khaya import KhayaClient
from khaya.exceptions import TTSGenerationError, AuthenticationError, APIError

with KhayaClient(api_key) as khaya:
    try:
        result = khaya.synthesize("Maakye", "twi")
    except TTSGenerationError as e:
        # Raised when text or language is empty
        print(f"TTS error: {e.message}")
    except AuthenticationError:
        print("Check your API key.")
    except APIError as e:
        print(f"API error {e.status_code}: {e.message}")
See Error Handling for the full exception reference.
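For transient failures, a small retry wrapper with exponential backoff is one option. The sketch below is generic and not part of the library: `with_retries` is a hypothetical helper, and the choice of which exceptions to retry is left to the caller. Retrying on `APIError` is an assumption about which failures are transient. If `AuthenticationError` turns out to be a subclass of `APIError`, it would be retried too, so check the exception reference first.

```python
import time


def with_retries(call, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on an exception in retry_on, wait and try again.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises
    the last exception once the attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call might look like `with_retries(lambda: khaya.synthesize("Maakye", "twi"), retry_on=APIError)`. The `sleep` parameter exists so the delay can be replaced in tests.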