VoxCPM 1.5 is an end-to-end text-to-speech (TTS) model from ModelBest. It features zero-shot voice cloning and high-quality speech synthesis capabilities.
Repository: localai
License: apache-2.0
voxcpm-1.5
NeuTTS Air is the world's first super-realistic, on-device TTS speech language model with instant voice cloning. Built on a 0.5B LLM backbone, it brings natural-sounding speech, real-time performance, and speaker cloning to local devices.
Repository: localai
License: apache-2.0
neutts-air
Qwen3-TTS-12Hz-1.7B-CustomVoice, served via vLLM-Omni, is a text-to-speech model from the Alibaba Qwen team with custom voice cloning capabilities. It generates natural-sounding speech with voice personalization.
Repository: localai
License: apache-2.0
vllm-omni-qwen3-tts-custom-voice
ACE-Step 1.5 Turbo is a music generation model that can create music from text descriptions,
lyrics, or audio samples. It supports both simple text-to-music and advanced music generation
with metadata such as BPM, key scale, and time signature.
Repository: localai
License: mit
ace-step-turbo
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
Repository: localai
License: apache-2.0
kokoro
Kokoros is a pure Rust TTS backend using the Kokoro v1.0 ONNX model (82M parameters).
Fast, streaming TTS with high quality. American English with af_heart voice.
Repository: localai
License: apache-2.0
kokoros
Kitten TTS is an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.
Repository: localai
License: apache-2.0
kitten-tts
Chatterbox is Resemble AI's first production-grade open-source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Repository: localai
License: mit
chatterbox
Parler-TTS is a lightweight text-to-speech (TTS) model that can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). It is a reproduction of work from the paper "Natural language guidance of high-fidelity text-to-speech with synthetic annotations" by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively.
Repository: localai
License: apache-2.0
parler-tts-mini-v0.1
Piper is a fast, local neural text-to-speech system that sounds great and is optimized for the Raspberry Pi 4. It is used in a variety of [projects](https://github.com/rhasspy/piper#people-using-piper).
Repository: localai
License: mit
voice-en-us-kathleen-low
Piper is a fast, local neural text-to-speech system that sounds great and is optimized for the Raspberry Pi 4. It is used in a variety of [projects](https://github.com/rhasspy/piper#people-using-piper).
Repository: localai
License: mit
voice-ca-upc_ona-x-low