Model Gallery

Discover and install AI models from our curated collection

11 models available

1 repository

voxcpm-1.5

VoxCPM 1.5 is an end-to-end text-to-speech (TTS) model from ModelBest. It features zero-shot voice cloning and high-quality speech synthesis capabilities.

Repository: localai
License: apache-2.0

neutts-air

NeuTTS Air is billed as the world's first super-realistic, on-device TTS speech language model with instant voice cloning. Built on a 0.5B LLM backbone, it brings natural-sounding speech, real-time performance, and speaker cloning to local devices.

Repository: localai
License: apache-2.0

vllm-omni-qwen3-tts-custom-voice

Qwen3-TTS-12Hz-1.7B-CustomVoice via vLLM-Omni - a text-to-speech model from the Alibaba Qwen team with custom voice cloning. It generates natural-sounding speech with voice personalization.

Repository: localai
License: apache-2.0

vibevoice-cpp

VibeVoice Realtime 0.5B (C++ / GGML, Q8_0) - native C++ port of Microsoft VibeVoice via the vibevoice-cpp backend. 24kHz mono TTS with voice cloning from a single reference voice prompt. Default voice prompt: en-Carter_man.

Repository: localai
License: mit

qwen3-tts-cpp

Qwen3-TTS 0.6B Base (C++ / GGML, qwentts.cpp). Native C++ text-to-speech with streaming output and zero-shot voice cloning (set `voice` to a 24kHz reference .wav). 24kHz mono, 11 languages with Mandarin dialects. Q8_0 (~0.95 GB talker).

Repository: localai
License: mit
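
The qwen3-tts-cpp description above asks for `voice` to point at a 24kHz reference .wav. A minimal sketch of how one might check a file's sample rate before using it, with Python's standard `wave` module. The helper name `check_reference_wav` is hypothetical and not part of the gallery or the backend, and the sketch only inspects the file header; it does not call the model.

```python
import wave


def check_reference_wav(path, expected_rate=24000):
    """Return (rate_ok, sample_rate, channels) for a PCM WAV file.

    Hypothetical helper: it only reads the WAV header and does not
    talk to any TTS backend. The 24000 Hz default comes from the
    gallery description; the `wave` module handles PCM WAV only.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
    return sample_rate == expected_rate, sample_rate, channels
```

If the check fails, resampling the clip to 24kHz with an audio tool before setting `voice` is one option; whether the backend resamples on its own is not stated in the description.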

qwen3-tts-cpp-0.6b-base-q4

Qwen3-TTS 0.6B Base (C++ / GGML, qwentts.cpp), Q4_K_M (~0.6 GB talker). Streaming + voice cloning, 24kHz mono, 11 languages.

Repository: localai
License: mit

qwen3-tts-cpp-1.7b-base

Qwen3-TTS 1.7B Base (C++ / GGML, qwentts.cpp), Q8_0 (~2.0 GB talker). Higher-quality streaming + voice cloning, 24kHz mono, 11 languages.

Repository: localai
License: mit

qwen3-tts-cpp-1.7b-base-q4

Qwen3-TTS 1.7B Base (C++ / GGML, qwentts.cpp), Q4_K_M (~1.2 GB talker). Streaming + voice cloning, 24kHz mono, 11 languages.

Repository: localai
License: mit

omnivoice-cpp

OmniVoice (C++ / GGML) - native text-to-speech with voice cloning and voice design. 24kHz mono output, 646 languages, streaming synthesis. Q8_0 GGUFs (~945 MB total): 612M Qwen3 backbone + RVQ audio codec.

Repository: localai
License: apache-2.0

omnivoice-cpp-hq

OmniVoice (C++ / GGML), BF16 high-quality variant - text-to-speech with voice cloning and voice design. 24kHz mono, 646 languages, streaming. BF16 GGUFs (~1.6 GB total).

Repository: localai
License: apache-2.0

fish-speech-s2-pro

Fish Speech S2-Pro is a high-quality text-to-speech model supporting voice cloning via reference audio. It uses a two-stage pipeline: a LLaMA-based model converts text to semantic tokens, and a DAC decoder then converts those tokens to audio.

Repository: localai
License: fish-audio-research-license