Model Gallery

Discover and install AI models from our curated collection

90 models available

1 repository

lfm2.5-8b-a1b

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

- **On-device personal assistant**: Designed to power real-life applications, chaining tool calls, and following complex instructions on all devices.
- **Compressed performance**: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
- **Unmatched throughput**: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Find more information about LFM2.5-8B-A1B in our blog post.

AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on Artificial Analysis.

Model Details: LFM2.5-8B-A1B is a general-purpose text-only model with the following features: ...

Repository: localai | License: other

qwen_qwen3.5-35b-a3b

Qwen3.5-35B-A3B is a quantized multimodal mixture-of-experts (MoE) language model with 35B total parameters, of which about 3B are active per token (the "A3B" in its name). It supports image-text understanding and chat interactions via the llama-cpp backend.

Repository: localai | License: apache-2.0

qwen_qwen3.5-0.8b

Qwen3.5-0.8B is a 0.8B-parameter model quantized for the llama-cpp backend. It supports chat interactions and multimodal image-text inputs.

Repository: localai | License: apache-2.0

qwen_qwen3.5-2b

Qwen3.5-2B is a highly efficient, instruction-tuned multilingual language model available in various quantized GGUF formats. Optimized for llama-cpp inference, it supports chat and completion tasks with strong performance on low-RAM hardware. The model is available in multiple quantization levels ranging from Q8_0 to IQ2_M to balance quality and resource usage.

Repository: localai | License: apache-2.0

qwen_qwen3.5-4b

Qwen3.5-4B is a multimodal LLM with 4 billion parameters, optimized for chat and vision tasks. This GGUF quantized version enables efficient local inference via the llama-cpp backend and accepts both text and image input.

Repository: localai | License: apache-2.0

qwen_qwen3-next-80b-a3b-thinking

Repository: localai | License: apache-2.0

acestep-cpp-turbo

ACE-Step 1.5 Turbo (C++ / GGML) — native C++ music generation from text descriptions and lyrics. Two-stage pipeline: text-to-code (Qwen3 LM) + code-to-audio (DiT-VAE). Stereo 48kHz output. Uses Q8_0 quantized models for a good balance of quality and speed.

Repository: localai | License: mit

acestep-cpp-turbo-4b

ACE-Step 1.5 Turbo (C++ / GGML) with 4B LM — higher quality music generation from text and lyrics. Uses the larger 4B parameter LM for better metadata/code generation. Stereo 48kHz output.

Repository: localai | License: mit

vibevoice-cpp

VibeVoice Realtime 0.5B (C++ / GGML, Q8_0) - native C++ port of Microsoft VibeVoice via the vibevoice-cpp backend. 24kHz mono TTS with voice cloning from a single reference voice prompt. Default voice prompt: en-Carter_man.

Repository: localai | License: mit

vibevoice-cpp-asr

VibeVoice ASR 7B (C++ / GGML, Q4_K) - long-form speech-to-text with speaker diarization. Returns per-speaker JSON segments with start/end timestamps. English-only. ~10 GB download.

Repository: localai | License: mit
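
The per-speaker JSON segments that vibevoice-cpp-asr is described as returning lend themselves to simple post-processing, such as totalling how long each speaker talked. The sketch below is illustrative only: the field names `speaker`, `start`, `end`, and `text`, and the sample values, are assumptions and not a confirmed response schema.

```python
import json
from collections import defaultdict

# Hypothetical response body: the field names ("speaker", "start", "end",
# "text") are assumed for illustration and may differ from the real output.
SAMPLE = """
[
  {"speaker": "S1", "start": 0.0, "end": 2.5, "text": "Hello there."},
  {"speaker": "S2", "start": 2.5, "end": 4.0, "text": "Hi."},
  {"speaker": "S1", "start": 4.0, "end": 7.0, "text": "How are you?"}
]
"""


def talk_time_by_speaker(segments):
    """Sum the duration (end - start, in seconds) of each speaker's segments."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


segments = json.loads(SAMPLE)
totals = talk_time_by_speaker(segments)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

If the real output uses different keys, only the three dictionary lookups inside `talk_time_by_speaker` would need to change.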

qwen3-tts-cpp

Qwen3-TTS 0.6B Base (C++ / GGML, qwentts.cpp). Native C++ text-to-speech with streaming output and zero-shot voice cloning (set `voice` to a 24kHz reference .wav). 24kHz mono, 11 languages with Mandarin dialects. Q8_0 (~0.95 GB talker).

Repository: localai | License: mit
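
Because the qwen3-tts-cpp entry asks for a 24kHz reference .wav when cloning a voice, it may be worth checking a clip's sample rate before passing it as `voice`. A minimal sketch using only Python's standard `wave` module follows; the helper names and the generated silent clips are hypothetical stand-ins, and nothing here calls the model itself.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_24khz_reference(path):
    """True if the WAV file is sampled at 24 kHz, as the entry asks."""
    return wav_sample_rate(path) == 24000


def write_silence(path, rate, seconds=1):
    """Write a mono 16-bit WAV of silence; a stand-in for a real voice clip."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "ref_24k.wav")
    bad = os.path.join(tmp, "ref_44k.wav")
    write_silence(good, 24000)
    write_silence(bad, 44100)
    good_ok = is_24khz_reference(good)
    bad_ok = is_24khz_reference(bad)

print(good_ok, bad_ok)
```

A clip that fails the check would need resampling with an audio tool of your choice before use; that step is outside this sketch.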

qwen3-tts-cpp-0.6b-base-q4

Qwen3-TTS 0.6B Base (C++ / GGML, qwentts.cpp), Q4_K_M (~0.6 GB talker). Streaming + voice cloning, 24kHz mono, 11 languages.

Repository: localai | License: mit

qwen3-tts-cpp-1.7b-base

Qwen3-TTS 1.7B Base (C++ / GGML, qwentts.cpp), Q8_0 (~2.0 GB talker). Higher-quality streaming + voice cloning, 24kHz mono, 11 languages.

Repository: localai | License: mit

qwen3-tts-cpp-1.7b-base-q4

Qwen3-TTS 1.7B Base (C++ / GGML, qwentts.cpp), Q4_K_M (~1.2 GB talker). Streaming + voice cloning, 24kHz mono, 11 languages.

Repository: localai | License: mit

qwen3-tts-cpp-customvoice

Qwen3-TTS 0.6B CustomVoice (C++ / GGML, qwentts.cpp), Q8_0. Named speakers selected via the `voice` field: serena, vivian, uncle_fu, ryan, aiden, ono_anna, sohee, eric (sichuan dialect), dylan (beijing dialect). Streaming, 24kHz mono, 11 languages.

Repository: localai | License: mit
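
Since the CustomVoice models pick a named speaker through the `voice` field, a client can validate the name before sending anything. The sketch below builds a request body in the shape of the OpenAI speech API (`model`, `input`, `voice`); that this shape applies here is an assumption, the function name is hypothetical, and the speaker list is copied from the entry above.

```python
import json

# Speaker names as listed in the qwen3-tts-cpp-customvoice entry.
SPEAKERS = {
    "serena", "vivian", "uncle_fu", "ryan", "aiden",
    "ono_anna", "sohee", "eric", "dylan",
}


def build_speech_request(text, voice, model="qwen3-tts-cpp-customvoice"):
    """Build a request body dict; raises ValueError for an unknown speaker.

    The key names follow the OpenAI speech API shape, which is assumed
    (not confirmed) to be what the server expects.
    """
    if voice not in SPEAKERS:
        known = ", ".join(sorted(SPEAKERS))
        raise ValueError(f"unknown voice {voice!r}; expected one of: {known}")
    return {"model": model, "input": text, "voice": voice}


payload = build_speech_request("Hello from the gallery.", "serena")
print(json.dumps(payload, indent=2))
```

Validating locally keeps a typo in the speaker name from turning into a failed request.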

qwen3-tts-cpp-customvoice-q4

Qwen3-TTS 0.6B CustomVoice (C++ / GGML, qwentts.cpp), Q4_K_M. Named speakers via the `voice` field (serena, vivian, ryan, aiden, eric, dylan, ...). Streaming, 24kHz mono, 11 languages.

Repository: localai | License: mit

qwen3-tts-cpp-1.7b-customvoice

Qwen3-TTS 1.7B CustomVoice (C++ / GGML, qwentts.cpp), Q8_0. Named speakers via the `voice` field (serena, vivian, ryan, aiden, eric, dylan, ...). Streaming, 24kHz mono, 11 languages.

Repository: localai | License: mit

qwen3-tts-cpp-1.7b-customvoice-q4

Qwen3-TTS 1.7B CustomVoice (C++ / GGML, qwentts.cpp), Q4_K_M. Named speakers via the `voice` field. Streaming, 24kHz mono, 11 languages.

Repository: localai | License: mit

qwen3-tts-cpp-1.7b-voicedesign

Qwen3-TTS 1.7B VoiceDesign (C++ / GGML, qwentts.cpp), Q8_0. Synthesises a speaker from a free-text attribute instruction - REQUIRES the OpenAI `instructions` field (e.g. "male, young adult, moderate pitch"); requests without it are rejected. Streaming, 24kHz mono, 11 languages.

Repository: localai | License: mit
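
The VoiceDesign entry states that requests without the `instructions` field are rejected, so it can help to enforce that rule on the client side. The following is a minimal sketch, assuming an OpenAI-style request body with `model`, `input`, and `instructions` keys; the function name is hypothetical, and the example instruction string is the one quoted in the entry.

```python
import json


def build_voicedesign_request(text, instructions,
                              model="qwen3-tts-cpp-1.7b-voicedesign"):
    """Build a request body dict, refusing an empty `instructions` value.

    The key names follow the OpenAI speech API shape, which is assumed
    (not confirmed) to be what the server expects.
    """
    if not instructions or not instructions.strip():
        raise ValueError("VoiceDesign requests need a non-empty `instructions` value")
    return {
        "model": model,
        "input": text,
        "instructions": instructions.strip(),
    }


request_body = build_voicedesign_request(
    "Welcome to the model gallery.",
    "male, young adult, moderate pitch",
)
print(json.dumps(request_body, indent=2))
```

Raising early mirrors the server-side rejection described above and gives the caller a clearer error message.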

qwen3-tts-cpp-1.7b-voicedesign-q4

Qwen3-TTS 1.7B VoiceDesign (C++ / GGML, qwentts.cpp), Q4_K_M. Synthesises a speaker from a free-text attribute instruction - REQUIRES the `instructions` field. Streaming, 24kHz mono, 11 languages.

Repository: localai | License: mit

omnivoice-cpp

OmniVoice (C++ / GGML) - native text-to-speech with voice cloning and voice design. 24kHz mono output, 646 languages, streaming synthesis. Q8_0 GGUFs (~945 MB total): 612M Qwen3 backbone + RVQ audio codec.

Repository: localai | License: apache-2.0
