LocalAI - Models

serenity-26b-a4b

Links

https://huggingface.co/ReadyArt/Serenity-26B-A4B-GGUF

Tags

melody1437-26b-a4b-v2.0

Links

https://huggingface.co/ReadyArt/Melody1437-26B-A4B-v2.0-GGUF

Tags

dark-scarlett-v0.3-26b-a4b

License: Apache 2.0 | Authors: Google DeepMind

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

Gemma 4 introduces key **capability and architectural advancements**:

* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. ...

Links

https://huggingface.co/ReadyArt/Dark-Scarlett-v0.3-26B-A4B-GGUF

Tags

gemma-4-26b-a4b-it-qat

License: Apache 2.0 | Authors: Google DeepMind

> [!Note]
> This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model.
>
> Four versions of the QAT checkpoints are available:
> * **Unquantized QAT checkpoints** (Q4_0): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models.
> * **GGUF** (Q4_0): Ready-to-deploy formats for broad ecosystem compatibility. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B.
> * **Mobile-optimized** (wNa8o8): A custom schema engineered explicitly for mobile hardware efficiency. It features targeted 2-bit decoding layers, optimized KV caches, and static activations to maximize VRAM savings. Available for Gemma 4 E2B and E4B.
> * **Compressed Tensors** (w4a16): QAT checkpoints serialized in the compressed-tensors format for native, optimized inference with vLLM. Available for Gemma 4 E2B, E4B, 12B ...

Links

https://huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF

Tags

gemma-4-26b-a4b-it-qat-q4_0

Gemma 4 26B-A4B is a multimodal (text + image) instruction-tuned Mixture-of-Experts model from Google DeepMind, optimized with Quantization-Aware Training (QAT) to preserve bfloat16-level quality at a fraction of the memory. With 26B total parameters and ~4B active per token, it delivers large-model quality at a much lower inference cost. This is the official Google Q4_0 GGUF, shipped with its multimodal projector. License: Apache 2.0 | Authors: Google DeepMind
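
As a rough illustration of the memory saving mentioned above, the following sketch estimates weight storage for 26B parameters in bfloat16 versus Q4_0. It assumes the usual GGUF Q4_0 layout (blocks of 32 weights, each block holding 32 four-bit values plus one 16-bit scale, so 18 bytes per block) and assumes every tensor is quantized that way, which real files do not do exactly: some tensors typically stay at higher precision, and the multimodal projector is stored separately. The parameter count is the nominal 26B from the description, and the helper names are illustrative only.

```python
# Rough weight-memory estimate; every figure here is an approximation.
Q4_0_BLOCK_WEIGHTS = 32      # weights per Q4_0 block (assumed standard layout)
Q4_0_BLOCK_BYTES = 16 + 2    # 32 x 4-bit values (16 bytes) + one 16-bit scale


def bf16_bytes(n_params: int) -> int:
    """bfloat16 stores each weight in 2 bytes."""
    return n_params * 2


def q4_0_bytes(n_params: int) -> int:
    """Q4_0 stores weights in fixed-size blocks; round up to whole blocks."""
    blocks = -(-n_params // Q4_0_BLOCK_WEIGHTS)  # ceiling division
    return blocks * Q4_0_BLOCK_BYTES


def to_gib(n_bytes: int) -> float:
    return n_bytes / 2**30


TOTAL_PARAMS = 26_000_000_000  # nominal total from the description

print(f"bf16 : {to_gib(bf16_bytes(TOTAL_PARAMS)):.1f} GiB")
print(f"Q4_0 : {to_gib(q4_0_bytes(TOTAL_PARAMS)):.1f} GiB")
print(f"bits per weight in Q4_0: {Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS}")
```

Under these assumptions the weights take roughly 13.6 GiB in Q4_0 against 48.4 GiB in bfloat16. The ~4B active parameters per token reduce compute per token rather than storage: all experts generally still have to be loaded.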

Links

https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gguf

Tags

supergemma4-26b-uncensored-v2

License: Apache 2.0 | Authors: Google DeepMind

Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages.

Featuring both Dense and Mixture-of-Experts (MoE) architectures, Gemma 4 is well-suited for tasks like text generation, coding, and reasoning. The models are available in four distinct sizes: **E2B**, **E4B**, **26B A4B**, and **31B**. Their diverse sizes make them deployable in environments ranging from high-end phones to laptops and servers, democratizing access to state-of-the-art AI.

Gemma 4 introduces key **capability and architectural advancements**:

* **Reasoning** – All models in the family are designed as highly capable reasoners, with configurable thinking modes. ...

Links

https://huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2

Tags

gemma-4-26b-a4b-it-apex

AI model: gemma-4-26b-a4b-it-apex

Links

https://huggingface.co/mudler/gemma-4-26B-A4B-it-APEX-GGUF

gemma-4-26b-a4b-it

Google Gemma 4 26B-A4B-IT is an open-source multimodal Mixture-of-Experts model with 26B total parameters and 4B active parameters. It accepts text and image input and generates text output, with a 256K context window and support for 140+ languages. The MoE architecture provides strong performance with efficient inference. The model is well suited to question answering, summarization, reasoning, and image understanding tasks.

Links

Tags

nemo-parakeet-tdt-0.6b

NVIDIA NeMo Parakeet TDT 0.6B v3 is an automatic speech recognition (ASR) model from NVIDIA's NeMo toolkit. Parakeet models are state-of-the-art ASR models trained on large-scale English audio data.

Links

Tags

qwen3-tts-cpp-0.6b-base-q4

Qwen3-TTS 0.6B Base (C++ / GGML, qwentts.cpp), Q4_K_M (~0.6 GB talker). Streaming + voice cloning, 24kHz mono, 11 languages.

Links

Tags

qwen3-tts-0.6b-custom-voice

Qwen3-TTS is a high-quality text-to-speech model supporting custom voice, voice design, and voice cloning.

Links

https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice

Tags

qwen3-asr-0.6b

Qwen3-ASR is an automatic speech recognition model supporting multiple languages and batch inference.

Links

https://huggingface.co/Qwen/Qwen3-ASR-0.6B

Tags

liquidai.lfm2-2.6b-transcript

This is a quantized version of `LiquidAI/LFM2-2.6B-Transcript`, a 2.6B-parameter large language model designed for text-generation tasks. The quantization is optimized for efficiency while retaining strong performance, with deployment and use cases like transcription or language modeling in mind. The model is trained on large-scale text data and supports multiple languages.

Links

https://huggingface.co/DevQuasar/LiquidAI.LFM2-2.6B-Transcript-GGUF

Tags

lfm2-vl-1.6b

LFM2‑VL is Liquid AI's first series of multimodal models, designed to process text and images with variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications. We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters.

- 2× faster inference speed on GPUs compared to existing VLMs while maintaining competitive accuracy
- Flexible architecture with user-tunable speed-quality tradeoffs at inference time
- Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion
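
To picture the patch-based handling of larger images mentioned above, here is a minimal tiling sketch. It assumes plain non-overlapping tiles of at most 512×512 pixels, with smaller tiles at the right and bottom edges; the actual LFM2‑VL preprocessing is not described here and may differ (for example by resizing first or adding a global thumbnail). The function name `tile_boxes` is hypothetical.

```python
def tile_boxes(width: int, height: int, tile: int = 512):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Images no larger than one tile yield a single box, so they are
    processed at native resolution without any splitting.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes


# A 1024x768 image is covered by a 2x2 grid; the bottom row is 256 px tall.
for box in tile_boxes(1024, 768):
    print(box)
```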

Links

Tags

qwen3-reranker-0.6b

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B size embedding model ranks No.1 in the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.

**Qwen3-Reranker-0.6B** has the following features:

- Model Type: Text Reranking
- Supported Languages: 100+ Languages
- Number of Parameters: 0.6B
- Context Length: 32k
- Quantization: q4_K_M, q5_0, q5_K_M, q6_K, q8_0, f16

Links

https://huggingface.co/Qwen/Qwen3-Reranker-0.6B

Tags

qwen3-0.6b

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

- Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios.
- Significantly enhanced reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
- Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
- Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
- Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.

Qwen3-0.6B has the following features:

- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 0.6B
- Number of Parameters (Non-Embedding): 0.44B
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV
- Context Length: 32,768
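
The layer, head, and context figures listed above determine how large the key/value cache grows at full context. The sketch below works through that arithmetic and compares grouped-query attention (8 KV heads) with a hypothetical layout that keeps one KV head per query head (16). The head dimension of 128 and the 16-bit cache precision are assumptions, not values stated in this listing; check the model's configuration before relying on them.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one full-length sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value


N_LAYERS = 28        # from the feature list
N_Q_HEADS = 16       # from the feature list
N_KV_HEADS = 8       # from the feature list (GQA)
CONTEXT_LEN = 32_768  # from the feature list
HEAD_DIM = 128       # ASSUMPTION: not stated in the listing

gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_LEN)
mha = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, CONTEXT_LEN)

print(f"GQA (8 KV heads):            {gqa / 2**30:.2f} GiB")
print(f"One KV head per query head:  {mha / 2**30:.2f} GiB")
```

With these assumptions the cache comes to 3.50 GiB with GQA against 7.00 GiB without it, which is the kind of saving that sharing KV heads across query heads is meant to provide.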

Links

Tags

kalomaze_qwen3-16b-a3b

A man-made horror beyond your comprehension.

But no, seriously, this is my experiment to:

- measure the probability that any given expert will activate (over my personal set of fairly diverse calibration data), per layer
- prune 64/128 of the least used experts per layer (with reordered router and indexing per layer)

It can still write semi-coherently without any additional training or distillation done on top of it from the original 30b MoE. The .txt files with the original measurements are provided in the repo along with the exported weights. Custom testing to measure the experts was done on a hacked version of vllm, and then I made a bespoke script to selectively export the weights according to the measurements.
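
The two steps described above (rank experts by how often they activate, then keep the most-used ones and re-index the router) can be sketched for a single layer as follows. This is a toy illustration on plain Python lists, not the author's export script: the function names, the list-based "weights", and the tie-breaking behavior are all assumptions made for the example.

```python
def select_experts(activation_counts, keep):
    """Indices of the `keep` most-activated experts, in ascending index order."""
    ranked = sorted(range(len(activation_counts)),
                    key=lambda i: activation_counts[i],
                    reverse=True)
    return sorted(ranked[:keep])


def prune_layer(router_rows, expert_weights, activation_counts, keep):
    """Drop the least-used experts of one layer and re-index the rest.

    router_rows[i] is the router's weight row for expert i, and
    expert_weights[i] stands in for that expert's parameters.
    """
    kept = select_experts(activation_counts, keep)
    new_router = [router_rows[i] for i in kept]
    new_experts = [expert_weights[i] for i in kept]
    # Old expert index -> new, contiguous index after pruning.
    remap = {old: new for new, old in enumerate(kept)}
    return new_router, new_experts, remap


# Toy layer with 4 experts, keeping the 2 most used (the model above keeps
# 64 of 128 per layer).
counts = [5, 40, 1, 30]
router = [[0.1], [0.2], [0.3], [0.4]]
experts = ["e0", "e1", "e2", "e3"]

new_router, new_experts, remap = prune_layer(router, experts, counts, keep=2)
print(new_router, new_experts, remap)
```

Because the router rows are sliced with the same index list as the experts, the router's output dimension shrinks with the expert count and each remaining logit still points at the right expert.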

Links

Tags

qwen3-embedding-0.6b

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

**Exceptional Versatility**: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B size embedding model ranks **No.1** in the MTEB multilingual leaderboard (as of June 5, 2025, score **70.58**), while the reranking model excels in various text retrieval scenarios.

**Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

**Multilingual Capability**: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.

**Qwen3-Embedding-0.6B-GGUF** has the following features:

- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 0.6B
- Context Length: 32k
- Embedding Dimension: Up to 1024, supports user-defined output dimensions ranging from 32 to 1024
- Quantization: q8_0, f16
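
One common way to use a user-defined output dimension is to keep only the leading components of the full vector and re-normalize. The sketch below assumes that approach (Matryoshka-style truncation); the listing above states only that dimensions from 32 to 1024 are supported, so treat the truncation step, the function names, and the toy vector as illustrative assumptions rather than the model's documented procedure.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 32 <= dim <= 1024:
        raise ValueError("dim must be between 32 and 1024")
    if dim > len(vec):
        raise ValueError("dim exceeds the length of the input vector")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Toy 1024-dimensional vector standing in for a real embedding.
full = [3.0, 4.0] + [0.0] * 30 + [1.0] * 992
small = truncate_and_normalize(full, 32)

print(len(small), small[:2], cosine(small, small))
```

Shorter vectors reduce index size and speed up similarity search, usually at some cost in retrieval quality, so the dimension is worth validating on your own data.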

Links

https://huggingface.co/Qwen/Qwen3-Embedding-0.6B-GGUF

Tags

qwen3-deckard-large-almost-human-6b-i1

A love letter to all things Philip K Dick, trained and fine-tuned on an in-house dataset. This is V1, "Light", "Large" and "Almost Human". "Almost Human" is about adding (back) the humanity, the real person called Philip K Dick, back into the model, with tone, thinking, and a touch of prose. "Deckard" is the main character in Blade Runner.

Links

Tags

gustavecortal_beck-0.6b

A language model that handles delicate life situations and tries to really help you. Beck is based on Piaget and was finetuned on psychotherapeutic preferences from PsychoCounsel-Preference.

Methodology

Beck was trained using preference optimization (ORPO) and LoRA. You can reproduce the results using my repo for lightweight preference optimization using this config that contains the hyperparameters. This work was performed using HPC resources (Jean Zay supercomputer) from GENCI-IDRIS (Grant 20XX-AD011014205).

Inspiration

Beck aims to reason about psychological and philosophical concepts such as self-image, emotion, and existence. Beck was inspired by my position paper on emotion analysis: Improving Language Models for Emotion Analysis: Insights from Cognitive Science.
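
For readers unfamiliar with ORPO, the sketch below shows the shape of its published objective for a single preference pair: the usual language-modeling loss on the chosen answer plus a weighted odds-ratio penalty that favors the chosen answer over the rejected one. It is not the author's training code. The inputs (length-averaged log-probabilities), the function names, and the weight `lam=0.1` are illustrative assumptions; the actual hyperparameters live in the config mentioned above.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen: float, avg_logp_chosen: float,
              avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Supervised loss on the chosen answer plus the odds-ratio term."""
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + lam * odds_ratio_term


# The penalty is smaller when the chosen answer is the more likely one.
print(orpo_loss(1.0, avg_logp_chosen=-0.2, avg_logp_rejected=-1.0))
print(orpo_loss(1.0, avg_logp_chosen=-1.0, avg_logp_rejected=-0.2))
```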

Links

Tags
