LocalAI - Models

gemma-4-26b-a4b-it-qat

Hugging Face | GitHub | Launch Blog | Documentation License: Apache 2.0 | Authors: Google DeepMind > [!Note] > This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model. > Four versions of the QAT checkpoints are available: > * **Unquantized QAT checkpoints** (Q4_0): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models. > * **GGUF** (Q4_0): Ready-to-deploy formats for broad ecosystem compatibility. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B. > * **Mobile-optimized** (wNa8o8): A custom schema engineered explicitly for mobile hardware efficiency. It features targeted 2-bit decoding layers, optimized KV caches, and static activations to maximize VRAM savings. Available for Gemma 4 E2B and E4B. > * **Compressed Tensors** (w4a16): QAT checkpoints serialized in the compressed-tensors format for native, optimized inference with vLLM. Available for Gemma 4 E2B, E4B, 12B ...

Links

https://huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF

Tags

gemma-4-12b-it-qat-q4_0

Hugging Face | GitHub | Launch Blog | Documentation License: Apache 2.0 | Authors: Google DeepMind > [!Note] > This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model. > Four versions of the QAT checkpoints are available: > * **Unquantized QAT checkpoints** (Q4_0): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models. > * **GGUF** (Q4_0): Ready-to-deploy formats for broad ecosystem compatibility. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B. > * **Mobile-optimized** (wNa8o8): A custom schema engineered explicitly for mobile hardware efficiency. It features targeted 2-bit decoding layers, optimized KV caches, and static activations to maximize VRAM savings. Available for Gemma 4 E2B and E4B. > * **Compressed Tensors** (w4a16): QAT checkpoints serialized in the compressed-tensors format for native, optimized inference with vLLM. Available for Gemma 4 E2B, E4B, 12B ...

Links

https://huggingface.co/google/gemma-4-12B-it-qat-q4_0-gguf

Tags

nousresearch_hermes-4-14b

Hermes 4 14B is a frontier, hybrid-mode reasoning model based on Qwen 3 14B by Nous Research that is aligned to you. Read the Hermes 4 technical report here: Hermes 4 Technical Report Chat with Hermes in Nous Chat: https://chat.nousresearch.com Training highlights include a newly synthesized post-training corpus emphasizing verified reasoning traces, massive improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment. What’s new vs Hermes 3 Post-training corpus: Massively increased dataset size from 1M samples and 1.2B tokens to ~5M samples / ~60B tokens blended across reasoning and non-reasoning data. Hybrid reasoning mode with explicit … segments when the model decides to deliberate, and options to make your responses faster when you want. Reasoning that is top quality, expressive, improves math, code, STEM, logic, and even creative writing and subjective responses. Schema adherence & structured outputs: trained to produce valid JSON for given schemas and to repair malformed objects. Much easier to steer and align: extreme improvements on steerability, especially on reduced refusal rates.

Links

Tags

nousresearch_hermes-4-70b

Hermes 4 70B is a frontier, hybrid-mode reasoning model based on Llama-3.1-70B by Nous Research that is aligned to you. Read the Hermes 4 technical report here: Hermes 4 Technical Report Chat with Hermes in Nous Chat: https://chat.nousresearch.com Training highlights include a newly synthesized post-training corpus emphasizing verified reasoning traces, massive improvements in math, code, STEM, logic, creativity, and format-faithful outputs, while preserving general assistant quality and broadly neutral alignment. What’s new vs Hermes 3 Post-training corpus: Massively increased dataset size from 1M samples and 1.2B tokens to ~5M samples / ~60B tokens blended across reasoning and non-reasoning data. Hybrid reasoning mode with explicit … segments when the model decides to deliberate, and options to make your responses faster when you want. Reasoning that is top quality, expressive, improves math, code, STEM, logic, and even creative writing and subjective responses. Schema adherence & structured outputs: trained to produce valid JSON for given schemas and to repair malformed objects. Much easier to steer and align: extreme improvements on steerability, especially on reduced refusal rates.

Links

Tags

thoughtless-fallen-abomination-70b-r1-v4.1-i1

ReadyArt/Thoughtless-Fallen-Abomination-70B-R1-v4.1 benefits from the coherence and well rounded roleplay experience of TheDrummer/Fallen-Llama-3.3-R1-70B-v1. We've: 🔁 Re-integrated your favorite V1.2 scenarios (now with better kink distribution) 🧪 Direct-injected the Abomination dataset into the model's neural pathways ⚖️ Achieved perfect balance between "oh my" and "oh my"

Links

Tags

fallen-safeword-70b-r1-v4.1

ReadyArt/Fallen-Safeword-70B-R1-v4.1 isn't just a model - is the event horizon of depravity trained on TheDrummer/Fallen-Llama-3.3-R1-70B-v1. We've: 🔁 Re-integrated your favorite V1.2 scenarios (now with better kink distribution) 🧪 Direct-injected the Safeword dataset into the model's neural pathways ⚖️ Achieved perfect balance between "oh my" and "oh my"

Links

Tags

gemma-3-the-grand-horror-27b

The **Gemma-3-The-Grand-Horror-27B-GGUF** model is a **fine-tuned version** of Google's **Gemma 3 27B** language model, specifically optimized for **extreme horror-themed text generation**. It was trained using the **Unsloth framework** on a custom in-house dataset of horror content, resulting in a model that produces vivid, graphic, and psychologically intense narratives—featuring gore, madness, and disturbing imagery—often even when prompts don't explicitly request horror. Key characteristics: - **Base Model**: Gemma 3 27B (original by Google, not the quantized version) - **Fine-tuned For**: High-intensity horror storytelling, long-form narrative generation, and immersive scene creation - **Use Case**: Creative writing, horror RP, dark fiction, and experimental storytelling - **Not Suitable For**: General use, children, sensitive audiences, or content requiring neutral/positive tone - **Quantization**: Available in GGUF format (e.g., q3k, q4, etc.), making it accessible for local inference on consumer hardware > ✅ **Note**: The model card you see is for a **quantized, fine-tuned derivative**, not the original. The true base model is **Gemma 3 27B**, available at: https://huggingface.co/google/gemma-3-27b This model is not for all audiences — it generates content with a consistently dark, unsettling tone. Use responsibly.

Links

https://huggingface.co/DavidAU/Gemma-3-The-Grand-Horror-27B-GGUF

Tags

qwen3-tts-customvoice-crispasr

Qwen3-TTS CustomVoice 0.6B (12 Hz) text-to-speech synthesized through the CrispASR backend. Fixed-speaker fine-tune driven via an explicit backend selector plus a tokenizer codec companion. Ships baked speakers (vivian, aiden, dylan, eric, ono_anna, ryan, serena, sohee, uncle_fu); the default config selects vivian. Runs end-to-end on CPU and produces 24 kHz mono audio. Default GGUF sizes ~968 MB (talker) + ~358 MB (tokenizer).

Links

https://huggingface.co/cstr/qwen3-tts-0.6b-customvoice-GGUF

Tags

hubert-crispasr

HuBERT Large (LS960 fine-tune) CTC speech recognition, English. Runs via the CrispASR backend with an explicit backend selector. Default GGUF size ~200 MB.

Links

https://huggingface.co/cstr/hubert-large-ls960-ft-GGUF

Tags

data2vec-crispasr

data2vec Audio Base (960h) CTC speech recognition, English. Runs via the CrispASR backend with an explicit backend selector. Default GGUF size ~60 MB.

Links

https://huggingface.co/cstr/data2vec-audio-960h-GGUF

Tags

glm-asr-crispasr

GLM-ASR Nano speech recognition. Runs via the CrispASR backend with an explicit backend selector. Default GGUF size ~1.2 GB.

Links

https://huggingface.co/cstr/glm-asr-nano-GGUF

Tags

kyutai-stt-crispasr

Kyutai STT 1B (Moshi-style) speech recognition. Runs via the CrispASR backend with an explicit backend selector. Default GGUF size ~636 MB.

Links

https://huggingface.co/cstr/kyutai-stt-1b-GGUF

Tags

firered-asr-crispasr

FireRed-ASR2 AED speech recognition. Runs via the CrispASR backend with an explicit backend selector. Default GGUF size ~918 MB.

Links

https://huggingface.co/cstr/firered-asr2-aed-GGUF

Tags

moonshine-crispasr

Moonshine Tiny speech recognition, English. Runs via the CrispASR backend with an explicit backend selector and a companion tokenizer. Default GGUF size ~20 MB.

Links

https://huggingface.co/cstr/moonshine-tiny-GGUF

Tags

moonshine-de-crispasr

Moonshine Base German fine-tune (fidoriel), best-quality German Moonshine. Runs via the CrispASR backend with an explicit backend selector and a companion tokenizer. Default GGUF size ~39 MB.

Links

https://huggingface.co/cstr/moonshine-base-de-fidoriel-GGUF

Tags

moonshine-tiny-de-crispasr

Moonshine Tiny German fine-tune (fidoriel), smaller/faster German Moonshine. Runs via the CrispASR backend with an explicit backend selector and a companion tokenizer. Default GGUF size ~17 MB.

Links

https://huggingface.co/cstr/moonshine-tiny-de-fidoriel-GGUF

Tags

moonshine-streaming-crispasr

Moonshine Streaming Tiny speech recognition. Runs via the CrispASR backend with an explicit backend selector and a companion tokenizer. Default GGUF size ~31 MB.

Links

https://huggingface.co/cstr/moonshine-streaming-tiny-GGUF

Tags

mimo-asr-crispasr

MiMo-ASR speech recognition. Runs via the CrispASR backend with an explicit backend selector and a companion tokenizer GGUF. Default GGUF size ~4.2 GB.

Links

https://huggingface.co/cstr/mimo-asr-GGUF

Tags

Model Gallery

Find Your Perfect Model

Filter by Model Type

Browse by Tags

gemma-4-26b-a4b-it-qat

gemma-4-12b-it-qat-q4_0

nousresearch_hermes-4-14b

nousresearch_hermes-4-70b

thoughtless-fallen-abomination-70b-r1-v4.1-i1

fallen-safeword-70b-r1-v4.1

gemma-3-the-grand-horror-27b

qwen3-tts-customvoice-crispasr

hubert-crispasr

data2vec-crispasr

glm-asr-crispasr

kyutai-stt-crispasr

firered-asr-crispasr

moonshine-crispasr

moonshine-de-crispasr

moonshine-tiny-de-crispasr

moonshine-streaming-crispasr

mimo-asr-crispasr