LocalAI - Models

qwen3.5-27b-claude-4.6-opus-reasoning-distilled-heretic-i1

Links

https://huggingface.co/mradermacher/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-heretic-i1-GGUF

lfm2.5-audio-1.5b-tts

LFM2.5-Audio-1.5B in TTS mode. Four baked-in voices: us_male, us_female, uk_male, uk_female. Pick the default at load time via the `voice:` option, or override it per request via the `voice` field of the OpenAI-compatible `/v1/audio/speech` endpoint.
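A minimal sketch of the per-request voice override described above. The `model`, `input` and `voice` fields follow the OpenAI `/v1/audio/speech` request shape; the base URL `http://localhost:8080` and the helper name `build_speech_request` are assumptions for illustration, not something this listing specifies. The request is only built here, not sent.

```python
import json
import urllib.request

# The four baked voices named in this entry.
BAKED_VOICES = ("us_male", "us_female", "uk_male", "uk_female")


def build_speech_request(text, voice, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for /v1/audio/speech.

    base_url is an assumed local server address.
    """
    if voice not in BAKED_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {BAKED_VOICES}")
    payload = {
        "model": "lfm2.5-audio-1.5b-tts",  # gallery name of this entry
        "input": text,                     # text to synthesize
        "voice": voice,                    # overrides the load-time default voice
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Hello from LocalAI.", "uk_female")
print(req.full_url)
print(req.data.decode("utf-8"))
# Sending needs a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     audio_bytes = resp.read()
```

Validating the voice name on the client side is a design choice of this sketch; a server may accept or reject other values on its own terms.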

Links

https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B

allenai_olmo-3.1-32b-think

This entry is a GGUF quantization of **allenai/Olmo-3.1-32B-Think**, a large language model (LLM). The quantized files were produced by **bartowski** using the **imatrix** quantization method and are intended for efficient inference.

### Key Features:

- **Base Model**: `allenai/Olmo-3.1-32B-Think` (unquantized version).
- **Quantized Versions**: Available in multiple formats with varying precision (e.g., `Q8_0`, `Q6_K_L`, `Q5_K_M`, `Q4_1`, `bf16`). These are derived from the original model using the **imatrix calibration dataset**.
- **Performance**: Optimized for low memory usage and efficient inference on GPUs/CPUs.
- **Downloads**: Available via Hugging Face CLI. Large files are split into multiple parts where needed.
- **License**: Apache-2.0.

### Recommended Quantization:

- Use `Q6_K_L` for highest quality (near-perfect performance).
- Use `Q4_K_M` (the default) for balanced performance and size.
- Avoid lower-quality options (e.g., `Q3_K_S`) unless specific hardware constraints apply.

This model is suited to deployment on GPUs/CPUs with limited memory.
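To make the size/quality trade-off concrete, the sketch below estimates file size as parameters × bits per weight ÷ 8. The bits-per-weight figures are rough averages for llama.cpp quantization types and are assumptions here, not values taken from this repository. `Q6_K` is used as a stand-in for `Q6_K_L`, and the 32e9 parameter count is read off the model name. Runtime memory (KV cache, buffers) is not included.

```python
# Approximate bits per weight (assumed rough averages, not measured from this repo).
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q6_K": 6.56,   # stand-in for Q6_K_L, which is slightly larger
    "Q8_0": 8.5,
    "bf16": 16.0,
}


def estimate_size_gb(n_params, quant):
    """Estimate weight-file size in decimal gigabytes."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


def pick_quant(n_params, budget_gb, preference=("Q6_K", "Q4_K_M")):
    """Return the first quant in preference order that fits the budget, else None."""
    for quant in preference:
        if estimate_size_gb(n_params, quant) <= budget_gb:
            return quant
    return None


n_params = 32e9  # assumption: nominal size from the "32B" in the model name
for quant in APPROX_BITS_PER_WEIGHT:
    print(quant, round(estimate_size_gb(n_params, quant), 1), "GB (estimate)")
print("24 GB budget ->", pick_quant(n_params, 24))
```

The preference order mirrors the recommendation in this entry: try the higher-quality quant first and fall back to `Q4_K_M` when it does not fit.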

Links

https://huggingface.co/bartowski/allenai_Olmo-3.1-32B-Think-GGUF

qwen3-vl-8b-instruct

Qwen3-VL-8B-Instruct is the 8B-parameter instruct model of the Qwen3-VL series. This entry uses the default parameters recommended by the Unsloth documentation for Qwen3-VL.

Links

https://huggingface.co/unsloth/Qwen3-VL-8B-Instruct-GGUF

qwen3-vl-8b-thinking

Qwen3-VL-8B-Thinking is the 8B-parameter reasoning ("thinking") model of the Qwen3-VL series. This entry uses the default parameters recommended by the Unsloth documentation for Qwen3-VL.

Links

https://huggingface.co/unsloth/Qwen3-VL-8B-Thinking-GGUF

rfdetr-cpp-nano

RF-DETR Nano object detection model, served via the native rfdetr.cpp backend (pure C++/ggml + purego, no Python dependencies). Q8_0 quantization is the recommended default for CPU: same accuracy as F16/F32, ~20MB on disk, fastest CPU latency. Drop-in for the /v1/detection endpoint.
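A hedged sketch of calling the `/v1/detection` endpoint with this model. The JSON field names `model` and `image`, the option of passing the image as a URL or as base64 data, and the base URL are assumptions for illustration; check the server's API documentation for the exact request shape. The request is built but not sent.

```python
import base64
import json
import urllib.request


def encode_image_bytes(raw):
    """Base64-encode raw image bytes for embedding in a JSON body."""
    return base64.b64encode(raw).decode("ascii")


def build_detection_request(image, model="rfdetr-cpp-nano",
                            base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for /v1/detection.

    `image` is a URL or base64 string; both field names are assumed.
    """
    payload = {"model": model, "image": image}
    return urllib.request.Request(
        url=f"{base_url}/v1/detection",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical image URL, used only to show the payload.
req = build_detection_request("https://example.com/street.jpg")
print(req.full_url)
print(req.data.decode("utf-8"))
```

Because the other RF-DETR entries are served through the same endpoint, switching variants in this sketch only means changing the `model` argument.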

locate-anything-3b

NVIDIA LocateAnything-3B open-vocabulary object detection (visual grounding), served via the native locate-anything.cpp backend (C++/ggml + purego, no Python). Describe what to find in a text prompt and get labeled boxes back; separate multiple categories with a period (`.`). Q8_0 is the recommended default: box-identical to F16/F32, ~6.3GB, fastest CPU latency. Drop-in for the /v1/detection endpoint (pass the prompt along with the image).
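A small sketch of building a multi-category prompt for this model. The entry only states that categories are separated with a period; the surrounding spaces in the default separator and the helper name `build_category_prompt` are assumptions of this example.

```python
def build_category_prompt(categories, separator=" . "):
    """Join category names into one prompt string.

    The " . " default (period with spaces) is an assumed spelling of the
    period separator mentioned in the entry.
    """
    cleaned = [c.strip() for c in categories if c.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty category is required")
    for name in cleaned:
        if "." in name:
            # A period inside a name would be read as a category boundary.
            raise ValueError(f"category {name!r} must not contain a period")
    return separator.join(cleaned)


print(build_category_prompt(["traffic light", "bicycle", "person"]))
```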

depth-anything-3-base

Depth Anything 3 (base) monocular metric depth + camera pose, served via the native depth-anything.cpp backend (C++/ggml + purego, no Python at inference). Given an image it returns a dense depth map plus the recovered camera extrinsics (3x4) and intrinsics (3x3). Use GenerateImage (src -> normalized depth PNG at dst) or Predict (JSON depth stats + pose). q4_k is the recommended CPU default.
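To illustrate how the returned 3x3 intrinsics, 3x4 extrinsics and metric depth fit together, here is a pinhole-camera sketch in plain Python. It assumes the extrinsics are a world-to-camera `[R | t]` matrix, that depth is measured along the camera z-axis, and that the intrinsics have no skew; the backend's actual conventions are not stated in this entry, so treat these as assumptions.

```python
def project_point(K, Rt, point_world):
    """Project a 3D world point to pixel coordinates (u, v).

    K: 3x3 intrinsics. Rt: 3x4 extrinsics [R | t], assumed world-to-camera.
    """
    X, Y, Z = point_world
    # Camera-frame coordinates: R @ p + t
    cam = [row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in Rt]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Homogeneous pixel coordinates: K @ cam
    uvw = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]


def backproject_pixel(K, u, v, depth):
    """Lift pixel (u, v) with metric depth to a camera-frame 3D point.

    Assumes zero skew and depth measured along the camera z-axis.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


# Illustrative values only, not output from the model.
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
Rt = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0]]
u, v = project_point(K, Rt, (1.0, 2.0, 4.0))
print(u, v)
print(backproject_pixel(K, u, v, 4.0))
```

With an identity pose, back-projecting the projected pixel at its original depth recovers the starting point, which is a quick sanity check for whichever convention the backend actually uses.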

depth-anything-3-base-q8_0

Depth Anything 3 (base), q8_0 — near-lossless 8-bit quant (~149 MB). Same depth + camera pose output as the q4_k default at higher fidelity.

rfdetr-cpp-small

RF-DETR Small object detection model (DINOv2-small backbone, 512px input, 3 decoder layers), served via the native rfdetr.cpp backend (ggml + purego, no Python). A step up from Nano in accuracy while staying lightweight on CPU. F16 quantization is the recommended default: identical accuracy to F32 at roughly half the size. Drop-in for the /v1/detection endpoint.

rfdetr-cpp-medium

RF-DETR Medium object detection model (DINOv2-small backbone, 576px input, 4 decoder layers), served via the native rfdetr.cpp backend. Balanced detection quality vs. CPU latency — recommended when Base is not accurate enough but Large is too slow. F16 quantization is the recommended default: identical accuracy to F32, half the size. Drop-in for the /v1/detection endpoint.

rfdetr-cpp-large

RF-DETR Large object detection model (DINOv2-small backbone, 704px input, 4 decoder layers), served via the native rfdetr.cpp backend. Highest-accuracy detection variant — best for offline workflows and high-resolution inputs where CPU latency is secondary to recall. F16 quantization is the recommended default: identical accuracy to F32, half the size. Drop-in for the /v1/detection endpoint.

rfdetr-cpp-seg-nano

RF-DETR Seg-Nano instance segmentation model (DINOv2-small backbone, 312px input, 4 decoder layers, 100 queries), served via the native rfdetr.cpp backend. Smallest segmentation variant — fastest CPU latency, ideal for edge deployment. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

rfdetr-cpp-seg-small

RF-DETR Seg-Small instance segmentation model (DINOv2-small backbone, 384px input, 4 decoder layers, 100 queries), served via the native rfdetr.cpp backend. Step up from Seg-Nano in mask quality while staying CPU-friendly. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

rfdetr-cpp-seg-medium

RF-DETR Seg-Medium instance segmentation model (DINOv2-small backbone, 432px input, 5 decoder layers, 200 queries), served via the native rfdetr.cpp backend. Balanced segmentation quality vs. CPU latency — recommended for everyday segmentation workloads. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default.

rfdetr-cpp-seg-large

RF-DETR Seg-Large instance segmentation model (DINOv2-small backbone, 504px input, 5 decoder layers, 200 queries), served via the native rfdetr.cpp backend. Higher-resolution input than Seg-Medium for sharper mask boundaries. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

rfdetr-cpp-seg-xlarge

RF-DETR Seg-XLarge instance segmentation model (DINOv2-small backbone, 624px input, 6 decoder layers, 300 queries), served via the native rfdetr.cpp backend. High-capacity segmentation variant with more queries and deeper decoder — best for dense scenes with many instances. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default.

rfdetr-cpp-seg-2xlarge

RF-DETR Seg-2XLarge instance segmentation model (DINOv2-small backbone, 768px input, 6 decoder layers, 300 queries), served via the native rfdetr.cpp backend. Highest-accuracy segmentation variant — best for offline workflows and high-resolution inputs where CPU latency is secondary to mask quality. Returns both bounding boxes and per-instance masks via the /v1/detection endpoint. F16 quantization is the recommended default: identical accuracy to F32, half the size.

qwen3-42b-a3b-stranger-thoughts-deep20x-abliterated-uncensored-i1

WARNING: NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun.

Qwen3-42B-A3B-Stranger-Thoughts-Deep20x-Abliterated-Uncensored

This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.

ABOUT: Qwen's excellent "Qwen3-30B-A3B", abliterated by "huihui-ai" then combined with Brainstorm 20x (tech notes at bottom of the page) in a MOE (128 experts) at 42B parameters (up from 30B). This pushes Qwen's abliterated/uncensored model to the absolute limit for creative use cases. Prose (all), reasoning, thinking ... all will be very different from regular "Qwen 3s". This model will generate horror, fiction, erotica, you name it, in vivid, stark detail. It will NOT hold back. Likewise, regen(s) of the same prompt, even at the same settings, will create very different version(s) too. See FOUR examples below.

The model retains the full reasoning and output generation of a Qwen3 MOE, but has not been tested for "non-creative" use cases. It is set with Qwen's default config: 40k context, 8 of 128 experts activated, ChatML or Jinja template (embedded).

IMPORTANT: See usage guide / repo below to get the most out of this model, as settings are very specific.

USAGE GUIDE: Please refer to this model card for specific usage, suggested settings, changing ACTIVE EXPERTS, templates and the like: how to maximize this model in "uncensored" form, with specific notes on "abliterated" models, and rep pen / temp settings specific to getting the model to perform strongly.
https://huggingface.co/DavidAU/Qwen3-18B-A3B-Stranger-Thoughts-Abliterated-Uncensored-GGUF

GGUF / QUANTS / SPECIAL SHOUTOUT: Special thanks to team Mradermacher for making the quants!
https://huggingface.co/mradermacher/Qwen3-42B-A3B-Stranger-Thoughts-Deep20x-Abliterated-Uncensored-GGUF

KNOWN ISSUES: The model may "mis-capitalize" word(s) (lowercase where uppercase should be) from time to time. The model may add an extra space before a word from time to time. An incorrect template and/or settings will result in a drop in performance / poor performance.

qwen3-22b-a3b-the-harley-quinn

WARNING: MADNESS - UNHINGED and... NSFW. Vivid prose. INTENSE. Visceral Details. Violence. HORROR. GORE. Swearing. UNCENSORED... humor, romance, fun.

Qwen3-22B-A3B-The-Harley-Quinn

This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.

ABOUT: A stranger, yet radically different version of Kalmaze's "Qwen/Qwen3-16B-A3B", which has the experts pruned to 64 (from 128 in the Qwen 3 30B-A3B version). I then added 19 layers (Brainstorm 20x by DavidAU, info at bottom of this page), expanding the model to 22B total parameters. The goal: slightly alter the model, to address some odd creative thinking and output choices. Then... Harley Quinn showed up, and then it was a party! A wild, out of control (sometimes) but never boring party.

Please note that the modifications affect the entire model operation; roughly, I adjusted the model to think a little "deeper" and "ponder" a bit, but this is a very rough description. That being said, reasoning and output generation will be altered regardless of your use case(s). These modifications push Qwen's model to the absolute limit for creative use cases. Detail, vividness, and creativity all get a boost. Prose (all) will also be very different from "default" Qwen3. Likewise, regen(s) of the same prompt, even at the same settings, will create very different version(s) too. The Brainstorm 20x has also lightly de-censored the model under some conditions. However, this model can be prone to bouts of madness. It will not always behave, and it will sometimes go -wildly- off script. See 4 examples below.

The model retains the full reasoning and output generation of a Qwen3 MOE, but has not been tested for "non-creative" use cases. It is set with Qwen's default config: 40k context, 8 of 64 experts activated, ChatML or Jinja template (embedded). Four example generations below.

IMPORTANT: See usage guide / repo below to get the most out of this model, as settings are very specific. If not set correctly, this model will not work the way it should.

Critical settings:

- ChatML or Jinja template (embedded, but updated version at repo below).
- Rep pen of 1.01 or 1.02; higher (1.04, 1.05) will result in "Harley Mode".
- Temp range of .6 to 1.2; at higher temps you may need to prompt the model to "output" after thinking.
- Experts set at 8-10; higher will result in "odder" output BUT it might be better.

That being said, "Harley Quinn" may make her presence known at any moment.

USAGE GUIDE: Please refer to this model card for specific usage, suggested settings, changing ACTIVE EXPERTS, templates and the like: how to maximize this model in "uncensored" form, with specific notes on "abliterated" models, and rep pen / temp settings specific to getting the model to perform strongly.
https://huggingface.co/DavidAU/Qwen3-18B-A3B-Stranger-Thoughts-Abliterated-Uncensored-GGUF

GGUF / QUANTS / SPECIAL SHOUTOUT: Special thanks to team Mradermacher for making the quants!
https://huggingface.co/mradermacher/Qwen3-22B-A3B-The-Harley-Quinn-GGUF

KNOWN ISSUES: The model may "mis-capitalize" word(s) (lowercase where uppercase should be) from time to time. The model may add an extra space before a word from time to time. An incorrect template and/or settings will result in a drop in performance / poor performance. It can rant at the end / repeat; most of the time it will stop on its own.

Looking for the Abliterated / Uncensored version?
https://huggingface.co/DavidAU/Qwen3-23B-A3B-The-Harley-Quinn-PUDDIN-Abliterated-Uncensored
In some cases this "abliterated/uncensored" version may work better than this version.

EXAMPLES: Standard system prompt, rep pen 1.01-1.02, topk 100, topp .95, minp .05, rep pen range 64. Tested in LMStudio, quant Q4KS, GPU (CPU output will differ slightly). As this is a mid-range quant, expect better results from higher quants and/or with more experts activated. NOTE: Some formatting lost on copy/paste.
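The critical settings for this entry (rep pen 1.01 to 1.02, temp .6 to 1.2, 8 to 10 active experts) can be checked before a run with a sketch like the one below. The key names `repeat_penalty`, `temperature` and `active_experts` are hypothetical labels chosen for this example; they are not parameter names confirmed for any particular runtime.

```python
# Suggested ranges taken from the entry; the key names are hypothetical.
RECOMMENDED_RANGES = {
    "repeat_penalty": (1.01, 1.02),
    "temperature": (0.6, 1.2),
    "active_experts": (8, 10),
}


def check_settings(settings):
    """Return a list of warnings for values missing or outside the suggested ranges."""
    warnings = []
    for key, (low, high) in RECOMMENDED_RANGES.items():
        if key not in settings:
            warnings.append(f"{key}: not set")
            continue
        value = settings[key]
        if not low <= value <= high:
            warnings.append(
                f"{key}={value} is outside the suggested range {low}-{high}"
            )
    return warnings


candidate = {"repeat_penalty": 1.05, "temperature": 0.8, "active_experts": 8}
for message in check_settings(candidate):
    print("warning:", message)
```

The check only warns rather than rejecting values, since the entry itself notes that settings outside these ranges may still be usable, just "odder".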

planetoid_27b_v.2

This is a merge of pre-trained Gemma 3 language models. The goal of this merge was to create a good uncensored Gemma 3 model for assistant use and roleplay, with uncensored vision.

First, vision: I don't know whether it is normal, but it hallucinates slightly (maybe Q3 is too low?); otherwise it lacks any refusals and works fine. I used the default Gemma 3 27B mmproj.

Second, text: it is slow on my hardware, slower than 24B Mistral, with speed close to 32B QwQ. The model is smart even at Q3; responses are adequate in length and interesting to read. The model is quite attentive to context: tested up to 8k with no problems or degradation spotted (beware of your typos, it will copy your mistakes). Creative capabilities are good too; the model will create a good plot for you if you let it. The model follows instructions fine and is really good with "adventure"-type cards. Russian is supported but not great; it may be better on higher quants. No refusals were encountered.

However, I find this model not unbiased enough. It is close to neutral, but I want it more "dark". Positivity depends highly on prompts. With good enough cards the model can do wonders.

Tested on Q3_K_L, t 1.04.
