Backend Management

Discover and install AI backends to power your models

780 backends available

llama-cpp
LLM inference in C/C++

Repository: localai | License: mit

ik-llama-cpp
ikawrakow's fork of llama.cpp, optimized for CPU performance

Repository: localai | License: mit

turboquant
Fork of llama.cpp that adds the TurboQuant KV-cache quantization scheme. Reuses LocalAI's llama.cpp gRPC server sources, built against the fork's libllama.

Repository: localai | License: mit

ds4
antirez/ds4: DeepSeek V4 Flash inference engine. Serves a single model and is optimized for Metal (Darwin) and CUDA (Linux). Requires the GGUF files published at huggingface.co/antirez/deepseek-v4-gguf.

Repository: localai | License: mit

whisper
Port of OpenAI's Whisper model in C/C++

Repository: localai | License: mit

crispasr
CrispASR unified speech engine (whisper.cpp fork on ggml) supporting many ASR architectures (Parakeet, Canary, Voxtral, Qwen3-ASR, Granite, Wav2Vec2, Moonshine, OmniASR, FireRedASR, and more).

Repository: localai | License: mit

parakeet-cpp
parakeet.cpp is a C++/ggml port of NVIDIA NeMo Parakeet automatic speech recognition (ASR) models. It supports the tdt, ctc, rnnt and hybrid decoder families as well as cache-aware streaming transcription, and runs on CPU, NVIDIA CUDA, AMD ROCm/HIP, Intel SYCL and NVIDIA Jetson (L4T) targets.

Repository: localai | License: mit

voxtral
Pure C speech-to-text inference engine for Voxtral Realtime 4B

Repository: localai | License: mit

stablediffusion-ggml
Stable Diffusion and Flux in pure C/C++

Repository: localai | License: mit

rfdetr
RF-DETR is a real-time, transformer-based object detection model architecture developed by Roboflow and released under the Apache 2.0 license. It is the first real-time model to exceed 60 AP on the Microsoft COCO benchmark, while remaining competitive at base sizes. It also achieves state-of-the-art performance on RF100-VL, an object detection benchmark that measures how well a model adapts to real-world domains. According to Roboflow, RF-DETR is the fastest and most accurate model for its size among current real-time object detection models. It is small enough to run on edge devices using Roboflow's Inference, making it a good fit for deployments that need both strong accuracy and real-time performance.

Repository: localai | License: apache-2.0

insightface
Face recognition backend powered by `insightface` (ONNX Runtime). Provides face verification (/v1/face/verify), face analysis (/v1/face/analyze), face embedding (/v1/embeddings), face detection (/v1/detection), and 1:N identification (/v1/face/register, /v1/face/identify, /v1/face/forget). Ships two engines in a single image: one drives the insightface model packs (buffalo_l/s/m/sc and antelopev2; non-commercial research use only), and the other drives OpenCV Zoo's YuNet + SFace pair (Apache 2.0; safe for commercial use). Select an engine via `options: ["engine:..."]` in your model YAML, or install one of the ready-made model-gallery entries with the `insightface-*` prefix. The backend image contains only code and Python dependencies; all model weights are managed by LocalAI's gallery download mechanism.

Repository: localai | License: mixed

sam3-cpp
Segment Anything Model (SAM 3/2/EdgeTAM) in C/C++ using GGML. Supports text-prompted and point/box-prompted image segmentation.

Repository: localai | License: mit

rfdetr-cpp
Native RF-DETR object detection and instance segmentation in C/C++ using GGML. Loads pre-built GGUF weights from the mudler/rfdetr-cpp-* family (Nano/Small/Base/Medium/Large + SegNano/SegSmall/SegMedium) and returns bounding boxes, class labels, confidence scores, and (for segmentation variants) PNG-encoded per-detection masks.

Repository: localai | License: apache-2.0

locate-anything
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0

locate-anything-development
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0

cpu-locate-anything-cpp
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0

cpu-locate-anything-cpp-development
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0

cuda12-locate-anything-cpp
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0

cuda12-locate-anything-cpp-development
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0

cuda13-locate-anything-cpp
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0

cuda13-locate-anything-cpp-development
Open-vocabulary object detection and visual grounding (NVIDIA LocateAnything-3B) in C/C++ using GGML. Loads pre-built GGUF weights and, given an image and a free-form text prompt, returns bounding boxes, class labels, and confidence scores for the referred objects.

Repository: localai | License: apache-2.0
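The locate-anything entries above share one description and differ only in name. The names appear to encode an optional accelerator prefix (`cpu-`, `cuda12-`, `cuda13-`) and an optional `-development` suffix. The sketch below is a minimal Python helper for splitting a name into those parts. The pattern is inferred from this page only, and `parse_backend_name`, `BackendVariant`, and `KNOWN_PREFIXES` are hypothetical names, not part of LocalAI.

```python
from dataclasses import dataclass
from typing import Optional

# Accelerator prefixes observed on this page only; the full gallery
# may use other prefixes (assumption, not a LocalAI-defined list).
KNOWN_PREFIXES = ("cpu", "cuda12", "cuda13")
DEV_SUFFIX = "-development"


@dataclass(frozen=True)
class BackendVariant:
    base: str                   # name with prefix and suffix stripped
    accelerator: Optional[str]  # None when the name has no hardware prefix
    development: bool           # True for "-development" builds


def parse_backend_name(name: str) -> BackendVariant:
    """Split a gallery entry name into base name, accelerator and dev flag."""
    development = name.endswith(DEV_SUFFIX)
    if development:
        name = name[: -len(DEV_SUFFIX)]
    accelerator = None
    for prefix in KNOWN_PREFIXES:
        if name.startswith(prefix + "-"):
            accelerator = prefix
            name = name[len(prefix) + 1:]
            break
    return BackendVariant(base=name, accelerator=accelerator, development=development)


if __name__ == "__main__":
    for entry in ("locate-anything", "cpu-locate-anything-cpp",
                  "cuda12-locate-anything-cpp-development"):
        print(entry, "->", parse_backend_name(entry))
```

Names without a recognized prefix, such as `ik-llama-cpp`, pass through unchanged, which keeps the helper safe to run over entries that do not follow the variant pattern.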

Page 1 of 38