Model Gallery

Discover and install AI models from our curated collection

367 models available

1 repository
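Each entry below lists a model name and the repository it comes from. As a minimal sketch of how an install could be requested programmatically, the snippet builds (but does not send) an HTTP request. Everything here is an assumption rather than something stated on this page: the `/models/apply` endpoint, the `repository@model` id format, the base URL `http://localhost:8080`, and the helper name `build_install_request`. Check them against your server's documentation.

```python
import json
import urllib.request


def build_install_request(base_url: str, repository: str, model_name: str) -> urllib.request.Request:
    """Build a gallery install request without sending it.

    Assumption: the server accepts POST /models/apply with a JSON body
    whose "id" field has the form "<repository>@<model name>".
    """
    payload = {"id": f"{repository}@{model_name}"}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/models/apply",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server address; the model name is taken from the listing.
install_req = build_install_request("http://localhost:8080", "localai", "moonshine-tiny")
print(install_req.full_url)
print(install_req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(install_req)` would send it. The sketch stops short of that so it can be inspected without a running server.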

nemotron-3-nano-omni-30b-a3b-reasoning-apex

# Model Overview

### Description:

NVIDIA Nemotron 3 Nano Omni is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows. It extends the Nemotron Nano family with integrated video+speech comprehension, Graphical User Interface (GUI), Optical Character Recognition (OCR), and speech transcription capabilities, enabling end-to-end processing of rich enterprise content such as meeting recordings, M&E assets, training videos, and complex business documents.

NVIDIA Nemotron 3 Nano Omni was developed by NVIDIA as part of the Nemotron model family. This model is available for commercial use.

This model was improved using Qwen3-VL-30B-A3B-Instruct, Qwen3.5-122B-A10B, Qwen3.5-397B-A17B, Qwen2.5-VL-72B-Instruct, and gpt-oss-120b. For more information, please see the Training Dataset section below.

### License/Terms of Use

Governing Terms: Use of this model is governed by the NVIDIA Open Model Agreement

### Deployment Geography:

Global ...

Repository: localai | License: other

nemo-parakeet-tdt-0.6b

NVIDIA NeMo Parakeet TDT 0.6B v3 is an automatic speech recognition (ASR) model from NVIDIA's NeMo toolkit. Parakeet models are state-of-the-art ASR models trained on large-scale English audio data.

Repository: localai | License: cc-by-4.0

voxtral-mini-4b-realtime

Voxtral Mini 4B Realtime is a speech-to-text model from Mistral AI. It is a 4B parameter model optimized for fast, accurate audio transcription with low latency, making it ideal for real-time applications. The model uses the Voxtral architecture for efficient audio processing.

Repository: localai | License: apache-2.0

moonshine-tiny

Moonshine Tiny is a lightweight speech-to-text model optimized for fast transcription. It is designed for efficient on-device ASR with high accuracy relative to its size.

Repository: localai | License: apache-2.0

whisperx-tiny

WhisperX Tiny is a fast and accurate speech recognition model with speaker diarization capabilities. Built on OpenAI's Whisper with additional features for alignment and speaker segmentation.

Repository: localai | License: mit

omnilingual-0.3b-ctc-q8-sherpa

Omnilingual ASR CTC 300M (int8) is a multilingual automatic speech recognition model supporting 1,600+ languages. Based on Meta's omniASR_CTC_300M architecture (Wav2Vec2 with CTC head), quantized to int8 for efficient inference. Uses the sherpa-onnx backend with ONNX Runtime.

Repository: localai | License: apache-2.0

streaming-zipformer-en-sherpa

Streaming English ASR: sherpa-onnx zipformer transducer (int8, chunk-16 left-128). Low-latency real-time transcription with endpoint detection via sherpa-onnx's online recognizer. English-only; for multilingual offline ASR see omnilingual-0.3b-ctc-q8-sherpa.

Repository: localai | License: apache-2.0

silero-vad-sherpa

Silero VAD served through the sherpa-onnx backend. Uses the same ONNX weights as the dedicated silero-vad backend, loaded through sherpa-onnx's C VAD API. Pairs with the sherpa-onnx ASR entries for round-trip audio pipelines.

Repository: localai | License: mit

vits-ljs-sherpa

VITS-LJS English single-speaker TTS served through the sherpa-onnx backend. Trained on the LJSpeech corpus at 22.05 kHz. Pairs with the sherpa-onnx ASR entries for round-trip audio pipelines.

Repository: localai | License: apache-2.0

vits-piper-it_IT-paola-sherpa

Italian (it_IT) single-speaker Piper VITS voice "paola" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data, so it works for Italian out of the box.

Repository: localai | License: other

vits-piper-it_IT-dii-high-sherpa

Italian (it_IT) single-speaker Piper VITS voice "dii" (high quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data. Non-commercial use only (CC BY-NC-SA 4.0).

Repository: localai | License: cc-by-nc-sa-4.0

vits-piper-it_IT-miro-high-sherpa

Italian (it_IT) single-speaker Piper VITS voice "miro" (high quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data. Non-commercial use only (CC BY-NC-SA 4.0).

Repository: localai | License: cc-by-nc-sa-4.0

vits-piper-it_IT-riccardo-x_low-sherpa

Italian (it_IT) single-speaker Piper VITS voice "riccardo" (x-low quality, 16 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: other

vits-piper-en_US-amy-sherpa

English (en_US) single-speaker Piper VITS voice "amy" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: other

vits-piper-es_ES-davefx-sherpa

Spanish (es_ES) single-speaker Piper VITS voice "davefx" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: cc0-1.0

vits-piper-fr_FR-siwis-sherpa

French (fr_FR) single-speaker Piper VITS voice "siwis" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: cc-by-4.0

vits-piper-de_DE-thorsten-sherpa

German (de_DE) single-speaker Piper VITS voice "thorsten" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: cc0-1.0

vits-piper-en_GB-alan-low-sherpa

English (en_GB) single-speaker Piper VITS voice "alan" (low quality, 16 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: other

vits-piper-en_GB-alan-medium-sherpa

English (en_GB) single-speaker Piper VITS voice "alan" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: other

vits-piper-en_GB-alba-medium-sherpa

English (en_GB) single-speaker Piper VITS voice "alba" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data.

Repository: localai | License: cc-by-4.0

vits-piper-en_GB-aru-medium-sherpa

English (en_GB) multi-speaker (12 voices) Piper VITS voice "aru" (medium quality, 22.05 kHz), served through the sherpa-onnx backend with native streaming TTS. Ships espeak-ng phonemization data. Pick a speaker with the numeric voice/speaker id.

Repository: localai | License: cc-by-4.0
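For the multi-speaker "aru" voice, the speaker is chosen by a numeric id. The following is a minimal sketch of how such a request might be assembled; it builds the request without sending it. The `/tts` endpoint, the `model`/`input`/`voice` field names, the zero-based id range, the base URL, and the helper name `build_tts_request` are all assumptions for illustration, not details confirmed by this listing.

```python
import json
import urllib.request


def build_tts_request(base_url: str, model: str, text: str,
                      speaker_id: int, num_speakers: int = 12) -> urllib.request.Request:
    """Build a TTS request that selects a speaker by numeric id.

    Assumptions: speaker ids are zero-based (0 .. num_speakers - 1) and the
    server reads the id from a "voice" field on POST /tts.
    """
    if not 0 <= speaker_id < num_speakers:
        raise ValueError(f"speaker_id must be in 0..{num_speakers - 1}")
    payload = {"model": model, "input": text, "voice": str(speaker_id)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server address; speaker 3 is an arbitrary choice.
tts_req = build_tts_request(
    "http://localhost:8080",
    "vits-piper-en_GB-aru-medium-sherpa",
    "Hello from speaker three.",
    speaker_id=3,
)
print(tts_req.data.decode("utf-8"))
```

Validating the id on the client side gives a clear error for an out-of-range speaker instead of relying on whatever the backend does with an unknown id.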
