Discover and install AI models from our curated collection

A text-to-image model built on Stable Diffusion 1.5 that generates images from text prompts. This is the DreamShaper model by Lykon.
Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Repository: localai | License: apache-2.0
Wan 2.1 T2V 1.3B: a text-to-video diffusion model, GGUF-quantized for the stable-diffusion.cpp backend. It generates short 33-frame clips at 832x480 from a text prompt. This is the cheapest Wan variant and is suitable for CPU-offloaded inference with ~10 GB of usable RAM.
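To check a Linux host against the ~10 GB of usable RAM this entry mentions, one option is to read `MemAvailable` from `/proc/meminfo`. This is a minimal sketch that assumes Linux and treats `MemAvailable` as "usable RAM"; the 10 GB threshold comes from the entry, while the function and constant names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Threshold taken from the Wan 2.1 T2V 1.3B entry (~10 GB of usable RAM).
WAN_T2V_1_3B_RAM_GB = 10.0


def available_ram_gb(meminfo: str = "/proc/meminfo") -> Optional[float]:
    """Return MemAvailable in GB (10^9 bytes), or None if it cannot be read.

    Assumes the Linux /proc/meminfo format, e.g. "MemAvailable:  16384000 kB".
    """
    try:
        text = Path(meminfo).read_text()
    except OSError:
        return None  # not Linux, or /proc is unavailable
    for line in text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib * 1024 / 1e9
    return None


def fits(required_gb: float, available_gb: Optional[float]) -> bool:
    """True only if available memory is known and meets the requirement."""
    return available_gb is not None and available_gb >= required_gb


if __name__ == "__main__":
    avail = available_ram_gb()
    print(f"available: {avail} GB, fits Wan 1.3B: {fits(WAN_T2V_1_3B_RAM_GB, avail)}")
```

`MemAvailable` is the kernel's estimate of memory that can be allocated without swapping, which is closer to "usable" than the raw free-page count.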
Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
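LocalAI exposes an OpenAI-compatible image endpoint (`/v1/images/generations`), so any of the text-to-image models above can be driven with an OpenAI-style JSON body. The sketch below only builds that body; the model name `"dreamshaper"` and the base URL on port 8080 are hypothetical placeholders, so substitute the name under which the model was actually installed and your own server address.

```python
import json

# Hypothetical server address; adjust to your LocalAI deployment.
BASE_URL = "http://localhost:8080"
ENDPOINT = f"{BASE_URL}/v1/images/generations"


def image_request_body(model: str, prompt: str, width: int = 512, height: int = 512) -> str:
    """Build an OpenAI-style image-generation request body as a JSON string."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    payload = {
        "model": model,               # installed model name (hypothetical here)
        "prompt": prompt,
        "size": f"{width}x{height}",  # OpenAI-style "WIDTHxHEIGHT" string
    }
    return json.dumps(payload)


body = image_request_body("dreamshaper", "a lighthouse at dusk, oil painting")
print(ENDPOINT)
print(body)
```

The body can be sent as a POST to `ENDPOINT` with any HTTP client and a `Content-Type: application/json` header.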
Repository: localai | License: ideogram-non-commercial-model-agreement
Ideogram 4 is a text-to-image diffusion model known for state-of-the-art prompt adherence and exceptionally accurate text rendering inside images. It uses a Qwen3-VL-8B text encoder and performs real classifier-free guidance with a separate unconditional diffusion model. This entry is the IQ4_NL (4-bit) quantization, a good balance of quality and footprint (~5.8 GB diffusion + ~5.8 GB unconditional). The bundle also pulls the Qwen3-VL-8B-Instruct text encoder and the FLUX.2 VAE. Quantized GGUF weights by stduhpf for use with stable-diffusion.cpp.
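Classifier-free guidance combines two predictions at each sampling step: guided = uncond + s * (cond - uncond), where s is the guidance scale. Per this entry, Ideogram 4 takes the unconditional prediction from a separate unconditional model rather than from the same network with an empty prompt. The sketch below shows only that combination step on plain Python lists; the function name and toy numbers are illustrative and not the stable-diffusion.cpp implementation.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance, element-wise: uncond + scale * (cond - uncond).

    cond   -- prediction from the prompt-conditioned diffusion model
    uncond -- prediction from the (here separate) unconditional model
    scale  -- guidance scale; 1.0 returns cond, 0.0 returns uncond
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy 3-element "predictions" standing in for latent-space tensors.
print(cfg_combine([0.5, -0.2, 1.0], [0.1, 0.0, 1.0], scale=4.0))
```

Because every step evaluates both models, both sets of weights must be loaded, which is why the entry lists two diffusion files of equal size.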
Repository: localai | License: ideogram-non-commercial-model-agreement
Ideogram 4 is a text-to-image diffusion model known for state-of-the-art prompt adherence and exceptionally accurate text rendering inside images. It uses a Qwen3-VL-8B text encoder and performs real classifier-free guidance with a separate unconditional diffusion model. This entry is the Q8_0 (8-bit) quantization, for the highest quality (~10.1 GB diffusion + ~10.1 GB unconditional). The bundle also pulls the Qwen3-VL-8B-Instruct text encoder and the FLUX.2 VAE. Quantized GGUF weights by stduhpf for use with stable-diffusion.cpp.
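The sizes listed in the two Ideogram 4 entries add up as follows for the diffusion weights alone (conditional + unconditional model). The Qwen3-VL-8B text encoder and FLUX.2 VAE come on top; their sizes are not listed here, so this sketch leaves them out rather than guessing. The dictionary and function names are hypothetical.

```python
# Per-file sizes (GB) as listed in the two Ideogram 4 entries.
IDEOGRAM4_SIZES_GB = {
    "IQ4_NL": {"diffusion": 5.8, "unconditional": 5.8},
    "Q8_0": {"diffusion": 10.1, "unconditional": 10.1},
}


def diffusion_weights_gb(quant: str) -> float:
    """Combined size of both diffusion models for one quantization.

    Excludes the text encoder and VAE, whose sizes the entries do not state.
    """
    parts = IDEOGRAM4_SIZES_GB[quant]
    return round(parts["diffusion"] + parts["unconditional"], 1)


for quant in IDEOGRAM4_SIZES_GB:
    print(quant, diffusion_weights_gb(quant), "GB")
```

By these figures, the Q8_0 bundle needs roughly 8.6 GB more for its diffusion weights than the IQ4_NL bundle (20.2 GB vs 11.6 GB).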
Repository: localai | License: ltx-2-community-license-agreement
LTX-2.3 22B dev: a DiT-based audio-video foundation model from Lightricks, GGUF-quantized for the stable-diffusion.cpp backend. It generates synchronized video and audio from a text prompt (T2V), a reference image (I2V), or a first/last frame pair (FLF2V). It uses gemma-3-12b-it as the text encoder and ships dedicated video and audio VAEs, plus an embeddings_connectors safetensors file that bridges the LLM hidden states to the diffusion model. This entry uses the dynamic (UD) Q4_K_M quantization of the 22B model (~16 GB) paired with the UD-Q4_K_XL QAT Gemma encoder (~7.4 GB). Recommended generation: width=1280, height=720, video_frames=33, fps=24, sampler=euler, cfg_scale=6.0.
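The recommended settings for this entry can be captured in a small config object; at video_frames=33 and fps=24, one clip lasts 33 / 24 = 1.375 s. The field names mirror the entry's parameter names, but the class itself is hypothetical and is not a LocalAI or stable-diffusion.cpp API. The memory figure adds only the two sizes the entry lists (model ~16 GB + Gemma encoder ~7.4 GB) and ignores the VAEs and connector file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoGenSettings:
    """Hypothetical container for the recommended LTX-2.3 generation settings."""
    width: int = 1280
    height: int = 720
    video_frames: int = 33
    fps: int = 24
    sampler: str = "euler"
    cfg_scale: float = 6.0

    def duration_s(self) -> float:
        """Clip length in seconds: number of frames divided by frame rate."""
        return self.video_frames / self.fps


# Listed sizes (GB): UD-Q4_K_M 22B dev model + UD-Q4_K_XL Gemma encoder.
LISTED_WEIGHTS_GB = round(16.0 + 7.4, 1)

settings = VideoGenSettings()
print(f"{settings.duration_s():.3f} s per clip, ~{LISTED_WEIGHTS_GB} GB of listed weights")
```

The distilled entry lists the same recommended settings, so the same object would apply to it; only the model weights change.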
Repository: localai | License: ltx-2-community-license-agreement
LTX-2.3 22B distilled: a faster student of the dev model, GGUF-quantized for the stable-diffusion.cpp backend. It trades a small amount of quality for substantially fewer sampling steps, which makes it the right pick for iterative previews and CPU-offloaded inference. It supports the same input modalities as the dev entry (T2V / I2V / FLF2V) and uses the same gemma-3-12b-it text encoder. This entry uses the dynamic (UD) Q4_K_M quantization of the 22B distilled model (~16.3 GB). Recommended generation: width=1280, height=720, video_frames=33, fps=24, sampler=euler, cfg_scale=6.0.