Discover and install AI models from our curated collection
Repository: localai
License: apache-2.0
# Qwopus3.6-27B-v2

SFT Release: Reasoning-Enhanced Dense Language Model Fine-Tuned on Qwen3.6-27B

Trace Inversion & Negentropy | 27B Parameters | 3-Stage Curriculum SFT | Vision & Tool-use Support

## What is Qwopus3.6-27B-v2?

Qwopus3.6-27B-v2 is a reasoning-enhanced dense language model built on top of Qwen3.6-27B. It is trained with a multi-stage curriculum learning pipeline augmented with Trace Inversion datasets (claude-opus-4.6/4.7-traceInversion). These datasets reverse-engineer the compressed "Reasoning Bubbles" of commercial LLMs into structured, step-by-step synthetic reasoning traces, eliminating logical shortcuts and knowledge fractures.

- **Structured Reasoning:** Injects reconstructed deep CoT chains to eliminate logical shortcuts via Trace Inversion.
- **Style Consistency:** Enforces strict constraints on the format and convergence of `<think>` tags.
- **Distillation Alignment:** Ensures high-quality cross-source SFT data alignment to narrow the capacity gap.
- **RL Scalability:** Sets up a stable formatting pipeline optimized for downstream Reinforcement Learning (RL).

## 1. Base Model, Training Library & Cooperation ...
Repository: localai
License: apache-2.0

# Qwopus3.5-9B-v3.5

## Model Overview & v3.5 Design

Qwopus3.5-9B-v3.5 is a **data-scaled continuation** of the Qwopus3.5-9B-v3 model. The training data in v3.5 is expanded to cover a broader range of domains, including mathematics, programming, puzzle-solving, multilingual dialogue, instruction-following, multi-turn interactions, and STEM-related tasks.

Qwopus3.5-9B-v3.5 is a reasoning-enhanced model based on **Qwen3.5-9B**, designed for:

- Structured reasoning
- Tool-augmented workflows
- Multi-step agentic tasks
- Token-efficient inference

Compared with Qwopus3.5-9B-v3, **v3.5 does not introduce a new architecture, RL stage, or template redesign**. This version is trained with approximately **2× more SFT data**.

## Motivation & Generalization Insight

The motivation behind v3.5 comes from a simple observation:

> This work is motivated by the hypothesis that scaling high-quality SFT data may further enhance the generalization ability of large language models.

In earlier Qwopus3.5 experiments, structured reasoning was observed to improve both **accuracy and efficiency**: ...
Repository: localai
License: apache-2.0
# Qwopus3.6-27B-v2-MTP

MTP Release: Multi-Token Prediction reasoning model fine-tuned from Qwen3.6-27B

Trace Inversion & Negentropy | 27B Parameters | Speculative Decoding | Coding / DevOps / Math

## What is Qwopus3.6-27B-v2-MTP?

Qwopus3.6-27B-v2-MTP is a speed-oriented reasoning release built on top of Qwen3.6-27B. It keeps the Qwopus line's focus on reconstructed reasoning traces, coding discipline, DevOps procedures, and mathematical derivations, while adding Multi-Token Prediction for faster generation. The goal is simple: preserve the depth and structure of a 27B reasoning model while making real interactive use noticeably faster.

- **MTP Decoding:** Auxiliary future-token prediction improves throughput on long reasoning, code, math, and strict-format prompts.
- **Structured Reasoning:** Inherits the Qwopus training recipe built around reconstructed step-by-step reasoning trajectories.
- **GB10 Tested:** Validated on a 30-question local benchmark across Logic, Coding, DevOps, Math, and Edge tasks.
- **Practical Speed:** Designed for workflows where strong answers matter but waiting several extra minutes per task is not acceptable.

...
Repository: localai
License: gemma

Google Gemma 4 E2B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E2B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E2B variant has 5B total / 2B effective parameters and targets the smaller end of consumer GPUs.
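The flags listed above could be assembled into a launch command along the following lines. This is a sketch rather than the gallery's actual configuration: it assumes SGLang's `sglang.launch_server` entry point and its `--speculative-*` and `--mem-fraction-static` server arguments, `build_mtp_command` is a hypothetical helper, and the target model ID `google/gemma-4-E2B-it` is an assumption inferred from the drafter's name.

```python
import shlex


def build_mtp_command(target: str, drafter: str,
                      mem_fraction_static: float = 0.85) -> list[str]:
    """Assemble an SGLang launch command with MTP speculative decoding.

    Hypothetical helper: the flag names assume SGLang's server arguments,
    and the values mirror the ones quoted in the description.
    """
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", target,
        "--speculative-algorithm", "NEXTN",
        "--speculative-draft-model-path", drafter,
        "--speculative-num-steps", "5",
        "--speculative-num-draft-tokens", "6",
        "--speculative-eagle-topk", "1",
        "--mem-fraction-static", str(mem_fraction_static),
    ]


if __name__ == "__main__":
    cmd = build_mtp_command(
        target="google/gemma-4-E2B-it",  # assumed ID, inferred from the drafter name
        drafter="google/gemma-4-E2B-it-assistant",
    )
    # Print a copy-pasteable shell command instead of launching the server.
    print(shlex.join(cmd))
```

Building the argument list separately from launching keeps the sketch safe to run on a machine without SGLang or a GPU; the same helper would cover the E4B entry by swapping the two model IDs.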
Repository: localai
License: gemma

Google Gemma 4 E4B-IT served by SGLang with Multi-Token Prediction (MTP) speculative decoding. The companion drafter google/gemma-4-E4B-it-assistant lets the target accept several tokens per step. Flags are a 1:1 transcription of the SGLang cookbook's MTP command (NEXTN algorithm, num_steps=5, num_draft_tokens=6, eagle_topk=1, mem_fraction_static=0.85). The E4B variant has 8B total / 4B effective parameters, making it the natural pick for consumer GPUs in the 16-24 GB range.
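How a drafter "lets the target accept several tokens per step" can be illustrated with a toy greedy verification routine. This is a generic sketch of speculative decoding's accept step, not SGLang's implementation, which may differ in detail; `accept_draft` and the integer token IDs are hypothetical.

```python
def accept_draft(draft_tokens: list[int], target_tokens: list[int]) -> list[int]:
    """Greedy verification of one speculative step.

    target_tokens[i] is the target model's own greedy choice at position i,
    computed in a single forward pass over the drafted tokens, so it has one
    more entry than draft_tokens (the "bonus" token after the last draft).
    """
    if len(target_tokens) != len(draft_tokens) + 1:
        raise ValueError("target must score one position more than the draft")

    accepted: list[int] = []
    for drafted, verified in zip(draft_tokens, target_tokens):
        if drafted != verified:
            # First disagreement: keep the target's token and stop.
            accepted.append(verified)
            return accepted
        accepted.append(drafted)

    # Every drafted token matched, so the bonus token is kept as well.
    accepted.append(target_tokens[-1])
    return accepted
```

For example, `accept_draft([5, 7, 9], [5, 7, 2, 4])` returns `[5, 7, 2]`: two drafted tokens are accepted and the target's correction replaces the third, so one target pass yields three tokens instead of one.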
Repository: localai
License: mit

Xiaomi MiMo-7B-RL served by SGLang with built-in Multi-Token Prediction (MTP) heads (no separate drafter needed) plus online fp8 weight quantization to fit on a 16 GB consumer GPU. The model card reports ~90% acceptance. Verified end-to-end at ~88 tok/s on an RTX 5070 Ti (16 GB). Note: mem_fraction_static is dropped to 0.7 (vs SGLang's 0.85 default) because the MTP draft worker's vocab embedding is loaded unquantised (~1.2 GiB) and OOMs the static reservation otherwise.
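The effect of lowering mem_fraction_static can be checked with rough arithmetic. The sketch below reads the note as "a smaller static fraction leaves more memory outside the static pool"; that reading, the helper name `free_outside_static_gb`, and treating GB and GiB as interchangeable are all simplifying assumptions, and real headroom is further reduced by the CUDA context and activations.

```python
def free_outside_static_gb(total_gb: float, mem_fraction_static: float) -> float:
    """Memory left outside the static pool for a given static fraction."""
    return total_gb * (1.0 - mem_fraction_static)


TOTAL_GB = 16.0        # card size quoted in the description
DRAFT_EMBED_GB = 1.2   # approximate unquantised draft embedding, from the note

if __name__ == "__main__":
    for fraction in (0.85, 0.7):
        free = free_outside_static_gb(TOTAL_GB, fraction)
        print(
            f"mem_fraction_static={fraction}: "
            f"{free:.1f} GB outside the static pool, "
            f"{free - DRAFT_EMBED_GB:.1f} GB after the draft embedding"
        )
```

Under these assumptions the lower fraction roughly doubles the memory outside the static pool (about 4.8 GB instead of 2.4 GB), which is the extra room the unquantised embedding needs alongside everything else that lives there.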