Krea 2: Open-Weights Image Model That Caught the Frontier

The closed frontier just got company. On June 22, 2026, Krea released the weights of Krea 2 — a 12.9-billion-parameter diffusion transformer trained from scratch on billions of real images — and the Hacker News thread hit 348 points within hours.

The release ships as two complementary checkpoints: Krea 2 Raw, an undistilled base model built for fine-tuning and LoRA training, and Krea 2 Turbo, an 8-step distilled engine that generates 2K images in roughly two seconds on consumer hardware. Both are available on Hugging Face under a community license that allows free commercial use for individuals and small teams.

What makes this release different from the usual open-weights drop is the depth of what came with it. Krea published a full technical report detailing everything from data curation philosophy to distributed training infrastructure — the kind of document that frontier labs typically keep behind closed doors.

What Krea 2 Actually Is

At its core, Krea 2 is a single-stream diffusion transformer. The architecture uses a 12.9B dense DiT backbone with 28 transformer blocks at width 6144, grouped-query attention with gated sigmoid attention, SwiGLU MLPs at 4x expansion, and 3D axial RoPE for positional encoding.
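To make one of those components concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward block. It is an illustration only, not Krea's implementation: the toy width, the random weights, and the reading of "4x expansion" as hidden = 4 × width are all assumptions.

```python
import math
import random

def silu(v):
    # SiLU (swish) activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))

def matvec(x, w):
    # x has length n, w has n rows and m columns; returns a list of length m
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def swiglu(x, w_gate, w_up, w_down):
    # The SiLU-activated gate branch multiplies the linear "up" branch
    # elementwise, then the result is projected back to the model width.
    gate = [silu(g) for g in matvec(x, w_gate)]
    up = matvec(x, w_up)
    return matvec([g * u for g, u in zip(gate, up)], w_down)

WIDTH = 2            # toy value; the article reports width 6144 for Krea 2
HIDDEN = 4 * WIDTH   # assumption: "4x expansion" means hidden = 4 * width

rng = random.Random(0)
def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

w_gate = rand_matrix(WIDTH, HIDDEN)
w_up = rand_matrix(WIDTH, HIDDEN)
w_down = rand_matrix(HIDDEN, WIDTH)

token = [1.0, -2.0]  # one token embedding of size WIDTH
out = swiglu(token, w_gate, w_up, w_down)
print(len(out))      # output has the same width as the input
```

Note that SwiGLU uses three weight matrices per block (gate, up, down) rather than the two of a classic MLP, which is why the expansion factor has a large effect on total parameter count.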

The two-checkpoint system is intentional. Raw is the undistilled mid-training checkpoint: diverse, malleable, and designed specifically for researchers and fine-tuners to customize. Turbo is the production engine: an 8-step distilled version that runs without classifier-free guidance, so it avoids the extra model evaluation that guidance normally adds at every sampling step.
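Why dropping classifier-free guidance (CFG) matters for speed can be shown with a short sketch. It uses the standard CFG formula and simply counts model evaluations; the function names are hypothetical, and nothing here is taken from Krea's code.

```python
def cfg_combine(uncond, cond, scale):
    # Standard classifier-free guidance: start from the unconditional
    # prediction and move toward the conditional one by `scale`.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def forward_passes(steps, uses_cfg):
    # With CFG, each sampling step evaluates the model twice (conditional
    # and unconditional); a guidance-free model evaluates it once per step.
    return steps * (2 if uses_cfg else 1)

print(cfg_combine([0.0, 1.0], [1.0, 3.0], scale=2.0))
print(forward_passes(8, uses_cfg=False))  # Turbo as described: 8 steps, no CFG
print(forward_passes(8, uses_cfg=True))   # the same step count with CFG
```

For the same number of steps, a guidance-free sampler does half the model evaluations of a CFG sampler, before counting any savings from needing fewer steps.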

ℹ️ Krea 2 ranks #1 among text-to-image models from independent labs on Artificial Analysis, and sits within 0.14 points of GPT Image 2 on style fidelity.

What the Technical Report Reveals

No Synthetic Data, by Design

The team explicitly rejects synthetic training data. Their position: "even a small proportion of AI-generated images introduces biases" that degrade output diversity. Instead, they built a multi-stage pipeline that processes billions of real images through increasingly selective filters.

A Six-Stage Training Pipeline

Pretraining: progressive resolution from 256px to 1024px, using 8-bit training at lower resolutions for 15–20% speed gains
Midtraining: bridges pretraining and supervised fine-tuning (SFT)
Supervised Fine-Tuning: small, hand-curated datasets
Preference Optimization: STPO (Stabilized Temporal Preference Optimization)
Reinforcement Learning: multi-reward GRPO (Group Relative Policy Optimization) with four independent reward signals
Timestep Distillation: creates the Turbo checkpoint via TDM
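The reinforcement learning stage can be illustrated with the core GRPO idea: each sample is scored relative to the other samples generated for the same prompt. The sketch below is a generic version of that calculation. How Krea combines its four signals is not described here, so the weighted sum and the placeholder names signal_a to signal_d are assumptions.

```python
from statistics import mean, pstdev

def combine_rewards(per_signal, weights):
    # Assumption: independent reward signals are merged with a weighted sum.
    # per_signal maps a signal name to one score per sample in the group.
    n = len(next(iter(per_signal.values())))
    return [sum(weights[k] * per_signal[k][i] for k in per_signal) for i in range(n)]

def group_relative_advantages(rewards, eps=1e-6):
    # GRPO-style advantage: subtract the group mean and divide by the group
    # standard deviation, so samples are ranked against their own group.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four images sampled for one prompt, each scored by four placeholder signals.
scores = {
    "signal_a": [0.9, 0.4, 0.7, 0.2],
    "signal_b": [0.8, 0.5, 0.6, 0.3],
    "signal_c": [1.0, 1.0, 0.0, 1.0],
    "signal_d": [0.5, 0.5, 0.5, 0.5],
}
weights = {name: 0.25 for name in scores}  # equal weights, chosen arbitrarily

rewards = combine_rewards(scores, weights)
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Samples that beat their group average get a positive advantage and are reinforced; samples below it get a negative one. No separate value network is needed, which is the main practical appeal of the group-relative formulation.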

Rubric-Based RL Rewards

Instead of asking a judge model for a single holistic score, the system decomposes each prompt into individually verifiable requirements and scores them one by one. This decomposition is intended to prevent reward hacking. Alongside it, a dedicated artifact reward model catches structural errors.
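A rubric-style reward of this kind can be sketched in a few lines. This is a simplified illustration, not the report's method: the requirement strings are invented, the judge is a hard-coded stub standing in for a real vision-language judge, and scoring by the fraction of satisfied requirements is an assumption.

```python
def rubric_reward(requirements, judge):
    # Each requirement receives its own pass/fail verdict. The reward is the
    # fraction satisfied, and the verdicts stay inspectable per requirement.
    verdicts = {req: bool(judge(req)) for req in requirements}
    score = sum(verdicts.values()) / len(verdicts)
    return score, verdicts

# Hypothetical decomposition of the prompt
# "a red bicycle leaning against a blue door at night".
requirements = [
    "the bicycle is red",
    "the door is blue",
    "the scene is at night",
]

def stub_judge(requirement):
    # Stand-in for a real judge model: pretend the image missed the night setting.
    return requirement != "the scene is at night"

score, verdicts = rubric_reward(requirements, stub_judge)
print(score)
print(verdicts)
```

Because every verdict is tied to one checkable statement, a low score points at exactly which part of the prompt was missed, something a single holistic number cannot do.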

How to Actually Run Krea 2 Locally

The Quick Path: ComfyUI + FP8

The fastest route is through ComfyUI. Community FP8-quantized weights shrink the transformer from 24.76 GiB to 12.01 GiB, fitting it on a 16GB GPU.
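The size reduction follows from bytes per parameter. The back-of-envelope estimate below assumes the original weights are stored at 16 bits per parameter (the precision of the unquantized files is not stated here) and counts only the 12.9B transformer parameters.

```python
GIB = 2 ** 30  # bytes per GiB

def weight_gib(params, bytes_per_param):
    # Raw weight storage only; excludes activations, text encoder, and VAE.
    return params * bytes_per_param / GIB

PARAMS = 12.9e9  # parameter count reported for Krea 2

print(round(weight_gib(PARAMS, 2), 2))  # 16-bit weights (assumed original precision)
print(round(weight_gib(PARAMS, 1), 2))  # FP8 weights, one byte per parameter
```

The estimate comes out at roughly 24.0 GiB and 12.0 GiB. The FP8 figure lines up with the reported 12.01 GiB; the reported 24.76 GiB is somewhat above the 16-bit estimate, a difference this simple count does not explain. Either way, only the FP8 version leaves headroom on a 16GB card.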

Minimum hardware:

GPU: 16GB VRAM or more (for example RTX 4060 Ti 16GB, RTX 5080, RTX 4090)
System RAM: 16GB minimum, 32GB recommended
Storage: ~18GB for model files

Setup in three steps:

Update ComfyUI to 0.25.0+
Download FP8 model files from Comfy-Org/Krea-2
Load the native workflow JSON — no custom nodes required
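Before downloading anything, the requirements above can be turned into a quick pre-flight check. The sketch below is a hypothetical helper, not part of ComfyUI: it takes the numbers you supply by hand and compares them against the minimums listed in this section.

```python
MIN_VRAM_GB = 16          # from the hardware list above
MIN_RAM_GB = 16
MODEL_FILES_GB = 18       # approximate size of the model files
MIN_COMFYUI = (0, 25, 0)  # minimum ComfyUI version named above

def parse_version(text):
    # "0.25.0" or "v0.25.0" -> (0, 25, 0); assumes a plain dotted numeric version
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))

def preflight(vram_gb, ram_gb, free_disk_gb, comfyui_version):
    # Returns a list of human-readable problems; an empty list means ready.
    problems = []
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"GPU has {vram_gb}GB VRAM, need {MIN_VRAM_GB}GB")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"system has {ram_gb}GB RAM, need {MIN_RAM_GB}GB")
    if free_disk_gb < MODEL_FILES_GB:
        problems.append(f"only {free_disk_gb}GB free, need about {MODEL_FILES_GB}GB")
    if parse_version(comfyui_version) < MIN_COMFYUI:
        problems.append(f"ComfyUI {comfyui_version} is older than 0.25.0")
    return problems

for line in preflight(vram_gb=12, ram_gb=32, free_disk_gb=50, comfyui_version="0.24.1"):
    print(line)
```

Versions are compared as integer tuples rather than strings so that, for example, 0.3.0 is correctly treated as older than 0.25.0.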

The Cloud Path

Day-zero integrations are already live on fal, Replicate, Together AI, Cloudflare, and SGLang.

💡 For LoRA fine-tuning, train on Raw and deploy on Turbo. The transfer is specifically engineered — LoRAs trained on Raw "transfer strongly to Turbo" for production inference.

The License Reality

The Krea 2 Community License allows free commercial use if your annual revenue is under $1M and you have fewer than 50 seats. Enterprise licensing is required above either threshold. Content filtering is mandatory.
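The two thresholds combine as a simple AND condition, sketched below. This only restates the summary in the paragraph above and is not a reading of the license text; in particular, treating a value exactly at a threshold as outside the free tier is an assumption.

```python
REVENUE_LIMIT_USD = 1_000_000
SEAT_LIMIT = 50

def community_license_applies(annual_revenue_usd, seats):
    # Free commercial use requires being under BOTH limits; reaching either
    # one moves you to enterprise licensing (boundary handling is assumed).
    return annual_revenue_usd < REVENUE_LIMIT_USD and seats < SEAT_LIMIT

print(community_license_applies(250_000, 5))    # small team under both limits
print(community_license_applies(250_000, 80))   # under revenue limit, too many seats
print(community_license_applies(3_000_000, 5))  # few seats, revenue too high
```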

What This Means for the Open-Weights Race

Krea 2 is the strongest evidence yet that the closed-vs-open gap in image generation is compressing. A 12.9B-parameter model from an independent lab now sits within 0.14 points of GPT Image 2 on style fidelity, runs at comparable speed, and ships with style control that closed APIs still lack.

The technical report's roadmap hints at MoE architectures, native 2K–4K resolution with sparse attention, and NVFP4 training for further efficiency gains.

Originally published at ComputeLeap