Codestral 2 as your Cursor and Cline backend in 2026: Apache 2.0, $0.30/M tokens, 256K context, and whether it beats Gemini 3.5 Flash for daily coding

This article was originally published on aicoderscope.com

TL;DR: Codestral 2 went Apache 2.0 on April 8, 2026, which makes it the cheapest coding model that is both legally clean to self-host and worth wiring into your editor. At $0.30/M input via Mistral's API it slots into Cursor Chat, Cline, and Continue.dev in about ten minutes. Its real edge is fill-in-the-middle autocomplete, not agentic reasoning, so pick it for tab completion and privacy, not for multi-step Cline runs.

	Codestral 2	DeepSeek V4-Flash	Gemini 3.5 Flash
Best for	FIM autocomplete + self-host	Agentic Cline work, cheapest	Balanced cloud agent
Price (input / output per M)	$0.30 / $0.90	$0.14 / $0.435	$1.50 / ~$6
License	Apache 2.0 (self-host free)	MIT (self-host free)	Proprietary (API only)
Context window	256K	1M	1M
Params	22B dense	MoE (cloud)	proprietary
The catch	Weaker at multi-step agentic tasks	Thinking mode breaks Cline if left on	No self-host, no FIM endpoint

Honest take: If you want the best inline autocomplete you can legally run on your own GPU, Codestral 2 is the pick — wire it into Continue.dev's FIM slot. If you want a chat/agent backend for Cline, DeepSeek V4-Flash is both cheaper and stronger. Don't use Codestral 2 for heavy agent loops just because it's open.
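
To make the price gap in the table concrete, here is a minimal sketch that turns the listed per-million prices into a monthly bill. The 50M-input / 5M-output workload is a made-up assumption for illustration, and Gemini's output price is the table's approximate "~$6" figure, so treat the results as rough.

```python
# Per-million-token prices (USD) copied from the comparison table.
# Gemini's output price is listed only approximately ("~$6") in the table.
PRICES = {
    "Codestral 2": (0.30, 0.90),
    "DeepSeek V4-Flash": (0.14, 0.435),
    "Gemini 3.5 Flash": (1.50, 6.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of the given token volumes on one model."""
    price_in, price_out = PRICES[model]
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out


if __name__ == "__main__":
    # Hypothetical workload: 50M input and 5M output tokens per month.
    for name in PRICES:
        print(f"{name}: ${monthly_cost(name, 50_000_000, 5_000_000):.2f}")
```

Swap in your own token counts; the ranking only shifts if your input-to-output ratio is very different from the one assumed here.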

What actually changed in April 2026

Codestral has existed since May 2024, but the version that matters is Codestral 2, released April 8, 2026. The headline isn't a benchmark bump — it's the license. The original Codestral shipped under the Mistral Non-Production License, which barred commercial use in your product. Codestral 2 is Apache 2.0. That single change is why it's worth a fresh look: you can now self-host it inside a commercial product, ship it on a private server, or run it on a workstation GPU without a lawyer in the loop.

The model itself is a 22-billion-parameter dense transformer (not a mixture-of-experts), with a 256K-token context window and support for 80+ languages. Mistral reports 86.6% on HumanEval and 91.2% on MBPP, with native fill-in-the-middle (FIM) training — the thing that makes inline autocomplete feel native rather than bolted on.

The "dense, not MoE" detail matters more than it looks. A 22B dense model has predictable VRAM and throughput. You're not juggling 384 experts like Kimi K2.7 or a 671B sparse stack like DeepSeek's flagship. At Q4_K_M (roughly 4.5 to 5 bits per weight) 22B parameters come to roughly 12 to 13 GB, in line with the 12 GB Ollama download shown later in this article, so the weights fit on a single 16 GB card with room for only a modest context window. (For the full 256K context you'll need far more: that's a server-class ask, not a laptop one. The runaihome.com local coding LLM guide has the VRAM math by GPU tier.)

Two ways to run it

You have two paths, and they map to different goals:

Mistral API (api.mistral.ai) — fastest, zero hardware, $0.30/M in. Use this if you just want a cheap, capable chat/edit backend and don't care where the tokens go.
Self-hosted via Ollama or vLLM — slower on consumer hardware, but the code never leaves your machine. This is the Apache-2.0 payoff. Use it for client code under NDA or air-gapped work.

Pull the local copy first if you want to test offline:

$ ollama pull codestral
pulling manifest
pulling 0bbfda8e64c1... 100%  ▕████████████████▏  12 GB
pulling f5 db17... 100%  ▕████████████████▏  559 B
success

$ ollama run codestral "write a Python function that returns the nth Fibonacci number iteratively"
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

Tested with Ollama 0.12.x on June 19, 2026. On a single RTX 4090 the Q4_K_M build runs around 45–55 tokens/sec for short completions, which is fine for chat and edits but noticeably slower than a cloud call for long agent loops.
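
If you want to check the tokens/sec figure on your own card, one way is to read the timing fields Ollama returns from its local generate endpoint. The sketch below assumes Ollama is listening on its default port (11434) and that the model tag is the `codestral` one pulled above; the helper names are illustrative, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def tokens_per_second(response: dict) -> float:
    """Compute generation speed from Ollama's timing fields.

    eval_count is the number of generated tokens;
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(prompt: str, model: str = "codestral") -> float:
    """Run one non-streaming generation against a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))
```

Calling `measure("write a quicksort in Python")` a few times and averaging gives a steadier number than a single run, since the first call also pays the model load time.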

If you're going cloud, grab a key from console.mistral.ai and smoke-test it:

$ curl -s https://api.mistral.ai/v1/chat/completions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"codestral-latest","messages":[{"role":"user","content":"say ok"}]}' \
  | python3 -c "import sys,json;print(json.load(sys.stdin)['choices'][0]['message']['content'])"
ok

codestral-latest is the rolling alias; pin the dated version if you want reproducibility.
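
For scripted checks, the same smoke test can be written in Python using only the standard library. This is a sketch of the curl call above, not an official client: the function names are hypothetical, and the `model` parameter is where you would pass a pinned, dated model ID instead of the rolling alias.

```python
import json
import os
import urllib.request

API_URL = "https://api.mistral.ai/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "codestral-latest") -> urllib.request.Request:
    """Build the same POST the curl command sends.

    Pass a pinned model ID as `model` to avoid the rolling alias.
    """
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Key is read from the same environment variable the curl example uses.
            "Authorization": f"Bearer {os.environ.get('MISTRAL_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt: str, model: str = "codestral-latest") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model), timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With `MISTRAL_API_KEY` exported, `ask("say ok")` should behave like the curl one-liner.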

Wiring it into Cline

Cline takes any OpenAI-compatible endpoint, so the Mistral API drops straight in.

Open the Cline panel → Settings (gear icon).
API Provider: choose OpenAI Compatible.
Base URL: https://api.mistral.ai/v1
API Key: your Mistral key.
Model ID: codestral-latest
Save, then start a task.

That's the whole setup. Where it gets interesting is what to use it for. Codestral 2 is a code specialist, not a generalist agent. On a single "edit this function" task it's excellent. On a 12-step Cline plan (read three files, run a test, parse the failure, patch, re-run) it loses the thread sooner than DeepSeek V4-Flash or Gemini 3.5 Flash. If your Cline workflow is mostly "apply this focused change," Codestral 2 is great and cheap. If it's "figure out why the integration test flakes and fix it," reach for DeepSeek V4-Flash instead.

One practical note: unlike DeepSeek V4-Flash, Codestral 2 has no separate "thinking mode" to disable, so you skip the tool-call loop trap that bites Cline users on reasoning models. It just answers.

Wiring it into Cursor (and the Tab caveat)

Cursor lets you override the OpenAI base URL, which routes Chat and Cmd-K through Codestral 2:

Settings → Models.
Scroll to OpenAI API Key, expand the override.
Base URL: https://api.mistral.ai/v1
Paste your Mistral key, click Verify.
Add a custom model named codestral-latest and enable it.

Here's the catch every Cursor power user hits: the custom endpoint powers Chat and Cmd-K, but not Tab. Cursor's Tab autocomplete runs on Cursor's own proprietary models and cannot be repointed at an external API. So routing Cursor through Codestral 2 gets you a cheaper chat/edit backend, but your inline gray-text completion is still Cursor's. This is the same limitation that applies to every external backend in Cursor — see the Cursor + Ollama setup guide for the full breakdown.

That limitation is exactly why Continue.dev is the better host for Codestral 2 if autocomplete is what you care about: Continue can use the dedicated FIM endpoint.

Continue.dev: the FIM setup, and the bug that quietly breaks it

This is where Codestral 2 earns its keep. Continue.dev lets you assign a model to the autocomplete role and point it at Mistral's dedicated FIM endpoint, which is a different host from the chat API:

FIM completions  →  https://codestral.mistral.ai/v1/fim/completions
Chat completions →  https://api.mistral.ai/v1/chat/completions
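
The practical difference between the two endpoints is the request body: a FIM call carries both the code before the cursor and the code after it. The sketch below shows one way to build that body. It assumes the FIM endpoint takes the prefix as `prompt` and the trailing code as `suffix`; the helper names and the `max_tokens` value are illustrative choices.

```python
import json

FIM_URL = "https://codestral.mistral.ai/v1/fim/completions"  # FIM host listed above


def split_at_cursor(source: str, cursor: int) -> tuple[str, str]:
    """Split a buffer into the text before and after the cursor offset."""
    return source[:cursor], source[cursor:]


def build_fim_payload(source: str, cursor: int, model: str = "codestral-latest") -> bytes:
    """Build a FIM request body.

    'prompt' carries the prefix and 'suffix' carries the code after the cursor,
    which is the part a chat request has no slot for.
    """
    prefix, suffix = split_at_cursor(source, cursor)
    payload = {"model": model, "prompt": prefix, "suffix": suffix, "max_tokens": 64}
    return json.dumps(payload).encode("utf-8")
```

For a buffer such as `"def add(a, b):\n    return \n\nprint(add(1, 2))\n"` with the cursor right after `return `, the model sees the call site in the suffix and can complete the line with that context in view.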

In your Continue config (~/.continue/config.yaml in the current YAML format), the autocomplete model looks like this:

models:
  - name: Codestral FIM
    provider: mistral
    model: codestral-latest
    apiKey: YOUR_MISTRAL_KEY
    apiBase: https://codestral.mistral.ai/v1
    roles:
      - autocomplete
    autocompleteOptions:
      maxPromptTokens: 1024
      debounceDelay: 250

The problem: completions feel dumb and slow

Here's the real-world snag. Several Continue users (tracked in continuedev/continue issue #7178) found that autocomplete was hitting …/v1/chat/completions instead of …/v1/fim/completions. The symptoms: completions arrive late, ignore the code after your cursor, and sometimes spit out a markdown code fence into your editor. That's the chat endpoint pretending to do autocomplete — it only sees the prefix, never the suffix, so it can't do
