My homelab AI agent setup was costing $42/month in API calls alone — until I switched to local quantized models.
Key Takeaways
Switching from OpenRouter API calls to local Ollama quantized models cut my monthly LLM spend from $42 to $0.
Llama 3 8B q4_0 fits in ~4GB VRAM on a single RTX 3060, leaving room for other containers.
GPU time-slicing with Docker lets multiple agent instances share one GPU without fighting over resources.
Quality was comparable: 38% preferred local Llama 3, 32% preferred API models, 30% rated them as ties.
Bottom Line
If you're spending $40+/month on API calls for predictable, bursty workloads, switching to Ollama with quantized models can slash costs to near zero while keeping performance acceptable.
Read the full analysis on Susiloharjo.










