113 points · 10 comments · Author: Alifatisk

Top Comments

foundry27 · Apr 20
I like this idea. This might be one of the more effective social pressures available for getting inference providers to fix long-standing issues. AWS Bedrock, for example, has crippling defects in its serving stack for Kimi’s K2 and K2.5 models that cause 20%-30% of attempts to emit tool calls to instead silently end the conversation (with no token output). That makes AWS effectively irrelevant as a serious inference provider for Kimi, and conveniently pushes users onto Bedrock’s significantly more expensive Anthropic models for comparable performance on agentic tasks.
bobbiechen · Apr 20
If I understand correctly, the threat model here seems to be protecting against accidental issues that would hurt performance, but it doesn't cover a malicious actor.

For example, Sketchy Provider tells you they are running the latest and greatest, but is actually knowingly running some cheaper (and worse) model and pocketing the difference. These tests wouldn't help, since Sketchy Provider could detect when they're being tested and do the right thing only then (like the Volkswagen emissions scandal). Right?

OsamaJaber · Apr 20
Good to see this exist. Inference providers quietly swap quant levels, and most users never check. A standard verifier from the model maker is the right move; I would love to see other labs ship the same.
seism · Apr 20
A test that runs for 15 hours on a high-powered rig is going to be hard to reproduce or scale. But I think this addresses a widespread concern, one that affects all kinds of cloud services: what you ping is not necessarily what you get.
curioussquirrel · Apr 20
After Anthropic, Moonshot is another model provider that restricts tweaking of sampling parameters. I do like the idea of the vendor verifier, though.
Source: kimi.com
Author: Alifatisk
Posted: April 20, 2026 at 06:39 PM

