Yearly Archives: 2026

Eric

Google’s TurboQuant Promises 6x KV Cache Compression with Zero Accuracy Loss

New quantization technique slashes LLM memory use and boosts inference speed on existing hardware. Google Research has released TurboQuant, a training-free quantization method that compresses key-value (KV) caches in large language models to as little as 3 bits per value. The result: at least a 6x lower memory footprint with no drop...
Eric

LiteLLM PyPI Versions 1.82.7–1.82.8 Compromised in Supply Chain Attack

Credential stealer exfiltrated SSH keys, cloud credentials, and Kubernetes secrets from systems running the popular LLM library. On March 24, 2026, developers discovered malicious code inside LiteLLM packages 1.82.7 and 1.82.8 on PyPI. The library, which routes calls to more than 100 LLM providers and logs roughly 97 million downloads...
Eric

Mirantis Embeds MCP Server in Lens Desktop so AI Assistants Can...

Over 1 million users gain zero-setup access to real-time cluster data from Claude, Copilot, Cursor, and ChatGPT. Platform engineers have spent years building guardrails around Kubernetes while developers chase faster code velocity with AI. Today Mirantis closed one of the last remaining gaps: AI coding assistants can now talk directly...