New deployment data from four inference providers shows where the savings actually come from — and what teams should evaluate ...
Researchers at the Tokyo-based startup Sakana AI have developed a new technique that enables language models to use memory more efficiently, helping enterprises cut the costs of building applications ...
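The teaser doesn't say how Sakana AI's technique works, so here is a deliberately generic illustration, not Sakana's method, of the broader idea the story points at: bounding the KV cache is what cuts inference memory and cost. The `WINDOW` budget, array sizes, and `append_token` helper are all assumptions for the sketch.

```python
# Generic illustration only, NOT Sakana AI's technique: a simple sliding-window
# KV-cache eviction policy, showing why trimming cached tokens bounds memory.
from collections import deque
import numpy as np

WINDOW = 512   # assumed budget: keep at most this many cached tokens
d_head = 64    # assumed per-head dimension

kv_cache: deque = deque(maxlen=WINDOW)   # oldest entries evicted automatically

def append_token(k: np.ndarray, v: np.ndarray) -> None:
    """Cache one token's key/value pair; memory stays bounded at WINDOW entries."""
    kv_cache.append((k, v))

for _ in range(10_000):                  # 10k generated tokens, O(WINDOW) memory
    append_token(np.zeros(d_head, dtype=np.float32),
                 np.zeros(d_head, dtype=np.float32))

print(len(kv_cache))                     # 512, not 10000
```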
Running both phases of inference, prompt prefill and token-by-token decode, on the same silicon creates inefficiencies, which is why decoupling the two opens the door to new ...
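A minimal sketch of why the two phases stress hardware differently, and hence why disaggregating them can pay off: prefill touches every prompt token in one large, compute-bound matrix multiply, while decode rereads the whole cache for each new token and is bandwidth-bound. All shapes and function names below are illustrative assumptions, not any provider's implementation.

```python
# Toy stand-ins for the two inference phases. Prefill: one big matmul over all
# prompt tokens (high arithmetic intensity). Decode: each new token re-reads
# the entire cache (low arithmetic intensity, memory-bandwidth-bound).
import numpy as np

d_model, n_prompt, n_generate = 1024, 2048, 128

def prefill(prompt_states: np.ndarray) -> np.ndarray:
    """Process all prompt tokens at once; returns a KV-cache stand-in."""
    w = np.random.randn(d_model, d_model).astype(np.float32)
    return prompt_states @ w             # (n_prompt, d_model)

def decode_step(kv_cache: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Generate one token: a weighted read over the whole cache."""
    scores = kv_cache @ query            # (n_prompt,) attention stand-in
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ kv_cache            # re-reads every cached row

prompt = np.random.randn(n_prompt, d_model).astype(np.float32)
cache = prefill(prompt)                  # phase 1: suits a compute-rich pool
token = np.random.randn(d_model).astype(np.float32)
for _ in range(n_generate):              # phase 2: suits a bandwidth-rich pool
    token = decode_step(cache, token)
```

Serving the loop at the bottom on the same accelerators as the one-shot matmul at the top leaves compute idle during decode, which is the inefficiency the article refers to.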
A new technical paper titled “Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference” was published by researchers at the University of Cambridge, Imperial College London ...
BELLEVUE, Wash.--(BUSINESS WIRE)--MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, is announcing the launch of Mango LLMBoost™, system ...
Every ChatGPT query, every AI agent action, every generated video is based on inference. Training a model is a one-time ...
Marketing, technology, and business leaders today are asking an important question: how do you optimize for large language models (LLMs) like ChatGPT, Gemini, and Claude? LLM optimization is taking ...
A research article by Horace He and the Thinking Machines Lab (founded by ex-OpenAI CTO Mira Murati) addresses a long-standing issue in large language models (LLMs). Even with greedy decoding by setting ...
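A hedged illustration, not the article's code, of the failure mode it discusses: greedy decoding picks argmax over the logits, which looks deterministic, but floating-point addition is non-associative, so kernels that reduce in different orders (for example, at different batch sizes) can produce slightly different logits. The vector sizes and names below are assumptions.

```python
# Compute the "same" logit two ways: a dot product reduced in two different
# orders. Float addition is non-associative, so the results typically differ
# in the low bits.
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(4096).astype(np.float32)  # hidden-state stand-in
w = rng.standard_normal(4096).astype(np.float32)  # one LM-head row stand-in

logit_blas = float(np.dot(h, w))        # library (BLAS) reduction order
logit_loop = np.float32(0.0)
for hv, wv in zip(h, w):                # strictly sequential reduction order
    logit_loop += hv * wv

# When two candidate tokens' logits sit closer together than this error,
# argmax, that is, greedy decoding at temperature 0, can pick different
# tokens depending on which kernel (and batch shape) the server used.
print(logit_blas, float(logit_loop), logit_blas == float(logit_loop))
```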