One July afternoon in 2024, Ryan Williams set out to prove himself wrong. Two months had passed since he’d hit upon a startling discovery about the relationship between time and memory in computing.
New Linear-complexity Multiplication (L-Mul) algorithm claims it can reduce energy costs by 95% for element-wise tensor multiplications and 80% for dot products in large language models. It maintains ...