A Stanford University study is drawing fresh attention for its finding that small language models running on desktop computers can match the performance of large cloud-based AI systems on the vast majority of tasks while consuming far less energy. The research, which introduced a new metric called “intelligence per watt,” was highlighted this week in a Reuters column by investment strategist Joachim Klement, who argued the findings raise questions about the long-term profitability of companies building ever-larger AI models.
The Stanford paper, authored by Jon Saad-Falcon, Avanika Narayan, and colleagues from Stanford University and Together AI, tested more than 20 local language models with up to 20 billion active parameters across eight hardware accelerators and one million real-world single-turn chat and reasoning queries. The researchers found that local models can accurately respond to 88.7% of queries, with accuracy exceeding 90% in creative tasks and remaining strong in sales, management, and entertainment applications.
On the most difficult reasoning tasks, small models keep pace with large language models in roughly 50% of cases, up from just 8% two years earlier, according to Klement’s analysis of the findings. The study also found that “intelligence per watt” improved 5.3 times between 2023 and 2025, driven by a 3.1x gain from model improvements and a 1.7x gain from hardware advances.
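The two contributions compound rather than add, which is why gains of 3.1x and 1.7x yield roughly 5.3x and not 4.8x. A quick check, assuming the two factors are independent and multiply (the variable names are illustrative, not taken from the paper):

```python
# Reported gain factors for 2023 to 2025, as quoted in the article.
model_gain = 3.1      # attributed to model improvements
hardware_gain = 1.7   # attributed to hardware advances

# Assumption: the two factors are independent, so they combine as a product.
combined_gain = model_gain * hardware_gain

print(f"combined gain: {combined_gain:.2f}x")  # prints "combined gain: 5.27x"
```

The product, 5.27, rounds to the 5.3x headline figure.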
The efficiency gains carry direct economic consequences. An “oracle” routing approach, which sends each query to a local model whenever one can handle it, could cut energy use by 80.4% and compute costs by 73.8% compared to cloud-only inference, the Stanford team found. Even an imperfect router operating at 80% accuracy still delivers energy reductions above 60%.
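A back-of-the-envelope model shows how routing translates into savings. This is a minimal sketch, not the study's methodology. It assumes a cautious router that only ever sends queries a local model can actually handle and catches some fraction of them. The function name `energy_savings` and the local-to-cloud energy ratio of 0.1 are hypothetical placeholders; only the 88.7% coverage figure comes from the article.

```python
def energy_savings(coverage: float, router_accuracy: float,
                   local_to_cloud_energy: float) -> float:
    """Fraction of energy saved versus sending every query to the cloud.

    coverage: share of queries a local model can answer accurately.
    router_accuracy: share of those queries the router actually sends
        locally (1.0 models the "oracle"); assumed never to misroute
        a query the local model cannot handle.
    local_to_cloud_energy: energy of one local query divided by the
        energy of one cloud query (hypothetical input).
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= router_accuracy <= 1.0):
        raise ValueError("coverage and router_accuracy must be in [0, 1]")
    if local_to_cloud_energy < 0.0:
        raise ValueError("local_to_cloud_energy must be non-negative")

    routed_local = coverage * router_accuracy
    # Energy relative to cloud-only: local share costs the ratio, the rest costs 1.
    relative_energy = routed_local * local_to_cloud_energy + (1.0 - routed_local)
    return 1.0 - relative_energy


# Illustrative inputs: 88.7% coverage (from the article), hypothetical 0.1 ratio.
oracle = energy_savings(coverage=0.887, router_accuracy=1.0, local_to_cloud_energy=0.1)
imperfect = energy_savings(coverage=0.887, router_accuracy=0.8, local_to_cloud_energy=0.1)
print(f"oracle router: {oracle:.1%} saved")
print(f"80% router:    {imperfect:.1%} saved")
```

In this simplified model, savings scale linearly with router accuracy, so a router that catches 80% of the locally answerable queries keeps 80% of the oracle's savings.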
The study’s findings arrive as Nvidia, which supplies the GPUs powering data center AI, faces questions about whether demand for centralized compute will grow as projected. Klement wrote that if small models continue improving at their current pace, companies like Anthropic, OpenAI, and xAI “may have reason to worry” because “the future of AI could be smaller, cheaper and far less profitable than investors expect.”
The research reflects a broader industry trend. IBM researchers have tested models including OpenAI’s gpt-oss, Qwen3, and IBM’s Granite 4.0 on consumer hardware, finding that current local models achieve higher intelligence per watt than older-generation models on specialized hardware. Nvidia itself published a 2025 paper arguing that small language models are “sufficiently powerful, inherently more suitable, and necessarily more economical” for agentic AI systems.
The Stanford paper was first released as a preprint in November 2025. It reports that local query coverage, the share of real-world queries local models can handle accurately, rose from 23.2% in 2023 to 71.3% in 2025.