Takeaways
- Scaling model size and training data alone cannot produce general intelligence, because intelligence requires the ability to adapt to the unknown in real time.
- True intelligence is not stored skill but the capacity to generate new solutions when faced with unfamiliar problems.
- Standard AI benchmarks often mislead because they measure memorized, task-specific performance rather than adaptive reasoning or how efficiently a system turns limited information into new skill.
- Fluid intelligence in AI demands test-time adaptation, in which models update their own state in response to novel inputs rather than relying on static inference alone.
- Combining deep learning’s intuitive strengths with programmatic reasoning enables AI systems to generalize better and handle novel tasks.
Summary
François Chollet begins by highlighting the unprecedented decline in computational costs—falling by two orders of magnitude every decade since 1940—and explains how this trend catalyzed the deep learning revolution in the 2010s. He recounts how the availability of GPU-based compute and large data sets allowed self-supervised text modeling to flourish, giving rise to a “scaling era” where expanding model and data size predictably improved benchmark performance. Yet, Chollet argues, this paradigm conflated static, memorized skills with fluid, adaptable intelligence.
To distinguish genuine intelligence from rote proficiency, Chollet introduced the Abstraction and Reasoning Corpus (ARC) in 2019, designed to measure a system’s capacity to solve new tasks on the fly rather than regurgitate learned patterns. Despite a 50,000× increase in model scale, performance on this fluid-intelligence benchmark stagnated near zero, revealing that pre-training and static inference alone cannot yield general intelligence. The landscape shifted in 2024 with the advent of test-time adaptation techniques—methods that allow models to alter their own state during inference—and ARC performance suddenly approached human levels, demonstrating bona fide fluid reasoning.
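Test-time adaptation can take several forms; one widely used variant on ARC-style tasks is test-time training, where a throwaway copy of the model is briefly fine-tuned on a task's few demonstration pairs before predicting the test output. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the toy model, token format, and hyperparameters are placeholders, not the systems discussed in the talk.

```python
import copy
import torch
import torch.nn as nn

def test_time_adapt(model, demos, test_input, steps=20, lr=1e-3):
    """Fine-tune a throwaway copy of `model` on one task's demonstration pairs,
    then predict the test output. The base model is never modified, so every
    new task starts again from the same pretrained weights."""
    adapted = copy.deepcopy(model)               # per-task copy of the weights
    adapted.train()
    optimizer = torch.optim.Adam(adapted.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(steps):                       # a handful of gradient steps per task
        for x, y in demos:                       # each demo: (input tokens, output tokens)
            optimizer.zero_grad()
            loss = loss_fn(adapted(x), y)        # logits of shape (seq_len, vocab)
            loss.backward()
            optimizer.step()

    adapted.eval()
    with torch.no_grad():
        return adapted(test_input).argmax(dim=-1)    # predicted output tokens

# Toy stand-in: grids flattened to token sequences, predicted position-wise.
vocab, seq_len = 10, 25
model = nn.Sequential(nn.Embedding(vocab, 32), nn.Linear(32, vocab))
demos = [(torch.randint(0, vocab, (seq_len,)), torch.randint(0, vocab, (seq_len,)))
         for _ in range(3)]
test_input = torch.randint(0, vocab, (seq_len,))
print(test_time_adapt(model, demos, test_input))
```

The key design point is that adaptation is per-task and discarded afterward: the pretrained weights act as the fixed prior, while the brief fine-tuning supplies the on-the-fly adjustment that static inference lacks.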
Chollet then probes the essence of intelligence, contrasting a skill-centric view (machines performing predefined tasks) with a process-centric view (machines facing novel situations). He defines intelligence as a conversion efficiency: how effectively a system turns its past experience and encoded priors into skill across a broad operational domain, under uncertainty. He cautions against using conventional benchmarks—crafted to evaluate task-specific knowledge—as proxies for intelligence, since they incentivize optimizations that miss key cognitive abilities.
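A loose schematic of that ratio, paraphrasing rather than reproducing the formal measure in "On the Measure of Intelligence" (which additionally weights tasks by their generalization difficulty), might be written as:

```latex
\text{Intelligence} \;\propto\;
\frac{\text{skill attained across a scope of novel tasks}}
     {\text{encoded priors} \,+\, \text{experience consumed}}
```

Read this way, a system that reaches the same skill with fewer priors and less experience counts as more intelligent, which is exactly why benchmarks that ignore the denominator can be gamed by memorization.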
To drive progress toward autonomous invention, Chollet describes ARC 2, which emphasizes compositional generalization by presenting more intricate tasks that resist brute-force pattern matching. Despite modest gains from test-time training, models remain well below human performance. He previews ARC 3, an interactive benchmark launching in 2026, which will assess an agent’s capacity to learn goals, explore novel environments, and solve hundreds of unique reasoning games under strict action budgets.
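The ARC 3 interface has not been published, so the loop below is only a generic sketch of what "learn the goal and explore a novel environment under a strict action budget" could look like; the `Environment` protocol and the agent's `act`/`observe` methods are hypothetical.

```python
from typing import Any, Protocol

class Environment(Protocol):
    """Hypothetical interface for an interactive reasoning game; the real
    ARC 3 API is not public, so this shape is an assumption."""
    def reset(self) -> Any: ...
    def step(self, action: int) -> tuple[Any, bool]: ...   # (observation, solved?)

def run_episode(agent, env: Environment, action_budget: int = 500) -> bool:
    """Let an agent interact with an unfamiliar game: it must infer the goal
    from raw observations and solve the task before its action budget runs out."""
    obs = env.reset()
    for _ in range(action_budget):       # strict cap on interactions
        action = agent.act(obs)          # choose the next move from what it has seen
        obs, solved = env.step(action)
        agent.observe(obs)               # update internal state from the feedback
        if solved:
            return True
    return False                         # budget exhausted without solving the game
```

The budget is what makes the benchmark an efficiency test: an agent that needs thousands of random probes to stumble onto the goal fails, even if it would eventually succeed.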
Finally, Chollet articulates the “kaleidoscope hypothesis”: the world’s apparent novelty arises from recombining a small set of reusable abstractions. He distinguishes two abstraction types—value-centric (continuous pattern recognition) and program-centric (discrete structural reasoning)—and asserts that machines must integrate both. He proposes a meta-learning architecture that blends gradient-based learning for perception with discrete program search guided by learned intuition, continuously refining a library of reusable components. This “programmer-like” system, under development at Ndea, aims to imbue AI with the efficiency and flexibility necessary for independent scientific discovery.
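The system under development is not publicly described, but "discrete program search guided by learned intuition" can be illustrated with a toy best-first search over a small domain-specific library of primitives, where a scoring function stands in for the learned guidance model. Everything here, the DSL, the scorer, and the integer task format, is invented for illustration.

```python
import heapq
from typing import Callable

# Toy DSL of reusable primitives: the "library" the system keeps refining.
# Real systems would operate on grids; integers keep the sketch self-contained.
PRIMITIVES: dict[str, Callable[[int], int]] = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "negate": lambda x: -x,
}

def run(program: list[str], x: int) -> int:
    """Apply a sequence of primitives to an input value."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def intuition_score(program: list[str], examples: list[tuple[int, int]]) -> float:
    """Stand-in for a learned model that judges how promising a partial program
    looks. Here: negative total distance between its outputs and the targets."""
    return -sum(abs(run(program, x) - y) for x, y in examples)

def guided_search(examples: list[tuple[int, int]], max_depth: int = 4):
    """Best-first search over programs: expand the most promising candidates
    first, as ranked by the intuition score, instead of enumerating blindly."""
    frontier = [(0.0, [])]                       # (negated score, partial program)
    while frontier:
        _, program = heapq.heappop(frontier)
        if all(run(program, x) == y for x, y in examples):
            return program                       # fits every demonstration pair
        if len(program) == max_depth:
            continue                             # depth limit keeps the search finite
        for name in PRIMITIVES:
            candidate = program + [name]
            heapq.heappush(frontier, (-intuition_score(candidate, examples), candidate))
    return None

# Example: recover "double, then increment" from three input/output pairs.
print(guided_search([(1, 3), (2, 5), (4, 9)]))   # -> ['double', 'inc']
```

In the architecture Chollet outlines, the scorer would be a trained neural network rather than a hand-written heuristic, and programs that solve tasks would feed back into the primitive library; both of those loops are elided here.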