A review of FunSearch and AlphaEvolve
Large language models (LLMs) have rapidly become indispensable AI assistants. They excel at synthesizing concepts, writing, and coding to help humans solve complex problems.
In a paper published in Nature
Figure 1. The FunSearch process. The LLM is shown a selection of the best programs it has generated so far and is asked to generate an even better one. The programs proposed by the LLM are automatically executed and evaluated. The best programs are added to the database for selection in subsequent cycles.
FunSearch uses an evolutionary approach
FunSearch’s capabilities were tested on challenging problems. For example, it was applied to the cap set problem, which asks for the largest set of points in a high-dimensional grid such that no three points lie on a line. This is a longstanding open problem in extremal combinatorics, once described by renowned mathematician Terence Tao as his favorite open question. Working in collaboration with Prof. Jordan Ellenberg, FunSearch did not settle the problem in general, but it discovered cap sets larger than any previously known in some settings. This marked the first time an LLM made a new discovery for such a challenging scientific problem, outperforming state-of-the-art computational solvers.
Figure 2. Illustration of the cap set problem. The circles are the elements of $\mathbb{Z}_3^2$ with the ones belonging to the cap set shown in blue. The possible lines in $\mathbb{Z}_3^2$ are also shown (with colours indicating lines that wrap around in arithmetic modulo 3). No three elements of the cap set are in a line.
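To make the definition concrete, here is a minimal sketch of a cap set checker. It relies on a standard fact about $\mathbb{Z}_3^n$: three distinct points lie on a line exactly when their coordinate-wise sum is zero modulo 3. The function name `is_cap_set` is chosen for illustration and does not come from the FunSearch code.

```python
from itertools import combinations


def is_cap_set(points):
    """Return True if no three distinct points of Z_3^n lie on a line.

    In Z_3^n, distinct points a, b, c are collinear exactly when
    a + b + c == 0 (mod 3) in every coordinate.
    """
    pts = [tuple(p) for p in points]
    if len(set(pts)) != len(pts):
        return False  # repeated points are not allowed in a set
    for a, b, c in combinations(pts, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False  # found three points on a common line
    return True


# A cap set of size 4 in Z_3^2, and a set containing a full line.
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
diagonal = [(0, 0), (1, 1), (2, 2)]
print(is_cap_set(square), is_cap_set(diagonal))
```

The brute-force check over all triples is only practical for small sets, but it is enough to verify examples such as the one in Figure 2.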
A significant advantage of FunSearch is that it does not just provide solutions. It generates programs that describe how these solutions are constructed. FunSearch also favors highly compact, concise programs, making them easier for researchers to comprehend and learn from.
“The solutions generated by FunSearch are far conceptually richer than a mere list of numbers. When I study them, I learn something.” — Jordan Ellenberg, Professor of Mathematics at the University of Wisconsin–Madison
Figure 3. Code generated by FunSearch for the cap set problem.
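The sketch below illustrates what "a program that describes how a solution is constructed" can look like. It assumes a setup in which a fixed greedy skeleton builds the cap set while a small `priority` function decides the order in which points are tried, so that only `priority` would need to be evolved. The `priority` shown here is a hypothetical placeholder written for this example, not the function FunSearch discovered, and `build_cap_set` is likewise an illustrative name.

```python
from itertools import combinations, product


def priority(point):
    """Hypothetical placeholder heuristic: prefer points with few 2s.

    This is NOT the FunSearch-discovered function; it only marks the
    slot that an evolved function would fill.
    """
    return -sum(1 for x in point if x == 2)


def build_cap_set(n, priority_fn):
    """Greedily build a cap set in Z_3^n, trying high-priority points first."""
    candidates = sorted(product(range(3), repeat=n), key=priority_fn, reverse=True)
    cap, blocked = [], set()
    for p in candidates:
        if p in blocked:
            continue  # p would complete a line with two chosen points
        for q in cap:
            # The third point on the line through p and q is -(p + q) mod 3.
            blocked.add(tuple((-(a + b)) % 3 for a, b in zip(p, q)))
        cap.append(p)
    return cap


cap = build_cap_set(3, priority)
# Sanity check: no three chosen points sum to zero modulo 3.
assert not any(
    all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c))
    for a, b, c in combinations(cap, 3)
)
print(len(cap))
```

With this placeholder the skeleton simply recovers the points whose coordinates are all 0 or 1. The point of the design is that a better `priority` changes the result without touching the skeleton, which keeps the evolved part short and readable.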
The success of FunSearch
More recently, in May 2025, Google DeepMind announced AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization.
Figure 4. The AlphaEvolve process. A prompt sampler assembles prompts for the LLMs, which generate new programs. These are then evaluated and stored in a programs database, which uses an evolutionary algorithm to select programs for future prompts.
AlphaEvolve uses an evolutionary approach with four key components (see Figure 4):
Prompt sampler: The prompt contains rich context based on previously discovered solutions, along with instructions for proposing changes to particular solutions.
LLM ensemble: Unlike FunSearch, which uses a single LLM, AlphaEvolve uses an ensemble that combines Gemini Flash and Gemini Pro. The lightweight Gemini Flash generates candidates at a higher rate thanks to its lower latency, while the more powerful Gemini Pro provides deeper insights and higher-quality suggestions that can significantly advance the evolutionary search and potentially lead to breakthroughs.
Evaluator pool: This component verifies, runs, and scores proposed solutions using automated evaluation metrics that provide objective assessments of each solution’s accuracy and quality.
Program database: AlphaEvolve uses an evolutionary database inspired by a combination of the MAP-Elites algorithm
Unlike traditional genetic algorithms with explicit mutation and crossover operations, AlphaEvolve uses LLMs as sophisticated genetic operators to generate code modifications based on context from past solutions. Mutation occurs when the LLM ensemble suggests code changes (e.g., rewrites or targeted diffs), while crossover is implicit as the LLM receives multiple parent solutions as inspiration. This approach makes AlphaEvolve particularly effective in domains where progress can be clearly and systematically measured, like mathematics and computer science.
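The loop described above can be sketched in a few lines. This is a toy illustration under strong assumptions, not AlphaEvolve's implementation: candidates are pairs of numbers rather than programs, the evaluator is a made-up objective, and `mock_llm_propose` is a hypothetical stand-in for the LLM ensemble that blends two parents (implicit crossover) and adds noise (mutation).

```python
import random


def evaluate(candidate):
    """Toy evaluator (assumption): higher is better, best at (3, -2)."""
    x, y = candidate
    return -((x - 3) ** 2 + (y + 2) ** 2)


def mock_llm_propose(parents, rng):
    """Hypothetical stand-in for the LLM ensemble.

    Blends the best parent with another sampled parent and perturbs
    the result, mimicking a model that sees several parents at once.
    """
    best = parents[0]
    other = rng.choice(parents)
    return tuple(0.5 * (b + o) + rng.gauss(0, 0.5) for b, o in zip(best, other))


def evolve(generations=200, top_k=3, seed=0):
    rng = random.Random(seed)
    start = (0.0, 0.0)
    database = [(start, evaluate(start))]  # program database
    for _ in range(generations):
        # Prompt sampler: show the current top-k candidates as parents.
        ranked = sorted(database, key=lambda entry: entry[1], reverse=True)
        parents = [cand for cand, _ in ranked[:top_k]]
        child = mock_llm_propose(parents, rng)          # LLM step
        database.append((child, evaluate(child)))       # evaluator step
    return max(database, key=lambda entry: entry[1])


best, score = evolve()
print(best, score)
```

Keeping every scored candidate in the database and sampling only from the top few is the simplest possible selection rule; a real system would use a more diverse selection scheme so that the search does not collapse onto a single lineage.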
AlphaEvolve has already demonstrated significant real-world impact across multiple domains:
Improving data center scheduling: AlphaEvolve discovered a simple yet highly effective heuristic to help Borg, Google’s cluster management system, orchestrate its vast data centers more efficiently. This solution, which has been in production for over a year, continuously recovers, on average, 0.7% of Google’s worldwide compute resources. This sustained efficiency gain allows more tasks to be completed on the same computational footprint. A key benefit is that AlphaEvolve’s solution is human-readable code, offering interpretability, debuggability, predictability, and ease of deployment.
Hardware design optimization: AlphaEvolve proposed a Verilog rewrite that removed unnecessary bits in a key, highly optimized arithmetic circuit for matrix multiplication. The proposal passed robust verification methods to confirm functional correctness and was integrated into an upcoming Tensor Processing Unit (TPU). By suggesting modifications in the standard language of chip designers, AlphaEvolve promotes collaboration between AI and hardware engineers to accelerate specialized chip design.
Enhancing AI training and inference: AlphaEvolve found more efficient ways to divide large matrix multiplication operations into manageable subproblems, speeding up a vital kernel in Gemini’s architecture by 23% and reducing overall training time by 1%. In the realm of low-level GPU optimization, AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models.
Figure 6. How AlphaEvolve helps Google deliver a more efficient digital ecosystem, from data center scheduling and hardware design to AI model training.
Beyond these applications, AlphaEvolve made a groundbreaking contribution by discovering an algorithm for multiplying 4x4 complex-valued matrices using just 48 scalar multiplications, improving on the 49 required when Strassen’s 1969 algorithm is applied recursively. When applied to a diverse set of over 50 open problems spanning mathematical analysis, geometry, combinatorics, and number theory, AlphaEvolve demonstrated remarkable versatility: it rediscovered state-of-the-art solutions in 75% of cases and improved upon previously best-known solutions in 20% of cases. One of its most notable achievements was advancing the 300-year-old kissing number problem: it discovered a configuration of 593 outer spheres, establishing a new lower bound in 11 dimensions.
Figure 7. Examples of ground-breaking mathematical contributions discovered with AlphaEvolve.
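To put the figure of 48 multiplications in context, the sketch below implements the classical Strassen recursion and counts scalar multiplications on a 4x4 complex example. It shows only the baseline: applying Strassen's seven-multiplication scheme at two levels costs 7 × 7 = 49 multiplications. It does not reproduce the 48-multiplication algorithm found by AlphaEvolve, and all function names are chosen for this example.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, counter):
    """Multiply square matrices (size a power of 2), counting scalar products."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), counter)
    M2 = strassen(add(A21, A22), B11, counter)
    M3 = strassen(A11, sub(B12, B22), counter)
    M4 = strassen(A22, sub(B21, B11), counter)
    M5 = strassen(add(A11, A12), B22, counter)
    M6 = strassen(sub(A21, A11), add(B11, B12), counter)
    M7 = strassen(sub(A12, A22), add(B21, B22), counter)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


def naive(A, B):
    """Schoolbook product, used only to check the result."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Small integer-valued complex entries keep the arithmetic exact.
A = [[complex(i + j, i - j) for j in range(4)] for i in range(4)]
B = [[complex(i * j, 1 + j) for j in range(4)] for i in range(4)]
counter = [0]
C = strassen(A, B, counter)
print(counter[0], C == naive(A, B))
```

The schoolbook method needs 64 scalar multiplications for the same product, so the progression is 64, then 49, then 48.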
While both FunSearch and AlphaEvolve leverage LLMs within an evolutionary framework, AlphaEvolve offers a substantial improvement over its predecessor in both scale and generality. Here’s a detailed comparison of their capabilities:
Capability | FunSearch | AlphaEvolve |
---|---|---|
Code Scope | Evolves a single function | Evolves an entire codebase |
Code Size | Evolves up to 10-20 lines of code | Evolves up to hundreds of lines of code |
Language Support | Python only | Any programming language |
Computation | Needs fast evaluation (≤ 20 min on 1 CPU) | Can evaluate for hours, in parallel, on accelerators |
LLM Usage | Millions of LLM samples used | Thousands of LLM samples suffice |
Model Scale | Small LLMs used, no benefit from using larger models | Benefits from using state-of-the-art LLMs |
Context Handling | Minimal context (only previous solutions) | Rich context and feedback in prompts |
Optimization | Optimizes a single metric | Can simultaneously optimize multiple metrics |
The development of FunSearch