Can we use AI to discover better algorithms?

A review of FunSearch and AlphaEvolve

Large language models (LLMs) have rapidly become indispensable AI assistants. They excel at synthesizing concepts, writing, and coding to help humans solve complex problems. But could they discover entirely new knowledge? Because LLMs are known to “hallucinate” factually incorrect information, using them to make verifiably correct discoveries is a challenge. But what if we could harness the creativity of LLMs by identifying and building upon only their very best ideas? This question is at the heart of recent breakthroughs from Google DeepMind, which explore how LLMs can be guided to make novel discoveries in mathematics and algorithm design. This post delves into two pioneering works, FunSearch and the more recent AlphaEvolve, showcasing their approaches and implications for the future of automated algorithm discovery.

FunSearch

In a paper published in Nature, Google DeepMind introduced FunSearch, a groundbreaking method demonstrating that LLMs can make new discoveries in mathematical sciences. The core idea is to search for novel “functions” written in computer code, hence the name FunSearch. FunSearch tackles the trade-off between LLMs’ creativity and correctness by pairing a pre-trained LLM with an automated “evaluator.” This evaluator guards against hallucinations and incorrect ideas, ensuring that the system builds upon solid foundations.

How FunSearch works

Overview of FunSearch

Figure 1. The FunSearch process. The LLM is shown a selection of the best programs it has generated so far and asked to generate an even better one. The programs proposed by the LLM are automatically executed and evaluated. The best programs are added to the database for selection in subsequent cycles.

FunSearch uses an evolutionary approach. To start, the user writes a description of the problem in code. This includes a way to evaluate programs and an initial “seed” program to begin the process. The system then follows these steps:

  1. It selects the most promising programs from the current database.
  2. These programs are sent to an LLM (in their work, Google's PaLM 2 was used, though other code-trained LLMs can also work), which creatively builds upon them to generate new program proposals.
  3. The new programs are automatically run and checked by the evaluator.
  4. The best-performing valid programs are added back into the database, improving the database for the next round.
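
The four steps above can be sketched as a short loop. This is a toy illustration under loud assumptions, not FunSearch's implementation: `mock_llm` is a hypothetical stand-in that nudges an integer constant in place of a real LLM call, the objective (matching `3x + 2`) is invented for the example, and candidate code is run with a bare `exec` where a real system would need a sandbox.

```python
import random
import re

# Seed program supplied by the user (toy example, not from the paper).
SEED_PROGRAM = "def f(x):\n    return 0 * x + 0\n"


def evaluate(source):
    """Run a candidate program and score it; return None if it is invalid."""
    namespace = {}
    try:
        exec(source, namespace)  # assumption: a real system would sandbox this
        f = namespace["f"]
        # Toy objective: get close to 3x + 2 on a few inputs (higher is better).
        return -sum(abs(f(x) - (3 * x + 2)) for x in range(5))
    except Exception:
        return None


def mock_llm(parents, rng):
    """Hypothetical stand-in for the LLM: perturb one integer in a parent."""
    source = rng.choice(parents)
    numbers = list(re.finditer(r"-?\d+", source))
    target = rng.choice(numbers)
    new_value = int(target.group()) + rng.choice([-1, 1])
    return source[:target.start()] + str(new_value) + source[target.end():]


def funsearch_loop(iterations=200, top_k=2, seed=0):
    rng = random.Random(seed)
    database = [(evaluate(SEED_PROGRAM), SEED_PROGRAM)]
    for _ in range(iterations):
        # 1. Select the most promising programs from the database.
        database.sort(key=lambda entry: entry[0], reverse=True)
        parents = [source for _, source in database[:top_k]]
        # 2. Ask the (mock) LLM to build on them.
        candidate = mock_llm(parents, rng)
        # 3. Run and score the proposal automatically.
        score = evaluate(candidate)
        # 4. Store every valid program; step 1 decides which ones get reused.
        if score is not None:
            database.append((score, candidate))
    return max(database, key=lambda entry: entry[0])


if __name__ == "__main__":
    best_score, best_source = funsearch_loop()
    print(best_score)
    print(best_source)
```

The evaluator is the only part that decides what survives, which is why a proposal that crashes or scores poorly never influences later rounds.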

Benefits of FunSearch

FunSearch’s capabilities were tested on challenging problems. For example, it was applied to the cap set problem, which asks for the largest set of points in a high-dimensional grid such that no three points lie on a line. This longstanding open problem in extremal combinatorics was once described by renowned mathematician Terence Tao as his favorite open question. Working in collaboration with Prof. Jordan Ellenberg, FunSearch discovered new constructions that, in some settings, yield the largest cap sets ever found; the general problem itself remains open. This marked the first time an LLM made a new discovery for such a challenging scientific problem, and FunSearch also outperformed state-of-the-art computational solvers.

Figure 2. Illustration of the cap set problem. The circles are the elements of $\mathbb{Z}_3^2$ with the ones belonging to the cap set shown in blue. The possible lines in $\mathbb{Z}_3^2$ are also shown (with colours indicating lines that wrap around in arithmetic modulo 3). No three elements of the cap set are in a line.
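
The "no three in a line" condition is easy to check by machine, which is what makes the problem suitable for an automated evaluator. In $\mathbb{Z}_3^n$, three distinct points lie on a line exactly when their coordinate-wise sum is zero modulo 3. The function below is a minimal sketch of such a check; the name `is_cap_set` is chosen for this example and is not taken from the FunSearch code.

```python
from itertools import combinations


def is_cap_set(points):
    """Return True if no three distinct points of Z_3^n lie on a line."""
    distinct = set(map(tuple, points))
    for a, b, c in combinations(distinct, 3):
        # Three distinct points are collinear iff they sum to 0 mod 3.
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False
    return True


if __name__ == "__main__":
    print(is_cap_set([(0, 0), (0, 1), (1, 0), (1, 1)]))
    print(is_cap_set([(0, 1), (1, 2), (2, 0)]))  # a line that wraps around
```

This brute-force check is cubic in the number of points, so it only suits small examples.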

A significant advantage of FunSearch is that it does not just provide solutions. It generates programs that describe how these solutions are constructed. FunSearch also favors highly compact, concise programs, making them easier for researchers to comprehend and learn from.

“The solutions generated by FunSearch are far conceptually richer than a mere list of numbers. When I study them, I learn something.” — Jordan Ellenberg, Professor of Mathematics at the University of Wisconsin–Madison

Code generated by FunSearch

Figure 3. Code generated by FunSearch for the cap set problem.
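
The code in the figure is not reproduced here. As a rough illustration of the kind of program involved, the sketch below assumes a fixed greedy builder that repeatedly adds the highest-priority point that keeps the set valid, with the `priority` function as the only part being evolved. The trivial `priority` shown is a placeholder for this example, not a function discovered by FunSearch.

```python
from itertools import product


def priority(element, n):
    """Placeholder heuristic; this is the function an LLM would rewrite."""
    return 0.0


def build_cap_set(n):
    """Greedily build a cap set in Z_3^n, visiting points by priority."""
    elements = sorted(
        product(range(3), repeat=n),
        key=lambda element: priority(element, n),
        reverse=True,
    )
    cap, blocked = [], set()
    for point in elements:
        if point in blocked:
            continue  # adding it would complete a line with two chosen points
        for other in cap:
            # The third point on the line through `point` and `other`.
            blocked.add(tuple(-(p + q) % 3 for p, q in zip(point, other)))
        cap.append(point)
    return cap


if __name__ == "__main__":
    print(len(build_cap_set(3)))
```

Because the builder never adds a blocked point, every `priority` function produces a valid cap set, so the evaluator only has to score the size of the result.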

The success of FunSearch underscores that LLMs, when carefully guided and their outputs rigorously verified, can be powerful engines for scientific discovery.

AlphaEvolve

More recently, in May 2025, Google DeepMind announced AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. This development builds upon the success of systems like FunSearch and represents a significant step towards leveraging AI for complex problem-solving across various domains. Unlike FunSearch, which focuses on discovering single functions, AlphaEvolve is designed to evolve entire codebases and develop much more intricate algorithms.

How AlphaEvolve works

Overview of AlphaEvolve

Figure 4. The AlphaEvolve process. A prompt sampler assembles prompts for the LLMs, which generate new programs. These are then evaluated and stored in a programs database, which uses an evolutionary algorithm to select programs for future prompts.

AlphaEvolve uses an evolutionary approach with four key components (see Figure 4):

  1. Prompt sampler: The prompt contains rich context based on previously discovered solutions, along with instructions for proposing changes to particular solutions.

  2. LLM ensemble: Unlike FunSearch, which uses a single LLM, AlphaEvolve uses an ensemble that combines Gemini Flash and Gemini Pro. The lightweight Gemini Flash generates candidates at a higher rate thanks to its lower latency, while the more powerful Gemini Pro provides deeper insights and higher-quality suggestions that can significantly advance the evolutionary search and potentially lead to breakthroughs.

  3. Evaluator pool: This component verifies, runs, and scores proposed solutions using automated evaluation metrics that provide objective assessments of each solution’s accuracy and quality.

  4. Program database: AlphaEvolve uses an evolutionary database inspired by a combination of the MAP-Elites algorithm and island-based population models. It continuously improves upon the best solutions while maintaining diversity to encourage exploration.
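
To make the program database more concrete, here is a toy sketch that combines the two ideas named above: each island keeps a MAP-Elites-style grid holding one elite per feature cell, and a migration step shares the best program between islands. Everything in it is an assumption made for illustration, including the class names, the length-based `feature` descriptor, and the migration rule; the post does not describe AlphaEvolve's actual database at this level of detail.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Program:
    source: str
    score: float


@dataclass
class Island:
    # MAP-Elites-style grid: feature cell -> best program seen in that cell.
    cells: dict = field(default_factory=dict)


class ProgramDatabase:
    def __init__(self, num_islands=3, seed=0):
        self.islands = [Island() for _ in range(num_islands)]
        self.rng = random.Random(seed)

    @staticmethod
    def feature(program):
        # Hypothetical behaviour descriptor: bucket programs by source length.
        return len(program.source) // 20

    def add(self, program, island_index):
        """Keep the program only if it beats the elite in its cell."""
        cells = self.islands[island_index].cells
        key = self.feature(program)
        incumbent = cells.get(key)
        if incumbent is None or program.score > incumbent.score:
            cells[key] = program
            return True
        return False

    def sample_parents(self, island_index, k=2):
        """Pick up to k elites from one island to place in the next prompt."""
        elites = list(self.islands[island_index].cells.values())
        return self.rng.sample(elites, min(k, len(elites)))

    def best(self):
        """Best program across all islands (raises ValueError if empty)."""
        programs = (p for island in self.islands for p in island.cells.values())
        return max(programs, key=lambda p: p.score)

    def migrate(self):
        """Offer the overall best program to every island."""
        champion = self.best()
        for index in range(len(self.islands)):
            self.add(champion, index)
```

Keeping one elite per cell preserves programs that differ in kind even when they score lower, and separate islands let several lines of search develop before they exchange results.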

Unlike traditional genetic algorithms with explicit mutation and crossover operations, AlphaEvolve uses LLMs as sophisticated genetic operators to generate code modifications based on context from past solutions. Mutation occurs when the LLM ensemble suggests code changes (e.g., rewrites or targeted diffs), while crossover is implicit as the LLM receives multiple parent solutions as inspiration. This approach makes AlphaEvolve particularly effective in domains where progress can be clearly and systematically measured, like mathematics and computer science.
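
A targeted diff of this kind can be applied mechanically to a parent program to produce a child. The sketch below assumes the LLM returns edits as search/replace blocks; the exact markers and the helper name `apply_diff` are assumptions made for this example and may differ from what AlphaEvolve uses.

```python
import re

# Assumed block format: SEARCH text, a separator line, then REPLACE text.
DIFF_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_diff(parent_source, llm_output):
    """Apply every search/replace block in llm_output to the parent source."""
    child = parent_source
    for search, replace in DIFF_BLOCK.findall(llm_output):
        if search not in child:
            # Reject the mutation rather than guess where the edit belongs.
            raise ValueError("SEARCH text not found in parent program")
        child = child.replace(search, replace, 1)
    return child


if __name__ == "__main__":
    parent = "def step(x):\n    return x + 1\n"
    proposal = (
        "<<<<<<< SEARCH\n"
        "    return x + 1\n"
        "=======\n"
        "    return x * 2\n"
        ">>>>>>> REPLACE"
    )
    print(apply_diff(parent, proposal))
```

Rejecting a diff whose search text is missing treats a malformed proposal like any other invalid candidate: it is discarded before it reaches the evaluator.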

Benefits of AlphaEvolve

AlphaEvolve has already demonstrated significant real-world impact across multiple domains:

  1. Improving data center scheduling: AlphaEvolve discovered a simple yet highly effective heuristic to help Borg, Google’s cluster management system, orchestrate its vast data centers more efficiently. This solution, which has been in production for over a year, continuously recovers, on average, 0.7% of Google’s worldwide compute resources. This sustained efficiency gain allows more tasks to be completed on the same computational footprint. A key benefit is that AlphaEvolve’s solution is human-readable code, offering interpretability, debuggability, predictability, and ease of deployment.

  2. Hardware design optimization: AlphaEvolve proposed a Verilog rewrite that removed unnecessary bits in a key, highly optimized arithmetic circuit for matrix multiplication. The proposal passed robust verification methods to confirm functional correctness and was integrated into an upcoming Tensor Processing Unit (TPU). By suggesting modifications in the standard language of chip designers, AlphaEvolve promotes collaboration between AI and hardware engineers to accelerate specialized chip design.

  3. Enhancing AI training and inference: AlphaEvolve found more efficient ways to divide large matrix multiplication operations into manageable subproblems. This sped up a vital kernel in Gemini’s architecture by 23%, resulting in a 1% reduction in overall training time. In low-level GPU optimization, AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models.

Figure 6. How AlphaEvolve helps Google deliver a more efficient digital ecosystem, from data center scheduling and hardware design to AI model training.

Beyond these applications, AlphaEvolve made a groundbreaking contribution by discovering an algorithm for multiplying 4x4 complex-valued matrices using just 48 scalar multiplications, surpassing the efficiency of Strassen’s 1969 algorithm. When applied to a diverse set of over 50 open problems spanning mathematical analysis, geometry, combinatorics, and number theory, AlphaEvolve demonstrated remarkable versatility: it successfully rediscovered state-of-the-art solutions in 75% of cases and improved upon previously best-known solutions in 20% of cases. One of its most notable achievements was advancing the 300-year-old kissing number problem, where it discovered a configuration of 593 outer spheres and established a new lower bound in 11 dimensions, showcasing its ability to tackle complex geometric challenges.
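
To put the figure of 48 in context: applying Strassen's 2×2 scheme recursively to a 4×4 matrix uses 7 × 7 = 49 scalar multiplications, compared with 64 for the schoolbook method. The sketch below implements that recursive Strassen baseline with a multiplication counter. It is not AlphaEvolve's 48-multiplication algorithm, which is not given in this post.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix of even size into four equal blocks."""
    h = len(M) // 2
    return (
        [row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
        [row[:h] for row in M[h:]], [row[h:] for row in M[h:]],
    )


def strassen(A, B, counter):
    """Multiply square matrices whose size is a power of two.

    counter is a one-element list that accumulates scalar multiplications.
    """
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven block products instead of the usual eight.
    m1 = strassen(add(a11, a22), add(b11, b22), counter)
    m2 = strassen(add(a21, a22), b11, counter)
    m3 = strassen(a11, sub(b12, b22), counter)
    m4 = strassen(a22, sub(b21, b11), counter)
    m5 = strassen(add(a11, a12), b22, counter)
    m6 = strassen(sub(a21, a11), add(b11, b12), counter)
    m7 = strassen(sub(a12, a22), add(b21, b22), counter)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


if __name__ == "__main__":
    A = [[complex(i + j, i - j) for j in range(4)] for i in range(4)]
    B = [[complex(i * j, 1 + j) for j in range(4)] for i in range(4)]
    counter = [0]
    strassen(A, B, counter)
    print(counter[0])  # 7 * 7 = 49 scalar multiplications
```

Saving a single multiplication matters because such schemes are applied recursively to large matrices, where the saving compounds at every level.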

Figure 7. Examples of ground-breaking mathematical contributions discovered with AlphaEvolve.

FunSearch vs AlphaEvolve

While both FunSearch and AlphaEvolve leverage LLMs within an evolutionary framework, AlphaEvolve offers a substantial improvement over its predecessor in both scale and generality. Here’s a detailed comparison of their capabilities:

| Capability | FunSearch | AlphaEvolve |
| --- | --- | --- |
| Code Scope | Evolves a single function | Evolves an entire codebase |
| Code Size | Evolves up to 10-20 lines of code | Evolves up to hundreds of lines of code |
| Language Support | Python only | Any programming language |
| Computation | Needs fast evaluation (≤ 20 min on 1 CPU) | Can evaluate for hours, in parallel, on accelerators |
| LLM Usage | Millions of LLM samples used | Thousands of LLM samples suffice |
| Model Scale | Small LLMs used, no benefit from using larger models | Benefits from using state-of-the-art LLMs |
| Context Handling | Minimal context (only previous solutions) | Rich context and feedback in prompts |
| Optimization | Optimizes a single metric | Can simultaneously optimize multiple metrics |

Takeaways

The development of FunSearch and AlphaEvolve marks an exciting advancement in the application of LLMs. Overall, these systems demonstrate that LLMs are moving beyond text generation and coding assistance to become tools for genuine discovery and sophisticated optimization in mathematics, computer science, and engineering. Combining LLM creativity with rigorous, automated evaluation within an evolutionary framework is a powerful and promising strategy for tackling complex, real-world problems by evolving code, from single functions to entire codebases. While the journey is still ongoing, the prospect of LLMs significantly augmenting, or even leading in some cases, algorithmic and mathematical discovery is becoming increasingly tangible.