When Faster Isn't Greener: Evaluating the Energy Cost of LLM-Based Code Optimization
This post is a shorter and simpler version of the paper and poster “When Faster Isn’t Greener: The Hidden Costs of LLM-Based Code Optimization”, presented at ASE 2025. You can find the paper and its replication package on my publications page, and the poster on my talks page.
Large Language Models can now rewrite code to make it faster, shorter, or more efficient. This has sparked a familiar narrative: if LLMs can improve performance automatically, they might also make software greener.
But there’s a catch. Generating these optimizations with an LLM consumes energy too, sometimes a lot.
In a recent study, we measured that cost directly and asked a simple question:
When does the energy spent generating an optimization outweigh the energy it saves?
To answer it, we evaluated eight optimization methods across five LLMs, using 118 algorithmic tasks. We tracked performance, correctness, and the energy consumed both during optimization and during program execution.
Why This Problem Matters
LLM-based optimizers sit at the intersection of two trends: automated software engineering and sustainable computing. They promise “drop-in” speedups without manual tuning, but LLM inference is computationally expensive. Any claim that these optimizers make code greener must include the cost of the optimizer itself.
Our results show that this energy cost is often non-negligible, and sometimes dominates the entire equation.
What We Actually Measured
We followed a simple pipeline:
- Take a piece of code from the EvalPerf benchmark (118 tasks).
- Apply one of eight optimization methods (from simple prompts to evolutionary strategies).
- Use one of five LLMs to generate the optimized code.
- Measure correctness, runtime speedup, and CPU-level energy savings.
- Measure the energy used by the optimizer itself, mostly GPU inference.
- Repeat the optimization process up to four times.
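The pipeline above can be sketched as a loop. This is a minimal, runnable illustration, not the paper's actual tooling: `optimize_with_llm` and `run_benchmarks` are deterministic stubs standing in for real LLM calls and energy meters (e.g. RAPL for CPU, NVML for GPU), and all numbers are invented.

```python
MAX_ROUNDS = 4  # the study repeats optimization up to four rounds

def optimize_with_llm(code, method, model):
    # Stub: a real implementation would call the model and meter GPU energy.
    return code + f"  # optimized by {model}/{method}", 2.5  # (code, Wh spent)

def run_benchmarks(code):
    # Stub: a real implementation runs the task's tests and meters CPU energy.
    return True, 1.3, 0.001  # (correct, speedup, Wh per execution)

def evaluate_task(baseline_code, method, model):
    results, code = [], baseline_code
    for rnd in range(1, MAX_ROUNDS + 1):
        code, gen_wh = optimize_with_llm(code, method, model)  # optimizer cost
        correct, speedup, run_wh = run_benchmarks(code)        # program cost
        results.append((rnd, correct, speedup, gen_wh, run_wh))
    return results

rows = evaluate_task("def solve(xs): return sorted(xs)", "simple-prompt", "small-llm")
print(len(rows))  # → 4
```

The key design point mirrored here is that generation energy and execution energy are recorded separately, which is what makes the break-even analysis possible.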
This allowed us to compute the core metric introduced in the paper:
Break-Even Point (BEP): How many times must the optimized program run before its energy savings compensate for the energy used to generate it?
\(BEP = \frac{\text{Energy cost of optimization}}{\text{Energy saved per run}}\)
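In code, the metric is a single division, with one edge case worth handling: if the "optimization" saves no energy (or costs energy) per run, it never pays back. The figures in the example are illustrative, not taken from the paper.

```python
def break_even_point(optimization_cost_wh: float, energy_saved_per_run_wh: float) -> float:
    """Executions needed before the optimization amortizes the optimizer's energy."""
    if energy_saved_per_run_wh <= 0:
        return float("inf")  # the optimized program never pays back its energy debt
    return optimization_cost_wh / energy_saved_per_run_wh

# Example: a 10 Wh optimization that saves 0.2 mWh (0.0002 Wh) per run
# needs about 50,000 executions to break even.
print(round(break_even_point(10.0, 0.0002)))  # → 50000
```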
What We Found
1. LLM generation dominates the energy bill
For nearly all methods, 90–99% of the optimization energy comes from LLM inference. The rest (execution, benchmarking, verification) is trivial in comparison.
2. The energy cost varies enormously
Depending on the model and strategy, producing a single optimized result (after four rounds) cost anywhere from 0.10 Wh to 52.6 Wh.
Lightweight prompts and smaller models tend to be dramatically cheaper.
| Energy cost of optimizations (at round 2) depending on model and optimization method. |
3. Break-Even Points can be very large
For most settings, the BEP ranged from tens of thousands to hundreds of thousands of executions. Only in rare, extremely inefficient baseline programs did BEP fall below 100 runs.
In other words: in many cases, the optimizer never “pays back” its energy debt.
| Sample of 1000 optimizations and their energy reduction and BEP. |
4. Better performance doesn’t always mean better energy
Even when optimized programs ran faster, this didn’t reliably correlate with lower energy consumption. The relationship between performance gains and actual energy savings was weak.
5. The strongest speedups come from methods that generate the most candidates
Methods like EoH or LLM4EFFI produced the largest performance improvements. However, they often required more generations, which increased energy consumption.
Practical Guidance for Developers and Researchers
Don’t optimize everything
Only apply LLM optimizers to hot paths, large workloads, or frequently executed functions. Compute the BEP (or a rough estimate) before deciding.
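A back-of-the-envelope version of that check: estimate how many times the code will realistically run and compare it against the BEP. The energy figures below are invented for illustration.

```python
def worth_optimizing(opt_cost_wh: float, saved_per_run_wh: float, expected_runs: int) -> bool:
    """Rough decision rule: optimize only if expected executions exceed the BEP."""
    if saved_per_run_wh <= 0:
        return False  # no per-run saving means no payback, ever
    bep = opt_cost_wh / saved_per_run_wh
    return expected_runs > bep

# A hot path run a million times amortizes a 5 Wh optimization saving 0.05 mWh/run:
print(worth_optimizing(5.0, 0.00005, 1_000_000))  # → True
# A script run a few hundred times does not:
print(worth_optimizing(5.0, 0.00005, 500))        # → False
```

In practice both inputs are estimates, so treat the result as a sanity check rather than a precise threshold.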
Limit the number of iterations
Many methods plateau by the second iteration. More rounds mean more energy with diminishing returns.
Measure the whole pipeline
Green claims require full-system measurement, not runtime alone. This includes LLM inference, verification, and execution.
Takeaway
LLMs can indeed make code faster and more energy-efficient, but these gains come at an upfront energy cost. Whether an optimization is actually greener depends on how often the code will run.
In many real-world settings, the optimizer might consume more energy than it saves.
Understanding and quantifying this trade-off is essential if we want automated software engineering to contribute meaningfully to sustainable computing.