Adrien Pavão


Learning to Run a Power Network with Renewable Energies: a Challenge Design and Analysis

As the climate is rapidly changing, it is crucial that we change the way we produce and consume energy to reduce emissions of carbon dioxide and other greenhouse gases. In this context, we propose a challenge that tests the viability of low-carbon energy scenarios for power network operations. The goal is to control electricity transportation in power networks while pursuing multiple objectives: balancing production and consumption, minimizing energy losses, keeping people and equipment safe, and avoiding catastrophic failures.

Beyond the significance of its application, this challenge also aims to advance Reinforcement Learning (RL), a field of Artificial Intelligence (AI) that offers new possibilities for tackling control problems.

In this new edition, we introduce more realistic scenarios, proposed by RTE, for reaching carbon neutrality by 2050: fossil fuel electricity production is retired, the proportions of renewable and nuclear energy increase, and batteries are introduced. In this post, we present the design and results of this competition, which took place in 2022 and was accepted as an official IJCNN/WCCI’22 challenge.

The winners of the competition are announced at the end of this post!

Let’s go!

Logo of the challenge. Image by author.


Global warming

In the late 2010s, approximately 85% of the energy produced came from burning fossil fuels, which emit greenhouse gases such as carbon dioxide (CO2). These emissions have been steadily increasing since the beginning of the industrial era. It is now widely acknowledged that the negative impact of modern societies on the environment has been significant since the 1950s. To prevent the irreversible destruction of the ecosystem that we need for our survival, it is essential to significantly reduce greenhouse gas emissions and other environmental impacts [1].

Complexity of power network operations

The electric power network can be broken down into three main functions: production (power generation), transport (power lines), and consumption (end users). Dispatchers (highly trained engineers) ensure the system’s security by performing several actions, including [2]:

  • Managing power overflows and preventing cascading failures by adjusting the way transmission lines are interconnected in order to redirect power flow;
  • Asking producers or consumers to change what they inject into the power network;
  • When required, limiting the amount of energy injected by renewable generators (such as wind or solar), for example in case of overproduction or local issues.
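As a rough illustration of these three families of actions, the sketch below represents them as plain Python data classes. The class names (`ChangeTopology`, `Redispatch`, `Curtail`) and the helper `apply_injection_actions` are hypothetical and are not the Grid2Op API; they only show that redispatching and curtailment modify generator setpoints, whereas a topology change leaves injections untouched and instead changes the paths that power flows along.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ChangeTopology:
    """Reconnect the elements of a substation to bus 1 or bus 2 ("node splitting")."""
    substation: str
    bus_of_element: Dict[str, int] = field(default_factory=dict)


@dataclass
class Redispatch:
    """Ask a producer to shift its output by delta_mw (positive means increase)."""
    generator: str
    delta_mw: float


@dataclass
class Curtail:
    """Cap the output of a renewable generator at limit_mw."""
    generator: str
    limit_mw: float


Action = Union[ChangeTopology, Redispatch, Curtail]


def apply_injection_actions(production_mw: Dict[str, float],
                            actions: List[Action]) -> Dict[str, float]:
    """Return new generator setpoints after redispatching and curtailment."""
    new_setpoints = dict(production_mw)  # do not mutate the caller's dict
    for action in actions:
        if isinstance(action, Redispatch):
            new_setpoints[action.generator] += action.delta_mw
        elif isinstance(action, Curtail):
            new_setpoints[action.generator] = min(new_setpoints[action.generator],
                                                  action.limit_mw)
        # ChangeTopology does not change injections: it only reroutes power flows.
    return new_setpoints


if __name__ == "__main__":
    production = {"wind_1": 80.0, "gas_1": 50.0}
    actions = [Curtail("wind_1", 60.0), Redispatch("gas_1", 20.0),
               ChangeTopology("sub_4", {"line_4_7": 2})]
    print(apply_injection_actions(production, actions))
    # {'wind_1': 60.0, 'gas_1': 70.0}
```

In this toy example, 20 MW of wind production is curtailed and compensated by 20 MW of redispatching, so total production still matches consumption.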

Current automatic optimization methods struggle to address the complexity of this problem, although some promising heuristic approaches have been developed and are being tested. The hope is that artificial intelligence will help dispatchers make more informed decisions and manage the power network effectively while ensuring the safety and security of all equipment.

Why a new competition

The “Learning to Run a Power Network” challenge [3, 4] is a series of competitions that model the sequential decision-making environments of real-time power network operations, as illustrated in the following figure:

Power system operation: The task of dispatchers is to monitor the power network and make changes when needed to ensure safe network operations with no line overflow. If a line is overflowing (indicated in red) in the environment at time t (left), a corrective action such as a “node splitting” may be taken (center), restoring “power network safety” in the environment at time t+1 (right). Borrowed from [3].
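To make the notion of an overflow concrete: a line is overloaded when the current it carries exceeds its thermal limit. The sketch below computes this loading ratio for each line and flags the lines above 100%. The function names and numbers are illustrative; Grid2Op observations expose a comparable per-line ratio as `rho`, but this code does not use Grid2Op.

```python
from typing import List, Sequence


def line_loading(flows_a: Sequence[float], thermal_limits_a: Sequence[float]) -> List[float]:
    """Ratio between the current in each line and its thermal limit (both in amperes)."""
    return [abs(flow) / limit for flow, limit in zip(flows_a, thermal_limits_a)]


def overflowing_lines(flows_a: Sequence[float], thermal_limits_a: Sequence[float],
                      threshold: float = 1.0) -> List[int]:
    """Indices of the lines whose loading exceeds the threshold (1.0 = 100%)."""
    loading = line_loading(flows_a, thermal_limits_a)
    return [i for i, rho in enumerate(loading) if rho > threshold]


if __name__ == "__main__":
    flows = [300.0, -450.0, 500.0]   # the sign only encodes the flow direction
    limits = [400.0, 400.0, 600.0]
    print(line_loading(flows, limits))       # [0.75, 1.125, 0.8333333333333334]
    print(overflowing_lines(flows, limits))  # [1]
```

A dispatcher (or an agent) would then look for an action, such as the node splitting shown in the figure, that brings every ratio back below 1.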

The participants’ algorithms control a simulated power network within a reinforcement learning framework. Power networks of various sizes and topologies have been used across competition rounds.

The French electricity network management company RTE recently published the results of an extensive study outlining various scenarios for France’s future power system [5]. Given ecological concerns, most of these scenarios focus on nuclear and renewable energy sources. The Paris region, Ile-de-France, has expressed particularly strong concern about these issues and has proposed two milestones [6]:

  • By 2030: Halve the Ile-de-France region’s dependence on fossil fuels and nuclear power, by reducing energy consumption by 20% and doubling the energy produced from renewable sources.
  • By 2050: Move towards a 100% renewable, zero-carbon region, by reducing energy consumption by 40% and quadrupling the energy produced from renewable sources.

In this context, it is important to re-evaluate the problem addressed in previous L2RPN challenges by updating the simulator and data to reflect zero-carbon scenarios. These scenarios are especially difficult to manage, which makes advanced AI techniques particularly valuable.

Competition design


The L2RPN competition requires a library that can simulate a power system within a reinforcement learning framework. To meet this requirement, RTE has developed Grid2Op [7], a Python module that converts the operational decision-making process into a Markov Decision Process.
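Framed as a Markov Decision Process, each time step follows the same loop: the agent observes the network state, chooses an action, and the simulator returns the next state, a reward, and whether the episode has ended (for instance after a blackout). The sketch below reproduces that loop on a deliberately tiny, hypothetical environment (`ToyGridEnv`, a single line whose loading follows a fixed time series) with a Gym-style `reset`/`step` interface. It illustrates the interaction pattern only; it is not Grid2Op’s implementation, and its reward is a toy survival reward, not the competition score.

```python
from typing import Callable, List, Tuple


class ToyGridEnv:
    """Hypothetical one-line 'network' whose loading follows a fixed time series."""

    def __init__(self, loading_series: List[float], blackout_rho: float = 1.2):
        self.loading_series = loading_series
        self.blackout_rho = blackout_rho
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return self.loading_series[0]

    def step(self, relief: float) -> Tuple[float, float, bool, dict]:
        """Advance one step; `relief` is the loading removed by the agent's action."""
        self.t += 1
        rho = self.loading_series[self.t] - relief
        blackout = rho > self.blackout_rho
        done = blackout or self.t == len(self.loading_series) - 1
        reward = -1.0 if blackout else 1.0  # toy reward: +1 per step survived
        return rho, reward, done, {"blackout": blackout}


def run_episode(env: ToyGridEnv, agent: Callable[[float], float]) -> Tuple[float, bool]:
    """Play one episode and return (total reward, whether a blackout occurred)."""
    obs = env.reset()
    total_reward, done, info = 0.0, False, {}
    while not done:
        obs, reward, done, info = env.step(agent(obs))
        total_reward += reward
    return total_reward, info["blackout"]


def do_nothing(rho: float) -> float:
    return 0.0


def reactive(rho: float) -> float:
    return 0.3 if rho > 1.0 else 0.0  # act only when the line is overloaded


if __name__ == "__main__":
    series = [0.8, 0.9, 1.1, 1.3, 1.0, 0.9]
    print(run_episode(ToyGridEnv(series), do_nothing))  # (1.0, True)
    print(run_episode(ToyGridEnv(series), reactive))    # (5.0, False)
```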


For Grid2Op to run, we must generate time series describing the electricity injections into the power network, referred to as chronics. These chronics describe the injections of the generators, loads, and batteries connected to the network.

To generate this edition’s chronics, we prioritized renewable energy generators and penalized the use of fossil fuel generators. This produced chronics with an almost carbon-free energy mix, as depicted in the following figure:

L2RPN 2022 energy mix over a year.

With less than 3% of electricity generated by fossil fuels, this energy mix is highly satisfactory for our competition. We therefore generated 32 years’ worth of scenarios on which participants could train their agents.
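A check like the one behind the energy mix figure can be sketched as follows: sum each generator’s production over a chronic, group generators by energy type, and compute each type’s share of the total. The generator names, types, and values below are made up for illustration; only the 3% threshold comes from the text. The fixed 5-minute time step is an assumption; it affects the energy totals but not the shares.

```python
from collections import defaultdict
from typing import Dict, List


def energy_mix(production_mw: Dict[str, List[float]], gen_type: Dict[str, str],
               step_hours: float = 5 / 60) -> Dict[str, float]:
    """Share of the total energy produced by each generator type."""
    energy_mwh: Dict[str, float] = defaultdict(float)
    for gen, series in production_mw.items():
        # Energy = power x duration, summed over the time steps of the chronic.
        energy_mwh[gen_type[gen]] += sum(series) * step_hours
    total = sum(energy_mwh.values())
    return {kind: energy / total for kind, energy in energy_mwh.items()}


if __name__ == "__main__":
    # Hypothetical chronic with four generators and three time steps (MW).
    production = {
        "nuclear_1": [100.0, 100.0, 100.0],
        "wind_1": [40.0, 60.0, 50.0],
        "solar_1": [0.0, 30.0, 20.0],
        "gas_1": [3.0, 2.0, 0.0],
    }
    types = {"nuclear_1": "nuclear", "wind_1": "wind",
             "solar_1": "solar", "gas_1": "fossil"}
    mix = energy_mix(production, types)
    print({kind: round(share, 3) for kind, share in mix.items()})
    print("fossil share below 3%:", mix.get("fossil", 0.0) < 0.03)
```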


To rank the participants, we needed a score function that evaluates each agent’s performance as a single number. The score function was designed as the average of three cost functions, calculated over the test scenarios:

  • Energy Losses Cost: determined by multiplying the electricity lost due to the Joule effect by the current price per MWh.
  • Operation Cost: the sum of expenses incurred by the agent’s actions.
  • Blackout Cost: if the agent fails to operate the power network until the end of the scenario, this cost is calculated by multiplying the remaining electricity to be supplied by the current price per MWh.
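The three costs above can be written as a short sketch. The data layout (`ScenarioLog`) and all numbers are hypothetical, and the combination step follows the wording of this post literally (the average of the three costs, then averaged over the test scenarios); the official scoring may include normalizations that are not reproduced here. Since every term is a cost, a lower value is better.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class ScenarioLog:
    """Hypothetical record of one test scenario played by an agent."""
    losses_mwh: List[float]        # Joule losses at each time step
    prices_per_mwh: List[float]    # electricity price at each time step
    action_costs: List[float] = field(default_factory=list)  # cost of each action
    unserved_mwh: float = 0.0      # energy left to supply if the agent failed
    price_at_failure: float = 0.0  # price per MWh when the blackout happened


def energy_losses_cost(log: ScenarioLog) -> float:
    """Energy lost to the Joule effect, valued at the price of each time step."""
    return sum(loss * price for loss, price in zip(log.losses_mwh, log.prices_per_mwh))


def operation_cost(log: ScenarioLog) -> float:
    """Sum of the expenses incurred by the agent's actions."""
    return sum(log.action_costs)


def blackout_cost(log: ScenarioLog) -> float:
    """Remaining electricity to supply, valued at the price at failure time."""
    return log.unserved_mwh * log.price_at_failure


def scenario_cost(log: ScenarioLog) -> float:
    # Average of the three costs, following the description above.
    return mean([energy_losses_cost(log), operation_cost(log), blackout_cost(log)])


def overall_score(logs: List[ScenarioLog]) -> float:
    return mean(scenario_cost(log) for log in logs)


if __name__ == "__main__":
    survived = ScenarioLog([2.0, 3.0], [50.0, 60.0], action_costs=[10.0, 20.0])
    failed = ScenarioLog([4.0], [50.0], unserved_mwh=100.0, price_at_failure=70.0)
    print(overall_score([survived, failed]))
```

Note how a single blackout dominates the score: the failed scenario costs far more than the losses and actions of the one that survived.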

Hosting on CodaLab Competitions

We implemented the L2RPN’2022 competition on CodaLab Competitions [8], which enabled code submission and detailed outputs, and provided participants with a starting kit.

Competition results

The Learning to Run a Power Network (L2RPN) 2022 competition, entitled “Energies of the Future and Carbon Neutrality”, took place from the 15th of June to the 15th of September 2022 and was accepted as an official IJCNN/WCCI’22 challenge. A total of 16 teams made an entry in the final phase of the competition, 5 of which ranked above the baseline. In this section, we show the detailed results and the fact sheets of the top-3 winners. The winners’ code is publicly available.

General results

Only each participant’s last submission, which is the one displayed on the leaderboard, is used in this analysis. There are two leaderboards: one for the development phase, in which participants could submit as many times as they wanted and receive feedback, and one for the final phase, in which each participant could make only one submission. The final phase scores are summarized in the following figure:

Main final phase results of the top participants (left) and all participants (right). “L2RPN” is the baseline provided by the organizers.

The following figure compares the scores in the two phases, which are clearly well correlated:

Scores of the participants in the development phase and in the final phase. The two phases are well correlated.
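The agreement between the two phases can be quantified with a correlation coefficient. The sketch below computes Pearson’s r from two lists of per-participant scores; the example scores are invented for illustration and are not the competition results. A rank correlation (Spearman) would be a natural alternative if only the ordering of participants matters.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)  # undefined if either list is constant


if __name__ == "__main__":
    dev_scores = [52.0, 48.5, 40.1, 35.0, 20.3]    # hypothetical
    final_scores = [50.2, 47.0, 42.3, 30.8, 18.9]  # hypothetical
    print(round(pearson(dev_scores, final_scores), 3))
```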

Winners code and fact sheets

Congratulations to the winning teams: Maze RL, Richard Wth and HRI!

The fact sheets can be viewed online. Interestingly, the kinds of agents vary across teams: Trial and Error (team Maze RL), Expert (team Richard Wth) and Reinforcement Learning (team HRI).

🏆 First place: Maze RL


Their agent is an AlphaZero-based grid topology optimization agent combined with a contingency-aware redispatching, curtailment and storage controller. More details are provided in the publication describing their method [9]. They had prior domain knowledge, as they are working on a congestion management solution for the energy sector based on their topology optimization methodology. To handle batteries, they jointly optimized curtailment, redispatching, and storage charging and discharging using the cross-entropy method.
They started with a more advanced topology agent but learned, to their surprise, that a simple redispatching and curtailment approach achieved a better challenge score, because excessive curtailment of renewable energy was not penalized as they had initially expected given this year’s “Energies of the future and carbon neutrality” focus.
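The cross-entropy method mentioned above is a generic derivative-free optimizer: sample candidate decisions from a Gaussian, keep the best-scoring “elite” fraction, refit the Gaussian to the elites, and repeat. The sketch below applies it to a three-dimensional decision (curtailment, redispatching, and storage setpoints). The quadratic `toy_cost` is a hypothetical stand-in for a cost that, in practice, would come from simulating the network; this is not Maze RL’s implementation.

```python
import random
from statistics import mean, pstdev
from typing import Callable, List


def cross_entropy_method(cost: Callable[[List[float]], float], dim: int,
                         n_samples: int = 100, n_elite: int = 10, n_iters: int = 30,
                         init_std: float = 1.0, seed: int = 0) -> List[float]:
    """Minimize `cost` by iteratively refitting a diagonal Gaussian to elite samples."""
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [init_std] * dim
    for _ in range(n_iters):
        samples = [[rng.gauss(mu[i], sigma[i]) for i in range(dim)]
                   for _ in range(n_samples)]
        elite = sorted(samples, key=cost)[:n_elite]  # lowest-cost candidates
        mu = [mean(s[i] for s in elite) for i in range(dim)]
        # A small floor keeps the distribution from collapsing completely.
        sigma = [pstdev([s[i] for s in elite]) + 1e-6 for i in range(dim)]
    return mu


def toy_cost(x: List[float]) -> float:
    """Hypothetical cost of [curtailment, redispatch, storage] setpoints (per unit)."""
    target = [0.5, -0.2, 0.3]
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


if __name__ == "__main__":
    best = cross_entropy_method(toy_cost, dim=3)
    print([round(v, 3) for v in best])
```

Because it only needs cost evaluations, this kind of optimizer fits naturally with a simulator that can score candidate setpoints but provides no gradients.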

🏆 Second place: Richard Wth


Their agent is a single-step agent based on brute-force search and optimization. At each step where the situation is dangerous, the agent attempts to reconnect lines, performs a brute-force search over substation actions, and solves the DC optimal power flow (DCOPF) problem to find the optimal actions for generators and storage units. When the situation is safe, it only performs line reconnections. They also tried a PPO-type agent, which seemed to perform better in offline tests but not on the online validation set, so at the last minute they decided to submit a non-RL agent. On the test dataset, the PPO-type agent also achieved a good score.
For substation actions, they followed the solution of the second-place team of L2RPN NeurIPS 2020, that is, using brute-force search to find a reduced action space. Note that they had no prior domain knowledge.
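The sketch below illustrates the two brute-force ideas in this description: building a reduced action space offline by keeping the substation actions that most often turn out best across many situations, and, at run time, simulating every action of that reduced set when the network is in danger. Everything here is hypothetical: `simulate` and `simulate_in` stand in for simulator calls returning predicted line loadings after an action, action names are plain strings, the 0.95 danger threshold is an arbitrary choice, and the DCOPF redispatching step is omitted.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence

Loadings = Sequence[float]


def reduce_action_space(situations: Sequence[Hashable], candidates: Sequence[str],
                        simulate_in: Callable[[Hashable, str], Loadings],
                        k: int) -> List[str]:
    """Offline: keep the k actions that are most often the best one."""
    wins: Counter = Counter()
    for situation in situations:
        # Best action = the one with the lowest predicted maximum line loading.
        best = min(candidates, key=lambda a: max(simulate_in(situation, a)))
        wins[best] += 1
    return [action for action, _ in wins.most_common(k)]


def choose_action(rho: Loadings, disconnected_lines: List[int],
                  reduced_actions: Sequence[str],
                  simulate: Callable[[str], Loadings],
                  danger: float = 0.95) -> Optional[str]:
    """Online: reconnect lines when safe, otherwise brute-force the reduced set."""
    if max(rho) < danger:
        return f"reconnect_line_{disconnected_lines[0]}" if disconnected_lines else None
    best_action, best_max_rho = None, max(rho)  # None means "do nothing"
    for action in reduced_actions:
        predicted = max(simulate(action))
        if predicted < best_max_rho:
            best_action, best_max_rho = action, predicted
    return best_action
```

Searching offline first keeps the online search cheap: only the handful of actions in the reduced set has to be simulated at each dangerous step.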

🏆 Third place: Team HRI


Their agent draws 1,000 random actions, simulates each of them over one day, and uses the best one. If this is not enough, it tries switching off lines to redirect currents (trying every line, again with a one-day simulation). It involves no training. They invested some time in understanding the major issues that arise when a line is attacked, and in proposing current redirections.
They had prior domain knowledge, as they had worked on energy management systems before, e.g. scheduling the charging of EVs while considering building power consumption and PV production.
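HRI’s training-free strategy can be sketched as a random search with a fallback. Here `sample_action` and `simulate` are hypothetical callables: the first draws a random action, the second returns a cost for an action, for example the highest line loading predicted over a one-day simulation. The `safe` threshold and the `("disconnect", line)` action encoding are likewise assumptions.

```python
import random
from typing import Callable, Hashable, Iterable, Tuple

Action = Tuple[str, Hashable]


def random_search_agent(sample_action: Callable[[random.Random], Action],
                        simulate: Callable[[Action], float],
                        line_ids: Iterable[int],
                        n_samples: int = 1000, safe: float = 1.0,
                        seed: int = 0) -> Action:
    """Pick the best of n random actions; if still unsafe, try disconnecting each line."""
    rng = random.Random(seed)
    candidates = [sample_action(rng) for _ in range(n_samples)]
    best = min(candidates, key=simulate)
    best_cost = simulate(best)
    if best_cost <= safe:
        return best
    # Fallback: switch off each line in turn to redirect the currents.
    for line in line_ids:
        action = ("disconnect", line)
        cost = simulate(action)
        if cost < best_cost:
            best, best_cost = action, cost
    return best
```

Since the agent only ranks simulated outcomes, it needs no training data; its cost is paid at decision time, through the number of simulations per step.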


We presented the design and analysis of the fourth edition of the “Learning to Run a Power Network” challenge, focusing on the energies of the future and carbon neutrality. This competition targets the real-world problem of ensuring the safety of power networks that rely on a high share of renewable energy and several batteries, with a focus on real-time operations.

Teams competed to develop an automatic management system for the electricity grid. Our analysis of the final phase shows that the winning team, Maze RL, used an AlphaZero-based grid topology optimization agent combined with a contingency-aware redispatching, curtailment and storage controller. The second-place team, Richard Wth, used a single-step agent based on brute-force search and optimization, and the third-place team, HRI, used a training-free agent that simulates random actions and line disconnections and selects the best one. The winning team’s code and further information can be found on their GitHub page and in the publication describing their method [9].

This analysis shows that teams used different approaches and methodologies to automate the management of the electricity grid. The top-performing teams relied on a diverse range of techniques, from AlphaZero-based reinforcement learning to brute-force search and optimization, which reflects the complexity and diversity of the problem.


Organizers of the challenge: Adrien Pavao, Eva Boguslawski, Benjamin Donnot, Isabelle Guyon, Antoine Marot and Gaëtan Serré.

We are grateful to Alessandro Leite, Farid Najar, and Sébastien Treguer for stimulating discussions.

This project is co-organized by RTE France and Université Paris-Saclay, with support of Région Ile-de-France, TAILOR EU Horizon 2020 grant 952215, ANR Chair of Artificial Intelligence HUMANIA ANR-19-CHIA-0022, and ChaLearn.


[1] H.-O. Pörtner, D.C. Roberts, M. Tignor, E.S. Poloczanska, K. Mintenbeck, A. Alegria, M. Craig, S. Langsdorf, S. Löschke, V. Möller, A. Okem, and B. Rama (eds.). IPCC, 2022: Climate Change 2022: Impacts, Adaptation, and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press.

[2] Benjamin Donnot. Deep learning methods for predicting flows in power grids: novel architectures and algorithms. PhD thesis, Université Paris-Saclay (ComUE), February 2019.

[3] Antoine Marot, Benjamin Donnot, Gabriel Dulac-Arnold, Adrian Kelly, Aïdan O’Sullivan, Jan Viebahn, Mariette Awad, Isabelle Guyon, Patrick Panciatici, and Camilo Romero. Learning to run a power network challenge: a retrospective analysis. CoRR, abs/2103.03104, 2021.

[4] Antoine Marot, Benjamin Donnot, Camilo Romero, Luca Veyrin-Forrer, Marvin Lerousseau, Balthazar Donon, and Isabelle Guyon. Learning to run a power network challenge for training topology controllers. CoRR, abs/1912.04211, 2019.

[5] RTE France. Futurs énergétiques 2050 principaux résultats. October

[6] Valérie Pécresse and the Ile-de-France Regional Council. Stratégie énergie-climat de la région Ile-de-France, 2018.

[7] B. Donnot. Grid2op - A testbed platform to model sequential decision making in power systems, 2020.

[8] Adrien Pavao, Isabelle Guyon, Anne-Catherine Letournel, Xavier Baro, Hugo Escalante, Sergio Escalera, Tyler Thomas, and Zhen Xu. Codalab Competitions: An open source platform to organize scientific challenges. Technical report, 2022.

[9] Matthias Dorfer, Anton R. Fuxjäger, Kristian Kozak, Patrick M. Blies, and Marcel Wasserer. Power grid congestion management via topology optimization with AlphaZero. CoRR, abs/2211.05612, 2022.