Microsoft has unveiled Agent Lightning, an open-source framework designed to make artificial intelligence (AI) agents smarter, more adaptable, and capable of continuous improvement. The system introduces a powerful way to train and optimise agents – the software entities that power chatbots, coding assistants and search tools – using reinforcement learning (RL) without the need to rewrite existing code.
The goal is simple but ambitious: to bridge the gap between agent workflow development and agent optimisation. By doing so, Microsoft aims to turn today’s static, pre-trained AI agents into dynamic learners that can refine their behaviour based on real-world experience.
Why Agent Lightning Matters
Until now, most frameworks for building AI agents, such as OpenAI’s Agents SDK, LangChain, and Microsoft’s AutoGen, have made it easy to design interactive, modular agents but offered little support for training them to improve. Conversely, existing RL tools are good at learning from experience but struggle to deal with the messy, multi-step, multi-agent workflows common in real applications.
Agent Lightning fills that gap. It can optimise an agent built on any framework, connecting seamlessly to reinforcement learning trainers such as VeRL, an open-source, scalable RL training framework. This means developers can take an existing AI agent, plug it into Agent Lightning, and start training it on real user interactions without modifying the agent’s original code.
How It Works
At the heart of Agent Lightning are two key components:
- The Lightning Server, which handles training and exposes an OpenAI-compatible API for the updated model.
- The Lightning Client, which runs alongside the agent, collects data on its performance, and streams that data back to the server for analysis.
This setup, known as Training-Agent Disaggregation, cleanly separates the agent’s operational workflow from the training process. The system records traces of the agent’s model calls (inputs, outputs, and rewards) and converts them into the standard reinforcement learning format of state, action, reward, next state.
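The trace-to-transition conversion can be sketched in a few lines. This is a simplified illustration, not Agent Lightning’s actual data model: the `TraceRecord` fields and the episode-ordering assumption are hypothetical, chosen only to show how a sequence of recorded model calls maps onto (state, action, reward, next state) tuples.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TraceRecord:
    """One recorded model call: input prompt, generated output, observed reward."""
    prompt: str
    completion: str
    reward: float

def traces_to_transitions(
    traces: List[TraceRecord],
) -> List[Tuple[str, str, float, Optional[str]]]:
    """Turn an ordered episode trace into (state, action, reward, next_state)
    tuples: each prompt is the state, each completion the action, and the
    following prompt the next state (None at the end of the episode)."""
    transitions = []
    for i, rec in enumerate(traces):
        next_state = traces[i + 1].prompt if i + 1 < len(traces) else None
        transitions.append((rec.prompt, rec.completion, rec.reward, next_state))
    return transitions
```

Once traces are in this form, any standard RL trainer can consume them without knowing anything about the agent framework that produced them — which is the point of the disaggregation.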
It also includes Automatic Intermediate Rewarding (AIR), a mechanism that gives agents small, real-time feedback when they perform useful actions, helping them learn more efficiently.
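The intuition behind intermediate rewarding is reward shaping: a sparse end-of-episode score is supplemented with small per-step bonuses so the learner gets feedback sooner. The sketch below is illustrative only — the `useful` flag and bonus size are assumptions, not part of AIR’s actual mechanism.

```python
def shaped_return(final_reward: float, step_events: list, step_bonus: float = 0.1) -> float:
    """Combine a sparse end-of-episode reward with small intermediate bonuses
    for steps flagged as useful, e.g. a tool call that returned without error.
    The flag name and bonus size here are illustrative choices."""
    useful_steps = sum(1 for event in step_events if event.get("useful", False))
    return final_reward + step_bonus * useful_steps
```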
Real-World Performance Gains
Microsoft tested Agent Lightning on three demanding tasks:
- Text-to-SQL translation using the Spider benchmark, where the system improved accuracy in generating database queries.
- Retrieval-Augmented Generation (RAG) with the MuSiQue benchmark, where the agent showed steady improvements in how it retrieved and used information from a Wikipedia-scale index.
- Math problem solving with tool use, using the Calc-X dataset, where agents learned to use calculator tools more effectively.
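For a task like text-to-SQL, one common way to score an agent’s output is execution accuracy: run the predicted and gold queries against a scratch database and reward a match. The sketch below illustrates that idea with Python’s built-in `sqlite3`; it is a hypothetical reward signal for illustration, not the Spider benchmark’s official metric or Agent Lightning’s implementation.

```python
import sqlite3

def execution_match_reward(pred_sql: str, gold_sql: str, setup_sql: str) -> float:
    """Reward 1.0 when the predicted query returns the same rows as the gold
    query on a scratch in-memory database, 0.0 otherwise (invalid predicted
    SQL also scores 0.0)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup_sql)
        gold_rows = sorted(conn.execute(gold_sql).fetchall())
        try:
            pred_rows = sorted(conn.execute(pred_sql).fetchall())
        except sqlite3.Error:
            return 0.0
        return 1.0 if pred_rows == gold_rows else 0.0
    finally:
        conn.close()
```

A binary, automatically computable reward like this is exactly the kind of signal an RL trainer needs: it requires no human labelling, so the agent can be scored on every interaction.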
Across all tasks, the research team reported consistent, stable performance gains during training and testing, showing that Agent Lightning helps AI agents learn faster and perform better in complex, real-world scenarios.
Easy Integration for Developers
Perhaps the most practical feature is how simple it is to adopt. Developers using popular frameworks like LangChain, OpenAI Agents SDK, or AutoGen can integrate Agent Lightning with near-zero code changes. The runtime also supports OpenTelemetry, meaning developers can stream training data through existing monitoring systems.
Once connected, the Lightning Server exposes an API endpoint compatible with standard tools, allowing the newly trained models to be deployed instantly without altering the production environment.
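Because the endpoint speaks the OpenAI chat-completions format, any existing client code can point at it by swapping the base URL. The sketch below shows that shape using only the standard library; the base URL and model name are placeholders, not values documented by Agent Lightning.

```python
import json
import urllib.request

def build_chat_request(model: str, messages: list) -> dict:
    """Assemble a request body in the OpenAI chat-completions format."""
    return {"model": model, "messages": messages}

def query_lightning_server(base_url: str, model: str, messages: list) -> dict:
    """POST a chat completion to an OpenAI-compatible endpoint; the base URL
    and model name are placeholders for a running Lightning Server."""
    body = json.dumps(build_chat_request(model, messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In practice this means a production service keeps calling the same `/v1/chat/completions` route it always has, while the model behind it is the newly trained one.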
The Road Ahead
Microsoft’s researchers say the next phase of Agent Lightning will bring richer feedback systems, advanced RL algorithms, and broader compatibility with other frameworks such as Semantic Kernel and CrewAI.
By decoupling learning from operation, Agent Lightning marks a significant step toward creating AI agents that continuously adapt and improve, potentially ushering in a new generation of self-updating, context-aware digital assistants.
As Microsoft puts it, Agent Lightning is like giving AI agents their own update button – one that helps them learn, evolve, and keep pace with the fast-changing world around them.