OpenAI’s Codex AI now works for hours on its own

OpenAI has launched a new version of its artificial intelligence coding assistant, GPT-5-Codex, which it says can work for hours without human help on complex programming tasks.

The tool, based on OpenAI’s GPT-5 model, is designed to support software engineers with everything from debugging to large-scale code refactoring. Unlike earlier systems, Codex is able to stay “on task” for long periods, in some tests running independently for more than seven hours.

Long-haul programming

“During testing, we’ve seen GPT-5-Codex work independently for more than 7 hours at a time on large, complex tasks, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation,” OpenAI said in its launch announcement.

The company also noted: “GPT-5-Codex adapts how much time it spends thinking more dynamically based on the complexity of the task.”

On a key industry benchmark, SWE-bench Verified, Codex achieved a 74.5% success rate across 500 real-world coding tasks. On refactoring, it reached just over 51% accuracy, up from around 34% for the base GPT-5 model.

Faster and more efficient

OpenAI reports that for smaller tasks, Codex uses about 93% fewer tokens than GPT-5, meaning shorter response times and lower cost. The company also claims cloud-based coding tasks now run with 90% lower latency, thanks to infrastructure changes.
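
To put the token figure in concrete terms, here is a minimal sketch of the arithmetic. The baseline token count and the price per million tokens are made-up illustrative values, not OpenAI figures; only the 93% reduction comes from the report above.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float) -> int:
    """Tokens used after cutting `reduction` (a fraction such as 0.93) from a baseline."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline_tokens * (1.0 - reduction))


def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


# Hypothetical inputs, chosen only for illustration.
BASELINE_TOKENS = 10_000       # assumed tokens for a small task on the base model
PRICE_PER_MILLION_USD = 10.0   # assumed flat price, not a published rate

reduced_tokens = tokens_after_reduction(BASELINE_TOKENS, 0.93)
baseline_cost = cost_usd(BASELINE_TOKENS, PRICE_PER_MILLION_USD)
reduced_cost = cost_usd(reduced_tokens, PRICE_PER_MILLION_USD)

print(f"tokens: {BASELINE_TOKENS} -> {reduced_tokens}")
print(f"cost:   ${baseline_cost:.4f} -> ${reduced_cost:.4f}")
```

Under a flat per-token price, a 93% cut in tokens is a 93% cut in cost, whatever the actual price turns out to be.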

In its evaluations of code reviews, OpenAI said: “We find that comments by GPT-5-Codex are less likely to be incorrect or unimportant, reserving more user attention for critical issues.”

Working inside existing tools

Codex integrates with widely used platforms:

VS Code and other IDEs for in-editor assistance
Command line tools for local or cloud execution
GitHub for automated pull request reviews
Web and mobile apps, including ChatGPT on iOS

Analysts say this seamless integration is one reason adoption has been rapid, with installations climbing quickly in the first weeks after launch. Simon Willison has documented some of these early trends.

Security controls

By default, Codex runs in a sandbox without internet access. Developers must explicitly grant permissions if the system is to read or write files, or connect externally.

Every action is logged, with test results and explanations provided for review. OpenAI says this ensures developers retain control and can audit the AI’s changes.
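
The general pattern described here, deny by default and log every request, can be sketched in a few lines. This is an illustration of the idea only, not Codex's actual implementation or API; the class, permission names and log format are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    # Hypothetical permission names, for illustration only.
    READ_FILES = "read_files"
    WRITE_FILES = "write_files"
    NETWORK = "network"


@dataclass
class SandboxPolicy:
    """Default-deny policy: nothing is allowed until explicitly granted."""
    granted: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def grant(self, permission: Permission) -> None:
        """Record an explicit grant from the developer."""
        self.granted.add(permission)
        self.audit_log.append(("grant", permission.value, True))

    def request(self, permission: Permission, detail: str) -> bool:
        """Check a requested action and log it, whether allowed or denied."""
        allowed = permission in self.granted
        self.audit_log.append((detail, permission.value, allowed))
        return allowed


policy = SandboxPolicy()
first_try = policy.request(Permission.NETWORK, "fetch dependency")   # denied by default
policy.grant(Permission.NETWORK)
second_try = policy.request(Permission.NETWORK, "fetch dependency")  # allowed after grant

for entry in policy.audit_log:
    print(entry)
```

Because denied requests are logged alongside allowed ones, the audit trail shows what the tool attempted as well as what it actually did.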

Industry impact

AI coding tools are becoming a competitive field. Microsoft’s GitHub Copilot, Anthropic’s Claude Code, and Google’s Gemini Code Assist are all vying to become the go-to assistant for professional developers.

Developer Simon Willison noted: “OpenAI report Codex crunching for seven hours in some cases… GPT-5-Codex adapts how much time it spends thinking more dynamically based on the complexity of the task.”

Observers say the advance is significant but urge caution. Some users have reported that Codex can drift during very long sessions and needs human oversight to stay on course.

A partner, not a replacement

OpenAI stresses that Codex is meant to assist, not replace, engineers. It can take on repetitive work such as large-scale clean-ups or automated reviews, but human judgement is still needed for design and production decisions.

As the competition heats up, one thing is clear: the role of AI in software development is no longer a future prospect. It is rapidly becoming part of everyday programming.