The ADE Wars Have Begun

For eons (months) only two major Agentic Development Environments (ADEs) existed: Cursor 2.0 & Conductor.

(aside) ADEs are defined by a desktop UX in which 99% of the focus is on chatting with agents to carry out work, not on looking at the code. This is different from IDEs with AI assistance, such as Cursor 1.0, VSCode with Copilot/Cline/CC, etc. This is also different from terminal-based agents like Claude Code.

Then one day, the model providers attacked. Google shipped Antigravity w/ Agent Manager. OpenAI shipped Codex Web. Anthropic shipped Claude Web.

I played with almost all of them (sans Antigravity), and what's immediately clear is that none of these have it all.

What Actually Matters in an ADE

After weeks of bouncing between all four platforms, I've landed on seven features that separate a genuinely usable ADE from a frustrating one. No single platform has all seven, which is part of why the landscape still feels so unsettled.

Model flexibility

Model flexibility matters, and not just because Codex has "caught up". What people are discovering is that certain agents are better for certain tasks. I've seen Opus 4.5 be given its flowers for its planning prowess, while Codex 5.2 is being praised for its endurance and stamina in long-running tasks. The big players aren't the only ones that matter, either: Kimi 2.5's emergence has shown that lower-cost models can deliver similar performance. In my view, future orchestration will delegate less complex tasks to cheaper models while assigning more complex ones to the SOTA providers. So a single-provider ADE will not fly.
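To make the delegation idea above concrete, here is a minimal sketch of complexity-based routing. Everything in it is an assumption for illustration: the 1 to 10 complexity score, the threshold, and the placeholder model names are hypothetical, and no ADE I tested exposes routing like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    complexity: int  # hypothetical score: 1 (trivial) to 10 (hard)


# Placeholder tier names (assumptions); swap in whatever models you have access to.
CHEAP_MODEL = "cheap-model"
FRONTIER_MODEL = "frontier-model"


def pick_model(task: Task, threshold: int = 6) -> str:
    """Send a task to the cheap tier unless its complexity reaches the threshold."""
    return FRONTIER_MODEL if task.complexity >= threshold else CHEAP_MODEL


tasks = [
    Task("rename a config key", complexity=2),
    Task("plan a database migration", complexity=8),
]
assignments = {t.description: pick_model(t) for t in tasks}
```

The hard part in practice is producing the complexity score, which this sketch simply takes as given.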

The ability to work seamlessly with main & with worktrees

Worktrees are table stakes for parallel execution: they're the accepted standard for making isolated changes to code without running into conflicts with other sessions. The workflow around worktrees is important. Conductor does a great job making it clear what worktree I'm in. Cursor does not.

Sometimes, though, I just want to fix a typo or tweak a config, and spinning up a worktree for that is overkill. Conductor forces everything into a worktree, which I find frustrating for small fixes. I keep dropping back to a terminal window just to avoid the overhead, and that's a failure mode—the ADE should be my single entry point for all work, whether it's exploratory, complex, or surgical.

In-app code editing

Maybe this is my old age showing...I still like to look at the code and edit it myself sometimes. That's where every tool except Cursor, which started life as an IDE, falls short. Conductor lets you view code, but you can't edit it directly inside the interface. Codex, Claude, and Conductor all have buttons that boot you back to your IDE: "open in Cursor" or "open in Xcode."

If I'm developing an iOS screen and I just want to change a bottom margin, I shouldn't have to wait for an agent to process that request, nor should I have to load an entire IDE (which would scale to 8 IDEs for 8 worktrees). I should just edit the file.

Clean, out-of-the-box UX

This matters more than I expected. Cursor 2.0's launch was genuinely confusing to me. The marketing site showed a light background while my default was dark, so I wasn't even sure I was looking at the right app (like, was Cursor 2.0 a separate new app?). The agent view felt disconnected from the IDE view, and it took real configuration before it felt like a coherent tool. Not to mention the traces of VSCode that show up in all sorts of functions. Meanwhile, Conductor's UX made sense immediately—the mental model was clear from the first session. Codex was quite clear off the bat as well.

Full PR lifecycle management

This one was jarring to me when trying non-Conductor tools, because of how comfortable I had become with it. When I'm working in Conductor, I don't touch GitHub at all. It creates PRs natively from the desktop app, resolves conflicts within itself, and suggests archiving the chat when I merge. The whole workflow lives in one place. When I was using Codex recently, it asked me to navigate to GitHub, click merge myself, and when I came back, it had no idea the PR was merged. Claude has a "manually create PR" button, which feels unacceptable to me. When I'm running eight parallel worktrees, the ADE needs to own that workflow end-to-end.

Terminal access

Conductor lets me spin up multiple terminal sessions within the same worktree, and Cursor does too. Codex didn't seem to support this when I tested it. Claude is arguably worse—it gives me a terminal window, but it's just running the same Claude session I'm already looking at in the UI. That's not really a terminal; it's a different viewport on the same conversation.

At the end of the day, I'm still coding. I might have to install dependencies. Figure out file permissions. Symlinks. I still want to do some things myself, not run everything through the agent.

Browser or device previews

A huge benefit Cursor gives me is the ability to load browser tabs within the context of a given conversation. This gives me a great way to check my work as my conversation is chugging along, or at critical breakpoints. Yesterday's Xcode + Claude Code launch showcases something similar: the ability to view device previews side-by-side as you're developing with CC. These previews are important if we aim to have one ADE be our entire coding interface. Any switch to an external browser, or copy-pasting localhost links, is unnecessary friction.

The Comparison

Here's how the four platforms stack up across these dimensions:

Feature	Cursor 2.0	Conductor	Codex Web	Claude Web
Model flexibility	Yes	Yes	No (Codex only)	No (Claude only)
The ability to work seamlessly with main & with worktrees	Partial, worktree support is janky	No	Yes	Yes
In-app code editing	Yes	No	No	No
Clean, out-of-the-box UX	No, VSCode fork is apparent	Yes	Yes	Yes
Full PR lifecycle (create, merge, resolve) in-app	No	Yes	No	No
Multiple terminal sessions	Yes	Yes	No	No
Browser or device previews	Yes	No	No	No

Where This Is Going

Short-term

The next evolution will be coordinating dependent work streams. It's rare that I'm running eight completely unrelated features in parallel—usually there's a dependency graph. Task D depends on C, task E depends on D, and so on. Right now, setting that up in Conductor means manually selecting branches from dropdowns and pointing them at each other instead of origin/main. It's not a nice workflow, and it's not visually accessible from the sidebar either. The current presentation of worktrees as completely isolated doesn't match how real development actually works.

I think what Graphite has built with stacking could be useful here. When D depends on C and E depends on D, changes to C need to propagate upward through the chain. We need to be able to merge things down cleanly. The worktree paradigm alone won't handle that elegantly, and I expect the ADEs that figure out stacking-style workflows will have a real advantage.
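As a rough illustration of what "propagating upward" involves, the sketch below works out which branches need restacking when one branch in a stack changes. The branch names and the `parents` mapping are hypothetical, and this is not a description of how Graphite or any ADE implements stacking; it only shows the bookkeeping a tool would have to do.

```python
from collections import deque

# Hypothetical stack: each branch maps to the branch it is based on.
parents = {"C": "main", "D": "C", "E": "D", "F": "main"}


def branches_to_restack(changed: str, parents: dict[str, str]) -> list[str]:
    """Return every branch stacked on top of `changed`, nearest first."""
    children: dict[str, list[str]] = {}
    for branch, base in parents.items():
        children.setdefault(base, []).append(branch)

    order: list[str] = []
    queue = deque(children.get(changed, []))
    while queue:
        branch = queue.popleft()
        order.append(branch)
        queue.extend(children.get(branch, []))
    return order


def restack_plan(changed: str, parents: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each affected branch with the base it must be replayed onto."""
    return [(b, parents[b]) for b in branches_to_restack(changed, parents)]


plan_for_c = restack_plan("C", parents)
```

Here a change to C touches D and then E, while the unrelated F is left alone. An ADE could surface exactly this list in the sidebar instead of making me pick bases from dropdowns.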

Long-term

Cursor recently published a blog post where they described having agents build a browser from scratch using just three roles: planners, workers, and judges. The planner decides what to do, the worker executes, and the judge decides if the work meets the bar to move forward. Steve Yegge published a crazy blog post on Gas Town, where his Mad Max-themed swarm of agents was tackling tasks for him too.

Right now in my ADE workflow, I act as the planner and the judge while the agents serve as workers. I tell them what to do, I review the output, I decide when we're done. That model works for the current generation of tools, but I don't think it's the endgame.
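The three roles can be sketched as a small loop. This is my own toy framing, not Cursor's implementation: the `Planner`, `Worker`, and `Judge` signatures, the retry limit, and the stand-in functions are all assumptions, and no model is actually called.

```python
from typing import Callable

Planner = Callable[[str], list[str]]  # goal -> ordered list of steps
Worker = Callable[[str, int], str]    # (step, attempt number) -> output
Judge = Callable[[str, str], bool]    # (step, output) -> does it meet the bar?


def run(goal: str, plan: Planner, work: Worker, judge: Judge,
        max_attempts: int = 3) -> dict[str, str]:
    """Execute each planned step, retrying until the judge accepts the output."""
    accepted: dict[str, str] = {}
    for step in plan(goal):
        for attempt in range(1, max_attempts + 1):
            output = work(step, attempt)
            if judge(step, output):
                accepted[step] = output
                break
        else:
            # No attempt passed the judge within the retry budget.
            raise RuntimeError(f"no accepted output for step: {step}")
    return accepted


# Toy stand-ins so the sketch runs on its own.
def toy_plan(goal: str) -> list[str]:
    return [f"{goal}: design", f"{goal}: implement"]


def toy_work(step: str, attempt: int) -> str:
    return f"{step} (attempt {attempt})"


def toy_judge(step: str, output: str) -> bool:
    return "attempt 2" in output  # rejects every first draft


result = run("login screen", toy_plan, toy_work, toy_judge)
```

In today's workflow, `plan` and `judge` are me; the interesting question is what happens when those two callables are agents as well.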

If you look at Ralph Wiggum (my implementation), it is taking over, in a very rudimentary way, those planner and judge roles. The bash loop (which is the dumbest possible planner) takes a pre-existing plan and hands the next available task to the next worker. The workers judge their own work. The work is completely sequential, but at least dependencies are managed. Tasks that depend on each other are only started when those dependencies are met.
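For readers who haven't seen a loop like this, here is a minimal Python sketch of the same shape. It is not my actual bash implementation: the `plan` dictionary, the task names, and the `worker` callback are hypothetical stand-ins. It keeps the three properties described above: a pre-existing plan, strictly sequential execution, and tasks that start only once their dependencies are done.

```python
from typing import Callable

# Hypothetical plan: task name -> names of the tasks it depends on.
plan = {"C": [], "D": ["C"], "E": ["D"], "docs": []}


def run_plan(plan: dict[str, list[str]], worker: Callable[[str], bool]) -> list[str]:
    """Run tasks one at a time, starting a task only after its dependencies finish."""
    done: list[str] = []
    while len(done) < len(plan):
        ready = [
            task for task, deps in plan.items()
            if task not in done and all(dep in done for dep in deps)
        ]
        if not ready:
            raise RuntimeError("remaining tasks are blocked (cycle or missing dependency)")
        task = ready[0]       # the "dumbest possible planner": take the first ready task
        if not worker(task):  # the worker reports on its own work
            raise RuntimeError(f"worker could not finish {task}")
        done.append(task)
    return done


# Stand-in worker that always reports success.
order = run_plan(plan, worker=lambda task: True)
```

Note that `docs` is ready from the start but still waits its turn, which is exactly the sequential limitation a real orchestrator would remove.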

Ralph was a great first step that came organically from the craving the world has for an orchestrator. And I believe it's coming! There's a version of this where the ADE handles all three—where you assign a task to a swarm of agents and they figure out how to decompose it, execute it, and verify it themselves.

The winning interface 6-12 months from now will likely have this native orchestration baked in. Maybe it comes from the model providers themselves—Anthropic or OpenAI building harnesses on top of their agent harnesses. Maybe it comes from Conductor or Cursor. Or maybe it comes from a third party that figures out the coordination layer better than anyone else. But that's the unlock I'm watching for: not just parallel agents, but agents that coordinate with each other, express dependencies, and judge their own progress.

We're not there yet. The current tools are the opening moves, and there's a lot of room for someone to pull ahead.