The top of this scoreboard tells two different stories
Cursor sits at 9.0. That score matters not because it leads the board, but because of what surrounds it: Windsurf at 8.5 and GitHub Copilot at 8.4. Three code editors occupy three of the top six spots. No other category comes close to that kind of internal consistency at the high end.
The coding category has, for practical purposes, found its hierarchy. Cursor is the standard. Windsurf is the serious challenger. Copilot is the enterprise fallback. Three tools in a clear order across 0.6 points (8.4 to 9.0) suggest the market understands these products. Users have settled opinions. The evaluation criteria are stable.
Compare that to video generation.
Three video tools, zero clarity
Runway (8.4), Higgsfield (8.1), HeyGen (7.8). Three serious products within 0.6 points of each other. That spread looks similar to the code editor range, but the story is the opposite. In coding, the gap reflects a clear leader and two followers. In video, it reflects three tools doing genuinely different things well enough that no one has pulled away.
Runway dominates cinematic generation. Higgsfield has built a strong position in social-ready video with preset motion styles. HeyGen owns talking-head and avatar production. The 0.6-point spread is not a ranking. It is a map of three different product bets running in parallel.
Image generation tells a similar story. Midjourney (8.2) versus Adobe Firefly (8.0). Two-tenths of a point separate tools with completely different positioning: Midjourney for creative experimentation, Firefly for teams that need commercially safe output inside an existing Creative Cloud workflow. Neither is beating the other. They are serving different buyers.
The agents category is the odd one out
ChatGPT scores 8.5 here, which seems reasonable until you consider the category label: agents. ChatGPT is doing many things under that term, from basic conversation to multi-step task execution. Lovable at 8.1 is a much more focused bet, generating functional web apps from natural language prompts.
The 0.4 gap between them is the interesting part. Agents is supposed to be the defining frontier of this AI cycle. If the category were maturing the way coding has, you would expect one tool to be pulling clearly ahead. Instead ChatGPT and Lovable sit close together, which suggests the category definition itself is still blurry. Are agents tools that automate tasks, build things, or both?
What settled categories actually look like
Coding looks settled. Cursor leads the nearest competitor in its category, Windsurf, by a full 0.5 points. That part of the scoreboard is a ranking, not a cluster.
Everything else looks unsettled. Video, image, and agents show patterns of fragmentation or early differentiation. Even search, with Perplexity at 8.6, has only one representative on this board, so there is no internal category signal to read.
That is not a criticism. It is an observation about where real competition is happening right now. Cursor already won its category. The fights worth watching are in video and agents, where the scores are tight and the product strategies are still diverging.
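The gap arithmetic behind these comparisons can be reproduced with a short sketch. It uses only the scores quoted in this article; the category grouping, the `SCORES` name, and the `category_gaps` helper are illustrative assumptions, not part of the scoreboard itself.

```python
# Scores as quoted in this article, grouped by category.
# The grouping and all names here are illustrative assumptions.
SCORES = {
    "coding": {"Cursor": 9.0, "Windsurf": 8.5, "GitHub Copilot": 8.4},
    "video": {"Runway": 8.4, "Higgsfield": 8.1, "HeyGen": 7.8},
    "image": {"Midjourney": 8.2, "Adobe Firefly": 8.0},
    "agents": {"ChatGPT": 8.5, "Lovable": 8.1},
    "search": {"Perplexity": 8.6},
}


def category_gaps(tools):
    """Return (spread, lead) for one category, or None if it has fewer than two tools.

    spread: top score minus bottom score.
    lead:   top score minus second-highest score.
    Both are rounded to one decimal to hide floating-point noise.
    """
    ranked = sorted(tools.values(), reverse=True)
    if len(ranked) < 2:
        return None  # a single entry gives no internal signal to read
    spread = round(ranked[0] - ranked[-1], 1)
    lead = round(ranked[0] - ranked[1], 1)
    return spread, lead


for category, tools in SCORES.items():
    gaps = category_gaps(tools)
    if gaps is None:
        print(f"{category}: only one tool listed, no gap to compute")
    else:
        spread, lead = gaps
        print(f"{category}: spread {spread}, leader's margin {lead}")
```

Coding and video share the same 0.6 spread, but the leader's margin differs: 0.5 in coding against 0.3 in video. The margin, not the spread, is the number that separates a ranking from a cluster.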