deep dive: the ai models i use

·7 mins·
philip mathew hern
philliant
ai - This article is part of a series.
Part : This Article

i spend most of my working day using an ai assistant in cursor. the parts that are easy to skip in public write-ups are the simpler details: which model name maps to which vendor, what each one is trying to be good at, and where i should not pretend it is interchangeable with the others.

this post is that roster for me, written as of friday mar 20, 2026. i am not running benchmarks here. i am writing down how these models behave in my hands, with links so you can read the official specs if you want to explore further on your own. for why i treat multi-model routing as a production-era default, see from prototype to production: my early adopter view of ai.

quick answer

six models in my rotation right now: composer 2 when i want cursor-native agentic work, gpt-5.3 codex xhigh when i need serious implementation muscle, claude 4.6 opus max when the problem is genuinely hard and i want anthropic thinking, gemini 3.1 pro when the input is big or visual, grok 4.20 when i am stuck and want a fresh perspective, and kimi k2.5 when i want strong tool use from outside the usual three vendors.

who this is for

  • anyone already using cursor (or something similar) who wants to know what models are out there
  • engineers who do not want to watch an hour of launch videos to get a vendor map
  • future me, six months from now, when half of these names have changed and i need to remember what i was actually using

comparison table

the table is the quick reference. the sections below are where i get honest about what each model is actually like to use.

| model (as shown in my router) | maker | speciality / intended use | pro / con | documentation |
| --- | --- | --- | --- | --- |
| composer-2 | cursor | agentic coding inside cursor: edits, terminal-shaped workflows, tool use | pro: built for the editor; strong on long-horizon tasks with summarization training. con: not a portable api model in my mental model; i think of it as an environment capability, not a generic llm | composer 2 model page |
| gpt-5.3-codex-xhigh | openai | agentic coding via the codex line; the xhigh suffix is how my router encodes a higher reasoning-effort preset on top of the codex family | pro: excellent when i want careful refactors and api-shaped thinking. con: slower and more expensive than “just answer fast” tiers; easy to overuse on trivia | gpt-5-codex model, codex product hub |
| claude-4.6-opus-max | anthropic | maximum-depth opus-family reasoning when latency is a fair price | pro: best anthropic option in my rotation for subtle bugs, spec ambiguity, and multi-file coherence. con: the cost and latency are real; i save it for work that deserves the tax | claude models overview |
| gemini-3.1-pro | google | flagship gemini tier for long context and strong multimodal reasoning in the gemini stack | pro: great when i am dragging in screenshots, pdf-shaped context, or very wide file sets. con: vendor-specific quirks still matter; i verify critical logic instead of trusting vibes | gemini models |
| grok-4-20 | xai | grok 4 family reasoning with the 4.20 snapshot naming xai uses in api surfaces | pro: useful second opinion when i feel anchored to one vendor’s “house style”. con: i treat cutting-edge models as higher variance until i have personal calibration data | xai api introduction |
| kimi-k2.5 | moonshot ai | kimi k2 line tuned for coding, math-style reasoning, and tool calling on moonshot’s platform | pro: strong when i want mixture-of-experts-style efficiency and a different training prior than the usual us trio. con: operational details (regions, billing, rate limits) are another console to respect | kimi api quickstart |

composer-2 (cursor)

composer 2 is cursor’s house model for agentic work such as file edits, tool calls, and terminal workflows. it does not feel like chatting with an llm. it feels like the editor itself got smarter.

i use it when the task lives in the repo: multi-step refactors, searching across the workspace, long sessions where i do not want to re-explain context every ten minutes. i do not think of it as an api model i happen to access through cursor. it is more like a capability of the editor itself.

the official docs say it is tuned for tool use and long horizons. that matches what i see.

gpt-5.3-codex-xhigh (openai)

this is my “i need the ai to really think about this” slot on the openai side. the public docs call the family gpt-5-codex; the 5.3 and xhigh parts are how my router encodes the version and reasoning effort. your account might show a different string.
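to make the naming concrete, here is a sketch of how a router string like this could decompose into a base model name plus an effort preset. the suffix list, the default, and the parsing scheme are my own assumptions for illustration, not cursor's actual implementation:

```python
# hypothetical sketch: splitting a router string like "gpt-5.3-codex-xhigh"
# into an api-facing model name and a reasoning-effort setting.
# the suffix list and default below are assumptions, not a documented scheme.

EFFORT_SUFFIXES = ("minimal", "low", "medium", "high", "xhigh")

def parse_router_model(router_name: str) -> dict:
    """Split a router string into a base model and an effort preset."""
    parts = router_name.split("-")
    effort = "medium"  # assumed default when no effort suffix is present
    if parts[-1] in EFFORT_SUFFIXES:
        effort = parts.pop()
    return {"model": "-".join(parts), "reasoning_effort": effort}

print(parse_router_model("gpt-5.3-codex-xhigh"))
# {'model': 'gpt-5.3-codex', 'reasoning_effort': 'xhigh'}
```

the point of the sketch is only that the public model family and the router's display string are two different namespaces, which is why your account might show a different string for the same underlying model.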

i use it when the work is code-heavy and i want the model to show its reasoning, not just spit out an answer. it shines when the change touches contracts, apis, types, migrations, or anything where a wrong assumption quietly spreads.

the downside is obvious: it is slower and more expensive, and it tempts me into using a sledgehammer on a thumbtack.

claude-4.6-opus-max (anthropic)

this is my only anthropic route right now and i save it for the hard stuff: security-sensitive code, tricky concurrency, specs that contradict themselves, and problems where i want the model to slow down and really chew on it.

the trade-off is cost and patience. opus is not “better” at everything. it is better at the things where i would otherwise redo the work three times trying to get it right with a faster model.

i check anthropic’s model pages periodically because vendors bump versions quietly and my router changes behavior without telling me.

gemini-3.1-pro (google)

gemini is where i go when the input is not just code. screenshots, long mixed documents, big file sets: that is where the pro tier earns its keep for me.

same review standard applies though. if the answer involves auth, money, or data integrity, the model is writing drafts, not making decisions. i sign off. always.

grok-4-20 (xai)

grok is my “break the pattern” model. when i have been staring at the same bug through two other model families and getting nowhere, throwing it at a third set of priors sometimes finds the thing i missed faster than another hour of printf debugging.

i keep my expectations honest though. this model does not compete with the flagships above, but i sometimes find value in seeing what it gets wrong, because that often prompts a better question to ask one of the stronger models. it is kind of like using microsoft edge to download google chrome.

kimi-k2.5 (moonshot ai)

kimi k2.5 is my pick when i want strong coding and tool calling from outside the usual us vendor trio. moonshot makes it easy to try because their endpoints are openai-compatible, so i do not have to rewire everything to test it.
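“openai-compatible” here just means the request shape is the same and only the base url and key change. a minimal sketch of that shape, building the request without sending it; the base url and model id are what moonshot’s docs showed at the time of writing, so verify them before relying on this:

```python
# sketch of an openai-style chat completions request aimed at an
# openai-compatible endpoint. base url and model id are assumptions
# taken from moonshot's public docs; check them before use.
import json

MOONSHOT_BASE_URL = "https://api.moonshot.ai/v1"  # assumed; verify in their docs

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble (but do not send) an openai-style chat request."""
    return {
        "url": f"{MOONSHOT_BASE_URL}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request("kimi-k2.5", "explain this stack trace", "sk-...")
print(req["url"])  # https://api.moonshot.ai/v1/chat/completions
```

because the shape matches, an existing openai client can usually be pointed at the other vendor by overriding the base url and key, which is exactly why trying kimi cost me almost no rewiring.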

i keep a model like this in rotation mainly so i do not default to the same two or three models every time. a slot i never exercise just collects dust.

how i actually pick (it is not scientific)

  1. lots of files, lots of tool calls → composer 2
  2. hard code problem, i want to see the reasoning → codex xhigh or opus max, depending on whether i want openai-flavored or anthropic-flavored thinking
  3. big context window or images involved → gemini 3.1 pro
  4. i have been going in circles for an hour → grok or kimi for a fresh set of eyes
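the four rules above, written out as a literal if/else chain. the boolean flags and the function itself are my own labels for illustration, not any real api; earlier rules win on ties, and the default matches what i reach for most days:

```python
# the routing heuristics above as plain code. the flags and function
# are illustrative labels, not a real router api.

def pick_model(many_files: bool, hard_problem: bool,
               big_or_visual: bool, stuck: bool,
               prefer_anthropic: bool = False) -> str:
    if many_files:        # 1. lots of files, lots of tool calls
        return "composer-2"
    if hard_problem:      # 2. hard code problem, show the reasoning
        return "claude-4.6-opus-max" if prefer_anthropic else "gpt-5.3-codex-xhigh"
    if big_or_visual:     # 3. big context window or images involved
        return "gemini-3.1-pro"
    if stuck:             # 4. going in circles, want fresh priors
        return "grok-4-20"  # or kimi-k2.5
    return "gpt-5.3-codex-xhigh"  # my everyday default

print(pick_model(many_files=True, hard_problem=False,
                 big_or_visual=False, stuck=False))
# composer-2
```

writing it as code mostly shows how crude the heuristic is: four booleans and a default. that is the honest level of rigor behind my routing.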

faq

do you run all six every day?

no. most days it is 90% gpt-5-codex. the full roster is there for when i need it, and over time i have built up a mental map of which model tends to do well on which kind of task.

should i copy this exact list?

please do not. if you are not living inside an agentic editor all day, half of this will not make sense for your workflow. honestly, one fast model and one deep model will cover most people. add a third only if you keep running into the same wall.
