How is the Agent Reality Index calculated?

Each tool is scored 0–100 within its category from six weighted signals: market demand (search and AI-search volume, weighted most heavily), developer adoption, real usage, momentum, production readiness, and community. Weightings vary by category, and the index is refreshed monthly.

Why doesn't the index rank by GitHub stars?

Stars measure who bookmarked a repository, often years ago — not who is using a tool today. A tool with 180,000 stars but almost no current search demand will rank below a smaller tool people are actually adopting, because the index leads with real-world demand.

How does the index handle tools with ambiguous names like Goose or Tabby?

When a tool is named after a common word, brand search volume is mostly noise (we measured it at over 99% noise for some names). For those tools the index switches signals and measures traffic to the product's own website instead. Each tool carries a confidence flag and a note on how its demand was measured, so you can see how each number was derived.

How often is the Agent Reality Index updated?

It is rebuilt monthly from fresh demand, adoption, and usage data, so rankings move over time. The current edition was last updated 15 June 2026.

Does a high index score mean a tool is the right choice for me?

No. A high score means real momentum and a real community; it settles none of the readiness questions and does not mean a tool fits your system. The framework is the most replaceable layer — whichever you pick still needs its own identity, scoped credentials, evaluation, and governance.

How the Agent Reality Index is built

Most "best AI agent framework" lists rank by GitHub stars or the author's opinion. Stars measure who bookmarked a repository, often years ago; they do not measure who is using a tool today. The Agent Reality Index ranks on real-world demand (how much each tool is actually searched for and asked about inside AI assistants) alongside genuine usage signals. Every tool is scored 0–100 within its category, and the scores are refreshed monthly. Last updated 15 June 2026.

The six signals

Each tool's score is a weighted blend of six signals, each normalised across the full field:

Market demand — Google search volume plus AI/LLM search volume. The heaviest factor, because it is the truest measure of who is reaching for a tool.
Developer adoption — package downloads, stars, forks, Stack Overflow presence.
Real usage — dependent repositories, installs and container pulls: production use, not playground stars.
Momentum — growth velocity, release cadence and activity freshness.
Production readiness — release discipline, issue resolution, security policy, licence.
Community — documentation, answered questions and ecosystem breadth.
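As a rough illustration of how a weighted blend like this can be computed, here is a minimal sketch. It is not the index's actual code: the weights are invented placeholders (chosen only so that demand is the largest), min-max scaling is an assumed normalisation method, and the names SIGNALS, EXAMPLE_WEIGHTS, normalise and score are hypothetical.

```python
from typing import Dict

SIGNALS = ("demand", "adoption", "usage", "momentum", "readiness", "community")

# Hypothetical weights for illustration only; they sum to 1 and demand is largest.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "demand": 0.30,
    "adoption": 0.15,
    "usage": 0.20,
    "momentum": 0.15,
    "readiness": 0.10,
    "community": 0.10,
}


def normalise(field: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Min-max scale every signal to 0..1 across all tools in the field.

    Min-max scaling is an assumption; the text only says signals are normalised.
    """
    scaled: Dict[str, Dict[str, float]] = {tool: {} for tool in field}
    for signal in SIGNALS:
        values = [field[tool][signal] for tool in field]
        low, high = min(values), max(values)
        span = high - low
        for tool in field:
            # If every tool has the same raw value, the signal carries no information.
            scaled[tool][signal] = (field[tool][signal] - low) / span if span else 0.0
    return scaled


def score(field: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> Dict[str, float]:
    """Blend the normalised signals into a 0-100 score per tool."""
    scaled = normalise(field)
    return {
        tool: round(100 * sum(weights[s] * scaled[tool][s] for s in SIGNALS), 1)
        for tool in scaled
    }
```

Because each signal is scaled against the rest of the field, a tool's score depends on who else is on the board: adding or removing a tool can move every other score, even with the weights unchanged.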

Weighted per category, not one-size-fits-all

"Adoption" means something different for a Python library than for a coding agent you install from an IDE, so each of the four boards uses its own weighting. For frameworks, real usage (installs and dependents) carries more weight; for coding agents and assistants, demand and momentum lead. Market demand is the single largest factor in every category.
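One way to picture per-category weighting is a weight table per board, plus a check that enforces the one rule stated above: demand is always the single largest factor. The sketch below is illustrative only. The two category keys, every number in the table, and the helper name validate_weights are assumptions, not the published recipe.

```python
from typing import Dict

# Hypothetical weight tables for two of the boards. The numbers are invented;
# they only mirror the stated pattern (usage heavier for frameworks,
# demand and momentum leading for coding agents).
CATEGORY_WEIGHTS: Dict[str, Dict[str, float]] = {
    "frameworks": {
        "demand": 0.30, "adoption": 0.15, "usage": 0.25,
        "momentum": 0.10, "readiness": 0.10, "community": 0.10,
    },
    "coding_agents": {
        "demand": 0.35, "adoption": 0.10, "usage": 0.10,
        "momentum": 0.25, "readiness": 0.10, "community": 0.10,
    },
}


def validate_weights(weights: Dict[str, float]) -> None:
    """Raise ValueError unless weights sum to 1 and demand is strictly largest."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    others = [w for name, w in weights.items() if name != "demand"]
    if not all(weights["demand"] > w for w in others):
        raise ValueError("market demand must be the single largest factor")


for category, weights in CATEGORY_WEIGHTS.items():
    validate_weights(weights)
```

Keeping the check next to the table means a later weighting change cannot quietly break the "demand leads everywhere" rule.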

The hard part: measuring demand honestly

Brand search volume is the cleanest demand signal, until a tool is named after a common word. A plain search for "goose" returns the bird, "tabby" the cat, "jan" the month. We checked: the search volume around those names is over 99% noise, and no clever query recovers a clean number. So for tools with ambiguous names we switch signals entirely and measure the traffic to the product's own website, which cannot be confused with a bird. Every tool carries a visible note on how its demand was measured and a confidence flag; where a clean signal is genuinely unavailable, we mark it low-confidence rather than guess. Transparency is the point: you can see exactly why each tool sits where it does.
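The signal switch described above can be sketched as a small decision function. This is a hedged illustration, not the index's implementation: the DemandReading type, the measure_demand function, the "high" / "medium" / "low" labels, and the idea that name ambiguity arrives as a ready-made boolean are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DemandReading:
    value: Optional[float]  # None when no clean signal exists
    source: str             # "brand_search", "site_traffic" or "none"
    confidence: str         # hypothetical labels: "high", "medium", "low"
    note: str               # human-readable account of how demand was measured


def measure_demand(
    ambiguous_name: bool,
    brand_search: Optional[float],
    site_traffic: Optional[float],
) -> DemandReading:
    """Pick a demand signal and record how it was derived."""
    # Brand search is only trusted when the name is not a common word.
    if not ambiguous_name and brand_search is not None:
        return DemandReading(
            brand_search, "brand_search", "high",
            "Brand search volume; name is distinctive.",
        )
    # Otherwise fall back to traffic to the product's own website.
    if site_traffic is not None:
        return DemandReading(
            site_traffic, "site_traffic", "medium",
            "Product website traffic; brand search unusable.",
        )
    # No clean signal: flag it rather than guess a number.
    return DemandReading(
        None, "none", "low",
        "No clean demand signal available; value not estimated.",
    )
```

For a tool named after a common word, a call such as measure_demand(True, 90000.0, 300.0) ignores the large but noisy search figure and reports the website traffic instead, with the note explaining why.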

What the index deliberately does not do

It does not treat GitHub stars as truth — a tool with 180,000 stars and almost no current search demand will rank below a smaller tool people are actually adopting. And it does not let you re-weight the ranking yourself: a single, defensible verdict is more useful than an infinitely adjustable one. The recipe is fixed and published here; the judgement is ours to stand behind.

A note on what a ranking settles

A high index score tells you a tool has real momentum and a real community — it does not tell you it is the right choice for your system, and it settles none of the readiness questions. The framework is the most replaceable layer in an agentic system. Whichever you pick still needs its own identity and scoped credentials, evaluation before autonomy, and governance with names attached. Use the index to see the landscape clearly — then keep the durable parts portable.