Evaluation in a Constrained Landscape: What AI Makes Possible and What Remains Unknown
AI has quietly become part of how many of us evaluate. But how much do we actually know about what it will cost to keep using it?
With the rise of AI, the evaluation field has found ways to use it across the different stages of evaluation projects, from designing instruments to synthesizing data and drafting reports. This shift is reflected in evaluation conferences, professional groups, and publications focused on AI in evaluation, such as the 2023 special issue of New Directions for Evaluation (Mason & Montrosse-Moorhead, 2023). Beyond individual use of these tools, AI is now being built into organizational systems and procedures. I am one example: as an internal evaluator, I have built AI tools to expand my own reach and capacity, and also as a way to build broader evaluation capacity.
Because I have significantly increased my use of AI in practice, I am always thinking about what the future holds. In this post, I want to share something I recently learned that prompted me to look more closely at the future cost of AI. But first, let's start with what AI has made possible.
What AI Tools Actually Make Possible
Evaluators working with AI tools today are doing things that previously required significantly more time: designing evaluation plans, synthesizing large volumes of data, drafting data collection instruments, or drafting reports for dissemination. In doing so, AI reduces the time cost of scaffolding work that used to demand extended thinking and planning.
The conditions for responsible use matter, though. Concerns remain about essential aspects of qualitative research such as rigor, transferability, credibility, and trustworthiness, because AI platforms may fail to grasp the nuances of a given context (Kabir et al., 2025). This is not a reason to avoid AI tools in evaluation work, but it is a warning to use them deliberately, with the evaluator's judgment always anchoring interpretation and methodological decisions.
The Economics of Access
Current access to these tools is real, and part of what makes it possible is the pricing structure itself. Most major providers, including Anthropic (Claude) and OpenAI (ChatGPT), offer a free tier at no cost to the user, and entry-level paid plans for individuals typically start around $20 per month. A large share of users rely on these free or low-cost tiers rather than enterprise contracts, which is part of what makes AI feel so immediately within reach for evaluators and small organizations. For districts and nonprofits already absorbing funding cuts, this accessibility is meaningful: evaluation work that might otherwise be deprioritized or contracted out can now be sustained. While this sounds promising, a fair question is whether the affordability will last. What will happen to pricing models?
I attended the AI in Education Summit at the University of South Florida earlier this month. At one of the opening sessions, a presenter flagged that AI pricing today is heavily subsidized. He also encouraged attendees to prepare for the possibility of usage-based billing as the industry matures.
Afterward, I looked into this further and found information pointing in the same direction. OpenAI is reportedly preparing to ask public investors to value the company at more than $1 trillion while projecting a $14 billion loss for 2026, with no profitability expected before 2029 or 2030 (Weinberg, 2024). Anthropic, the company behind Claude, reported a different trajectory: a projected operating profit of $559 million for the second quarter of 2026 on $10.9 billion in revenue (Carvão, 2026). That figure excludes stock-based compensation, and sustained profitability for the full year is not guaranteed given planned increases in compute and model training spending.
Neither data point predicts what will happen to subscription prices, but together they suggest the same underlying reality: providers are on different financial paths, the broader industry has not settled into stable pricing, and the conditions that make AI accessible today are not guaranteed to hold.
What This Means for Evaluation Practice
AI tools have proven very useful for extending evaluation capacity under resource constraints, and today's affordability is what makes that possible. But there is a difference between using AI tools to support evaluation tasks and building systems and procedures that depend on AI while its future remains unclear. Those of us building AI into our evaluation infrastructure, workflows, staffing models, or compliance processes are taking on structural risk we may not have fully accounted for.
With this in mind, now may be a good moment to ask what breaks in your evaluation system if the tool becomes inaccessible or prohibitively expensive. Asking that question before any AI deployment at scale is a preventative measure.
One practical step is to treat this the way we would any other cost-sensitive evaluation design: with a sensitivity analysis. Modeling your evaluation budget and workflows under a few pricing scenarios (current rates, double, five times, or the tool becoming unavailable altogether) can clarify which parts of your practice are genuinely AI-dependent and which simply benefit from AI without requiring it. That distinction is what allows an organization to keep using these tools confidently now, while still knowing what its evaluation capacity would look like without them.
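For readers who like to see the arithmetic, a sensitivity analysis like this can be sketched in a few lines of code. The example below is a minimal sketch in Python, not a prescribed method: the workflow names, the dollar figures, and the assumption that an organization would choose the cheaper of the AI path and a manual fallback are all hypothetical, chosen only to illustrate the idea. A workflow with no non-AI fallback is treated as AI-dependent: it keeps running at any price while the tool exists, but it breaks in the "unavailable" scenario.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workflow:
    name: str
    # Hypothetical current AI spend attributed to this workflow (USD/month).
    ai_monthly_cost: float
    # Hypothetical cost to do the same work without AI (e.g., staff time);
    # None means there is no realistic non-AI fallback.
    fallback_monthly_cost: Optional[float]


# Pricing scenarios as multipliers on current AI cost; None = tool unavailable.
SCENARIOS = {"current": 1.0, "double": 2.0, "5x": 5.0, "unavailable": None}

# Illustrative placeholder workflows; replace with your own estimates.
WORKFLOWS = [
    Workflow("survey instrument drafting", 20.0, 400.0),
    Workflow("qualitative coding first pass", 60.0, 1200.0),
    Workflow("automated report pipeline", 150.0, None),
]


def run_scenario(workflows, multiplier):
    """Return (total monthly cost, names of workflows that cannot continue)."""
    total = 0.0
    broken = []
    for wf in workflows:
        options = []
        if multiplier is not None:
            options.append(wf.ai_monthly_cost * multiplier)
        if wf.fallback_monthly_cost is not None:
            options.append(wf.fallback_monthly_cost)
        if options:
            # Assumption: the organization picks the cheaper available path.
            total += min(options)
        else:
            broken.append(wf.name)
    return total, broken


def classify(workflows):
    """Label each workflow as AI-dependent (no fallback) or AI-benefiting."""
    return {
        wf.name: "AI-dependent" if wf.fallback_monthly_cost is None
        else "benefits from AI"
        for wf in workflows
    }


if __name__ == "__main__":
    for label, multiplier in SCENARIOS.items():
        total, broken = run_scenario(WORKFLOWS, multiplier)
        print(f"{label:>12}: ${total:,.0f}/month; breaks: {', '.join(broken) or 'none'}")
    for name, status in classify(WORKFLOWS).items():
        print(f"{name}: {status}")
```

With these illustrative numbers, costs rise with each multiplier but every workflow can continue, while the "unavailable" scenario flags the automated report pipeline as the one piece of practice with no way forward. Substituting your own estimates of AI spend and staff time is what turns the sketch into an actual sensitivity analysis.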
None of this argues against using AI in evaluation work. It argues for building evaluation infrastructure thoughtfully, knowing what the tools can and cannot do, maintaining the methodological judgment that no tool can replace, and avoiding the assumption that current access conditions define a permanent baseline. We are all still learning what this technology is, what it will become, and what it will cost. That uncertainty is a reason to stay informed and build carefully.
References
Carvão, P. (2026, May 21). OpenAI and Anthropic are testing two very different AI business models. Forbes. https://www.forbes.com/sites/paulocarvao/2026/05/21/anthropic-openai-enterprise-ai-profitability/
Kabir, S. M. A., Ali, F., Ahmed, R. L., & Sulaiman-Hill, R. (2025). Exploring the use of AI in qualitative data analysis: Comparing manual processing with Avidnote for theme generation. International Journal of Qualitative Methods. https://journals.sagepub.com/doi/10.1177/16094069251336810
Mason, S., & Montrosse-Moorhead, B. (2023). Editors' notes. New Directions for Evaluation, 2023(178-179), 7–10. https://onlinelibrary.wiley.com/doi/10.1002/ev.20563
Weinberg, C. (2024, October 9). OpenAI projections imply losses tripling to $14 billion in 2026. The Information. https://www.theinformation.com/articles/openai-projections-imply-losses-tripling-to-14-billion-in-2026