Orbrya, 2026-03-12

The Jagged Frontier: Why AI Performance Is Unpredictable

AI outperforms humans on some tasks and fails badly on others, and the boundary is invisible. Research explains why this makes verification non-optional.

There's a common assumption many families make about AI: that it's roughly equally good or bad across all kinds of tasks. In practice, AI performance is not uniform at all. It's dramatically better on some tasks than others, and the boundary between those two zones (what researchers have called the "jagged frontier") is largely invisible to the person using it.

This matters practically for families teaching AI literacy, because it means blanket policies about AI ("always check" or "AI is fine for X") miss the actual structure of the risk. How much checking a result needs depends on what kind of task you're doing and where it falls relative to AI's real capability boundary.

The research

In 2023, a team led by researchers from Harvard Business School ran a field experiment with 758 consultants at Boston Consulting Group. Each consultant was randomly assigned to one of three conditions: no AI access, access to GPT-4, or access to GPT-4 plus an overview of prompt engineering. The participants worked on realistic business tasks: analysis, brainstorming, writing, problem-solving.

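For tasks that fell within AI's capability range, the AI users clearly outperformed: on average they completed 12.2% more tasks, finished them 25.1% faster, and produced results rated more than 40% higher in quality than the control group's. The tasks AI was good at, it was very good at, consistently and by a meaningful margin.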

For tasks outside AI's capability range, meaning tasks that required different reasoning than AI could reliably provide, the AI users were 19 percentage points less likely to produce correct solutions than the group working without AI. On tasks AI couldn't do well, the consultants who used it did not just fail to improve; they performed worse than their colleagues who had no AI at all.

The researchers called this the jagged frontier: a capability boundary that isn't a smooth wall but an irregular edge. On one side, AI is excellent. On the other side, AI makes things worse. And from the inside, the frontier is invisible. The consultants couldn't see which side of the boundary they were on. The AI gave them confident output either way.

Why the invisibility is the problem

AI does not change its presentation depending on whether it's operating inside or outside its capability range. It doesn't say "this is a type of problem I handle well" versus "this is a type of problem I tend to get wrong." The confidence level of the output is roughly constant regardless of the underlying reliability.

This means the user can't rely on AI's expressed confidence as a signal. A confident-sounding answer might be excellent or it might be nonsense. The only way to know is to bring outside knowledge, check against independent sources, or work through the problem independently and compare.

For families, this is the clearest argument against a blanket "AI is fine for this" policy. Even within a domain where AI is generally reliable, specific questions may fall outside the frontier. The only reliable response is treating verification as standard rather than optional.

What this looks like for children

For a ten-year-old using AI to help understand a science concept: AI is probably reliable for well-established basic science and probably unreliable for nuanced or recent findings. The child can use AI to get an initial explanation but should check specific claims about ongoing research against a more current source.

For a fifteen-year-old using AI to prepare for a debate: AI is probably reliable for marshaling standard arguments on both sides of a well-known topic, and probably unreliable for the specific recent evidence that would distinguish a strong argument from a weak one. The preparation AI does is a useful starting point, not a finished product.

For a college applicant using AI to review their personal statement: AI is quite good at grammar, clarity, and structure, and quite poor at capturing authentic individual voice. The student who lets AI improve their prose and then reads the revised version to make sure it still sounds like them is using AI appropriately. The one who accepts the revision without that check is at risk of submitting something that sounds polished but doesn't sound like them.

The practical takeaway

The jagged frontier doesn't mean AI is unreliable. It means AI is selectively reliable in ways that aren't always obvious. Building AI literacy means developing, over time, a working map of where that frontier runs for the tasks you actually do.

That map is built through experience: noticing when AI gets things right and when it gets things wrong, tracking the types of errors that show up repeatedly in a domain, developing a sense of which tasks benefit from light review and which demand thorough checking.

For children, that map takes years to build, which is why starting early matters. A child who begins paying attention to AI's failure modes at age ten will have a much more refined model of AI capability by the time they're making consequential decisions with it at eighteen. A child who starts at eighteen is navigating with a much less reliable map.

The jagged frontier also reinforces something fundamental about the boss/employee relationship with AI: even a highly competent employee has areas of weakness, and a good employer knows what those areas are before assigning tasks. The frontier is the map of those weaknesses. Building it is the work.

Sources cited in this post:

Dell'Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). "Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality." Harvard Business School Working Paper 24-013. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321