
AI, Assessment, and the Growing Capability Trust Problem

AI Didn’t Break Assessment — It Exposed What Was Already Fragile

For the past two years, tertiary education has been flooded with variations of the same question:

“How do we stop students using AI to cheat?”

Underneath that question sits a growing layer of institutional anxiety.

Educators are increasingly uncertain about what they are actually looking at when they assess learner work. Moderators are encountering evidence that feels harder to interpret confidently. Policy teams are trying to keep pace with technologies evolving faster than governance cycles. Meanwhile, learners are using AI tools in ways that range from legitimate support through to heavy substitution — often somewhere in between.

In many cases, the operational reality has already shifted well ahead of the systems surrounding it.

But there is a deeper issue emerging beneath the immediate AI conversation.

Generative AI may not have broken assessment.

It may have exposed assumptions that were already becoming fragile.


The Hidden Assumption Stack

For a long time, many assessment systems operated with a relatively linear logic chain:

submitted work
→ authorship
→ understanding
→ capability

Not perfectly. Not universally. But often implicitly.

A learner submitted an essay, report, portfolio, workbook, reflection, or project. The artefact was assessed against criteria. From there, capability was inferred.

Under earlier conditions, this worked reasonably well much of the time.

The problem is not that educators were naïve. Nor is it that assessment systems were inherently flawed. Most were designed within environments where producing substantial written outputs still required a meaningful degree of learner effort, synthesis, interpretation, and communication.

That environment has changed rapidly.

Today, high-quality outputs can increasingly be generated, refined, and polished with AI assistance, often within minutes.

The challenge is not simply that AI can produce text.

It is that the relationship between artefact production and underlying capability has become far less stable than many systems assumed.


AI Accelerated Visibility

There is a temptation to frame this as a sudden collapse caused entirely by AI.

That interpretation is probably too simple.

Generative AI did not invent ambiguity around authorship, or the gap between polished artefacts and genuine capability. Those dynamics already existed in parts of the system.

What AI did was industrialise ambiguity.

It dramatically lowered the effort required to produce convincing outputs while simultaneously making that assistance harder to detect consistently.

In doing so, it exposed something many educators were already quietly sensing:

high-quality artefacts do not always equal high-confidence evidence of capability.

That distinction matters.

Because the issue is no longer confined to academic misconduct alone.

It increasingly affects how institutions establish confidence in what learners genuinely know, understand, and can actually do.


The Real Pressure Point

Much of the current public conversation still centres on cheating.

But the deeper institutional pressure point may be trust.

More specifically:

what educational evidence genuinely allows us to conclude.

If a polished submission can now emerge from a complex blend of learner effort and AI assistance, then interpreting capability becomes more complicated.

This does not mean learners are not learning.

Nor does it mean AI use is automatically inappropriate.

In many contexts, AI tools can genuinely support learning rather than replace it.

The challenge is subtler than prohibition.

The challenge is distinguishing:
support from substitution,
surface fluency from deeper understanding,
and convincing performance from reliable capability.

That ambiguity becomes especially significant in contexts where direct observation of learner capability is limited.


Why This Matters Beyond Education

The implications extend beyond classrooms and assessment policy.

Credentials ultimately function as trust signals.

Employers, professions, industries, and communities rely on them as indicators that a person genuinely knows, understands, and can do what the credential claims.

If confidence weakens around what credentials actually verify, pressure eventually flows outward into workforce trust itself.

This is particularly important in vocational, professional, and capability-based contexts where performance matters more than artefact production alone.

The strategic issue is not whether AI exists in professional life. It already does.

The issue is whether educational systems can still reliably determine when meaningful capability is genuinely present beneath increasingly sophisticated outputs.


What The Sector May Be Misreading

One of the risks right now is that institutions respond to an infrastructure problem as though it were only a policy problem.

More rules alone are unlikely to stabilise confidence if the underlying evidence assumptions remain uncertain.

Similarly, detection technologies may help in some situations, but they are unlikely to function as a complete long-term trust architecture on their own. AI systems are evolving too quickly, usage patterns are too varied, and false positives carry their own risks.

At the same time, unrestricted “AI everywhere” approaches can create different forms of ambiguity if institutions lose clarity around what capability standards still matter and how they are verified.

This is why many providers are currently operating in a state of mixed-mode uncertainty.

Much of the sector is still trying to reconcile older assessment assumptions with a fundamentally altered evidence environment.


Quiet Adaptation Is Already Happening

Interestingly, some of the most promising responses are not entirely new.

Across parts of tertiary and vocational education, educators are increasingly experimenting with assessment approaches that bring capability more directly into view.

These approaches already existed in many places.

What may be changing is their strategic importance.

As AI increases uncertainty around standalone artefacts, confidence may increasingly emerge through richer, more direct evidence of capability.

In other words:

the system may gradually shift from relying primarily on outputs alone toward building stronger confidence in capability itself.

That is a different orientation.

And potentially a significant one.


A More Useful Framing

None of this means that educators failed, or that assessment is broken beyond repair. Those framings are unlikely to help.

The more useful interpretation may be simpler:

the environment changed faster than the assumptions underneath many assessment systems.

What we are now seeing is not just a technology challenge.

It is a trust and evidence challenge.

Educational institutions are increasingly being asked to answer a more difficult question than before:

How do we confidently recognise capability under AI-assisted conditions?

That question is still emerging.

But it may become one of the defining tertiary challenges of the next decade.

If these tensions are surfacing in your organisation as well, I’d be interested in hearing what patterns you’re seeing across assessment, moderation, capability verification, or workforce readiness.

The next post in this series explores why the deeper issue may not actually be cheating — but capability trust itself.


Graeme Smith is the founder of Te Aho Lab and creator of Tertiary Signals, exploring capability, trust, and verification under AI-assisted conditions across tertiary education and workforce systems in Aotearoa New Zealand.
