Why the real shift isn’t AI—but what happens when producing answers no longer proves understanding
There is a quiet shift happening in assessment.
It isn’t primarily about AI.
It’s about what happens when answers become easy to produce.
For a long time, assessment has relied on a simple structure:
A question is set.
A response is produced.
A judgement is made about the quality of that response.
That structure assumed something that is no longer stable—that producing a response required effort that correlated, at least loosely, with understanding.
That assumption is now unreliable.
When tools can generate plausible answers on demand, the problem is not that students might use them.
The problem is that the assessment itself no longer reveals very much.
Not because it has been bypassed, but because it has been completed too easily.
Predictable assessments tend to fail first, because the more formulaic the task, the more easily a plausible answer can be produced without any understanding behind it.
This is where much of the current conversation gets stuck.
It circles around detection.
Can we tell if a piece of work was written with AI?
Can we identify signals of external assistance?
Can we preserve the integrity of the task?
These questions are understandable. They are also limited.
Detection becomes increasingly uncertain as soon as a student begins to work with the output—editing, reshaping, inserting their own examples, refining the language. What emerges is neither fully generated nor fully original. It is something in between.
At that point, the distinction becomes difficult to hold.
More importantly, it becomes less useful.
The deeper question is not whether AI was used.
It is what counts as evidence of learning in an environment where it almost certainly was.
One way to understand the shift is this:
Assessment is moving from the evaluation of products to the interpretation of thinking.
This is not entirely new. Good educators have always paid attention to reasoning, not just answers. But the balance has changed.
Where the final submission once carried most of the evidential weight, it now carries less.
Not because it has no value, but because it is no longer sufficient on its own.
What begins to matter more are traces of thinking.
How an idea was formed.
How it was revised.
What alternatives were considered.
What judgements were made along the way.
These are not easily replaced by generated text.
They are also not always visible in traditional assessment formats.
This creates a subtle design problem.
If the task only asks for an answer, it invites optimisation.
If it makes space for reasoning, it begins to reveal how the answer was reached.
There is also a corresponding shift for those doing the marking.
Automation can assist in identifying patterns, summarising work, or drafting feedback. But it does not remove the need for judgement.
If anything, it sharpens it.
Because the task is no longer simply to evaluate a finished piece, but to interpret what it represents.
That requires context, experience, and disciplinary understanding.
Assessment is not disappearing.
It is becoming more interpretive.
And, in some ways, more demanding.
What is still unresolved is how far this shift needs to go.
Most current designs sit somewhere between the old model and the new, holding onto familiar formats while beginning to adapt around them.
That may be a necessary phase.
But it is unlikely to be the end state.
For now, the useful question is not how to remove AI from assessment.
It is how to design assessment that still makes thinking visible, even when AI is present.
Assessment is no longer about proving that an answer can be produced.
It is about revealing how thinking happens when it can.
If you’re working through this shift in your own context, I’m always open to a conversation.