"An open book with data streams flowing toward a student at a laptop, surrounded by a glowing brain diagram mapping cognitive load dimensions — working memory, emotional strain, engagement, fatigue, linguistic complexity, processing speed, and attentional stability — alongside a NASA-TLX scalar gauge and a thermometer-thermostat metaphor, with a classical philosopher's bust and archway in the background"

A Margin Note Five Years Later

23 May 2026 at 15:53 BST · 10 min read


In June 2021 I finished reading Oliver Lovell's Sweller's Cognitive Load Theory in Action and left a review on Goodreads. It was not a long review. It summarised the book's contribution — Lovell does an honest job of making Sweller's framework accessible — and then, in the final lines, turned outward into a question I had no apparatus to answer yet. I wondered, in print, whether it might be possible to apply cognitive load theory to a subject using machine learning algorithms that could generate, based on a student's personal attention span, working memory capacity and depth of understanding, a structured programme that would ensure a student was maximising learning whilst minimising cognitive load, stress, and the detriment to mental health.

That was the margin note. Not in a physical book — I do not annotate my books — but in the public record, timestamped, before adaptive AI tutors were a mainstream conversation and before I had any formal framework for what I was gesturing at. I did not build anything from it immediately. The idea sat the way ideas sit when you cannot yet see their shape clearly: present, occasionally surfacing, unresolved.

Five years later, two papers published on Zenodo formalised what that review contained in embryo. A whitepaper laying out the conceptual framework and phased rollout strategy. A measurement guidelines paper formalising the tensor architecture, the signal-to-dimension mapping, the three-layer measurement system. The question from 2021 had become, if not an answer, at least a rigorous attempt at one.

This is an account of what that attempt proposes, where it holds, and where it does not. It is also, finally, a reckoning with the context it arrives in — which is to say, a reckoning with an EdTech industry that has had access to a century of serious pedagogical thinking and has largely chosen not to use it.

What the Framework Proposes

The central claim is this: cognitive load, as it has been measured, has been measured wrongly — not in its particulars, but in its form.

The dominant instruments — NASA-TLX and its descendants — collapse cognitive workload into a single number. A composite score. A scalar. You finish a task, you rate it across several dimensions, the dimensions are weighted and summed, and what emerges is a figure that tells you how hard it was. This is useful for comparing tasks in controlled conditions. It is not useful for adapting instruction to a learner in real time, because the compression destroys the information that would make adaptation possible.
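
To make the compression concrete, here is a minimal sketch of the weighted NASA-TLX calculation as it is usually described: six subscales rated 0 to 100, weighted by tallies from 15 pairwise comparisons, summed and divided by 15. The function name, dictionary keys and example ratings are my own illustrative choices, not taken from either paper.

```python
# The six NASA-TLX subscales, each rated 0-100 after the task.
SUBSCALES = ("mental_demand", "physical_demand", "temporal_demand",
             "performance", "effort", "frustration")


def tlx_weighted_score(ratings, weights):
    """Weighted workload: sum(weight * rating) / 15.

    `weights` are tallies from the 15 pairwise comparisons between
    subscales, so they must sum to 15.
    """
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison tallies must sum to 15")
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / 15


# Illustrative numbers only: one set of weights, two very different sessions.
weights = {"mental_demand": 5, "physical_demand": 0, "temporal_demand": 3,
           "performance": 2, "effort": 4, "frustration": 1}
ratings_a = {"mental_demand": 90, "physical_demand": 10, "temporal_demand": 50,
             "performance": 30, "effort": 70, "frustration": 20}
ratings_b = {"mental_demand": 40, "physical_demand": 10, "temporal_demand": 80,
             "performance": 70, "effort": 70, "frustration": 100}

print(tlx_weighted_score(ratings_a, weights))
print(tlx_weighted_score(ratings_b, weights))
```

Under these made-up ratings the two sessions produce the same score, although one is dominated by mental demand and the other by frustration. That is the information the scalar throws away.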

Knowing that a learner's composite workload score is 0.72 tells you very little. Knowing that their working memory load is near threshold while their emotional strain is low — that they are cognitively saturated but not anxious, that the problem is density not distress — tells you something you can act on. The intervention for saturation is different from the intervention for distress. The scalar cannot distinguish them.

The framework's structural proposal is to replace that scalar with a tensor: a multidimensional representation in which each dimension corresponds to a distinct component of cognitive load. Linguistic complexity. Engagement. Working memory demand. Processing speed. Attentional stability. Emotional strain. Fatigue. Each assigned a value, each tracked independently, each capable of crossing its own threshold and triggering its own response. The composite norm — the Euclidean distance across dimensions — remains available as a summary measure, but the dimensional structure is preserved rather than discarded at the point of measurement.
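
A rough sketch of that structure, under assumptions of my own: loads scaled to the range 0 to 1, a single illustrative threshold of 0.8 on every dimension, and a class called `LoadProfile` that does not appear in the papers. It shows the two things the paragraph above describes: a composite norm that remains available, and per-dimension thresholds that are checked independently.

```python
import math
from dataclasses import dataclass

# The seven dimensions named in the framework.
DIMENSIONS = ("linguistic_complexity", "engagement", "working_memory",
              "processing_speed", "attentional_stability",
              "emotional_strain", "fatigue")


@dataclass
class LoadProfile:
    """One value per dimension; the 0-1 scale is an assumption."""
    values: dict

    def composite_norm(self):
        # Euclidean norm across all dimensions (the summary measure).
        return math.sqrt(sum(self.values[d] ** 2 for d in DIMENSIONS))

    def crossed(self, thresholds):
        # Dimensions at or above their own threshold, in a fixed order.
        return [d for d in DIMENSIONS if self.values[d] >= thresholds[d]]


# Hypothetical learners: same overall magnitude, different shape.
base = dict.fromkeys(DIMENSIONS, 0.3)
saturated = LoadProfile({**base, "working_memory": 0.9, "emotional_strain": 0.2})
distressed = LoadProfile({**base, "working_memory": 0.2, "emotional_strain": 0.9})
thresholds = dict.fromkeys(DIMENSIONS, 0.8)

print(round(saturated.composite_norm(), 3), saturated.crossed(thresholds))
print(round(distressed.composite_norm(), 3), distressed.crossed(thresholds))
```

The two profiles have the same composite norm, yet each crosses a different threshold. The norm alone could not tell a system which learner it was looking at.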

The measurement system is layered. A static calibration layer establishes a baseline cognitive profile for the learner before instruction begins — working memory span estimation, attention profiling, prior knowledge assessment. A dynamic tracking layer monitors behavioural signals in real time during a session — response latency, error patterns, interaction pace, re-reading frequency. An adaptive response layer translates threshold crossings into interventions: reduced complexity, modality shift, pacing adjustment, rest prompt.
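
The three layers can be sketched as three small pieces of code. Everything specific here is an assumption for illustration: the `Baseline` fields, the ratio-to-baseline scaling, the 0.5 threshold, and the pairing of each dimension with an intervention are mine, not the papers'. Only the two signal-to-dimension pairings (response latency to processing speed, re-reading frequency to working memory) and the intervention names come from the framework as described.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Layer 1, static calibration: measured before instruction begins."""
    median_latency_s: float   # typical response latency for this learner
    rereads_per_page: float   # typical re-reading frequency (must be > 0)


def clamp01(x):
    return max(0.0, min(1.0, x))


def track(baseline, latency_s, rereads_per_page):
    """Layer 2, dynamic tracking: session signals relative to baseline.

    Load is taken as the fractional excess over baseline, clamped to 0-1.
    This scaling is a placeholder, not a validated mapping.
    """
    return {
        "processing_speed": clamp01(latency_s / baseline.median_latency_s - 1.0),
        "working_memory": clamp01(rereads_per_page / baseline.rereads_per_page - 1.0),
    }


# Hypothetical pairing of dimensions with the framework's intervention types.
INTERVENTIONS = {
    "processing_speed": "pacing adjustment",
    "working_memory": "reduced complexity",
}


def respond(loads, threshold=0.5):
    """Layer 3, adaptive response: one intervention per threshold crossing."""
    return [INTERVENTIONS[d] for d, v in loads.items() if v >= threshold]


baseline = Baseline(median_latency_s=4.0, rereads_per_page=1.0)
loads = track(baseline, latency_s=5.0, rereads_per_page=1.8)
print(loads, respond(loads))
```

In this made-up session the learner is a little slower than usual but re-reading far more than usual, so only the working memory dimension crosses its threshold and only one intervention fires.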

The architecture is coherent. The motivation is genuine. There are, however, places where the framework asserts more than it has yet demonstrated.

Where It Gets Difficult

The signal-to-tensor mapping is the first and most persistent problem. The measurement paper maps observable behavioural signals to tensor dimensions — response latency to processing speed load, for instance, re-reading frequency to working memory demand. The logic is intuitive. The practical problem is that every one of those signals is confounded by factors that have nothing to do with cognitive load.

Response latency is confounded by topic interest, ambient distraction, physical interruption, re-reading for pleasure rather than confusion, network lag, and fatigue operating independently of the learning task. There is no principled decomposition method specified in the papers that isolates the load-specific component of a given signal from the noise surrounding it. This is not a minor implementation detail. It is the gap between a measurement architecture and a measurement instrument. A framework that maps signals to dimensions without specifying how to separate signal from confound has described the shape of a solution without yet building one.

The validation strategy compounds this. The papers propose correlating the tensor composite norm against NASA-TLX ratings as a measure of validity. But NASA-TLX is precisely what the framework was designed to supersede — a subjective, retrospective, scalar measure. If the new measure agrees with the old one, what has been demonstrated is consistency with a known-imperfect baseline. If it disagrees, there is no adjudicator. The comparison cannot establish validity; it can only establish correlation with something that is itself not ground truth. Independent validation — dual-task interference paradigms, EEG theta and alpha band ratios, controlled performance degradation studies — is what the framework requires before it can claim to measure what it says it measures.

The static calibration layer relies heavily on n-back tasks for working memory span estimation. This is a significant vulnerability. The neuroscience literature on cognitive training is fairly consistent on this point: n-back performance is highly practice-dependent, and improvements do not reliably transfer to real-world working memory capacity. Building a foundational cognitive profile on n-back performance is building it on a measure that may not generalise to the learning contexts the system is designed to support.

The composite norm formula — the Euclidean distance across all tensor dimensions — treats every dimension as equivalent. High fatigue and high linguistic complexity contribute equally to the aggregate score. But a learner saturated by fatigue requires a different intervention than a learner saturated by linguistic complexity, and both are different again from a learner whose emotional strain is elevated. The Euclidean norm discards that information at the final aggregation step, reproducing a version of the same compression it was designed to avoid. A weighted norm, or a per-dimension threshold architecture with priority ordering, would be more defensible.
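
As a sketch of what those two alternatives might look like, assuming loads on a 0 to 1 scale: the weights, the priority order and the 0.8 thresholds below are invented for illustration and are not proposed in either paper.

```python
import math


def weighted_norm(loads, weights):
    """Euclidean norm with a per-dimension weight on each squared term."""
    return math.sqrt(sum(weights[d] * loads[d] ** 2 for d in loads))


# Hypothetical priority order: which crossing is handled first.
PRIORITY = ("emotional_strain", "fatigue", "working_memory",
            "linguistic_complexity")


def first_crossing(loads, thresholds):
    """Return the highest-priority dimension at or above its threshold."""
    for d in PRIORITY:
        if loads.get(d, 0.0) >= thresholds[d]:
            return d
    return None


tired = {"fatigue": 0.9, "linguistic_complexity": 0.1}
dense = {"fatigue": 0.1, "linguistic_complexity": 0.9}

equal = {"fatigue": 1.0, "linguistic_complexity": 1.0}
fatigue_heavy = {"fatigue": 2.0, "linguistic_complexity": 1.0}
thresholds = dict.fromkeys(PRIORITY, 0.8)

# Unweighted, the two learners are indistinguishable.
print(weighted_norm(tired, equal), weighted_norm(dense, equal))
# Weighted, fatigue counts for more; per-dimension checks name the cause.
print(weighted_norm(tired, fatigue_heavy), weighted_norm(dense, fatigue_heavy))
print(first_crossing(tired, thresholds), first_crossing(dense, thresholds))
```

The weighted norm separates the two learners but still returns a single number. The priority-ordered threshold check is the one that says which dimension to act on, which is why I lean towards it.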

The largest absence in both papers is the adaptive control policy. The framework describes, in considerable detail, what the system measures. It does not specify, with comparable rigour, how measurements translate into interventions. When working memory load crosses its threshold, what happens? By how much does complexity reduce? What is the decision rule when working memory load is high but engagement is also high — when the learner is saturated but still present, challenged but not lost? These are not implementation details to be resolved later. They are the algorithmic core of the system. A measurement instrument without an actuator is a thermometer without a thermostat. The framework names the temperature. It has not yet specified what controls the heat.
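
To show the kind of thing that is missing, here is a deliberately naive decision rule. It is not the framework's policy, because the papers do not specify one. The function name, the proportional step, the `gain` and `max_step` parameters, and the choice to slow the pace rather than simplify the material for an engaged learner are all my own assumptions.

```python
def decide(loads, thresholds, gain=0.5, max_step=0.3):
    """Toy control rule for a working memory threshold crossing.

    Returns (intervention, step), where `step` is the fraction by which
    the chosen lever is turned down: proportional to the overshoot and
    capped at `max_step`.
    """
    overshoot = loads["working_memory"] - thresholds["working_memory"]
    if overshoot < 0:
        return ("none", 0.0)
    step = min(max_step, gain * overshoot)
    if loads["engagement"] >= thresholds["engagement"]:
        # Saturated but still present: keep the material, slow the pace.
        return ("pacing adjustment", step)
    # Saturated and disengaging: simplify the material itself.
    return ("reduced complexity", step)


thresholds = {"working_memory": 0.8, "engagement": 0.6}
print(decide({"working_memory": 0.9, "engagement": 0.8}, thresholds))
print(decide({"working_memory": 0.9, "engagement": 0.3}, thresholds))
print(decide({"working_memory": 0.5, "engagement": 0.8}, thresholds))
```

Every constant in that function is a claim about learners that would need evidence behind it. That is the sense in which the control policy is the algorithmic core rather than a detail.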

These are real problems. They are the problems of a framework at an early stage — coherent in architecture, incomplete in execution, honest about the distance between what it proposes and what it has demonstrated. None of them invalidate the central contribution, which is the dimensional reframing of cognitive load measurement. But they should be stated plainly, because the framework exists in a context where plainness is in short supply.

What the Classroom Already Knew

The idea that instruction should respond to the learner's current cognitive state is not new. It has been understood, in various forms, for a long time.

Vygotsky described it as the Zone of Proximal Development — the space between what a learner can do unaided and what they can do with appropriate support. Effective instruction, on this account, is not the delivery of content at a fixed level of difficulty but the dynamic maintenance of a learner at their productive edge: challenged enough to grow, supported enough not to fail. Dewey built an entire philosophy of education on the premise that learning is inseparable from experience — that the curriculum must meet the learner where they are, not where a syllabus expects them to be. Montessori structured entire learning environments around self-directed pacing, on the observation that children learn most efficiently when they are neither bored by material below their capacity nor overwhelmed by material above it. The Socratic method is, at its core, a real-time responsiveness loop: the teacher reads the learner's state from their responses and adjusts the next question accordingly.

These are not obscure ideas. They are the considered conclusions of thinkers who spent their working lives in proximity to actual learners, watching what happened when instruction did and did not meet the mind it was aimed at. The understanding was there. It did not require a neural network to arrive at.

EdTech arrived and built content libraries with progress bars.

Not universally, and not without exceptions, but as a tendency — as a structural disposition. The learning management system became the dominant paradigm, and the learning management system is, at its core, a distribution mechanism for pre-authored content with a layer of tracking on top. The tracking tells you whether the learner completed the module. It does not tell you whether the learner was cognitively present for it, whether they were saturated or bored or anxious, whether the pacing was calibrated to their working memory capacity or simply imposed by the author's sense of appropriate length. Completion is not comprehension. A progress bar is not personalisation.

The AI layer has not fundamentally changed this. Recommendation engines suggest the next piece of content based on performance signals. Generative systems produce explanations on demand. These are useful. They are not responsive in the sense that Vygotsky meant, or Dewey, or Montessori. They do not maintain the learner at their productive edge because they do not measure the edge. They approximate it, at best, from the outside, from performance proxies that collapse the dimensional structure of cognitive experience into the same scalar convenience that NASA-TLX has offered since the late 1980s.

The tensor framework is interesting not because it is new but because it is attempting to restore what was abandoned. The pedagogical tradition understood the problem. It lacked the instrumentation to solve it at scale. What the framework proposes is that instrumentation — a formal architecture for measuring the cognitive state that the best teachers read intuitively and adjust to in real time, now rendered computable.

The honest account is this: it has not solved the problem yet. The signal-to-tensor mapping is underdetermined. The control policy is unspecified. The validation strategy requires independent grounding. These are not reasons to dismiss the framework; they are the unsolved engineering problems of doing computationally what a skilled teacher does in the room, in the moment, without instruments, because they are present and paying attention.

Most of what is currently being sold as EdTech innovation is not attempting this at all. It is selling the distribution mechanism faster, at lower cost, with better graphics. The margin note from 2021 was asking a different question. Five years later, the question is still worth asking — and still, largely, unanswered.
