Predicting children’s potential: baseline tests

by Terry Wrigley, Visiting Professor, Northumbria University

Yesterday’s post explained how theories that intelligence is genetically determined had restricted the access of working-class children to a quality education. The idea that individuals have a fixed or predetermined ‘intelligence’ or ‘potential’ places a ceiling on their development which particularly affects children who have had fewer educational opportunities in their early life.

Baseline tests at the start of Reception mark a new phase in trying to pin down children’s “potential”. It is well established that social class differences show up in vocabulary and other measures before children even start school. Four-year-olds with less educated parents or less experience with books will usually score lower. Testing at the age of 4 will label less advantaged children as intellectually inferior.

As with theories of genetically transmitted intelligence (see the earlier post), correlations are being used to give the baseline tests some scientific credibility. This depends on being able to demonstrate a close match with each pupil’s achievement at a later stage.

The Department for Education have licensed six different companies to market baseline tests to schools. The Centre for Evaluation and Monitoring (CEM) in Durham are undoubtedly the most experienced of these businesses. They have over 20 years’ experience of baseline assessment. Their claim for their baseline test’s “excellent predictive validity” is based on a “strong correlation” between a child’s initial test score and later academic performance. This is supposedly a reliable way of measuring potential.

CEM’s precise claim is that the Reception baseline test “correlates at 0.68 level with age 11 assessments”. What does this mean in reality?

Correlations are scored like this: a perfect relationship counts as 1 and a random association between two sets of data counts as 0. Textbooks often say 0.7 is a good correlation, but doesn’t that depend on what is being compared with what? In fact, engineers calibrating measuring instruments would be appalled if correlation dropped as low as 0.99 between two instruments that were supposedly measuring the same thing.
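To put a number on this, here is a rough sketch in Python. It relies on a standard textbook result, not on anything CEM has published: if two scores are related in a simple straight-line way, the square of the correlation gives the share of variation in one score that the other accounts for, and the spread left unexplained shrinks only by a factor of the square root of one minus that square. The function names are made up for illustration.

```python
import math


def variance_explained(r: float) -> float:
    """Share of variation in the later score accounted for by the earlier one.

    Assumes a simple linear relationship between the two scores.
    """
    return r * r


def remaining_spread(r: float) -> float:
    """Spread left around the prediction, as a fraction of the original spread.

    Same linear assumption: standard deviation of the prediction errors
    divided by the standard deviation of the later score.
    """
    return math.sqrt(1.0 - r * r)


# 0.68 is the figure CEM quotes; 0.99 is the engineers' benchmark.
for r in (0.68, 0.99):
    print(
        f"correlation {r:.2f}: "
        f"explains {variance_explained(r):.0%} of the variation, "
        f"leaves {remaining_spread(r):.0%} of the original spread"
    )
```

On these assumptions a correlation of 0.68 accounts for a little under half of the variation in later results, and children with the same initial score are still spread out by roughly three quarters as much as the whole age group.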

Fortunately one CEM report illustrates the problem very clearly in the form of a chances table, presented in the form of a fictitious class with a wide range of initial scores. For each initial score, it shows what percentage of children with that score subsequently gain Level 2, 3, 4, 5 or 6 at KS2 maths, for example. This is based on the real outcomes of children who had already taken the test.

It shows that “Samantha” (third lowest) has a 4% chance of a Level 2, a 61% chance of a Level 3, a 33% chance of a 4 and a 3% chance of a 5. By contrast, Rachel (third highest) has an 8% chance of a 3, a 49% chance of a 4, and a 42% chance of a 5.

As you would expect, children with low initial scores are more likely to get low KS2 results than children with high initial scores, and vice versa. That seems obvious. But notice the very wide range of KS2 results linked to each initial score.

Nationally around half of children get a Level 4, so an initial score which links to level 3 and level 5 as well as level 4 represents a very wide spread. The national spread of KS2 results looks something like this:

The problem is even clearer around the middle of the class. 17% of the children with the same initial score as “Bethany” went on to score Level 3 in KS2 SATs, 56% of them scored Level 4 and 27% of them scored Level 5. In fact, most of the initial scores link to KS2 outcomes stretched across well over half the population, and some across three quarters.

Extract from CEM’s chances table, showing a very low scoring, an average, and a very high scoring pupil.
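The percentages quoted in this post for Samantha, Bethany and Rachel can be turned into a simple check: for each child, which level is most likely, and how likely is it that the child ends up at some other level? The sketch below uses only the figures given above. The variable and function names are invented for illustration, and because the published percentages are rounded (they add up to 99 or 101 in two cases) each row is rescaled to 100.

```python
# Percentage chance of each KS2 maths level, as quoted in this post.
chances = {
    "Samantha": {2: 4, 3: 61, 4: 33, 5: 3},
    "Bethany": {3: 17, 4: 56, 5: 27},
    "Rachel": {3: 8, 4: 49, 5: 42},
}


def most_likely_level(row: dict[int, int]) -> int:
    """Return the level with the highest quoted percentage."""
    return max(row, key=row.get)


def chance_of_other_level(row: dict[int, int]) -> float:
    """Percentage chance of ending up at any level other than the most likely one.

    Rescales by the row total, since the quoted figures are rounded.
    """
    total = sum(row.values())
    return 100.0 * (total - row[most_likely_level(row)]) / total


for name, row in chances.items():
    print(
        f"{name}: most likely Level {most_likely_level(row)}, "
        f"chance of a different level {chance_of_other_level(row):.0f}%"
    )
```

Even the single most likely level is wrong for around four in ten children with Samantha's or Bethany's score, and for about half of those with Rachel's.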

Is this really “excellent predictive validity”, or are baseline tests the educational equivalent of a sawn-off shotgun? If you point a sawn-off shotgun towards the east, most of its pellets go east rather than west, but that hardly makes it a refined piece of technology. The baseline tests are no more reliable a predictor of children’s future attainment than a quick glance at the state of their shoes.

In fact, the above chances table compares a test taken at the start of Year 5 with KS2 SATs. The match between a test taken at the start of Reception and KS2 SATs could be even worse.

This is not to say that CEM are incompetent. In fact, they are probably the best show in town. But that doesn’t make the tests fit for purpose. Indeed, if such an expert organisation can produce something so uncertain, the whole Government policy is in shreds.

The big danger is that teachers will believe the tests are accurate and useful. Admittedly some of these companies advise schools to complement them with other data, but the scientific appearance of statistical correlation, along with the marketing claim of “excellent predictive validity”, will convince many overworked teachers that they’ve found the holy grail. They will genuinely believe – not surprisingly – that they now have an accurate scientific measure of each child’s “ability” and “potential”. Furthermore, when they start to teach accordingly or place children in “ability groups”, the baseline assessments of four-year-olds will become a self-fulfilling prophecy.

Thanks to government policy, schools are again being pushed down the old track of determinism which has blighted English schools for over a century. The test scores of thousands of 4-year-olds will severely restrict their future achievement. The least confident four-year-olds, often children growing up in poverty, will be labelled “low ability” and “limited potential” from the start.
