The delusions of baseline testing

This post updates our earlier analysis of baseline testing, following a successful Freedom of Information request. It is now clear that baseline tests are likely to predict a child’s later achievement correctly in only about 4 cases out of every 10.


The idea that baseline testing at the age of 4 is a fair predictor of later achievement is not only delusional; it is extremely dangerous. The scientific aura surrounding such tests will lead teachers to believe they reflect a child’s true ability or potential. Schools and teachers are already expected to provide teaching appropriate to a child’s “ability”, including placing children in “ability groups”. Countless children will be written off as “not very bright” because of their low scores on baseline tests. In many cases, this will be because they are in their first weeks at school, are slower to develop (including many boys), are disadvantaged, don’t speak much English yet, or are simply too young (summer births).

The Department for Education know full well that early assessment is a very poor predictor of school achievement, on the basis of existing Early Years Foundation Stage (EYFS) profiles. Their own research shows clearly that only half of the variation between pupils’ KS1 results can be explained by scores in their EYFS assessments.

Prediction is relatively strong from very low or very high baseline scores, but the vast majority of children diverge widely. For children with an average 5 points out of 9 in early reading in the EYFS profile, 21% went on to obtain level 1, 22% level 2c, 31% level 2b, and 25% levels 2a or 3. In other words, from a single starting point, they divide more or less equally into four different outcome levels.

[Figure: CLL reading scale, RR034]

Predictions will be even poorer when assessing children at the start of Reception rather than the end, as required by the new Baseline Assessment. Since Early Excellence (the most popular provider) will use the same observational methods for its version, but with a cruder method for calculating scores, their results are unlikely to be any better.

Of the three approved providers, CEM are by far the most experienced. In fact, they have been refining their tests for over 20 years. According to a research study by Peter Tymms and colleagues from CEM in 2012, their baseline assessment “is associated with around 50% of the variance in the outcome measures of pupils leaving primary school at age 11 years.” They add that it is “not sufficiently good to identify children with special needs with any degree of surety. The proportion of false positives is around two-thirds.”

Our freedom of information request to CEM has confirmed the extent of the problem. (They add the rider that their figures are “derived from a model” rather than an exact record, but this is the best data they have been able to provide from extended use of the PIPS Baseline test.)

Their data illustrates what claims of “excellent predictive validity”, with correlations of around 0.7, mean in practice. The estimates provided by CEM show that predictions from baseline are likely to be correct about 4 times out of 10. This is based on a data model relating scores at the end of Reception to KS1 results in reading. As with the DfE research (above), prediction is more reliable from very high and very low baseline scores, but poor around the middle range (i.e. the scores of the vast majority of children).
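For readers who want to check what a correlation of around 0.7 implies, here is a minimal sketch. It assumes a simple linear relationship between baseline score and later outcome, which is our simplification and not CEM's model; the function name is our own. Squaring the correlation gives the share of variance accounted for, which is consistent with the "around 50%" figure quoted earlier.

```python
import math


def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by a linear predictor
    with correlation r (hypothetical helper, for illustration only)."""
    return r ** 2


r = 0.7  # correlation of "around 0.7" quoted in the text
explained = variance_explained(r)

# Spread left over after prediction, as a fraction of the original
# standard deviation of the outcome (assumes a linear model).
residual_sd = math.sqrt(1 - explained)

print(f"variance explained: {explained:.2f}")
print(f"residual spread (in outcome SDs): {residual_sd:.2f}")
```

Under these assumptions, roughly half of the variation between pupils is left unexplained, and the scatter around any prediction is still about 70% as wide as the scatter with no prediction at all.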

The wide divergence makes the baseline test next to useless. In reading, for example, starting from the baseline score with the highest chance of Level 2b at KS1, only 32% actually get that level. 16% of children with that baseline score get 1 or below, 21% get 2c, 22% get 2a, and 9% get level 3. Baseline assessments taken at the start of Reception and extending to KS2 are likely to be even less reliable.
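The arithmetic behind the reading figures above can be laid out as a short sketch. The percentages are those quoted in the text; the variable names are our own, and the "best guess" rule (always predict the most likely level) is an assumption used purely for illustration.

```python
# KS1 reading outcomes (percent) for children starting from the baseline
# score that gives the highest chance of Level 2b, as quoted in the text.
outcomes = {
    "1 or below": 16,
    "2c": 21,
    "2b": 32,
    "2a": 22,
    "3": 9,
}

total = sum(outcomes.values())

# Best single guess: the most likely level for this baseline score.
best_guess = max(outcomes, key=outcomes.get)
hit_rate = outcomes[best_guess] / total
miss_rate = 1 - hit_rate

print(f"percentages sum to {total}")
print(f"best guess: level {best_guess}")
print(f"prediction right: {hit_rate:.0%}, wrong: {miss_rate:.0%}")
```

Even when the most likely level is predicted every time, the prediction is wrong for about two children in every three at this baseline score.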

The other providers cannot even provide an estimate of their ability to make reliable predictions. Early Excellence merely points to a vague overall similarity at school level: schools with good KS1 results tend to get good baseline scores and vice versa, but the match at pupil level is, to say the least, approximate.

Example: Baseline and KS1 patterns for a low attaining school


The other provider, NFER, simply points to research papers on early assessment in general, but a close reading of these papers shows that even the most careful assessments as early as Reception or Year 1 are not very good predictors of skills in Year 2.

Our criticism is not of the competence of organisations providing baseline assessment, but of this deeply flawed government policy. Misleading assessment scores are particularly likely in the case of boys (who often develop more slowly), children with English as an additional language or affected by poverty, and the younger children in each school year (those with birthdays in the summer months). This should ring alarm bells for parents: their children are at risk of being inappropriately scored, labelled and predicted, and will be damaged by teaching based on flawed estimates of their “ability” or “potential”.

