UK’s Boast of Enhanced Research Excellence ‘Lacks Credibility’ (Part 2 of 2)

(November 20th, 2015) The UK’s periodic evaluation of its university research has triumphantly reported a doubling of its top-class research results during the previous 6 years, but a reanalysis of the data has found a much lower result.

Wooding et al. said that some of this increase in relative quality might be explained by the increased selectivity of university submissions - there were 9% fewer articles submitted to Panel A in REF 2014 compared to RAE 2008. However, the absolute number of journal articles in the highest percentiles was still higher in REF 2014, with a 10% absolute increase in papers rated in the top 10%. Meanwhile, the citation equivalent of 4-star and 3-star articles had fallen between RAE 2008 and REF 2014. In RAE 2008, as many articles were classified as 4-star as were in the top 4.4% of world outputs; by REF 2014, this threshold had dropped to 7.3% of world outputs.

The 50,000-odd articles submitted to Panel A of REF 2014 represented only a sixth of the UK’s overall research output in medical and life sciences. To get a “complete view” of the change in quality and quantity, Wooding et al. looked at the change in total volume and quality of the UK outputs compared to the rest of the world. RAE 2008 considered research published from 2001-08. During that period, the UK published 8.7% of the world output in Medical and Life Sciences (288,327 journal articles out of a world total of 3,311,114). For the REF 2014 period (2008-14), the UK’s contribution was 7.7% (299,628 UK journal articles vs. world total of 3,870,031). The bibliometric analysis indicated that the UK’s contribution to the top 10% of world papers had increased by 17% during this period.
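The share figures quoted above can be checked with simple arithmetic. The following is only a minimal sketch using the article counts given in the text; the helper names `share` and `growth` are hypothetical and chosen for illustration, not taken from Wooding et al.

```python
# Journal article counts in Medical and Life Sciences, as quoted in the text.
UK_RAE, WORLD_RAE = 288_327, 3_311_114  # period covered by RAE 2008
UK_REF, WORLD_REF = 299_628, 3_870_031  # period covered by REF 2014


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole` (hypothetical helper)."""
    return 100.0 * part / whole


def growth(old: int, new: int) -> float:
    """Return the percentage change from `old` to `new` (hypothetical helper)."""
    return 100.0 * (new - old) / old


# UK share of world output in each assessment period.
print(f"UK share, RAE 2008 period: {share(UK_RAE, WORLD_RAE):.1f}%")
print(f"UK share, REF 2014 period: {share(UK_REF, WORLD_REF):.1f}%")

# Growth in absolute output between the two periods.
print(f"UK output growth:    {growth(UK_RAE, UK_REF):.1f}%")
print(f"World output growth: {growth(WORLD_RAE, WORLD_REF):.1f}%")
```

On these numbers, UK output grew by about 4% between the two periods while the world total grew by about 17%, which is why the UK's share fell even though its absolute output rose.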

Discrepancy causes

Therefore, although Wooding et al. did find bibliometric evidence for some increase in the quality of UK research between the periods covered by RAE 2008 and REF 2014, they noted a “remarkable disparity” between the level of improvement indicated by bibliometric indices (between 10 and 25%, depending on the indicator) and the panel-rated improvement of 103% in “world leading” (4-star) outputs. They proceeded to look for possible explanations.

Three possibilities they considered were:

  1. an increase in research funding
  2. a lower threshold for inclusion as “world class”
  3. a shift in the relative threshold between RAE 2008 and REF 2014

Since 2006, the National Institute for Health Research (NIHR) had increased funding in the area of health sciences by £6 billion. However, Wooding et al. point out that other subjects in Panel A that had received no increased funding from NIHR (e.g. Biological Sciences and Food/Veterinary Science) showed an even higher increase in self-rated world-leading 4-star quality (129% and 165% respectively). Therefore, extra funds did not seem sufficient to account for the boost in quality assessment.

The UK’s share of the world’s research articles in Medical and Life Sciences actually fell from 8.7% to 7.7% between RAE 2008 and REF 2014. This was not due to a decrease in the absolute number of articles published but rather to the faster increase in publications from other countries. Despite this, the UK seems to have “held its own in terms of world-leading outputs”, but none of these findings is compatible with the near “doubling” of world-leading quality reported in REF 2014.

Therefore, Wooding et al. concluded that, at least when compared to bibliometric indicators, the most likely explanation is that the REF 2014 Panel A used a “lower threshold of acceptance” than RAE 2008 when judging research articles as top class (4-star).

And there is a very good reason why this might have happened - the results of these research assessment exercises determine the allocation of very large amounts of money – “£1.6 billion per annum hinges on these results”! Although they may not directly affect a given individual, the REF results have “huge implications” for the relative standing of fields, the research funding of universities, and funding allocated within universities to different research groups. Therefore, it is “understandable” that the REF evaluations “are likely to be influenced” by these external factors - a better rating increases your chances of more funding!

In conclusion, using bibliometric indicators of the papers submitted to REF 2014 and UK medical and life sciences output in general, Wooding et al. did not find any support for the claim in REF 2014 that there had been a doubling in the “world leading” quality of UK life sciences. Instead, they found it “plausible that changes in the financial consequences of the RAE vs. REF exercise may have influenced university submission behaviour or panel judgements.” Unfortunately, the peer-review ratings of individual papers were not made public so they cannot test this “possibility” and confirm their suspicions.

However, they insist that the “discrepancy we have highlighted is of concern” because these REF ratings have lots of “implications” for rankings of UK departments within a research field, inter-field comparisons within the UK, and all kinds of claims regarding the position of UK science in the context of worldwide output.

Ongoing Criticism of Research Assessment Exercises and REFs

Corresponding author Jonathan Grant, who conceived and designed the experiments, is director of the Policy Institute at King’s College London. He told Times Higher Education that the grade inflation they had found in the REF “makes it harder to distinguish between different degrees of ‘excellence’ within the 4-star category” and said that this “potentially means the performance-related funding aspects are undermined and the incentive to grow true quality is diluted. Being in the top 4.4 or 7.3 per cent [by citation impact] is clearly very high performance...but to claim that the amount of ‘world-leading’ research has doubled over that period lacks credibility and validity internationally”.

David Colquhoun summed up other criticisms of the UK’s research assessment system, noting that it takes a lot of time and money, doesn’t seem to find much, and creates distorting pressures within universities: “It cost at least £60 million. At University College London alone, it took 50-75 person-years of work, and the papers that were submitted were assessed by people who often would have no deep knowledge about the field. It was a shocking waste of time and money, and its judgements in the end were much the same as last time.”

He disputed the argument that the REF improved the UK’s science output – “The people who claim this need a course in the critical assessment of evidence. Firstly, there is no reason to think that science has improved in quality in the last 6 years, and secondly any changes that might have occurred are hopelessly confounded with the passage of time, the richest source of false correlations.” Instead, Colquhoun points to extensive evidence that the REF has “harmed science by encouraging the perverse incentives that have done so much to corrupt academia.” The REF, and all the other university rankings produced by journalists, are taken far too seriously by vice-chancellors, and that does active harm. As one academic put it: “This isn’t about science – it’s about bragging rights, or institutional willy-waving!” (‘willy’ is slang for ‘penis’).

To achieve better research scores in REF 2014, there were numerous reports of “bullying” as university managers demanded better “performance” data from their academics. For example, at Queen Mary University of London (QMUL), researchers in biology and medicine were sacked when they failed to meet inflated research metrics (see DC’s Improbable Science, 29/06/12; Lab Times, 4/2012 p.20-25; and Lab Times online, 4/07/12), and the tragic suicide of Professor Stefan Grimm at Imperial College London was directly linked to the threat that he would lose his job if he failed to obtain £200,000 a year in research grants (see DC’s Improbable Science, 1/12/2014).

Colquhoun said that the REF has added to the pressures resulting from the culture of metrics and the mismeasurement of science. “There are now serious worries about lack of reproducibility of published work, waste of money spent on unreliable studies, publication of too many small under-powered studies, bad statistical practice (like ignoring the false discovery rate), and about exaggerated claims by journals, university PR people and authors themselves.”

Instead, he suggests quality-related research (QR) funding could be simply allocated on the basis of the number of people in a department and points to an analysis by Dorothy Bishop, Professor of Developmental Neuropsychology at Oxford University, who has shown that under the present system the amount of QR money received is strongly correlated with the size of the department (correlation coefficient = 0.995 for psychology/neuroscience). “Using metrics produces only a tiny increase in the correlation coefficient for RAE data. It could hardly be any higher than 0.995.” In other words, he concludes “after all the huge amount of time, effort and money that’s been put into assessment of research, every submitted researcher ends up getting much the same amount of money.”

Faced with shrinking budgets for research and higher education, can the UK continue to justify spending so many resources on expensive quality assessment exercises when the final outcome ‘lacks credibility’? Moreover, this has consequences for other countries that have followed the UK’s lead and introduced their own national frameworks for evaluating the research output of universities as a basis for funding decisions. The European Commission’s 2010 report on ‘Assessing Europe’s University-Based Research’ identified 13 other countries – Australia, Belgium, Denmark, Finland, France, Germany, Italy, the Netherlands, New Zealand, Norway, Portugal, Spain, and Sweden.

Jeremy Garwood

Picture: Fotolia/studiostoks

Last Changes: 12.21.2015