In a previous blog, I analysed the strange case of a non-existent paper entitled ‘The Art of Writing a Scientific Paper’, and the small number of papers that used it to support the claim that rutin has health benefits. From the attention that this story of ‘the phantom reference’ received on Twitter, I see two primary reasons why it spread so far and so fast.
The first is that the phantom’s citations make a mockery of citation impact metrics. The idea that a non-existent paper can be cited more than your entire body of work jars with a common but flawed assumption that citations are a measure of quality. If you were worried that your University might not promote you because you haven’t been cited enough, then perhaps you should start worrying that phantoms are about to take your job.
The second is a concern about peer review. How could a phantom reference escape the attention of reviewers? How could scientists be so careless in their referencing? In this case, as Anne-Wil Harzing found, most citations to this phantom were likely due to a simple reference list error, where scientists had used a reference style template that contained this fake reference but had either forgotten to remove it or had taken it as something they ought to cite [1]. My small contribution to this story was that, in 13 papers, this phantom had come to be used to support the claim that rutin has certain health benefits. It was a short chain of papers, but one that indicated how errors could be propagated by copying references from one paper to another without actually reading them [2].
There was also a very interesting discussion about why Google Scholar was recording so many citations to this phantom, with a total citation count wildly discordant with that produced by Web of Science (I checked this today, and found a total of 527 citations on Web of Science and 1,163 on Google Scholar). I don’t think Google Scholar should be used to assess citation impact. But the many errors researchers were finding here were strange and need to be properly investigated, including papers identified by Google as citing this phantom that had done no such thing.
This was a funny example, but there are worrying examples that suggest that reference copying might be common.
Citation and the distortion of meaning
In 2010, Andreas Stang [3] published a critique of the Newcastle–Ottawa scale (NOS) [4], a scale introduced by Wells et al. to assess the quality of non-randomised studies for the purposes of meta-analysis. He came to an unequivocally critical conclusion:
[W]ells et al. provide a quality score that has unknown validity at best, or that includes quality items that are even invalid. The current version appears to be inacceptable for the quality ranking of both case-control studies and cohort studies in meta-analyses. The use of this score in evidence-based reviews and meta-analyses may produce highly arbitrary results (2010 p.6).
Stang (2010) also highlighted that the NOS had not yet been published in the peer-reviewed literature, and that citations to it were directed to a web page hosted by the Ottawa Hospital Research Institute [4].
Eight years later, Stang et al. (2018) [5] published a second paper calling attention to the widespread inaccurate citation of ‘Stang (2010)’. In the intervening years, ‘Stang (2010)’ had become the ‘go to’ reference for scientists seeking something to cite when they used the NOS in their publications, accumulating 1,250 citations by 2016.
To understand this, Stang et al. read systematic reviews that had used the NOS scale, and had cited ‘Stang (2010)’. In 94 of 96 systematic reviews, ‘Stang (2010)’ was cited in a manner that suggested it supported use of the NOS – and only 18 of these systematic reviews provided any other reference in support of its use. Many of the citing papers cited other papers that cited ‘Stang (2010)’, and Stang et al. conjectured that, through chains of (mis)citation, ‘Stang (2010)’ had acquired a meaning that was opposite to its original content.
“…the vast majority of indirect quotations of the commentary have been misleading. It appears that authors of systematic reviews who quote the commentary most likely did not read it” (p.1030).
Stang et al. proposed that the odd publication history of the NOS might have played a role in this case. Wells et al.’s account of the NOS was never published as a peer-reviewed paper, but was hosted on a webpage, while ‘Stang (2010)’ was published in the European Journal of Epidemiology under the neutral title ‘Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses’. As it turns out, this was the first paper to be published with the term “Newcastle-Ottawa” in the title, so authors searching for a paper to cite as the NOS paper might have mistakenly believed that it was the paper introducing the scale. As ‘Stang (2010)’ wasn’t indexed with an abstract, authors couldn’t quickly see what this commentary was arguing, so might have simply assumed that it was supportive.
A (very) brief examination
By October 10 2020, ‘Stang (2010)’ had been cited 5,218 times according to Web of Science [6].
Here, I’m going to examine just the five most recent citing publications, each in a different journal, to see how ‘Stang (2010)’ is being cited.
The most recent is a meta-analysis by Xu et al. [7] in Gene (2020):
“Two investigators independently assessed the quality of the included literature using the Newcastle-Ottawa Scale (NOS) scoring system (Stang, 2010)” (p.2)
Next is a systematic review and meta-analysis by Zeng et al. [8] in Nutritional Neuroscience:
“We used the Newcastle-Ottawa scale (NOS) for methodological quality assessment of cohort and case–control studies [16]” (p.812; 16 is Stang 2010)
Wu et al. [9] published a systematic review and meta-analysis in Infection, Genetics and Evolution:
“The quality of studies was adapted to assess based on the Newcastle–Ottawa Scale (NOS) (Cota et al., 2013). All included studies were rated based on three main domains, including selection, comparability and exposure. The supreme score for each dimensions was 4 points, 2 points and 3 points, respectively (Stang, 2010)” (p.3)
Here, Stang is used to elaborate on the details of the NOS, but his criticism is not mentioned. Cota et al. (2013) [10] is another systematic review that uses the NOS, and this did reference Wells et al.
Next is Dhindsa et al. [11], a systematic review and meta-analysis in Endoscopy International Open:
“The collected data were treated akin to single group cohort studies, therefore, we used the Newcastle-Ottawa scale for cohort studies to assess the quality of studies [21]” (p. e1334; 21 is Stang 2010)
This is followed by Mazza et al.’s [12] systematic review and meta-analysis in Acta Neuropsychiatrica:
“Quality of eligible observational studies was assessed using the Newcastle Ottawa Scale (NOS) for case–control studies (Stang, 2010)” (p. 231)
It seems that papers are still citing ‘Stang (2010)’ inappropriately. The attempt by Stang et al. (2018) to combat this erroneous interpretation appears to have fallen on deaf ears – since 2018, Stang et al.’s analysis of this misquotation has been cited only six times (as of October 12 2020, Web of Science).
Whether or not this kind of referencing error is causing problems in the way scientists are applying NOS is beyond the scope of this blog. However, one does wonder why Stang 2010 is referenced by authors who do not seem to have read it. If they have read it, then why have they omitted the major argument of that paper from their own discussion?
We know reference copying happens in the literature because of the repetition of error, but we do not know how prevalent this practice is [13-14]. If authors copy references and interpretations from other studies without reading the original papers but make no obvious errors, how would anyone know that they had done this? We cannot assume that, just because a paper is highly cited, those who cite it have actually read it.
The point of highlighting this example is to show that second-hand information about a particular study or finding can be distorting. If you don’t read the original and you simply take the interpretation of later writers as an accurate description of it, you are going to quickly run into problems of citation distortion. The number of citations to an article rarely tells us what we think it should. We tend to assume this number reflects some homogeneous influence on the literature, but often this single number conceals a complex reality.
Author: Rhodri Leng | Date: 12/10/2020
1. Harzing, A. (2017) “The mystery of the phantom reference”. Blog post. Stable URL: https://harzing.com/publications/white-papers/the-mystery-of-the-phantom-reference. Accessed: 12.10.2020
2. Leng, RI (2020) “The phantom reference and the propagation of error”. Blog post. Stable URL: https://www.the-matter-of-facts.com/post/the-phantom-reference-and-the-propagation-of-error. Accessed: 12.10.2020
3. Stang A. (2010). Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol. 2010 Sep;25(9):603-5. doi: 10.1007/s10654-010-9491-z. Epub 2010 Jul 22. PMID: 20652370.
4. Wells, GA., Shea, B., O'Connell, D., et al. (Online). The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Hosted on the Ottawa Hospital Research Institute website. Stable URL: http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp. Accessed 12/10/2020
5. Stang A, Jonas S, Poole C. (2018). Case study in major quotation errors: a critical commentary on the Newcastle-Ottawa scale. Eur J Epidemiol. 33(11):1025-1031. doi: 10.1007/s10654-018-0443-3. Epub 2018 Sep 26. PMID: 30259221.
6. Search performed via Web of Science ‘Cited Reference search’ and included three reference variants of Stang 2010.
7. Xu, T., Zhang, S., Qiu, D., et al. (2020). Association between matrix metalloproteinase 9 polymorphisms and breast cancer risk: An updated meta-analysis and trial sequential analysis. Gene. 2020 Oct 30;759:144972. doi: 10.1016/j.gene.2020.144972. Epub 2020 Jul 30. PMID: 32739585.
8. Zeng, Y, Tang, Y, Tang, J, et al. (2020). Association between the different duration of breastfeeding and attention deficit/hyperactivity disorder in children: a systematic review and meta-analysis. Nutr Neurosci. 2020 Oct;23(10):811-823. doi: 10.1080/1028415X.2018.1560905. Epub 2018 Dec 21. PMID: 30577717.
9. Wu BB, Gu DZ, Yu JN. (2020). Association between ABO blood groups and COVID-19 infection, severity and demise: A systematic review and meta-analysis. Infect Genet Evol. 2020 Oct;84:104485. doi: 10.1016/j.meegid.2020.104485. Epub 2020 Jul 30. PMID: 32739464; PMCID: PMC7391292.
10. Cota, GF., de Sousa, MR., Fereguetti TO, et al. (2013). Efficacy of anti-leishmania therapy in visceral leishmaniasis among HIV infected patients: a systematic review with indirect comparison. PLoS Negl Trop Dis. 2013 May 2;7(5):e2195. doi: 10.1371/journal.pntd.0002195. PMID: 23658850; PMCID: PMC3642227.
11. Dhindsa, BS., Saghir, SM., Naga, Y., et al. (2020). Efficacy of transoral outlet reduction in Roux-en-Y gastric bypass patients to promote weight loss: a systematic review and meta-analysis. Endoscopy international open, 8(10), E1332–E1340. https://doi.org/10.1055/a-1214-5822.
12. Mazza, MG, Capellazzi, M, Lucchi S. et al. (2020). Monocyte count in schizophrenia and related disorders: a systematic review and meta-analysis. Acta Neuropsychiatr. 2020 Oct;32(5):229-236. doi: 10.1017/neu.2020.12. Epub 2020 Mar 17. PMID: 32178747.
13. Simkin, MV., Roychowdhury, VP. (2003). Read before you cite! Complex Syst. 14:269–74
14. Ioannidis, JPA. (2018). Massive citations to misleading methods and research tools: Matthew effect, quotation error and citation copying. Eur J Epidemiol. 2018 Nov;33(11):1021-1023. doi: 10.1007/s10654-018-0449-x. Epub 2018 Oct 6. PMID: 30291530.