Web of Science and Scopus are not global databases of knowledge

Both Web of Science and Scopus are critical components of the current research ecosystem, providing the basis for university and global rankings as well as for bibliometric research. However, both platforms are structurally biased against research produced in non-Western countries, non-English language research, and research from the arts, humanities, and social sciences. This viewpoint emphasizes the damage that these systematic inequities inflict upon global knowledge production systems and the need for research funders to unite to form a more globally representative, non-profit, community-controlled infrastructure for the global pool of research knowledge.

VIEWPOINT Tennant JP. Web of Science and Scopus are not global databases of knowledge. European Science Editing 2020;46. DOI: 10.3897/ese.2020.e51987
Acknowledgements
Thank you to Andy Nobes, Bárbara Rivera-López, and Asura Enkhbayar for valuable discussions and input to an earlier version of this article.

There has never been a more pressing need for science to be working to address the major challenges that we face as a global society, as envisaged by the UN Sustainable Development Goals.1 The earth's climate is changing catastrophically and irreversibly; we are in the midst of yet another global pandemic;2 and we are facing resource distribution inequities like never before in the face of a booming global population. These are global challenges that affect us all, and therefore we need to ensure that the science we are using to help address them is truly globally representative.3,4
Clarivate Analytics' Web of Science (WoS) and Elsevier's Scopus platforms are synonymous with data on international research. Both are widely considered by the scholarly community to be the two most trusted or authoritative sources of bibliometric data and form the basis for virtually all peer-reviewed knowledge on research across different disciplines. Figure 1 is a cartogram, developed by Juan Pablo Alperin and Rodrigo Costas, which shows the world scaled in proportion to the number of publishing researchers per country. The publication data are from Scopus, and reveal an alarmingly warped version of reality: research from Africa, South America, and major parts of Asia is almost non-existent.
As the largest of their kind, WoS and Scopus are often hailed as 'global' databases of knowledge and used widely for bibliographic research and academic assessments. This includes the creation of global higher education rankings or their adoption in tenure and promotion guidelines; WoS, for example, features prominently in the UK Research Excellence Framework 2021 and in international league tables, and bibliographic data from Scopus represent more than 36% of assessment criteria in the popular Times Higher Education world university rankings.
Cameron Neylon and others criticize these rankings, demonstrating that the data sources underpinning them are heavily biased and incomplete.5,6 Such bias seems strange, given that the current global scholarly ecosystem depends critically on these sources. Neither of the two databases seems to do a fair, precise, or even reasonable job of being unbiased or globally representative; instead, both platforms seem to discriminate against different forms of knowledge, particularly that which does not hail from the English-speaking Western world.
In general, Scopus is larger and geographically broader than WoS; however, Scopus covers only a fraction of journal publishing outside of Europe and North America. This discrimination is especially visible in the case of Asia. As of August 2017, Scopus reports a coverage of over 2000 journals in the Asia-Pacific region, which it boasts as being '230% more than the nearest competitor'. Now, this might seem impressive, until you actually look at the data. In Indonesia alone, now the top country in the world for Open Access (OA) publishing,7 the national government's Garuda portal currently lists more than 9000 journals published in the country, more than four times what Scopus indexes for the entire Asia-Pacific region. Similarly, in Japan, nearly 3000 journals are currently listed on the national J-Stage platform.
Scopus currently lists 750 journals from Africa and the Middle East, 212% more than the nearest competitor, whereas African Journals Online alone indexes 524 journals; the Directory for Arabian Journals lists 319 journals; the Algerian Scientific Journal Platform indexes 510 journals; and Iraqi Academic Scientific Journals lists 272 OA journals.
In Latin America, we see much the same story. Scopus claims to list about 700 journals from this region, 168% more than the nearest competitor. SciELO, which has been providing open infrastructure for Latin American health science journals since 1997, lists more than 1700 active journals at present. Redalyc also supports more than 550 social sciences journals in Latin America.
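The regional comparisons above come down to simple ratios, which the following minimal Python sketch makes explicit. It uses only the journal counts quoted in the text and treats rounded bounds such as 'over 2000' or 'more than 9000' as point values, so the results are rough estimates. The names `SCOPUS_BY_REGION`, `REGIONAL_PLATFORMS`, and `coverage_ratios` are hypothetical and chosen purely for illustration.

```python
# Journal counts as quoted in the text. Most are rounded bounds
# ("over 2000", "more than 9000", "nearly 3000"), so every ratio is approximate.
SCOPUS_BY_REGION = {
    "Asia-Pacific": 2000,
    "Africa and the Middle East": 750,
    "Latin America": 700,
}

# (platform, Scopus region it is compared with, journals listed)
REGIONAL_PLATFORMS = [
    ("Garuda (Indonesia)", "Asia-Pacific", 9000),
    ("J-Stage (Japan)", "Asia-Pacific", 3000),
    ("African Journals Online", "Africa and the Middle East", 524),
    ("Directory for Arabian Journals", "Africa and the Middle East", 319),
    ("Algerian Scientific Journal Platform", "Africa and the Middle East", 510),
    ("Iraqi Academic Scientific Journals", "Africa and the Middle East", 272),
    ("SciELO", "Latin America", 1700),
    ("Redalyc", "Latin America", 550),
]


def coverage_ratios():
    """Return {platform: platform count / Scopus count for its region}."""
    return {
        name: round(count / SCOPUS_BY_REGION[region], 2)
        for name, region, count in REGIONAL_PLATFORMS
    }


if __name__ == "__main__":
    for name, ratio in coverage_ratios().items():
        print(f"{name}: {ratio:.2f} times the Scopus regional count")
```

A ratio above 1 means that a single national or regional platform lists more journals than Scopus does for the whole region. The platform counts are deliberately not summed, because the same journal may appear on more than one platform.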

In terms of language preference, we see much the same pattern. As Ryan Regier stated in 2018: "For example, Scopus indexes more than 23,000 currently-published scholarly journals and only about 5,000 of these publish in languages other than English. (You can find this data in the Scopus Title List).
"Compare that with the Directory of Open Access Journals (DOAJ), who does a much better job with outreach in non-English countries, of their 11,000 journals, 6,000 publish in languages other than English. They have less than half the journals Scopus does and they still have more non-English Journals!"
A more recent study from 2019 confirmed the extent of this bias: "English dominates both WoS and Scopus (92.64% of the documents indexed in Scopus are in English and this percentage is even higher in the WoS with 95.37% compared to the second language with the highest number of documents in Scopus, Chinese, with 2.76% and the second language in WoS, Spanish, with 1.26%)."8 Given that most nations on this planet do not speak English as a first language, unless imposed by colonialism, this clearly represents a hegemonic linguistic practice. This has consequences. First, non-native English-speaking authors may be required to spend part of their research budgets, as well as a significant amount of time and effort, on bringing their work to an 'acceptable' standard of English, a problem exacerbated by the exploitative publish-or-perish system. Although there are some clear advantages in homogenizing the language of science, such a practice simultaneously erodes the social and cultural contexts inherent in non-English languages.
Clarivate Analytics have taken some positive steps to broaden the scope of WoS, by integrating the SciELO Citation Index and by creating the Emerging Sources Citation Index (ESCI), which has allowed the inclusion of many more international titles into the service. Given the rich history of knowledge production in Latin America, the fact that ESCI labels these titles 'emerging sources' seems to display a clear prejudice: they are emergent only in the sense that WoS did not index them earlier, not because the research did not exist or was not well established.
Together, these numbers highlight quite a mismatch between the industry leaders and the global reality, with a number of structural geographic discrepancies. In other parts of the world, this mismatch seems to be less of a problem: Scopus lists 11,000 journals from western Europe and 6000 from North America. Other studies have also shown that these services tend to under-represent research from the arts, humanities, and social sciences.9 Combine this with the linguistic and geographic biases inherent in either database, and it seems difficult to argue that this is a quality issue.
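To put the mismatch in proportion, the quoted figures can be turned into shares. The sketch below is a rough illustration only: it assumes the five regional Scopus counts quoted in this article can be added together as an approximate total, which the source does not state, and it uses Regier's rounded journal counts for the language comparison. The names `SCOPUS_BY_REGION`, `western_share`, and `non_english_share` are hypothetical.

```python
# Regional Scopus journal counts as quoted in this article (rounded figures).
# Assumption: they can be summed as an approximate total; the article does not say so.
SCOPUS_BY_REGION = {
    "Western Europe": 11000,
    "North America": 6000,
    "Asia-Pacific": 2000,
    "Africa and the Middle East": 750,
    "Latin America": 700,
}


def western_share(counts):
    """Fraction of the listed journals from western Europe and North America."""
    total = sum(counts.values())
    return (counts["Western Europe"] + counts["North America"]) / total


def non_english_share(total_journals, non_english_journals):
    """Fraction of journals publishing in languages other than English."""
    return non_english_journals / total_journals


if __name__ == "__main__":
    print(f"Western Europe + North America: {western_share(SCOPUS_BY_REGION):.1%}")
    # Journal counts quoted from Regier (2018).
    print(f"Scopus, non-English journals: {non_english_share(23000, 5000):.1%}")
    print(f"DOAJ, non-English journals: {non_english_share(11000, 6000):.1%}")
```

Under these assumptions, western Europe and North America account for roughly five in six of the regionally attributed Scopus journals, and the non-English share of DOAJ is about two and a half times that of Scopus.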
One of the reasons for this skew might simply be that journals from the excluded regions are of varying or lower 'quality'. However, it is well known that legitimate publishing activities from the 'global south', such as Academic Journals, are subject to ethnocentric prejudice. Much of this perspective was created thanks to the infamous, and now largely academically discredited, 'predatory publishing' list.10 Nevertheless, this remains a pernicious issue at the heart of the global scholarly communication system. First of all, defining the quality of both research and journals is beset with difficulties. A recent consensus definition on what constitutes a 'predatory publisher' did not seek to explicitly delimit legitimate and illegitimate publishing activities, but did highlight the complexity of this situation;11 by these standards, even the largest publishers such as Elsevier could be classified as predatory.
Irrespective of these problems, the selection criteria for journals and the power imbalance they impose upon the global research community should not be left to commercial third parties such as WoS or Scopus. It should be the research community, including learned societies and institutes, that makes quality assessments and defines the standards for journals and scholarly communication. WoS and Scopus are both commercial and for-profit services that, irrespective of their methods, have a fiduciary duty and accountability to their shareholders and investors, not a duty to science or to the public. The reality is that the global research community has outsourced the critical functions of acting as custodians for our scholarly ecosystem to a handful of private companies. Moreover, these are organizations with an incredible track record of harm to the scholarly community.12 Both are shining beacons in the world of 'platform capitalism' with business models that not only predate but also outperform those of Google and Facebook in terms of profit.13
It is perhaps no wonder that these databases are so biased towards research from western Europe and North America. WoS is owned by Clarivate Analytics, based in London, and Scopus by Elsevier, based primarily in Amsterdam and London. Their geographic location is not a coincidence here: they are two of the centres of historical Western colonialism, the aftershocks of which are still being widely felt.
Widespread use of both services continues to reinforce a Western hegemony in global scientific endeavours. The result of this is suppression of global innovation through reducing epistemic diversity in participation in the research process and relegating specific forms of knowledge to the 'periphery' as a form of cognitive injustice. Exclusion of non-Western knowledge from these databases dictates what we read, what we value, and what we build upon. This in turn discriminates against the contexts of knowledge generation in those places, including invaluable cultural perspectives. For example, in Latin America, there is an incentive policy in higher education that aims to foster increasing internationalization. This is understood as publishing in WoS- and Scopus-indexed journals. A consequence of this is that some researchers no longer focus on 'local' topics because they would not get published in 'international' journals. Virtually all journals publish important work at different levels and are relevant to different elements of society. That this work is systematically suppressed for specific forms of knowledge by WoS and Scopus should be a matter for concern. I do not believe that either entity should be in control of defining what research is deemed significant or not significant, given that they intrinsically have little scientific interest in this matter.
This trend will continue as both Clarivate Analytics and Elsevier continue to extend their control over critical elements of scholarly infrastructure.14 The fact that Elsevier is also the leading publisher of scholarly content is not a coincidence, and amounts to one of the most significant conflicts of interest in the world.15 The very fabric of our knowledge society is in the ongoing process of being handed over to for-profit enterprises.
Knowledge infrastructure, academic cultures, and research practices have become subject to the maximization of profits in the interests of a few and at the expense of everyone else.
A simple first step towards resolving this problem is for researchers to stop using either platform, and for users and institutes to stop subscribing to them. By deconstructing or reducing the power of existing faulty elements of the scholarly ecosystem, funds and energy can be liberated and put to use in investing in a more open, non-profit, community-owned global scholarly communication infrastructure, which provides more efficient, effective, and representative information on the global knowledge landscape. This born-digital infrastructure should be truly equitable, comprehensive, and multilingual, facilitating fair participation in knowledge creation.
Decolonizing scholarly communication is not simple, and the primary focus must be on creating inclusive digital infrastructure that does not replicate the hegemony inherent in the present systems. This requires synchrony in both reducing the power status of those existing systems and simultaneously amplifying other, previously marginalized voices so that they can occupy such a space. International research funding bodies have a key responsibility here to unite to help achieve this in the context of the UN Sustainable Development Goals. This promises to address the geopolitical impact that existing systematic discrimination has on knowledge production16 and to further the inclusion and representation of marginalized research demographics within the global research landscape.