A new study has found that Google Scholar renders research written in languages other than English practically invisible. No matter how relevant and applicable the research, the study states it gets buried in favor of research written in English, making it largely inaccessible. This is detrimental to researchers from all over the non-English-speaking world, study authors state, reinforcing the belief that scientific literature in non-English languages is lacking, or non-existent. In reality, it is the algorithm that is hostile to these languages, making such research undetectable.
The research, published in Future Internet, examined the connection between the language of a document and the way Google Scholar’s algorithm sorts search results. The study was conducted by members of the department of communication at Pompeu Fabra University, Barcelona. Their results show that when a search is performed on Google Scholar using a word such as ‘festival’, which is spelled the same in English, Spanish and Portuguese, 90% of results in languages other than English are systematically assigned to positions that make them invisible, even if all of them have the same number of citations.
Language bias in academia is something researchers have faced for a long time. Not only are the vast majority of scientific papers published in English, but the ‘correctness’ of the English used in them is also a factor that determines their acceptance into top journals, which, incidentally, also publish exclusively in English. This discourages non-English-speaking researchers, as it prioritizes the purity of the language over the content of their research. It also puts them at a disadvantage professionally, as they’re robbed of the most common, popular platforms that can further their research and careers. In addition, the overwhelming reliance on, and preference for, English favors only research that looks at the world in specific predisposed ways brought on by the use of English, such as the tendency to describe indigenous knowledge as ‘folklore’ rather than something that could have factual validity. It discounts anything that digresses from the norm, even when the information might be highly relevant and important. In medical research, for example, clinical trials that are reported in languages other than English are often excluded from meta-analyses, despite their legitimacy.
Online, the visibility of scientific articles and conference papers depends on how easily they can be found in academic search engines, especially Google Scholar, which is the largest academic research search engine in the world. To enhance visibility, search engine optimization (SEO) is applied to research papers so that they rank higher, based on their relevance, in search results.
To prevent fraudulent practices, Google Scholar does not explain the algorithm it uses. “… we need to further our understanding of Google Scholar’s relevance ranking algorithm, so that, based on this knowledge, we can highlight or improve those characteristics that academic documents already present and which are taken into account by the algorithm,” Rovira, the first author of the study, said in a statement.
On Google Scholar, the number of citations any piece of research has is an indicator of the value, recognition, and importance of the published results. But the study shows that documents that weren’t in English were almost always ranked beyond position 900, meaning they were buried under a very large number of documents that preceded them. This is despite the fact that they had hundreds of citations, a testament to their quality. While highly cited English research and articles were positioned in the first few ranks of Google Scholar search results, documents in languages other than English with the same number of citations ranked far lower, positioned even beneath English-language articles that had fewer citations and were therefore, by that measure, less relevant.
The finding has important repercussions, especially for articles that use keywords that are the same in languages around the world: trademarks, chemical compounds, acronyms, diseases, with Covid-19 being the most recent example.
“It is more than evident that until this bias is addressed, the chances of being ranked in a multilingual Google Scholar search increase remarkably if the researchers opt for publication in English,” added the authors.