A screenshot of the semanticscholar.org website home page is shown on a laptop screen.

A milestone for Semantic Scholar


As the AI-powered search engine for academic studies turns six this month, it also hits a giant record — 200 million papers in its archives

Even the most diligent scientists need a quick primer on the latest research. Which is why Semantic Scholar, the AI-powered platform for academic papers, can come in handy when you want to know the latest studies on, say, Covid-19 or Russian troll accounts. And this month, the rapidly evolving search engine turns six, while also hitting another milestone: uploading 200 million papers to its archives. “Semantic Scholar is a poster child for AI2’s mission: AI for the Common Good,” says Oren Etzioni, CEO of the Allen Institute for AI, which created the project. “When we launched it, we had no idea that it would serve upwards of 8 million users per month just a few years later.”

What began in 2015 as a database of some 3 million computer science papers has since grown into much more. Along with adding neuroscience papers, then biomedicine, then all fields of science in 2019, the platform last year launched the CORD-19 dataset and accompanying paper: a comprehensive collection of more than 840,000 metadata entries, including more than 300,000 full-text Covid-19-related papers, that is available to anyone, thus facilitating further research on the pandemic. To date this collection, the largest of its kind (with some articles on coronaviruses that would otherwise languish behind paywalls, and others dating back to the 1950s), has been downloaded more than 200,000 times and has become the basis of the most popular Kaggle competition ever.

Summary page on semanticscholar.org for the CORD-19 dataset paper

Semantic Scholar uses the latest machine learning and natural language processing techniques to automatically “read” emerging papers, analyze their content, and extract their contributions and limitations, saving researchers and reporters untold hours poring over text. Acting as a researcher’s Spotify, the platform also recommends to each scientist papers it thinks they’ll find interesting, then improves its matching abilities based on the researcher’s actions. Recently, the platform introduced a new feature that automatically summarizes each paper in its archive, creating a one-sentence “TLDR” summary to answer the time-sucking question vexing every researcher: to read, or not to read, a potentially relevant paper?
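The TLDR summaries described above can also be retrieved programmatically. A minimal sketch of fetching one through Semantic Scholar's public Graph API follows; note that the endpoint path, the `tldr` field name, and the example paper ID are assumptions drawn from the current public API documentation, not details given in this article.

```python
# Sketch: retrieving a paper's auto-generated TLDR summary via
# Semantic Scholar's public Graph API (endpoint and field names
# assumed from the public docs, not stated in this article).
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def tldr_request_url(paper_id: str) -> str:
    """Build the Graph API URL requesting a paper's title and TLDR."""
    return f"{API_BASE}/{paper_id}?fields=title,tldr"


def fetch_tldr(paper_id: str) -> Optional[str]:
    """Return the one-sentence TLDR for a paper, or None if it has none."""
    with urllib.request.urlopen(tldr_request_url(paper_id)) as resp:
        data = json.load(resp)
    tldr = data.get("tldr")
    return tldr["text"] if tldr else None


if __name__ == "__main__":
    # Hypothetical example ID; any Semantic Scholar paper ID,
    # DOI, or arXiv ID would go here.
    print(fetch_tldr("arXiv:2004.10706"))
```

The request stays a plain HTTP GET with no authentication for light use, which is why a standard-library sketch suffices here.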

NLP search results

“With challenges ranging from global pandemics to climate change, speeding scientific advances offer our best hope at a solution,” says Dan Weld, General Manager and Chief Scientist of Semantic Scholar and Professor Emeritus at the University of Washington. “But scientists today are overwhelmed by the exponential growth in publications, which doubles every few years. It’s no longer humanly possible to keep up with the latest advances in one’s field.”

AI for the common good, indeed.