Data Mining Research Papers
Malamud has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Now, the 60-year-old American technologist is setting his sights on a new objective: freeing paywalled scientific literature. Over the past year, Malamud has, without asking publishers, teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi.

“This is not every journal article ever written, but it’s a lot,” Malamud says.

It’s comparable to the size of the core collection in the Web of Science database, for instance.

Fourth, by reviewing papers, the reviewer can feel that he is helping the research community. Personally, I receive a lot of requests from journals to do reviews. But now I decline many of them, because otherwise it would take too much of my time.

At first, when I was a Ph.D. student, I accepted all of them. Now I usually review only papers that are related to my field and that come from the top journals and conferences.

Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead.

Malamud and Lynn have held workshops at Indian government laboratories and universities to explain the idea. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text.

"Our position is that what we are doing is perfectly legal," he says. For the moment, he is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet.

From the perspective of reviewers, reviewing papers is also important. First, it means that the reviewer is recognized as having enough expertise to review papers. For example, if you are invited to review papers for some famous journals, you can mention it in your CV and on your website, as it shows that those journals trust you to do reviews. If I receive an offer to review papers that are unrelated to what I am doing, or from journals or conferences that I have never heard of, I will decline the invitation. One exception: if a friend asks me to review for his journal, I will usually say yes even if the paper is not closely related to what I am doing. In effect, you can consider the job of a reviewer as free work, since reviewers are usually not paid.


