September 2010

SCIENCE PUBLISHING:

Uncovering Plagiarism in Biomedical Research

Like other professionals, scientists sometimes engage in unethical research conduct. Many technical medical editors are indifferent to plagiarism; non-plagiaristic fraud is relatively common in biomedical and clinical research; scientists often do not comply with the data sharing requirements of open-access journals.

I don't say this to tar scientists as a group. It's nevertheless important to uncover unethical behavior before it affects other scientists' research. Ideally it would be prevented before it starts, since investigating even a single case of scientific misconduct is very expensive.

Harold Garner (Virginia Bioinformatics Institute, United States) and coworkers have addressed this need. They used the text-similarity software eTBLAST to uncover plagiarism in the PubMed Central database.

Searching the technical literature with eTBLAST.

eTBLAST was originally developed as a research tool to search the biomedical technical literature by topic. It has recently been used to uncover plagiarism, i.e. copied text and the submission of largely duplicate manuscripts.

PubMed Central is a database of publicly-funded technical research articles in the life science and biomedical research fields. Since entire articles are archived in this database, eTBLAST can be used to search entire articles, as opposed to simply abstracts, which makes it more useful for uncovering plagiarism.

The scientists used eTBLAST to quantitatively compare the text of one article against all others in the database. Over 72,000 articles were searched at three levels: the full text, individual sections (e.g. introduction, discussion), and individual paragraphs.

A little over 2 seconds was required to search 200 words of text against all articles in the database. Each pair of similar articles was then classified according to whether or not the two articles shared an author.
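To make the idea of comparing one text against a whole collection concrete, here is a minimal Python sketch. It is not eTBLAST's actual algorithm, which this post does not describe; it swaps in a plain bag-of-words cosine similarity instead. The function names, the 0.5 threshold, and the blank-line paragraph splitting are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two texts (0.0 to 1.0)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_paragraphs(article):
    """Split an article into paragraphs on blank lines (an assumed convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]


def most_similar(query, corpus, threshold=0.5):
    """Compare one text against every text in corpus (a dict of id -> text).

    Returns (id, score) pairs at or above the hypothetical threshold,
    highest score first.
    """
    hits = [(doc_id, cosine_similarity(query, text)) for doc_id, text in corpus.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


def shares_author(authors_a, authors_b):
    """True if the two author lists have at least one name in common."""
    return bool(set(authors_a) & set(authors_b))
```

The same `most_similar` call works at any of the three levels mentioned above: pass in full texts, single sections, or the paragraphs returned by `split_paragraphs`.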

Plagiarism revealed.

The scientists found 150 article pairs with substantially similar abstracts and complete text, 282 article pairs with substantially similar complete text but not abstract similarity, and 598 article pairs with substantially similar abstracts but not complete text similarity. This indicates a plagiarism rate of around 1%.
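As a rough check on that figure, the three groups of similar pairs can be added up and divided by the number of articles searched. This is only back-of-the-envelope arithmetic: the denominator of 72,000 is an assumption taken from the "over 72,000 articles" mentioned earlier, and the original paper may compute its rate differently.

```python
# Counts of similar article pairs, as quoted in the post.
similar_pairs = {
    "abstract_and_full_text": 150,
    "full_text_only": 282,
    "abstract_only": 598,
}

# Assumption: the post says "over 72,000 articles", so this is a lower bound.
articles_searched = 72_000

total_pairs = sum(similar_pairs.values())
rate = total_pairs / articles_searched

print(f"{total_pairs} similar pairs, about {rate:.1%} of the articles searched")
```

Under these assumptions the total is 1,030 pairs, or roughly 1.4%, which is in line with the "around 1%" quoted above.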

Manuscripts which shared at least one author were 2.31 times as likely to share an introduction, and 1.83 times as likely to share a methods section, as those which did not share any authors. A shared results section didn't lean in either direction.

The methods section was the most likely to be plagiarized. However, similarity between two results sections is the most conclusive indicator of plagiarism.

Further, 262 pairs of similar reviews which shared at least one author were uncovered (out of over 5400 total reviews), a substantially higher rate of plagiarism. Fifty-four percent of them were published in the same year, and 68% were even published in the same journal.

Overall evaluation.

The scientists note that of the similar manuscript pairs they manually examined after the eTBLAST analysis, none would be considered unethical by a typical scientist, since they were multipart submissions, updates, and similar kinds of duplication. I respectfully counter that scientists should not break up their articles into smaller bits simply to increase their publication output.

I personally feel that scientists should resist the indirect demands of funding agencies and promotion committees, which tend to consider total output at the expense of scientific rigor, novelty, utility, and quality. This is easily stated, and is a commonly expressed opinion, but unfortunately it is a huge problem that ultimately requires a broad change in mindset within the scientific community.

eTBLAST is a very useful method for uncovering plagiarism in biological research. Reviews appear to be rather excessively plagiarized.

Searching articles by introduction or methods section yields many plagiarized articles, but comparing results sections yields the most conclusive evidence of plagiarism. The use of eTBLAST should be extended to uncover plagiarism in math and the physical sciences.

NOTE: The scientists' research was funded by the Hudson Foundation, the National Institutes of Health, and the National Library of Medicine.

For more information:
Sun, Z., Errami, M., Long, T., Renard, C., Choradia, N., & Garner, H. (2010). Systematic Characterizations of Text Similarity in Full Text Biomedical Publications. PLoS ONE, 5(9). DOI: 10.1371/journal.pone.0012704