GenBank can be trusted, study shows: Comparisons of 4.7 million mtDNA sequences show GenBank is reliable for animal IDs
Did a murderer walk through the room? Did a shark just swim by? Is this a poisonous mushroom? Which reef species are lost when the coral dies? These questions can potentially be answered quickly and cheaply based on tiny samples of DNA found in the environment. But identifying DNA requires a trustworthy library of previously identified DNA sequences for comparison. Smithsonian scientists and their colleagues analyzed more than 4.7 million animal DNA sequences from GenBank, the most commonly used tool for this purpose, and discovered that animal identification errors are surprisingly rare -- and sometimes quite funny.
"We wanted to use GenBank to identify DNA from ocean water samples as we evaluate the health of coral reefs and other marine ecosystems, but we were concerned by reports questioning the accuracy of the data there," said Matthieu Leray, post-doctoral fellow at the Smithsonian Tropical Research Institute (STRI). "In our sequence comparisons, we found fewer errors than people had predicted, which is very good news, because monitoring programs and conservation efforts increasingly rely on analysis of environmental DNA."
The reliability of data in GenBank has been questioned in the past. GenBank is the virtual library where geneticists deposit DNA sequences from all living creatures, maintained by the U.S. National Center for Biotechnology Information at the National Institutes of Health. An article entitled "Can You Bank on GenBank?", published in Trends in Ecology and Evolution in 2003, referred to studies showing that half of human mitochondrial DNA sequences contained errors, and that there were significant differences in sequences deposited for fruit flies. Another article reported that 12 of 51 species of the highly poisonous fungal genus Amanita were misidentified.
"We assumed that we would find lots of errors when we started the study," said Nancy Knowlton, scientist emeritus at STRI and at the Smithsonian's National Museum of Natural History.
"Some people think that GenBank is just a data dump," said Leray. "No one checks to see if the data are entered correctly. Researchers just upload their sequence data and they don't have to deposit a specimen anywhere in particular, so if there is a question, there may be no way to go back to the source to find out if a sequence is correct. We needed to be sure that GenBank was a good tool to use to identify marine organisms in our samples, so we decided to find out."
With colleagues from Academia Sinica and the George Washington University, Leray and Knowlton estimated the proportion of sequences with incorrect genus, family, order, class and phylum names. Overall, fewer than 1 percent of the sequences were mislabeled. They identified certain groups of animals that are particularly problematic and some of the potential sources of error, such as mislabeling and contamination from humans, rodents, lab animals, food, mosquitoes and pets like dogs and cats.
"For example, when you enter sequence data, at some point there is a drop-down menu giving choices of different species," Leray said. "Some people evidently just clicked on the wrong species, the one above or below the name of the species they were trying to enter. This part of the process could be fixed to lower the error rate even further."
Direct DNA identification is a fast, low-cost way to answer many questions about the environment, and GenBank is a reliable tool to use to identify the source of the DNA. The authors concluded: "Our encouraging results suggest that the rapid uptake of DNA-based approaches is supported by a bioinformatic infrastructure capable of assessing both the losses to biodiversity caused by global change and the effectiveness of conservation efforts aimed at slowing or reversing these losses."