As DNA tests for ancestry explode in popularity, a fundamental problem remains: The tests deliver more detailed results for people of European descent, as evidenced by the ethnicities and data that major DNA testing companies represent. While this bias should, theoretically, abate as more people take the tests and add their DNA data to the mix, the companies have some work to do before their kits can work reasonably well on a worldwide population.
In 2017, more people took DNA tests than in all the previous years combined, according to the MIT Technology Review, and that number keeps climbing. According to the International Society of Genetic Genealogy (ISOGG), more than 18 million people have tested their DNA to learn about their ethnic identity or to find relatives. DNA testing companies like AncestryDNA and 23andMe have become household names as a result, while new tests claiming more specialized results crop up every few years.
It’s easy to see the appeal. For $99, 23andMe and AncestryDNA simply require that you spit in a cup, send it off to a lab for testing, and then wait a matter of weeks to learn the ethnic breakdown of your genes by region. (See our comparison of these two popular kits.)
The data problem
The risk for bias in DNA tests starts with the databases used by the companies. AncestryDNA, for instance, bases the ethnicity estimate in its test upon a reference panel sourced from the DNA of 16,638 people representing 43 different populations. The people in the reference panel are screened to ensure they represent a certain ethnicity strongly—“people with a long family history in one place or within one group,” the company explains. The screening involves controls, such as removing close relatives, to avoid skewing the ethnicity profile.
While this pre-screened data can identify ethnicity on a broad level, more detail comes only with more data. Every DNA test kit sent in adds to the company’s database. That’s why leading contenders AncestryDNA and 23andMe have some of the best estimates available—they have more customers, and therefore more data.
Because DNA tests like AncestryDNA and 23andMe were at first available only in the United States, however, and have expanded mostly to European countries or former European colonies, the customer base continues to be fairly homogeneous. ISOGG estimates that four-fifths of the people who have taken DNA tests are U.S. citizens, meaning their data reflects a population with majority European ancestry.
Funding challenges and poor infrastructure make it more difficult to gather genetic data on underrepresented DNA groups like Africans, Asians, and indigenous peoples. “Right now, it’s not possible to infer the exact sources of ancestry of African Americans,” Sarah Tishkoff, a professor at the University of Pennsylvania who has studied African genomics for 18 years, told PCWorld, “and it would be unfortunate if they have the expectation that they will be able to get that information.”
Tishkoff said that gathering a more diverse set of DNA data brings its own challenges, both financial and ethical. “There needs to be better funding and resources for generating that data. It’s also important to do the research in an ethical manner. I personally think there should be caution about using information from indigenous populations for commercial purposes such as ancestry testing.”
Regional representation: A breakdown
Given how the data for these DNA tests is gathered, the ethnicity breakdowns among the tests are no surprise. All the major test companies’ data skews toward people of European descent.
AncestryDNA is the most popular DNA test in the world, having sampled more than 10 million people. Yet 296 of the 392 ethnic regions it represents are for people of European heritage. That’s just over three-fourths of its regions.
23andMe, the world’s second-most popular DNA test, became more representative of non-European ethnicities earlier this year after it added regions for Asia and Africa. The company has tested the DNA of more than 5 million people. Of the 171 ethnicities represented in its Ancestry Composition panel, 52, or about 30 percent, are European.
What’s more, half of the DNA reference samples 23andMe uses to test a customer’s genes and estimate ethnicity come from Europeans, suggesting it’s better at evaluating people of European descent.
AncestryDNA also has a disproportionately high number of reference samples from people of European heritage. Of the 16,638 samples AncestryDNA uses, more than 65 percent come from people of European ancestry.
Even though Africa is geographically larger than Europe, China, and the U.S. combined, AncestryDNA offers only 33 ethnic regions for people of African descent, while 23andMe has 34 regions. Compare that to the 296 regions AncestryDNA offers for people of European descent, and 23andMe’s 52 regions.
In the case of AncestryDNA, many of these regions include European migrations into America. AncestryDNA’s Europe category lists 173 ethnic regions for European settlements in America. The test does something similar for African Americans, but only 24 of the 33 regions in its Africa category track the lineage of Africans forced into slavery.
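The shares quoted in this section can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the region counts reported above; the names REGION_COUNTS, share, and regional_shares are made up for illustration and do not come from either company.

```python
# Region counts as quoted in this article (not independently verified).
REGION_COUNTS = {
    "AncestryDNA": {"total": 392, "European": 296, "African": 33},
    "23andMe": {"total": 171, "European": 52, "African": 34},
}


def share(part, total):
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)


def regional_shares(counts):
    """Map each company to the percentage share of each named region group."""
    return {
        company: {
            group: share(n, data["total"])
            for group, n in data.items()
            if group != "total"  # skip the denominator itself
        }
        for company, data in counts.items()
    }


if __name__ == "__main__":
    for company, shares in regional_shares(REGION_COUNTS).items():
        for group, pct in shares.items():
            print(f"{company}: {pct}% of regions are {group}")
```

By these counts, European regions make up about 75.5 percent of AncestryDNA's list and about 30.4 percent of 23andMe's, while African regions account for roughly 8.4 percent and 19.9 percent, respectively.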
How DNA testers are diversifying their data
When asked why DNA tests are less detailed for non-European people, an AncestryDNA spokesperson told PCWorld that the company plans for its test to include more than 500 regions by early 2019, with a particular focus on African American and Hispanic communities. To improve its test, AncestryDNA is gathering more DNA reference samples from around the world, updating its algorithms, and adding and updating the genetic markers of diverse global populations.
“Our company’s history is one of continued evolution and progress and our platform is constantly improving as more and more people participate through AncestryDNA and build family trees,” the spokesperson said.
When 23andMe first offered its ethnicity estimate in 2008, the company included only three regions. Now, it represents 171.
The rapid growth is a testament to how algorithms and big data can quickly improve genetic science. But there’s still more to be done.
To better serve underrepresented DNA groups, 23andMe launched the Global Genetics Project in February of this year to gather more genetic data. If you have a grandparent from one of 59 underrepresented countries, 23andMe provides you with a free test and access to its more than 90 genetic reports.
Joanna Mountain, senior director of research at 23andMe, told PCWorld in an interview that the Global Genetics Project has, in less than a year, already exceeded its original two-year goal of collecting 5,000 samples.
“We really have captured the genetic diversity of the world in a way that I would never have imagined 20 years ago,” Mountain said.
Mountain said 23andMe is also collaborating with researchers and academics to gather more data and better educate the world about genetic science.
“Many people in this country and beyond have very little understanding of genetics and concerns about privacy,” Mountain said. “So there is a lot of education to be done.”
Mountain said 23andMe noticed early on that there was a bias in its reference sample data because it had more U.S. customers. “We have more representatives of Italy than we have of Devon, [South Africa], for instance, which is not surprising given our customer base.”
But she said that doesn’t always mean 23andMe is less detailed for people of non-European descent. Someone from Mexico could learn about both their indigenous and Spanish ancestry, for example.
“It varies so much from person to person depending on your family’s history,” Mountain said. “You could at a very crude level say that Europeans might get a bit more detail, but that’s going to be very much variable.”
The good news is that 23andMe and AncestryDNA are regularly updating their models to improve the accuracy and detail of their tests.
“We are going to be looking where people get less detail and working to fill those gaps and to provide more detail to as many people as we can,” 23andMe’s Mountain said. “So that’s going to be something we continue to push on in the next five years.”