Back in my childhood when I first got into science fiction, I read all those classics, from early romantic science fiction like Jules Verne into the “golden age” of Asimov, Clarke and beyond. Ah, all those wonderful dreams of perfect societies, robots and spaceships! Except that the overwhelming majority of those perfect societies bore a striking resemblance to a certain ideal of 1950s America, where women were love interests/to be rescued/in the kitchen and people of colour were…er, nowhere to be seen, actually. Now that we’re in the timeline of the future that a lot of those books imagined, it’s refreshing that the science fiction is a lot more reflective of the wide variety of human experience that exists. I’d like to say that societies have progressed along with the technology, and, very unevenly, they have. A bit. And yet somehow it’s still really disappointing to me when I read something about personalised medicine, and that something is: it’s heavily biased towards white people.
Here’s a link to an (open access) paper looking at genetic variants associated with Mendelian inherited disorders (i.e. ones with fairly simple genetics, not complicated multifactorial diseases like heart disease, which have large numbers of potential genetic risk factors plus a big environmental contribution). They looked for these variants in a newly released, far more sensitive database, amongst individuals from different ancestry groups. It has one figure, showing the percentage coverage amongst these ancestry groups and charts of the number of candidate variants per genetic disorder. It’s pretty damning. Stare and wince:
Figure 1: a Percentage representation of the 5965 IGM reference cohort across six geographic ancestry groupings. b A semi-transparent overlaid histogram representing the tally of candidate variants between IGM’s 5094 European (Eu) individuals (blue) and the collection of non-European individuals (red) (Mann–Whitney U test p < 1 × 10^−320). The non-European distribution reflects individuals with a: Latino ethnicity (La), East Asian (EaAs), South Asian (SoAs), primarily African (Af), and unassigned (Un) ancestry. Estimates indicate the mean number of singleton non-synonymous variants among OMIM disease-associated genes. Singleton variants are identified based on a reference cohort of 5965 IGM sequenced samples. c Percentage representation of the combined 66,217 IGM and ExAC reference cohorts across six geographic ancestry / ethnic groupings. d Similar to b but singleton variants were identified based on the absence among the combined IGM and ExAC reference cohorts accumulating to 66,217 samples.
Okay, I’ll admit the blurb for that figure is horrible. Essentially, the top half, (a) and (b), is from the old database. The second half, (c) and (d), is from the old database combined with the shiny new one (ExAC). (a) and (c) are showing you what percentage of the database is sampled from various (very broad) ancestry groupings. (b) and (d) are showing you the number of candidate variants per individual in known disease-associated genes, with blue being European and red non-European. (“Non-synonymous” means the variant changes the amino acid the gene codes for, so it could plausibly affect the protein; “singleton” means the variant turned up in only one person in the reference database.)
First off, anyone can see that people of European ancestry are hugely over-represented: 5094 of the 5965 individuals in the original cohort, or about 85%. The current population of Europe is around 750 million, out of a world total of 7 billion. Even factoring in populations descended from Europeans in America, Australia, etc., that’s really not that much. Han Chinese alone should account for around 18%, if you’re being strictly representative. Secondly, the number of candidate genetic variants for these diseases in European ancestry populations is much lower. This might seem a little counter-intuitive, but, essentially, the better the coverage, the lower the number of candidates, because the filtering is more accurate. The higher the number, the more likely it is that some of these are false candidates that have not yet been ruled out by more rigorous screening or, as is most likely in this case, more diverse population screening. The authors make this depressing statement:
Need and Goldstein specifically argued in 2009 that our ability to effectively filter variants to identify pathogenic ones as sequencing becomes clinically routine would be very different amongst different ancestry groups unless our knowledge of genetic variation is made more equal across ancestry groups. Unfortunately, now with clinical sequencing becoming routine this fear has been clearly realized.
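If the filtering idea still feels abstract, here is a minimal sketch of it in Python. To be clear, this is my own toy illustration, not the authors' pipeline: the variant labels, the two reference sets and the `candidate_variants` helper are all made up. The only idea taken from the paper is that a variant stays a "candidate" if it is absent from the reference database, so a reference that covers your ancestry group well rules out far more harmless variants.

```python
def candidate_variants(patient_variants, reference_variants):
    """Return the patient's variants that do NOT appear in the reference cohort.

    Anything already seen in the reference is assumed (for this toy example)
    to be ordinary background variation and is filtered out.
    """
    return sorted(set(patient_variants) - set(reference_variants))


# Hypothetical data: one patient carrying five variants, one of which ("v5")
# is the variant we would actually care about.
patient = {"v1", "v2", "v3", "v4", "v5"}

# A reference with few people from the patient's ancestry group has seen
# very little of that group's harmless background variation...
sparse_reference = {"v1"}

# ...while a well-sampled reference has seen most of it.
well_sampled_reference = {"v1", "v2", "v3", "v4"}

poorly_filtered = candidate_variants(patient, sparse_reference)
well_filtered = candidate_variants(patient, well_sampled_reference)

print(len(poorly_filtered), "candidates left with the sparse reference")
print(len(well_filtered), "candidate left with the well-sampled reference")
```

Same patient, same genome, but the sparse reference leaves four candidates to chase up while the well-sampled one leaves a single candidate. That gap is, in miniature, what the red and blue histograms in the figure are showing.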
Of course, it’s not that all these geneticists are outrageous racists (though some of them probably are, and unconscious bias is an issue for everyone). It’s largely to do with the funding, and the existing health inequalities. Biomedical research is concentrated in Europe and America (though China is rapidly catching up). That’s where the money is, and where most of the researchers are. If somebody in Frankfurt gets a research grant to sample a thousand people’s genomes for variants for a rare disease, they are going to ask for volunteers from Frankfurt, or, at least, Germany. And those people will primarily be people of European origin. They aren’t going to take a trip around China, India and Africa to get more diverse genetic samples; even if they wanted to, they simply wouldn’t have the funding (or the time). They could, however, be aware of this problem, and expand their recruitment to include more people of non-European ancestry within their own countries. That would help. The authors conclude [bold emphasis mine]:
Given that sample sizes are about to explode with the US national initiative and other large-scale international sequencing studies, it is vital that we ensure the most equitable distribution of the generation of genomic data possible. Enriching our knowledge of genetic variation in different ancestry groups remains the most effective solution to this problem. With initiatives like the recently announced Precision Medicine Initiative (PMI) Cohort Program, this must be recognized as a high priority for the field as we move towards an era where precision medicine is a reality. If not, genomics could further contribute to healthcare inequalities.
About those healthcare inequalities…this is a problem that will, of course, compound existing ones. A poor person living in rural Namibia, say, will face multiple issues in taking advantage of this fancy new personalised medicine lark: they may live a very long way from a healthcare centre large enough to offer this sort of service; they probably won’t be able to afford a genetic test; even if they could, they may not be able to afford any available treatment, etc. If somebody like that pays for a genetic test that is going to give an intrinsically less useful result, it’s a social injustice, and a problem. It’s also worth noting that at least some of that list will also apply to a poor black person in Louisiana.
You will note that the common factor there is “poor”. Health inequalities to a large degree reflect wealth inequalities (of course, wealth inequalities themselves encompass a whole suite of historical causes in which racism and colonialism feature heavily, but that’s another story I’m not qualified to tell). Any ray of hope on the horizon? Well, in terms of this analysis, the first piece of good news is that this inequality has at least been recognised, and this paper was specifically looking for this problem to see how bad it still was. The second piece of good news is that when they expanded their analysis and included the new dataset, it did improve things significantly. There’s just quite a way to go yet.
And since I’m still disappointed, I’ll be focusing on a success story in overcoming the challenge of health inequalities in my next piece.
Petrovski S, Goldstein DB. Unequal representation of genetic variation across ancestry groups creates healthcare inequality in the application of precision medicine. Genome Biol. 2016;17:157.
Need AC, Goldstein DB. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 2009;25:489–94.