Invisible Women - Data Bias in a World Designed for Men Review
- Veena Calambur

- Jan 24, 2021
“Invisible Women - Data Bias in a World Designed for Men” by Caroline Criado Perez is one of the most thorough and well-researched books to explore what sex-bias really looks like in the world. Media portrayals of sex-bias tend to be dramatic portrayals of women being explicitly excluded or even harassed, and by the final act, if the protagonist hasn’t outright won, they conclude on a “two steps forward, one step back” note to reassure the audience that true progress has been made. Unfortunately the real world hardly ever works this way, and so much of bias is unspoken and implicit. Until reading “Invisible Women” I hadn’t realized how bad it really was. Caroline Criado Perez does a fantastic job of not only shedding light on so many facets of implicit bias in daily life, but also attempting the impossible: quantifying the extent of it.
One of the most critical takeaways for me is to question whether a data source I’m working with actually contains “sex-disaggregated data”, because apparently this is not done nearly enough. As someone who works with health and medical data, I found Part IV of the book, “Going to the Doctor”, unfortunately unsurprising but still incredibly impactful for how I think about my work, and it is what I’ll primarily focus on for the rest of this post.
My team and I are acutely aware of the collection bias that can be present, particularly in medical data, since it is so often contingent upon healthcare access, which we know is extremely difficult for certain sub-populations here in the United States, particularly by race / ethnicity, socioeconomic status, geographic status, and of course by sex. But it never really occurred to me how often the sex of the individual behind a data record is simply not captured across so many systems, and how much this can impact clinical knowledge and practice.
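One way to make this question concrete is to measure how often sex is actually recorded before starting any analysis. The following is a minimal sketch, not a description of any real system: the field name `sex`, the set of values treated as missing, and the example records are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical values treated as "sex not captured"; adjust to the real source.
MISSING_VALUES = {None, "", "unknown", "u", "na", "n/a"}


def sex_capture_report(records, sex_field="sex"):
    """Return the share of records per recorded sex value, grouping missing ones."""
    counts = Counter()
    for record in records:
        raw = record.get(sex_field)
        # Normalize strings so "F", " f " and "f" count as the same value.
        value = raw.strip().lower() if isinstance(raw, str) else raw
        counts["missing" if value in MISSING_VALUES else value] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


# Made-up example records, for illustration only.
records = [
    {"patient_id": 1, "sex": "F"},
    {"patient_id": 2, "sex": "M"},
    {"patient_id": 3, "sex": ""},
    {"patient_id": 4},  # field absent entirely
]
report = sex_capture_report(records)
```

A large `missing` share in a report like this would be a sign that the source cannot support a sex-disaggregated analysis without further work.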
The chapter starts out with one of the more common stories: women who go to doctors and are not taken seriously, only to find out that their suspicions were correct and the underlying pain and disease were not a manifestation of imagination. Perez is quick to point out that the problem isn’t a few bad apples; rather, these doctors are “products of a medical system which, from root to top, is systematically discriminating against women leaving them chronically misunderstood, mistreated and misdiagnosed”. Medical studies date back thousands of years, and although our scientific and medical practices have evolved considerably, one persistent remnant of the past is the assumption that the human body is male and that medical research and cures must be designed for the default male body. Overcoming this gap in medical knowledge still seems to be an incredible challenge to this day.
Even though women represent half, if not more than half, of the population affected by major diseases such as cardiovascular disease or HIV, women are lucky if they make up even 25% of the participants in clinical trials. The clinical trial process allows drug manufacturers and researchers to understand and catalog effectiveness and safety information, including side effects and adverse events. And it doesn’t seem to matter that the 1993 NIH Revitalization Act created a national mandate to include women and racial / ethnic minorities in publicly funded trials, as there is very little accountability. If nearly half of the FDA studies that were audited captured no sex in their trial data, it seems impossible to know whether the Revitalization Act has been held up to standard even decades later, even though we may have our suspicions that it has not.
We don’t even have to look that far into the past to see these systemic issues. An article published in “Contemporary Clinical Trials” discussed an analysis of both observational and randomized clinical trials that have been published and found that a third of studies did not publish any information on race or ethnicity and all of the studies had significantly under-represented Black patient populations relative to how much they are currently impacted by the disease.
In my work at a pharmaceutical company there is this perception that compliance and regulatory standards of regulators such as the FDA are exhaustive and comprehensive and that meeting these standards is enough. But it is clear that unfortunately these agencies themselves don’t prioritize issues such as diversity in clinical trials and at best some companies will put out public statements touting their diversity efforts. But those are band-aid solutions to systemic problems.
And these issues can persist for years to come, even with increasing awareness of the need for diversity in medical trials and closer examination of equity in medical practices. If the FDA and NIH were to declare tomorrow that, effective immediately, all clinical trials must recruit in proportion to the population afflicted by the disease, I would still need to account somehow for the race and sex gaps in historical data, since as a data scientist I typically train models on years and years of medical records.
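One common, though imperfect, way to account for such gaps is to reweight historical records so that each group’s weighted share matches a target share. The sketch below is only an illustration of that general idea and not a method taken from the book or from my own projects: the function name `group_weights`, the field name `sex`, the target shares, and the example data are all assumptions.

```python
from collections import Counter


def group_weights(records, target_shares, group_field="sex"):
    """Weight each group by (target share) / (observed share in the data)."""
    counts = Counter(record[group_field] for record in records)
    total = sum(counts.values())
    # Groups seen in the data must have a target share, otherwise this raises KeyError.
    return {
        group: target_shares[group] / (count / total)
        for group, count in counts.items()
    }


# Made-up historical data: three male records for every female record.
historical = [{"sex": "M"}] * 3 + [{"sex": "F"}]

# Hypothetical target: the affected population is split evenly.
weights = group_weights(historical, {"M": 0.5, "F": 0.5})
```

Reweighting like this only rebalances groups that already appear in the data. It cannot recover information that was never collected, and heavily up-weighting a handful of records makes any estimate built on them less stable.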
“Invisible Women” also does a fantastic job of addressing the implicit sexism in the way sex differences in collected medical data are so often labeled as “confounding”. Even in undergraduate statistics classes we casually discussed demographic variables as “confounders”, but I never thought about how strange or even potentially damaging that language is. If it were applied evenly across all races and sexes analyzed, then maybe it wouldn’t mean much, but given how often white males are the default subject of analysis, it often falls on underrepresented minorities and/or women to bear the status of “confounder”. Luckily, in our classes we did explore various effect models to examine these differences, but Perez reports that several medical studies simply find collecting and analyzing this data “burdensome”, which leads to the lack of sex-disaggregated data. Going forward I’ll be mentally noting who I might be implicitly perceiving as a “confounder” in an analysis.
In past projects where I’ve analyzed anonymized electronic medical records, my team and I have joked about the presence of “garbage codes”. This term broadly covers medical diagnosis codes for which doctors did not give a clearly defined diagnosis of whatever ailment the patient was experiencing during their visit. They tend to muddle analyses, so we often exclude them since they don’t inform us of compelling medical patterns. So it was very eye-opening to see that one of the codes I often exclude, “nonspecific chest pain”, is very likely to be made up of undiagnosed female patients who were actually very likely going through a heart attack and simply were not diagnosed or treated properly. They may of course reappear in the data with the heart attack or stroke they were likely suffering during their past visit, but it struck me that something I had deemed a useless record could represent scores of women being ignored and turned away from the critical care that they need. And I was treating them just like their doctors did, by filtering them out and tossing them aside.
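A small change to this habit would be to look at who is inside the excluded records before discarding them. The sketch below is a hypothetical illustration rather than my team’s actual pipeline: the field names `diagnosis` and `sex`, the code label, and the example visits are assumptions, and real records would use proper diagnosis codes.

```python
from collections import Counter

# Hypothetical label for a nonspecific code; a real project would use its own code list.
NONSPECIFIC_CODES = {"nonspecific chest pain"}


def split_with_audit(visits, excluded_codes=NONSPECIFIC_CODES):
    """Split visits into kept and excluded, and count the excluded ones by sex."""
    kept, excluded = [], []
    for visit in visits:
        target = excluded if visit["diagnosis"] in excluded_codes else kept
        target.append(visit)
    # Records with no sex field are counted under "missing" rather than dropped.
    excluded_by_sex = Counter(visit.get("sex", "missing") for visit in excluded)
    return kept, excluded, excluded_by_sex


# Made-up visits, for illustration only.
visits = [
    {"patient_id": 1, "sex": "F", "diagnosis": "nonspecific chest pain"},
    {"patient_id": 2, "sex": "M", "diagnosis": "myocardial infarction"},
    {"patient_id": 3, "sex": "F", "diagnosis": "nonspecific chest pain"},
    {"patient_id": 4, "sex": "M", "diagnosis": "nonspecific chest pain"},
]
kept, excluded, excluded_by_sex = split_with_audit(visits)
```

If the excluded set turns out to be heavily skewed toward one group, that is a reason to keep those records and study them rather than filter them out.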
I learned so much from this book, not just about how snow-shoveling patterns can be sexist, but also about how the data I work with and the way I conduct analyses continue to carry traces of implicit sex-bias. My only criticism of the work is that the entire time I was reading I wished that Perez had highlighted some examples of the intersectionality of sex-bias and racial / ethnic bias. I fully understand that her book was already brimming with extensive detail, citing hundreds of studies and articles as proof of the extent of sex-bias, and that entire books could be devoted to the subject of intersectional gaps in data, but I would have appreciated a little more acknowledgement of this. Especially in the medical outcomes section of the book, in the back of my mind I kept wondering “what is this statistic for Black or Hispanic women?” I would also have liked to see better acknowledgement of the difference between sex and gender identity, so that it is clear whether Perez was inclusive of transgender, non-binary, and gender non-conforming people. But otherwise I really appreciate “Invisible Women” for doing the important work of bringing the issues of sex-bias to light, so that folks, and hopefully data analysts in particular, will carefully consider who is missing from our data and the potentially fatal consequences.

