By Dr. Matt Austin and Dr. Shannon Cole
Interest in how best to incorporate artificial intelligence (AI) into the US healthcare delivery system is growing. The benefits AI can offer our healthcare system could prove both life-saving and cost-saving. However, despite the considerable opportunities that AI may offer, we need to remain mindful of the current disparities in the US healthcare system across patient race, ethnicity, and socioeconomic status in the access, utilization, and quality of healthcare services. As one example of these disparities, recent studies have found that patients who identify as Black are more likely to receive care at hospitals with higher rates of patient safety events than patients who identify as white, and, even when treated at the same hospital, Black patients are more likely to experience a patient safety event than white patients.
As we work to adopt AI into healthcare, we need to ensure that AI at best helps shrink existing disparities and at worst does not exacerbate them. However, achieving this goal of greater equity will not happen “automagically.” To achieve it, we will need to adopt a practice of “equity intentionality” across the healthcare solution lifecycle, including the planning, development, acquisition, and implementation phases of healthcare solutions. The positive impact of AI on health at the individual and population level will be limited if the unique characteristics, needs, and capabilities of ALL patient groups are not considered throughout the lifecycle. For example, some AI algorithms used for risk prediction have been found to perform less well in patient subgroups that were not included in the datasets used to “train” the algorithms. Adopting a practice of “equity intentionality” promotes the idea of engaging ALL patients for whom the solution is designed at EVERY stage of the planning, development, and implementation process.
In the ongoing conversation about how best to mindfully legislate and regulate the embodiment of equity values in AI, we need to acknowledge that there are major gaps in knowledge about how AI operates in reality. The adage of “garbage in, garbage out” holds true in that the quality of the data used dictates the quality of the output, but we still do not fully comprehend how inputs get converted to outputs in evolving AI learning algorithms. Although we can measure inputs and outputs, in many respects we remain ignorant of the nuts and bolts of how AI operates. If we do not understand all of the relevant internal mechanics of an intelligent system, how can we ensure that AI will have adequate safeguards and operational guidelines to prevent the propagation of inequities?
Are All Technologies the Same, or Is AI a Different Beast?
In contrasting other health technologies with AI, a relevant question is “where are things the same, and where are they different with AI?” That is: are there fundamental differences between the technologies we have known and emerging AI technologies? With other technologies, we have had the capacity to look at inputs, to examine the internal mechanisms of these technologies, and to intervene and alter the mechanisms that shape outputs to improve healthcare delivery and patient outcomes. Historically, the difference is that we could understand the inputs and outputs and how they were mechanistically linked.
This line of questioning highlights a challenge for regulators and the health care community: if AI is so different from what we have known, how can we take our existing knowledge base of how best to regulate, improve, or implement non-AI technologies for equitable health benefit, and apply those same lessons to AI in a useful way? In some senses, we are going into a world where we do not understand how the inputs get converted to outputs, and our ability to manage outputs and outcomes feels less concrete. We need a certain amount of trust that the tool is helping, but we may not actually know whether the information it is providing is accurate, because we do not know how the “black box” works.
A Consequentialist Approach to Equitable AI: What Outcome Measures Do We Need to Effectively Regulate Equitable AI?
As a country, we are struggling with what should and should not be regulated, and with how to reach acceptable conclusions in a world with AI tools. Additionally, who is best positioned to regulate AI? Which existing government agencies are best positioned to assert that equity needs to be considered as part of what is being regulated? What do existing federal agencies understand, and what is the current regulatory authority around digital healthcare? If we cannot fully understand the inner mechanics of the AI black box, what can we use as meaningful measures to shape its development and integration?
One option is to start at the end: does an AI produce the outcome that we want? If it does not, it is difficult for us to work backwards, because we do not necessarily know the internal mechanics of the AI. The outcomes and consequences of the AI can be observed and monitored to determine whether AI-equipped systems are achieving the goals, values, and objectives that we as a healthcare system have determined are important. In theory, this could be accomplished by giving different AI products different equity scores: an AI product that produces highly equitable outcomes could receive an A, while those that do not could receive a D or F. We could use post hoc evaluations to require certain products to improve, or we could remove low-scoring products from the market.
A real-life example of such a scoring system is what the Health Services Cost Review Commission (HSCRC) refers to as the “PAI score,” a patient adversity index score. The PAI score looks at readmission gaps between different groups, and the goal is to minimize the gap, or disparity, between two subgroups. A PAI score does not say that the readmission rate itself is good; it simply indicates that a hospital has done a good job minimizing gaps between different subgroups. Applied to AI with the goal of equitable health care, we could calculate a score of how equitable or inequitable health outcomes are under an AI product by examining readmission rates between two different groups and seeing whether disparities get better or worse between subgroups. Further, we could use the same kind of scoring system to assess how well an AI’s results generalize across different institutions, within the same hospital, or between different subpopulations. Such a generalizable scoring system is important because comparisons at each of these levels will be needed. For example, follow-up research on the finding described above showed that, even within the same hospital, patients from historically marginalized communities experience more harms than patients from historically non-marginalized communities.
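To make the idea concrete, below is a minimal sketch of a gap-based equity score of this kind. It illustrates the general concept only and is not the HSCRC’s actual PAI methodology; the column names, data, and grading thresholds are all hypothetical, and a real measure would need risk adjustment and adequate sample sizes.

```python
# Illustrative sketch only: NOT the HSCRC's actual PAI methodology.
# Column names, data, and grade thresholds are hypothetical.
import pandas as pd

def disparity_gap(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Absolute difference in outcome rates (e.g., 30-day readmission)
    between the subgroups in `group_col`."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return float(rates.max() - rates.min())

def equity_grade(gap: float) -> str:
    """Map a disparity gap to a letter grade (thresholds are illustrative)."""
    if gap < 0.01:
        return "A"
    if gap < 0.03:
        return "B"
    if gap < 0.05:
        return "C"
    return "F"

# Example: compare the readmission gap before and after an AI tool is deployed.
before = pd.DataFrame({
    "group":      ["A", "A", "A", "B", "B", "B"],
    "readmitted": [1,   0,   0,   1,   1,   0],
})
after = before.assign(readmitted=[1, 0, 0, 1, 0, 0])

print(equity_grade(disparity_gap(before, "group", "readmitted")))  # "F" (wide gap)
print(equity_grade(disparity_gap(after, "group", "readmitted")))   # "A" (no gap)
```

The key design point is that the score rewards narrowing the gap between subgroups, not lowering the overall rate; a complete evaluation would track both.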
Food for Thought: Questions that We’ll Need to Wrestle With
One consideration is how we can measure disparities in patient care before and after solutions are implemented. The US has historically struggled to collect good data about our patients and to stratify quality and safety measures by the important associated demographic data. To understand whether a technology is driving greater differences in mortality rates between two subgroups, you need to already be measuring these groups to form a baseline. This means we would already need to be collecting and stratifying data to determine whether differences exist. Therefore, we need to better understand who our patients are to understand whether these solutions are beneficial. Who are we trying to attract? Are the patients going to identify as white or Black? Are they going to identify as straight or gay? The data used to train AI models need to be representative of the population or populations the technology is intended to interact with or help. There are examples where the data used to train a model had some degree of bias in terms of what was included. Thus, we need to be mindful that these situations exist in the development, planning, and implementation of any technology, including AI. We need to include adequate patient representation at each step of this process to ensure that, at worst, we are not exacerbating existing disparities and, at best, we are reducing them. We need to pay attention to those who are going to be using and impacted by the technology itself. Unfortunately, with modern AIs, we don’t necessarily know who the end user will be.
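As a rough illustration of what stratified baseline measurement might look like in practice, the sketch below computes an outcome rate for each demographic subgroup before and after a solution is deployed. The field names, subgroup labels, and data are hypothetical, and a real measure would also need risk adjustment, adequate sample sizes, and reliably collected self-reported demographics.

```python
# Hypothetical sketch of stratifying an outcome measure by demographic subgroup
# to establish a baseline before a solution is deployed and to re-measure after.
# Field names, subgroup labels, and data are illustrative only.
from collections import defaultdict

def stratified_rates(records, group_key="race_ethnicity", outcome_key="mortality"):
    """Return {subgroup: outcome rate} for a list of patient-level records."""
    counts, events = defaultdict(int), defaultdict(int)
    for r in records:
        counts[r[group_key]] += 1
        events[r[group_key]] += r[outcome_key]
    return {group: events[group] / counts[group] for group in counts}

def change_in_disparity(baseline_records, post_deployment_records):
    """Per-subgroup change in outcome rate after deployment (negative = improvement)."""
    base = stratified_rates(baseline_records)
    post = stratified_rates(post_deployment_records)
    return {group: post.get(group, float("nan")) - rate for group, rate in base.items()}

# Example with made-up records: the mortality gap narrows after deployment.
baseline = [
    {"race_ethnicity": "Black", "mortality": 1},
    {"race_ethnicity": "Black", "mortality": 0},
    {"race_ethnicity": "white", "mortality": 0},
    {"race_ethnicity": "white", "mortality": 0},
]
post = [{**r, "mortality": 0} for r in baseline]

print(change_in_disparity(baseline, post))  # {'Black': -0.5, 'white': 0.0}
```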
Questions that we need to ask ourselves about equity and disparities related to AI include:
- Which patient groups is this health care solution intended to help? Have we successfully engaged representatives for all subgroups in our planning and development phases?
- How do we ensure that those implementing the health care solution are being intentional about equity?
- What can we do to measure disparities before and after the solution has been implemented to understand its impact?
- Have we tested the solution in different patient subgroups?
- What are the potential unintentional consequences to different patient subgroups?
- What feedback loops have we constructed so that we can assess and improve the solution to ensure greater equity?
What Oversight Might Be Required?
Testing and regulatory oversight are needed to define acceptable outcomes and to verify that these outcomes will be achieved before product deployment. Who would test AI products against acceptable standards, regulate them, determine which products can be improved, and decide which need to be discarded entirely? Who is responsible, and who bears the risk in creating a model? For example, if the Food and Drug Administration (FDA) were to regulate algorithm development for improved patient outcomes, would the developer be required to share training data and outcome data with the agency? Would the FDA require the inclusion of patients from different patient subgroups, and how would that factor into regulatory decision making? What are the guideposts that we can use to pre-emptively embody equity and mitigate risks, rather than waiting for patient harms to inform regulation? What contingencies and actionable insights are needed now?
As a historical example, the FDA is the authority that regulates pulse oximeters, which are known to consistently overestimate oxygen levels in patients with darker skin pigmentation. An equity intentionality approach would require that testing guidelines be shared with the FDA to determine whether pulse oximeters currently in use should be improved or replaced so that they provide more accurate blood oxygen readings for all patient demographics and reduce preventable harms. Many people were harmed before we began acknowledging the inaccurate information provided by pulse oximeters. This has resulted in changed mandates and guidelines calling for more accurate, less harmful equipment to be used.
Similarly, AI regulations appear to be developed ad hoc as these relatively new technologies come online. Do we currently have high-quality, demographically stratified data sources that we can examine as AI technologies roll out, so we know they are helping everyone and not just one group, or one group over another? Presently, it seems as though we are better positioned to identify known risks than to anticipate every potential one. It can feel as though we are figuring out what the solutions are before we really know what the exact problems are and what we need to be thoughtful about.
So, what are the things we need to be thoughtful about as we move forward? What are the things we need to pay attention to?
We argue that we should be paying attention to equity, because even if we do not have all the right answers, greater equity probably will not happen unless we are intentional and thoughtful about it.
One thing is clear: there will be a lot of learnings as we work to adopt AI into health care.
About the authors:
Dr. Matt Austin is a principal faculty member at the Armstrong Institute for Patient Safety and Quality and Director of the Center for Meaningful Measures. His research focuses on performance measurement in health care, and his current interests include understanding the role of transparency of quality and safety data in driving improvements in care delivery, measuring disparities in the quality of care, and measuring the diagnostic performance of hospitals. Matt is a co-author of a newly published, Agency for Healthcare Research and Quality-funded Digital Health Equity Framework and Implementation Guide that may be useful in the development, acquisition, and implementation phases of healthcare solutions, like AI.
Dr. Shannon Cole is the Senior Medical Writer and Editor at The Armstrong Institute for Patient Safety and Quality. They facilitate organizational communication and dissemination efforts between research and clinical operations, including leading the "AI on AI" series.
The opinions expressed here are those of the authors and do not necessarily reflect those of The Johns Hopkins University.