A Shared Language

Standards

Even if all the necessary policies and regulations existed, that alone would not be enough for a given field to advance and mature. This is where standards come into play. Standards play a fundamental role in everyday life and are necessary to ensure the quality, safety, and functionality of almost everything we interact with. The term standards refers to the specifications for a product, system, or service. Take a simple product, such as a battery. Because standards for batteries exist, we can buy an AAA battery and, no matter who we buy it from, know it will be the same size and will work in the same way. Without standards, we would not know what to expect. Standards and regulation are sometimes confused with one another, and indeed there are many similarities; however, regulation is enforceable by law. Standards, on the other hand, are often created and maintained by professional organizations, such as the International Organization for Standardization (ISO) or the International Electrotechnical Commission (IEC), which creates standards for electrical technologies.

One of the challenging aspects of developing AI-powered software and applications for healthcare is the lack of standard definitions for AI-related terms. Many companies, organizations, and professions use AI terminology in different and nuanced ways. The same term can mean something different to each group, creating confusion and preventing effective communication and collaboration in pursuit of new ideas, research, and innovation. The inability to communicate effectively also creates challenges for engaging in open dialogue on the value and ethics of AI and the practical considerations for how regulatory guidance should be implemented.

A shared, common language is critical for any field to advance, as it serves as the foundation for research and communication. To this end, the Consumer Technology Association (CTA) published the first ANSI-accredited standard for AI in healthcare in February of 2020.¹ Developed with input from over 50 organizations, it defines terms such as ‘assistive intelligence’ and delineates algorithmic bias from model bias. Although it only tackles broad terms at a high level, it is a great first step, and we will hopefully see more standards created specific to research and development.

As has come up time and time again in earlier sections, despite best efforts, there remain many gaps in policy, regulation, and standards. Great strides have been taken by various public and private organizations to get us to where we are today, but we also recognize that we have to continue moving the needle to ensure that the development and deployment of AI benefits society without conferring additional risk.

Through this review, we hope you are beginning to better understand the current landscape of artificial intelligence. Before we wrap up, there are just a few more considerations around existing gaps that we want to visit.

Mind the Gap

Gaps in policy

When it comes to addressing the governance of AI in healthcare, it’s quicker to list the regulations that have been created than everything we still have left to do. Addressing the lack of guidance and regulation is important for many reasons, not least because it can hinder innovation and growth. For many health tech companies or startups that wish to disrupt the healthcare space through novel AI applications, the lack of regulatory guidance has created considerable uncertainty from a business perspective. Companies are hesitant to build something if they may later have to take it off the market or spend considerable financial resources to retroactively comply with new regulatory requirements once they are developed. However, not all AI use cases need or will fall under the oversight of federal regulation; examples may include clinical decision support systems or risk prediction algorithms. For these areas of AI, there is a strong need for the development and validation of best practices, standards, and benchmarks to help guide organizations, researchers, and clinicians in how to safely and effectively integrate existing AI systems into healthcare operations and care delivery processes in ways that improve health outcomes, increase care access, reduce costs, and improve operational efficiency. As many experts have stated, a key factor in the successful deployment of AI at scale will be the ability to establish public trust in AI. There is still much we don’t know when it comes to defining and measuring trustworthiness. Standards will be critical in establishing the minimal criteria a model must achieve to be considered trustworthy, including addressing two key components of trustworthiness: explainability and verifiability.

Explainability

What types of healthcare decisions are appropriate for black-box algorithms? Outside of healthcare, there is active debate around the use of black-box algorithms for high-stakes decisions, such as deciding whether someone qualifies for a loan or whether a defendant should be released on bail. Is it okay for high-stakes decisions to be determined by algorithms if we cannot verify how the algorithm arrived at its decision? To help address this, policies have been developed in some sectors, such as the financial sector, that require some institutions to explain how they came to a decision. For example, if an individual is denied a loan, the bank must provide an explanation to the individual as to why they were denied and what could be changed to reverse the decision in the future. Since black-box algorithms are not intrinsically interpretable, special techniques or methods are used to help explain how they make their decisions. For example, of all the variables a model looks at, we can determine which ones played the biggest role in the final decision by changing each variable and seeing if the model’s output changes. If the output does not change, we can assume that the variable in question has little influence on the model's decision.
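As a rough illustration of this perturbation idea, the sketch below (in Python, with a hypothetical model object and feature matrix; it is not drawn from any specific toolkit) scrambles one variable at a time and checks how often the model's predictions change:

```python
import numpy as np

def perturbation_importance(model, X, n_repeats=10, seed=0):
    """Estimate each variable's influence by scrambling it and
    measuring how often the model's predictions change."""
    rng = np.random.default_rng(seed)
    baseline = model.predict(X)                  # predictions on original data
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        changes = []
        for _ in range(n_repeats):
            X_perturbed = X.copy()
            # Change only variable j by randomly permuting its values
            X_perturbed[:, j] = rng.permutation(X_perturbed[:, j])
            changes.append(np.mean(model.predict(X_perturbed) != baseline))
        importance[j] = np.mean(changes)         # near 0 => little influence
    return importance
```

A variable whose score stays near zero barely moves the model's output when disturbed, which is exactly the intuition described above.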

Unfortunately, explainability techniques are not foolproof, and researchers have demonstrated that these explainability methods can be tricked.²

There are ways to game explainability methods so that certain variables are not listed as major contributors to the model’s decision. This matters when considering variables that can introduce bias into a model, such as gender or race. Because other variables exist that are closely correlated with these bias-related variables, explainability models can be constructed so that only the “acceptable” variables are considered. Researchers demonstrated this by creating a black-box model designed to help judges decide whether a defendant should be released on bail, and they included race and gender as inputs for the model to use in making its decision. However, when building a decision tree to help explain how the model made its decisions, the researchers excluded race and gender. Therefore, even though the model actually uses race and gender to inform its decision, the researchers were able to produce a more “acceptable” alternative explanation of how the model could arrive at the same output. Explainability models alone are not enough to evaluate the trustworthiness of an algorithm.
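The sketch below illustrates the general idea with entirely synthetic data and hypothetical variable names: a "black-box" rule that actually depends on race can be closely mimicked by a surrogate decision tree fit only on an "acceptable" correlated variable, producing an explanation that never mentions race. It is a simplified caricature of the cited study, not a reproduction of it.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: 'zip_code_risk' is strongly correlated with the
# (binary-encoded) race variable; 'prior_arrests' is independent.
race = rng.integers(0, 2, n)
zip_code_risk = np.clip(race + rng.normal(0, 0.3, n), 0, 1)
prior_arrests = rng.poisson(1.5, n)

# A "black box" whose detention decision actually depends on race.
def black_box(race, prior_arrests):
    return ((prior_arrests > 2) | (race == 1)).astype(int)

y_blackbox = black_box(race, prior_arrests)

# Surrogate explanation fit only on the "acceptable" variables.
X_acceptable = np.column_stack([zip_code_risk, prior_arrests])
surrogate = DecisionTreeClassifier(max_depth=3).fit(X_acceptable, y_blackbox)

# The surrogate reproduces the black box closely without ever citing race.
fidelity = (surrogate.predict(X_acceptable) == y_blackbox).mean()
print(f"Surrogate agrees with the black box on {fidelity:.0%} of cases")
```

Because the proxy variable carries nearly the same information as the sensitive one, the surrogate's tidy, race-free explanation is faithful to the outputs while hiding what actually drives them.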

Verifiability 

When developing black-box algorithms for high-stakes decisions in healthcare, what types of precautions and best practices are necessary to ensure that the algorithm will not cause unintentional harm? We do not yet have established standards and best practices for verifying and testing a black-box model's analytical and clinical accuracy. This is especially important for models that are proprietary. Since we do not know the underlying logic behind the model, we need reliable methods to verify the model's claims.
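One plausible, though by no means standardized, form such verification could take is independent external validation: re-checking a vendor's claimed performance on locally collected data. The sketch below is illustrative only; the metrics, tolerance, and function names are assumptions rather than an established protocol.

```python
import numpy as np

def verify_claimed_performance(model, X_local, y_local,
                               claimed_sensitivity, claimed_specificity,
                               tolerance=0.05):
    """Check a proprietary model's claimed accuracy on an independent,
    locally collected validation set (binary labels assumed)."""
    y_pred = model.predict(X_local)
    tp = np.sum((y_pred == 1) & (y_local == 1))
    fn = np.sum((y_pred == 0) & (y_local == 1))
    tn = np.sum((y_pred == 0) & (y_local == 0))
    fp = np.sum((y_pred == 1) & (y_local == 0))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Pass only if local performance is within tolerance of the claims.
    return (sensitivity >= claimed_sensitivity - tolerance and
            specificity >= claimed_specificity - tolerance)
```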

Challenges in implementation

When it comes to autonomous AI systems, which are systems that can operate without human intervention, the FDA currently has a regulatory pathway for locked algorithms only. However, other types of AI models include adaptive algorithms. These are algorithms that continuously learn and change with new information, and they come with unique regulatory challenges. Algorithms that continuously adapt may begin to produce, for a given set of inputs, an output that diverges from the initial output produced when first validated. This is known as concept drift. Although the FDA is working to address this, at this time we do not have a way to regulate autonomous adaptive AI/ML systems. This is due in part to the fact that we do not have an adequate way to measure or quantify the degree of algorithm divergence, or to determine what significance to assign to specific thresholds of change. Without this capability, our ability to safely apply non-locked (i.e., dynamic) algorithms to direct patient care is limited.
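As a rough sketch of how such divergence might be quantified, one could compare an adapted model's outputs against the originally validated (locked) version on a fixed reference dataset. The code below assumes hypothetical model objects with a scikit-learn-style predict_proba method, and the threshold is arbitrary rather than an accepted standard.

```python
import numpy as np

def output_divergence(locked_model, adaptive_model, X_reference):
    """Mean absolute difference between the originally validated (locked)
    model's risk scores and the adapted model's scores on a fixed
    reference set of inputs."""
    p_locked = locked_model.predict_proba(X_reference)[:, 1]
    p_adaptive = adaptive_model.predict_proba(X_reference)[:, 1]
    return np.mean(np.abs(p_locked - p_adaptive))

# Illustrative monitoring step: flag the system for re-review if its
# outputs drift beyond an (arbitrary, hypothetical) threshold.
DRIFT_THRESHOLD = 0.05

# divergence = output_divergence(locked_model, adaptive_model, X_reference)
# if divergence > DRIFT_THRESHOLD:
#     flag_for_revalidation()   # hypothetical downstream action
```

The harder, unresolved question named above is not the arithmetic but the threshold: how much divergence is clinically meaningful, and who decides.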

Summary

We have seen guidance come from diverse stakeholder groups, organizations, and country-level regulatory bodies. Although we still have a ways to go, there is a collective mission to "get the use of AI 'right' in healthcare." Responsible use of AI and stewardship of AI development require ensuring that AI is intended to be deployed for the benefit of society as well as the individual, and include a commitment to values such as fairness, equity, transparency, and trustworthiness. If these are kept top of mind, we can be optimistic that AI is headed in the right direction.

  1. Consumer Technology Association. Technology & Standards Dept. February 2020. Definitions/Characteristics of Artificial Intelligence in Health Care (ANSI/CTA-2089.1).

  2. Lakkaraju H, Bastani O. “How do I fool you?”: Manipulating User Trust via Misleading Black Box Explanations. arXiv [cs.AI]. Published online November 15, 2019. http://arxiv.org/abs/1911.06473