Testifying before a US Senate committee on February 8, a Stanford University health policy professor recommended that Congress require healthcare organizations “to have robust processes to determine whether planned uses of AI tools meet certain standards, including conducting an ethical review.”
Michelle M. Mello, JD, PhD, also recommended that Congress fund a network of AI assurance laboratories “to develop consensus-based standards and ensure that lower-resourced healthcare organizations have access to the experience and infrastructure needed to evaluate AI tools.”
Mello, a professor of health policy in the Department of Health Policy at Stanford University School of Medicine and a professor of law at Stanford Law School, is also an affiliate professor at the Stanford Institute for Human-Centered Artificial Intelligence. She is part of a group of ethicists, data scientists and doctors at Stanford University involved in governing how AI tools are used in patient care.
In her written testimony before the US Senate Finance Committee, Mello noted that while hospitals are beginning to recognize the need to vet AI tools before use, most healthcare organizations do not yet have robust review processes, and she wrote that there is much Congress could do to help.
She added that to be effective, governance cannot focus solely on the algorithm, but must also encompass how the algorithm is integrated into the clinical workflow. “A key area of research is the expectations placed on doctors and nurses to evaluate whether AI results are accurate for a given patient, given the information they have on hand and the time they will actually have. For example, large language models such as ChatGPT are used to write summaries of clinical visits and notes from doctors and nurses, and to draft responses to patient emails. The developers trust that doctors and nurses will carefully edit those drafts before submitting them, but will they? Research on human-computer interaction shows that humans are prone to automation bias: we tend to rely too much on computerized decision support tools and fail to detect errors and intervene where we should.”
Therefore, regulation and governance must address not only the algorithm, but also how the organization adopting it will use and monitor it, she emphasized.
Mello said she believes the federal government should set standards for organizations’ readiness and responsibility in using AI tools for healthcare, as well as for the tools themselves. But given how quickly the technology is changing, “regulation must be adaptable or else it will risk becoming irrelevant or, worse, chilling innovation without producing any offsetting benefits. The smarter path now is for the federal government to foster a consensus-building process that brings together experts to create national consensus standards and processes to evaluate proposed uses of AI tools.”
Mello suggested that, through the operation and certification processes for Medicare, Medicaid, the Veterans Affairs health system and other health programs, Congress and federal agencies could require participating hospitals and clinics to have a process for reviewing any artificial intelligence tool that affects patient care before deployment, as well as a plan for subsequent monitoring.
As an analogy, she said, the Centers for Medicare and Medicaid Services relies on the Joint Commission, an independent nonprofit organization, to inspect healthcare facilities and certify their compliance with the Medicare Conditions of Participation. “The Joint Commission recently developed a voluntary certification standard for responsible use of health data that focuses on how patient data will be used to develop algorithms and carry out other projects. A similar certification could be developed for the use of AI tools in facilities.”
The ongoing initiative to create a network of “AI assurance labs” and consensus-building collaborations, such as the 1,400-member Coalition for Health AI, can be critical supports for these facilities, Mello said. These initiatives can develop consensus standards, provide technical resources, and perform certain assessments of AI models, such as bias assessments, for organizations that lack the resources to do so themselves. Adequate funding will be crucial to their success, she added.
Mello described the review process at Stanford: “For each AI tool proposed for implementation at Stanford hospitals, data scientists evaluate the model for bias and clinical utility. Ethicists interview patients, clinical care providers, and AI tool developers to learn what matters to them and what concerns them. We found that with just a small investment of effort, we can detect potential risks, mismatched expectations, and questionable assumptions that we and the AI designers hadn’t thought about. In some cases, our recommendations may stop a deployment; in others, they strengthen deployment planning. We designed this process to be scalable and exportable to other organizations.”
Mello reminded senators not to forget about health insurers. As with healthcare organizations, real patient harm can occur when insurers use algorithms to make coverage decisions. “For example, members of Congress have expressed concern about Medicare Advantage plans’ use of an algorithm marketed by NaviHealth in prior authorization decisions for post-hospital care of older adults. In theory, human reviewers made the final decisions while simply taking the algorithm’s output into account; in reality, they had little discretion to override the algorithm. This is another example of why human responses to model outputs (their incentives and constraints) deserve oversight,” she said.