Physicians’ Struggle To Trust AI Diagnoses

AI can be more accurate than physicians. Can physicians use its predictions to improve patient care?

By Clara Silvestri


In June of this year, 304 real medical cases published by the New England Journal of Medicine (NEJM) were adapted into simulated diagnosis challenges. In these challenges, a patient’s care journey is presented along with an interactive option to ask questions and order tests before deciding upon an official diagnosis. This is not an uncommon practice; physicians-in-training frequently use NEJM cases to hone their diagnostic skills. What made this particular challenge special, however, was that there were no physicians-in-training attempting to crack the cases.

In fact, there were no humans at all. This was Microsoft’s Artificial Intelligence Diagnostic Orchestrator (MAI-DxO), and it accurately diagnosed 85.5 percent of the 304 cases. When these same cases were given to 21 physicians, each with 5-20 years of clinical experience, only about 20 percent of the cases were accurately diagnosed.

If these numbers hold up across broader real-world conditions, they hint at a potential transformation in the field of medicine. What was once the sole domain of human expertise and intuition could increasingly become a collaborative space with AI. Before this can occur, however, we must investigate how AI’s potential collaborators, physicians, can build a strong enough sense of trust in AI diagnosis systems.

“When investigating these systems, it may be more helpful to focus on evaluating the interaction between humans and AI systems rather than pitting the human clinician’s performance and the AI’s performance against each other,” says Dr. Joshua Hatherley, an AI ethicist at the University of Copenhagen.

Early Efforts to Implement AI in Medicine

The rise of ChatGPT has thrust debates over AI into the international spotlight in recent years, but the use of AI in medicine is not a new concept. In 1970, Dr. Jack Myers at the University of Pittsburgh arrived at the grim recognition that the knowledge base of medicine had simply become too large for any single person to grasp. Even if a physician could hold all of this knowledge, the demands of clinical work might still cause them to occasionally miss a correct diagnosis.

To combat this issue, Myers sought the help of computer scientist Dr. Harry Pople. Pople developed one of the first AI medical tools using an algorithm based on Myers’ own clinical reasoning process, parsing patient symptoms to arrive at a diagnosis. Pople and Myers called this software INTERNIST-1, and though it was revolutionary in concept, it never quite took off. Physicians found the software too slow and difficult to use in everyday clinical settings.

The next major AI diagnosis tool came in 1986 with DXplain, a program that could also diagnose patients based on entered symptoms. DXplain covered up to 500 diseases, more than INTERNIST-1 could. However, in an era of paper medical records and workflows, DXplain still proved difficult to integrate and never saw widespread use in real-world clinical settings.

Since 1986, the field of AI in healthcare has seen several advances. AI can now predict a protein’s 3D structure from its amino acid sequence, analyze MRIs and X-rays in seconds, and even assist in surgeries through AI-powered robots. These technologies all aim to save lives and preserve health more efficiently, but they do not come without controversy and skepticism.

Who to Trust: AI or Doctors?

Not all forms of AI impact medicine in the same way. Some operate quietly and with little direct health impact, such as organizing patient histories behind the scenes, while others inch closer to the doctor-patient relationship. When AI moves from the background to the bedside, assisting with the diagnosis process itself, we are urged to consider the historic role of trust in medicine.

The first conversation between a patient and their doctor, when symptoms are shared and questions are asked, is one of medicine’s most intimate encounters. This sacred ritual is imperative to the clinical decision-making that results in diagnosis and eventual treatment. The majority of Americans are not ready to see that ritual encroached upon by computers. Sixty percent of U.S. adults would feel uncomfortable if their own health care provider relied on AI to diagnose diseases, according to a 2022 survey by Pew Research Center.

A 2023 study by Christopher Robertson, published by the Public Library of Science (PLOS), backed this finding, with 52.9 percent of surveyed American adults preferring a human physician to AI software. The same study tested participants’ willingness to switch to AI after being given more information about it. When the researchers explained that AI was more accurate than the human physician, little change occurred. However, when participants were given that same information by a doctor, they were more likely to choose AI.

Patients’ higher trust in doctors versus AI was supported by a 2024 study published in Nature Medicine in which three groups were given identical medical advice and told that it came from either a physician only, AI only, or a physician and AI together. The group that believed it was getting only a physician’s advice was more willing to follow that advice than the groups that received the exact same recommendations from AI. The Nature Medicine study and the PLOS study imply a potential paradox: if patients trust physicians more than AI, what will happen if physicians begin to encourage their patients to opt for AI-assisted diagnoses? For this scenario to even occur, physicians must have a baseline level of trust in AI diagnosis systems.

AI Suggests and Doctors Evaluate

An integral aspect of the ethics behind AI in diagnosis is something called “algorithmic transparency.” The idea is that physicians should be able to clearly understand how AI reached its diagnosis, just as they are able to ask their colleagues to explain their own clinical reasoning.

In practice, however, the explanations behind how AI works are often difficult to make sense of. AI diagnosis tools produce binary answers, simply stating whether or not a patient has a certain diagnosis. These tools do not reason logically in the way a human would; instead, they rely on statistical patterns drawn from massive datasets. It may be a difficult ask for physicians to trust a system of reasoning they themselves are not capable of, but some are quickly finding a workaround.
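
To make that contrast concrete, here is a minimal, purely illustrative sketch of the kind of pattern-weighting that sits behind many diagnostic models. The disease, symptom features, weights, and threshold are all invented for this example and describe no real clinical tool (and certainly not MAI-DxO); the point is simply that the output is a weighted statistical score squashed into a yes-or-no flag, not a chain of clinical reasoning a physician could interrogate.

```python
# Illustrative toy example only: NOT a real clinical tool.
# All feature names, weights, and the threshold below are invented.
import math

def predict_pneumonia_risk(symptoms: dict) -> float:
    """Return a probability-like score from weighted symptom features."""
    # Hypothetical weights such a model might have learned from past cases.
    weights = {
        "fever": 1.2,
        "cough": 0.9,
        "shortness_of_breath": 1.5,
        "age_over_65": 0.7,
    }
    bias = -2.5
    # The "reasoning" is nothing more than a weighted sum of the inputs.
    score = bias + sum(w * symptoms.get(name, 0) for name, w in weights.items())
    return 1 / (1 + math.exp(-score))  # squash into a value between 0 and 1

patient = {"fever": 1, "cough": 1, "shortness_of_breath": 0, "age_over_65": 1}
probability = predict_pneumonia_risk(patient)
print(f"Predicted probability: {probability:.2f}")
# The binary answer a clinician would see, with no clinical logic to inspect behind it.
print("Flagged for pneumonia" if probability > 0.5 else "Not flagged")
```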

Physicians integrating AI into their clinical practice have begun using AI medical diagnoses as a starting point, not as the end result. A 2025 study published in Nature Medicine by Dr. Luciana D’Adderio observed physicians at three major U.K. stroke hubs who adopted AI applications in stroke care, starting with diagnosis and ending with a treatment plan. Many of the observed physicians approached the integration of AI by starting with AI’s suggested diagnosis and then working backwards to assess its validity. This process allows physicians to maintain clinical authority while still benefiting from AI’s diagnostic capabilities.

“This is an indication that clinicians have learnt to trust AI; they are prepared to let AI ‘go first’ in providing the diagnostic answer,” says D’Adderio. “It also means that the clinicians’ trust will be reinforced over time, assuming AI outputs prove consistently reliable.”

Who Is to Blame for Errors?

A fear surrounding AI in diagnosis is that AI outputs will not be consistently reliable. If an AI diagnosis is incorrect and the mistake is not caught by a physician, who is to blame? Under current U.S. medical malpractice law, courts generally judge only the physician’s conduct when assigning blame. The designers and manufacturers of AI typically bear no legal responsibility in the event of an AI-assisted diagnosis gone wrong.

“Initially there was a lot of mistrust,” says D’Adderio. “AI had not been clinically validated, and the AI vendor’s performance data did not replicate at the local hospital level. This reluctance was also due to the fact that, ultimately, it is the clinician who is responsible for any errors, not AI or the vendor.”

The current practice of placing full responsibility on the physician stands apart from other fields that involve automation and machinery. When a plane malfunctions, for instance, responsibility is shared among pilots and manufacturers. Hatherley argues that reform is needed to shift responsibility for failures of AI in medicine.

“The people we should be focusing on when it comes to attributing responsibility for harms that result from these systems are the designers and developers of these systems,” Hatherley says. “The actual object of trust is the designers of the systems themselves.” Right now, there are few guardrails to hold the manufacturers of AI systems accountable. Hatherley’s concern is that if we begin attributing human characteristics to AI diagnosis software, we run the risk of exaggerating its capabilities in medicine.

“We can’t trust the system in the same way we can trust a physician, where they have some kind of motivation or intentionality to behave in ways that align with what we trust them to do,” says Hatherley.

AI in Radiology

This lack of clear motivation within AI has the potential to cause confusion during an AI-assisted diagnosis process. A 2022 study published in Organization Science by Dr. Sarah Lebovitz followed radiologists in a major U.S. hospital where AI tools were being used. Radiologists in three different departments were instructed to analyze imaging to come up with their own diagnosis first, then use an AI tool to produce a diagnosis. More often than not, AI diverged from the radiologists’ initial diagnosis. The radiologists often became frustrated because they could not understand why or how AI had reached the conclusion it did.

In two of the three departments, the radiologists’ frustrations caused them to ignore AI suggestions entirely. They were unable to understand why AI had flagged certain areas of the imaging, especially in breast cancer screenings. The time pressure and emotional weight of giving a cancer diagnosis were, for these radiologists, irreconcilable with an AI tool that diverged from their initial opinion so often. AI’s reasoning for its recognition of imaging patterns remained unclear, so the radiologists trusted their own clinical judgement more.

This resistance persists despite growing evidence that AI-assisted diagnosis actually has a higher rate of cancer detection than human radiologists alone. A 2023 Swedish study published in The Lancet showed that among 80,000 women undergoing mammograms, AI-assisted screening identified 20 percent more breast cancers than screening by physicians alone. Another study published in the Journal of the American Medical Association suggests that AI alone may perform better than AI-assisted diagnosis. This 2023 study showed a median diagnostic performance score of 92 percent for AI on its own, compared with 76 percent for physicians assisted by AI and 74 percent for physicians working alone.

In the third of the three departments in Lebovitz’s study, radiologists found it worthwhile to interrogate AI diagnoses and reconcile them with their own initial diagnoses. Ultimately, they integrated AI into their diagnostic process, accepting some suggestions and overruling others. Radiologists in this department reported feeling more confident in their final diagnosis when it was AI-assisted. The AI tool, though its reasoning was still difficult to understand, actually helped restore trust in the radiologists’ diagnoses. Radiologists who collaborated with AI focused less on trying to understand exactly how AI came to its decision and more on using their expertise to validate or reject AI’s decision.

“Transparency is overrated,” says D’Adderio. “One cannot understand how AI comes to its decisions, but they can verify the validity of both inputs (patient datasets) and outputs (AI’s decision).”

In one especially interesting case, during a period of technical issues, a radiologist from the group that opted to integrate AI was reluctant to write an official diagnosis without AI’s “second opinion.” While physicians’ trust in AI diagnoses is at the forefront of the conversation, some argue that incorporating AI into diagnosis may have long-term effects on physicians’ trust in themselves. D’Adderio raises the concern in her 2025 study that physicians may fall victim to “automation bias,” the tendency to over-rely on outputs from automated systems, which could negatively impact clinicians’ expertise in the long run.

Physicians Need to Learn to Use AI

The debate over AI in medical diagnosis often begins with an assessment of the AI system’s diagnostic accuracy. There is growing evidence that this accuracy is, in many cases, significantly higher than that of physicians. However, accuracy alone is not enough to earn physicians’ trust. Whether physicians come to trust AI depends on accountability, collaboration, and education about AI. If physicians continue to confidently apply their own clinical judgement when working with AI, long-term trust in these systems may become more widespread.

It is unlikely that AI will replace physicians any time in the near future. AI does, however, have the potential to reshape how diagnosis is performed. As AI systems continue to advance, the question will no longer be about accuracy, but about how AI is incorporated into physicians’ domain. Trust among physicians in AI diagnoses is more likely to grow from improving how physicians and AI work together during diagnosis than from simply demonstrating that AI is more accurate than human physicians.

When asked what will be essential for clinicians to build an appropriate level of trust towards AI, D’Adderio responded, “It requires a structured education regime all clinicians need to undergo as part of their formal training. This learning needs to be constantly reinforced and updated as new understandings about AI surface.”