Conference Presentation

Towards Transparency: A Quantitative Evaluation of Mammography AI False Negatives in a Prospective Multi-Site Clinical Deployment

RSNA 2022

Jiye G. Kim, Ryan Shnitman, Leeann Louis, Yun Boyer, William Lotter, Bryan Haslam

November 30, 2022

Purpose

While there is growing evidence that AI can help radiologists detect breast cancer on screening mammography, radiologists want to know when AI might miss cancer in clinical practice. Here, we studied an FDA-cleared AI device deployed at 137 U.S. sites to characterize when it detected cancer and, importantly, when it did not detect cancer in clinical use.

Materials and Methods

AI results from more than 610,500 screening digital breast tomosynthesis (DBT) mammography exams were analyzed. False negative exams were defined as screening exams that the AI did not flag as suspicious but that were followed by malignant pathology. True positive exams were defined as those that the AI flagged as suspicious and that were followed by malignant pathology. Clinical information for the false negative exams was compared with that of a randomly selected sample of true positive exams. The clinical information included family history of cancer, breast density, BI-RADS assessment, visibility of findings on the mammogram, lesion type, whether the exam was read with prior exams, whether the screening mammogram was interpreted together with another imaging modality, and cancer pathology and immunohistochemistry profile.
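
The case definitions and the comparison design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's analysis code: the `Exam` record and its fields (`ai_flagged`, `malignant_pathology`, `read_with_priors`), the helper functions, the sample size `k`, and the toy data are all hypothetical. The abstract does not state which statistical test, if any, was applied, so the sketch stops at descriptive proportions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exam:
    # Hypothetical record layout; these field names are illustrative only.
    ai_flagged: bool           # AI marked the exam as suspicious
    malignant_pathology: bool  # exam was followed by malignant pathology
    read_with_priors: bool     # one example clinical factor


def outcome(exam: Exam) -> Optional[str]:
    """Label a cancer exam as a true positive ("TP") or false negative ("FN")."""
    if not exam.malignant_pathology:
        return None  # no malignant pathology: outside this comparison
    return "TP" if exam.ai_flagged else "FN"


def sample_true_positives(exams: List[Exam], k: int, seed: int = 0) -> List[Exam]:
    """Draw a random comparison sample of up to k true-positive exams."""
    tps = [e for e in exams if outcome(e) == "TP"]
    return random.Random(seed).sample(tps, min(k, len(tps)))


def factor_rate(exams: List[Exam], factor: str) -> float:
    """Fraction of the given exams for which the boolean `factor` is True."""
    if not exams:
        return float("nan")
    return sum(getattr(e, factor) for e in exams) / len(exams)


# Toy data for illustration only (not study data).
exams = [
    Exam(ai_flagged=True, malignant_pathology=True, read_with_priors=False),
    Exam(ai_flagged=True, malignant_pathology=True, read_with_priors=True),
    Exam(ai_flagged=False, malignant_pathology=True, read_with_priors=True),
    Exam(ai_flagged=True, malignant_pathology=False, read_with_priors=True),
]
false_negatives = [e for e in exams if outcome(e) == "FN"]
tp_sample = sample_true_positives(exams, k=2)
print(factor_rate(tp_sample, "read_with_priors"))        # 0.5
print(factor_rate(false_negatives, "read_with_priors"))  # 1.0
```

Because every false negative is compared against a random subset of true positives rather than all of them, fixing the `seed` keeps the comparison sample reproducible.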

Results

Across the dataset, a total of 2358 patients were diagnosed with breast cancer, of whom 2198 (93.2%) were flagged by the AI as suspicious (true positives) and 160 (6.8%) were not (false negatives). Compared with true positives, false negatives were more often read with prior exams (78.6% of true positives vs. 86.3% of false negatives), more often involved lesions not visible on mammography (2.0% vs. 11.9%), and were more likely to include an asymmetry (12.5% vs. 39.0%). Other clinical factors were comparable between true positives and false negatives.
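
The headline percentages can be checked with simple arithmetic. This sketch uses only the counts and percentages stated above. It reads the first figure in each pair as the true-positive value, which matches the direction described in the text. The variable names are illustrative, and the differences are descriptive only, with no significance testing implied.

```python
# Counts reported in the abstract (patients diagnosed with breast cancer).
total_cancers = 2358
true_positives = 2198   # flagged as suspicious by the AI
false_negatives = 160   # not flagged as suspicious

assert true_positives + false_negatives == total_cancers

flagged_rate = true_positives / total_cancers
missed_rate = false_negatives / total_cancers
print(f"flagged: {flagged_rate:.1%}, not flagged: {missed_rate:.1%}")
# flagged: 93.2%, not flagged: 6.8%

# (true positive %, false negative %) for the factors that differed.
factors = {
    "read with prior exams": (78.6, 86.3),
    "lesion not visible on mammogram": (2.0, 11.9),
    "asymmetry": (12.5, 39.0),
}
# Percentage-point difference, false negatives minus true positives.
diffs = {name: round(fn - tp, 1) for name, (tp, fn) in factors.items()}
for name, diff in diffs.items():
    print(f"{name}: {diff:+.1f} percentage points in false negatives")
```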

Conclusion

While the AI flagged the vast majority of cancer exams as suspicious, false negatives were more likely to be 1) read with priors, indicating the unique role of radiologists in detecting changes over time; 2) not visible on mammograms (e.g., visible only on ultrasound); and 3) asymmetries, highlighting that some lesion types may be more challenging for the AI to detect. These results were not unexpected, given that the AI was not explicitly trained to 1) compare exams across time, 2) detect lesions on other imaging modalities (e.g., ultrasound), or 3) compare findings across lateralities. Understanding the strengths and weaknesses of AI can help radiologists complement it and interpret screening mammograms optimally.

Clinical Relevance

Transparency is essential for the clinical adoption of AI. Large prospective datasets can help build trust and show how best to use the technology, so that the potential of AI to improve patient care can be realized.