A letter from Dr. Ashirbani Saha
Hello BRIGHT Run Family,
I hope you are enjoying a nice winter.
I’d like to share some research findings with you. A recent study, on which I was a team member, found that even good-quality publications on the use of artificial intelligence (AI) to automatically analyse tissue sample images from patients with (or suspected of having) breast cancer, for diagnosis, classification, prediction of treatment outcomes, and prognosis, do not follow a common protocol for data collection and reporting, and vary considerably from one another.
This is a significant finding, derived with scientific rigour, in the ongoing effort to improve AI’s usefulness for patients with breast cancer worldwide.
The study grew out of the master’s thesis and publication of my student, Dr. Ricardo Gonzalez (supervised by Dr. Cynthia Lokker of McMaster’s eHealth program), which reviews published research on the use of breast cancer-related histopathology images. These are microscopic images of breast tissue, whether showing invasive disease, ductal carcinoma in situ, or other suspicious but non-malignant tumours.
The review was done specifically to evaluate the performance of AI models (built with machine learning) that analyse histopathology images, have undergone external validation (a marker of higher quality in an AI model development study), and have been reported in a journal or conference publication.
External validation, in the context of AI-model evaluation, means testing your model on a dataset that differs meaningfully, for one or more reasons, from the dataset the model was trained on. For example, if you train your model on data from hospital A and test it on data from hospital B, that is an external validation, as the two hospitals may record their data differently.
The validation is typically tougher when hospitals A and B belong to different hospital systems or are in different provinces or countries. There is more than one way to form an external dataset; for example, you can build your test dataset from years not included in the training data.
Ricardo’s search identified 2,011 relevant publications for screening at the intersection of breast cancer and histopathology image analysis using machine learning (a subset of AI). However, only 10 (0.5%) of these studies reported external validation for diagnosis, classification, prediction of treatment outcomes, or prognosis.
A fair comparison among these studies (those that performed external validation) was difficult because of variability in patient disease (e.g., triple-negative-only patients vs. high-grade DCIS), image types (whole-slide images vs. tissue microarrays), and other factors. All of these details are important to consider when training models for future use: the data-driven models of AI can produce different results when data are inconsistent or differ in quality and nature.
Therefore, in this publication we recommended standardizing methods and protocols for this type of study and making some validation datasets available to researchers.
You can download and read the full publication here: https://www.sciencedirect.com/science/article/pii/S2153353923001621
As I have highlighted in my previous letters to you, teamwork is a vital component of research. I am fortunate to have worked with this wonderful team.
I know you can relate to this, as you often work in teams to support the yearly fundraising for BRIGHT Run. I want to thank all the teams, and all the individual fundraising efforts, that add a special dimension to our event. It surely feels wonderful to meet the team and individual goals.
You all add BRIGHTness to the BRIGHT Run family!
Stay safe and warm!
Dr. Ashirbani Saha is the first holder of the BRIGHT Run Breast Cancer Learning Health System Chair, a permanent research position established by the BRIGHT Run in partnership with McMaster University.