Hello BRIGHT Run Family,
This was one of those brighter days in winter. I took time off for my daily stroll and was walking along Pier 7. I saw several Canada geese, mallards, and swans going about their daily affairs and swimming in groups in the cold water.
To my surprise, I found a solitary Canada goose, not swimming, but walking alone on the ice. It seemed to me that it wanted to accomplish something in solitude, using a different route from the common ones. It was odd behaviour compared with that of the rest of the waterfowl present that day.
In the data world, such anomalies are often called outliers. The problem of detecting an outlier becomes easier when you can properly define the properties of non-outliers or inliers.
Often, you can pre-define the properties of inliers. However, there are times when you cannot define the properties exhaustively beforehand. Your data is then the only source to guide you in defining the properties, with the possibility of refining them as you obtain more data.
This refinement may happen as you learn more about the context of the data. For example, rainfall is a common phenomenon in weather. However, there are places on Earth where rainfall is rare. Only in those places could rainfall be a weather anomaly.
Does rarity imply anomaly, then? Not necessarily. If you can capture enough instances of the rare or less frequent phenomenon, you can explore its details and context, and can probably change the rules of inclusion for the inliers. This is an iterative process, one that requires repeated adjustment and reassessment, but it must be followed in order to expand the bounds of knowledge.
Also, rarity, or being small in proportion, doesn’t imply less importance. We are working on segmenting and detecting lesions or tumours in medical images. However, a lesion occupies only a small part of the whole organ. For example, breast tumours (when detected early) occupy a small portion of the breast compared with the healthy tissue. Thus, an AI system will need to learn from small amounts of rare data.
Those were my reflections upon finding the solitary goose!
Best,
Ashirbani
Dr. Ashirbani Saha is the first holder of the BRIGHT Run Breast Cancer Learning Health System Chair, a permanent research position established by the BRIGHT Run in partnership with McMaster University.