How AI is Getting Smarter: Combining Text and Images in 2025

Hello BRIGHT Run Family, 

Hope you had a nice start to 2025. As the year unfolds, we will gain new experiences and grow as human beings. 

Adding to the information we know about any object, event, or incident can potentially help us make better decisions. Suppose you want to buy a flower vase. A description of its feel, together with its look in the form of a photo, would give you a better idea than knowing its feel or look alone.  

The same is true for an Artificial Intelligence (AI) model that is designed to benefit from multiple sources of data (multimodal) instead of processing a single form of data. 

In several of my previous letters, I have discussed ChatGPT. ChatGPT takes text as input, processes it, and generates the output you need. ChatGPT is based on a ground-breaking AI architecture called TRANSFORMERS (not from the planet Cybertron!). Architecture refers to the basic structure of an AI algorithm that takes inputs and processes them to produce an output.  

Transformers have revolutionized the performance of AI models. Using a mechanism called self-attention, they capture the context and relevance of the information we provide, and of the information we seek, better than their predecessors. 
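
For the curious, here is a tiny sketch in Python (using the numpy library) of the idea behind self-attention. The words, numbers, and vector sizes below are invented purely for illustration, and real transformers use learned "query", "key", and "value" projections that this toy example skips. 

    import numpy as np

    # Toy example: three words, each represented by a tiny vector.
    # (Real models use vectors with thousands of dimensions; these numbers are made up.)
    words = ["the", "flower", "vase"]
    X = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])

    # Compare every word with every other word to get attention scores.
    scores = X @ X.T

    # Turn the scores into weights that sum to 1 for each word (a "softmax").
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each word's new representation is a weighted mix of all the words,
    # so every word "pays attention" to its surrounding context.
    context = weights @ X
    print(np.round(weights, 2))

In words: every word looks at every other word, decides how relevant each one is, and blends that context into its own meaning. 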

Text is one type of data modality. Images (visuals) and audio are other types. Newer transformer-based AI models can process multiple modalities of data.  

The transformer-based AI models that I am using in my research are called vision-language models (VLMs). VLMs process image (visual) and text data together. They are helpful in applications such as image captioning (automatically generating captions for images, which can benefit people who are visually impaired) and visual question answering (answering questions about visual content, which is helpful in education and customer support). 
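
To give you a flavour of what working with a VLM looks like, here is a minimal illustrative sketch in Python. It assumes the open-source Hugging Face "transformers" library and a publicly available captioning model (BLIP); it is not the specific model or code from my research, and the image file name is hypothetical. 

    from transformers import pipeline

    # Load a publicly available image-captioning model (one example of a VLM).
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

    # Give the model a photo (say, of a flower vase) and it writes a caption for it.
    result = captioner("flower_vase.jpg")   # hypothetical local image file
    print(result[0]["generated_text"])      # prints a short sentence describing the photo

Behind those few lines, the model is attending to the image and to the words it has written so far, using the same attention idea sketched above. 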

As I provide you with research updates this year, we will explore some VLMs in action. 

Stay well. 

Best, 

Ashirbani 

Dr. Ashirbani Saha is the first holder of the BRIGHT Run Breast Cancer Learning Health System Chair, a permanent research position established by the BRIGHT Run in partnership with McMaster University.