MiWord of the Day Is… Fourier Transform!

Ok, a what Transform now??

In the early 1800s, Jean-Baptiste Joseph Fourier, a French mathematician and physicist, introduced the transform in his study of heat transfer. The idea seemed preposterous to many mathematicians at the time, but it has now become an important cornerstone in mathematics.

So, what exactly is the Fourier Transform? The Fourier Transform is a mathematical transform that decomposes a function into its sine and cosine components. It decomposes a function depending on space or time into a function depending on spatial or temporal frequency.

Before diving into the mathematical intricacies of the Fourier Transform, it is important to understand the intuition and the key idea behind it. The main idea of the Fourier Transform can be explained simply using the metaphor of creating a milkshake.

Imagine you have a milkshake. It is hard to understand a milkshake just by looking at it; questions such as "What gives this shake its nutty flavour?" or "What is the sugar content of this shake?" are hard to answer when we are simply handed the milkshake. It is much easier to answer them if we know the recipe and the individual ingredients that make up the shake.

So, how exactly does the Fourier Transform fit in here? Given a milkshake, the Fourier Transform allows us to find its recipe and determine how it was created; it presents the individual ingredients and the proportions in which they were combined to make the shake. This raises two questions: how does the Fourier Transform determine the milkshake "recipe", and why would we even use this transform to get the "recipe"? To answer the first, we determine the recipe by running the milkshake through filters that extract each individual ingredient. As for the second, recipes are much easier to analyze, compare, and modify than the milkshake itself; we can create new milkshakes by analyzing and modifying the recipe of an existing one.

Finally, after deconstructing the milkshake into its recipe and ingredients and analyzing them, we can simply blend the ingredients back together to get the milkshake.

Extending this metaphor to signals, the Fourier Transform essentially takes a signal and finds the recipe that made it. It provides a specific viewpoint: “What if any signal could be represented as the sum of simple sine waves?”.

By providing a method to decompose a function into its sine and cosine components, we can analyze the function more easily and create modifications as needed for the task at hand.

A common application of the Fourier Transform is in sound editing. If sound waves can be separated into their "ingredients" (i.e., the bass and treble frequencies), we can modify the sound depending on our requirements. We can boost the frequencies we care about while suppressing the frequencies that cause disturbances in the original sound. Similarly, there are many other applications of the Fourier Transform, such as image compression, communication, and image restoration.
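To make the "recipe" idea concrete, here is a minimal sketch in Python (the signal, sampling rate, and the 50 Hz "disturbance" are all invented for illustration):

```python
import numpy as np

# A toy "milkshake": a 5 Hz ingredient plus an unwanted 50 Hz ingredient
fs = 1000                      # sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)    # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

# The Fourier Transform reveals the "recipe": which frequencies are present
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# Suppress the unwanted 50 Hz ingredient, then blend back with the inverse transform
spectrum[np.abs(freqs - 50) < 1] = 0
cleaned = np.fft.irfft(spectrum)
```

Zeroing the 50 Hz bin and inverting the transform is exactly the "remove one ingredient, then blend the rest back" step from the milkshake metaphor.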

This is incredible! An idea that the mathematics community was once skeptical of now has a variety of real-world applications.

Now, for the fun part, using Fourier Transform in a sentence by the end of the day:

Example 1:

Koby: “This 1000-piece puzzle is insanely difficult. How are we ever going to end up with the final picture?”

Eng: “Don’t worry! We can think of the puzzle pieces as being created by taking the ‘Fourier transform’ of the puzzle picture. All we have to do now is take the ‘inverse Fourier Transform’ and then we should be done!”

Koby: “Now when you put it that way…. Let’s do it!”

Example 2: 

Grace: “Hey Rohan! What’s the difference between a first-year and fourth-year computer science student?

Rohan: “… what?”

Grace: “A Fouri-y-e-a-r Transform”

Rohan: “…. (╯°□°)╯︵ ┻━┻ ”

I’ll see you in the blogosphere…

Parinita Edke

The MiDATA Word of the Day is… “clyster”

Holy mother of pearl! Do you remember when the first Pokémon games came out on the Game Boy? Never heard of Pokémon? Get up to speed by watching this short video. Or even better! Try out one of the games in the series, and let me know how that goes!

The name of the Pokémon in this picture is Cloyster. You may remember it from Pokémon Red or Blue. But! Cloyster, in fact, has nothing to do with clysters.

In olden days, clyster meant a bunch of persons, animals or things gathered in a close body. Now, it is better known as a cluster.

You yourself must identify with at least one group of people. Your roles, qualities, and actions make you unique, but at the same time, they place you in a group of others with the same characteristics.

You yourself fall into multiple groups (or clusters). This could be your friend circle or perhaps people you connect with on a particular topic. At the end of the day, you belong to these groups. But is there a way we can determine that you, in fact, belong?

Take for example Jack and Rose from the Titanic. Did Jack and Rose belong together?

If you take a look at the plot to the right, Jack and Rose clearly belong to two separate groups (clusters) of people. Thus, they do not belong together. Case closed!

But perhaps it is a matter of perspective? Let’s take a step back…

Woah! Now you could say that they’re close enough, they might as well be together! Compared to the largest group, they are more similar than they are different. And so, they should be together!

For the last time, we may have been looking at this completely wrong! From the very beginning, what are we measuring on the x-axis and on the y-axis of our graph?

Say it was muscle mass and height. That alone shouldn’t tell us if Rose and Jack belong together! And yet, that is exactly what we could have done. But if not those, then what..?
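A hand-rolled sketch of the "do they belong together?" question, using a minimal two-cluster k-means on made-up passenger data (the feature values for Jack and Rose are pure invention):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical groups of passengers in a 2D feature space
group_a = rng.normal([0, 0], 0.5, size=(20, 2))
group_b = rng.normal([5, 5], 0.5, size=(20, 2))
points = np.vstack([group_a, group_b])

# A minimal 2-means loop: assign each point to its nearest centroid, recompute
centroids = points[[0, -1]].astype(float)
for _ in range(10):
    dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

# Invented feature values for Jack and Rose
jack, rose = np.array([0.3, -0.2]), np.array([4.8, 5.1])
jack_cluster = int(np.linalg.norm(jack - centroids, axis=1).argmin())
rose_cluster = int(np.linalg.norm(rose - centroids, axis=1).argmin())
```

Of course, as the post points out, the answer depends entirely on which features we put on the axes in the first place.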

Now for the fun part (see the rules here), using clyster in a sentence by the end of the day:

Serious: Did you see the huge star clysters last night? I heard each one contained anywhere from 10,000 to several million stars…

Less serious: *At a seafood restaurant by the beach* Excuse me, waiter! I’d like one of your freshest clysters, please. – “I’m sorry. We’re all out!”

…I’ll see you in the blogosphere.

Stanley Hua

Stanley Hua in ROP299: Joining the Tyrrell Lab during a Pandemic

My name is Stanley Hua, and I’ve just finished my 2nd year in the bioinformatics program. I have also just wrapped up my ROP299 with Professor Pascal. Though I have yet to see his face outside of my monitor screen, I cannot begin to express how grateful I am for the time I’ve been spending at the lab. I remember very clearly the first question he asked me during my interview: “Why should I even listen to you?” Frankly, I had no good answer, and I thought that the meeting didn’t go as well as I’d hoped. Nevertheless, he gave me a chance, and everything began from there.

Initially, I got involved with quality assessment of Multiple Sclerosis and Vasculitis 3D MRI images along with Jason and Amar. Here, I was introduced to the many things Dmitrii can complain about when it comes to taking brain MRI images. Things such as scanner bias, artifacts, types of imaging modalities, and the prevalence of disease play a role in how we can leverage these medical images in training predictive models.

My actual ROP, however, revolved around a niche topic in Mauro and Amar’s project. Their project sought to understand the effect of dataset heterogeneity in training Convolutional Neural Networks (CNNs) by cluster analysis of CNN-extracted image features. Upon extraction of image features using a trained CNN, we end up with high-dimensional vectors representing each image. As a preprocessing step, the dimensionality of the features is reduced by transformation via Principal Component Analysis, then selecting a number of principal components (PCs) to keep (e.g. 10 PCs). The question must then be asked: how many principal components should be used in their methodology? Though it’s a very simple question, I took way too many detours to answer it. I looked at the difference between standardization vs. no standardization before PCA, nonlinear dimensionality reduction techniques (e.g. autoencoders), and comparisons of neural network image representations (via SVCCA), among other things. Finally, I proposed an equally simple method for determining the number of PCs to use in this context: the minimum number of PCs that gives the most frequent resulting value (from the original methodology).
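Stanley’s own selection rule is specific to the lab’s methodology, but a common baseline heuristic, keeping enough PCs to explain a fixed share of the variance, can be sketched as follows (the "CNN features" here are simulated, and the 90% threshold is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for CNN-extracted features: 200 images x 64 dimensions,
# with most of the variance concentrated in the first few directions
features = rng.normal(size=(200, 64)) * np.linspace(3, 0.1, 64)

# PCA via the eigendecomposition of the covariance matrix
centered = features - features.mean(axis=0)
cov = centered.T @ centered / (len(centered) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]      # variance per PC, descending

# Heuristic: keep enough PCs to explain 90% of the total variance
explained = np.cumsum(eigvals) / eigvals.sum()
n_pcs = int(np.searchsorted(explained, 0.90) + 1)
```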

Regardless of the difficulty of the question I sought to answer, I learned more about practices in research, and I even learned about how research and industry intermingle. I only have Professor Pascal to thank for always explaining things in a way that a dummy such as me would understand. Moreover, Professor Pascal always focused on impact: is what you’re doing meaningful, and what are its applications?

I believe that the time I spent with the lab has been worthwhile. It was also here that I discovered that my passion to pursue data science trumps my passion to pursue medical school (big thanks to Jason, Indranil, and Amar for breaking my dreams). Currently, I look towards a future where I can drive impact with data, maybe even in the field of personalized medicine or computational biology. Whoever is reading this, feel free to reach out! Hopefully, I’ll be the next Elon Musk by then…

Transiently signing out,

Stanley Bryan Z. Hua

Jessica Xu’s Journey in ROP299

Hello everyone! My name is Jessica Xu, and I’ve just completed my second year in Biochemistry and Statistics at the University of Toronto. This past school year, I’ve had the wonderful opportunity to do an ROP299 project with Dr. Pascal Tyrrell, and I’d like to share my experience with you all!

A bit about myself first: in high school, I was always interested in life sciences. My favourite courses were biology and chemistry, and I was certain that I would go to medical school and become a doctor. But when I took my first stats course in first year, I really enjoyed it and I started to become interested in the role of statistics in life sciences. Thus, at the end of my first year, while I was looking through the various ROP courses, I felt that Dr. Tyrrell’s lab was the perfect opportunity to explore my budding interest in this area. I was very fortunate to have an interview with Dr. Tyrrell, and even more fortunate to be offered a position in his lab!

Though it may be obvious, doing a research project when you have no research experience is very challenging! Coming into this lab having taken a statistics course and a few computer science courses in first year, I felt I had a pretty good amount of background knowledge. But as I joined my first lab meeting, I realized I couldn’t have been more wrong! Almost every other word was one I’d never heard before! And so, I realized that there was a lot I needed to learn before I could even begin my project.

I then began the journey of my project, which looked at how two dimension reduction techniques, LASSO and SES, performed in an ill-posed problem. It was definitely no easy task! While I had learned a little bit about dimension reduction in my statistics class, I still had a lot to learn about the specific techniques, their applications in medical imaging, and ill-posed problems. I was also very inexperienced in coding, so I had to learn a lot of R on my own and become familiar with the different packages I would have to use. It was a very tumultuous journey, and I spent a lot of time just trying to get my code to work. Luckily, with help from Amar, I was able to figure out some of the errors and issues I was facing with my code.

I learned a lot about statistics and dimension reduction in this ROP, more than I have learned in any other course! But most importantly, I learned a lot about the scientific process and the experience of writing a research paper. If I can provide any advice based on my experience, it’s that sometimes it’s okay to feel lost! You are not expected to have devised a perfect plan of execution for your research, especially when it’s your first time! There will be times when you’ll stray off course (as I often did), but the most valuable lesson I learned in this ROP is how to get back on track. Sometimes you just need to take a step back, go back to the beginning, and think about the purpose of your project and what it is you’re trying to tell people. But it’s not always easy to realize this. Luckily, Dr. Tyrrell has always been there to guide us throughout our projects and to make sure we stay on track by reminding us of the goal of our research. I’m incredibly grateful for all the support, guidance, and time that Dr. Tyrrell has given this past year. It has been an absolute pleasure working in this lab.

Now that I’ve taken my first step into the world of research, with all the new skills and lessons I’ve learned in my ROP, I look forward to all the opportunities and the journey ahead!

Jessica Xu

Today’s MiWORD of the day is… Lasso!

Wait… Lasso? Isn’t a lasso that lariat or loop-like rope that cowboys use? Or perhaps you may be thinking about that tool in Photoshop that’s used for selecting free-form segments!

Well… technically neither is wrong! However, in statistics and machine learning, Lasso stands for something completely different: least absolute shrinkage and selection operator. The term was coined by Dr. Robert Tibshirani in 1996 (who was a UofT professor at the time!).

Okay… that’s cool and all, but what the heck does that actually mean? And what does it do?

Lasso is a type of regression analysis method, meaning it tries to estimate the relationship between predictor variables and outcomes. It’s typically used to perform feature selection or regularization.

Regularization is a way of reducing overfitting of a model, i.e., it removes some of the “noise” and randomness of the data. On the other hand, feature selection is a form of dimension reduction: out of all the predictor variables in a dataset, it selects the few that contribute the most to the outcome variable to include in a predictive model.

Lasso works by applying a fixed upper bound to the sum of the absolute values of the coefficients of the predictors in a model. To ensure that this sum stays within the upper bound, the algorithm shrinks some of the coefficients; in particular, it shrinks the coefficients of predictors that are less important to the outcome. Predictors whose coefficients are shrunk to exactly zero are excluded from the final predictive model.
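As a rough sketch of this shrinkage in action, here is a minimal Lasso fit via cyclic coordinate descent with soft-thresholding (the data, penalty value, and iteration count are illustrative choices, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Only the first two predictors truly matter
beta_true = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent: soft-threshold one coefficient at a time."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = y - X @ beta + X[:, j] * beta[j]          # partial residual
            rho = X[:, j] @ r
            z = X[:, j] @ X[:, j]
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

beta_hat = lasso(X, y, lam=50.0)
```

With a penalty this large, the eight irrelevant coefficients are shrunk exactly to zero, so they drop out of the model entirely, which is precisely the feature selection behaviour described above.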

Lasso has applications in a variety of different fields! It’s used in finance, economics, physics, mathematics, and if you haven’t guessed already… medical imaging! As a state-of-the-art feature selection technique, Lasso is used a lot in turning large radiomic datasets into easily interpretable predictive models that help researchers study, treat, and diagnose diseases.

Now onto the fun part, using Lasso in a sentence by the end of the day! (see rules here)

Serious: This predictive model I got using Lasso has amazing accuracy for detecting the presence of a tumour!

Less serious: I went to my professor’s office hours for some help on how to use Lasso, but out of nowhere he pulled out a rope!

See you in the blogosphere!

Jessica Xu

Jacky Wang’s ROP399 Journey

My name is Jacky Wang, and I am just finishing my third year at the University of Toronto, pursuing a computer science specialist degree. Looking back on this challenging but incredible year, I was honoured to have the opportunity to work in Dr. Tyrrell’s lab as part of the ROP399 course. I would love to share my experience studying and working in the lab.

Looking back, I realize one of the most challenging tasks was getting on board. I felt a little lost at first when surrounded by loads of new information and technologies that I had little experience with before. Though I was excited by the collision of ideas during each meeting, having too many choices could sometimes be overwhelming. Luckily, after doing more literature review and with the help of the brilliant researchers in the lab (a big thank you to Mauro, Dimitri, and of course, Dr. Tyrrell), I started to get a better view of the trajectories of each potential project and what I wanted to get out of this experience. I did not choose the machine learning projects, though they looked as shiny and promising as always (as a matter of fact, they turned out to be successful indeed). Instead, I leaned more towards studying sample size determination methodology, especially the concept of ill-posed problems, which often occur when researchers draw conclusions from models trained on limited samples. It had always been a mystery to me why I would get different and even contradictory results when replicating someone else’s work on smaller sample sizes. From there, I settled on the research topic and moved on to the implementation details.

This year the ROP students came from statistics, computer science, biology, and other backgrounds. I am grateful that Dr. Tyrrell is willing to give anyone who has the determination to study in his lab a chance, even though they may have little research experience and come from various backgrounds. As someone who studies computer science with a limited statistics background, the real challenge lay in understanding all the statistical concepts and designing the experiments. We decided to apply various dimension reduction techniques to study the effect of different sample sizes with many features. I designed experiments around the principal component analysis (PCA) technique, while another ROP student, Jessica, explored the Lasso and SES models in the meantime. It was for sure a long and memorable experience, with much debugging when implementing the code from scratch. But nothing was more rewarding than seeing the successful completion of the code and the promising results.

I feel lucky and grateful that Dr. Tyrrell helped me complete my first research project. He broke down the long and challenging research task into clear and achievable subgoals within our reach. After completing each subgoal, I could hardly believe how close it brought us to the finish line. Taking an ROP course felt so different from attending regular lessons. For most university courses, the topics are already determined, and the materials are almost spoon-fed to you. Sometimes I start to lose the excitement of learning new topics, as I am driven not by curiosity or application needs but by the pressure of being tested. However, taking the ROP course gave me almost complete control of my study. For ROP, I was the one who decided what topics to explore and how to design the experiments. I could immediately test my understanding and put everything I learned into real applications.

I am so proud of all the skills that I picked up in the online lab during this unique and special ROP experience. I would like to thank Dr. Tyrrell for giving me this incredible study experience in his lab. There are so many resources out there to explore and so many excellent researchers to seek help from. I would also like to thank all members of the lab for patiently walking me through each challenge with their brilliant insights.

Jacky Wang

MiWord of the Day Is… dimensionality reduction!

Guess what?

You are looking at a real person, not a painting! This is one of the great works by the talented artist Alexa Meade, who paints on 3D objects to create a 2D painting illusion. Similarly, in the world of statistics and machine learning, dimensionality reduction means what it sounds like: reducing a problem to a lower dimension. Only this time, it is not an illusion.

Imagine a 1x1x1 data point living inside a 2x2x2 feature space. If I ask you to calculate the data density, you will get 1/2 in 1D, 1/4 in 2D, and 1/8 in 3D. This simple example illustrates that data points become sparser in higher-dimensional feature spaces. To address this problem, we need dimension reduction tools to eliminate the boring dimensions (dimensions that do not give much information about the characteristics of the data).

There are mainly two approaches when it comes to dimension reduction. One is to select a subset of features (feature selection), the other is to construct some new features to describe the data in fewer dimensions (feature extraction).

Let us consider an example to illustrate the difference. Suppose you are asked to come up with features to predict the university acceptance rate of your local high school.

You may discard “grade in middle school” for its many missing values; discard “date of birth” and “student name” as they do not play much of a role in university applications; discard “weight > 50kg” as everyone has the same value; and discard “GPA” as it can be calculated from other grade features. If you have been through a similar process, congratulations! You just performed dimension reduction by feature selection.

What you have done is remove the features with many missing values, the least correlated features, the features with low variance, and one of each pair of highly correlated features. The idea behind feature selection is that the data might contain some redundant or irrelevant features which can be removed without too much loss of information.
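The pruning steps above can be sketched as a simple filter (all of the student data is simulated, and the missing-value and correlation thresholds are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Hypothetical student features, column by column
middle_school = np.where(rng.random(n) < 0.7, np.nan, rng.normal(75, 10, n))  # many missing
weight_flag = np.ones(n)                      # "weight > 50kg": same value for everyone
gpa = rng.normal(3.0, 0.4, n)
avg_grade = gpa * 25 + rng.normal(0, 0.5, n)  # nearly a rescaled copy of GPA

X = np.column_stack([middle_school, weight_flag, gpa, avg_grade])
keep = []
for j in range(X.shape[1]):
    col = X[:, j]
    if np.isnan(col).mean() > 0.5:            # too many missing values
        continue
    if np.nanvar(col) < 1e-12:                # constant feature carries no information
        continue
    # drop one of any highly correlated pair we already kept
    if any(abs(np.corrcoef(col, X[:, k])[0, 1]) > 0.95 for k in keep):
        continue
    keep.append(j)
```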

Now, instead of selecting a subset of features, you might try to construct some new features from the old ones. For example, you might create a new feature named “school grade” based on the full history of the academic features. If you have been through a thought process like this, you just performed dimension reduction by feature extraction.

If you would like to use a linear combination, principal component analysis (PCA) is the tool for you. In PCA, variables are linearly combined into a new set of variables, known as the principal components. One way to do so is to give a weighted linear combination of “school grade”, “grade in middle school”, “recommendation letter”, and so on…
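And a sketch of the feature-extraction route: three simulated, correlated grade features collapsed into a single principal component (the feature names and numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
# Three correlated academic features driven by one latent "ability" score
ability = rng.normal(size=100)
grades = np.column_stack([
    ability + rng.normal(0, 0.2, 100),   # school grade
    ability + rng.normal(0, 0.2, 100),   # grade in middle school
    ability + rng.normal(0, 0.2, 100),   # recommendation-letter strength
])

# PCA: project onto the top eigenvector of the covariance matrix
centered = grades - grades.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
combined_grade = centered @ eigvecs[:, -1]   # the new single feature

share = eigvals[-1] / eigvals.sum()          # variance captured by one PC
```

Because all three features track the same latent score, a single principal component captures almost all of the variance, so three dimensions collapse to one with little information lost.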

Now let us use “dimensionality reduction” in a sentence.

Serious: There are too many features in this dataset, and the testing accuracy seems too low. Let us apply dimensionality reduction techniques to reduce the overfitting of our model…

Less serious:

Mom: “How was your trip to Tokyo?”

Me: “Great! Let me just send you a dimensionality reduction version of Tokyo.”

Mom: “A what Tokyo?”

Me: “Well, I mean … photos of Tokyo.”

I’ll see you in the blogosphere…

Jacky Wang

MiWORD of the Day Is… Radiomics FM: Broadcasting the Hidden Stories in Medical Images

At first glance, radiomics sounds like the name of a futuristic radio station:
“Welcome back to Radiomics FM, where all your favorite tumors are top hits!”

But no, radiomics isn’t about DJs, airwaves, or tuning into late-night medical jams. Instead, it’s about something even cooler: finding hidden patterns buried deep inside medical images and letting ML models “listen” to what those patterns are trying to say.

Imagine staring at a blurry shadow on the wall. Is it a cat? A chair? A really bad haircut?

Medical images, like CT scans, MRIs, and ultrasounds, can feel just as mysterious to the naked eye. They’re full of shapes, textures, and intensity patterns that look like a mess… until you start digging deeper.

That’s where radiomics comes in. Radiomics acts like a detective with a magnifying glass, picking out tiny, subtle clues inside the fuzziness. It systematically extracts hundreds, sometimes even thousands, of quantitative features from images, including:

  • Texture features (like entropy, smoothness, or roughness)
  • Shape descriptors (capturing the size, compactness, or irregularity of objects)
  • First-order intensity statistics (how bright or dark different regions are)
  • Higher-order patterns (relationships between pixel groups, like GLCM and GLRLM matrices)
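As a hedged, from-scratch illustration of a couple of these features, here is a toy computation on a made-up "scan" (real pipelines use dedicated libraries such as PyRadiomics; the image below is just random noise):

```python
import numpy as np

rng = np.random.default_rng(0)
# A made-up 64x64 "scan" quantized to 8 gray levels
image = (rng.random((64, 64)) * 8).astype(int)

# First-order intensity statistics
mean_intensity = image.mean()
hist = np.bincount(image.ravel(), minlength=8) / image.size
entropy = -np.sum(hist[hist > 0] * np.log2(hist[hist > 0]))

# A minimal GLCM over horizontally adjacent pixel pairs
glcm = np.zeros((8, 8))
for a, b in zip(image[:, :-1].ravel(), image[:, 1:].ravel()):
    glcm[a, b] += 1
glcm /= glcm.sum()

# GLCM contrast: how different neighbouring gray levels tend to be
contrast = sum(p * (i - j) ** 2 for (i, j), p in np.ndenumerate(glcm))
```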

Each of these features gets transformed into structured data, powerful numbers that machine learning models can analyze to predict clinical outcomes. Instead of relying only on human interpretation, radiomics opens a new window into understanding:

  • Will the tumor grow fast or stay slow?
  • Will the patient respond well to a certain treatment?
  • Could we detect early signs of disease long before symptoms appear?

Fun Fact: Radiomics can spot differences so subtle that even expert radiologists can’t always detect them. It’s like giving X-ray vision… to an already X-rayed image. By turning complex images into rich datasets, radiomics is revolutionizing how we approach personalized medicine. It allows researchers to build predictive models, identify biomarkers, and move toward earlier, more accurate diagnoses without the need for additional invasive biopsies or surgeries.

Radiomics reminds us that in science, and in life, what we see isn’t always the full truth. Sometimes, it’s the quiet, hidden patterns that matter most. So next time you see a grayscale ultrasound or a mysterious CT scan, remember: Behind those shadows, there’s a secret world of patterns and numbers just waiting to be uncovered.

Now, try using radiomics in a sentence by the end of the day!

Serious: “Radiomics enables earlier detection of subtle tumor changes that are invisible to the human eye.”

Not so serious: “I’m using radiomics to decode my friend’s emotions, because reading faces is harder than reading scans.”

See you next time in the blogosphere, and don’t forget to tune out Radiomics FM!

Phoebe (Shih-Hsin) Chuang

Phoebe (Shih-Hsin) Chuang’s ROP299 Journey 

Hi everyone! My name is Phoebe (Shih-Hsin) Chuang, and I’m a third-year Computer Science Specialist student with a minor in Statistics and a focus in Artificial Intelligence. This year, I had the opportunity to work on my first formal research project involving machine learning in the field of medical imaging. Although the experience was often stressful and full of challenges, it has definitely been one of the most meaningful and transformative learning experiences of my undergraduate academic journey so far.

Before starting this ROP, I had no prior experience in either machine learning or medical imaging. Choosing a research topic initially felt overwhelming. Formulating a good research question required a deep understanding of the current state of the field, so I spent a great deal of time reading papers to grasp major trends such as image generation, multimodal learning, image segmentation, and classification tasks. Eventually, I decided to focus on adnexal mass classification using ultrasound images from the lab.

A major challenge for this project was the small dataset size compared to those typically used in current literature. Recognizing this limitation, I explored approaches specifically designed for small data scenarios. I found that radiomics was particularly promising, especially given that deep learning models typically require large datasets to generalize well. To make my approach more nuanced, I chose not just to use extracted radiomics features in numeric form, but to generate radiomic feature maps. This allowed me to integrate them directly into convolutional neural networks, leveraging CNNs’ strengths in learning from images.

Although it may seem minor next to choosing the research topic and the technical exploration, one of the biggest lessons I learned was the importance of keeping my code, folders, and documentation organized. Without a clear structure from the beginning, it became very easy to get lost, especially after pausing work for a few days. If I could redo the project, I would definitely prioritize setting up a consistent, organized structure early on to save a lot of confusion and debugging time later.

Looking back, I am deeply grateful to Dr. Tyrrell for offering me this invaluable research opportunity. Through weekly meetings, Dr. Tyrrell emphasized that the primary goal of this experience was not simply achieving great results, but learning the full research process, from identifying gaps in knowledge to formulating research questions and hypotheses, designing experiments, and performing rigorous statistical analyses (since this was a statistics department course!). I would also like to sincerely thank Noushin, our postdoc, whose insightful feedback and support helped me greatly in refining my research questions and overcoming challenges during implementation. Finally, I want to thank everyone else in the lab for their encouragement, shared experiences, and thoughtful suggestions during meetings. It was both inspiring and motivating to see everyone’s projects evolve alongside mine.

This ROP journey has definitely been a steep but rewarding learning curve. It has brought me one step closer to becoming an independent researcher, and I look forward to carrying the skills, mindset, and resilience I built this year into my future research and career endeavours.

Xin Lei’s Personal Reflection

Hi! I’m Xin Lei! I was a second-year Computer Science Specialist and Molecular Genetics major student when I began my ROP with Professor Tyrrell.

My project focused on developing a framework that uses Latent Diffusion Models (LDMs) to generate high-fidelity gastrointestinal (GI) medical images from segmentation masks. 

I trained a two-stage pipeline: first, a VQ-GAN model to encode the structure of unlabeled GI images into a latent space, and then a Latent Diffusion Model conditioned on segmentation masks to generate corresponding realistic GI tract images. To enhance anatomical diversity, I also designed a novel mask interpolation pipeline to create intermediate anatomical configurations, encouraging the generation of diverse and realistic segmentation-image pairs. It was challenging to synthesize new, varied, and coherent medical images for segmentation tasks, and to push beyond the limitations of existing inpainting and stitching-based generation methods.
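The lab's actual interpolation pipeline is more involved, but the general idea of blending two segmentation masks can be sketched with naive signed distance fields (the circular "organs", their positions, and the blend weight are all invented, and a real implementation would use a proper distance transform):

```python
import numpy as np

h = w = 32
yy, xx = np.mgrid[:h, :w]
# Two hypothetical binary masks: the same circular "organ" at two positions
mask_a = (((yy - 12) ** 2 + (xx - 12) ** 2) < 36).astype(float)
mask_b = (((yy - 18) ** 2 + (xx - 18) ** 2) < 36).astype(float)

def signed_distance(mask):
    """Brute-force signed distance: negative inside the mask, positive outside."""
    coords = np.stack([yy, xx], axis=-1).reshape(-1, 2).astype(float)
    fg = np.argwhere(mask > 0).astype(float)
    bg = np.argwhere(mask == 0).astype(float)
    d_fg = np.sqrt(((coords[:, None] - fg[None]) ** 2).sum(-1)).min(axis=1)
    d_bg = np.sqrt(((coords[:, None] - bg[None]) ** 2).sum(-1)).min(axis=1)
    return (d_fg - d_bg).reshape(mask.shape)

# Blend the two distance fields, then re-threshold to get an in-between shape
alpha = 0.5
sdf_mid = (1 - alpha) * signed_distance(mask_a) + alpha * signed_distance(mask_b)
intermediate = (sdf_mid < 0).astype(float)
```

The blended mask sits between the two originals, giving an intermediate anatomical configuration that can be paired with a generated image.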

Overall, it was a lot of paper reading, GitHub repositories visited, and overnight coding sessions, all of which would have been impossible without Professor Tyrrell’s continual support and advice! My biggest mistake was not spending enough time reading about the best current methods for solving my problem of interest. Indeed, countless hours would have been saved if I had found the right repositories and research papers earlier, where others had already implemented parts of the ideas I was trying to build!

Reflecting on my ROP journey, the most difficult part was avoiding the endless rabbit holes of technical optimizations. I would often find myself spending days obsessing over marginal model improvements, investigating every possible architectural tweak or hyperparameter adjustment I could think of. While these deep dives were fun and intellectually stimulating, they were dangerous because no project could ever be delivered on time if perfection was the only goal.

I owe a huge thanks to Professor Tyrrell, who repeatedly pulled me back out of these tangents and helped me refocus on moving the project forward. His guidance taught me one of the most valuable lessons of research: perfect is the enemy of good. A deliverable, working project is far more valuable than an imaginary, flawless one stuck in perpetual revision.

In the end, I am proud of what I accomplished, not just technically, but also in learning how to think more strategically about research. This experience has cemented my excitement about applying AI to real-world medical problems, and I am deeply grateful to Professor Tyrrell and the MiDATA lab for giving me this incredible opportunity.

I can’t wait to see where this journey will take me next!

Xin Lei Lin

MiWord of the Day is… Diffusion!

OK, what comes to mind when you hear the word diffusion? Perfume spreading through a room? A drop of ink swirling into a glass of water? When I first heard the term “diffusion model”, I thought of my humidifier, chaotically diffusing water droplets in my room.

But today, diffusion has taken on a very new meaning in the world of medical imaging!

You’ve probably heard a lot about GPT recently: models that can generate almost anything, from stories and poems to computer code. But did you know that alongside GPT for text, there are other types of models that generate images, like beautiful paintings, photorealistic pictures… and yes, even medical images?

This is where the “diffusion” in diffusion models comes in! Just like my humidifier slowly releases tiny water droplets into the air, diffusion models spread random noise across an image and then learn to reverse the process, removing the noise step by step until something meaningful emerges! In my case, instead of a cat jumping because it saw a cucumber, I generate gastrointestinal tract images from their segmentation masks! (Yes, I agree with you, I am cooler)
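The noising half of that story fits in a few lines. The toy sketch below blends an 8x8 "image" toward pure Gaussian noise over a handful of steps; the schedule and array sizes are illustrative only, not the configuration of any real diffusion model:

```python
import numpy as np

# Hypothetical toy "image": an 8x8 grayscale array with values in [0, 1]
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Forward diffusion: at each step, keep less of the image and add more noise.
# alpha_bar is a made-up linear schedule for illustration.
T = 10
alpha_bar = np.linspace(0.99, 0.01, T)

noised = [np.sqrt(a) * image + np.sqrt(1 - a) * rng.standard_normal(image.shape)
          for a in alpha_bar]

# Early steps still resemble the image; the last step is essentially pure noise.
# A trained diffusion model learns to run this process in reverse.
```

The generative magic is entirely in the reverse direction: a neural network is trained to undo one small noising step at a time, so starting from pure noise and repeatedly denoising yields a brand-new image.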

But what are segmentation masks?

Elementary, my dear Watson! Segmentation masks are like a topographic map, showing the exact locations where Sherlock Holmes (in this case, the radiologist) would search for hidden clues, such as tumors, organs, and vessels, to uncover the cancerous Moriarty’s next plan. They are super important when doctors need to know exactly where to operate or how a disease is spreading.

Until recently, generating these masks required lots of manual work from radiologists, or tons of carefully labeled data. But now?

By training diffusion models properly, we can synthesize realistic image–mask pairs, even when data is limited. That means more diverse, more accurate, and more creative ways to augment medical datasets for training better AI models.

It’s like equipping our medical research toolbox with a team of colorful GPUs, each one working like a tireless laboratory assistant, swiftly and precisely creating endoscopy images at the click of a button, generating in moments what used to take hours of painstaking effort. This lets you breathe easy, knowing that your next endoscopy won’t need to be fed into an AI model, thus sparing patient privacy and giving medical professionals more time to focus on what truly matters!

Thank you for reading, and I’ll see you in the blogosphere!

Xin Lei Lin

Nathan Liu’s STA299 Journey

Hi everyone! My name is Nathan Liu, and I am currently a second-year student at the University of Toronto, specializing in Statistics. From May to August 2025, I had the privilege of conducting an independent research project under the supervision of Dr. Pascal Tyrrell. I am deeply grateful for his guidance throughout this journey. This was my first time having an independent research experience in data science, and it proved to be both challenging and rewarding. I would love to share some of the lessons I learned during this summer.

At the core of my project, I focused on the problem of automated grading of knee osteoarthritis (KOA) using deep learning. While recent work has shown promising results, the classification of Kellgren–Lawrence grade 2 (KL2) remains particularly unreliable. My study explored how self-supervised learning (SSL), specifically SimCLR embeddings, could be used to relabel ambiguous KL2 cases and improve classification performance. I designed four experimental pipelines: a baseline, a hard relabeling approach, a confidence-based relabeling approach, and a weighted loss strategy. Along the way, I incorporated quantitative evaluations such as bootstrap confidence intervals and McNemar’s test to assess improvements in KL2 reliability.
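The post mentions relabeling ambiguous KL2 cases using distances in SimCLR embedding space but does not give the exact rule. The sketch below is a deliberately simplified hard-relabeling scheme, with 2-D toy points standing in for real embeddings; the function name and nearest-centroid rule are illustrative assumptions, not the study's actual method:

```python
import numpy as np

def relabel_by_centroid(embeddings, labels, ambiguous_label=2):
    """Toy hard-relabeling rule: reassign every sample of the ambiguous
    class to whichever other class centroid it lies closest to."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = [c for c in np.unique(labels) if c != ambiguous_label]
    centroids = {c: embeddings[labels == c].mean(axis=0) for c in classes}

    new_labels = labels.copy()
    for i in np.where(labels == ambiguous_label)[0]:
        new_labels[i] = min(classes,
                            key=lambda c: np.linalg.norm(embeddings[i] - centroids[c]))
    return new_labels

# Tiny stand-in for SimCLR embeddings: a grade-1 cluster, a grade-3 cluster,
# and one ambiguous grade-2 point sitting near the grade-3 cluster.
emb = [[0, 0], [0.1, 0], [10, 10], [10.1, 10], [9, 9]]
lab = [1, 1, 3, 3, 2]
print(relabel_by_centroid(emb, lab))  # the grade-2 point joins grade 3
```

A confidence-based variant would only relabel points whose centroid distances differ by more than some margin, which is the kind of contrast the four experimental pipelines were designed to compare.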

Before joining this project, I was already interested in the medical applications of machine learning, but I had never worked directly with this kind of research. I still remember my first lab meeting: Dr. Tyrrell introduced a wide range of ongoing projects on different diseases, and I felt both excited and overwhelmed by the amount of new information. He warned us that the beginning would be the most difficult stage, but I underestimated just how challenging it would be. As I started exploring public databases, I quickly realized that many were incomplete, with missing labels and ambiguous annotations. This left me uncertain about how to begin. At this stage, I am thankful for the help I received from Noushin and Dr. Tyrrell, as well as advice from a previous student in the lab. Their input helped me realize that I needed to commit to working with my own chosen dataset and design a study that I could take full ownership of.

During the research process, I encountered multiple challenges. The KL grading system itself is inherently noisy, and KL2 is especially difficult to identify consistently. On top of that, my dataset was imbalanced, which made model training unstable. Technically, training SimCLR models was not straightforward: convergence was slow, embeddings were difficult to interpret, and results were often not what I expected. Under Dr. Tyrrell’s guidance, I learned to compare different baseline models, and switching from ResNet to EfficientNet immediately improved performance. He also encouraged me to experiment with visualization approaches beyond clustering, which eventually led me to explore spatial distance methods for relabeling KL2 cases. Noushin provided very practical advice on tuning SimCLR hyperparameters to maximize feature learning, which was critical to stabilizing my experiments. Throughout this process, I gained a new appreciation for how problem-solving in research often requires a mix of independent exploration, peer support, and careful reading of the literature.

Looking back, I am especially grateful for the structure of weekly lab meetings. They pushed me to stay disciplined, improve my efficiency, and keep refining my research plan. Just as importantly, they gave me the chance to see how other students tackled projects in different medical domains. I was struck by how many of us faced similar problems—unstable models, imperfect data, unexpected results—and it was reassuring to realize I was not alone. Watching others troubleshoot their difficulties often gave me ideas for my own work.

Overall, this project taught me valuable lessons both technically and personally. On the technical side, I became much more comfortable with self-supervised learning, parameter tuning, and methods for quantifying and visualizing results. On the personal side, I developed patience, resilience, and the ability to adapt when experiments did not go as planned. I also improved my academic writing skills and learned how to present my findings in a structured and convincing way. Most importantly, I am thankful to Dr. Tyrrell for his constructive advice whenever I felt uncertain, and to Noushin for patiently answering many of my technical questions—even the simplest ones. I also want to thank my peers and all the lab members for their support, encouragement, and good company. This experience has not only strengthened my skills but has also made me more confident about pursuing research in medical imaging and machine learning in the future.

MiWORD of the Day is… McNemar Test!

Remember that famous Spider-Man meme where two Spider-Men are pointing at each other, yelling “You’re me!”? That’s basically the spirit of the McNemar Test. It’s a statistical tool that checks whether the same group of people changes their answers under two different conditions.

Think of it like this: yesterday everyone swore bubble tea was the best, but today half of them suddenly insist black coffee is the only way to survive finals. The McNemar Test is the referee here—it counts how many people actually flipped sides and asks, “Okay, is this change big enough to matter, or is it just random mood swings?”

The McNemar Test works on paired data. The total numbers don’t matter as much as the people who changed their minds.

People who said “yes” before and still say “yes” after → not interesting.

People who said “no” before and still say “no” after → also not interesting.

The stars of the show? Those who said “yes” before and “no” after, and those who said “no” before and “yes” after. The test compares these two groups. If the difference between them is large, it means the change is real, not just random noise.

In clinical research this is super important. Suppose a study tests whether a new drug actually helps with a disease. A total of 314 patients are observed both before and after treatment. Here’s the data:

                  After: Sick    After: Healthy
Before: Sick          101             121
Before: Healthy        59              33

Here’s what’s going on: 101 stayed sick before and after. 33 stayed healthy before and after.

121 improved (from sick → healthy). 59 worsened (from healthy → sick).

Now, McNemar steps in with this formula, where b and c are the two discordant counts (here, b = 121 improved and c = 59 worsened):

χ² = (b − c)² / (b + c) = (121 − 59)² / (121 + 59)

That comes out to about 21.36, which is way too extreme to happen by chance (p < 0.001). Translation: the drug worked; the number of patients who got better is significantly higher than the number who got worse.
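The arithmetic is a one-liner once you have the two discordant counts; here is a minimal check in plain Python (the upper-tail p-value for a chi-squared statistic with 1 degree of freedom can be computed with erfc, so no stats library is needed):

```python
import math

# Discordant counts from the drug example above
b = 121  # sick -> healthy (improved)
c = 59   # healthy -> sick (worsened)

stat = (b - c) ** 2 / (b + c)        # McNemar chi-squared statistic, 1 df
p = math.erfc(math.sqrt(stat / 2))   # upper-tail p-value for chi2 with 1 df

print(f"chi2 = {stat:.2f}, p = {p:.1e}")
```

Notice that the 101 patients who stayed sick and the 33 who stayed healthy never enter the formula: only the switchers matter.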

In medicine (or in evaluating machine learning models), it’s not enough to just report an overall accuracy. What really matters is whether the changes—improvements or mistakes—are meaningful and consistent. The McNemar Test is a simple way to check if those differences are statistically real.

Now let’s use McNemar Test in a sentence.

Serious: In a clinical trial, the McNemar Test showed that significantly more patients improved after treatment than worsened, proving the drug’s effectiveness.

Less Serious: Yesterday my friend swore pizza was the best food on earth. Today she switched to sushi. According to McNemar, this isn’t just random—it’s a statistically significant betrayal.

See you in the blogosphere!

Nathan Liu

MiWORD of the Day is… Blur!

Have you ever tried to take a perfect vacation photo in Toronto, only to find your friend’s face is a mysterious smudge and the CN Tower looks like it’s melting? Blur has a way of sneaking into our lives; it is everywhere, and sometimes it is more fascinating than you might think.

The smudge you see in your photo is blur. Blur has existed since the first camera was invented because film or sensors need time to gather light. If either the subject or the camera moves during this exposure time, the image appears blurred. In our discussion, we will focus on motion blur caused by fast movement, rather than unrelated effects like pixelation or mosaic artifacts. You might have experienced motion blur when taking a shaky phone photo, wearing foggy glasses, or watching a baseball fly past at incredible speed. But blur is not always a flaw.
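The smearing mechanism itself is easy to play with in code: averaging shifted copies of an image mimics a subject sliding sideways during the exposure. This is a toy numpy sketch for intuition, not a physically calibrated camera model:

```python
import numpy as np

def motion_blur_h(image, length):
    """Simulate horizontal motion blur by averaging `length` shifted
    copies of the image, as if it slid sideways during the exposure."""
    out = np.zeros_like(image, dtype=float)
    for shift in range(length):
        out += np.roll(image, shift, axis=1)
    return out / length

# Toy "photo": a single bright vertical stripe on a dark background
img = np.zeros((5, 20))
img[:, 10] = 1.0

blurred = motion_blur_h(img, 5)
# The sharp stripe is now smeared across 5 columns at 1/5 the intensity,
# while the total light (the sum of all pixels) is unchanged.
```

The total brightness is preserved but spread out, which is exactly why a fast-moving subject looks like a dim streak rather than simply vanishing.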

In the world of art, blur has often been a feature rather than a mistake. Think of Claude Monet’s Water Lilies (copyright The MET; I highly recommend seeing it in person and viewing it from different distances): soft edges, blended colors, shapes shimmering in the light. Or consider long-exposure photographs of city traffic, where headlights stretch into glowing ribbons. In these cases, blur captures motion, mood, and mystery, transforming the ordinary into something extraordinary. Even in classic cinema, motion blur helps create a sense of speed or dreamlike atmosphere.

In sports, blur can tell an entire story. The fastest recorded baseball pitch reached 105.8 miles per hour, far too fast for the human eye to follow clearly. To freeze it, cameras must shoot at over 1,000 frames per second. A racecar streaking past the finish line or a sprinter in motion may appear as streaks of color, yet our brains still understand exactly what is happening. Motion blur, in these cases, is not a mistake; it is evidence of speed and energy.

In science, blur can reveal a very different kind of truth. Consider echocardiography, an ultrasound imaging method for the heart. These moving pictures help doctors assess heart function, blood flow, and valve performance. Yet even the tiniest shake of the probe, a restless patient, or the natural motion of the heartbeat can smear crucial details. There is even a trade-off between frame rate and depth of view: a typical knee ultrasound operates at around 20 frames per second, while heart ultrasound often reaches about 50 frames per second. A blurry heart chamber is more than an inconvenience; it can obscure the clues doctors need to make the right decision. Other imaging fields, such as X-ray or MRI, face similar challenges with motion blur. Interestingly, scientists also study the patterns of blur to improve image quality, since sometimes the “smudge” itself contains useful information about movement or structure.

Blur can be playful, expressive, and at times essential. It reminds us that seeing clearly is not always straightforward and that what appears imperfect can still hold meaning. From the sweep of a painter’s brush to the rhythm of a beating heart on a screen, blur reflects a world that is always moving and changing. Sometimes, beauty and truth live within that very imperfection.

Now for the fun part — using blur in a sentence by the end of the day:
Serious: Did you notice the blur in the long-exposure shot of the city at night? The headlights look like flowing rivers of light.
Less serious: While running to catch the bus, I accidentally created a blur of people in my phone photo. What a perfect accidental art piece.

…I’ll see you in the blogosphere.

Qifan Yang

Qifan Yang’s Personal Reflection

My name is Qifan Yang, and I am an incoming third-year student at the University of Toronto, pursuing a Statistics Major and a Mathematical Applications in Finance and Economics Specialist. This past summer, I had the opportunity to work on an ROP299 research project with Professor Tyrrell, and I would like to share my four-month journey in research, a completely new experience for me.

When I started, I was a complete novice in medical imaging and unfamiliar with the full process of scientific research. Before our first meeting, I felt quite nervous. I still remember Professor Tyrrell, during the interview, warning me about the potential challenges ahead. Coming from a statistics and mathematics background, I initially found both machine learning concepts and medical terminology quite intimidating. Although I had completed a few Kaggle courses, I lacked hands-on experience with building models from raw datasets and running end-to-end training and testing.

My research journey began along two paths: first, learning the fundamentals of machine learning and medical imaging, where review papers became my best starting point, and second, exploring rheumatic heart disease (RHD) and its potential for automated diagnosis using transthoracic echocardiography (TTE). The first obstacle I encountered was the lack of publicly available, large-scale datasets for RHD with detailed labels. This led me to pivot toward studying image quality in TTE, since I found a large echocardiography database with quality labels. However, a second challenge soon emerged: I struggled to identify a research question that was both technically meaningful and scientifically impactful.

This is where Professor Tyrrell’s mentorship made all the difference. In one group meeting, he mentioned severe motion blur he had observed in knee ultrasound images. That sparked the idea for my project: detecting and correcting non-uniform motion blur in echocardiography using deep learning. This was the turning point when the project truly began to take shape.

The real research work involved splitting and labeling datasets, designing a neural network model, training and testing on GPUs, and visualizing and evaluating results. Each of these steps was entirely new to me, requiring both technical learning and persistent problem-solving. I am deeply grateful for the guidance of Professor Tyrrell, as well as the support from Giuseppe, Noushin, and other members of the lab, including previous students whose work provided valuable reference points.

By the end of the summer, I had taken full charge of the project, running it from start to end. This responsibility taught me far more than technical skills. I developed a stronger sense of self-motivation, learned to manage my time effectively, and built the resilience needed to handle research setbacks. I realized that research is not just about repetitive lab work; it is about thinking critically, asking meaningful questions, and telling a compelling story through data and results.

The experience was more than an introduction to the research world; it taught me to think boldly and work carefully. I learned not to let ideas live only in conversation or in my head, but to translate them into small, testable experiments that turn speculation into evidence. Each modest prototype, whether a quick data split, a minimal model, or a rough visualization, sharpened my questions, exposed constraints, and informed the next step. Gradually, those incremental wins compounded into a coherent pipeline and credible results. The discipline I gained is simple but powerful: think wild, start small, measure honestly, and move steadily. This balance of wild curiosity with careful craftsmanship now guides how I approach complex, unfamiliar problems, and it’s the mindset I’ll carry into future research and professional work.

Winnie Ye in STA299

This was my first course related to research, and also my first time working with medical imaging. When I heard that we would be doing independent research, I immediately realized that this course would undoubtedly be a great challenge for me. Independent research meant there was no clear “standard answer”; instead, I had to explore and persist on my own.

At the beginning of my ROP project, I was actually the first student in the class to finalize a research direction. I quickly chose skin tone bias in melanoma detection as my topic and decided to work with the ISIC dataset. At that time, I felt well prepared: even though I noticed that dark-skin samples were rare, I believed the number would be “enough.” I even imagined finishing the project in less than two months.

But soon, reality hit me. Out of more than 30,000 ISIC images, there were almost no dark-skin cases. After that, I kept switching datasets: PAD, Fitzpatrick17k, MSKCC. However, each of them had serious problems: some had almost no melanoma cases, some had almost no dark-skin samples, some images contained a lot of background noise rather than just lesions, and some lacked skin tone labels altogether. Even when I combined them, the total number of dark-skin melanoma images was barely more than one hundred. During that period, I felt like I was constantly “starting over,” and every time I thought I had found a breakthrough, it quickly fell apart.

In this struggle, I tried almost everything I could think of. I trained my own U-Net, experimented with CLIP, SVM, EfficientNet, and ResNet; I tested light-skin-trained models directly on dark-skin data; I even used YOLO to crop lesions in order to reduce background noise. My research focus also shifted again and again: from melanoma, to pigmented lesions, and finally to red scaly diseases; and my tasks shifted from classification to segmentation and back again. Altogether, I must have attempted more than a dozen different approaches, yet none of them produced satisfactory results.

As the deadline drew closer, my anxiety grew stronger. By the last month, despite all the models, tasks, and research objects I had tried, I still had no meaningful results to show. At times I felt completely lost, unsure of what else I could even do. In desperation, I wrote Dr. Tyrrell a very long email, confessing that I might not be able to continue and even considered abandoning the project altogether. I told him that if I could start over, I would never choose to study bias so hastily, but would first spend more time carefully understanding the limitations of the datasets.

That month was probably the hardest part of the entire ROP. I stayed up late almost every day, exhausted and anxious, sometimes even afraid to run my code because I expected yet another failure. Dr. Tyrrell was sometimes worried and even a bit frustrated, which made me feel sad, but I was also deeply grateful that he cared so much. In the final weeks, Giuseppe also began to support me more closely, and I truly appreciated his help. During that time, even the smallest result—no matter how unrepresentative—felt important enough for me to immediately share with Dr. Tyrrell and Giuseppe for feedback.

Finally, near the very end, something changed. About ten days before the deadline, I obtained a result that was still imperfect, but at least demonstrated a sign of bias. It was not a breakthrough, but it was enough to build a conclusion. In the last week, I focused on writing the report, experimenting with bias-mitigation methods, and managed to finish everything just in time.

Looking back on these four months, I went through so many emotions: the early excitement of being “ahead,” the anxiety of being overtaken, the regret and despair of repeated failures, and the relief of a small last-minute success. If you ask me what kept me going, I honestly don’t know, perhaps the support from Dr. Tyrrell and Giuseppe, perhaps the stubborn voice in my head saying “try one more time,” or perhaps just a little bit of luck.

Through this course, I developed a new understanding of medical imaging and machine learning: they are not only technical problems but also involve fairness, data limitations, and persistence throughout the research process. I realized that the true value of research is not in quickly achieving a perfect result, but in continuously experimenting, reflecting, and learning from failures. In the future, I hope to further explore fairness in medical imaging, especially to investigate why my findings differed from previous studies and how I can avoid or better explain such discrepancies. I believe this will not only help me improve my research methods but also allow me to move forward more confidently on my academic path.