MiWord of the Day Is… Fourier Transform!

Ok, a what Transform now??

In the early 1800s, Jean-Baptiste Joseph Fourier, a French mathematician and physicist, introduced the transform in his study of heat transfer. The idea seemed preposterous to many mathematicians at the time, but it has now become an important cornerstone in mathematics.

So, what exactly is the Fourier Transform? The Fourier Transform is a mathematical transform that decomposes a function into its sine and cosine components. In other words, it turns a function of space or time into a function of spatial or temporal frequency.

Before diving into the mathematical intricacies of the Fourier Transform, it is important to understand the intuition and the key idea behind it. The main idea of the Fourier Transform can be explained simply using the metaphor of creating a milkshake.

Imagine you have a milkshake. It is hard to understand a milkshake just by looking at it; questions such as “What gives this shake its nutty flavour?” or “What is the sugar content of this shake?” are hard to answer when all we are given is the milkshake. These questions are much easier to answer once we know the recipe and the individual ingredients that make up the shake. So, how exactly does the Fourier Transform fit in here? Given a milkshake, the Fourier Transform lets us find its recipe: it presents the individual ingredients and the proportions in which they were combined to make the shake. This raises two questions: how does the Fourier Transform determine the milkshake “recipe”, and why would we even use it to get the “recipe”? To answer the first question, we determine the recipe by running the milkshake through filters, each of which extracts one of the ingredients that make up the shake. As for the second, recipes are much easier to analyze, compare, and modify than the actual milkshake itself, and we can create new milkshakes by analyzing and modifying the recipe of an existing one. Finally, after deconstructing the milkshake into its recipe and ingredients and analyzing them, we can simply blend the ingredients back together to get the milkshake.

Extending this metaphor to signals, the Fourier Transform essentially takes a signal and finds the recipe that made it. It offers a specific viewpoint: “What if any signal could be represented as a sum of simple sine waves?”

Because the Fourier Transform decomposes a function into its sine and cosine components, we can analyze the function more easily and modify it as needed for the task at hand.
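The following minimal Python sketch makes the “recipe” idea concrete. It uses NumPy's FFT routines, which compute the discrete Fourier transform (the sampled, computer-friendly cousin of the continuous transform Fourier worked with). The signal, its 5 Hz and 12 Hz ingredients, and the 100 Hz sampling rate are made-up values. They were chosen so that each ingredient lands exactly on a frequency bin.

```python
import numpy as np

fs = 100                      # sampling rate in Hz (assumed for the example)
t = np.arange(fs) / fs        # one second of evenly spaced sample times
# The "milkshake": 3 parts of a 5 Hz wave mixed with 1 part of a 12 Hz wave
signal = 3.0 * np.sin(2 * np.pi * 5 * t) + 1.0 * np.sin(2 * np.pi * 12 * t)

# The "recipe": one complex coefficient per frequency from 0 Hz up to fs/2
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
# Rescale magnitudes so they match the amplitudes of the original sine waves
amplitudes = 2 * np.abs(spectrum) / len(signal)

for f, a in zip(freqs, amplitudes):
    if a > 0.1:
        print(f"{f:.0f} Hz -> amplitude {a:.2f}")  # 5 Hz -> 3.00, then 12 Hz -> 1.00

# "Blend the ingredients back": the inverse transform recovers the original signal
reconstructed = np.fft.irfft(spectrum, n=len(signal))
print(np.allclose(signal, reconstructed))  # True
```

The last step is the “blend the ingredients back together” part of the milkshake metaphor.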

A common application of the Fourier Transform is sound editing. If sound waves can be separated into their “ingredients” (i.e., the bass and treble frequencies), we can modify the sound to suit our requirements: we can boost the frequencies we care about while suppressing the frequencies that cause disturbances in the original sound. The Fourier Transform has many other applications as well, such as image compression, communication, and image restoration.
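The sound-editing idea can be sketched in the same way. This example is a toy setup, not real audio. A 50 Hz tone stands in for the sound we care about, and a 400 Hz tone stands in for the disturbance. The sketch silences every frequency above 200 Hz, boosts the 50 Hz component, and transforms the result back.

```python
import numpy as np

fs = 1000                                   # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                      # one second of samples
voice = np.sin(2 * np.pi * 50 * t)          # stand-in for the sound we want
hiss = 0.5 * np.sin(2 * np.pi * 400 * t)    # stand-in for a high-pitched disturbance
recording = voice + hiss

spectrum = np.fft.rfft(recording)
freqs = np.fft.rfftfreq(len(recording), d=1 / fs)

# One gain per frequency: 0 silences it, 1 keeps it, 2 doubles it
gains = np.where(freqs > 200, 0.0, 1.0)     # crude low-pass: drop everything above 200 Hz
gains[np.isclose(freqs, 50)] = 2.0          # boost the component we care about
cleaned = np.fft.irfft(spectrum * gains, n=len(recording))
```

Here `cleaned` ends up as a louder copy of `voice` with the hiss removed. Real audio tools use smoother filters than this all-or-nothing cutoff, because abrupt cutoffs can cause ringing artifacts on real recordings.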

This is incredible! An idea that the mathematics community was once skeptical of now has a wide variety of real-world applications.

Now, for the fun part, using Fourier Transform in a sentence by the end of the day:

Example 1:

Koby: “This 1000-piece puzzle is insanely difficult. How are we ever going to end up with the final puzzle picture?”

Eng: “Don’t worry! We can think of the puzzle pieces as being created by taking the ‘Fourier transform’ of the puzzle picture. All we have to do now is take the ‘inverse Fourier Transform’ and then we should be done!”

Koby: “Now when you put it that way…. Let’s do it!”

Example 2: 

Grace: “Hey Rohan! What’s the difference between a first-year and a fourth-year computer science student?”

Rohan: “… what?”

Grace: “A Fouri-y-e-a-r Transform”

Rohan: “…. (╯°□°)╯︵ ┻━┻ ”

I’ll see you in the blogosphere…

Parinita Edke

The MiDATA Word of the Day is… “clyster”

Holy mother of pearl! Do you remember when the first Pokémon games came out on the Game Boy? Never heard of Pokémon? Get up to speed by watching this short video. Or even better! Try out one of the games in the series, and let me know how that goes!

The name of the Pokémon in this picture is Cloyster. You may remember it from Pokémon Red or Blue. But! Cloyster, in fact, has nothing to do with clysters.

In olden days, clyster meant a bunch of people, animals, or things gathered in a close body. Today, it is better known as a cluster.

You yourself must identify with at least one group of people. Your roles, qualities, and actions are what make you unique, but at the same time, they place you in a group with others who share the same characteristics.

In fact, you probably fall into multiple groups (or clusters). These could be your circle of friends or the people you connect with on a particular topic. At the end of the day, you belong to these groups. But is there a way to determine that you do, in fact, belong?

Take for example Jack and Rose from the Titanic. Did Jack and Rose belong together?

If you take a look at the plot to the right, Jack and Rose clearly belong to two separate groups (clusters) of people. Thus, they do not belong together. Case closed!

But perhaps it is a matter of perspective? Let’s take a step back…

Woah! Now you could say that they’re close enough that they might as well be together! Compared to the largest group, they are more similar than they are different. And so, they should be together!

Then again, maybe we have been looking at this completely wrong from the very beginning! What are we actually measuring on the x-axis and on the y-axis of our graph?

Say it was muscle mass and height. Those alone shouldn’t tell us whether Rose and Jack belong together! And yet, that is exactly what we might have done. But if not those, then what…?
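The following small Python sketch illustrates that point with k-means, which is one common clustering algorithm. The post does not say which method produced the plot. All numbers are invented, and the only thing that changes between the two runs is which two features go on the axes.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with a deterministic farthest-point start."""
    X = np.asarray(X, dtype=float)
    centroids = [X[0]]
    while len(centroids) < k:
        # next starting centroid: the point farthest from all centroids chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of the points assigned to it
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

# Made-up numbers for six passengers: rows are Jack, Rose, then four others
muscle_and_height = [[40, 180], [25, 165], [41, 182], [39, 178], [24, 163], [26, 166]]
art_and_adventure = [[9, 9], [9, 8], [2, 3], [1, 2], [2, 2], [3, 1]]

labels_body = kmeans(muscle_and_height, k=2)
labels_taste = kmeans(art_and_adventure, k=2)
print("Same cluster by body measurements?", labels_body[0] == labels_body[1])  # False
print("Same cluster by interests?", labels_taste[0] == labels_taste[1])        # True
```

The same people, clustered by the same algorithm, end up together or apart depending only on what we chose to measure.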

Now for the fun part (see the rules here), using clyster in a sentence by the end of the day:

Serious: Did you see the huge star clysters last night? I heard each one contained anywhere from 10,000 to several million stars…

Less serious: *At a seafood restaurant by the beach* Excuse me, waiter! I’d like one of your freshest clysters, please. – “I’m sorry. We’re all out!”

…I’ll see you in the blogosphere.

Stanley Hua

Stanley Hua in ROP299: Joining the Tyrrell Lab during a Pandemic

My name is Stanley Hua, and I’ve just finished my 2nd year in the bioinformatics program. I have also just wrapped up my ROP299 with Professor Pascal. Though I have yet to see his face outside of my monitor screen, I cannot begin to express how grateful I am for the time I’ve been spending at the lab. I remember very clearly the first question he asked me during my interview: “Why should I even listen to you?” Frankly, I had no good answer, and I thought that the meeting didn’t go as well as I’d hoped. Nevertheless, he gave me a chance, and everything began from there.

Initially, I got involved in the quality assessment of Multiple Sclerosis and Vasculitis 3D MRI images along with Jason and Amar. Here, I was introduced to the many things Dmitrii can complain about when it comes to brain MRI images. Factors such as scanner bias, artifacts, imaging modalities, and disease prevalence all play a role in how we can leverage these medical images to train predictive models.

My actual ROP, however, revolved around a niche topic in Mauro and Amar’s project. Their project sought to understand the effect of dataset heterogeneity on training Convolutional Neural Networks (CNNs) through cluster analysis of CNN-extracted image features. After extracting image features with a trained CNN, we end up with a high-dimensional vector representing each image. As a preprocessing step, the dimensionality of these features is reduced by transforming them with Principal Component Analysis (PCA) and then keeping a chosen number of principal components (PCs), e.g. 10 PCs. The question must then be asked: how many principal components should the methodology use? Though it’s a very simple question, I took way too many detours trying to answer it. Among other things, I looked at standardizing vs. not standardizing the features before PCA, nonlinear dimensionality reduction techniques (e.g. autoencoders), and comparisons of neural network image representations (via SVCCA). Finally, I proposed an equally simple method for choosing the number of PCs in this context: the minimum number of PCs that gives the most frequent resulting value (from the original methodology).
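Here is one possible reading of the selection rule above, sketched in Python. The approach is to run the original methodology with 1, 2, …, K principal components, find the result that comes up most often, and keep the smallest number of PCs that produces it. The function `run_methodology` and the toy results are hypothetical stand-ins, not the lab's actual pipeline.

```python
from collections import Counter

def choose_num_pcs(run_methodology, max_pcs):
    """Smallest number of PCs whose result equals the most frequent result.

    run_methodology is a hypothetical stand-in: it takes a number of PCs k,
    runs the downstream analysis on the first k PCs and returns its result.
    """
    results = {k: run_methodology(k) for k in range(1, max_pcs + 1)}
    most_frequent, _ = Counter(results.values()).most_common(1)[0]
    smallest_k = min(k for k, value in results.items() if value == most_frequent)
    return smallest_k, most_frequent

# Toy stand-in: pretend the analysis returned these values for k = 1..8 PCs
fake_results = {1: 2, 2: 3, 3: 4, 4: 4, 5: 4, 6: 3, 7: 4, 8: 5}
print(choose_num_pcs(fake_results.get, max_pcs=8))  # (3, 4)
```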

Regardless of the difficulty of the question I sought to answer, I learned more about research practices, and I even learned about how research and industry intermingle. I have Professor Pascal to thank for always explaining things in a way that a dummy such as me would understand. Moreover, Professor Pascal always focused on impact: is what you’re doing meaningful, and what are its applications?

I believe that the time I spent with the lab has been worthwhile. It was also here that I discovered that my passion for data science trumps my passion for medical school (big thanks to Jason, Indranil and Amar for breaking my dreams). I now look towards a future where I can drive impact with data, maybe even in personalized medicine or computational biology. Whoever is reading this, feel free to reach out! Hopefully, I’ll be the next Elon Musk by then…

Transiently signing out,

Stanley Bryan Z. Hua

Jessica Xu’s Journey in ROP299

Hello everyone! My name is Jessica Xu, and I’ve just completed my second year in Biochemistry and Statistics at the University of Toronto. This past school year, I’ve had the wonderful opportunity to do a ROP299 project with Dr. Pascal Tyrrell and I’d like to share my experience with you all!

A bit about myself first: in high school, I was always interested in life sciences. My favourite courses were biology and chemistry, and I was certain that I would go to medical school and become a doctor. But when I took my first stats course in first year, I really enjoyed it and I started to become interested in the role of statistics in life sciences. Thus, at the end of my first year, while I was looking through the various ROP courses, I felt that Dr. Tyrrell’s lab was the perfect opportunity to explore my budding interest in this area. I was very fortunate to have an interview with Dr. Tyrrell, and even more fortunate to be offered a position in his lab!

Though it may be obvious, doing a research project when you have no research experience is very challenging! Coming into this lab having taken a statistics course and a few computer science courses in first year, I felt I had a pretty good amount of background knowledge. But when I joined my first lab meeting, I realized I couldn’t have been more wrong! Almost every other word being said was one I’d never heard before! And so, I realized that there was a lot I needed to learn before I could even begin my project.

I then began my project, which looked at how two dimension reduction techniques, LASSO and SES, performed in an ill-posed problem. It was definitely no easy task! While I had learned a little bit about dimension reduction in my statistics class, I still had a lot to learn about the specific techniques, their applications in medical imaging, and ill-posed problems. I was also very inexperienced in coding, so I had to learn a lot of R on my own and become familiar with the different packages I would need. It was a very tumultuous journey, and I spent a lot of time just trying to get my code to work. Luckily, with help from Amar, I was able to figure out some of the errors and issues I was facing with my code.

I learned a lot about statistics and dimension reduction in this ROP, more than I have learned in any other course! But most importantly, I learned a lot about the scientific process and what it takes to write a research paper. If I can offer any advice based on my experience, it’s that sometimes it’s okay to feel lost! You are not expected to have devised a perfect plan of execution for your research, especially when it’s your first time! There will be times when you stray off course (as I often did), but the most valuable lesson I learned in this ROP is how to get back on track. Sometimes you just need to take a step back, go back to the beginning, and think about the purpose of your project and what it is you’re trying to tell people. But it’s not always easy to realize this. Luckily, Dr. Tyrrell has always been there to guide us throughout our projects and to make sure we stay on track by reminding us of the goal of our research. I’m incredibly grateful for all the support, guidance, and time that Dr. Tyrrell has given this past year. It has been an absolute pleasure to work in this lab.

Now that I’ve taken my first step into the world of research, with all the new skills and lessons I’ve learned in my ROP, I look forward to all the opportunities and the journey ahead!

Jessica Xu

Today’s MiWORD of the day is… Lasso!

Wait… Lasso? Isn’t a lasso that lariat or loop-like rope that cowboys use? Or perhaps you may be thinking about that tool in Photoshop that’s used for selecting free-form segments!

Well… technically neither is wrong! However, in statistics and machine learning, Lasso stands for something completely different: least absolute shrinkage and selection operator. The term was coined in 1996 by Dr. Robert Tibshirani, who was a UofT professor at the time!

Okay… that’s cool and all, but what the heck does that actually mean? And what does it do?

Lasso is a type of regression analysis method, meaning it tries to estimate the relationship between predictor variables and outcomes. It’s typically used to perform feature selection or regularization.

Regularization is a way of reducing overfitting in a model, i.e., it keeps the model from fitting the “noise” and randomness in the data. Feature selection, on the other hand, is a form of dimension reduction: out of all the predictor variables in a dataset, it selects the few that contribute the most to the outcome variable and includes only those in the predictive model.

Lasso works by placing a fixed upper bound on the sum of the absolute values of the predictors’ coefficients in a model. To keep this sum within the upper bound, the algorithm shrinks some of the coefficients; in particular, it shrinks the coefficients of predictors that are less important to the outcome. Predictors whose coefficients are shrunk all the way to zero are left out of the final predictive model entirely.

Lasso has applications in a variety of different fields! It’s used in finance, economics, physics, mathematics, and, if you haven’t guessed already… medical imaging! As a popular feature selection technique, Lasso is used a lot to turn large radiomic datasets into easily interpretable predictive models that help researchers study, treat, and diagnose diseases.
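The upper-bound formulation is usually solved in an equivalent penalized form: minimize (1/2n)·‖y − Xβ‖² + λ·Σ|βⱼ|. Each bound on Σ|βⱼ| corresponds to some penalty λ. Coordinate descent is one common way to fit this form, and R's glmnet package, for example, uses it. The minimal sketch below applies it to simulated data. The value λ = 0.2 is an arbitrary choice for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z towards zero by lam, and set it exactly to zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * sum|b_j| one coefficient at a time."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # residual with predictor j's current contribution added back in
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Simulated data: only the first 3 of 8 predictors actually affect the outcome
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.standard_normal((n, p))
true_beta = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0])
y = X @ true_beta + 0.5 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.2)
print(np.round(beta_hat, 2))
```

The soft-thresholding step sets small coefficients exactly to zero, which is how Lasso drops the five irrelevant predictors. It also shrinks the three useful coefficients slightly towards zero.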

Now onto the fun part, using Lasso in a sentence by the end of the day! (see rules here)

Serious: This predictive model I got using Lasso has amazing accuracy for detecting the presence of a tumour!

Less serious: I went to my professor’s office hours for some help on how to use Lasso, but out of nowhere he pulled out a rope!

See you in the blogosphere!

Jessica Xu

Jacky Wang’s ROP399 Journey

My name is Jacky Wang, and I am just finishing my third year at the University of Toronto, pursuing a Computer Science Specialist. Looking back on this challenging but incredible year, I was honoured to have the opportunity to work in Dr. Tyrrell’s lab as part of the ROP399 course. I would love to share my experience studying and working in the lab.

Looking back, I realize one of the most challenging tasks was getting on board. I felt a little lost at first, surrounded by loads of new information and technologies I had little experience with. Though I was excited by the collision of ideas during each meeting, having so many choices could sometimes be overwhelming. Luckily, after doing more literature review and with the help of the brilliant researchers in the lab (a big thank you to Mauro, Dimitri, and of course, Dr. Tyrrell), I started to get a better view of the trajectory of each potential project and to decide what I wanted to get out of this experience. I did not choose the machine learning projects, though they looked shiny and promising as always (as a matter of fact, they did turn out to be successful). Instead, I leaned towards studying sample size determination methodology, especially the concept of ill-posed problems, which often occur when researchers draw conclusions from models trained on limited samples. It had always been a mystery to me why I would get different and even contradictory results when replicating someone else’s work with smaller sample sizes. From there, I settled on a research topic and moved on to the implementation details.

This year, the ROP students come from statistics, computer science, biology, and other fields. I am grateful that Dr. Tyrrell is willing to give anyone with the determination to study in his lab a chance, even if they have little research experience and come from different backgrounds. As a computer science student with a limited statistics background, I found that the real challenge lay in understanding all the statistical concepts and designing the experiments. We decided to apply various dimension reduction techniques to study the effect of different sample sizes on data with many features. I designed experiments around the principal component analysis (PCA) technique, while another ROP student, Jessica, explored the Lasso and SES models. Implementing the code from scratch was a long and memorable experience with plenty of debugging, but nothing was more rewarding than seeing the code finally work and produce promising results.

I feel lucky and grateful that Dr. Tyrrell helped me complete my first research project. He broke the long and challenging research task down into clear, achievable subgoals within our reach. After completing each subgoal, I could hardly believe how much closer it brought us to the finish line. Taking an ROP course felt very different from attending regular lectures. In most university courses, the topics are already determined, and the material is almost spoon-fed to you. Sometimes I lose the excitement of learning new topics, as I am driven not by curiosity or practical needs but by the pressure of being tested. Taking the ROP course, however, gave me almost complete control of my studies. In the ROP, I was the one who decided what topics to explore and how to design the experiments, and I could immediately test my understanding and put everything I learned into real applications.

I am so proud of all the skills that I have picked up in the online lab during this unique but special ROP experience. I would like to thank Dr. Tyrrell for giving me this incredible study experience in his lab. There are so many resources out there to reach and so many excellent researchers to seek help from. I would also like to thank all members of the lab for patiently walking me through each challenge with their brilliant insights.

Jacky Wang

MiWord of the Day Is… dimensionality reduction!

Guess what?

You are looking at a real person, not a painting! This is one of the great works by the talented artist Alexa Meade, who paints on 3D objects to create the illusion of a 2D painting. Similarly, in the world of statistics and machine learning, dimensionality reduction means what it sounds like: reducing the problem to a lower dimension. Only this time, it is not an illusion.

Imagine a 1x1x1 data point living inside a 2x2x2 feature space. If I ask you to calculate the data density, you will get ½ in 1D, ¼ in 2D and 1/8 in 3D. This simple example illustrates that data points become sparser as the dimension of the feature space grows. To address this problem, we need dimensionality reduction tools that eliminate the boring dimensions (dimensions that do not tell us much about the characteristics of the data).
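The arithmetic above generalizes directly. A region that is 1 unit wide inside a space that is 2 units wide covers (1/2)^d of a d-dimensional space. This quick Python check shows how fast that fraction collapses as dimensions are added. The 10D case is an extra illustration that goes beyond the original example.

```python
def occupied_fraction(point_side, space_side, dims):
    """Fraction of a d-dimensional cube of side space_side covered by a cube of side point_side."""
    return (point_side / space_side) ** dims

for d in (1, 2, 3, 10):
    print(f"{d}D: {occupied_fraction(1, 2, d)}")
# 1D: 0.5, 2D: 0.25, 3D: 0.125, 10D: 0.0009765625
```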

There are mainly two approaches when it comes to dimension reduction. One is to select a subset of features (feature selection), the other is to construct some new features to describe the data in fewer dimensions (feature extraction).

Let us consider an example to illustrate the difference. Suppose you are asked to come up with features to predict the university acceptance rate of your local high school.

You may discard “grade in middle school” because of its many missing values; discard “date of birth” and “student name” as they play little role in university applications; discard “weight > 50kg” as everyone has the same value; and discard “grade in GPA” as it can be calculated from other features. If you have been through a similar process, congratulations! You just performed dimension reduction by feature selection.

What you have done is remove the features with many missing values, the features least correlated with the outcome, the features with low variance, and one of each pair of highly correlated features. The idea behind feature selection is that the data might contain redundant or irrelevant features that can be removed without losing much information.

Now, instead of selecting a subset of features, you might try to construct new features from the old ones. For example, you might create a new feature named “school grade” based on the full history of the academic features. If you have been through a thought process like this, you just performed dimensionality reduction by feature extraction.

If you would like to use linear combinations, principal component analysis (PCA) is the tool for you. In PCA, the variables are linearly combined into a new set of variables, known as the principal components. The weights are chosen so that the first component captures as much of the data’s variance as possible. For example, one principal component might be a weighted linear combination of “grade in score”, “grade in middle school” and “recommendation letter”…
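Three of the rules described above can be sketched with NumPy: too many missing values, near-zero variance, and near-duplicate features. The thresholds, column names, and student data below are all invented for illustration. Dropping identifiers such as “student name” is left as the manual judgment call it usually is.

```python
import numpy as np

def select_features(X, names, max_missing=0.5, min_var=1e-8, max_corr=0.95):
    """Return the names of the columns that survive three simple filters."""
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.isnan(col).mean() > max_missing:   # too many missing values
            continue
        if np.nanvar(col) < min_var:             # (almost) everyone has the same value
            continue
        keep.append(j)
    final = []
    for j in keep:
        # drop a column if it is highly correlated with one we already kept
        redundant = False
        for k in final:
            mask = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
            r = np.corrcoef(X[mask, j], X[mask, k])[0, 1]
            if abs(r) > max_corr:
                redundant = True
                break
        if not redundant:
            final.append(j)
    return [names[j] for j in final]

nan = np.nan
names = ["middle_school_grade", "weight_over_50kg", "course_average", "gpa", "recommendation"]
X = np.array([              # six made-up students; gpa is course_average / 25
    [nan, 1, 70, 2.80, 3],
    [nan, 1, 85, 3.40, 4],
    [nan, 1, 90, 3.60, 4],
    [nan, 1, 60, 2.40, 2],
    [80,  1, 75, 3.00, 5],
    [85,  1, 95, 3.80, 3],
], dtype=float)

print(select_features(X, names))  # ['course_average', 'recommendation']
```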
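The following minimal PCA sketch in Python computes the components from the singular value decomposition of the centred data. The three “grade” columns are simulated from one hypothetical underlying ability. As a result, a single component should capture nearly all of the variance, with roughly equal weights on the three columns.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    X_centered = X - X.mean(axis=0)
    # rows of Vt are the principal directions, ordered by singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    explained = S ** 2 / (S ** 2).sum()   # share of variance per component
    return scores, components, explained[:n_components]

rng = np.random.default_rng(0)
ability = rng.normal(size=300)            # hypothetical hidden "academic ability"
X = np.column_stack([
    ability + 0.1 * rng.normal(size=300),  # e.g. high school grades
    ability + 0.1 * rng.normal(size=300),  # e.g. middle school grades
    ability + 0.1 * rng.normal(size=300),  # e.g. recommendation letter score
])

scores, components, explained = pca(X, n_components=1)
print(components[0], explained[0])
```

In practice, features measured on different scales are usually standardized before PCA, because PCA is sensitive to the scale of each variable.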

Now let us use “dimensionality reduction” in a sentence.

Serious: There are too many features in this dataset, and the testing accuracy seems too low. Let us apply dimensionality reduction techniques to reduce overfitting in our model…

Less serious:

Mom: “How was your trip to Tokyo?”

Me: “Great! Let me just send you a dimensionality-reduced version of Tokyo.”

Mom: “A what Tokyo?”

Me: “Well, I mean … photos of Tokyo.”

I’ll see you in the blogosphere…

Jacky Wang

Diana Escoboza’s ESC499 Journey

Hello there! My name is Diana Escoboza, and I’ve just finished my undergraduate studies at UofT in Machine Intelligence Engineering. I was very fortunate to have Prof. Tyrrell as my supervisor while I worked on my engineering undergraduate thesis project, ESC499, during the summer. I believe the experience is worth sharing!

My project consisted of training an algorithm to identify and detect anatomical landmarks in ultrasounds of the elbow, knee, and ankle joints. In medical imaging, it is challenging to correctly label large amounts of data, since labelling requires experts whose time is limited and costly. For this reason, I wanted my project to compare the performance of different machine learning approaches when only limited labelled data is available for training.

The approaches I worked on were reinforcement learning and semi-supervised learning. Reinforcement learning is based on learning optimal behaviour in an environment through decision-making. In this method, the model would ‘see’ a section of the image and choose a direction to move towards the target landmark. In semi-supervised learning, both labelled and unlabelled data are used for training; here, the entire image is fed to the model so it can learn the target’s location. Finally, I analysed the performance of both architectures and the training resources they used to determine the optimal architecture.

While working on my project, I sometimes got lost in the enthusiasm and possibilities and overestimated the time I had. Prof. Tyrrell was always very helpful in advising me throughout my progress, keeping me realistic about my limited time and resources while still giving me the freedom to work on my interests. The team meetings not only provided help; they were also a time to talk about AI research and have interesting discussions that got us excited about our projects and future possibilities. We also had a lot of support from the grad students in the lab, who gave us great help whenever we ran into obstacles. A big shout-out to Mauro for saving me when I was freaking out because my code wasn’t working and time was running out.

Overall, I am very grateful for the opportunity to work with such a supportive team and for everything I learned along the way. With Prof. Tyrrell, I gained a better understanding of scientific research and advanced my studies in machine learning. I want to thank the MiData team for all their help and for providing me with such a welcoming environment.

MiWORD of the Day is… Domain Shift!

Looking at the images from two different domains, can you tell what they are?
“Hmmm? Is this a trick question? Aren’t they the same?” you might ask.
Yes, you are right. They are all bags. They are essentially the same object, and I am sure you can tell at a glance. However, unlike human beings, a machine learning model reading these images from two different domains could easily get confused and end up making mistakes in identifying them. This is known as domain shift in machine learning.

Domain shift, also known as distribution shift, occurs when the distribution of the data a model encounters differs from the distribution of the data it was trained on. For instance, let’s say a deep learning model is trained on a dataset of backpack images from domain 1 (see the backpack image above). The model would then learn features specific to the backpack images from domain 1, such as the size, shape, and angle at which the picture was taken. When you take the exact same model and test or retrain it on the backpack images from domain 2, slight variations such as a different background or camera angle shift the distribution of the data the model encounters, which would most likely result in a drop in model performance.

Deep learning models, such as CNNs, are also widely used in the medical imaging industry. Researchers have been applying deep learning models to image classification, segmentation, and other tasks. However, because different imaging centers might use different machines, tools, and protocols, datasets of the exact same imaging modality from different imaging centers might differ. Therefore, a model might experience a domain shift when it encounters a new, unseen dataset with a different data distribution.
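A toy simulation can show why data from a new imaging center can hurt performance. In this sketch, each “image” is reduced to a single made-up intensity feature. The classifier is a simple nearest-centroid rule trained only on the source domain. The target domain contains the same two classes, but every intensity is shifted upward, as if a different scanner were brighter. None of these numbers come from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_domain(n, offset):
    """Two classes of one-feature 'images'; offset mimics a scanner-specific brightness shift."""
    x0 = rng.normal(0.0, 1.0, n) + offset   # class 0
    x1 = rng.normal(4.0, 1.0, n) + offset   # class 1 is brighter than class 0
    X = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return X, y

X_src, y_src = make_domain(1000, offset=0.0)  # hypothetical "center A" (training data)
X_tgt, y_tgt = make_domain(1000, offset=3.0)  # hypothetical "center B": same classes, brighter

# Nearest-centroid classifier: the class means are learned on the source domain only
mu0 = X_src[y_src == 0].mean()
mu1 = X_src[y_src == 1].mean()

def predict(X):
    return (np.abs(X - mu1) < np.abs(X - mu0)).astype(float)

acc_src = (predict(X_src) == y_src).mean()
acc_tgt = (predict(X_tgt) == y_tgt).mean()
print(f"source accuracy: {acc_src:.3f}, target accuracy: {acc_tgt:.3f}")
```

With these settings, the source accuracy should be close to 98%, while the target accuracy falls to roughly 58%, even though the classes themselves have not changed.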

Serious:
Me: “What can we do if a domain shift exists in a model between the source and target dataset?”
Professor Tyrrell: “Try mixing the target dataset with some images from the source dataset!”

Less serious:
Mom: “I heard that your brother is really good at physics, what is your domain?”
Me: “I used to be an expert in Philosophy, but due to my emerging interest in AI, I’ve shifted my domain to Artificial Intelligence.”
Mom: “Oh! A domain shift!”

Will Wu’s ROP299 Journey

Hey folks! My name is Will Wu. I have just finished my second year at the University of Toronto, where I am pursuing a Statistics Specialist and a Computer Science minor. I have also just wrapped up my final paper for my ROP project with Professor Pascal Tyrrell. Looking back on the entire ROP experience, I feel grateful that I had the opportunity to learn and engage in research, so I find it meaningful to share my experience in the lab!

In the first couple of meetings I attended, I sometimes found it difficult to follow and understand the concepts and projects being discussed, but Professor Tyrrell would usually explain the concepts we were unfamiliar with. As I worked more on the slide deck about Machine Learning, I became familiar with some common AI knowledge, the logic behind neural networks and, most importantly, their significance in medical imaging.

When I was looking for an area of research related to both Machine Learning and medical imaging, Professor Tyrrell introduced us to a few interesting topics, and one of them was domain shift. After a bit of literature review on this topic, I also gained some knowledge about catastrophic forgetting, domain adaptation, and out-of-distribution shift. Domain shift is a shift in the data distribution that occurs when a deep learning model sees an unseen set of data from a different dataset. This often occurs in medical imaging, as images from different imaging centers are acquired with different tools or protocols, which can lead to differences between datasets. I therefore found it interesting to study the impact domain shift has on the performance of a CNN model, and how to quantify such a shift, especially between regular CT scans and low-dose CT scans.

My project required training and retraining the CNN model to observe this impact on model performance, which often led to frustration as errors and potential risks of overfitting kept showing up. Most of the time, I would look online for a quick fix and adjust the model and the dataset to eliminate the problem. Mauro and Atsuhiro also provided tremendous help in sorting out potential mistakes I might make during the experiment. The weekly ROP meeting was super helpful as well, because Professor Tyrrell would listen to our follow-ups and give us valuable suggestions to aid our research.

Throughout the entire research experience, there were frustrations, endeavours, and successes. Overall, this was a wonderful experience for me. I not only learned a lot about Statistics, Machine Learning and its applications in medical imaging, but I also learned how research is generally conducted and, most importantly, acquired many skills along the journey. Thank you to the lab members for kindly guiding me through such an intriguing experience!

MiWORD of the Day is… Self-Supervised Learning!

Having just finished my intro to psychology course, I found that a useful way to learn the material (and guarantee a higher score on the test) was giving myself pop quizzes. In particular, I would write out a phrase and erase a keyword, then I will try to fill in the blank. Turns out this is such an amazing study method that even machines are using it! This method of learning, where the learner learns more in-depth by creating a quiz (e.g. fill-in-the-blank) for itself constitutes the essence of self-supervised learning.
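The fill-in-the-blank quiz translates almost directly into code. The following plain-Python sketch uses a hypothetical helper name. It turns unlabelled sentences into (quiz, answer) pairs, which is the key trick of a pretext task: the labels come from the data itself. Masked-word prediction of this kind is also the pretext task used by language models such as BERT.

```python
import random

def make_fill_in_the_blank(sentences, seed=0):
    """Turn unlabelled sentences into (quiz, answer) training pairs (hypothetical helper)."""
    rng = random.Random(seed)          # seeded so the quizzes are reproducible
    pairs = []
    for sentence in sentences:
        words = sentence.split()
        i = rng.randrange(len(words))  # pick one word to erase
        answer = words[i]
        quiz = " ".join(words[:i] + ["____"] + words[i + 1:])
        pairs.append((quiz, answer))
    return pairs

notes = [
    "Neurons communicate through chemical signals called neurotransmitters",
    "Classical conditioning pairs a neutral stimulus with a meaningful one",
]
for quiz, answer in make_fill_in_the_blank(notes):
    print(quiz, "->", answer)
```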

Many machine learning models follow the encoder-decoder architecture, where an encoder network first extracts useful representations from the input data and the decoder network then uses the extracted representations to perform some task (e.g. image classification, semantic segmentation). In a typical supervised learning setting, when there is a large amount of data but only little of which is labelled, all the unlabelled data would have to be discarded as supervised learning requires a model be trained using input-label pairs. On the other hand, self-supervised learning utilizes the unlabelled data by introducing a pretext task to first pretrain the encoder such that it extracts richer and more useful representations. The weights of the pretrained encoder can then be transferred to another model, where it is fine-tuned using the labelled data to perform a specified downstream task (e.g. image classification, semantic segmentation). The general idea of this process is that better representations can be learned through mass unlabelled data, which provides the decoder with a better starting point and ultimately improves the model’s performance on the downstream task.

The choice of pretext task is paramount for pretraining the encoder, as it determines what kind of representations can be extracted. One powerful pretraining method that has yielded higher downstream performance on image classification tasks is SimCLR. In the SimCLR framework, a batch of $N$ images is first sampled, and two different augmentations $t$ and $t'$ are applied to each image, resulting in $2N$ images. Two augmented versions of the same image are called a positive pair, and two augmented versions of different images are called a negative pair. The two images in each pair, $\tilde{x}_i$ and $\tilde{x}_j$, are passed to the encoder $f$ to produce representations $h_i = f(\tilde{x}_i)$ and $h_j = f(\tilde{x}_j)$, and these are then passed to a projection layer $g$ to obtain the final representations $z_i = g(h_i)$ and $z_j = g(h_j)$. A contrastive loss defined using cosine similarity operates on the final representations so that the encoder $f$ produces similar representations for positive pairs and dissimilar representations for negative pairs. After pretraining, the weights of the encoder $f$ can be transferred to a downstream image classification model.
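The contrastive loss is only described in words above. SimCLR uses a normalized temperature-scaled cross-entropy (NT-Xent) loss: for each anchor $z_i$, the similarity to its positive partner $z_j$ is compared against the similarities to all other $2N - 1$ projections. Below is a minimal pure-Python sketch, assuming the $2N$ projections are stored so that `z[2k]` and `z[2k + 1]` are the two views of image $k$ (a layout chosen for this example) and a temperature of 0.5 picked purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss over 2N projections; z[2k] and z[2k + 1] form a positive pair."""
    n = len(z)
    if n % 2 != 0:
        raise ValueError("expected an even number of projections (2N)")
    total = 0.0
    for i in range(n):
        j = i + 1 if i % 2 == 0 else i - 1  # index of the positive partner
        positive = cosine_similarity(z[i], z[j]) / temperature
        # denominator: every other projection (positive and negatives), anchor excluded
        logits = [cosine_similarity(z[i], z[k]) / temperature for k in range(n) if k != i]
        log_denominator = math.log(sum(math.exp(s) for s in logits))
        total += -(positive - log_denominator)  # cross-entropy for picking the partner
    return total / n  # average over all 2N anchors


aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]     # partners agree
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # partners disagree
print(nt_xent_loss(aligned), nt_xent_loss(misaligned))
```

The loss is small when each view's closest neighbour is its own partner and large otherwise, which is what pushes the encoder toward similar representations for positive pairs and dissimilar ones for negative pairs.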

Although self-supervised learning may help the encoder extract better representations, it does require a large unlabelled dataset (typically >100k images for vision-related tasks).

Now, using self-supervised learning in a sentence:

Serious: Self-supervised learning methods such as SimCLR have been shown to improve performance on downstream image classification tasks.

Less Serious: I thought you were implementing a self-supervised learning pipeline for this project, so why are you teaching the machine how to solve a Rubik’s cube instead? (see Self-supervised Feature Learning for 3D Medical Images by Playing a Rubik’s Cube)

See you in the blogosphere!

Paul Tang

Paul Tang’s STA299 Journey

Hi! My name is Paul Tang, and I just finished my second year at UofT, studying in the computer science specialist and cognitive science major programs. This summer, I enrolled in STA299 under the supervision of Prof. Pascal Tyrrell to learn how to conduct research, and I will be sharing my experience in this reflection blog post.

The first phase of my ROP experience involved formulating a research question. Having a keen interest in machine learning, I got the inspiration to combine it with my research from a weekly lab meeting where Mauro presented his graduate research work (on the generation of synthetic ultrasound image data). I decided to focus on the problem that the amount of annotated data in medical imaging is often too limited for effective supervised training. Eventually, after reading papers and discussing my ideas with Prof. Tyrrell during the first few weeks, I decided to use self-supervised learning to pretrain a machine learning model and improve its performance. In particular, I chose a contrastive-learning-based self-supervised learning method called DenseCL. Luckily, my data was available right in the lab: the ultrasound knee recess distension dataset for semantic segmentation. My ROP project examined the effect of DenseCL pretraining on segmentation performance.

At first, I was doubtful of my research question: after all, many papers I read had already shown that self-supervised pretraining improves task performance, so wouldn’t my research be too “obvious”? However, I realized along the way that some interesting gaps still existed (e.g. current self-supervised pretraining methods used for medical images do not extract local image features, which could be helpful for segmentation tasks), and these gave me confidence and excitement for my research.

Getting to work, I first identified the GitHub repositories I would use in my project. Setting up the environment and getting the repositories to work with my dataset took much longer than expected (in fact, I had to switch to a different GitHub repository due to “false advertising” from the original one), and I learned that checking in with lab members (Mauro, Atsuhiro) and asking for ideas before starting work on anything can save much-needed time. I made several mistakes while training my models. When I first obtained the performance result (mIoU) from my segmentation model, I was relieved that it was consistent with previous results obtained in the lab. However, using this model in another experiment produced highly atypical results, which sent me back to debug the model. Eventually, the problem turned out to be a batch size that was too small. Although this mistake cost me a lot of training time, it allowed me to explore and gain familiarity with the configuration of a machine learning model, which I found very rewarding.
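mIoU (mean intersection over union) is the segmentation metric mentioned above. Here is a minimal sketch of how it can be computed for one pair of label maps, assuming the predicted and ground-truth masks are given as flat lists of integer class labels, one per pixel (hypothetical inputs; the lab’s actual evaluation code is not shown in this post). For each class, IoU is the number of pixels both maps assign to that class divided by the number of pixels either map assigns to it, and mIoU averages these values over classes.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two flat, equal-length lists of integer labels."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same number of pixels")
    ious = []
    for c in range(num_classes):
        # pixels labelled c in both maps
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # pixels labelled c in at least one map
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class absent from both maps: skipped here, not counted as 0 or 1
        ious.append(intersection / union)
    if not ious:
        raise ValueError("no class appears in either map")
    return sum(ious) / len(ious)


# background (0) vs. distension (1) on a toy 4-pixel "image"
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Conventions vary between toolkits: some treat absent classes differently, and many accumulate intersections and unions over the whole dataset before dividing instead of averaging per image, so numbers are only comparable when computed the same way.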

Eventually, I obtained results showing a small performance improvement from DenseCL pretraining for the segmentation of ultrasound knee distention images. My project still had its limitations: my results were not statistically rigorous, as I didn’t account for randomness in the training process. Furthermore, the number of images I used for DenseCL pretraining was much smaller than what would typically be used in a self-supervised learning setting. These limitations served as great motivation for further research.

This research experience taught me how humbling research can be: many things I took for granted require careful testing, and many gaps still exist in the current literature upon closer inspection. I am thankful for Prof. Tyrrell’s openness in allowing us to choose our own research questions, and for all the help the lab members (especially Mauro and Atsuhiro) provided me.

Paul Tang

MiWORD of the Day is… Attention!

In cognitive psychology, attention refers to the process of concentrating mental effort on sensory or mental events. When we attend to a certain object over others, our memory of that object is often better. Attention, according to William James, also involves “withdrawing from some things in order to effectively deal with others.” There are lots of things that are potential objects of our attention, but we attend to some things and ignore others. This ability helps our brain save processing resources by suppressing irrelevant features.

In image segmentation, attention is the process of highlighting the relevant activations during training. Attention gates can learn to focus on target features automatically through training. Then, during testing, they can highlight salient information useful for a specific task. Therefore, just as allocating our attention to a specific task improves our performance on it, attention gates can improve model sensitivity and accuracy. In addition, models trained with attention gates also learn to suppress irrelevant regions as humans do, which reduces the computational resources spent on irrelevant activations.
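As a rough illustration of how a gate can "highlight" some activations and suppress others, here is a pure-Python sketch of an additive attention gate in the style of Attention U-Net (Oktay et al., 2018). Treating that design as the kind of gate meant here is an assumption. Each pixel’s skip-connection features `x` and gating features `g` are linearly projected by hypothetical weight matrices `W_x` and `W_g` (standing in for 1×1 convolutions), summed, passed through a ReLU, and reduced to one score by `psi`; a sigmoid turns that score into an attention coefficient between 0 and 1 that rescales the skip features. The resampling of the coarser gating signal used in a real network is omitted.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def attention_gate(x, g, W_x, W_g, psi, bias=0.0):
    """Additive attention gate applied pixel by pixel.

    x: skip-connection feature vectors, one per pixel
    g: gating feature vectors on the same pixel grid (no resampling here)
    Returns the gated features and the attention coefficient of each pixel.
    """
    gated, alphas = [], []
    for x_i, g_i in zip(x, g):
        # project both inputs, add them, and apply ReLU
        hidden = [max(0.0, a + b) for a, b in zip(matvec(W_x, x_i), matvec(W_g, g_i))]
        # collapse to one score and squash it into (0, 1)
        alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)) + bias)
        alphas.append(alpha)
        gated.append([alpha * v for v in x_i])  # relevant pixels pass, others are damped
    return gated, alphas
```

With learned weights, pixels whose features match what the gating signal "looks for" get coefficients near 1 and background pixels get coefficients near 0, which is the suppression of irrelevant regions described above.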

Now let’s use attention in a sentence by the end of the day!

Serious: With the introduction of attention gates into a standard U-Net, global information about the recess distention is captured and irrelevant background noise is suppressed, which in turn increases the model’s sensitivity and leads to smoother and more complete segmentation.

Less serious:
Will: That lady said I am a guy worth paying attention to (。≖ˇ∀ˇ≖。)
Nana: Sadly, she said that to the security guard…

Nana Ye’s STA299 Journey

Hi everyone! My name is Nana Ye, and I am finishing my second year at the University of Toronto as a statistical science specialist and cognitive science major. I am grateful for the chance to participate in an ROP (Research Opportunities Program) project under the guidance of Professor Tyrrell during the summer of 2022. This project provides me with a valuable opportunity to learn about machine learning and understand scientific research. I would love to share my experiences with you all!

My project analyzes the effect of adding attention gates to U-Net for knee recess distention ultrasound segmentation. The recess distention area detected by the ultrasonic signal looks similar to the image background, and ultrasound images often contain a large amount of noise, distortion, and shadowing, which cause blurred local details, many dark areas, and no obvious boundaries. Thus, I wanted to see whether adding attention gates to a standard U-Net would improve segmentation accuracy. Prior to this project, I had not learned about machine learning; therefore, implementing a machine learning model on real-world patient data was both exciting and challenging.

The journey of my ROP had a rocky start. I started off hoping to do a different project that compared Vision Transformers (ViTs) and Convolutional Neural Networks on segmentation tasks for objects located in different regions of the image (central and non-central). However, when I was searching for a ViT model, I struggled to implement it on my dataset, and since ViT is new in the lab, I could not get much help with its implementation from others. Thus, I decided to change my project. Professor Tyrrell was supportive of my decision and provided me with several articles to read, which led me to my current project. When I was worried about falling behind because others were already training their models, Professor Tyrrell reassured me that understanding what is feasible in a given time frame is also a valuable lesson. Atsuhiro and Mauro also offered me lots of help along the way. When I was having a tough time understanding the technical aspects of image processing, Atsuhiro scheduled a meeting with me to explain the concepts and answer all my questions. With their help, I was able to finish my first research project in machine learning and obtain promising results.

Overall, it was a completely different experience from my lectures at the university. Researching as an ROP student in Professor Tyrrell’s lab gave me the opportunity to carry out a research project from the very beginning (doing background research and picking a topic) to the very end (analyzing the results and revising the report). Throughout the entire process, I not only learned technical knowledge about machine learning and medical imaging, but also learned to manage a project timeline efficiently, think critically, and solve problems independently. I feel privileged to be one of the ROP students in Professor Tyrrell’s lab and to gain such a worthwhile experience, which will benefit my academic career.

Adele Lauzon’s ROP399 Journey

Hi there! My name is Adele Lauzon, and I’ve just finished up my 3rd year at UofT with a major in statistics and minors in computer science and psychology. A huge highlight of my year has been my ROP399 with Professor Tyrrell, where I got to do a deep dive into the intersection of statistics, computer science, and biomedical data.

A little bit about my background–I went to high school in Houston, Texas, which is where I first fell in love with statistics. I remember my AP Statistics teacher beginning our first class with a quote by esteemed statistician John Tukey, where he claimed statistics was the best discipline because it meant you got to “play in everyone’s backyard.” As I’ve gotten farther along in my statistics education, I’ve realized how much truth is behind that phrase. Statistics is wonderful because it allows you to understand other fields simply based on the data you use. Through this ROP, I’ve been able to learn a bit more about the field of medicine.

My project was about measures of confidence in binary classification algorithms using biomedical data. Specifically, I investigated error consistency and error agreement–meaning I took a close look at what was happening when the model was making incorrect predictions. I’m not going to lie, probably the hardest part of this project was just getting started. I have a little bit of programming experience from my computer science minor, but I had a lot of catching up to do compared to my classmates. A word of advice–get yourself set up on the GPUs early. Running my code locally made for a frighteningly overheated laptop.

Probably my biggest takeaway from this course was how the process of research actually works. While the scientific method is helpful, it doesn’t account for all of the back-and-forth you are guaranteed to be doing. This is where documenting all of your steps really comes in handy. If you reach an obstacle and need to reevaluate, keep a record of what you were doing beforehand in case you need to backtrack again. I learned this the hard way: I didn’t keep such a record and ended up having to redo work I had already done.

All in all, this ROP has been such a valuable experience to me. Many thanks to Professor Tyrrell and the rest of the MiDATA team for their unwavering patience!

Today’s MiWORD of the day is… Agreement!

You know that magical moment where you and your friend finally agree on a place to eat, or a movie to watch, and you wonder what lucky stars had to align to make that happen? When the chance of agreement was so small that you didn’t think you’d ever decide? If you want to measure how often you and your friend agree on a restaurant or a movie in a way that accounts for agreement due to random chance, Cohen’s Kappa is the choice for you.

Agreement can be calculated simply by dividing the number of agreed-upon observations by the total number of observations; however, Jacob Cohen believed that wasn’t enough. As agreement was typically used for inter-rater reliability, Cohen argued that this measure didn’t account for the fact that sometimes people just guess, especially when they are uncertain. In 1960, he proposed Cohen’s Kappa as an alternative to traditional percent agreement, arguing that his measure was more robust because it accounted for agreement due to random chance.

Cohen’s Kappa is used to calculate agreement between two raters; in machine learning, it can also be used to measure the agreement between the prediction sets of two models. It is calculated by subtracting the probability of chance agreement from the probability of observed agreement, and dividing the result by one minus the probability of chance agreement. Like many correlation metrics, it takes values between -1 and +1. A negative value of Cohen’s Kappa indicates that the raters agree less often than would be expected by chance, i.e. they tend to give different ratings. A Cohen’s Kappa of 0 indicates that there is no agreement between the raters beyond what would be expected by chance. A Cohen’s Kappa of 1 indicates that the raters are in complete agreement.
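In symbols, the calculation described above is κ = (p_o - p_e) / (1 - p_e), where p_o is the observed (percent) agreement and p_e is the probability of chance agreement. The sketch below computes both for two lists of labels, assuming (as Cohen’s Kappa does) that chance agreement is the probability that the two raters pick the same category when each rates independently at their own observed rates. The rating lists are made up for illustration.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Fraction of items on which the two raters gave the same label (p_o)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(ratings_a, ratings_b)
    n = len(ratings_a)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # p_e: chance that both raters independently land on the same category
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# made-up ratings from two raters (or two models) on 10 items
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
print(f"percent agreement = {percent_agreement(rater_1, rater_2):.2f}, "
      f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")
# percent agreement = 0.80, kappa = 0.60
```

Here the raters agree on 80% of items, but since each labels half the items as 1, they would already agree 50% of the time by chance, so kappa credits only the agreement beyond that. For real projects, scikit-learn’s `sklearn.metrics.cohen_kappa_score` can serve as a cross-check.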

As Cohen’s Kappa is calculated using frequencies, it can be unreliable in measuring agreement in situations where an outcome is rare. In such cases, it tends to be overly conservative and underestimates agreement on the rare category. Additionally, some statisticians disagree with the claim that Cohen’s Kappa accounts for random chance, as an explicit model of how chance affects decision making would be necessary to say this decisively. The chance adjustment of Kappa simply assumes that when raters are uncertain, they guess an outcome completely at random. However, this is highly unlikely in practice–usually people have some reason for their decision.
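The rare-outcome problem is easy to see with a small made-up example (hypothetical counts, not data from any study): two raters each flag 2 of 100 cases as positive but agree on only one of them. The sketch computes kappa directly from the four cells of a 2×2 agreement table.

```python
def kappa_from_2x2(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Observed agreement and Cohen's Kappa from a 2x2 agreement table."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    p_o = (both_pos + both_neg) / n
    # each rater's marginal rate of answering "positive"
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n
    # chance agreement: both say positive, or both say negative, independently
    p_e = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return p_o, (p_o - p_e) / (1 - p_e)


p_o, kappa = kappa_from_2x2(both_pos=1, a_pos_b_neg=1, a_neg_b_pos=1, both_neg=97)
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
# observed agreement = 0.98, kappa = 0.49
```

Observed agreement is 98%, yet because chance agreement on the common "negative" category is already about 96%, kappa comes out at only about 0.49.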

Let’s use this in a sentence, shall we?
Serious: The Cohen’s Kappa score between the two raters was 0.7. Therefore, there is substantial agreement between the raters’ observations.
Silly: A kappa of 0.7? They must always agree on a place to eat!

Today’s MiWORD of the day is… Artifact!

When the ancient Egyptians crafted the golden Mask of Tutankhamun or carved a decree into the now famous Rosetta Stone, they probably didn’t know that we’d be holding onto them thousands of years later, considering them incredible reflections of Egyptian history.

Both of these are among the most famous artifacts in museums today. An artifact is a man-made object, often one considered to be of high historical significance. In radiology, however, an artifact is a lot less desirable: it refers to features in an image that do not accurately reflect the body structures being imaged.

Artifacts can appear in any radiographic image. For instance, they can result from improper handling of the machines used to take medical scans, patient movement during imaging, external objects (e.g. jewelry, buttons), and other unwanted occurrences.

Why are artifacts so important? They can lead to misdiagnoses that could be detrimental to a patient. Consider a hypothetical scenario where a patient is imaged for a tumor. The radiologist identifies the tumor as benign, but in reality, an artifact caused by mishandling of the machine hides the fact that the tumor is malignant. The outcome would be catastrophic in this case!

Of course, this kind of misdiagnosis is highly unlikely (especially with modern medical imaging), and there are a ton of factors at play in diagnosis. A real diagnosis, especially nowadays, would not be so simple (otherwise, we would have good reason to lament the state of medicine today). However, even if artifacts don’t cause a misdiagnosis, they can pose obstacles to both radiologists and researchers working with these images.

One such area of research is the application of machine learning to medical imaging. Whether we’re using convolutional neural networks or vision transformers, all of these machine learning models rely on images produced in some facility. The quality of these images, including the presence and frequency of artifacts, can affect the outcome of any experiments conducted with them. For instance, imagine you build a machine learning model to classify two different types of ultrasound scans. The performance of the model is certainly a main concern, but whether the model might be focusing on artifacts within the images rather than on the structures of interest would also be a huge consideration.

In any case, the presence of artifacts (whether in medical imaging or in historical museums) certainly gives us a lot more to think about!

Now onto the fun part, using artifact in a sentence by the end of the day:

Serious: My convolutional neural network could possibly be focusing on artifacts resulting from machine handling in the ultrasound images during classification rather than actual body structures of interest. That would be terrible.

Less serious: The Rosetta Stone – a phenomenal, historically significant, hugely investigated Egyptian artifact that happened to be a slab of stone on which I have no idea what was inscribed.

I’ll see you in the blogosphere!

Jeffrey Huang