Manav Shah’s Journey in ROP399

Hi! My name is Manav Shah, and I am finishing the third year of my Computer Science specialist and Statistics minor at UofT. This past academic year, I had the opportunity to do an ROP399 research project under the guidance of Professor Pascal Tyrrell, and I would like to share my experience on this blog.

My ROP project dealt with comparing the effect of decreasing sample size on Vision Transformers (ViTs) versus Convolutional Neural Networks (CNNs) on a chest X-ray classification task using the NIH Chest X-Ray dataset. CNNs have predominantly been used in medical imaging tasks, as they are easy to train and perform very well across image modalities. In recent years, however, Vision Transformers have been shown to outperform CNNs, but only when trained or pretrained on extremely large amounts of data. Given that large amounts of labelled data are hard to come by in medical imaging, it is important to establish some performance baselines and gauge whether future work and research is warranted in this arena. This exploratory aspect made my project very exciting.

I started the project not knowing anything about ViTs, though I had some experience training and using CNNs and ResNets. So I started by reading everything I could about Vision Transformers. However, since they are a relatively new class of models, it was hard to gain an initial intuitive understanding of what was happening in the research papers I read, and I did not know where to start. To avoid wasting time, I began by cleaning my data and preparing a binary classification dataset from the NIH Chest X-Ray dataset, for detecting infiltration within the lungs. I trained a small CNN classifier from scratch to see if the results made sense. I was getting an accuracy of around 60%, which I knew was not good enough. I then spoke to Prof. Tyrrell and Atsuhiro, who pointed out that my dataset might contain noise from the same patients appearing in both the positive and negative classes of images. So I cleaned my data some more and made sure there was little correlation between the negative and positive classes.
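As an aside for anyone tackling a similar dataset, a patient-level split is one way to guard against that kind of leakage. Here is a minimal sketch, assuming a labels file with a patient ID column (the file and column names are my own illustration, not the NIH dataset's exact schema):

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_csv("labels.csv")  # one row per image, with a patient ID column

# Split by patient so the same patient never appears in both sets
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]

# Sanity check: no patient overlap between train and test
assert set(train_df["patient_id"]).isdisjoint(test_df["patient_id"])
```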

I then proceeded to train a small CNN again, with fair results. However, when I tried training a ViT from scratch on my dataset, it would only learn to output "No Infiltration" for all images, since that was the majority class. I did more research and tried a lot of different techniques, but to no avail. Still, in trying to debug the ViT, I gained an in-depth understanding of concepts like learning rate scheduling, training regimes, transfer learning, and self-attention. I learned a lot from the many failures I encountered in the project, and I might have given up had it not been for Prof. Tyrrell's patience and encouraging words. I also spoke to my neural networks professor and some friends for advice and learned a lot. In the end, I decided to use transfer learning, which ended up giving me very fruitful results.
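For the curious, transfer learning here just means starting from a ViT pretrained on a large natural-image dataset rather than from random weights. A minimal sketch, assuming the timm library; the model choice and the freezing strategy are illustrative, not my exact recipe:

```python
import timm

# Load a ViT pretrained on ImageNet and swap in a two-class head
# ("Infiltration" vs. "No Infiltration")
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)

# Optionally freeze the backbone at first and train only the new head
for name, param in model.named_parameters():
    if "head" not in name:
        param.requires_grad = False
```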

More than technical knowledge, I learned how to stick with tough projects and what to expect when navigating one. I found Prof. Tyrrell's attitude towards failures in projects very inspiring, and it gave me the confidence to persevere. The experience, in my opinion, teaches you how tough research actually is and, more importantly, how you can still overcome challenges and come out better for having gone through them.

 Manav Shah

Grace Yu’s STA299 Journey

Hi everyone! My name is Grace Yu and I’m finishing my second year at the University of Toronto, pursuing a computer science specialist and a molecular genetics major. From September 2021 to April 2022, I was fortunate to have the opportunity to do a STA299 project with Professor Tyrrell through the Research Opportunity Program. I am excited to share my experience with you all!

My project was on landmarking with reduced sample sizes in MSK (musculoskeletal) ultrasound images of knees. As for many other ROP students, this was my first research experience. Prior to this project, I had no idea how machine learning worked. However, I have always been interested in the intersection of computer science and medicine, and that is what drew me to this opportunity.

The start of the project was interesting but not easy. There were many times I did not know whether I was doing the right thing or putting my efforts on the correct path. Luckily, Professor Tyrrell and the people in the lab were always very patient and helpful. I began by reading research papers on developing new semi-supervised learning models, but found them difficult to comprehend and time-consuming. Mauro kindly suggested which parts to focus on when doing the literature research, and advised me to pay more attention to selecting a model rather than to the technical details of how the model is constructed. Even so, having spent so much time choosing a model, I fell behind the others. Professor Tyrrell reminded me of my project's timeline and the next step I should take on as soon as possible, which was to find a dataset. Fortunately, with the help of the lab, we prepared a dataset together and my project got back on schedule. Looking back, I appreciate the period of exploring and experimenting, and the guidance provided by others. The starting point of a project can be difficult, and sometimes we do not know what we are doing, but that is really okay. For me, the time I spent at the beginning paid off: I found an additional suitable model, which led to a nice comparison. This experience should also help me get up to speed on new projects and new fields more quickly.

I am very grateful to have had the opportunity to work in the MiDATA lab this year. Not only did I gain a better understanding of statistical and computer science concepts, but I also learned the methods and process of conducting research. I would like to thank Professor Tyrrell, Majid, Mauro, and Atsuhiro for their guidance and feedback along the way. With this experience, I am more confident and look forward to applying what I have learned to my future research journey.

Grace Yu

The MiDATA Word of the Day is… “AP”

AP? Average Precision! What is it? And how is it useful?

Imagine you are given a prediction model that can identify common objects, and you want to know how well the model performs. So you prepare a picture that contains 2 people, and you label them with yellow bounding boxes yourself. Then you apply the model to this image, and it boxes the people in red with different confidence scores. Not bad, right? But how can you tell if this prediction is correct?

That’s where Intersection over Union (IoU) comes in, the first stop on our journey to AP. Looking at the boxes in the picture, you can see that parts of the yellow box and red box overlap. IoU is the area of their overlapping region divided by the area of their union. For example, the prediction for the person on the left will have a smaller IoU than the prediction for the other person.

If we set the IoU cutoff to 0.8, then the prediction on the left will be classified as a false positive (FP) since it does not reach the threshold, whereas the prediction on the right will be a true positive (TP).
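Here is a minimal sketch of the IoU computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Width and height of the overlapping region (zero if disjoint)
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A ground-truth box and a prediction that mostly overlaps it
print(iou((10, 10, 50, 50), (15, 12, 55, 48)))  # about 0.71
```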

Now for the final piece before calculating AP. In this image of cats, we labeled 5 cats in red, and predictions are made in yellow. We rank the predictions by descending confidence score, and calculate the precision and recall. Precision is the proportion of TP out of all predictions, and Recall is the proportion of TP out of all ground-truth objects.

Here is a summary of calculations.

Rank of prediction   Correct (Y/N)   Precision   Recall
1                    T               1           0.2
2                    T               1           0.4
3                    F               0.67        0.4
4                    T               0.75        0.6
5                    T               0.8         0.8
6                    F               0.67        0.8

Then we plot the precision-recall curve.

Generally, as recall increases, precision decreases. AP is the area under the precision-recall curve! It ranges from 0 to 1, and the higher, the better.

Whoa! That’s a complicated definition. In practice, AP is often calculated for you by the model’s evaluation code. Next time you see AP, you know it represents how good your model is.
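But if you want to compute it by hand, here is a minimal sketch that reproduces the table above and integrates the curve (the rectangle rule below is one common convention; libraries differ in how they interpolate):

```python
# Whether each prediction (sorted by descending confidence) matched a
# ground-truth box at the chosen IoU cutoff -- taken from the table above
hits = [True, True, False, True, True, False]
n_ground_truth = 5

tp = 0
precisions, recalls = [], []
for rank, hit in enumerate(hits, start=1):
    tp += hit
    precisions.append(tp / rank)          # TP out of all predictions so far
    recalls.append(tp / n_ground_truth)   # TP out of all ground-truth objects

# AP as the area under the precision-recall curve
ap = 0.0
prev_recall = 0.0
for p, r in zip(precisions, recalls):
    ap += p * (r - prev_recall)
    prev_recall = r

print(f"AP = {ap:.2f}")  # 0.71 for the table above
```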

Now for the fun part, using AP in a sentence by the end of the day:

Serious: AP is a measure of accuracy for object detection models.

Less serious:

Child: Hey mom! I need some help with my assignment on boxing all the cars on the road.

Mother: Try this model! It has an AP of 0.8, and it may be better at this than I am.

…I’ll see you in the blogosphere.

Grace Yu

Linxi Chen’s STA399 Journey

Hi everyone! My name is Linxi Chen and I’m finishing my third year at the University of Toronto, pursuing a statistics specialist and a mathematics major. I did an STA399 Research Opportunity Program project with Professor Pascal Tyrrell from May 2021 to August 2021, and I am very grateful to have had this opportunity. This project gave me a window into machine learning and scientific research, and I would love to share my experience with you all.

Initially, I had no experience with machine learning or convolutional neural networks, and integrating machine learning with medical imaging was a brand-new area for me. Therefore, at the beginning of this project, I gathered loads of information on machine learning to get a general picture of the area. The first assignment in this project was to make a slide deck on machine learning in medical imaging. Extracting and simplifying the gathered information helped me understand the area more deeply.

My research project was to find an objective metric for heterogeneity and to explore how dataset heterogeneity, as measured from the CNN training image features, behaves with sample size. At first, specifically defining the term “heterogeneity” was a big challenge for me, since there are various definitions on Google and very little information directly related to my project. By comparing information from different sources and talking with Professor Tyrrell in our weekly meetings, I managed to define “between-group heterogeneity” as the extent to which the measurements of each group vary within a dataset, considering the mean of each subgroup and the grand mean of the population. Next, designing the experimental setup was also challenging, because I had to ensure the steps were feasible and explicable. The dataset was separated into groups according to the label of each image, and we introduced new groups into the dataset while keeping the total sample size the same in each case. There were four cases in total, and between-group heterogeneity was measured using Cochran’s Q, a test statistic based on the chi-squared distribution. The experimental setup was modified several times, because problems came up from time to time. For example, I planned to use a multi-class CNN model at first, but it turned out I would have needed a different number of output neurons in different cases, which made the results incomparable. Professor Tyrrell and Mauro suggested I use a binary classification model with pseudo-labels, which successfully solved this problem. Luckily, I found some code online, and with Mauro’s help I adapted it into something usable. Then came the hardest part: although my expectations made sense on paper, the output results were not what I expected. After modifying the model and the sample size several times, I finally managed to get the expected results.

Overall, I have learned a lot from this ROP program. With the guidance of Professor Tyrrell and the help of the students in the lab, I have gained an in-depth understanding of machine learning, neural networks, and the process of scientific research. I have also become more familiar with writing a formal scientific research paper. The most valuable things I got from this experience are problem-solving skills and the ability to not get frustrated when things go wrong. I would like to thank Professor Tyrrell for giving me this opportunity to learn about scientific research and for helping me overcome all the challenges I encountered along the way. I’m very grateful to have gained so many valuable skills in this project. I would also like to thank Mauro and all the members of the lab for giving me so much help with my project.

Linxi Chen

MiWORD of the Day Is… Heterogeneity!

Today we are going to talk about the variation within a dataset, which is different from the term “pure variance” that we commonly use. So, what exactly is heterogeneity? 

There are three different kinds of data heterogeneity within a dataset: clinical heterogeneity, methodological heterogeneity, and statistical heterogeneity. Inevitably, the observed individuals in a dataset will differ from each other; from the perspective of medical imaging, a set of images might differ in average pixel intensities, RGB values, borders, and so on. Therefore, any kind of variability within the dataset is likely to be termed heterogeneity.

However, there are some differences between variance and heterogeneity. If a population has lots of variance, it only means that individuals differ a lot from the grand mean of the population: variance is a measure of dispersion, that is, how far a set of numbers is spread out from its average value. Data heterogeneity, by contrast, means that there are several subpopulations in a dataset and that these subpopulations are disparate from each other. We therefore consider between-group heterogeneity, which represents the extent to which the measurements of each group vary within a dataset, considering the mean of each subgroup and the grand mean of the population.

[Figures: a box-and-whisker chart and a scatter chart illustrating between-group heterogeneity.]

For example, if we are studying the height of a population, we expect the heights of people from different regions (e.g., the north, south, east, and west of Canada) to differ from each other. If we separate the population into groups according to region, we can calculate heterogeneity by measuring the variation in height between the groups. If a dataset has a high value of heterogeneity, it can cause problems for model training, leading to low testing accuracy.
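To make that concrete, here is a minimal sketch (with simulated heights, not real data) of computing Cochran’s Q, one common between-group heterogeneity statistic, for the height-by-region example; scipy is assumed for the chi-squared tail probability:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
regions = {
    "north": rng.normal(172, 6, 100),
    "south": rng.normal(168, 6, 100),
    "east":  rng.normal(170, 6, 100),
    "west":  rng.normal(175, 6, 100),
}

# Weight each group mean by its inverse variance (n / s^2)
means = np.array([g.mean() for g in regions.values()])
weights = np.array([len(g) / g.var(ddof=1) for g in regions.values()])
grand_mean = np.sum(weights * means) / np.sum(weights)

# Cochran's Q: weighted squared deviations of group means from the grand mean
Q = np.sum(weights * (means - grand_mean) ** 2)
df = len(regions) - 1  # approximately chi-squared under homogeneity
p = stats.chi2.sf(Q, df)
print(f"Q = {Q:.1f}, df = {df}, p = {p:.3g}")
```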

Now for the fun part, using heterogeneity in a sentence by the end of the day!

Serious: The between-group heterogeneity in the training dataset had a negative impact on model training and therefore resulted in low testing accuracy.

Less serious: Today’s dinner was so wonderful! We had stewing beef, fried chicken, roasted lamb, and salads. There is so much heterogeneity in today’s dinner!

See you in the blogosphere!

Linxi Chen

MiWORD of the day is…logistic regression!

In a neuron, long, tree-like appendages called dendrites receive chemical signals – either excitatory or inhibitory – from many different surrounding neurons. If the net signal received in the neuron’s cell body exceeds a certain threshold, then the neuron fires and the electrochemical signal is transmitted onwards to other neurons. Sure, this process is fascinating, but what does it have to do with statistics and machine learning?

Well, it turns out that the way a neuron functions – taking a whole bunch of weighted inputs, aggregating them, and then outputting a binary response – is a good analogy for a method known as logistic regression. (In fact, Warren McCulloch and Walter Pitts proposed the “threshold logic unit” in 1943, an early computational representation of the neuron that works exactly like this!)

Perhaps you’ve heard of linear regression, which is used to model the relationship between a continuous scalar response variable and at least one explanatory variable. Linear regression works by fitting a linear equation to the data, or, in other words, finding a “line of best fit.” Logistic regression is similar, but it instead “squeezes” the output of a linear equation between 0 and 1 using a special sigmoid function. In other words, linear regression is used when the dependent variable is continuous, and logistic regression is used when the dependent variable is categorical.

Since the output of the sigmoid function is bounded between 0 and 1, it’s treated as a probability. If the sigmoid output for a particular input is greater than the classification threshold (for instance, 0.5), then the observation is classified into one category. If not, it’s classified into the other category. This ability to divide data points into one of two binary categories makes logistic regression very useful for classification problems.

Let’s say we want to predict whether a particular email is spam or not. We might have a dataset with explanatory variables like the number of typos in the email or the name of the sender. Once we fit a logistic regression model to this data, we can calculate “odds ratios” for each of the two explanatory variables. If we got an odds ratio of 2 for the variable representing the number of typos in the email, for example, we know that every additional typo doubles the estimated odds of the email being spam. Much like the coefficients in linear regression, odds ratios can give us a sense of a variable’s “importance” to the model.
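To see all these pieces together, here is a minimal sketch with made-up data; the variables and coefficients below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500
typos = rng.poisson(2, n)             # number of typos per email
known_sender = rng.integers(0, 2, n)  # 1 if the sender is in your contacts

# Simulate a "true" relationship: more typos -> spam, known sender -> not spam
logit = -1.0 + 0.7 * typos - 1.5 * known_sender
p_spam = 1 / (1 + np.exp(-logit))     # the sigmoid "squeeze" into (0, 1)
is_spam = rng.random(n) < p_spam

X = np.column_stack([typos, known_sender])
model = LogisticRegression().fit(X, is_spam)

# Exponentiating a coefficient gives that variable's odds ratio
odds_ratios = np.exp(model.coef_[0])
print(dict(zip(["typos", "known_sender"], odds_ratios.round(2))))
```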

Now let’s use “logistic regression” in a sentence.

Serious: I want to predict whether this tumour is benign or malignant based on several tissue characteristics. Let’s fit a logistic regression model to the data!

Less serious: 

Person 1: I built a neural network!

Person 2: Hey – that’s cheating! You only used a *single* neuron, so you’re basically just doing logistic regression…

See you in the blogosphere!

Jacqueline Seal

Jacqueline Seal’s Journey in ROP299: Simulating clinical data!

Hi! My name is Jacqueline, and I’m going into my second year at U of T, pursuing a major in Computer Science and a specialist in Bioinformatics. This past summer, I had the opportunity to do a STA299 project with Professor Tyrrell through the Research Opportunity Program, and I’m excited to share my experiences here!

My ROP project for the summer dealt with the simulation of clinical variables relevant to detecting intra-articular hemarthrosis – basically bleeding into the joint – among patients with hemophilia, a disease where patients lack sufficient clotting proteins and are prone to regular, excessive bleeding. Since hemophilia is quite rare, clinical data is often unavailable and so simulation can help us understand what the real data might look like under different sets of plausible assumptions. The ultimate goal was to demonstrate that adding clinical data to Mauro’s existing binary CNN classifier for articular blood detection could boost model performance, as compared to a model trained exclusively on ultrasound images.

Having just completed first year, I went into this ROP with a very limited statistics background and was initially overwhelmed by all the stats jargon being used in lab meetings and in conversations with other lab members. Concepts like “odds ratios,” “ROC,” and “sensitivity analysis” were completely new to me, and I spent many hours just familiarizing myself with these fundamentals. 

After a bit of a slow start, I began my project by identifying physical presentation and clinical history variables to use in my simulation. I was fortunate enough to speak with two distinguished hematologists from Novo Nordisk, Drs. Brand-Staufer and Zak, about the features most relevant to diagnosing a joint bleed. Based on this conversation, I selected two of these variables as a starting point and simulated them according to assumed distributions. Next, I simulated the probability of an articular bleed based on a logistic regression model and used that probability to simulate the “true” presence of a bleed based on a Bernoulli distribution.
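To give a flavour of that step, here is a minimal sketch; the variables, coefficients, and distributions below are illustrative assumptions, not the study’s actual values:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000

# Two assumed clinical variables: pain score and time since injury (hours)
pain = rng.normal(5, 2, n).clip(0, 10)
hours = rng.exponential(24, n)

# Logistic model for the probability of an articular bleed
logit = -3.0 + 0.5 * pain + 0.02 * hours
p_bleed = 1 / (1 + np.exp(-logit))

# "True" bleed status drawn from a Bernoulli distribution
bleed = rng.binomial(1, p_bleed)
print(f"Simulated prevalence: {bleed.mean():.2f}")
```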

Then, I took a bit of a fun detour: figuring out how best to match simulated data to the real-world bleed probabilities output by Mauro’s model. With some guidance from Professor Tyrrell, I developed a matching algorithm that allowed us to control the strength of the positive correlation between the simulated clinical probabilities and the classifier’s probabilities. Perhaps the most difficult part of my project was ensuring that the simulated dataset captured the desired relationships among my explanatory variables and between each explanatory variable and the response variable. Thanks to the advice of Guan and Sylvia, however, I was able to verify these relationships and report on them in a statistically sound manner.

Despite all the obstacles I encountered along the way, despite changing the details of my methodology several times, despite making slow progress and occasionally feeling like I was going in circles, I’m very grateful to have had this opportunity. Not only did I gain a greater understanding of important statistical concepts and greater familiarity with machine learning techniques, but I also got first-hand experience navigating the research process, from beginning to end. Ultimately, my experience in the MiDATA lab was simultaneously challenging and rewarding, and I would like to thank Dr. Tyrrell for all his guidance this summer – whether it was setting up impromptu meetings to discuss unexpected issues in my data, providing feedback on my results, or simply sharing humorous anecdotes in our weekly lab meetings. Regardless of where this next year takes me, I’m confident that I’ll carry the lessons I learned this summer with me.

Jacqueline Seal

Today’s MiWORD of the day is … YOLO!

YOLO? You Only Live Once! Go and have adventures before life slips away into the everyday routine, as in Drake’s “The Motto.”

Well, maybe we should head back from the lecture hall of PCS100 (Popular Culture Study) to the classroom of computer science and statistics. In the world of algorithms, YOLO refers to You Only Look Once. The name alone signals full confidence in its efficiency. But what is this powerful algorithm, and how does it work?

YOLO is a bounding-box regression algorithm that performs object detection. It can recognize the classes of objects in images and bound those objects with predicted boxes, completing the tasks of classification and localization at the same time. Compared with earlier region-based algorithms like R-CNN, YOLO is more efficient because it is region-free.

Object detection methods usually slide windows across the whole image and check whether there is an object in each window. Region-based algorithms like R-CNN use region proposals to reduce the number of windows to check. YOLO is different: it makes predictions on the entire image at once. As a fishing analogy, R-CNN first picks out the regions where fish might be and checks them one by one, while YOLO casts a single net over the whole lake and catches the fish together. YOLO divides the image into a grid, and each grid cell predicts bounding boxes for any object whose center falls inside it. When several cells claim the same object, non-maximal suppression is applied to keep only the prediction with the highest confidence. The combination of cell confidences and predicted bounding boxes then gives the final classification and localization of each object in the image.
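Here is a minimal sketch of non-maximal suppression (a simplified version; YOLOv3’s real implementation also handles class scores and runs per class):

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, iou_cutoff=0.5):
    """Keep the most confident box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_cutoff]
    return keep

# Two overlapping detections of the same object, plus one distinct one
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- the duplicate box 1 is suppressed
```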

As region-free algorithms have developed, there have been several versions of YOLO. One practical and advanced version is YOLOv3, which is also the version I used in my project. It is widely applied in many fields, including the popular field of autonomous driving and … also medical imaging analysis! YOLOv3 is popular because of its efficiency and ease of use, which can save a lot of time for any potential user.

Now we can get to the fun part: using YOLO in a sentence by the end of the day (I put the serious and less serious together):

Manager: “Where is Kolbe? He was supposed to finish detecting all the tumors in these CT images tonight! Has he already gone through all those thousands of images in the past hour?”

Yvonne: “Well, he was pretty stressed about his workload and asked me if there is any quick method that can help. I said YOLO.”

Manager: “That sounds good. The current version has good performance in many fields, and I bet it could help. Wait, but where did he go? He should be training models right now.”

Yvonne: “No idea. He just got excited, shouted YOLO, turned off the computer, and left quickly without a word. I guess he was humming ‘Tik Tok’ while on the phone with his friends.”

Manager: “Okay, I can probably guess what happened. I need a talk with him tomorrow…”

See you in the blogosphere! 

Jihong Huang

Jihong Huang’s ROP399 Journey

Hi, my name is Jihong Huang and I have just finished my third year in computer science and statistics at the University of Toronto. This summer, I had the great chance to work on my ROP399 project under the guidance of Dr. Pascal Tyrrell. During the pandemic, everything was a bit different from usual, including this program. Still, I would like to share my experience and lessons from this summer with you!

After three years at university and so many courses in statistics and computer science, I thought I was fully prepared to try my hand at research using the knowledge learnt in lectures. It turned out my assumption was completely wrong! Everything was different from lectures, where professors teach step by step with detailed notes. I needed to create my own proposal and design the experiments independently, like a scholar rather than a student. Despite Dr. Tyrrell’s help, I struggled to figure out my schedule for the project. The experience was quite unique and special to me compared with lecture assignments.

After all the setup, I began the coding part of my project. I picked YOLOv3 as my bounding-box regression algorithm. YOLOv3 is one of the most popular bounding-box regression algorithms and already performs excellently in many fields. At the same time, its structure and mechanisms are longer and more complicated than any code I had ever studied. On paper it is just a combination of classification and localization, and each piece is easy to understand on its own, but the combination is far more advanced than my lecture notes! It took me weeks to roughly figure out its mechanism. Then I devoted myself to debugging the code. That was difficult, as I was not familiar with most of the packages used. Some issues were caused by mismatched package versions, while others came from subtle bugs. Tuning the hyperparameters was also frustrating, as I usually could not find optimal values for them. Thanks to the great help from Mauro, I finally got my code working on the server.

At the end of the project, I had gained a lot of advanced knowledge about bounding-box regression and many related packages, which I would probably never have touched before graduation had I not taken this project. However, my most precious lessons were not about any specific coding ability. The most important lesson was what scientific research is and how it should be done. I learnt that it is very important to write a clear and specific proposal at the beginning, as it provides the guidelines for all further experiments; otherwise, it is easy to go off track and lose sight of the initial goal when thousands of lines of code become overwhelming. Also, there will always be failures in scientific research. I spent more than half of my time making and fixing mistakes, which frustrated me a lot, and my final conclusion was that the selected algorithm did not perform well. But all of this is common in scientific research: as long as we learn from failures, the failures are meaningful, and we can make further progress based on them. Thanks go to Dr. Tyrrell and all the other lab members, who helped me out of my frustration during the project and offered me valuable advice.

After these three months, I learnt a lot from my first foray into the world of scientific research, including coding skills and the scientific spirit. This experience gave me important guidance on my future direction of study, and I think all the time and effort were worthwhile.

– Jihong Huang

Rui Zhu’s ROP399 Journey

I am Rui Zhu, and I’ve just completed my third year in the computer science program. This past summer I worked in Dr. Tyrrell’s lab on my ROP399 project, which was a new and wonderful experience for me.

Writing down this reflection, as at other decision-making moments, reminds me of my interview with Dr. Tyrrell, where he asked me why I chose his lab and why he should choose me. I had tons of reasons for choosing his lab. Honestly, though, it was hard for me to put together a whole sentence to answer why he should choose me. “I haven’t done research before, and everything needs a start,” I remember saying unconfidently, “so I need this chance to see if I am really interested in it and see how it goes.” Fortunately, I received Dr. Tyrrell’s offer a few days after the interview, and my very first research experience started.

My ROP project was on the imperfect gold standard, that is, the consensus of multiple readers. More specifically, the project was about training models on datasets labelled by readers who make mistakes. I started by reading a lot of papers on robust learning. However, in my first meeting with Dr. Tyrrell and Atsuhiro, who kindly helped me with my project, I could not answer what the definition of an imperfect gold standard is or why we need the consensus of readers. Atsuhiro helped me out. He explained the problem in real-world applications, where multiple readers annotate a huge dataset without looking at each other’s labels, because cross-checking would be time-consuming and costly. I learnt that doing research starts with asking why I am doing something rather than how to do it, and I kept returning to the question of why my project is meaningful.
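As a toy illustration of what such a consensus “gold standard” looks like, here is a minimal sketch with made-up labels from three readers:

```python
import numpy as np

# Rows: images; columns: three readers' binary labels (1 = abnormal).
# Each reader labels independently, and each can make mistakes.
labels = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# The consensus label is the majority vote across readers
consensus = (labels.sum(axis=1) >= 2).astype(int)
print(consensus)  # [1 0 1 0]
```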

After sorting out my thoughts, I began writing my premise, purpose, hypothesis, and objectives. I had thought it would be difficult to fill a whole page with these things, but once I realized I should not assume readers know why I am doing the project, explaining everything in my introduction turned out to be easier than I expected. I then combined the pieces into a full introduction, and everything flowed like water.

When I was writing the actual code for my project, I did not run into many difficulties, as I was getting help from Atsuhiro and Mauro, whom I want to thank. Mauro taught me how to use PyTorch Lightning, which structures PyTorch code in an easy-to-understand way. Atsuhiro helped me confirm my experimental methodology and gave me guidance on which robust learning techniques to use. Moreover, I started very early to familiarize myself with the code.
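To show what that structure looks like, here is a minimal toy sketch of a LightningModule (my own illustration, not the project’s code): the model, loss, training step, and optimizer all live in one class, and Lightning runs the loop for you.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class BinaryClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # A tiny toy network for 28x28 inputs and two classes
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 2))
        self.loss_fn = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Training then becomes: pl.Trainer(max_epochs=5).fit(model, dataloader)
```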

Overall, my summer ROP research journey was wonderful. I learnt how to start research from scratch and picked up some knowledge of robust learning, although I am only scratching the surface of it. It was a pleasure to work in Dr. Tyrrell’s lab this summer, and I look forward to what I can do in the future in the world of research.

– Rui Zhu