Christine Wang’s STA299 Journey

Hi! My name is Christine Wang, and I’m finishing my third year at the University of Toronto pursuing a specialist in statistics with a focus on cognitive psychology. My year-long STA299 journey has been an amazing and challenging experience.

My research project assessed whether the heterogeneity of medical images affects the clustering of image features extracted from a CNN model. Initially, I found it quite challenging to understand how my research differed from the previous work done by Mauro, who analyzed the impact of heterogeneity on the generalizability of a CNN by testing overall model performance on the test clusters. Thanks to our weekly ROP meeting discussions, I understood that I needed to retrain the CNN model on the images in each cluster of the training set to see how heterogeneity could affect the clustering of image features. By checking whether the models retrained on each cluster performed differently, I was able to show that heterogeneity could affect the clustering of image features. However, the most challenging part of the research was not just achieving the desired results, but interpreting what I could learn from them. For instance, even though my results showed that the retrained models performed differently, I spent a lot of time trying to understand what the clusters represented and why some retrained models performed better than others. I am very grateful to Professor Pascal Tyrrell for helping me understand my project and for his essential advice to check the between-cluster distances. This enabled me to interpret the results and identify a possible pattern: retrained models with similar performance came from clusters that were also close to each other. However, further research is still required because the two datasets I used were not large enough. Looking back, I realize it would have been better if I had used the dataset in our lab, as finding an appropriate dataset and code was very challenging. I would like to thank Mauro, Atsuhiro, and Tristal for their generous help in teaching me how to do feature extraction and cluster analysis.
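For readers curious what “checking the between-cluster distances” might look like, here is a minimal Python sketch. It assumes the CNN features have already been extracted as plain lists of numbers and that each image already has a cluster label. The function names are hypothetical. Using Euclidean distance between cluster centroids is just one simple choice, not necessarily the measure used in this project.

```python
import math
from collections import defaultdict
from itertools import combinations


def cluster_centroids(features, labels):
    """Return the mean feature vector of each cluster."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[lab][i] += x
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def between_cluster_distances(features, labels):
    """Euclidean distance between every pair of cluster centroids."""
    cents = cluster_centroids(features, labels)
    return {
        (a, b): math.dist(cents[a], cents[b])
        for a, b in combinations(sorted(cents), 2)
    }


# Toy example: six 2-D "feature vectors" in three clusters.
features = [[0, 0], [0, 2], [4, 0], [4, 2], [0, 10], [0, 12]]
labels = [0, 0, 1, 1, 2, 2]
for pair, d in between_cluster_distances(features, labels).items():
    print(pair, round(d, 2))
```

In this toy data, clusters 0 and 1 sit close together and cluster 2 is far away. That is the kind of structure you would compare against the retrained models' performance.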

Before starting the project, I was fascinated by the high accuracy and excellent performance of ML techniques. However, during my ROP journey, I realized that achieving high model performance is not the most important thing. As Professor Pascal mentioned, the most crucial aspect of doing research is truly understanding what we are doing and focusing on what we can learn from the results we obtain. It is not enough to just have tables and figures; we need to go further and choose appropriate statistical analyses to understand our results.

Alice Zhang’s STA299 Journey

Hi friends! My name is Alice Zhang. I am finishing my third year of undergrad pursuing a statistical science specialist with a focus on genetics and biotechnology, as well as a biology minor. It was a blessing to take part in STA299Y ROP with Professor Tyrrell and his MiDATA lab. As this experience comes to an end, I would like to share my incredible journey.

Coming into the lab, I held great interest but zero research experience and zero knowledge about machine learning. I remember being completely lost and worried in my very first lab meeting. Looking back, I’m actually quite proud of how far I’ve come. My project was to compare multiple-instance classifiers and single-instance classifiers for diagnosing knee recess distension ultrasounds. I also explored factors that may influence multiple-instance model training.

The start of my project was rather smooth compared to others’, since it was more application-based than theoretical. I was able to grasp key concepts through literature searches and gather the usable models and datasets (thanks to Mauro) needed to begin the project. However, with a lack of research experience and a weak background in programming, I soon faced obstacles, confusion, panic and doubts. I had the tools in hand, but the hard part was designing, running and interpreting appropriate experiments. How do I modify and apply the code to my ultrasound data? How do I fairly compare two dissimilar algorithms? How do I alter and compare training factors without introducing bias? How do I give rational interpretations of the outcomes and unusual observations?

As the project progressed, I constantly felt that I was falling behind: I was still doubting and modifying my experiments while my peers were obtaining results, and I was still training my models while others were starting the write-up. To be honest, I panicked in every ROP meeting, but with the support of Professor Tyrrell, the lab members and my peers, I was able to power through. I am so grateful to have had Professor Tyrrell as my guide through the first doorstep of research. He taught me that research isn’t about finding and reporting a standard answer; it is a process of discovering and then solving problems, and there’s no template for it. I was constantly encouraged to reflect on the “what”, “how” and “why” of the process. I also greatly appreciate the help from Mauro, who prepared the dataset and spent many hours guiding me through programming and model training.

Progressing through the project, I was later able to solve problems and fix bugs independently. I went from zero to completing my very first research project in machine learning. It feels like I’ve raised my first “research baby”! I would like to once again thank Professor Tyrrell and the lab members for their support; I couldn’t have gained this marvellous learning experience without them.

Diana Escoboza’s ESC499 Journey

Hello there! My name is Diana Escoboza, and I’ve just finished my undergraduate studies at UofT in Machine Intelligence Engineering. I was very fortunate to have Prof. Tyrrell as my supervisor while I worked on my engineering undergraduate thesis project, ESC499, during the summer. I believe such an experience is worth sharing!

My project consisted of training an algorithm to detect anatomical landmarks in ultrasounds of the elbow, knee, and ankle joints. In medical imaging, it is challenging to correctly label large amounts of data, since labelling requires experts whose time is limited and costly. For this reason, I wanted my project to compare the performance of different machine learning approaches when only limited labelled data is available for training.

The approaches I worked on were reinforcement learning and semi-supervised learning. Reinforcement learning is based on learning optimal behaviour in an environment through decision-making. In this method, the model would ‘see’ a section of the image and choose a direction to move towards the target landmark. Semi-supervised learning uses both labelled and unlabelled data for training; here, the entire image is fed to the model so it can learn the target’s location. Finally, I analysed the performance of both approaches and the training resources they used to determine the optimal architecture.
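To make the reinforcement learning idea a little more concrete, here is a toy Python sketch of a single agent move. Everything in it is an illustrative assumption rather than the project's actual setup:

- a four-direction action set;
- clamping the position to the image border;
- a reward equal to how much closer the move brings the agent to the landmark, a common choice in RL landmark detection.

A real agent would also receive the image patch around its position as input and learn a policy from it. This sketch leaves that part out.

```python
import math

# Hypothetical action set: move the agent's viewing window one step.
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(position, action, landmark, width, height, step_size=1):
    """Apply one move and return (new_position, reward).

    The new position is clamped to the image bounds. The reward is the
    decrease in Euclidean distance to the landmark, so moves toward the
    target earn a positive reward and moves away earn a negative one.
    """
    dx, dy = ACTIONS[action]
    x = min(max(position[0] + dx * step_size, 0), width - 1)
    y = min(max(position[1] + dy * step_size, 0), height - 1)
    new_position = (x, y)
    reward = math.dist(position, landmark) - math.dist(new_position, landmark)
    return new_position, reward


# Agent at (5, 5) in a 10x10 image, landmark at (8, 5).
print(step((5, 5), "right", (8, 5), 10, 10))
print(step((5, 5), "left", (8, 5), 10, 10))
```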

While working on my project, I sometimes got lost in the enthusiasm and possibilities and overestimated the time I had. Prof. Tyrrell was always very helpful in advising me throughout my progress to stay realistic about my limited time and resources, while still giving me the freedom to work on my interests. The team meetings not only provided help, but were also a time to talk about AI research and have interesting discussions that got us excited about our projects and future possibilities. We also had a lot of support from the grad students in the lab, who gave us great help when we encountered obstacles. A big shout-out to Mauro for saving me when I was freaking out that my code wasn’t working and time was running out.

Overall, I am very grateful for the opportunity to work with such a supportive team and for everything I learned along the way. With Prof. Tyrrell, I gained a better understanding of scientific research and advanced my studies in machine learning. I want to thank the MiDATA team for all the help and for providing me with such a welcoming environment.

Will Wu’s ROP299 Journey

Hey folks! My name is Will Wu. I have just finished my second year at the University of Toronto, where I am pursuing a Statistics Specialist and a Computer Science minor. I have just wrapped up my final paper on my ROP project with Professor Pascal Tyrrell. Looking back on the entire experience, I feel grateful that I had such an opportunity to learn and engage in research activities, so I find it meaningful to share my experience in the lab!

In the first couple of meetings I attended, I sometimes found it difficult to follow and understand the concepts or projects being discussed, but Professor Tyrrell would usually explain the concepts we were unfamiliar with. As I worked more on the slide deck about Machine Learning, I became familiar with some common AI knowledge, the logic behind neural networks and, most importantly, their significance in medical imaging.

When I was looking for an area of research related to both Machine Learning and medical imaging, Professor Tyrrell introduced us to a few interesting topics, one of which was domain shift. After some literature review on this topic, I also gained some knowledge about catastrophic forgetting, domain adaptation and out-of-distribution shift. Domain shift refers to a difference between the distribution of the data a deep learning model was trained on and that of new, unseen data from a different dataset. This often occurs in medical imaging, as images from different imaging centers are produced with different acquisition tools or protocols, which can lead to differences between datasets. Therefore, I found it interesting to study the impact of domain shift on the performance of a CNN model, and how to quantify such a shift, especially between regular CT scans and low-dose CT scans.

My project required training and retraining a CNN model to observe this impact on model performance, which often led to frustration as errors and potential risks of overfitting kept showing up. Most of the time, I would look online for a quick fix and adjust the model as well as the dataset to eliminate the problem. Mauro and Atsuhiro also provided tremendous help in sorting out potential mistakes I might have made during the experiment. The weekly ROP meeting was super helpful as well, because Professor Tyrrell would listen to our updates and give us valuable suggestions to aid our research.

Throughout the entire research experience, there have been frustrations, endeavours and successes. Overall, this has been a wonderful experience for me. I not only learned a lot about Statistics, Machine Learning and its implementation in medical imaging, but I also got to know how research is generally conducted and, most importantly, acquired new skills throughout the journey. Thank you to the lab members for their kind help in guiding me through such an intriguing experience!

Paul Tang’s STA299 Journey

Hi! My name is Paul Tang and I just finished my second year at UofT pursuing a computer science specialist and a cognitive science major. This summer, I enrolled in STA299 under the supervision of Prof. Pascal Tyrrell to learn how to conduct research, and I will be sharing my experience in this reflection blog post.

The first phase of my ROP experience involved formulating a research question. Having a keen interest in machine learning, I got the inspiration for my research from a weekly lab meeting where Mauro presented his graduate research work (on the generation of synthetic ultrasound image data). I decided to focus on the problem that the amount of annotated data in medical imaging is often too limited for effective supervised training. Eventually, by reading papers and discussing my ideas with Prof. Tyrrell during the first few weeks, I decided to use self-supervised learning to pretrain a machine learning model and improve its performance. In particular, I chose a contrastive-learning-based self-supervised method called DenseCL. Luckily, I could get my data right at the lab: the ultrasound knee recess distension dataset for semantic segmentation. My ROP project dealt with evaluating the effect of DenseCL pretraining on segmentation performance.

At first, I was doubtful of my research question: after all, many papers I read had already shown that self-supervised pretraining improves task performance, so wouldn’t my research be too “obvious”? However, I realized along the way that some interesting gaps still existed (e.g. current self-supervised pretraining methods used on medical images do not extract local image features, which could be helpful for segmentation tasks), and these gave me confidence and excitement for my research.

Getting to work, I first identified the GitHub repositories I would use in my project. Setting up the environment and getting the repositories to work with my dataset took much longer than expected (in fact, I had to switch to a different GitHub repository due to “false advertising” from the original one), and I learned that checking with lab members (Mauro, Atsuhiro) and asking for ideas when starting to work on anything could save much-needed time. I made several mistakes while training my models. When I first obtained the performance result (mIoU) of my segmentation model, I was relieved that it was consistent with previous results obtained in the lab. However, using this model in another experiment produced highly atypical results, which led me back to debugging the model. Eventually, the problem turned out to be a small batch size. Although this mistake cost me a lot of training time, it did allow me to explore and gain familiarity with the configuration of a machine learning model, which I found very rewarding.
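Since mIoU comes up here, a short Python sketch of how it can be computed for a binary segmentation mask may help. It assumes masks are 2-D lists of 0/1 labels and averages IoU over the classes present in either mask. The function names are hypothetical. Segmentation toolkits may aggregate differently, for example by accumulating intersections and unions over a whole dataset before dividing.

```python
import math


def class_iou(pred, target, cls):
    """IoU for one class between two 2-D label masks (lists of rows)."""
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == cls, t == cls
            intersection += int(in_pred and in_target)
            union += int(in_pred or in_target)
    # A class absent from both masks has no defined IoU.
    return intersection / union if union else math.nan


def mean_iou(pred, target, classes=(0, 1)):
    """Average IoU over classes that appear in at least one of the masks."""
    scores = [class_iou(pred, target, c) for c in classes]
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid)


# 0 = background, 1 = distension (toy 2x2 masks).
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
print(mean_iou(pred, target))
```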

Eventually, I obtained results showing a small performance improvement from DenseCL pretraining for the segmentation of ultrasound knee distension images. My project still had its limitations: my result was not statistically rigorous, as I didn’t account for randomness in the training process. Furthermore, the number of images I used for DenseCL pretraining was far fewer than what would typically be used in a self-supervised learning setting. These limitations served as great motivation for further research.

This research experience taught me how humbling doing research is: many things I took for granted require careful testing, and many gaps still exist in the current literature upon closer inspection. I am thankful for Prof. Tyrrell’s openness in allowing us to choose our own research questions, and I am thankful for all the help the lab members (especially Mauro and Atsuhiro) provided me.

Paul Tang

Nana Ye’s STA299 Journey

Hi everyone! My name is Nana Ye, and I am finishing my second year at the University of Toronto as a statistical science specialist and cognitive science major. I am grateful to have participated in an ROP (Research Opportunities Program) project under the guidance of Professor Tyrrell during the summer of 2022. This project provided me with a valuable opportunity to learn about machine learning and understand scientific research. I would love to share my experiences with you all!

My project analyzed the effect of adding attention gates to U-Net for knee recess distention ultrasound segmentation. The recess distention area detected by the ultrasound signal looks similar to the image background, and ultrasound images often contain a large amount of noise, distortion, and shadow, which cause blurred local details, many dark areas, and no obvious boundaries. Thus, I wanted to see whether adding attention gates to a standard U-Net would improve segmentation accuracy. Prior to this project, I had not learned about machine learning; therefore, being able to implement a machine learning model on real-world patient data was exciting and challenging.

The journey of my ROP had a rocky start. I started off hoping to do a different project that compared Vision Transformers and Convolutional Neural Networks on segmentation tasks for objects located in different regions of the image (central and non-central). However, when I was searching for a ViT model, I struggled to implement it on my dataset, and since ViT is new in the lab, I could not get much help with its implementation from others. Thus, I decided to change my project. Professor Tyrrell was supportive of my decision and provided me with several articles to read, which led me to my current project. When I was worried about falling behind because others were already training their models, Professor Tyrrell reassured me that understanding what is feasible in a given time frame is also a valuable lesson. Atsuhiro and Mauro also offered me lots of help along the way. When I was having a tough time understanding the technical aspects of image processing, Atsuhiro scheduled a meeting with me to explain the concepts and answer all my questions. With their help, I was able to finish my first research project in machine learning and obtain promising results.

Overall, it was a completely different experience from lectures at the university. Researching as an ROP student in Professor Tyrrell’s lab gave me the opportunity to carry out a research project from the very beginning (doing background research and picking a topic) to the very end (analyzing the results and revising the report). Throughout the process, I not only learned technical knowledge about machine learning and medical imaging, but also learned to manage a project timeline efficiently, think critically, and solve problems independently. I feel privileged to be one of the ROP students in Professor Tyrrell’s lab and to gain such a worthwhile experience that will benefit my academic career.

Adele Lauzon’s ROP399 Journey

Hi there! My name is Adele Lauzon, and I’ve just finished up my 3rd year at UofT with a major in statistics and minors in computer science and psychology. A huge highlight of my year has been my ROP399 with Professor Tyrrell, where I got to do a deep dive into the intersection of statistics, computer science, and biomedical data.

A little bit about my background–I went to high school in Houston, Texas, which is where I first fell in love with statistics. I remember my AP Statistics teacher beginning our first class with a quote by esteemed statistician John Tukey, where he claimed statistics was the best discipline because it meant you got to “play in everyone’s backyard.” As I’ve gotten farther along in my statistics education, I’ve realized how much truth is behind that phrase. Statistics is wonderful because it allows you to understand other fields simply based on the data you use. Through this ROP, I’ve been able to learn a bit more about the field of medicine.

My project was about measures of confidence in binary classification algorithms using biomedical data. Specifically, I investigated error consistency and error agreement–meaning I took a close look at what was happening when the model was making incorrect predictions. I’m not going to lie, probably the hardest part of this project was just getting started. I have a little bit of programming experience from my computer science minor, but I had a lot of catching up to do compared to my classmates. A word of advice–get yourself set up on the GPUs early. Running my code locally made for a frighteningly overheated laptop.
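As a rough illustration of comparing errors between models, here is a minimal Python sketch of one possible “error agreement” measure: the overlap (Jaccard index) between the sets of samples two classifiers get wrong. This definition is an assumption for illustration only. The project's own definitions of error consistency and error agreement may differ, and some work uses a chance-corrected, Cohen's kappa style measure instead. The function names are hypothetical.

```python
def error_indices(y_true, y_pred):
    """Indices of the samples a classifier got wrong."""
    return {i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p}


def error_agreement(y_true, pred_a, pred_b):
    """Jaccard overlap of the two classifiers' error sets.

    1.0 means both models make exactly the same mistakes; 0.0 means
    their mistakes never coincide. If neither model makes any mistake,
    we return 1.0 by convention (an assumption for this sketch).
    """
    errors_a = error_indices(y_true, pred_a)
    errors_b = error_indices(y_true, pred_b)
    union = errors_a | errors_b
    if not union:
        return 1.0
    return len(errors_a & errors_b) / len(union)


y_true = [1, 0, 1, 1, 0, 0]
pred_a = [1, 1, 0, 1, 0, 0]  # wrong on samples 1 and 2
pred_b = [1, 1, 1, 0, 0, 0]  # wrong on samples 1 and 3
print(error_agreement(y_true, pred_a, pred_b))
```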

Probably my biggest takeaway from this course was how the process of research actually works. While the scientific method is helpful, it doesn’t account for all of the back-and-forth you are guaranteed to be doing. This is where documenting all of your steps really comes in handy. If you reach an obstacle and need to reevaluate, keep a record of what you were doing beforehand in case you need to backtrack again. I didn’t do this, and ended up having to redo some work I had already done.

All in all, this ROP has been such a valuable experience to me. Many thanks to Professor Tyrrell and the rest of the MiDATA team for their unwavering patience!

Tong Su’s ROP299 Journey

Hi everyone! My name is Tong Su, and I have just wrapped up my ROP299 project in Professor Tyrrell’s Lab, as well as my second year at the University of Toronto, pursuing a computer science specialist and statistics major. It was a great pleasure to complete my second year alongside this research experience. I have learned a lot about both artificial intelligence and the process of scientific research, and I would like to share my experiences with you here.

My ROP project examined the effect of compression and downsampling on the accuracy of a Convolutional Neural Network (CNN)-based binary classification model for histological images. Advances in medical imaging systems have made medical images more detailed, but at the cost of increasing their size. Compared to other images, medical images are larger and occupy more storage space. Therefore, most medical images are downsampled or compressed before they are stored. While some compression methods are reversible, most others are irreversible: once an image is compressed this way, some information is lost and cannot be restored. When these modified medical images are used to train machine learning algorithms, the information lost during compression may affect the algorithms’ accuracy. This study aims to investigate how the compression and downsampling ratios applied to medical images affect the accuracy of a CNN.

Similar to other ROP students, I decided on my research topic early by selecting from a list of topics in different areas. However, the focus of my research shifted slightly as I progressed through my project. Initially, my research question was “Can we compress training data without degrading accuracy?” This question only addressed the effect of compression on the accuracy of the algorithm, and by the end of the research, I needed to propose the best compression ratio for medical image storage without much loss of accuracy.

Among all the compression types, I decided to work with JPEG2000, as it is one of the most commonly used compression types in medical imaging. The chosen dataset consisted of 100,000 different image patches from histological images of human colorectal cancer (CRC) and normal tissue, organized into 9 tissue classes with one class label per image. The next step was to choose the machine learning model. I decided to work on binary classification with a CNN model. Two of the categories were picked for the binary classification model, which classifies whether a given tissue image is cancer-associated stroma (STR) (1) or normal colon mucosa (NORM) (0).

The next step was compressing the dataset. I used the Python Imaging Library (PIL) to compress the dataset using JPEG2000. However, the binary classification model did not support images in the j2k format, so I needed to add another step that converted the j2k images to a format supported by the model. I decided to convert the images to TIFF, as it is the same as the dataset’s original format.

During my research on compression, Professor Tyrrell pointed out another image size reduction method: downsampling. Although both methods reduce image size, there are some differences between them. This made me curious about which image size reduction method works better for the machine learning algorithm. So, I added another aim to my project: to compare downsampling and compression and determine which image size reduction method is more suitable for medical imaging.
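To illustrate how downsampling differs from compression, here is a minimal Python sketch of downsampling a grayscale image by averaging non-overlapping blocks. JPEG2000 compression keeps the image's pixel dimensions. This approach instead reduces the number of pixels directly: a factor of 2 keeps one value per 2×2 block, which is a quarter of the original pixel count. The function is hypothetical and is not the method used in this project. In practice, a library call such as Pillow's `Image.resize` would likely be used instead.

```python
def downsample(image, factor):
    """Downsample a 2-D grayscale image (list of rows) by block averaging.

    Each non-overlapping factor x factor block becomes one output pixel
    holding the block's mean. Leftover rows/columns that do not fill a
    complete block are dropped.
    """
    out_h = len(image) // factor
    out_w = len(image[0]) // factor
    result = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            block = [
                image[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


image = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 0, 0],
    [10, 10, 0, 0],
]
print(downsample(image, 2))
```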

Despite all the obstacles I encountered along the way, such as changing the dataset halfway through the project, modifying the model and rerunning everything, and the unexpected 54.39% error at a high compression ratio, I have successfully come to the end and concluded my excellent ROP experience with this reflection. I now have a greater understanding of the research process and deep learning algorithms. To end this reflection, I want to thank Professor Tyrrell for offering me this opportunity and guiding my research progress through the weekly meetings. I also want to thank Dr. Atsuhiro Hibi for providing me with endless guidance and support throughout the research project, through meetings and frequent email exchanges, even when he was busy. Without their help, I would not have had such an excellent research experience.

Tong Su

Manav Shah’s Journey in ROP399

Hi! My name is Manav Shah, and I am finishing my third year of a Computer Science Specialist and Statistics Minor at UofT. This past academic year, I had the opportunity to do an ROP399 research project under the guidance of Professor Pascal Tyrrell, and I would like to share my experience on this blog.

My ROP project dealt with comparing the effect of decreasing sample size on Vision Transformers versus Convolutional Neural Networks in a Chest X-Ray classification task using the NIH Chest X-Ray dataset. Convolutional Neural Networks have been predominantly used in medical imaging tasks, as they are easy to train and perform well across image modalities. However, in recent years, Vision Transformers (ViTs) have been shown to outperform Convolutional Neural Networks, but only when trained or pretrained on extremely large amounts of data. Given that large amounts of labelled data are hard to come by in the field of Medical Imaging, it is important to set up some performance baselines and gauge whether future work and research is warranted in this arena. This exploratory aspect made my project very exciting.

I started the project not knowing anything about ViTs, although I had some experience training and using CNNs and ResNets. So, I started by reading everything I could about Vision Transformers. However, since it is a relatively new class of models, it was hard to gain an intuitive understanding of what was happening in the research papers I read, and I did not know where to start. To avoid wasting time, I began by cleaning my data and preparing a binary classification dataset from the NIH Chest X-Ray dataset to detect infiltration within the lungs. I trained a small CNN classifier from scratch to see if the results made sense. I was getting an accuracy of around 60%, which I knew was not good enough. Then, I spoke to Prof. Tyrrell and Atsuhiro, who pointed out that my dataset might contain noise from the same patients appearing in both the positive and negative classes. So, I cleaned my data some more and made sure there was little correlation between the negative and positive classes.
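Here is a minimal Python sketch of the kind of cleaning described here: dropping every patient whose images appear in both the positive and negative class. It assumes the metadata has already been loaded as (image, patient ID, label) tuples. The function name and label encoding are hypothetical. A related, common safeguard is to split training and test sets by patient, so that no patient appears in both splits.

```python
from collections import defaultdict


def drop_mixed_patients(records):
    """Remove all images from patients who appear in both classes.

    records: list of (image_id, patient_id, label) tuples, where label is
    1 for infiltration and 0 for no infiltration (assumed encoding).
    """
    labels_by_patient = defaultdict(set)
    for _, patient_id, label in records:
        labels_by_patient[patient_id].add(label)
    mixed = {pid for pid, labels in labels_by_patient.items() if len(labels) > 1}
    return [r for r in records if r[1] not in mixed]


records = [
    ("img_a.png", 1, 1),
    ("img_b.png", 1, 0),  # patient 1 appears in both classes
    ("img_c.png", 2, 1),
    ("img_d.png", 3, 0),
    ("img_e.png", 3, 0),
]
print(drop_mixed_patients(records))
```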

I then trained a small CNN again, with fair results. However, when I tried training a ViT from scratch on my datasets, it would only learn to output “No Infiltration” for all images, as that was the majority class. So, I did some more research and tried many different techniques, but to no avail. However, in trying to debug the ViT model, I gained an in-depth understanding of concepts like learning rate scheduling, training regimes, transfer learning, and self-attention. I learned a great deal from the many failures I encountered in the project. I might have given up had it not been for Prof. Tyrrell’s patience and encouraging words. I also spoke to my Neural Networks professor and some friends for advice and learned a lot. In the end, I decided to use transfer learning, which gave me very fruitful results.

More than technical knowledge, I learned how to stick with tough projects and what to expect when navigating one. I found Prof. Tyrrell’s attitude towards failures in projects very inspiring, and it gave me the confidence to persevere. In my opinion, the experience teaches you how tough research actually is and, more importantly, how you can still overcome challenges and come out better for having gone through them.

Manav Shah

Grace Yu’s STA299 Journey

Hi everyone! My name is Grace Yu and I’m finishing my second year at the University of Toronto, pursuing a computer science specialist and a molecular genetics major. From September 2021 to April 2022, I was fortunate to have the opportunity to do a STA299 project with Professor Tyrrell through the Research Opportunities Program. I am excited to share my experience with you all!

My project was on landmarking with a reduced sample size in MSK ultrasound images of knees. As for many other ROP students, this was my first research experience. Prior to this project, I had no idea how machine learning works. However, I have always been interested in the intersection between computer science and the medical field, and that’s what drew me to this opportunity.

The start of the project was interesting but not easy. There were many times I did not know if I was doing the right thing or if my efforts were headed down the correct path. Luckily, Professor Tyrrell and the people in the lab were always very patient and helpful. I began by reading some research papers on developing new semi-supervised learning models, but found them difficult to comprehend and time-consuming. Mauro kindly suggested which parts to focus on when doing the literature research, and advised me to pay more attention to selecting a model instead of focusing on the technical details of how the model is constructed. In addition, because I spent a lot of time choosing a model, I fell behind others. Professor Tyrrell reminded me of my project timeline and the next step I should take as soon as possible, which was to find a dataset. Fortunately, with the help of the lab, we prepared a dataset together and my project got back on schedule. Looking back, I appreciate the period of exploring and experimenting, and the guidance provided by others. The starting point of a project can be difficult, and sometimes we do not know what we are doing, but really, that’s ok. For me, the time I spent at the beginning paid off: I found an additional suitable model, which led to a nice comparison. This experience also allows me to get started on new projects or in new fields more quickly.

I am very grateful for having had the opportunity to work in the MiDATA lab this year. Not only did I gain a better understanding of statistical and computer science concepts, but I also learned the methods and process of conducting research. I would like to thank Professor Tyrrell, Majid, Mauro, and Atsuhiro for their guidance and feedback throughout this project. With this experience, I am more confident and look forward to applying what I have learned in my future research journey.

Grace Yu