My name is Yiyun Gu and I am a fourth-year student studying mathematics and statistics at the University of Toronto. After taking courses in statistics and machine learning, I became quite interested in putting these methods into practice. Medical imaging is a popular field where machine learning has great impact, so I contacted Dr. Pascal Tyrrell, who kindly agreed to supervise me.

Last September, my initial research direction was Bayesian optimization of the hyperparameters of convolutional neural networks (CNNs), using information from previously trained models and their distributions. In addition to his own instruction, Dr. Pascal Tyrrell introduced me to one of his graduate students who was also interested in this field. We had weekly meetings to discuss how to make the idea implementable. I read many papers and learned about Gaussian processes, acquisition functions, and surrogate functions. However, a major challenge was how to update the hyperparameters of the prior distribution based on information from the CNN model, and I grew anxious about my progress. Dr. Tyrrell encouraged me to shift direction slightly, because he cared about what a student learns from a project and how they feel about it.

Since November, out of interest in Bayesian concepts, I have been working on a project comparing frequentist CNNs and Bayesian CNNs for projects with sample size restrictions. Because sufficient data may not be available in medical imaging, I wanted to determine whether Bayesian CNNs would benefit from prior information on small datasets and outperform frequentist CNNs. Bayesian CNNs learn distributions over the weights and biases, while frequentist CNNs use point estimates. Publicly available code for Bayesian CNNs was limited, so I made full use of what existed and modified it to run experiments with training sample sizes ranging from 500 to 50,000. I applied customized architectures and AlexNet to the MNIST and CIFAR-10 datasets. I found that Bayesian CNNs did not perform as well as I expected: frequentist CNNs achieved higher accuracy and took less time to train. However, Bayesian CNNs have an interesting feature: they incorporate a measure of uncertainty. Since Bayesian CNNs maintain distributions over the weights, they can also output distributions over the predictions, and can therefore indicate how confident each decision is.
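To illustrate the contrast between point estimates and weight distributions, here is a toy sketch in pure Python (not the actual experiment, and not any particular Bayesian CNN library): a frequentist "model" stores a single point estimate for a weight, while a Bayesian one stores a Gaussian over that weight and samples from it at prediction time, yielding an uncertainty estimate alongside the predictive mean. The weight value and noise level are made up for the example.

```python
import random
import statistics

# Toy illustration: a single "neuron" with one weight. The frequentist
# model stores a point estimate; the Bayesian model stores a Gaussian
# over the weight and samples from it at prediction time, producing a
# predictive mean AND a spread (the uncertainty measure).

random.seed(0)

def frequentist_predict(x, weight=0.8):
    # Point estimate: one deterministic output, no uncertainty.
    return weight * x

def bayesian_predict(x, mu=0.8, sigma=0.1, n_samples=1000):
    # Sample weights from the (learned) posterior and aggregate outputs.
    outputs = [random.gauss(mu, sigma) * x for _ in range(n_samples)]
    return statistics.mean(outputs), statistics.stdev(outputs)

point = frequentist_predict(2.0)
mean, spread = bayesian_predict(2.0)
print(point)          # a single number, nothing more
print(mean, spread)   # predictive mean plus a confidence measure
```

The spread of the sampled outputs is what lets a Bayesian model "tell you" how confident it is, which is exactly the feature that made Bayesian CNNs interesting despite their lower accuracy here.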

I hope to apply more Bayesian CNN architectures to more datasets in medical imaging projects, because architectures and datasets have a great influence on performance. I would also like to try more prior distributions and learn how to determine which are most appropriate.

I had a great research experience in this project with Dr. Pascal Tyrrell's guidance and other graduate students' help. It was my first time writing a scientific report, and Dr. Tyrrell continually showed me how to write it and offered great advice. I really appreciated the guidance and enjoyed this unique research experience in the final year of my undergraduate life. I look forward to contributing to medical imaging research and to more opportunities to apply machine learning methods!

Yiyun Gu

Amar Dholakia: Some Thoughts as I Wrap Up My STA498Y Project (and Undergrad!)

Hi everyone! I’m Amar Dholakia and I’m a fourth-year/recent graduate having majored in Neuroscience and Statistics, and am starting a Master’s in Biostatistics at UofT in the fall of 2020. I’ve had the pleasure of being a part of Dr. Tyrrell’s lab for almost two years now and would like to take the opportunity to reflect on my time here.

I started in Fall 2018 as a work-study student, tasked with managing the Department of Medical Imaging’s database. A highlight was discussing and learning about my peers’ work, which sparked my initial interest in the field of artificial intelligence and data science.

The following fall, I began a fourth-year project in statistics, STA498Y, under the supervision of Dr. Tyrrell. My project investigated the viability of clustering image features to assess the effect of dataset heterogeneity on deep convolutional network accuracy. Specifically, I compared the behaviour of six clustering algorithms to see whether the choice of algorithm affected the ability to capture heterogeneity.

My project started with reaching out to my labmate and good friend Mauro Mendez, who had recently undertaken a project very similar to mine. He sent me his paper, which I read, and re-read, and re-re-read… It took me about four months to only begin to grasp what Mauro had explored, and how I could use what he had learned to develop my project. But months of struggle were definitely worth the “a-ha!” moment.

First, I started by replicating Mauro’s results using Fuzzy K as the clustering method to make sure I was on the right track. Reading, coding, and testing for the very first time was a nightmare – I had some Python experience but had never applied it before. It took a lot of back and forth with Mauro and Dr. Tyrrell, and a lot of learning, understanding, and re-learning what I THOUGHT I understood, to get me on the right track. By the start of the Winter term, I had finally conjured preliminary results – banging my head on the wall was slowly becoming worth it.

Once I had the code basics down, getting the rest of the results was relatively smooth sailing. I computed and plotted changes in model accuracy with sample size, and heterogeneity in model accuracy with sample size, as captured by different clustering methods. My results for one model were great from the get-go – I was set! I thought to challenge myself by generalizing to a second model – and that was far from easy. But by taking on that extra challenge, I felt I learned more about my project and, importantly, how to scientifically justify my results. The results didn’t match up, and I had to support my rationale with evidence from the literature. If I couldn’t find an explanation, I may have done something incorrectly. And lo and behold, my ‘inexplicable’ results were in fact due to human error – something I painstakingly troubleshot, but now understand much better and can justify.

Ultimately, we showed that regardless of clustering technique or CNN model, clustering could effectively detect how heterogeneity affected CNN accuracy. To me, this was an interesting result, as I expected vastly different behaviour between partition-based and density-based clustering. Nonetheless, it was welcome, as it suggested that any clustering method could be used to assess CNN accuracy.
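As a rough illustration of the underlying idea, the sketch below runs a tiny hand-rolled k-means (not one of the six algorithms from the project, nor Mauro's pipeline) on synthetic "image features" and uses the separation between cluster centroids as a crude heterogeneity proxy. All values are fabricated for the example.

```python
import math
import random

# Minimal sketch: cluster extracted image features with a tiny k-means,
# then use the separation between cluster centroids as a crude proxy for
# dataset heterogeneity. The "features" are synthetic stand-ins.

random.seed(1)

def kmeans(points, k=2, iters=20):
    # Deterministic initialization: first and last points as seeds.
    centroids = [points[0], points[-1]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Recompute centroids as cluster means (keep old one if empty).
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two synthetic feature groups: a tight cluster near the origin and a
# distinct one near (3, 3) -- a deliberately heterogeneous dataset.
features = [(random.gauss(0, 0.2), random.gauss(0, 0.2)) for _ in range(50)]
features += [(random.gauss(3, 0.2), random.gauss(3, 0.2)) for _ in range(50)]

centroids, clusters = kmeans(features)
separation = math.dist(centroids[0], centroids[1])
print(round(separation, 2))  # large separation suggests heterogeneity
```

In the real project the features came from medical images and the comparison spanned six algorithms (partition-based and density-based), but the principle is the same: if the algorithm recovers well-separated groups, the dataset is heterogeneous in a way that may affect CNN accuracy.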

I struggled most with truly appreciating what my research aimed to solve. I attribute this partially to not being as proactive with my readings and questions to Dr. Tyrrell to really verify my understanding. And to be honest, exploring this project is still a work-in-progress – something I will continue learning about this summer!

My advice to any future students – read, read, read! Diving into a specific academic niche is truly a wonderful experience. The learning curve was steep and initially involved a lot of trying, failing, fixing, and then trying again. But this experience only reinforced my notions of “success through failure” and “growth through struggle”. It may be challenging at first, but with some perseverance and support from a wonderful PI – like Dr. Tyrrell – you’ll be able to accomplish so much more than you originally imagined.

Lessons Along the Way

https://betakit.com/startupcfo-explains-the-long-windy-road-to-a-closed-funding-round/
 
 
With summer almost here, it’s a good time to reflect on lessons learned from the academic year gone by. Since September, I’ve been working under Dr. Pascal Tyrrell’s supervision on a systematic review (SR) project investigating sample size determination methods (SSDMs) in machine learning (ML) applied to medical imaging. Shout out to the Department of Statistical Sciences where I completed my independent studies course! Here, I share important lessons I learned in the hopes that they may resonate with you.
 
Despite being a stats student (as you know from my previous posts!), I was initially new to ML and confronted with the task of critically reviewing theoretically dense primary articles. I came to appreciate that the first step was to develop a solid background – starting from high-level YouTube videos and lessons on DataCamp, to reading ML blogs and review articles – until I was confident enough to evaluate articles on my own. For me, the key to learning a complex subject was to build on foundational concepts and keep things as clear as possible. As Einstein once said: “If you can’t explain it simply, you don’t understand it well enough”.
 
Next, it was time to conduct a systematic search. The University of Toronto library staff were especially helpful in guiding me through OVID Medline and Embase, databases with methodical search procedures and a careful search syntax relying on various operators. To be thorough, we also sent a request to the rest of our research team, who hand-searched their own stash of literature. Along the way, we garnered support from the university, successfully receiving the Undergraduate Research Fund grant. The lessons for me here? The importance of seeking expert help where appropriate, and that being resourceful can pay off (literally)! Finally, I valued our strong team culture, without which none of this would have been possible.
 
While working on the SR, I also conducted a subsampling experiment using a medical imaging dataset, testing the effect of class imbalance on a classifier’s performance. Hands-on/practical experiences are critical in developing a more nuanced understanding of subject material – in my case, an understanding that translated to my SR.
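A hedged sketch of the subsampling idea (synthetic data, and a trivial majority-class "classifier" rather than the actual model used in the lab): subsample a labeled pool at decreasing positive-class fractions and watch plain accuracy inflate, which is one reason class imbalance makes performance reporting misleading.

```python
import random

# Sketch: draw subsamples of a labeled dataset at different class
# ratios, then score a trivial classifier that always predicts the
# majority class. Its accuracy rises as the imbalance grows, even
# though it has learned nothing about the minority class.

random.seed(2)

def subsample(pos, neg, n, pos_fraction):
    # Draw a subsample of size n with the requested positive fraction.
    n_pos = int(n * pos_fraction)
    return random.sample(pos, n_pos) + random.sample(neg, n - n_pos)

pos = [("img_p%d" % i, 1) for i in range(1000)]  # synthetic positives
neg = [("img_n%d" % i, 0) for i in range(1000)]  # synthetic negatives

accuracies = {}
for frac in (0.5, 0.25, 0.1):
    sample = subsample(pos, neg, 200, frac)
    majority = max((0, 1), key=lambda c: sum(1 for _, y in sample if y == c))
    accuracies[frac] = sum(1 for _, y in sample if y == majority) / len(sample)

print(accuracies)  # accuracy inflates as the positive class shrinks
```

At a 10% positive fraction the do-nothing classifier already scores 90% accuracy, which is why imbalance-aware metrics and careful sample size reporting mattered in the SR.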
 
So now you are probably wondering about the results! The subsampling experiment helped us develop a model for the deleterious effect of class imbalance on classification accuracy and demonstrated that this effect was sensitive to total sample size. Meanwhile in our SR, we observed great variability in SSDMs and model assessment measures, calling for the need to standardize reporting practices.
 
That was a whirlwind recap of the year and I hope some of the lessons I learned resonate with you!
 
See you in the blogosphere,
 
Indranil Balki
 
A special thanks to Dr. Pascal Tyrrell, as well as Dr. Afsaneh Amirabadi & Team

MiCUP… Runneth Over?

An interesting quotation from the Hebrew Bible. Basically, it means that I have sufficient for my needs and I am good with that. So, where am I going with this, you ask? Well, let me introduce you to my program MiCUP – the Medical imaging Collaborative Undergraduate Program.


The goal of the program is to bring together students from the Faculty of Arts and Science and my faculty (Medicine) to learn about medical research in the world of medical imaging. I have a sprinkling of students every term from various programs, such as the Research Opportunity Program, Independent Studies, the Youth Study Program, and MiVIP. It is only a modest number of students BUT provides ample brain power to get some really cool research done. My cup certainly runneth over.


Have a look below at the timelines from my two recent ROP students.


Great work Kevin and Sylvia!!!




See you in the blogosphere,


Pascal Tyrrell



Kevin Chen ROP F/W 2014

Sylvia Urbanik F/W 2014