Adam Adli’s ROP399 Journey in Machine Learning and Medical Imaging

My name is Adam Adli and I am finishing the third year of my undergraduate studies at the University of Toronto, specializing in Computer Science. I’m going to start this blog post by talking a little bit about myself. I am a software engineer, an amateur musician, and above all, someone who loves to solve problems and treats every creation as art. I have a rather tangled background; I entered university as a life science student, but I have been a programmer since my pre-teen years. Somewhere along the way, I realized that I would flourish most in my computer science courses, and so I switched programs at the beginning of my third year.
 
While entering this new and uncertain phase in my life and career, I had the opportunity of meeting Dr. Pascal Tyrrell and gaining admission to his research opportunity program (ROP399) course that focused on the application of Machine Learning to Medical Imaging under the Data Science unit of the Department of Medical Imaging.
 
Working in Dr. Tyrrell’s lab was one of the most unique experiences I have had thus far in university, allowing me to bridge my interests in medicine and computer science and gain valuable research experience. When I first began my journey, despite having a strong practical background in software development, I had absolutely no previous exposure to machine learning or high-performance computing.
 
As expected, beginning a research project in a field that you have no experience in is frankly not easy. I spent the first few months of the course trying to learn as much about machine learning algorithms and convolutional neural networks as I could; it was like learning to swim in an ocean. Thankfully, I had the support and guidance of my colleagues in the lab and my professor, Dr. Tyrrell, along the way. With their help, I pushed my boundaries and learned the core concepts of machine learning models and their development with solutions to real-world problems in mind. I finally had a thesis for my research.
 
My research thesis was to show experimentally a relationship that was expected in theory: smaller training sets tend to result in over-fitting of a model, and regularization helps prevent over-fitting, so regularization should be more beneficial for models trained on smaller training sets than for those trained on larger ones. Through late nights of coding and experimentation, I ran many repeated long-running computations on a binary classification model for dental x-ray images in order to show that employing L2 regularization is more beneficial for models trained on smaller training samples than for models trained on larger ones. This is an important finding because in the field of medical imaging it is often difficult to come across large datasets, whether due to bureaucratic processes or the financial cost of developing them.
 
I managed to show that in real-world applications, there is an important trade-off between two resources: computation time and training data. L2 regularization requires hyperparameter tuning, which means training the model repeatedly, and that can be very computationally expensive, especially for complex convolutional neural networks trained on large amounts of data. So, given the diminishing returns of regularization as training sets grow and the increased computational cost of employing it, I showed that L2 regularization is a feasible way to help prevent over-fitting and improve testing accuracy when developing a machine learning model with limited training data.
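To make the trade-off concrete, here is a minimal sketch of what tuning L2 regularization involves. It is not the code from the experiment: it swaps the convolutional network for a one-feature logistic regression in plain Python, and the toy data, the function name `train_logistic`, the learning rate and the candidate penalty strengths are all made up for illustration. The only point is that the penalty adds (lambda / 2) * w^2 to the loss, and that every candidate lambda costs a full training run.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, l2_lambda, lr=0.5, steps=500):
    """Fit a one-feature logistic regression by gradient descent.

    The loss is mean log-loss + (l2_lambda / 2) * w**2, so the penalty
    adds l2_lambda * w to the gradient of the weight (the bias is not penalized).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * (grad_w / n + l2_lambda * w)
        b -= lr * (grad_b / n)
    return w, b


# Toy, made-up data: a tiny, perfectly separable training set.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Hypothetical candidate penalty strengths. Each one needs its own training
# run, which is where the extra computation time comes from.
for lam in (0.0, 0.01, 0.1, 1.0):
    w, b = train_logistic(xs, ys, l2_lambda=lam)
    print(f"lambda={lam}: w={w:.3f}, b={b:.3f}")
```

Larger penalties pull the weight toward zero, which is the mechanism that limits over-fitting on small training sets. In PyTorch, a penalty of this kind is commonly applied through the `weight_decay` argument of an optimizer such as `torch.optim.SGD`; how it was configured in the actual experiment is not stated here.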
 
Due to the long-running nature of the experiment, I tackled my research project as not only a machine learning project but also a high-performance computing project. I happened to be taking systems courses like CSC367: Parallel Programming and CSC369: Operating Systems at the same time as my ROP399, which allowed me to better appreciate the underlying technical considerations in the development of my experimental machine learning model. I harnessed powerful technologies like the Intel AVX2 vector instruction set for things like image pre-processing on the CPU, and the Nvidia CUDA runtime environment through PyTorch to accelerate tensor operations using multiple GPUs. Even with all the high-level optimizations I considered, the final run of my experiment took about 25 hours, and that was on an insane lab machine with an Intel i7-8700 CPU and an Nvidia GeForce GTX Titan X!
 
Overall, my ROP not only opened a door to the world of machine learning and high-performance computing for me, but in doing so, it taught me so much more. It strengthened my independent learning, project management, and software development skills. It taught me more about myself. I feel that I have never experienced so much growth as an academic, problem-solver, and software engineer in such a condensed period of time.
 
I am proud of all the skills I’ve gained in Dr. Tyrrell’s lab and I am extremely thankful for having received the privilege of working in his lab. He is one of the most supportive professors I have had the pleasure of meeting.
 
Now that I have completed my third year of school, I’m off to begin my year-long software engineering internship at Intel and continue my journey.
 
Signing out,

Adam Adli

Step 1 in ROP399 – What’s my project?

This week I finally decided on my project topic!

During last week’s lab meeting, Dr. Tyrrell brought up some potential topics for us to choose from. These included determining the appropriate sample size for machine learning, the class imbalance problem, participating in the dental project, and the ultrasound project that had just been brought up.

After the lab meeting, I talked to Wenda and Ariana regarding the dental project that they have been working on. This was the project that I wanted to be in the most primarily because I intend to go to dental school after graduation, and being involved in a dental project would offer more exposure to this field. However, after the brief introduction and update on the current progress by Wenda and Ariana, I realized that there might not be much to do as a complete project. Hanatu, an independent research student, would also be working on this project, leaving fewer gaps that need to be addressed for the project. Because my expectation is to work on a project independently on a topic where there’s plenty of freedom, I decided to change gears and look at other ideas.

The class imbalance topic was the next thing that caught my interest. Indranil, who happened to be my mentor before I joined the lab, had worked on the class imbalance project before. I immediately contacted him regarding this project and got his project report. I was told that this topic is more technical and less clinical than the dental project, so I didn’t know if I would like it. Surprisingly, I found it really interesting, and it has great implications. Indranil studied the effect of class imbalance using images in the IRMA database and applied the random forest model. By manually changing the sample size of one class, he found that as the proportion of the imbalanced set goes up, the overall accuracy of the model decreases, while the accuracy for the imbalanced class increases. I found this interesting and useful, as class imbalance can be very common in any dataset, especially in medical imaging. Studying its effect can help identify this issue when machine learning is applied to assist with medical imaging.
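For anyone curious what this kind of experiment looks like in code, here is a small sketch of its two ingredients: shrinking one class to create an imbalance, and reporting accuracy per class as well as overall. This is not Indranil's code; the function names, the toy labels and the "always predict the majority class" predictor are assumptions made purely for illustration.

```python
import random
from collections import Counter


def make_imbalanced(samples, labels, minority_label, keep_fraction, seed=0):
    """Keep only a fraction of one class to create an imbalanced dataset."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    n_keep = max(1, round(len(minority_idx) * keep_fraction))
    kept = set(rng.sample(minority_idx, n_keep))
    idx = [i for i, y in enumerate(labels) if y != minority_label or i in kept]
    return [samples[i] for i in idx], [labels[i] for i in idx]


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Fraction of each class's samples that were predicted correctly."""
    totals, hits = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Toy example: 50 samples per class, then keep only 20% of class "b".
samples = list(range(100))
labels = ["a"] * 50 + ["b"] * 50
_, imbalanced_labels = make_imbalanced(samples, labels, "b", keep_fraction=0.2)

# A lazy predictor that always answers "a" looks good overall
# but gets every "b" sample wrong.
lazy_predictions = ["a"] * len(imbalanced_labels)
print(overall_accuracy(imbalanced_labels, lazy_predictions))
print(per_class_accuracy(imbalanced_labels, lazy_predictions))
```

Reporting both numbers matters because a single overall accuracy can hide how badly one class is doing.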

I then met with Indranil to discuss possible projects on this topic, and the most natural one would be investigating which method can best mitigate the class imbalance problem, as a continuation of studying its effects. Next, I searched for existing literature on this topic specifically in medical imaging, and found very little. The most commonly used methods for class imbalance include over-sampling, under-sampling, and changing the weight of the imbalanced class in the cost function. I met with Dr. Tyrrell, who liked the idea for my project and suggested that I focus on these 3 main methods.
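As a rough picture of the three methods, here is a minimal sketch in plain Python. The function names are hypothetical, the sampling is the simplest random variant of each idea, and the weighting rule (n_samples / (n_classes * class_count)) is one common inverse-frequency heuristic rather than anything decided for this project.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for s, y in zip(samples, labels):
        groups.setdefault(y, []).append(s)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(g) for g in groups.values())
    out_s, out_y = [], []
    for y, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_s.extend(group + extra)
        out_y.extend([y] * target)
    return out_s, out_y


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(g) for g in groups.values())
    out_s, out_y = [], []
    for y, group in groups.items():
        out_s.extend(rng.sample(group, target))
        out_y.extend([y] * target)
    return out_s, out_y


def balanced_class_weights(labels):
    """Inverse-frequency weights for use in a weighted cost function."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


# Toy data: 8 "neg" samples and 2 "pos" samples.
samples = list(range(10))
labels = ["neg"] * 8 + ["pos"] * 2

_, over_y = random_oversample(samples, labels)
_, under_y = random_undersample(samples, labels)
print(Counter(over_y), Counter(under_y), balanced_class_weights(labels))
```

Over-sampling keeps all the data but repeats minority samples, under-sampling throws majority samples away, and weighting leaves the data alone and makes mistakes on the rare class cost more.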

I am excited about my project (and most importantly, really interested). I decided to ask for the code that Indranil used to preprocess the images and create the imbalanced classes as a starting point. For my next steps, I’m also planning to learn more about the different methods for addressing this problem as well as how to code in Python.

Looking forward to working on my project!

Wendi
Sep. 28, 2018

Upcoming Medical Imaging – Artificial Intelligence Workshop in Calgary!

MiDATA will be offering a free Mi-AI workshop on Tuesday December 11th from 8:30 am-12:00 noon at the Alberta Children’s Hospital, Calgary.

The focus will be on introducing participants to the concepts of AI, deep learning, and machine learning in medical imaging research.

Workshop objectives will include answering the following questions:
-How to design a research question for AI?
-I have an idea, a laptop, and a few scans. Am I good to go?
-What do I need to get started?
-What AI techniques are currently being used for medical applications?

Squeezing in a Little Time for ML this Past Summer: John Valen’s Experience

My name is John Valen. Having recently completed my undergraduate degree in statistics and economics here at U of T, and soon moving on to pursue my Master’s in statistics in Europe, the Medical Imaging Volunteer Internship program seemed almost tailored to my goal of getting valuable research experience within a constrained time window. Over the course of only several months this summer, I’ve had the pleasant and enriching experience of contributing ideas and code to the project that summer ROP student Wenda Zhao undertook for the dentistry department at U of T, along with the guidance and contributions of ML lab leader Hershel Stark.

Wenda’s blog post (see here) neatly summarizes the goal of this project, one whose aim is to determine the likelihood that a misdiagnosis may occur, depending on the degree of damage to the dental plate being used for X-rays. Contributions I’ve helped make in particular include:

– Creating sparse matrix representations of the grey scale X-ray images themselves in order to economize on memory and run-time performance
– Hand-engineering features: once the artifacts (damage such as scratches, dents, blotches, etc.) were segmented out via DBSCAN, they were characterized by a variety of different metrics: size (pixel count), average pixel intensity (images are grey scale), location (relative to the center of the plate image), etc.

– Training a K-Means algorithm to cluster segmented artifacts from the dental plate images based on these hand-engineered features, whereby clustering them in this unsupervised manner gave us insight on their properties;
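To give a feel for the first two items, here is a minimal sketch in plain Python. It is not the project's code: the function names are hypothetical, the image is a tiny made-up grid, and it assumes that background pixels have already been set to 0 so that only artifact pixels need to be stored. The segmentation and clustering steps themselves (DBSCAN and K-Means) would normally come from a library and are not reproduced here.

```python
import math


def to_sparse(image, background=0):
    """Store only non-background pixels as {(row, col): intensity}."""
    return {
        (r, c): value
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value != background
    }


def artifact_features(pixels, image_shape):
    """Describe one segmented artifact given its {(row, col): intensity} pixels."""
    size = len(pixels)  # pixel count
    mean_intensity = sum(pixels.values()) / size
    centroid_r = sum(r for r, _ in pixels) / size
    centroid_c = sum(c for _, c in pixels) / size
    center_r = (image_shape[0] - 1) / 2
    center_c = (image_shape[1] - 1) / 2
    distance = math.hypot(centroid_r - center_r, centroid_c - center_c)
    return {
        "size": size,
        "mean_intensity": mean_intensity,
        "distance_from_center": distance,
    }


# Made-up 5x5 grey scale "plate" with one two-pixel artifact in the corner.
image = [[0] * 5 for _ in range(5)]
image[0][0] = 100
image[0][1] = 200

sparse = to_sparse(image)
features = artifact_features(sparse, image_shape=(5, 5))
print(sparse)
print(features)
```

Each artifact ends up as a short row of numbers, which is the form a clustering algorithm expects as input.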

And much more. If you are not familiar with this machine learning lingo, then do not worry; I was hardly exposed to it myself before I started working in this lab. I went in knowing close to nothing practical and a whole lot theoretical, and came out knowing quite a little more in the way of the former. Fine, a lot more, or so I like to think. It may not seem clear how my contributions can be used in the future to help answer the ultimate question. The truth is, nothing is really clear at the moment. The project is still ongoing and I intend to keep up with it, making contributions remotely while I am away in Belgium pursuing my Master’s degree. This is the greatness of it all: the amount of flexibility we have in answering these questions leaves a lot of room for creativity and contemplation.

All in all, from my own perspective (which has been greatly expanded over the course of the summer), the volunteer program was a perfect means to experience the sheer amount of work that is enthusiastically undertaken by serious students in answering these important questions. I hope that I too can now consider myself at the very least climbing to their ranks while I move on to other and more numerous serious pursuits in my life. 

Good luck to you all, and do not underestimate yourselves.


John Valen

Summer 2018 ROP: Wenda’s in the house!

Hello everyone, my name is Wenda Zhao. I’m starting my fourth year in September majoring in neuroscience and pathobiology. I did a research opportunity project (ROP) 399 course with Dr. Tyrrell this summer. And I’m here to share some of my experiences with you.
Today is a hot and humid Friday in southeast China, where I’m back home from school for the rare luxury of a short break before everything gets busy again. Summer is coming to an end, and so is my time with Dr. Tyrrell and his incredible team, some of whom I have got to know, worked with for most of the summer, and befriended. I have just handed in my report for the project I did over the past three months on the segmentation, characterization and superimposition of dental X-ray artifacts.
And now, looking back, it was one of the best learning experiences I have ever had, through an enormous amount of self-teaching, practicing, troubleshooting, discussing and debating. As with all learning experiences, the process can be long and bewildering, sometimes even tedious; yet rewarding in the end.
 
It all began on a cold April morning, with me sitting nervously in Dr. Tyrrell’s office, waiting for him to print out my ROP application and start off the interview. At that point, I had just ended my one-year research at a plant lab and was clueless about what I was going to do for the following summer. Coming from a life science background, I went into this interview for a machine learning project in medical imaging knowing that I wasn’t the most competitive candidate nor the most suitable person to do the job. Although I tried presenting myself as someone who had had some experience dealing with statistics by showing Dr. Tyrrell some clumsy work I did for my previous lab, he noticed the flaws immediately. I then found myself facing a series of questions which I had no answers to, and the interview quickly turned into what I thought to be a disaster for me. I was therefore very shocked when I received an email a week later from Dr. Tyrrell informing me that I had been accepted. I happily came on board, but joys aside, part of me also had this big uncertainty and doubt that later followed me even into my first few weeks at the lab.
 
At the beginning, everything was new. I started off learning the software KNIME, an open-source data analytics platform that is capable of doing myriads of machine learning tasks. I had my first taste doing a classification problem, where we trained a decision tree model to classify a given X-ray as either a hand or a chest. It was a good introductory task to illustrate all the basic concepts in machine learning such as “training set”, “test set”, “input” and “output/label”. We ended up obtaining an accuracy of around 90% on the test set. That was the first time I witnessed the power of machine learning and I was totally amazed by it. I spent the next week or so watching more videos on the topic, including state-of-the-art algorithms such as convolutional neural networks (CNNs). While absorbing knowledge every day was fun, I was at the same time a little lost about the future of my project. I began to realize that this experience was going to be very different from my past ones in wet labs, where a lot of the time you are already told what to do and all you need is to conduct the experiments and get the results. Here the amount of freedom that I had over my schedule, tasks and even the project itself was refreshing but at the same time terrifying. In retrospect, I consider myself lucky that it was around that time of feeling lost that the Faculty of Dentistry proposed a collaboration with us, which ended up being my project for the summer.
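For readers who have not met these terms before, here is a toy sketch of the same workflow in plain Python: split labelled data into a training set and a test set, fit a model on the first, and measure accuracy on the second. It is not the KNIME workflow from the lab. The single numeric input feature and its values are invented, and the "model" is a depth-1 decision tree (a stump) rather than a full decision tree on real X-rays.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (input, label) rows and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_stump(train_rows):
    """Depth-1 decision tree: pick the threshold with the fewest training errors."""
    values = sorted({x for x, _ in train_rows})
    labels = sorted({y for _, y in train_rows})
    if len(values) < 2 or len(labels) < 2:
        raise ValueError("need at least two distinct inputs and two labels")
    best = None
    for i in range(len(values) - 1):
        threshold = (values[i] + values[i + 1]) / 2
        # Try both ways of assigning the two labels to the two sides.
        for low, high in ((labels[0], labels[-1]), (labels[-1], labels[0])):
            errors = sum((low if x <= threshold else high) != y
                         for x, y in train_rows)
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Invented data: one made-up numeric feature per image, plus its label.
rows = [(x, "hand") for x in (0.10, 0.12, 0.14, 0.16, 0.18, 0.20)]
rows += [(x, "chest") for x in (0.80, 0.82, 0.84, 0.86, 0.88, 0.90)]

train_rows, test_rows = train_test_split(rows)
model = fit_stump(train_rows)
print("test accuracy:", accuracy(model, test_rows))
```

The key habit is that accuracy is reported on rows the model never saw during training.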
 
The dentistry project, as we called it, concerns a type of dental X-ray sensor called Phosphor Storage Plates (PSPs), which are very commonly used because of their easy placement in the oral cavity and the resulting minimal discomfort. The sensors, however, can accumulate damage over time, which shows up in the final image as artifacts with various appearances. Such artifacts could get in the way of diagnosis; thus, the plates need to be discarded before they are too damaged. But how damaged is too damaged? For the moment, nobody has an answer to that. Our goal is to use machine learning to learn the relationship between artifacts and whether they would affect diagnosis. Eventually, we can use that model to make predictions for a given plate and offer dentists advice on when to discard it. The entire project is huge, and the part we played in it this summer was mainly preparatory work. We segmented the artifacts from the images and clustered them into five groups based on 9 hand-engineered features. This characterization of the individual artifacts can serve as the input for the model. We also created a library of superimposed images of artifact masks and real teeth backgrounds to mimic images taken with damaged sensors in real clinical settings. We did this so that dentists can take a look at these images and give a diagnosis. Comparing that with the true diagnosis, we can obtain the labels for whether a given artifact will affect diagnosis or not. And this will be the output of the model. The testing of these images is currently underway, and the results will be available in early September for further analysis.
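The superimposition step can be pictured with a small sketch like the one below. It is only an illustration: the function names are hypothetical, the images are tiny made-up grids, and simply overwriting masked pixels with a fixed grey value is an assumed blending rule, not necessarily the one we used.

```python
from itertools import product


def superimpose(background, mask, artifact_value=255):
    """Return a copy of `background` with `artifact_value` wherever `mask` is truthy."""
    if len(background) != len(mask) or any(
        len(b_row) != len(m_row) for b_row, m_row in zip(background, mask)
    ):
        raise ValueError("background and mask must have the same shape")
    return [
        [artifact_value if m else pixel for pixel, m in zip(b_row, m_row)]
        for b_row, m_row in zip(background, mask)
    ]


def build_library(backgrounds, masks):
    """Pair every artifact mask with every background image."""
    return [superimpose(bg, mask) for bg, mask in product(backgrounds, masks)]


# Made-up 2x2 "teeth" backgrounds and artifact masks (1 = damaged pixel).
backgrounds = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
masks = [[[0, 1], [0, 0]], [[1, 0], [0, 1]]]

library = build_library(backgrounds, masks)
print(len(library))
print(library[0])
```

Because the artifact mask used for each image is known, the dentists' answers can later be tied back to specific artifacts.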
 
With the project established and concrete goals ahead, the feeling of uncertainty gradually went away. But it was never going to be easy. There were times when we hit a bottleneck; when our attempts failed miserably; when we had to give up on a brilliant idea because it didn’t go our way. But after stumbling through all the challenges and pitfalls, we found ourselves renewed. I was a bit lost at the beginning of this summer. But over the summer I learned a lot about the very cool and increasingly crucial field of machine learning; I grew a newfound appreciation for statistics and methodology; I picked up the programming language Python, which I had been wanting to do for years; and, most importantly, I did more thinking than I ever would have if I had just followed instructions blindly. And in the end, I believe that science is all about thinking. So for you guys out there reading the blog, if you’re coming to this lab from a totally different background and not entirely sure about the future, don’t be afraid. And I hope you find what you come here looking for, just like I did.
 
Finally, I want to thank the people who’ve helped me along the way and who’ve made the lab such an enjoyable place: Hershel, Henry, Rashmi, John and Trevor; and last but not least, Dr. Tyrrell, without whose kind offer and guidance I would never have had such an amazing experience. Here’s to an unforgettable summer and a strong start to the new school year. Cheers!
 
Wenda Zhao

Indranil Balki Receives Undergraduate Research Fund Prize

Recently from our centre, Indranil Balki, under the supervision of Dr. Pascal Tyrrell, received the Undergraduate Research Fund Prize, a prestigious, semi-annual award presented for innovative research at the University of Toronto. The grant has helped to fund the purchase of a Graphics Processing Unit (GPU) at the Data Science unit in the Department of Medical Imaging. The GPU will add versatility and flexibility to the machine learning tools available for students and staff at the lab – supporting projects that leverage AI in medical image analysis and aid in the investigation of broader issues ranging from class imbalance to sample size determination in machine learning.

Indranil is enrolled in medical school at the University of Toronto and recently completed his undergraduate degree in Statistics & Biology. His research experiences in Prof. Tyrrell’s units inspire Indranil to leverage data science, including machine learning, database management and cost-effectiveness analysis to improve clinical care.

From YSP to Hanging Out at Stanford: Michelle Cheung

Hello! My name is Michelle Cheung and I am a rising 2nd year student at the University of Toronto. I was one of the Youth Summer Program (YSP) students in Dr. Pascal Tyrrell’s lab in the summer of 2016. During the program, I helped with the Medical Imaging Network Enterprise Project by surveying patients at Sunnybrook Hospital for their perspectives on sharing medical images for research.
Before entering Pascal’s lab in 2016, I took part in YSP the summer before, in 2015. It was my two years in the summer program that made me aware of U of T. Being able to live in the dorms, attend classes and labs, and explore the city made me fall in love with the campus, especially the fast-paced metropolitan city life in contrast to the suburban life back home in California. More importantly, through the program, I was exposed to the lab environment. Of course, it was more than the allure of lab coats and micropipettes; my time in the labs sparked my interest in research, which is why I am now pursuing genomics and hoping to learn more about hereditary diseases. Thus, when it came down to deciding which college to attend, all these factors placed U of T high up on the list.
Near the beginning of the second semester of my first year, I started thinking about what to do over the summer. I couldn’t waste the 4 months and knew I needed the exposure and experience in professional labs if I planned on becoming a genetics researcher, so I started looking for research internships.
I was offered an internship position at the biopharmaceutical company AbbVie, back in California, and it was quite an interesting experience applying for the position. I thought the first phone interview went decently, but I was aware that I didn’t express enough interest in a particular aspect of research associated with the position. A month later, I interviewed a second time. It went really well until the interviewer said, “Let me ask you a challenging question.” I was expecting a deep theoretical question, and it ended up being, “Introduce yourself and your career goals in Cantonese.” In all fairness, my listening skills are on point and I can understand conversational Cantonese; however, truthfully, my speaking skills had grown too rusty after I stopped speaking it at home. Hence, in my response, I managed to fluently get out my name, age, and school. I tried talking about my hobbies; “hiking with friends” came out as “taking walks with friends”, and “baking” came out as “cooking”. I was stumped when trying to describe my career goals, as I blanked on how to say genetics and research and complicated bio words. Needless to say, the awkward silence as I tried to come up with the right thing to say was mortifying. Little did I know that the interviewer would become my current manager (great guy), but hey, he hasn’t brought up the mortifying experience, and I now have an embarrassing interview story to tell and a lesson learned.
Meanwhile, my parents connected with a family friend who was a scientist at Stanford. She was looking for a student research trainee to help her with her research project studying pulmonary disease, working with mice, and it was a fitting role for me.
I found out I was accepted to the research internship at AbbVie and, luckily, the timing worked out with my shadowing at Stanford. One internship would give me more practical lab experience while the other would give me a taste of the bio corporate industry. Hence, it’s the best of both worlds this summer: getting to experience both academic and industry research.
All in all, I am here today, about 1.5 months into the research internships, and having a blast. I had a wonderful first year of undergrad, and as I reflect, I am very grateful for my time in YSP for bringing me to U of T and exposing me to the medical research world.
 
-Michelle Cheung

My Past and Future at U of T: Helena Lan’s Perspective

 



Hey everyone, it’s been a while since I posted here. In case you don’t remember me – my name is Helena Lan, and I started in Professor Pascal Tyrrell’s group as a ROP299 student. Fast forward to the present, I have finished my specialist program in pharmacology, and will be graduating with an Honours Bachelor of Science degree later this month! But if you think that I am finally leaving U of T – nope, my journey is not over yet. This August, I will be living my dream of many years as I start my MD training at U of T! As I prepare to begin the next chapter of my life, I wanted to share with you how my involvement in Prof. Tyrrell’s group paved the way for me achieving my goal today.

At the end of my first year of undergrad, I connected with Prof. Tyrrell and took on a project investigating how the choice of non-invasive imaging modality for diagnosing carotid stenosis impacts patient care (check out my experience here https://www.tyrrell4innovation.ca/2014/08/helena-lan-summer-2014-rop.html).
Afterwards, I continued on as a research assistant, where I explored the need for statistics and research methodology training in the medical imaging department. My early research endeavours showed me that research was not just pipetting; there is a diversity of research that can drive innovations and improve patient care.
That being said, I also wanted to experience working in a wet lab setting. So upon completing my second year of undergrad, I ventured to the Karolinska Institute in Sweden to investigate the tumour killing mechanism of Natural Killer cells (find out more about my project here https://www.tyrrell4innovation.ca/2015/02/who-is-going-to-karolinska-institute.html). After a summer in basic science research, I decided to switch gears into translational research, where I worked on strategies to augment the therapeutic utility of stem cells and enhance the drug delivery platforms at Prof. Jeff Karp’s lab at Brigham and Women’s Hospital, Harvard Medical School. After I returned from Boston, my passion for discovering ways to improve existing treatments for diseases led me to my current work at Dr. Albert Wong’s lab at CAMH, where I am assisting with the characterization of a novel animal model for schizophrenia with the ultimate goal of using it as a screening platform for new anti-psychotics.
In my experiences as a researcher, I’ve always been very excited at the prospect that what I am working on right now may be brought into the clinic sometime down the road and offer benefits to patients. Then one day, I thought to myself, “How rewarding would it be if I can get involved in patient care, where I can directly impact the life of the person sitting in front of me?” With this idea planted in my mind, I decided to shadow a physician. As I observed how a doctor applies their scientific knowledge and the findings from medical research to figure out ways to best help their patients, my attraction to medicine gradually evolved. For a long time, my goal in life has been to make a positive impact on other people’s lives. But after that shadowing experience, I realized that I wanted to do so through taking on the role of a clinician.
I am incredibly grateful to the U of T medical school for giving me the opportunity to pursue my dream, as well as the pharmacology department and New College for their recognition of my undergrad academic achievements with the Dr. Walter Roschlau Memorial award and the Tricia L. Carroll Memorial Prize in the Life Sciences. But more importantly, thank you to U of T for the unforgettable undergrad experience. Not only was I able to immerse myself in fascinating science and interesting research, I was also connected with mentors who provided unconditional support to me along my journey. Even though the ROP project I worked on under the supervision of Prof. Pascal Tyrrell and Dr. Eli Lechtman ended years ago, the two of them have provided invaluable mentoring to me even to this day.
University can seem arduous at times, and it is almost inevitable that we run into obstacles here and there. But no matter how difficult the circumstances may be, never, ever, lose sight of your goal. Surround yourself with people who cheer you on, and invest the work that is necessary to reach your ambition. And one day, your dream will come true!  
All the best,
Helena Lan

New GPU

Woohoo!! The new GPU in our lab is up and running!

Here are the specs!
CPU: Intel i7 8th gen, 6-core 12-thread
RAM: 32 GB DDR4 3400 MHz, upgradeable to 64 GB
Storage: 500 GB M.2 SSD, 6 TB internal HDD
GPU: 2 NVIDIA GeForce GTX 1080 Ti, 11 GB
OS: Ubuntu 16.04