Blog – Page 7 – Tyrrell4innovation

May 19, 2021 (updated May 28, 2021)

Jacky Wang’s ROP399 Journey

My name is Jacky Wang, and I am just finishing my third year at the University of Toronto, pursuing a computer science specialist. Looking back on this challenging but incredible year, I was honoured to have the opportunity to work inside Dr. Tyrrell’s lab as part of the ROP399 course. I would love to share my experience studying and working inside the lab.

Looking back, I realize that one of the most challenging tasks was getting on board. I felt a little lost at first, surrounded by loads of new information and technologies that I had little experience with. Though I was excited by the collision of ideas during each meeting, having too many choices could sometimes be overwhelming. Luckily, after doing more literature review and with the help of the brilliant researchers in the lab (a big thank you to Mauro, Dimitri, and of course, Dr. Tyrrell), I started to get a better view of the trajectory of each potential project and to determine what I wanted to get out of this experience. I did not choose the machine learning projects, though they looked shiny and promising as always (as a matter of fact, they turned out to be successful indeed). Instead, I leaned towards studying sample size determination methodology, especially the concept of ill-posed problems, which often occur when researchers draw conclusions from models trained on limited samples. It had always been a mystery to me why I would get different and even contrasting results when replicating someone else’s work on smaller sample sizes. From there, I settled on the research topic and moved on to the implementation details.

This year the ROP students come from statistics, computer science, biology and other fields. I am grateful that Dr. Tyrrell is willing to give a chance to anyone who has the determination to study in his lab, even though they may have little research experience and come from various backgrounds. As someone who studies computer science with a limited statistics background, I found that the real challenge lay in understanding all the statistical concepts and designing the experiments. We decided to apply various dimension reduction techniques to study the effect of different sample sizes with many features. I designed experiments around the principal component analysis (PCA) technique, while another ROP student, Jessica, explored the lasso and SES models in the meantime. It was for sure a long and memorable experience, with a lot of debugging when implementing the code from scratch. But nothing was more rewarding than seeing the successful completion of the code and the promising results.

I feel lucky and grateful that Dr. Tyrrell helped me complete my first research project. He broke down the long and challenging research task into clear and achievable subgoals within our reach. After completing each subgoal, I could hardly believe how close it brought us to the finish line. Taking an ROP course felt so different from attending regular lessons. For most university courses, the topics are already determined, and the materials are almost spoon-fed to you. But sometimes I start to lose the excitement of learning new topics, as I am driven not by curiosity or by the needs of an application but by the pressure of being tested. Taking the ROP course, however, gave me almost complete control of my study. For ROP, I was the one who decided what topics to explore and how to design the experiment. I could immediately test my understanding and put everything I learned into real applications.

I am so proud of all the skills that I have picked up in the online lab during this unique but special ROP experience. I would like to thank Dr. Tyrrell for giving me this incredible study experience in his lab. There are so many resources out there to reach and so many excellent researchers to seek help from. I would also like to thank all members of the lab for patiently walking me through each challenge with their brilliant insights.

Jacky Wang

May 19, 2021 (updated June 2, 2021)

MiWord of the Day Is… dimensionality reduction!

Guess what?

You are looking at a real person, not a painting! This is one of the great works by the talented artist Alexa Meade, who paints on 3D objects but creates a 2D painting illusion. Similarly, in the world of statistics and machine learning, dimensionality reduction means what it sounds like: reducing the problem to a lower dimension. Only this time, it is not an illusion.

Imagine a 1x1x1 data point living inside a 2x2x2 feature space. If I ask you to calculate the data density (the fraction of the space that the point occupies), you will get ½ for 1D, ¼ for 2D and 1/8 for 3D. This simple example illustrates that data points become sparser in a higher-dimensional feature space. To address this problem, we need some dimensionality reduction tools to eliminate the boring dimensions (dimensions that do not give much information on the characteristics of the data).
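The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `data_density` is made up for illustration, and it assumes a cubic point inside a cubic feature space, as in the example above.

```python
def data_density(point_side: float, space_side: float, dims: int) -> float:
    """Fraction of a cubic feature space occupied by one cubic data point."""
    return (point_side / space_side) ** dims


# The same 1-unit point in a 2-unit space, as the number of dimensions grows.
for dims in (1, 2, 3, 10):
    print(f"{dims}D density: {data_density(1.0, 2.0, dims)}")
```

The density halves with every added dimension, which is why the same number of samples covers a high-dimensional space so thinly.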

There are mainly two approaches when it comes to dimension reduction. One is to select a subset of features (feature selection); the other is to construct some new features to describe the data in fewer dimensions (feature extraction).

Let us consider an example to illustrate the difference. Suppose you are asked to come up with features to predict the university acceptance rate of your local high school.

You may discard the “grade in middle school” for its many missing values; discard “date of birth” and “student name” as they do not play much of a role in applying to university; discard “weight > 50kg” as everyone has the same value; discard “grade in GPA” as it can be calculated from the other features. If you have been through a similar process, congratulations! You just performed a dimension reduction by feature selection.

What you have done is remove the features with many missing values, the features least correlated with the outcome, the features with low variance, and one of each pair of highly correlated features. The idea behind feature selection is that the data might contain some redundant or irrelevant features that can be removed without losing too much information.
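Three of these filters (missing values, low variance, and highly correlated pairs) can be sketched in plain Python. Everything here is an illustrative assumption: the function `select_features`, the thresholds, and the toy numbers are invented for the example, and the sketch assumes that any column passing the missing-value filter has no missing values at all.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def select_features(columns, max_missing=0.4, min_variance=1e-9, max_corr=0.95):
    """Return the names of the columns that survive the three filters."""
    kept = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values) / len(values)
        if missing > max_missing:
            continue  # too many missing values
        if pvariance(values) <= min_variance:
            continue  # (almost) everyone has the same value
        if any(abs(pearson(values, other)) >= max_corr for other in kept.values()):
            continue  # redundant with a feature that was already kept
        kept[name] = values
    return list(kept)


# Hypothetical toy data for six students.
grade_score = [90, 60, 85, 88, 55, 92]
columns = {
    "grade_score": grade_score,
    "grade_gpa": [s / 25 for s in grade_score],  # computable from the score
    "grade_middle_school": [None, None, 80, None, None, 75],
    "weight_over_50kg": [1, 1, 1, 1, 1, 1],
}
print(select_features(columns))
```

With this toy data only `grade_score` survives. Which of two correlated features is kept depends on the order of the columns, so in practice that choice deserves some thought.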

Now, instead of selecting a subset of features, you might try to construct some new features from the old ones. For example, you might create a new feature named “school grade” based on the full history of the academic features. If you have been through a thought process like this, you just performed a dimensionality reduction by feature extraction.

If you would like to do a linear combination, principal component analysis (PCA) is the tool for you. In PCA, variables are linearly combined into a new set of variables, known as the principal components. One way to do so is to give a weighted linear combination of “grade in score”, “grade in middle school” and “recommendation letter” …
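A minimal sketch of PCA with NumPy, using the eigenvectors of the covariance matrix as the weights of the linear combinations. The function name `pca` and the simulated data (three noisy measurements of one hidden "ability" value) are assumptions made for illustration, not part of the example above.

```python
import numpy as np


def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]  # one column of weights per component
    explained = eigvals[order] / eigvals.sum()
    return X_centered @ components, components, explained


# Simulated data: three features that all mostly measure the same thing.
rng = np.random.default_rng(0)
ability = rng.normal(size=200)
X = np.column_stack([ability + 0.1 * rng.normal(size=200) for _ in range(3)])

scores, components, explained = pca(X, n_components=1)
print("weights of the first component:", components[:, 0])
print("share of variance explained:", explained[0])
```

Because the three simulated features are nearly redundant, a single component keeps almost all of the variance. When real features are measured on very different scales, they are usually standardized before this step.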

Now let us use “dimensionality reduction” in a sentence.

Serious: There are too many features in this dataset, and the testing accuracy seems too low. Let us apply dimensionality reduction techniques to reduce the overfitting of our model…

Less serious:

Mom: “How was your trip to Tokyo?”

Me: “Great! Let me just send you a dimensionality reduction version of Tokyo.”

Mom: “A what Tokyo?”

Me: “Well, I mean … photos of Tokyo.”

I’ll see you in the blogosphere…

Jacky Wang

June 15, 2020

My name is Yiyun Gu and I am a fourth-year student studying mathematics and statistics at the University of Toronto. After taking some statistics and machine learning courses, I became quite interested in putting machine learning and statistical methods into practice. Medical imaging is a popular field where machine learning methods have a great impact. Therefore, I contacted Dr. Pascal Tyrrell, and he agreed to supervise me.

Last September, my initial research direction was Bayesian optimization of the hyperparameters of convolutional neural networks (CNNs) based on previous model information and distributions. Besides instructing me himself, Dr. Pascal Tyrrell introduced me to his graduate student, who was also interested in this field. We had weekly meetings to discuss how to make the idea implementable. I read many papers and learned about Gaussian processes, acquisition functions and surrogate functions. However, there was a huge challenge in how to update the hyperparameters of the prior distribution based on the information from the CNN model. I was anxious about the progress. Dr. Pascal Tyrrell encouraged me to shift the direction a little bit because he cared about what a student learned and felt about the project.

Since November, out of interest in Bayesian concepts, I have been working on a project comparing frequentist CNNs and Bayesian CNNs for projects with sample size restrictions. Because there might not be sufficient data in medical imaging, I wanted to determine whether Bayesian CNNs would benefit from prior information on small datasets and outperform frequentist CNNs. Bayesian CNNs update distributions over the weights and biases, while frequentist CNNs use point estimates. Code resources for Bayesian CNNs were limited. I tried to make full use of the available code and modified it so that I could run the experiments with training sample sizes from 500 to 50000. I applied customized architectures and AlexNet to the MNIST and CIFAR-10 datasets. I found that Bayesian CNNs didn’t perform as well as I expected. Frequentist CNNs achieved higher accuracy and took less time than Bayesian CNNs. However, Bayesian CNNs have an interesting feature: they incorporate a measure of uncertainty. Since Bayesian CNNs have distributions over the weights, the models can also output distributions over the outputs. Therefore, Bayesian CNNs can tell how confident they are in a decision.
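The difference between a point estimate and a distribution over weights can be sketched with a single logistic unit standing in for a full CNN. This is not the code used in the project: the Gaussian weight distributions, the function names and the numbers are all hypothetical, chosen only to show how sampling weights yields a spread of outputs.

```python
import math
import random
from statistics import mean, stdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def point_prediction(x, weights, bias):
    """Frequentist style: one fixed set of weights gives one output."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def bayesian_prediction(x, weight_means, weight_stds, bias, n_samples=1000, seed=0):
    """Bayesian style: sample weights repeatedly and summarize the outputs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        sampled = [rng.gauss(m, s) for m, s in zip(weight_means, weight_stds)]
        outputs.append(point_prediction(x, sampled, bias))
    return mean(outputs), stdev(outputs)


x = [1.0, 2.0]
weight_means = [0.5, -0.25]
print("point estimate:", point_prediction(x, weight_means, 0.0))
print("narrow weights (mean, spread):", bayesian_prediction(x, weight_means, [0.05, 0.05], 0.0))
print("wide weights (mean, spread):", bayesian_prediction(x, weight_means, [1.0, 1.0], 0.0))
```

The spread of the sampled outputs is the uncertainty measure: the wider the weight distributions, the less confident the prediction.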

I hope to apply more architectures of Bayesian CNNs to more datasets in medical imaging projects because architectures and datasets have a great influence on performance. Also, I would like to try more prior distributions and learn how to determine which distributions are more appropriate.

I had a great research experience in this project with Dr. Pascal Tyrrell’s guidance and other graduate students’ help. It was my first time writing a scientific report. Dr. Pascal Tyrrell kept instructing me on how to write the report and offered great advice. I really appreciated the guidance and enjoyed the unique research experience in the final year of my undergraduate life. I look forward to contributing to medical imaging research and to more opportunities to apply machine learning methods!

Yiyun Gu

June 15, 2020

Amar Dholakia: Some Thoughts as I Wrap Up My STA498Y Project (and Undergrad!)

Hi everyone! I’m Amar Dholakia and I’m a fourth-year/recent graduate having majored in Neuroscience and Statistics, and am starting a Master’s in Biostatistics at UofT in the fall of 2020. I’ve had the pleasure of being a part of Dr. Tyrrell’s lab for almost two years now and would like to take the opportunity to reflect on my time here.

I started in Fall 2018 as a work-study student, tasked with managing the Department of Medical Imaging’s database. A highlight was discussing and learning about my peers’ work, which sparked my initial interest in the field of artificial intelligence and data science.

The following fall, I began a fourth-year project in statistics, STA498Y, under the supervision of Dr. Tyrrell. My project investigated the viability of clustering image features to assess the effect of dataset heterogeneity on deep convolutional network accuracy. Specifically, I compared the behaviour of six clustering algorithms to see if the choice of algorithm affected the ability to capture heterogeneity.

My project started out with reaching out to my labmate and good friend Mauro Mendez, who had recently undertaken a project very similar to mine. He sent me his paper, which I read, and re-read, and re-re-read… It took me about four months to only begin to grasp what Mauro had explored, and how I could use what he had learned to develop my project. But the months of struggle were definitely worth the “a-ha!” moment.

First, I started by replicating Mauro’s results using Fuzzy K as the clustering method to make sure I was on the right track. Reading, coding, and testing for the very first time was a nightmare – I had some Python experience but had never applied it before. It took a lot of back and forth with Mauro and Dr. Tyrrell, a lot of learning, understanding, and re-learning what I THOUGHT I understood to get me on the right track. By the start of the Winter term, I had finally conjured preliminary results – banging my head on the wall was slowly becoming worth it.

Once I had the code basics down, getting the rest of the results was relatively smooth sailing. I computed and plotted changes in model accuracy with sample size, and heterogeneity in model accuracy with sample size, as captured by different clustering methods. My results for one model were great from the get-go – I was set! I thought to challenge myself by generalizing to a second model – and that was far from easy. But by taking on that extra challenge, I felt I learned more about my project and, importantly, how to scientifically justify my results. The results didn’t match up, and I had to support my rationale with evidence (from the literature). If I couldn’t find an explanation, I may have done something incorrectly. And lo and behold, my ‘inexplicable’ results were in fact due to human error – something I very painstakingly troubleshot, but now I understand much more and can justify my results.

Ultimately, we showed that regardless of clustering technique, or CNN model, clustering could effectively detect how heterogeneity affected CNN accuracy. To me, this was an interesting result as I expected vastly different behaviour between partition-based and density-based clustering. Nonetheless, it was welcome, as it suggested that any clustering method could be used to assess CNN.

I struggled most with truly appreciating what my research aimed to solve. I attribute this partially to not being as proactive with my readings and questions to Dr. Tyrrell to really verify my understanding. And to be honest, exploring this project is still a work-in-progress – something I will continue learning about this summer!

My advice to any future students – read, read, read! Diving into a specific academic niche is truly a wonderful experience. The learning curve was steep and initially involved a lot of trying, failing, fixing, and then trying again. But this experience only reinforced my notions of “success through failure” and “growth through struggle”. It may be challenging at first, but with some perseverance and support from a wonderful PI – like Dr. Tyrrell – you’ll be able to accomplish so much more than you originally imagined.

May 23, 2020

Sharing Medical Images for Research: Patients’ Perspectives

Michelle was our second YSP student this summer and did a great job participating in one of our studies looking at patients’ willingness to share their medical images for research. This study is also part of the MiNE project.

Here is what Michelle had to say:

“My name is Michelle Cheung and I am a rising senior at Henry M. Gunn High School in Palo Alto, California. In my free time, I love to bake, read, travel with family, and take Barre classes. I also enjoy volunteering with friends at local charitable events and the Key Club at school. I am very interested in human biology and hope to study genetics and biotechnology next fall.

I really enjoyed the three weeks with the YSP Research Program. I learned so much about medical imaging modalities and had the amazing opportunity of helping research assistants survey patients at the Sunnybrook Hospital for the MiNE project. At first, it was a little daunting, but over time, I became more confident and comfortable interacting with patients, and grew to love surveying. The continuous surveying each day highlights the aspect and importance of repetition in conducting scientific research. Above all, it was an absolute pleasure getting to know the MiDATA and VBIRG lab. I’m grateful to my mentors and the lab members for exposing me to a whole new lab world I never thought existed beyond the traditional wet labs.”

Great job Michelle!

Have a peek at Michelle’s award winning poster and…

… I’ll see you in the blogosphere.

Pascal Tyrrell

May 23, 2020

Wow! What a Busy Summer….

Over 20 students in the lab this summer beavering away at some great projects. Last week my two Youth Summer Program (University of Toronto) students finished their three-week stay with us.

Jenny and Michelle both did fantastic work.

Today Jenny will show you her poster entitled: “Comparing Healthy and Unhealthy Carotid Arteries”

Jenny Joo is from Richmond Hill, Ontario, entering her senior year of high school. She plans on studying life science at the University of Toronto in the future. She spent the last 3 weeks in U of T’s YSP Medical Research program, where she was placed in two different medical imaging labs: The MiDATA lab of U of T and the Vascular Biology Imaging Research lab at Sunnybrook Hospital.
Jenny chose to do research on the MRI scans of the carotid artery because it focused on both research and clinical aspects and had this to say about her experience with us: “It has been an enriching 3 weeks working with my PI, Pascal Tyrrell, my mentors, John Harvey and Moran Foster, and the rest of the research group.”

Great work Jenny Joo!

Have a peek at her poster and…
… I’ll see you in the blogosphere.

Pascal Tyrrell

April 20, 2019 (updated May 22, 2020)

Wendi in ROP399: Learning How the Machine Learns…and Improve It!

Hi everyone! My name is Wendi Qu and I’m finishing my third year at U of T, majoring in Statistics and Molecular Genetics. I did an ROP399 research project with Dr. Pascal Tyrrell from September 2018 to April 2019 and I would love to share it with you!

Artificial intelligence, or AI, is a rapidly emerging field becoming ever so popular nowadays, with exponentially increasing research published and companies established. Applications of AI in numerous fields, including facial recognition, natural language processing, medical diagnosis and fraud detection, just to name a few, have greatly improved efficacy and convenience. In Dr. Tyrrell’s lab in the Department of Medical Imaging, the gears have been gradually switched from statistics to AI in the past two years for research students. With a Life Science and Statistics background, I’ve always been keen on learning the applications of statistics/data science in various medical fields to benefit both doctors and patients. Having done my ROP299 in Toronto General Hospital, I realized how rewarding it was to use real patient data to study disease epidemiology and how my research can help inform and improve future surgical and clinical practices. Therefore, I was extremely excited when I found out about Dr. Tyrrell’s lab and really grateful for this amazing opportunity, where I can go one step further and do AI projects in the field of medical imaging.

Specifically, my project focused on how to mitigate the effect of one of the common problems in machine learning – class imbalance. So, what is machine learning? Simply put, we feed lots of data to a computer, which has algorithms that find patterns in those data and use such patterns to perform different tasks. Classification is one of the common machine learning tasks, where the machine categorizes data into different classes (e.g. categorizes an image as “cat” when shown a cat image). A common problem in medical imaging and diagnosis is that there’s way more “normal” data than “abnormal” data. A machine learning model generally predicts more accurately when trained on more data, and the shortage of “abnormal” data, which are the most important ones, can impair the model’s performance in practice. Hence, finding methods to address this issue is of great importance. My motivation for doing this project largely comes from how my findings can offer insights on how different methods behave when training sets have different conditions, such as the severity of imbalance and sample size, which can be potentially generalized and help better implement machine learning in practice.

However, as with any research project, the journey was rarely smooth and beautiful, especially when I started with almost zero knowledge in machine learning and Python (us undergraduate statisticians only use R…). Starting off by doing a literature search, I realized many methods have been suggested to rectify class imbalance, with the two main approaches being re-sampling (i.e. modifying the training set) and modifying the cost function of the model. Despite much research done on this topic, I found that such methods were almost never studied systematically to assess their effect on training sets of different natures. My predecessor on this project, Indranil Balki, studied the effect of class imbalance systematically by varying the class imbalance severity in a training set and seeing how model performance was affected. Building on this, I decided to apply different methods to such already established imbalanced datasets and test for model improvement. Because more data lead to better performance, I was also curious whether there’s a difference in how much different methods can improve the model in smaller and larger training sets.

One of the hardest parts of the project was making sure I was implementing the methods appropriately, and simply writing the code to do exactly what I wanted it to do. The latter part sounds simple but becomes really tricky when dealing with images in a machine learning context, and is, again, even more challenging if you know nothing about Python…! After digging into more literature and consulting “machine learning people” in the lab (a big shoutout to Mauro, Ahmed, Ariana, and of course, Dr. Tyrrell), I was able to develop a concrete plan: implement oversampling via image augmentation only when the imbalanced class has fewer images than the other classes, apply undersampling only when the imbalanced class has more images, and adjust class weights in the cost function as another method.
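The three ideas in this plan can be sketched on a list of labels. This is an illustration under stated assumptions, not the project's code: oversampling here simply repeats existing samples (the project used image augmentation instead), and the class weights use the common "inverse frequency" heuristic, n_samples / (n_classes × class_count), which is an assumption about how the cost function might be weighted.

```python
import random
from collections import Counter


def indices_by_class(labels):
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    return groups


def oversample(labels, rng):
    """Indices where smaller classes are resampled up to the largest class."""
    groups = indices_by_class(labels)
    target = max(len(ix) for ix in groups.values())
    out = []
    for ix in groups.values():
        out.extend(ix)
        out.extend(rng.choices(ix, k=target - len(ix)))  # repeats, not new images
    return out


def undersample(labels, rng):
    """Indices where larger classes are cut down to the smallest class."""
    groups = indices_by_class(labels)
    target = min(len(ix) for ix in groups.values())
    return [i for ix in groups.values() for i in rng.sample(ix, target)]


def class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}


labels = ["normal"] * 90 + ["abnormal"] * 10
rng = random.Random(0)
print(Counter(labels[i] for i in oversample(labels, rng)))
print(Counter(labels[i] for i in undersample(labels, rng)))
print(class_weights(labels))
```

Oversampling and undersampling change which samples the model sees, while class weights leave the training set alone and instead make mistakes on the rare class cost more.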

However, implementing them was a huge challenge. I self-learned Python by taking courses in Python, machine learning, image modification, random forest model, and anything that’s relevant to my project on Datacamp, a really useful website offering courses in different coding languages. Through this process and using Indranil’s code as a skeleton, I was finally able to implement all my methods and output the model’s prediction accuracy! It was a long, painful process which involved constant debugging, but it was never more rewarding to see the code finally run smoothly and beautifully!

This wonderful journey has taught me many things – not only have I taken my first step in machine learning, it has again reminded me of the most valuable part of doing research, which combines independence, creativity, self-drive, and collaboration. Deciding on a topic, finding a gap, developing your own creative solutions, being motivated to learn new things and conquer challenges, and collaborating with the intelligent people surrounding you are the most invaluable experiences for me this year. Finally, I would love to thank all the amazing people in the lab, especially Mauro, whose machine learning knowledge, coding skills and humour were always there with me, and Dr. Pascal Tyrrell, for answering our questions with more questions, for his enlightening advice, and for his great personality. I appreciate this amazing experience, and it has inspired me to delve deeper into machine learning and healthcare!

Wendi Qu

April 20, 2019 (updated May 22, 2020)

Dianna McAllister’s ROP Adventures in the Tyrrell Lab!

My name is Dianna McAllister and I am approaching the finish of my second year at the University of Toronto, pursuing a bioinformatics specialist and computer science major. This year I was given the incredible opportunity to work in Dr. Tyrrell’s lab for the ROP299 course.

I have just handed in my first ever formal research paper for my work in Dr. Tyrrell’s lab. My project observed the effectiveness of using grad-CAM visualizations on different layers in a convolutional neural network. Though the end results of my project were colourful heat maps placed on top of images, the process to get there was not nearly as colourful or as effortless as the results may seem. There was lots of self-teaching, debugging, decision-making and collaboration that went on behind the scenes that made this project difficult, but much more rewarding when complete.

My journey in Dr. Tyrrell’s lab began when I first started researching ROP projects. I can still remember scrolling through the various projects, trying to find something that I thought I would be really passionate about. Once I happened upon Dr. Tyrrell’s ROP299, I could feel my heart skip a beat: it was exactly the research project that I was looking for. It explained the use of machine learning in medicine, specifically medical imaging. Being in bioinformatics, I found that this project was exactly what I was looking for; it integrated biology and medicine with computer science and statistics. Once I saw this unique combination, I knew that I needed to apply.

After I applied, I was overjoyed that I had received an interview. When I attended the interview, I was very excited to show Dr. Tyrrell my interest in his research and explain how my past research would help me with this new project. But once I walked into his office, it was unlike any other interview I had ever had; he was able to point out things about myself that I had barely even realized and asked me many questions that I had no answer to. I remember walking out of that interview feeling disappointed, as I thought that there was no way I would get a position in his lab, but a few weeks later I heard back that I had gotten the position! I was delighted to have the opportunity to prove to Dr. Tyrrell that he had made a good choice in choosing me for the position and that I would work hard in his lab and on my project.

The night before my first lab meeting, I researched tons of information on machine learning, making sure to have (what I thought was) an in-depth understanding of machine learning. But less than five minutes into the lab meeting, I quickly realized that I was completely wrong. Terms like regression, weights and backpropagation were being thrown around so naturally, and I had absolutely no idea what they were talking about. I walked out of the meeting determined to really begin understanding what machine learning was all about!

Thus began the journey of my project. When I decided on my project, it seemed fun and not too difficult: all I had to do was slap some heat maps onto images, right? Well, as much as I felt it wouldn’t be too difficult, I was not going to be deceived as I had been before our first meeting, and after completing it I can definitely say it was not easy! The first problem that I encountered immediately was where to start. Sure, I understood the basic concepts associated with machine learning, but I had no experience or understanding of how to code anything related to creating and using a convolutional neural network. I was fortunate enough to be able to use Ariana’s CNN model. Her model used x-rays of teeth to classify whether dental plates were damaged, and therefore adding damage (artifacts) to the x-rays, or whether the plates were functional. It took me quite some time to understand what each line of code did within the program: the code was incredible, and I could not imagine having to write it from scratch! I then began writing the code to map the grad-CAM visualizations (resembling heat maps) onto the images that Ariana’s model took as input. I was again fortunate enough to find code online that was similar to what I needed for my project. I made very minor tweaks until the code was functional and worked how I needed it to. Throughout this process of trying to debug my own code or figure out why it wouldn’t even begin running, Mauro was always there to help, always enthusiastic even when my problem was as silly as accidentally adding an extra period to a word.
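For readers curious about what a grad-CAM heat map is made of, here is a minimal NumPy sketch of its core step. It is not the code used in the project: it assumes the feature maps of one convolutional layer and the gradients of the class score with respect to them have already been extracted from a deep learning framework, and the tiny hand-made arrays below merely stand in for those.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine one layer's feature maps into a heat map.

    activations, gradients: arrays of shape (channels, height, width).
    """
    weights = gradients.mean(axis=(1, 2))  # one importance weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)  # keep only positive evidence (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()  # scale to [0, 1] for display
    return cam


# Stand-in values: two channels on a 2x2 grid.
activations = np.array([[[1.0, 2.0], [3.0, 4.0]],
                        [[4.0, 3.0], [2.0, 1.0]]])
gradients = np.array([[[1.0, 1.0], [1.0, 1.0]],
                      [[-1.0, -1.0], [-1.0, -1.0]]])
print(grad_cam(activations, gradients))
```

In a real pipeline the resulting map is resized to the input image and drawn on top of it as a coloured overlay; choosing a different layer changes the resolution and meaning of the map.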

Throughout the process, Dr. Tyrrell was always there as well; he always helped me to remember the big picture of what my project was about and what I was trying to accomplish during my time in his lab. This was extremely valuable, as it kept me from accidentally veering off-course and focusing on something that wasn’t important to my project. Without his guidance, I would have never been able to finish and execute the project in the way that I did and am proud of.

Everything that I learned, not only about machine learning, but about how to write a research paper, how to collaborate with others, how to learn from others’ and your own mistakes and how to keep trying new ideas and approaches when it seems like nothing is working, I will always carry with me throughout the rest of my undergraduate experience and the rest of my professional future. Thank you, Dr. Tyrrell, for this experience and every opportunity I was given in your lab.

Dianna McAllister

April 20, 2019 (updated May 22, 2020)

Rachael Jaffe’s ROP Journey… From the Pool to the Lab!

https://thevarsity.ca/2019/03/10/what-does-a-scientist-look-like/

My name is Rachael Jaffe and I am completing my third year in Global Health, Economics and Statistics. I had no clue what I was getting myself into this year during my ROP (399) with Dr. Tyrrell. I initially applied because the project description had to do with statistics, and I was inclined to put my minor to the test! Little did I know that I was about to embark on a machine learning adventure.

My adventure started with the initial interview: after quite a disheartening tale of Dr. Tyrrell telling me that my grades weren’t high enough and me trying to convince him that I would be a good addition to the lab because “I am funny”, I was almost 100% certain that I wasn’t going to be a part of the lab for the 2018-2019 year. If my background in statistics has taught me anything, it is that nothing truly has a 100% probability. And yet, last April I found myself sitting in the department of medical imaging at my first lab meeting.

Fast forward to September of 2018. I was knee deep (well, more accurately, drowning) in machine learning jargon, from the basics of a CNN to segmentation to what a GPU is. From there, I chose a project. Initially, I was just going to explore the relationship between sample size and model accuracy, but then it expanded to include an investigation into k-fold cross validation.

I started my project with the help of Ariana, a student from a lab in Costa Rica. She built a CNN that classifies dentistry PSPs for damage. I modified it to include a part that allowed the total sample size to be reduced. The relationship between sample size and model accuracy is very well known in the machine learning world, so Dr. Tyrrell decided that I should add an investigation of k-fold cross validation because the majority of models use this to validate their estimate of model accuracy. With further help from Ariana’s colleague, Mauro, I was able to gather a ton of data so that I could analyze my results statistically.
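For readers new to k-fold cross validation, the splitting logic can be sketched in plain Python. This is a generic illustration, not the project's code: the function names are made up, and `fit_and_score` is a hypothetical callback that would train a model on the training indices and return its accuracy on the validation indices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validate(n_samples, k, fit_and_score, seed=0):
    """Average the score over the k folds and also return each fold's score."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores), scores


# Every sample is used for validation exactly once across the folds.
for training, validation in k_fold_indices(10, 5):
    print(sorted(validation), "held out;", len(training), "used for training")
```

To study the effect of sample size, the same procedure can be repeated on progressively smaller subsets of the data, with the spread of the per-fold scores showing how stable the accuracy estimate is.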

It was more of an “academic” project, as Dr. Tyrrell noted. However, that came with its own trials and tribulations. I was totally unprepared for the amount of statistical interpretation that was required, and it took a little bit of time to wrap my head around the intersection of statistics and machine learning. I am grateful for my statistics minor during this ROP because without it I would’ve definitely been lost. I came in with a knowledge of Python, so writing and modifying code wasn’t the hardest part.

I learned a lot about the scientific process during my ROP. First, it is incredibly important to pick a project with a clear purpose and objectives. This will help with designing your project and what analyses are needed. Also, writing the report is most definitely a process. The first draft is going to be the worst, but hang on because it will get better from there. Lastly, I learned to learn from my experience. The most important thing as a budding scientist is to learn from your mistakes so that your next opportunity will be that much better.

I’d like to thank Dr. Tyrrell for giving me this experience and explaining all the stats to me. Also, Ariana and Mauro were invaluable during this experience and I wish them both the best in their future endeavors!

Rachael Jaffe

April 20, 2019May 22, 2020

Adam Adli’s ROP399 Journey in Machine Learning and Medical Imaging

My name is Adam Adli and I am finishing the third year of my undergraduate studies at the University of Toronto, specializing in Computer Science. I’m going to start this blog post by talking a little bit about myself. I am a software engineer, an amateur musician, and above all, someone who loves to solve problems and treats every creation as art. I have a rather tangled background; I entered university as a life science student, but I have been a programmer since my pre-teen years. Somewhere along the way, I realized that I would flourish most in my computer science courses, and so I switched programs at the beginning of my third year.

While entering this new and uncertain phase in my life and career, I had the opportunity of meeting Dr. Pascal Tyrrell and gaining admission to his research opportunity program (ROP399) course that focused on the application of Machine Learning to Medical Imaging under the Data Science unit of the Department of Medical Imaging.

Working in Dr. Tyrrell’s lab was one of the most unique experiences I have had thus far in university, allowing me to bridge my interests in medicine and computer science in order to gain valuable research experience. When I first began my journey, despite having a strong practical background in software development, I had absolutely no previous exposure to machine learning or high-performance computing.

As expected, beginning a research project in a field that you have no experience in is frankly not easy. I spent the first few months of the course trying to learn as much about machine learning algorithms and convolutional neural networks as I could; it was like learning to swim in an ocean. Thankfully, I had the support and guidance of my colleagues in the lab and my professor Dr. Tyrrell throughout the way. With their help, I pushed my boundaries and learned the core concepts of machine learning models and their development with solutions to real-world problems in mind. I finally had a thesis for my research.

My research thesis was to experimentally show a relationship that was expected in theory: smaller training sets tend to result in over-fitting of a model, and regularization helps prevent over-fitting, so regularization should be more beneficial for models trained on smaller training sets than for those trained on larger ones. Through late nights of coding and experimentation, I ran many repeated long-running computations on a binary classification model for dental x-ray images in order to show that employing L2 regularization is more beneficial for models trained on smaller training samples than for models trained on larger ones. This is an important finding because, in the field of medical imaging, it is often difficult to come across large datasets, either due to bureaucratic processes or the financial costs of developing them.
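To make the intuition concrete, here is a toy sketch that is deliberately much simpler than a CNN: a one-parameter linear fit with an L2 penalty, for which the solution has a closed form. It does not reproduce the experiment or prove that regularization helps. It only shows why a fixed penalty influences the fit more when there is less data. The functions `ridge_slope` and `shrinkage`, the penalty value, and the synthetic data are all assumptions made for this example.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form L2-regularized slope for the model y ~ w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def shrinkage(xs, lam):
    """Factor by which the L2 penalty scales the unregularized slope."""
    sxx = sum(x * x for x in xs)
    return sxx / (sxx + lam)

rng = random.Random(0)
lam = 5.0  # hypothetical penalty strength, held fixed across sample sizes
for n in (10, 100, 1000):
    # Synthetic data: true slope 2.0 plus Gaussian noise.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
    plain = ridge_slope(xs, ys, 0.0)
    penalized = ridge_slope(xs, ys, lam)
    print(n, round(plain, 3), round(penalized, 3), round(shrinkage(xs, lam), 3))
```

As the number of samples grows, the shrinkage factor approaches 1, meaning the same penalty changes the fitted slope less and less. Whether that influence actually improves test accuracy is the empirical question the experiment addressed.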

I managed to show that in real-world applications, there is an important trade-off between two resources: computation time and training data. L2 regularization requires hyperparameter tuning, which can mean repeated model training, and that is often very computationally expensive, especially for complex convolutional neural networks trained on large amounts of data. So, given the diminishing returns of regularization and the increased computational cost of employing it, I showed that L2 regularization is a feasible procedure to help prevent over-fitting and improve testing accuracy when developing a machine learning model with limited training data.
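The cost side of that trade-off can be sketched as a simple grid search, where every candidate penalty strength needs its own full training run. This is an illustration only: `grid_search`, the stand-in scoring function `fake_train_and_score`, the candidate values, and the two hours per run are hypothetical and are not taken from the experiment.

```python
def grid_search(candidates, train_and_score):
    """Try every candidate value; return (best_value, best_score, number_of_runs)."""
    best_value, best_score, runs = None, float("-inf"), 0
    for value in candidates:
        score = train_and_score(value)  # one full training run per candidate
        runs += 1
        if score > best_score:
            best_value, best_score = value, score
    return best_value, best_score, runs

def fake_train_and_score(weight_decay):
    # Stand-in for an expensive training run; this made-up score peaks at 0.01.
    return 0.9 - abs(weight_decay - 0.01)

grid = [0.0, 0.001, 0.01, 0.1]   # hypothetical L2 penalty strengths to try
best, score, runs = grid_search(grid, fake_train_and_score)
hours_per_run = 2.0              # hypothetical cost of a single training run
print(best, runs, runs * hours_per_run)
```

The total cost scales with the number of candidates, and it multiplies again if each candidate is evaluated with repeated runs or cross validation. That is why tuning is cheaper to justify on a small training set, where each run is short.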

Due to the long-running nature of the experiment, I tackled my research project as not only a machine learning project but also a high-performance computing project. I happened to be taking some systems courses like CSC367: Parallel Programming and CSC369: Operating Systems at the same time as my ROP399, which allowed me to better appreciate the underlying technical considerations in the development of my experimental machine learning model. I harnessed powerful technologies like the Intel AVX2 vector instruction set for things like image pre-processing on the CPU, and the Nvidia CUDA runtime environment through PyTorch to accelerate tensor operations using multiple GPUs. Overall, the final run of my experiment took about 25 hours, even with all the high-level optimizations I considered, and even on an insane lab machine with an Intel i7-8700 CPU and an Nvidia GeForce GTX Titan X!

Overall, my ROP not only opened a door to the world of machine learning and high-performance computing for me, but in doing so, it taught me so much more. It strengthened my independent learning, project management, and software development skills. It taught me more about myself. I feel that I have never experienced so much growth as an academic, problem-solver, and software engineer in such a condensed period of time.

I am proud of all the skills I’ve gained in Dr. Tyrrell’s lab and I am extremely thankful for having received the privilege of working in his lab. He is one of the most supportive professors I have had the pleasure of meeting.

Now that I have completed my third year of school, I’m off to begin my year-long software engineering internship at Intel and continue my journey.

Signing out,

Adam Adli