Research Opportunity Program – Tyrrell4innovation

September 24, 2025

Phoebe (Shih-Hsin) Chuang’s ROP299 Journey

Hi everyone! My name is Phoebe (Shih-Hsin) Chuang, and I’m a third-year Computer Science Specialist student with a minor in Statistics and a focus in Artificial Intelligence. This year, I had the opportunity to work on my first formal research project involving machine learning in the field of medical imaging. Although the experience was often stressful and full of challenges, it has definitely been one of the most meaningful and transformative learning experiences of my undergraduate academic journey so far.

Before starting this ROP, I had no prior experience in either machine learning or medical imaging. Choosing a research topic initially felt overwhelming. Formulating a good research question required a deep understanding of the current state of the field, so I spent a great deal of time reading papers to grasp major trends such as image generation, multimodal learning, image segmentation, and classification tasks. Eventually, I decided to focus on adnexal mass classification using ultrasound images from the lab.

A major challenge for this project was the small dataset size compared to those typically used in current literature. Recognizing this limitation, I explored approaches specifically designed for small data scenarios. I found that radiomics was particularly promising, especially given that deep learning models typically require large datasets to generalize well. To make my approach more nuanced, I chose not just to use extracted radiomics features in numeric form, but to generate radiomic feature maps. This allowed me to integrate them directly into convolutional neural networks, leveraging CNNs’ strengths in learning from images.

Beyond choosing a research topic and exploring the technical side, one of the biggest lessons I learned, minor as it may seem, was the importance of keeping my code, folders, and documentation organized. Without a clear structure from the beginning, it became very easy to get lost, especially when I paused work for a few days. If I could redo the project, I would definitely prioritize setting up a consistent, organized structure early on to save a lot of confusion and debugging time later.

Looking back, I am deeply grateful to Dr. Tyrrell for offering me this invaluable research opportunity. Through weekly meetings, Dr. Tyrrell emphasized that the primary goal of this experience was not simply achieving great results, but learning the full research process, from identifying gaps in knowledge to formulating research questions and hypotheses, designing experiments, and performing rigorous statistical analyses (since this was a statistics department course!). I would also like to sincerely thank Noushin, our postdoc, whose insightful feedback and support helped me greatly in refining my research questions and overcoming challenges during implementation. Finally, I want to thank everyone else in the lab for their encouragement, shared experiences, and thoughtful suggestions during meetings. It was both inspiring and motivating to see everyone’s projects evolve alongside mine.

This ROP journey has definitely been a steep but rewarding learning curve. It has brought me one step closer to becoming an independent researcher, and I look forward to carrying the skills, mindset, and resilience I built this year into my future research and career endeavours.

September 24, 2025

Xin Lei’s Personal Reflection

Hi! I’m Xin Lei! I was a second-year Computer Science Specialist and Molecular Genetics major student when I began my ROP with Professor Tyrrell.

My project focused on developing a framework that uses Latent Diffusion Models (LDMs) to generate high-fidelity gastrointestinal (GI) medical images from segmentation masks.

I trained a two-stage pipeline: first, a VQ-GAN model that encodes the structure of unlabeled GI images into a latent space; second, a Latent Diffusion Model conditioned on segmentation masks that generates corresponding realistic GI tract images. To enhance anatomical diversity, I also designed a novel mask interpolation pipeline to create intermediate anatomical configurations, encouraging the generation of diverse and realistic segmentation-image pairs. It was difficult to synthesize new, varied, and coherent medical images for segmentation tasks, and to push beyond the limitations of existing inpainting and stitching-based generation methods.

Overall, it took a lot of paper reading, many visits to GitHub repositories, and overnight coding sessions, all of which would have been impossible without Professor Tyrrell’s continual support and advice! My biggest mistake was not spending enough time reading about the best current methods for solving my problem of interest. Indeed, countless hours would have been saved if I had found the right repositories and research papers earlier, where others had already implemented parts of the ideas I was trying to build!

Reflecting on my ROP journey, the most difficult part was avoiding the endless rabbit holes of technical optimizations. I would often find myself spending days obsessing over marginal model improvements, investigating every possible architectural tweak or hyperparameter adjustment I could think of. While these deep dives were fun and intellectually stimulating, they were dangerous because no project could ever be delivered on time if perfection was the only goal.

I owe a huge thanks to Professor Tyrrell, who repeatedly pulled me back out of these tangents and helped me refocus on moving the project forward. His guidance taught me one of the most valuable lessons of research: perfect is the enemy of good. A deliverable, working project is far more valuable than an imaginary, flawless one stuck in perpetual revision.

In the end, I am proud of what I accomplished, not just technically, but also in learning how to think more strategically about research. This experience has cemented my excitement about applying AI to real-world medical problems, and I am deeply grateful to Professor Tyrrell and the MiDATA lab for giving me this incredible opportunity.

I can’t wait to see where this journey will take me next!

Xin Lei Lin

September 24, 2025

Nathan Liu’s STA299 Journey

Hi everyone! My name is Nathan Liu, and I am currently a second-year student at the University of Toronto, specializing in Statistics. From May to August 2025, I had the privilege of conducting an independent research project under the supervision of Dr. Pascal Tyrrell. I am deeply grateful for his guidance throughout this journey. This was my first time having an independent research experience in data science, and it proved to be both challenging and rewarding. I would love to share some of the lessons I learned during this summer.

At the core of my project, I focused on the problem of automated grading of knee osteoarthritis (KOA) using deep learning. While recent work has shown promising results, the classification of Kellgren–Lawrence grade 2 (KL2) remains particularly unreliable. My study explored how self-supervised learning (SSL), specifically SimCLR embeddings, could be used to relabel ambiguous KL2 cases and improve classification performance. I designed four experimental pipelines: a baseline, a hard relabeling approach, a confidence-based relabeling approach, and a weighted loss strategy. Along the way, I incorporated quantitative evaluations such as bootstrap confidence intervals and McNemar’s test to assess improvements in KL2 reliability.
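
As a rough illustration of the two evaluation tools named above, the sketch below computes a percentile bootstrap confidence interval for accuracy and an exact two-sided McNemar p-value for two classifiers scored on the same cases. The function names, the example counts, and the choice of the exact binomial form of McNemar's test are assumptions made for illustration; they are not taken from the project itself.

```python
import random
from math import comb


def bootstrap_accuracy_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy.

    `correct` is a list of 0/1 flags, one per test case (1 = classified correctly).
    """
    rng = random.Random(seed)
    n = len(correct)
    # Resample the test set with replacement and recompute accuracy each time.
    stats = sorted(
        sum(rng.choice(correct) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from the discordant pairs.

    b: cases the first model got right and the second got wrong
    c: cases the first model got wrong and the second got right
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split 50/50 (binomial, p = 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical numbers: 80 of 100 cases correct, and 1 vs 9 discordant pairs.
flags = [1] * 80 + [0] * 20
low, high = bootstrap_accuracy_ci(flags)
p_value = mcnemar_exact(1, 9)
```

Only the discordant pairs enter McNemar's test, which is why it suits comparing a baseline and a relabeled model on the same test set.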

Before joining this project, I was already interested in the medical applications of machine learning, but I had never worked directly with this kind of research. I still remember my first lab meeting: Dr. Tyrrell introduced a wide range of ongoing projects on different diseases, and I felt both excited and overwhelmed by the amount of new information. He warned us that the beginning would be the most difficult stage, but I underestimated just how challenging it would be. As I started exploring public databases, I quickly realized that many were incomplete, with missing labels and ambiguous annotations. This left me uncertain about how to begin. At this stage, I am thankful for the help I received from Noushin and Dr. Tyrrell, as well as advice from a previous student in the lab. Their input helped me realize that I needed to commit to working with my own chosen dataset and design a study that I could take full ownership of.

During the research process, I encountered multiple challenges. The KL grading system itself is inherently noisy, and KL2 is especially difficult to identify consistently. On top of that, my dataset was imbalanced, which made model training unstable. Technically, training SimCLR models was not straightforward: convergence was slow, embeddings were difficult to interpret, and results were often not what I expected. Under Dr. Tyrrell’s guidance, I learned to compare different baseline models, and switching from ResNet to EfficientNet immediately improved performance. He also encouraged me to experiment with visualization approaches beyond clustering, which eventually led me to explore spatial distance methods for relabeling KL2 cases. Noushin provided very practical advice on tuning SimCLR hyperparameters to maximize feature learning, which was critical to stabilizing my experiments. Throughout this process, I gained a new appreciation for how problem-solving in research often requires a mix of independent exploration, peer support, and careful reading of the literature.

Looking back, I am especially grateful for the structure of weekly lab meetings. They pushed me to stay disciplined, improve my efficiency, and keep refining my research plan. Just as importantly, they gave me the chance to see how other students tackled projects in different medical domains. I was struck by how many of us faced similar problems—unstable models, imperfect data, unexpected results—and it was reassuring to realize I was not alone. Watching others troubleshoot their difficulties often gave me ideas for my own work.

Overall, this project taught me valuable lessons both technically and personally. On the technical side, I became much more comfortable with self-supervised learning, parameter tuning, and methods for quantifying and visualizing results. On the personal side, I developed patience, resilience, and the ability to adapt when experiments did not go as planned. I also improved my academic writing skills and learned how to present my findings in a structured and convincing way. Most importantly, I am thankful to Dr. Tyrrell for his constructive advice whenever I felt uncertain, and to Noushin for patiently answering many of my technical questions, even the simplest ones. I also want to thank my peers and all the lab members for their support, encouragement, and good company. This experience has not only strengthened my skills but has also made me more confident about pursuing research in medical imaging and machine learning in the future.

September 21, 2025

Qifan Yang’s Personal Reflection

My name is Qifan Yang, and I am an incoming third-year student at the University of Toronto, pursuing a Statistics Major and a Mathematical Applications in Finance and Economics Specialist. This past summer, I had the opportunity to work on an ROP299 research project with Professor Tyrrell, and I would like to share my four-month journey in research, a completely new experience for me.

When I started, I was a complete novice in medical imaging and unfamiliar with the full process of scientific research. Before our first meeting, I felt quite nervous. I still remember Professor Tyrrell, during the interview, warning me about the potential challenges ahead. Coming from a statistics and mathematics background, I initially found both machine learning concepts and medical terminology quite intimidating. Although I had completed a few Kaggle courses, I lacked hands-on experience with building models from raw datasets and running end-to-end training and testing.

My research journey began along two paths: first, learning the fundamentals of machine learning and medical imaging, where review papers became my best starting point, and second, exploring rheumatic heart disease (RHD) and its potential for automated diagnosis using transthoracic echocardiography (TTE). The first obstacle I encountered was the lack of publicly available, large-scale datasets for RHD with detailed labels. This led me to pivot toward studying image quality in TTE, since I found a large echocardiography database with quality labels. However, a second challenge soon emerged: I struggled to identify a research question that was both technically meaningful and scientifically impactful.

This is where Professor Tyrrell’s mentorship made all the difference. In one group meeting, he mentioned severe motion blur he had observed in knee ultrasound images. That sparked the idea for my project: detecting and correcting non-uniform motion blur in echocardiography using deep learning. This was the turning point when the project truly began to take shape.

The real research work involved splitting and labeling datasets, designing a neural network model, training and testing on GPUs, and visualizing and evaluating results. Each of these steps was entirely new to me, requiring both technical learning and persistent problem-solving. I am deeply grateful for the guidance of Professor Tyrrell, as well as the support from Giuseppe, Noushin, and other members of the lab, including previous students whose work provided valuable reference points.

By the end of the summer, I had taken full charge of the project, running it from start to end. This responsibility taught me far more than technical skills. I developed a stronger sense of self-motivation, learned to manage my time effectively, and built the resilience needed to handle research setbacks. I realized that research is not just about repetitive lab work; it is about thinking critically, asking meaningful questions, and telling a compelling story through data and results.

The experience was more than an introduction to the research world; it taught me to think boldly and work carefully. I learned not to let ideas live only in conversation or in my head, but to translate them into small, testable experiments that turn speculation into evidence. Each modest prototype, whether a quick data split, a minimal model, or a rough visualization, sharpened my questions, exposed constraints, and informed the next step. Gradually, those incremental wins compounded into a coherent pipeline and credible results. The discipline I gained is simple but powerful: think wild, start small, measure honestly, and move steadily. This balance of wild curiosity with careful craftsmanship now guides how I approach complex, unfamiliar problems, and it’s the mindset I’ll carry into future research and professional work.

September 20, 2025

Winnie Ye in STA299

This was my first course related to research, and also my first time working with medical imaging. When I heard that we would be doing independent research, I immediately realized that this course would undoubtedly be a great challenge for me. Independent research meant there was no clear “standard answer”; instead, I had to explore and persist on my own.

At the beginning of my ROP project, I was actually the first student in the class to finalize a research direction. I quickly chose skin tone bias in melanoma detection as my topic and decided to work with the ISIC dataset. At that time, I felt well prepared: even though I noticed that dark-skin samples were rare, I believed the number would be “enough.” I even imagined finishing the project in less than two months.

But soon, reality hit me. Out of more than 30,000 ISIC images, there were almost no dark-skin cases. After that, I kept switching datasets: PAD, Fitzpatrick17k, MSKCC. However, each of them had serious problems: some had almost no melanoma cases, some had almost no dark-skin samples, some images contained a lot of background noise rather than just lesions, and some lacked skin tone labels altogether. Even when I combined them, the total number of dark-skin melanoma images was barely more than one hundred. During that period, I felt like I was constantly “starting over,” and every time I thought I had found a breakthrough, it quickly fell apart.

In this struggle, I tried almost everything I could think of. I trained my own U-Net, experimented with CLIP, SVM, EfficientNet, and ResNet; I tested light-skin-trained models directly on dark-skin data; I even used YOLO to crop lesions in order to reduce background noise. My research focus also shifted again and again: from melanoma, to pigmented lesions, and finally to red scaly diseases; and my tasks shifted from classification to segmentation and back again. Altogether, I must have attempted more than a dozen different approaches, yet none of them produced satisfactory results.

As the deadline drew closer, my anxiety grew stronger. By the last month, despite all the models, tasks, and research objects I had tried, I still had no meaningful results to show. At times I felt completely lost, unsure of what else I could even do. In desperation, I wrote Dr. Tyrrell a very long email, confessing that I might not be able to continue and even considered abandoning the project altogether. I told him that if I could start over, I would never choose to study bias so hastily, but would first spend more time carefully understanding the limitations of the datasets.

That month was probably the hardest part of the entire ROP. I stayed up late almost every day, exhausted and anxious, sometimes even afraid to run my code because I expected yet another failure. Dr. Tyrrell was sometimes worried and even a bit frustrated, which made me feel sad, but I was also deeply grateful that he cared so much. In the final weeks, Giuseppe also began to support me more closely, and I truly appreciated his help. During that time, even the smallest result—no matter how unrepresentative—felt important enough for me to immediately share with Dr. Tyrrell and Giuseppe for feedback.

Finally, near the very end, something changed. About ten days before the deadline, I obtained a result that was still imperfect, but at least demonstrated a sign of bias. It was not a breakthrough, but it was enough to build a conclusion. In the last week, I focused on writing the report, experimenting with bias-mitigation methods, and managed to finish everything just in time.

Looking back on these four months, I went through so many emotions: the early excitement of being “ahead,” the anxiety of being overtaken, the regret and despair of repeated failures, and the relief of a small last-minute success. If you ask me what kept me going, I honestly don’t know: perhaps the support from Dr. Tyrrell and Giuseppe, perhaps the stubborn voice in my head saying “try one more time,” or perhaps just a little bit of luck.

Through this course, I developed a new understanding of medical imaging and machine learning: they are not only technical problems but also involve fairness, data limitations, and persistence throughout the research process. I realized that the true value of research is not in quickly achieving a perfect result, but in continuously experimenting, reflecting, and learning from failures. In the future, I hope to further explore fairness in medical imaging, especially to investigate why my findings differed from previous studies and how I can avoid or better explain such discrepancies. I believe this will not only help me improve my research methods but also allow me to move forward more confidently on my academic path.

September 4, 2024

Yan Qing Lee’s ROP299 Journey

Hi! I’m Yan Qing Lee, an incoming 3rd-year Computer Science and Psychology double major undergraduate student. This past summer, I was given the opportunity to embark on my first research project in the field of artificial intelligence, and I’m excited to share my experience.

My research investigated whether individuals who receive a false-positive mammogram result from an AI model have a higher risk of receiving a breast cancer diagnosis later on. Past studies have found that receiving a false-positive mammogram result from radiologists is associated with a higher risk of future breast cancer, but no studies have yet investigated whether this holds true for AI breast cancer detection models. In this project, I used a longitudinal dataset of breast cancer mammograms and ran a trained AI breast cancer classifier, made of an ensemble of 4 ConvNeXt-Small models, to obtain false-positive and true-negative results. Cox proportional hazards models were then used to estimate the hazard ratio associated with receiving a false-positive result, both from the AI model and from radiologists.
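
For readers unfamiliar with Cox models, the single-covariate form can be written as below. The coding of the covariate (x = 1 for a false-positive result, x = 0 for a true-negative result) is an assumption made for illustration, and any additional covariates the actual models may have included are omitted.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp(\beta)
```

Here h_0(t) is the baseline hazard, and a hazard ratio above 1 would indicate a higher rate of later diagnosis in the false-positive group.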

As a student who entered the Computer Science major out-of-stream, I started the ROP feeling really out of place. Although I knew I wanted to pursue AI, I had no real experience in either AI or medical imaging, and I wondered if I was too under-qualified for this experience. Still, I was determined to put in as many hours as I needed to succeed.

I began by familiarizing myself with ML terms and choosing an area of interest (breast cancer mammography) around which to formulate a research question. As I’m sure other ROP students would agree, this process was extremely challenging; as the weeks passed, I found that my research questions were always either over-ambitious or not feasible. Over time, however, I realized that my difficulty with creating a research question stemmed from my lack of knowledge of exactly how ML models work, and of the existing literature and gaps within the field of breast cancer mammography. As I dug deeper into the existing literature, one interesting finding regarding radiologists’ false-positives caught my eye, and this finally led me to my research question.

Once I began working on my project, the many challenges of research revealed themselves to me. These included downloading and parsing a large dataset, installing packages and working around incompatible library versions to set up a working environment, and, worst of all, finding out that the AI breast cancer detection model I had originally centered my project around was not as replicable as I had assumed. Although I had made sure to keep my research question relatively simple, the process of setting up, debugging preprocessing code, training and running an AI breast cancer classification model, and obtaining undesirable training results was nothing short of complicated. Still, with the weekly lab meetings keeping me on track, and the support of Dr. Tyrrell, Mauro and the other students in the lab, I slowly but surely overcame every obstacle, and learned immense amounts every week to successfully complete my project. Even though I had to find a new AI model to use near the end and redo my experiments, I found that with my experience with the previous AI model, I was now able to independently set up and run the new model much more efficiently than before. It was proof of how much I’d learned, and I’m glad to now be able to look back and be proud of how much I’ve accomplished in the span of a few months.

At the end of it all, I have to thank Dr. Tyrrell for fostering my passion for AI and its applications in fields as impactful and important as breast cancer mammography. This experience only made me more excited to delve into the applications of AI in other fields in the future, and I can’t thank the MiDATA lab enough for this experience.

September 3, 2024

Yuxi Zhu’s ROP Journey

Hi, I am Yuxi Zhu, a Bioinformatics and Computational Biology specialist and Molecular Genetics Major who just finished my second year. As for many students, this was my first formal research experience. Professor Tyrrell warned me from the start that I would need to be independent in this lab, but my genuine interest in ML and its applications gave me the confidence to take on the challenge. Overall, this summer’s ROP journey in the MiDATA lab was filled with both excitement and challenges.

The first challenge was finding a research question. I’m incredibly grateful to Daniel, a volunteer and former ROP student, who introduced me to the concept of “adversarial examples” and helped me formulate my research question from the start. During the first two months of the literature review, I often found myself diving too deeply into theoretical aspects that were less applicable to medical imaging, or exploring questions that, while feasible, didn’t capture my interest. Luckily, I was able to settle on comparing the effects on the model of random perturbations (like random noise and loss of resolution) with those of non-random adversarial perturbations.

As the project progressed, I encountered a series of obstacles and bugs that required constant problem-solving and debugging. For example, my initial findings showed very low performance, all under 50%. Professor Tyrrell pointed out that the accuracy of a binary classifier should never drop below 50%, as that would mean it’s performing worse than a random model. I quickly realized there were bugs in my code and implementation. Additionally, after obtaining results, I thought interpreting them would be straightforward. However, when Professor Tyrrell asked me why adversarial perturbations led to accuracies below 50% while the others didn’t, I found myself at a loss for words. In the end, with Professor Tyrrell’s guidance, I was able to interpret the results correctly and articulate them in my report.
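
One way to see why accuracy below 50% on a binary task is a red flag: inverting every prediction of a classifier with accuracy a yields accuracy 1 - a. A score well below chance therefore means the predictions are systematically wrong rather than uninformative. The labels and predictions in this small sketch are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def flip(y_pred):
    """Invert every binary prediction (0 becomes 1, 1 becomes 0)."""
    return [1 - p for p in y_pred]


# Hypothetical labels and predictions for ten cases.
y_true = [0, 1, 1, 0, 1, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)                # 3 of 10 correct
acc_flipped = accuracy(y_true, flip(y_pred))  # the other 7 of 10
```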

Despite the stress I felt before presenting my findings at our weekly meetings, these sessions became invaluable learning experiences. Professor Tyrrell would scrutinize my work with questions and critiques, pushing me to think more deeply and critically about every aspect of my research. The other lab members also provided very helpful insights and shared their work. These meetings not only allowed me to understand what others were working on but also gave me the chance to get involved in or observe lively discussions that often took place.

Looking back on the last few months, this experience has been invaluable. I am deeply thankful to Professor Tyrrell who offered me this wonderful opportunity in ML and guided me through my research project. I especially appreciate how we weren’t just taught to implement a given research project or conduct a specific experiment; we were taught how to find gaps and how to conduct research. I also want to express my gratitude to Daniel for his support and insights when I was in doubt, and to Atsuhiro for his helpful suggestions. Completing my first-ever research project was challenging yet rewarding, and I am grateful for all the guidance and help I received. I’m confident that what I have learned will stay with me in my future research and career.

September 2, 2024

Jingwen (Lisa) Zhong’s ROP299 Journey

Hi all! My name is Jingwen (Lisa) Zhong. I’m a Data Science Specialist and Actuarial Science Major at UofT, graduating in 2026. I’m really happy and honored to have joined Prof. Tyrrell’s lab in the summer of 2024 as an ROP299 student. This was my first research project, and it has truly exercised many of my research and scientific skills, such as literature review, critical thinking, and the ability to get familiar with a brand-new field.

Coming into the lab, I had no research experience and no prior knowledge of medical imaging. As a student just finishing my second year of study, I felt curious about machine learning and artificial intelligence because these topics are so widely discussed. However, I still can’t forget how uneasy I felt during the first few weeks as I tried to think of a research question related to medical images and machine learning. I’m incredibly thankful to Prof. Tyrrell, who ‘relentlessly’ pointed out issues during each lab meeting, and to the lab volunteers, Daniel and Atsuhiro, who were always willing to help and guide me through the process. I couldn’t have gotten my project ready for implementation without their support. After a month of struggle, I finally settled on my research topic: investigating whether LPIPS is a better metric for assessing the similarities of medical images compared to PSNR and SSIM under various degradation conditions.
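
For context, PSNR, one of the baseline metrics named above, is derived directly from the mean squared error between two images. The sketch below is a minimal pure-Python version that treats images as flat lists of pixel values and assumes an 8-bit range (maximum value 255). SSIM and LPIPS are not shown, and the pixel values are made up.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; higher means more similar."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 image and a slightly degraded copy.
original = [100, 100, 100, 100]
degraded = [110, 90, 100, 100]
score = psnr(original, degraded)
```

Because PSNR depends only on pixel-wise differences, two degradations with the same MSE get the same score, however different they look.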

Having a research question is just the beginning; implementing it is another huge mountain to climb. I remember how excited I was when my research question was finally approved. I worked hard that week to implement almost all the code for my project. If I could go back, I would approach this differently. Instead of diving straight into coding, I would first take the time to design the entire study process—splitting the dataset, testing the code on a smaller dataset, figuring out how to use the GPU, then applying the code to the full dataset, and finally choosing the appropriate statistical analysis. I say this because I stumbled at each of these steps. After completing my code, I found that it ran so slowly that it would take several days to get results. So, I began the process of figuring out how to set up the environment to run on the lab’s GPU. This process took me almost two weeks, but with the help of other ROP students, I finally got the code running on the GPU.

Once the GPU problem was solved, my results came in much faster. However, the next obstacle was interpreting these results. As a Data Science student, it’s hard to admit, but I hadn’t yet learned ANOVA. Initially, I turned to ChatGPT for help, but the results weren’t ideal. Prof. Tyrrell suggested that I use SAS to perform ANOVA, which provided me with ideal and comprehensive results. So, I learned how to use SAS—a very powerful statistical analysis tool compared to Python.

Through this ROP experience, I learned the importance of communication and teamwork. Although we worked on different projects, the weekly lab meetings were incredibly helpful. It was a place where everyone’s intelligence came together, and I always left with new insights and a clear plan in mind.

Overall, this journey has been a steep learning curve but an immensely rewarding one. I am grateful for the opportunity to work with such a supportive team, and I know that the skills and lessons I’ve learned will continue to guide me in my future research endeavors.

January 27, 2024

Mason Hu’s ROP Journey

Hey! I am Mason Hu, a Data Science Specialist and Math Applications in Stats/Probabilities Specialist who just finished my second year. This summer’s ROP in the MiDATA lab has been an enlightening journey for me, marking my first formal venture into the world of research. Beyond gaining insight into the intricate technicalities of machine learning and medical imaging, I’ve gleaned foundational lessons that shaped my understanding of the research process itself. My experience can be encapsulated in the following three points:

Research is a journey that begins with a wide scope and gradually narrows down to a focused point. When I was writing my project proposal, I had tons of ideas and planned to test multiple hypotheses in a row. Specifically, I envisioned myself investigating four different attention mechanisms of UNet and assessing all the possible combinations of them, which was already discouraged by Prof. Tyrrell in the first meeting. My aspirations proved to be overambitious, as the dynamic nature of research led me to focus on some unexpected yet incredible discoveries. One example of this would be my paradoxical discovery that attention maps in UNets with residual blocks have almost completely opposite weights to those without. Hence, for a long time, I delved into the gradient flows in residual blocks and tried to explain the phenomenon. Even when time is limited and not all ambitious goals can be reached, the pursuit of just one particular aspect can lead to spectacular insights.

Sometimes plotting out the weights and visualizing them gives me the best sparks and intuitions. This is not restricted to visualizing attention maps. Printing out important statistics and milestones while training models often pays off as well. I once printed out every segmentation IoU in a validation data loader, and it surprised me that some of them were very close to zero. I tried to explain this anomaly as model inefficacy, but that made no sense. Through an intensive debugging session, I came to realize that it was actually a PyTorch bug specific to batch normalization when the batch size is one. As I go deeper into the research, I get a better understanding of the technical aspects of machine learning and a clearer sense of my research objectives and purpose.
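As a rough illustration of the per-sample check described above, here is a minimal sketch that computes one IoU per mask pair and flags the suspiciously low ones. It uses NumPy on plain binary masks rather than a PyTorch data loader, and the helper names (`iou`, `per_sample_ious`, `flag_low`) and the 0.1 threshold are assumptions made for this example, not the code used in the project.

```python
import numpy as np


def iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Two empty masks agree perfectly, so report 1.0 instead of dividing 0 by 0.
    if union == 0:
        return 1.0
    return float(intersection / union)


def per_sample_ious(preds, targets):
    """Return one IoU per (prediction, ground truth) pair instead of a single mean."""
    return [iou(p, t) for p, t in zip(preds, targets)]


def flag_low(ious, threshold=0.1):
    """Indices of samples whose IoU falls below the (hypothetical) threshold."""
    return [i for i, value in enumerate(ious) if value < threshold]
```

Looking at the individual values rather than only their average is what makes outliers visible: a handful of near-zero scores can hide inside a respectable mean.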

Making models reproducible is a really hard task, especially when configurations are complicated. In training a machine learning model, especially CNNs, we usually have a dozen tunable hyperparameters, sometimes more. The technicality of keeping track of them and changing them is already annoying, let alone reproducing them. Moreover, changing an implementation to an equivalent form might not always produce completely equivalent results. Two seemingly equivalent implementations of a function might have different implicit triggers of functionalities that are hooked to one but not the other. This can be especially pronounced in optimized libraries like PyTorch, where subtle differences in implementation can lead to significantly divergent outcomes. The complexity of research underscores the importance of meticulous tracking and understanding of every aspect of the model, affirming that reproducibility is a nuanced and demanding facet of machine learning research.
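One common way to make the bookkeeping less painful is to keep every hyperparameter in a single object, store a fingerprint of it with each run, and fix the random seed. The sketch below uses only the Python standard library; `TrainConfig`, its fields, and the helper names are hypothetical and chosen for illustration, not taken from the project.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical hyperparameters, kept in one place so they can be logged."""
    learning_rate: float = 1e-3
    batch_size: int = 16
    epochs: int = 50
    seed: int = 0


def config_fingerprint(cfg):
    """Short, stable hash of the configuration, handy for naming run folders."""
    blob = json.dumps(asdict(cfg), sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def seed_everything(seed):
    """Seed Python's RNG. A real project would also seed NumPy and PyTorch here."""
    random.seed(seed)
```

Two runs with the same fingerprint and seed started from the same settings, although, as noted above, identical settings alone do not guarantee identical results when the implementations differ.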

Reflecting on this summer’s research, I am struck by the depth and breadth of the learning that unfolded. I faced a delicate balance between pursuing big ideas and focusing on careful investigation, always keeping an eye on the small details that could lead to surprising insights. Most importantly, thanks to Prof. Tyrrell, Atsuhiro, Mauro, and Rosa for all the feedback and guidance. Together, they formed a comprehensive research experience for me. As I look to the future, I know that these lessons will continue to shape my thinking, guiding my ongoing work and keeping my curiosity alive.

January 27, 2024

Lucie Yang’s STA299 Journey

Hello! My name is Lucie Yang, and I am excited to share my experience with my ROP project this summer! I’m heading into my second year, pursuing the Data Science Specialist program. While I have been interested in statistics for a long time, I was not sure exactly what field to pursue. Over the past year, I became fascinated with machine learning and decided to apply to Prof. Tyrrell’s posting, despite being in my first year and not having any previous experience with machine learning or medical imaging. To my surprise, I was accepted and thus began my difficult, yet incredibly rewarding journey at the lab.

I remember Prof. Tyrrell had warned me during my interview that the research process would be challenging for me, but still, I was excited and confident that I could succeed. The first obstacle I encountered was choosing a research project. Despite spending hours scrolling through lessons on Coursera and YouTube and reading relevant papers to build my understanding, I struggled to come up with a topic that was feasible, novel, and interesting. I would go to the weekly ROP meetings thinking I had come up with a brilliant idea, only to realize that there was some problem that I had not even considered. After finally settling on an adequate project, I was met with another major obstacle: actually implementing it.

My project was about accelerating the assessment of heterogeneity on an X-Ray dataset with Fourier-transformed features. Past work done in the lab had shown that cluster analysis of features extracted from CNN models could indicate dataset heterogeneity; therefore, I wanted to explore whether the same would hold for Fourier-transformed features and whether it would be faster to use them. With the help of a previous student’s code, implementing the CNN pipeline was relatively straightforward; however, I struggled to understand how to apply the Fast Fourier Transform to images and extract the features. As deadlines loomed and time ticked away, I was unsure whether my code was even correct and became very frustrated. Prof. Tyrrell and Mauro gave me immense help, refining my methodology with me and answering my many questions. After that, I was able to get back on track and, thankfully, completed the rest of my project in time.
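For readers wondering what applying the Fast Fourier Transform to an image can look like, here is a minimal sketch with NumPy. It assumes a single-channel 2D image and summarizes the log-magnitude spectrum by averaging it over rings of increasing spatial frequency. The function name `fft_features`, the radial binning, and the default of 8 bins are assumptions for this example; the features actually used in the project may have been different.

```python
import numpy as np


def fft_features(image, n_bins=8):
    """Radially averaged log-magnitude spectrum of a 2D grayscale image."""
    image = np.asarray(image, dtype=float)
    # 2D FFT, shifted so the zero-frequency (DC) term sits at the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    magnitude = np.log1p(np.abs(spectrum))

    # Distance of every frequency bin from the centre of the shifted spectrum.
    height, width = image.shape
    rows, cols = np.indices((height, width))
    radius = np.hypot(rows - height // 2, cols - width // 2)

    # Assign each frequency to one of n_bins rings of equal radial width.
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    ring = np.clip(np.digitize(radius, edges) - 1, 0, n_bins - 1)

    features = np.zeros(n_bins)
    for b in range(n_bins):
        mask = ring == b
        if mask.any():  # very small images may leave a ring empty
            features[b] = magnitude[mask].mean()
    return features
```

Each image becomes a short vector ordered from low to high frequency, which could then be passed to a clustering method in the same way as CNN-extracted features.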

I learned a lot from this journey, far more than I have in any class I’ve taken, from the exciting state-of-the-art technologies being developed to the process of conducting research and writing code for machine learning. Above all, I gained a deeper appreciation of the bumpy road of research, and I am incredibly grateful to have had the opportunity to get a taste of it. I am very thankful to all the helpful lab members, and I look forward to continuing my journey in data science and research in the coming years!

Lucie Yang