Linxi Chen’s STA399 Journey

Hi everyone! My name is Linxi Chen and I’m finishing my third year at the University of Toronto, pursuing a statistics specialist and a mathematics major. I did an STA399 research opportunity program with Professor Pascal Tyrrell from May 2021 – August 2021 and I am very grateful that I can have this opportunity. This project provided me an opportunity in understanding machine learning and scientific research. I would love to share my experience with you all.

Initially, I had no experience with machine learning and convolutional neural network and integrating machine learning with medical imaging was a brand-new area for me. Therefore, at the beginning of this project, I searched loads of information on machine learning to gain a general picture of this area. The first assignment in this project was to make a slide deck on machine learning in medical imaging. Extracting and simplifying the gathered information helped me understand this area more deeply. 

My research project is to find out an objective metric for heterogeneity and explore how dataset heterogeneity will affect heterogeneity as measured by the CNN training image features with sample size. At first, how to specifically define the term “heterogeneity” was a big challenge for me, since there are various kinds of definitions on Google and there was so little information that was directly related to my project. By comparing the information on websites and talking with Professor Tyrrell in the weekly meeting, I managed to define the term “between-group heterogeneity” as the extent to which the measurements of each group vary within a dataset, considering the mean of each subgroup and the grand mean of the population. Next, designing the experiment setup was also challenging, because I have to ensure the experiment steps are applicable and explicable. The datasets were separated into different groups according to the label of each image. We introduced new groups into the dataset while keeping the total sample size the same in each case. There was a total of four cases and the between-group heterogeneity was measured using Cochran’s Q which is a test statistic based on chi-squared distribution. The experiment setup was modified several times, because problems came up from time to time. For example, I planned to use multi-classification CNN model at first, but it showed that I have to use different number of output neurons for the model in different cases, which made the results not comparable. Professor Tyrrell and Mauro suggested I use a binary-classification model with pseudo label, which successfully solved this problem. Luckily, I found some code on the website and with the help of Mauro, I managed to come up with the code that was applicable to use. Next came the hardest part that I encountered. Although idealistically my expectation could be explained very well, still the output results I got were not what I expected. After modifying the model and the sample size several times, I finally managed to get the expected results.

Overall, I have learned a lot from this ROP program. With the guidance of Professor Tyrrell and the help of students in the lab, I have gained an in-depth understanding of machine learning, neural network and the process of scientific research. Also, I have become more familiar with writing a formal scientific research paper. The most valuable thing that I got from this experience is the ability of problem-solving and never be frustrated when things get wrong. I would like to thank Professor Tyrrell for giving me this opportunity to learn about scientific research and helping me overcome all the challenges that I encountered during this process. I’m very grateful that I have gained so many valuable skills in this project. Also, I would like to thank Mauro and all the members in the lab for giving me so much help with my project.

Linxi Chen

Jacqueline Seal’s Journey in ROP299: Simulating clinical data!

Hi! My name is Jacqueline, and I’m going into my second year at U of T, pursuing a major in Computer Science and a specialist in Bioinformatics. This past summer, I had the opportunity to do a STA299 project with Professor Tyrrell through the Research Opportunity Program, and I’m excited to share my experiences here!

My ROP project for the summer dealt with the simulation of clinical variables relevant to detecting intra-articular hemarthrosis – basically bleeding into the joint – among patients with hemophilia, a disease where patients lack sufficient clotting proteins and are prone to regular, excessive bleeding. Since hemophilia is quite rare, clinical data is often unavailable and so simulation can help us understand what the real data might look like under different sets of plausible assumptions. The ultimate goal was to demonstrate that adding clinical data to Mauro’s existing binary CNN classifier for articular blood detection could boost model performance, as compared to a model trained exclusively on ultrasound images.

Having just completed first year, I went into this ROP with a very limited statistics background and was initially overwhelmed by all the stats jargon being used in lab meetings and in conversations with other lab members. Concepts like “odds ratios,” “ROC,” and “sensitivity analysis” were completely new to me, and I spent many hours just familiarizing myself with these fundamentals. 

After a bit of a slow start, I began my project by identifying physical presentation and clinical history variables to use in my simulation. I was fortunate enough to speak with two distinguished hematologists from Novo Nordisk, Drs. Brand-Staufer and Zak, about the features most relevant to diagnosing a joint bleed. Based on this conversation, I selected two of these variables as a starting point and simulated them according to assumed distributions. Next, I simulated the probability of an articular bleed based on a logistic regression model and used that probability to simulate the “true” presence of a bleed based on a Bernoulli distribution.

Then, I took a bit of a fun detour: figuring out how to best match simulated data to real-world bleed probabilities output by Mauro’s model. With some guidance from Professor Tyrrell, I developed a matching algorithm that allowed us to control the strength of the positive correlation between clinical simulated probabilities and classifier probabilities. Perhaps the most difficult part of my project was ensuring that the simulated dataset captured the desired relationships between my explanatory variables and between each explanatory variable and the response variables. Thanks to the advice of Guan and Sylvia, however, I was able to verify these relationships and report on them in a statistically sound manner.

Despite all the obstacles I encountered along the way, despite changing the details of my methodology several times, despite making slow progress and occasionally feeling like I was going in circles, I’m very grateful to have had this opportunity. Not only did I gain a greater understanding of important statistical concepts and greater familiarity with machine learning techniques, but I also got first-hand experience navigating the research process, from beginning to end. Ultimately, my experience in the MiDATA lab was simultaneously challenging and rewarding, and I would like to thank Dr. Tyrrell for all his guidance this summer – whether it was setting up impromptu meetings to discuss unexpected issues in my data, providing feedback on my results, or simply sharing humorous anecdotes in our weekly lab meetings. Regardless of where this next year takes me, I’m confident that I’ll carry the lessons I learned this summer with me.

Jacqueline Seal

Jihong Huang’s ROP399 Journey

Hi, my name is Jihong Huang and I have finished my third year in computer science and statistics at the University of Toronto. During this summer, I had the great chance to work on my ROP399 project under the guide of Dr. Pascal Tyrell. In such a pandemic, everything was a bit different from usual, including this program. Still, I would like to share my experience and lessons from this summer with you!

After three years in the university and so many different courses in statistics and computer science, I thought that I was totally prepared to take a try in some research projects with knowledge learnt in lectures. However, it turned out that my thoughts were completely wrong! Everything was different from the lectures, where professors will teach step by step with detailed notes. I needed to create my own proposal and design the experiments, independently like a scholar instead of a student. Despite Dr. Tyrrell’s help, I struggled to figure out my schedule for the project. Such an experience was quite unique and special to me compared with time in lecture assignments.

After all the setups, I began to handle the coding part of my project. I picked YOLOv3 as my application of bounding box regression. YOLOv3 is one of the most popular bounding box regression algorithms and it already has excellent performances in many fields. At the same time, it has its complex structures and mechanisms that are longer and more complicated than any code that I have ever learnt. It looks like only the combination of classification and localization, where each single algorithm is easy to understand but the combination is much more advanced than my lectures notes! It took me weeks to roughly figure out its mechanism. Then, I devoted myself to debugging the code. That was difficult, as I was not familiar with most of the packages used. Some issues were caused by different versions of packages, while some were made by subtle wrong code. The adjustments of hyperparameters were also annoying as I usually could not find the optimal solutions for them. Thanks to the great help from Mauro, I finally made my code work on the server successfully.

At the end of the whole trip in my project, I gained a lot of advanced knowledge about bounding box regression and many relating packages, which I would probably never touch before my graduation if I did not take this project. However, my most precious lessons are not about any specific coding ability. The most important lesson is what scientific research is and how it should be done. I learnt that it is very important to make a clear and specific proposal as the plan in the beginning as it would provide the guidelines for any further experiments on coding. Otherwise, it would be easy to go off track and lose the initial goal when thousands of lines of code overwhelm. Also, there could always be failures in scientific research. I spent more than half of my time making and fixing mistakes during the project, which frustrated me a lot in the process. My final conclusion was suggesting that the algorithm selected was not performing well. But they were all common in scientific research. As we learn from failures, the failures are meaningful, and we could make further progress based on them. Thanks to the help from Dr. Tyrrell and all other lab members, it was them that helped me out of frustration during the project and offered me valuable advice.

After this project of three months, I learnt a lot from my first try in the world of scientific research, including coding skills and scientific spirits. This experience provided me with important guidance on my future direction of study and I think all the time and efforts are worthwhile.

– Jihong Huang

Rui Zhu’s ROP399 Journey

I am Rui Zhu. I’ve just completed my third year in the computer science program. I’ve been working in Dr. Tyrrell’s lab on my ROP399 project in the past summer, which is a new and wonderful experience for me.

When I am writing down this reflection, and at some other decision-making moment in the future, it reminds me of the interview with Dr. Tyrrell, where he asked me why I chose his lab and why he chose me. I had tons of reasons for choosing his lab. However, honestly, it was hard for me to put up a whole sentence to answer why he would choose me. “I haven’t done research before, and everything needs a start,” I remember I said unconfidently, “so I need this chance to see if I am really interested in it and see how it goes”. Fortunately, I received Dr. Tyrrell’s offer a few days after the interview, and my very first research experience started.

My ROP project is on imperfect gold standard, which is the consensus of the readers. More specifically, the project is about training models on dataset labelled by readers who make mistakes. At first, I started by reading a lot of papers on robust learning. However, when I had my first meeting with Dr. Tyrrell and Atsuhiro, who kindly helped me with my project, I could not answer what is the definition of imperfect gold standard and why we need consensus of the readers. Atsuhiro helped me out. He explained the problem in real-world applications, where multiple readers annotate a huge dataset without looking at each other’s labels, because it is time-consuming and costly. I learnt the lesson that doing research starts by thinking why I am doing this rather than thinking how to do it. I kept getting questions like why my project is meaningful.

After sorting my mind, I began writing my premise, purpose, hypothesis, and objectives. I thought it was difficult to write up a whole page for these things, but after finding out that I should not assume people know why I am doing the project, I explained everything to readers in my introduction. It was easier than I thought to put up a whole page. After finishing my premise, purpose, hypothesis, and objectives, I combined them together to be a full introduction. Everything flowed like water.

When I was writing the actual code for my project, not many difficulties were met, as I was getting help from Atsuhiro and Mauro. I wanted to thank them for their help. Mauro taught me how to use Pytorch Lightning, which structured Pytorch code in a way easy to understand. Atsuhiro helped me confirm my experimental methodology and gave me guidance on what robust learning techniques to use for my project. Moreover, I started very early to familiarize myself with the code.

Overall, the journey on my summer ROP research was wonderful. I learnt how to start research from scratch and some knowledge in robust learning, although I am only scratching the surface of it. It was a pleasure for me to work in Dr. Tyrrell’s lab this summer. I look forward to what I can do in the future in the world of research.

– Rui Zhu

Qianyu Fan’s ROP299 Experience

My name is Qianyu Fan, and I finished my first year at the University of Toronto, pursuing a statistics specialist. This summer I was given the incredible opportunity to work in Dr. Tyrrell’s lab for the ROP299 course. These four months, I have gone through pain and suffering, underwent a metamorphosis, and finally reaped the fruits.

I still remembered that I promised Professor Tyrrell during the interview that I would put twice as much effort as others to complete scientific research. Even if I had no experience with machine learning and neural networks, even if I hadn’t heard of them, the professor was welcome to accept me! During the weekly meetings, terminologies were hard for me to understand, though I tried to research them afterward. And so, I began my research in a daze.

Early on, I floundered to find a focus. The topic “Compare Image Similarity” is huge, where I could do the research on many sides. For instance, we could use different distance metrics to explore the similarities between synthetic and real images. Also, whether replacing real with synthetic images will improve the model accuracy in the training process is a meaningful topic. Due to many interesting ideas for the project, I was lost, and the proposal had been constantly revised. As other ROP students were starting to write their projects, I was still stuck in the proposal and was anxious about the progress. The professor understood my situation and helped me redefine my direction, because he cared about what the students learned in the course. So, my theme was: Comparison of Two Augmentation Methods in Improving Detection Accuracy of Hemarthrosis. We used data synthesis and traditional augmentation techniques to explore and compare the recognition accuracy with increasing proportions of augmented data.

As the deadline was approaching, I had the idea of giving up due to no results. Once in the private meeting with the professor, I broke down and cried. What a shame! He gave me much support and understood my frustration. Mauro was very helpful in offering the datasets as well as allowing me to use his codes and solving my questions. Thanks to their help, my thinking became clear, and I was able to complete the project on time.

A tortuous but unforgettable journey is over. I have learned a lot of things in this ROP course, from machine learning to scientific research. This will be an asset in my life. I appreciate that the professor gave me this opportunity and that I was able to complete my project.

Qianyu Fan

Jenny Du’s ROP299 Journey: Telling apart the real and the fake!

My name is Jenny Du, and I have just wrapped up my ROP299 project in the Tyrrell Lab, as well as my second year at the University of Toronto, pursuing a bioinformatics specialist. Looking back, it was a bumpy ride, but in the end, this journey was very rewarding and has taught me a lot of things on both machine learning topics as well as the process of scientific research.

Like most of the other ROP299 students, I had no experience with machine learning and neural networks. Despite doing some research beforehand, I found myself googling what everyone was talking about during the weekly meetings (thankfully, they were online) to make sure I was not completely lost. None of my first-year courses had prepared me for these kinds of things! And so, with some uncertainties in my heart, I started my ROP journey.

I decided on my overall research topic fairly early, but the details were adjusted several times as I progressed through my project. My project is about coming up with a way to quantitatively assess a set of synthetic ultrasound images in terms of how “realistic” they look compared to the real ultrasound images. “Realism” here is defined as whether the synthetic images can be used as training images in replacement of the real images without creating too big of an impact on the machine learning algorithm. At first, I came up with a naïve proposal: I will build an algorithm that differentiates real and synthetic ultrasound images, and if the algorithm can classify the two kinds (with high accuracies), then it means that the synthetic images are not realistic, and vice versa. In the weekly meeting, Dr. Tyrrell immediately pointed out why this wouldn’t work. In my proposal, a low accuracy could mean that the synthetic and the real images are very similar, but it could also mean that the algorithm itself is terrible. For example, if my algorithm has 50% accuracy, then it is basically randomly guessing each image, like a coin toss, so its classification is unreliable, to say the least. He suggested that I look online to see how others have done it. There was very little information that directly relates to what I’m doing, but eventually I was able to come up with a plan to extract features from the images using a pre-trained CNN model and measure the cosine similarity score between two images and graph these values into a histogram to see their distribution. Dr. Tyrrell also suggested that I compare the distributions at different equivalence margins to determine how big a mean difference is acceptable.

Thankfully, I was able to find some code online that I was able to use in my project with minor changes, and I was able to produce some distribution data fairly quickly. Then, I encountered what I considered to be the hardest part of my entire project: to statistically interpret and discuss my data and create a conclusion out of it. Since I am not a statistics student, and so my knowledge of statistics is limited to one stats course I took as a part of my program requirements. It took a while for me to learn all these statistical concepts and understand why each is needed in my project.

This year was especially interesting since everything was online. Despite not being able to see each other face-to-face, I was still able to receive much support from Dr. Tyrrell and other students in the lab. Mauro was very helpful in preparing the datasets for my project as well as answering any problems related to the codes. Guan also helped to check my statistical calculations and clarifying some hard concepts. I have also made great friends with the other ROP students this year, and hopefully we will be able to see each other in person when the school re-opens.

Overall, this journey was a wonderful experience, and I have learned many things from it. Not only did I got some familiarity with machine learning topics and their application in medicine, but I have also gained experience in the general academic research process, from coming up with a topic to the actual implementation to the final reports. There were challenges along the way, but in the end, it was very rewarding. I am extremely thankful to Dr. Tyrrell for the guidance and support and am grateful for this opportunity.

Jenny Du

Parinita Edke’s ROP experience in the Tyrrell Lab!

Hi! My name is Parinita Edke and I’m finishing my third year at UofT, specializing in Computer Science with a minor in Statistics. I did a STA399Y research project with Professor Tyrrell from September 2020 – April 2021 and I am excited to share my experience in the lab!

I have always been interested in medicine and the applications of Computer Science and Statistics to solve problems in the medical field. I was looking out for opportunities to do research in this intersection and was excited when I saw Professor Tyrrell’s ROP posting. I applied prior to the second-round deadline and waited to hear back. After almost 2 weeks past the deadline, I had still not heard back and decided to follow up on the status of my application. I quickly received a reply from Professor Tyrrell that he had already picked his students prior to receiving my application. While this was extremely disappointing, I thanked Professor Tyrrell for his time and expressed that I was still interested in working with him during the year and attached my application package to the email. I was not really expecting anything coming out of this, so I was extremely happy when I received an invite to an interview! After a quick chat with Professor Tyrrell about my goals and fit for the lab, I was accepted as an ROP student!

Soon after being accepted, I joined my first lab meeting where I was quickly lost in the technical machine learning terms, the statistical concepts and the medical imaging terminology used. I ended the meeting determined to really begin understanding what machine learning was all about!

This marked the beginning of the long and challenging journey through my project. When I decided on my project, it seemed interesting as solving the problem allowed for some cool questions to be answered. The task was to detect the presence of blood in ultrasound images of the knee joint; my project was to determine if Fourier Transformation can be used to generate features to perform the task at hand well. It seemed quite straightforward at first – simply generate Fourier Transformed features and run a classification model to get the outputs, right? After completing the project, I am here to tell you that it was far from being straightforward. It was more like a zigzag progress pattern through the project. The first challenge that I faced was understanding the theory behind the Fourier Transform and how it applies to the task at hand. This took me quite some time to fully grasp and was definitely one of the more challenging parts of the project. The next challenge was figuring out the steps and the things I would need for my project. Rajshree, a previous lab member, had done some initial work using a CNN+SVM model. I first tried to replicate what Rajshree had done in order to create a baseline to compare my approach to. It took me some time to understand what each line of code did within Rajshree’s model but after I was able to get it to work, I felt amazing! Reading through Rajshree’s code gave me more experience in understanding the common Python libraries used in machine learning, so when I built my model, it was much quicker! When I ran my model for the first time, I felt incredible! The process was incredibly frustrating at times, but when I saw results for the first time, I felt like all this struggle was worth it. Throughout this process of figuring out the project steps and building the model, Mauro was always there to help, always being enthusiastic when answering any questions I had and giving me encouragement to keep going.

Throughout the process, Professor Tyrrell was always there as well – during our weekly ROP meetings, he always reminded us to think about the big picture of what our projects were about and the objectives we were trying to accomplish. I definitely veered off in the wrong direction at times, but Professor Tyrrell was quick to pull me back and redirect me in the right direction. Without this guidance, I would not have been able to finish and execute the project in the way that I did and am proud of.

Looking back at the year, I am astonished at the number of things I have learned and how much I have grown. Everything that I learned, not only about machine learning, but about writing a research paper, learning from others and your own mistakes, collaborating with others, learning from even more of my own mistakes, and persevering when things get tough will carry with me throughout the rest of my undergraduate studies and the rest of my professional career.

Thank you, Professor Tyrrell, for taking a chance on me. He could have simply passed on my application but the fact that he took a chance with me and accepted me into the course lead to such an invaluable experience for me which I truly appreciate. The experiences and the connections I have made in this lab have been a highlight of my year, and I hope to keep contributing to the lab in the future!

Parinita Edke

Stanley Hua in ROP299: Joining the Tyrrell Lab during a Pandemic

My name is Stanley Hua, and I’ve just finished my 2nd year in the bioinformatics program. I have also just wrapped up my ROP299 with Professor Pascal. Though I have yet to see his face outside of my monitor screen, I cannot begin to express how grateful I am for the time I’ve been spending at the lab. I remember very clearly the first question he asked me during my interview: “Why should I even listen to you?” Frankly, I had no good answer, and I thought that the meeting didn’t go as well as I’d hoped. Nevertheless, he gave me a chance, and everything began from there.

Initially, I got involved with quality assessment of Multiple Sclerosis and Vasculitis 3D MRI images along with Jason and Amar. Here, I got introduced to the many things Dmitrii can complain about taking brain MRI images. Things such as scanner bias, artifacts, types of imaging modalities and prevalence of disease play a role in how we can leverage these medical images in training predictive models.

My actual ROP, however, revolved around a niche topic in Mauro and Amar’s project. Their project sought to understand the effect of dataset heterogeneity in training Convolutional Neural Networks (CNN) by cluster analysis of CNN-extracted image features. Upon extraction of image features using a trained CNN, we end up with high-dimensional vectors representing each image. As a preprocessing step, the dimensionality of the features is reduced by transformation via Principal Component Analysis, then selecting a number of principal components (PC) to keep (e.g. 10 PCs). The question must then be asked: How many principal components should we use in their methodology? Though it’s a very simple question, I took way too many detours to answer this question. I looked at the difference between standardization vs. no standardization before PCA, nonlinear dimensionality reduction techniques (e.g. autoencoder) and comparisons of neural network image representation (via SVCCA) among other things. Finally, I proposed an equally simple method for determining the number of PCs to use in this context, which is the minimum number of PCs that gives the most frequent resulting value (from the original methodology).

Regardless of the difficulty of the question I sought to answer, I learned more about practices in research, and I even learned about how research and industry intermingle. I only have Professor Pascal to thank for always explaining things in a way that a dummy such as me would understand. Moreover, Professor Pascal always focused on impact; is what you’re doing meaningful and what are its applications?

 I believe that the time I spent with the lab has been worthwhile. It was also here that I discovered that my passion to pursue data science trumps my passion to pursue medical school (big thanks to Jason, Indranil and Amar for breaking my dreams). Currently, I look towards a future, where I can drive impact with data; maybe even in the field of personalized medicine or computational biology. Whoever is reading this, feel free to reach out! Hopefully, I’ll be the next Elon Musk by then…

Transiently signing out,

Stanley Bryan Z. Hua

Jessica Xu’s Journey in ROP299

Hello everyone! My name is Jessica Xu, and I’ve just completed my second year in Biochemistry and Statistics at the University of Toronto. This past school year, I’ve had the wonderful opportunity to do a ROP299 project with Dr. Pascal Tyrrell and I’d like to share my experience with you all!

A bit about myself first: in high school, I was always interested in life sciences. My favourite courses were biology and chemistry, and I was certain that I would go to medical school and become a doctor. But when I took my first stats course in first year, I really enjoyed it and I started to become interested in the role of statistics in life sciences. Thus, at the end of my first year, while I was looking through the various ROP courses, I felt that Dr. Tyrrell’s lab was the perfect opportunity to explore my budding interest in this area. I was very fortunate to have an interview with Dr. Tyrrell, and even more fortunate to be offered a position in his lab!

Though it may be obvious, doing a research project when you have no research experience is very challenging! Coming into this lab having taken a statistics course and a few computer science courses in first year, I felt I had a pretty good amount of background knowledge. But as I joined my first lab meeting, I realized I couldn’t be more wrong! Almost every other word being said was a word I’d never heard of before! And so, I realized that there was a lot I needed to learn before I could even begin my project.

I then began on the journey of my project, which was looking at how two dimension reduction techniques, LASSO and SES, performed in an ill-posed problem. It was definitely no easy task! While I had learned a little bit about dimension reduction in my statistics class, I still had a lot to learn about the specific techniques, their applications in medical imaging, and ill-posed problems. I was also very inexperienced in coding, and had to learn a lot of R on my own, and become familiar with the different packages that I would have to use. It was a very tumultuous journey, and I spent a lot of time just trying to get my code to work. Luckily, with help from Amar, I was able to figure out some of the errors and issues I was facing in regards to the code.

I learned a lot about statistics and dimension reduction in this ROP, more than I have learned in any other courses! But most importantly, I had learned a lot about the scientific process and the experience of writing a research paper. If I can provide any advice based on my experience, it’s that sometimes it’s okay to feel lost! It’s not expected of you to have devised a perfect plan of execution for your research, especially when it’s your first time! There will be times that you’ll stray off course (as I often did), but the most valuable lesson that I learned in this ROP is how to get back on track. Sometimes you just need to take a step back, go back to the beginning and think about the purpose of your project and what it is you’re trying to tell people. But it’s not always as easy to realize this. Luckily Dr. Tyrrell has always been there to guide us throughout our projects and to make sure we stay on track by reminding us of the goal of our research. I’m incredibly grateful for all the support, guidance, and time that Dr. Tyrrell has given this past year. It has been an absolute pleasure of having the experience of working in this lab.

Now that I’ve taken my first step into the world of research, with all the new skills and lessons I’ve learned in my ROP, I look forward to all the opportunities and the journey ahead!

Jessica Xu

Jacky Wang’s ROP399 Journey

My name is Jacky Wang, and I am just finishing my third year at the University of Toronto, pursuing a computer science specialist. Looking back on this challenging but incredible year, I was honoured to have the opportunity to work inside Dr. Tyrrell’s lab as part of the ROP399 course. I would love to share my experience studying and working inside the lab.

Looking back, I realize one of the most challenging tasks is getting onboard. I felt a little lost at first when surrounded by loads of new information and technologies that I had little experience with before. Though feeling excited by all the collision of ideas during each meeting, having too many choices sometimes could be overwhelming. Luckily after doing more literature review and with the help of the brilliant researchers in the lab (a big thank you to Mauro, Dimitri, and of course, Dr. Tyrrell), I start to get a better view of the trajectories of each potential project and further determine what to get out from this experience. I did not choose the machine learning projects, though they were looking shiny and promising as always (as a matter of fact, they turned out to be successful indeed). Instead, I was more leaning towards studying the sample size determination methodology, especially the concept of ill-posed problems, which often occur when the researchers make conclusions from models trained on limited samples. It had always been a mystery why I would get different and even contrasting results when replicating someone else’s work on smaller sample sizes. From there, I settled the research topic and moved onto the implementation details.

This year the ROP students are coming from statistics, computer science and biology etc. I am grateful that Dr. Tyrrell is willing to give anyone who has the determination to study in his lab a chance though they may have little research experience and come from various backgrounds. As someone who studies computer science with a limited statistics background, the real challenge lies in understanding all the statistical concepts and designing the experiments. We decided to apply various dimension reduction techniques to study the effect of different sample sizes with many features. I designed experiments around the principal component analysis (PCA) technique while another ROP student Jessica explored the lasso and SES model in the meantime. It was for sure a long and memorable experience with many debugging when implementing the code from scratch. But it was never more rewarding than seeing the successful completion of the code and the promising results.

I feel lucky and grateful that Dr. Tyrell helped me complete my first research project. He broke down the long and challenging research task into clear and achievable subgoals within our reach. After completing each subgoal, I could not even believe it sent us close to the finished line. It felt so different taking an ROP course than attending the regular lessons. For most university courses, most topics are already determined, and the materials are almost spoon-fed to you. But sometimes, I start to lose the excitement of learning new topics, as I am not driven by the curiosity nor the application needs but the pressure of being tested. However, taking the ROP course gives me almost complete control of my study. For ROP, I was the one who decides what topics to explore, how to design the experiment. I could immediately test my understanding and put everything I learned into real applications.

I am so proud of all the skills that I have picked up in the online lab during this unique but special ROP experience. I would like to thank Dr. Tyrrell for giving me this incredible study experience in his lab. There are so many resources out there to reach and so many excellent researchers to seek help from. I would also like to thank all members of the lab for patiently walking me through each challenge with their brilliant insights.

Jacky Wang