Jacqueline Seal’s Journey in ROP299: Simulating clinical data!

Hi! My name is Jacqueline, and I’m going into my second year at U of T, pursuing a major in Computer Science and a specialist in Bioinformatics. This past summer, I had the opportunity to do a STA299 project with Professor Tyrrell through the Research Opportunity Program, and I’m excited to share my experiences here!

My ROP project for the summer dealt with the simulation of clinical variables relevant to detecting intra-articular hemarthrosis – basically bleeding into the joint – among patients with hemophilia, a disease where patients lack sufficient clotting proteins and are prone to regular, excessive bleeding. Since hemophilia is quite rare, clinical data is often unavailable and so simulation can help us understand what the real data might look like under different sets of plausible assumptions. The ultimate goal was to demonstrate that adding clinical data to Mauro’s existing binary CNN classifier for articular blood detection could boost model performance, as compared to a model trained exclusively on ultrasound images.

Having just completed first year, I went into this ROP with a very limited statistics background and was initially overwhelmed by all the stats jargon being used in lab meetings and in conversations with other lab members. Concepts like “odds ratios,” “ROC,” and “sensitivity analysis” were completely new to me, and I spent many hours just familiarizing myself with these fundamentals. 

After a bit of a slow start, I began my project by identifying physical presentation and clinical history variables to use in my simulation. I was fortunate enough to speak with two distinguished hematologists from Novo Nordisk, Drs. Brand-Staufer and Zak, about the features most relevant to diagnosing a joint bleed. Based on this conversation, I selected two of these variables as a starting point and simulated them according to assumed distributions. Next, I simulated the probability of an articular bleed based on a logistic regression model and used that probability to simulate the “true” presence of a bleed based on a Bernoulli distribution.
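For readers who want to picture this pipeline, here is a minimal sketch in Python with NumPy. It is not the code used in the project: the two variables (a binary joint-swelling flag and a pain score), their distributions, and the regression coefficients are all hypothetical placeholders chosen only to illustrate the three steps described above.

```python
import numpy as np


def simulate_bleeds(n, seed=0):
    """Simulate two clinical variables, a bleed probability, and a bleed label.

    All distributions and coefficients here are illustrative assumptions,
    not the values used in the actual project.
    """
    rng = np.random.default_rng(seed)

    # Step 1: draw the clinical variables from assumed distributions.
    swelling = rng.binomial(1, 0.4, size=n)        # hypothetical binary sign
    pain = rng.normal(loc=5.0, scale=2.0, size=n)  # hypothetical pain score

    # Step 2: turn them into a bleed probability with a logistic model.
    intercept, b_swelling, b_pain = -3.0, 1.2, 0.4  # assumed coefficients
    log_odds = intercept + b_swelling * swelling + b_pain * pain
    prob = 1.0 / (1.0 + np.exp(-log_odds))

    # Step 3: draw the "true" bleed status from a Bernoulli distribution.
    bleed = rng.binomial(1, prob)

    return swelling, pain, prob, bleed


swelling, pain, prob, bleed = simulate_bleeds(1000, seed=1)
print("mean simulated bleed probability:", prob.mean())
print("simulated bleed rate:", bleed.mean())
```

Under this setup, exponentiating a coefficient gives the assumed odds ratio for that variable, so changing the coefficients is one way to explore different sets of plausible assumptions.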

Then, I took a bit of a fun detour: figuring out how to best match simulated data to real-world bleed probabilities output by Mauro’s model. With some guidance from Professor Tyrrell, I developed a matching algorithm that allowed us to control the strength of the positive correlation between simulated clinical probabilities and classifier probabilities. Perhaps the most difficult part of my project was ensuring that the simulated dataset captured the desired relationships between my explanatory variables and between each explanatory variable and the response variables. Thanks to the advice of Guan and Sylvia, however, I was able to verify these relationships and report on them in a statistically sound manner.

Despite all the obstacles I encountered along the way, despite changing the details of my methodology several times, despite making slow progress and occasionally feeling like I was going in circles, I’m very grateful to have had this opportunity. Not only did I gain a greater understanding of important statistical concepts and greater familiarity with machine learning techniques, but I also got first-hand experience navigating the research process, from beginning to end. Ultimately, my experience in the MiDATA lab was simultaneously challenging and rewarding, and I would like to thank Dr. Tyrrell for all his guidance this summer – whether it was setting up impromptu meetings to discuss unexpected issues in my data, providing feedback on my results, or simply sharing humorous anecdotes in our weekly lab meetings. Regardless of where this next year takes me, I’m confident that I’ll carry the lessons I learned this summer with me.

Jacqueline Seal

Qianyu Fan’s ROP299 Experience

My name is Qianyu Fan, and I have just finished my first year at the University of Toronto, where I am pursuing a statistics specialist. This summer I was given the incredible opportunity to work in Dr. Tyrrell’s lab for the ROP299 course. Over these four months, I went through pain and suffering, underwent a metamorphosis, and finally reaped the fruits of my work.

I still remember promising Professor Tyrrell during the interview that I would put in twice as much effort as others to complete my research. Even though I had no experience with machine learning and neural networks, and had not even heard of them, the professor was willing to accept me! During the weekly meetings, the terminology was hard for me to understand, though I tried to look it up afterward. And so, I began my research in a daze.

Early on, I struggled to find a focus. The topic “Compare Image Similarity” is huge, and I could have approached it from many sides. For instance, we could use different distance metrics to explore the similarities between synthetic and real images. Whether replacing real images with synthetic ones improves model accuracy during training is also a meaningful question. With so many interesting ideas for the project, I was lost, and my proposal was constantly being revised. As other ROP students were starting to write up their projects, I was still stuck on the proposal and anxious about my progress. The professor understood my situation and helped me redefine my direction, because he cared about what students learned in the course. So, my topic became: Comparison of Two Augmentation Methods in Improving Detection Accuracy of Hemarthrosis. We used data synthesis and traditional augmentation techniques to explore and compare the recognition accuracy with increasing proportions of augmented data.

As the deadline approached, I thought about giving up because I had no results. Once, in a private meeting with the professor, I broke down and cried. How embarrassing! He gave me a lot of support and understood my frustration. Mauro was very helpful in offering the datasets, allowing me to use his code, and answering my questions. Thanks to their help, my thinking became clear, and I was able to complete the project on time.

A tortuous but unforgettable journey is over. I have learned a lot of things in this ROP course, from machine learning to scientific research. This will be an asset in my life. I appreciate that the professor gave me this opportunity and that I was able to complete my project.

Qianyu Fan