Christine Wang’s STA299 Journey

Hi! My name is Christine Wang, and I’m finishing my third year at the University of Toronto, pursuing a specialist in statistics with a focus on cognitive psychology. My year-long STA299 journey has been an amazing and challenging experience.

My research project assessed whether the heterogeneity of medical images affects the clustering of image features extracted from a CNN model. Initially, I found it quite challenging to understand the difference between my research and the previous work done by Mauro, who analyzed the impact of heterogeneity on the generalizability of CNNs by testing the overall model performance on the test clusters. Thanks to the discussions in the weekly ROP meetings, I understood that I needed to retrain the CNN model using the images in each of the clusters in the training set to see how heterogeneity could affect the clustering of image features. By checking whether the retrained CNN models from each cluster performed differently, I was able to show that heterogeneity could affect the clustering of image features. However, the most challenging part of the research was not achieving the desired results but interpreting what I could learn from them. For instance, even though my results showed that the retrained models performed differently, I spent a lot of time trying to understand what the clusters represented and why some retrained models performed better than others. I am very grateful to Professor Pascal Tyrrell for helping me understand my project and for his essential advice to check the between-cluster distances. This enabled me to interpret the results and identify a possible pattern: the retrained models with similar performance came from clusters that were also close to each other. However, further research is still required because the two datasets I used were not large enough. Looking back, I realize that it would have been better if I had used the dataset in our lab, as finding an appropriate dataset and code was very challenging. I would like to thank Mauro, Atshuhiro, and Tristal for their generous help in teaching me how to do feature extraction and cluster analysis.
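To make the between-cluster distance check concrete, here is a minimal sketch in Python. It is an illustration only and not the code used in the project: the toy two-dimensional feature vectors, the cluster labels, the helper names `centroid` and `between_cluster_distances`, and the choice of Euclidean distance between cluster centroids are all assumptions made for this example.

```python
import math


def centroid(vectors):
    """Return the component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dims)]


def between_cluster_distances(features, labels):
    """Euclidean distance between the centroids of every pair of clusters.

    Assumption for illustration: "between-cluster distance" is taken to be
    the distance between cluster centroids; other definitions exist.
    """
    # Group the feature vectors by their cluster label.
    clusters = {}
    for vec, label in zip(features, labels):
        clusters.setdefault(label, []).append(vec)

    centroids = {label: centroid(vecs) for label, vecs in clusters.items()}
    names = sorted(centroids)

    # One entry per unordered pair of clusters (a, b) with a < b.
    return {
        (a, b): math.dist(centroids[a], centroids[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy data standing in for CNN features and cluster labels.
features = [
    [0.0, 0.0], [0.0, 2.0],    # cluster 0
    [3.0, 1.0], [3.0, 3.0],    # cluster 1
    [10.0, 1.0], [10.0, 3.0],  # cluster 2
]
labels = [0, 0, 1, 1, 2, 2]

distances = between_cluster_distances(features, labels)
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))
```

In this toy example, clusters 0 and 1 are the closest pair, so under the pattern described above one would look at whether the models retrained on those two clusters also perform similarly.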

Before starting the project, I was fascinated by the high accuracy and excellent performance of ML techniques. However, during the ROP journey, I realized that achieving high model performance is not the most important thing. As Professor Tyrrell mentioned, the most crucial aspect of doing research is truly understanding what we are doing and focusing on interpreting what we can learn from the results we obtain. It is not enough to just have tables and figures; we need to go further by choosing appropriate statistical analyses to understand our results.