Step 1 in ROP399 – What’s my project?

This week I finally decided on my project topic!

During last week’s lab meeting, Dr. Tyrrell brought up some potential topics for us to choose from. These included determining the appropriate sample size for machine learning, the class imbalance problem, joining the dental project, and the ultrasound project that had just been proposed.

After the lab meeting, I talked to Wenda and Ariana about the dental project that they have been working on. This was the project I most wanted to join, primarily because I intend to go to dental school after graduation, and being involved in a dental project would give me more exposure to the field. However, after their brief introduction and update on the current progress, I realized that there might not be enough left to make up a complete project. Hanatu, an independent research student, would also be working on it, leaving even fewer gaps to address. Because I hoped to work independently on a topic with plenty of freedom, I decided to change gears and look at other ideas.

The class imbalance topic was the next thing that caught my interest. Indranil, who happened to be my mentor before I joined the lab, had worked on the class imbalance project before. I immediately contacted him about it and got his project report. I was told that this topic is more technical and less clinical than the dental project, so I didn’t know if I would like it. Surprisingly, I found it really interesting, with great implications. Indranil studied the effect of class imbalance using images from the IRMA database and a random forest model. By manually changing the sample size of one class, he found that as the proportion of the imbalanced set goes up, the overall accuracy of the model decreases, while the accuracy for the imbalanced class increases. This is useful to know because class imbalance can be very common in real datasets, especially in medical imaging. Studying its effect can help identify this issue when machine learning is applied to assist with medical imaging.
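To make the two kinds of accuracy in that finding concrete, here is a minimal Python sketch of how one class can be shrunk by hand and how overall accuracy differs from per-class accuracy. It is not Indranil's code and does not reproduce his results: the function names, the toy labels, and the always-predict-the-majority "classifier" are all assumptions made up for illustration.

```python
import random
from collections import Counter


def subsample_class(labels, target, keep, seed=0):
    """Return sorted indices that keep all samples except `target`, of which only `keep` remain."""
    rng = random.Random(seed)
    target_idx = [i for i, y in enumerate(labels) if y == target]
    other_idx = [i for i, y in enumerate(labels) if y != target]
    return sorted(other_idx + rng.sample(target_idx, keep))


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """Fraction of each class's samples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Toy labels (hypothetical): start balanced, then shrink class "b" to 2 samples.
labels = ["a"] * 10 + ["b"] * 10
idx = subsample_class(labels, target="b", keep=2)
y_true = [labels[i] for i in idx]

# Hypothetical stand-in for a trained model: always predict the majority class.
y_pred = ["a"] * len(y_true)

print(Counter(y_true))
print(overall_accuracy(y_true, y_pred))
print(per_class_accuracy(y_true, y_pred))
```

In this toy case the overall accuracy looks respectable even though every sample of the small class is misclassified, which is why the two numbers are worth tracking separately.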

I then met with Indranil to discuss possible projects on this topic. The most natural one would be to investigate which method best mitigates the class imbalance problem, as a continuation of his study of its effects. Next, I searched for existing literature on this topic, specifically in medical imaging, and found very little. The most commonly used methods for handling class imbalance include over-sampling, under-sampling, and changing the weight of the imbalanced class in the cost function. I then met with Dr. Tyrrell, who liked the idea for my project and suggested that I focus on these three main methods.
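As a rough picture of what those three methods do, here is a small Python sketch using only the standard library. Everything in it is an assumption for illustration: the function names, the toy data, and the specific choices of random duplication, random removal, and inverse-frequency weights. A real project would more likely rely on an existing library, and other variants of each method exist.

```python
import random
from collections import Counter


def group_by_class(samples, labels):
    """Map each class label to the list of samples carrying it."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups


def random_oversample(samples, labels, seed=0):
    """Over-sampling: duplicate random samples until every class matches the largest."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Under-sampling: randomly drop samples until every class matches the smallest."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


def balanced_class_weights(labels):
    """Cost-function weighting: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


# Toy data: 9 "majority" samples and 3 "minority" samples (made up for illustration).
samples = list(range(12))
labels = ["majority"] * 9 + ["minority"] * 3

over_x, over_y = random_oversample(samples, labels)
under_x, under_y = random_undersample(samples, labels)
weights = balanced_class_weights(labels)

print(Counter(over_y))   # both classes now have 9 samples
print(Counter(under_y))  # both classes now have 3 samples
print(weights)           # the minority class gets the larger weight
```

Over-sampling keeps all the data but repeats minority samples, under-sampling throws majority samples away, and weighting leaves the data alone and instead makes mistakes on the minority class cost more during training.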

I am excited about my project (and, most importantly, really interested in it). As a starting point, I decided to ask Indranil for the code he used to preprocess the images and create the imbalanced classes. For my next steps, I’m also planning to learn more about the different methods for addressing this problem, as well as how to code in Python.

Looking forward to working on my project!

Sep. 28, 2018