Hi everyone! My name is Tong Su, and I have just wrapped up my ROP299 project in Professor Tyrrell’s Lab, as well as my second year at the University of Toronto, where I am pursuing a computer science specialist and a statistics major. It has been a great pleasure to complete my second year alongside this research experience. I have learned a lot about both artificial intelligence and the process of scientific research, and I would like to share my experiences with you here.
My ROP project examined the effect of compression and downsampling on the accuracy of a Convolutional Neural Network (CNN)-based binary classification model for histological images. Advances in medical imaging systems have made medical images more detailed, but at the cost of larger file sizes. Compared to other images, medical images are larger and occupy more storage space, so many of them are downsampled or compressed before they are stored. While some compression methods are lossless (reversible), most others are lossy (irreversible): once an image is compressed, some of its information is lost and cannot be restored. When these modified medical images are used to train machine learning algorithms, the information lost during compression may affect the algorithms’ accuracy. This study investigates how the compression ratio and the downsampling ratio applied to medical images affect the accuracy of a CNN.
Like other ROP students, I chose my research topic early from a list of topics in different areas. However, the focus of my research shifted slightly as the project progressed. Initially, my research question was “Can we compress training data without degrading accuracy?”. This question addressed only the effect of compression on the algorithm’s accuracy, and at the end of the research I needed to propose the best compression ratio for storing medical images without much loss of accuracy.
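To make “without much loss of accuracy” concrete, one option is to fix a tolerance on the accuracy drop relative to a model trained on uncompressed images, then pick the largest compression ratio that stays within it. The Python sketch below assumes that rule; the function name `largest_acceptable_ratio` and the default tolerance `max_drop` (one percentage point) are hypothetical illustrations rather than the project’s stated criterion.

```python
from typing import Mapping, Optional


def largest_acceptable_ratio(
    accuracy_by_ratio: Mapping[float, float],
    baseline_accuracy: float,
    max_drop: float = 0.01,  # hypothetical tolerance: 1 percentage point
) -> Optional[float]:
    """Return the largest compression ratio whose accuracy is at most
    `max_drop` below the uncompressed baseline, or None if none qualifies.

    `accuracy_by_ratio` maps a compression ratio (e.g. 20 for 20:1) to the
    accuracy obtained when the images are compressed at that ratio.
    """
    acceptable = [
        ratio
        for ratio, accuracy in accuracy_by_ratio.items()
        if baseline_accuracy - accuracy <= max_drop
    ]
    return max(acceptable) if acceptable else None
```

Returning `None` makes it explicit when every tested ratio costs too much accuracy, instead of silently falling back to the smallest ratio.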
Among all the compression formats, I decided to work with JPEG2000, as it is one of the most commonly used in medical imaging. The chosen dataset consisted of 100,000 image patches from histological images of human colorectal cancer (CRC) and normal tissue, organized into 9 tissue classes with one class label per image. Next, I chose the machine learning model: a CNN for binary classification. I picked two of the categories, so the model classifies whether a given tissue image is cancer-associated stroma (STR, label 1) or normal colon mucosa (NORM, label 0).
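Restricting the 9-class dataset to the two classes used here amounts to keeping only the STR and NORM patches and attaching the labels above. A minimal Python sketch, assuming the patches are stored as TIFF files in one sub-folder per class named by its abbreviation (for example `STR/` and `NORM/`); that folder layout and the helper name `binary_subset` are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Labels as described above: cancer-associated stroma -> 1, normal mucosa -> 0.
# Assumes each patch sits in a folder named after its class abbreviation.
LABELS = {"STR": 1, "NORM": 0}


def binary_subset(paths: Iterable[Path]) -> List[Tuple[Path, int]]:
    """Keep only STR and NORM patches and pair each with its binary label.

    Patches from the other seven class folders are skipped.
    """
    return [
        (path, LABELS[path.parent.name])
        for path in paths
        if path.parent.name in LABELS
    ]
```

With a hypothetical dataset root, this could be called as `binary_subset(sorted(Path("dataset").glob("*/*.tif")))`.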
The next step was compressing the dataset. I used the Python Imaging Library (PIL) to compress the dataset with JPEG2000. However, the binary classification model did not support images in the j2k format, so I needed to add a step that converts the j2k images to a format the model supports. I chose TIFF because it is the dataset’s original format.
While I was researching compression, Professor Tyrrell pointed out another image size reduction method: downsampling. Although both methods reduce image size, they work differently: compression re-encodes an image at its original pixel dimensions, whereas downsampling reduces the number of pixels. This made me curious about which size reduction method better preserves the accuracy of the machine learning algorithm, so I added a second goal to my project: to compare downsampling with compression and determine which method is more suitable for medical imaging.
Despite the obstacles I encountered along the way, such as changing the dataset halfway through the project, modifying the model and rerunning everything, and an unexpected 54.39% error at a high compression ratio, I have successfully reached the end and am concluding my excellent ROP experience with this reflection. I now have a much better understanding of the research process and of deep learning algorithms. Finally, I want to thank Professor Tyrrell for offering me this opportunity and guiding my research through our weekly meetings. I also want to thank Dr. Atsuhiro Hibi for his endless guidance and support throughout the project, through meetings and frequent email exchanges, even when he was busy. Without their help, I would not have had such an excellent research experience.
Tong Su