MiWord of the Day Is… Volume Rendering!

Volumetric rendering is a technique for producing realistic images of smoke, fog, fire, and other atmospheric phenomena by modeling how light interacts with myriad tiny particles. It diverges from traditional rendering methods, which predominantly build scenes out of geometric surfaces (such as the polygons in 3D models). Instead, volumetric rendering treats these phenomena as enormous collections of particles, each of which can absorb, scatter, and emit light; their combined effect is what gives the scene its realism.

This is not only useful for generating lifelike visual effects in movies and video games; it also serves an essential function in science. Volumetric rendering makes it possible to visualize intricate three-dimensional data, which is crucial in applications such as medical imaging, where it supports detailed analysis of body scans, and fluid dynamics, where it helps researchers study gases and liquids in motion. In short, it bridges the gap between raw digital data and realistic visual representation, letting us depict complex phenomena in a more intuitive and visually engaging way.

How does this work? 

Let’s start by talking about direct volume rendering. Instead of trying to create a surface for every object, this technique directly translates data (like a 3D array of samples, representing our volumetric space) into images. Each point in the volume, or voxel, contains data that dictates how it should appear based on how it interacts with light.

For example, when visualizing a CT scan, certain data points might represent bone, while others might signify soft tissue. By applying a transfer function—a kind of filter—different values are mapped to specific colors and opacities. This way, bones might be made to appear white and opaque, while softer tissues might be semi-transparent. 

The real trick lies in the sampling process. The renderer calculates how light accumulates along lines of sight through the volume, adding up the contributions of each voxel along the way. It’s a complex ballet of light and matter, with the final image emerging from the cumulative effect of thousands, if not millions, of tiny interactions. 

Let us make this a bit more concrete. First, transfer functions: a transfer function maps raw data values to visual properties like color and opacity. Let us write the color assigned to a voxel with value v as C(v) and its opacity as α(v). For each pixel in the final image, a ray is cast through the data volume from the viewer’s perspective, following the ray equation

P(t) = P0 + t · d

where P(t) is a point along the ray at parameter t, P0 is the ray’s origin, and d is the normalized direction vector of the ray. As the ray passes through the volume, the renderer calculates the accumulated color and opacity along it. This is often done using compositing, where the color and opacity of each sampled voxel are blended, one after another, into the final pixel color.
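To make the sampling loop concrete, here is a minimal Python/NumPy sketch of direct volume rendering for a single ray. The names (transfer_function, march_ray), the toy grayscale transfer function, and the tiny 8×8×8 volume are all invented for illustration; a real renderer would interpolate samples and use a calibrated transfer function.

```python
import numpy as np

def transfer_function(v):
    """Map a raw sample value in [0, 1] to (color, opacity).

    Toy mapping: dense values render bright and opaque (like bone),
    sparse values render dim and translucent (like soft tissue)."""
    color = np.array([v, v, v])   # grayscale color C(v)
    alpha = v ** 2                # opacity alpha(v)
    return color, alpha

def march_ray(volume, origin, direction, step=1.0, n_steps=64):
    """Accumulate color along P(t) = origin + t*direction
    using front-to-back alpha compositing."""
    acc_color = np.zeros(3)
    acc_alpha = 0.0
    for i in range(n_steps):
        p = origin + i * step * direction
        # nearest-voxel lookup, clamped to the volume bounds
        idx = tuple(np.clip(p.astype(int), 0, np.array(volume.shape) - 1))
        c, a = transfer_function(volume[idx])
        # front-to-back compositing: closer samples occlude farther ones
        acc_color += (1.0 - acc_alpha) * a * c
        acc_alpha += (1.0 - acc_alpha) * a
        if acc_alpha > 0.99:      # early ray termination
            break
    return acc_color, acc_alpha

# tiny 8x8x8 volume with a dense, "bone-like" slab in the middle
vol = np.zeros((8, 8, 8))
vol[3:5, :, :] = 0.9
color, alpha = march_ray(vol, origin=np.array([0.0, 4.0, 4.0]),
                         direction=np.array([1.0, 0.0, 0.0]))
```

Each step maps a sample v to C(v) and α(v) and blends it front to back, so nearer voxels occlude farther ones; once accumulated opacity approaches 1, the ray can stop early.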

You probably used Volumetric Rendering 

Volumetric rendering transforms CT and MRI scans into detailed 3D models, enabling doctors to examine the anatomy and function of organs non-invasively; most modern CT viewers rely on it. It is also key to creating realistic simulations and environments. Many AR applications use it under the hood to overlay interactive, three-dimensional images on the user’s view of the real world, such as educational tools that project anatomical models for medical students.

Now for the fun part (see the rules here), using volume rendering in a sentence by the end of the day:

Serious: The breakthrough in volumetric rendering technology has enabled scientists to create highly detailed 3D models of the human brain. 

Less Serious: I tried to use volumetric rendering to visualize my Netflix binge-watching habits, but all I got was a 3D model of a couch with a never-ending stream of pizza and snacks orbiting around it. 

…I’ll see you in the blogosphere. 

MiWord of the Day is… KL Divergence!

You might be thinking, “KL Divergence? Sounds exotic. Is it something to do with the Malaysian capital (Kuala Lumpur) or a measurement (kiloliter)?” Nope, and nope again! It stands for Kullback-Leibler Divergence, a fancy name for a measure that compares two probability distributions.

But why not just compare their means? After all, who needs these hard-to-pronounce names? Kullback… What was it again? That’s a good point! Here’s the catch: two distributions can have the same mean but look completely
different. Imagine two Gaussian distributions, both centered at zero, but one is wide and flat, while the other is narrow and tall. Clearly, not similar!

So, maybe comparing the mean and variance would work? Excellent thinking! But what if the distributions aren’t both Gaussian? For example, a wide and flat Gaussian and a uniform distribution (totally flat) might look similar visually, but the uniform distribution is not parametrized by a mean or variance. So, what do we compare?


Enter KL Divergence!

KL Divergence returns a single number that tells us how similar two distributions are, regardless of their types. The smaller the number, the more similar the distributions. But how do we calculate it? Here’s the formula (don’t worry, you don’t have to memorize it!):

KL(q || p) = Σ q(x) log( q(x) / p(x) ), where the sum runs over every possible outcome x.

Notice that if the distribution q has probability mass where p has none, KL(q || p) will be large. Good, that’s what we want! But if q has little mass where p has a lot, KL(q || p) will be small. Wait, that’s not what we want! No, it’s not, but luckily KL Divergence is asymmetric: KL(p || q) returns a different value than KL(q || p), so we can compute both! Why are they different? I’ll leave that up to you to figure out!
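As a quick illustration of that asymmetry, here is a small Python sketch with two discrete distributions over four outcomes (the probabilities are invented for illustration): p piles most of its mass on one outcome, while q spreads its mass uniformly.

```python
import numpy as np

def kl(q, p):
    """KL(q || p) = sum_x q(x) * log(q(x) / p(x)); 0*log(0/p) counts as 0."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

p = [0.70, 0.10, 0.10, 0.10]   # p concentrates mass on the first outcome
q = [0.25, 0.25, 0.25, 0.25]   # q spreads mass uniformly

forward  = kl(q, p)  # penalizes q for putting mass where p has little
backward = kl(p, q)  # penalizes p for putting mass where q has little
```

The two directions disagree, which is exactly why KL Divergence is not a true distance, and why it matters which of KL(q || p) or KL(p || q) you minimize.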

KL Divergence in Action

Now, the fun part: using KL Divergence in a sentence!

Serious: Professor, can we approximate one distribution with another by minimizing the KL Divergence between them? That’s a great question! You’ve just stumbled on the idea behind Variational Inference.

Less Serious: Ladies and gentlemen, the KL Divergence between London and Kuala Lumpur is large, and so our flight time today will be 7 hours and 30 minutes. Please remember to stow your hand luggage in the overhead bins above you, fold your tray tables, and fasten your seatbelts.

See you in the blogosphere,
Benedek Balla

Mason Hu’s ROP Journey

Hey! I am Mason Hu, a Data Science Specialist and Math Applications in Stats/Probabilities Specialist who just finished my second year. This summer’s ROP in the MiDATA lab has been an enlightening journey for me, marking my first formal venture into the world of research. Beyond gaining insight into the intricate technicalities of machine learning and medical imaging, I’ve gleaned foundational lessons that shaped my understanding of the research process itself. My experience can be encapsulated in the following three points:

Research is a journey that begins with a wide scope and gradually narrows to a focused point. When I was writing my project proposal, I had tons of ideas and planned to test multiple hypotheses in a row. Specifically, I envisioned myself investigating four different attention mechanisms in UNet and assessing all their possible combinations, a plan Prof. Tyrrell discouraged in our very first meeting. My aspirations proved overambitious, and the dynamic nature of research instead led me to some unexpected yet incredible discoveries. One example is my paradoxical finding that attention maps in UNets with residual blocks have almost completely opposite weights to those without. Hence, for a long time, I delved into the gradient flows in residual blocks and tried to explain the phenomenon. Even when time is limited and not every ambitious goal can be reached, the pursuit of just one particular aspect can lead to spectacular insights.

Sometimes plotting the weights and visualizing them gives me the best sparks and intuitions, and this is not restricted to visualizing attention maps. Printing out important statistics and milestones while training models often pays off. I once printed every segmentation IoU in a validation data loader and was surprised that some of them were really close to zero. I tried to explain this anomaly as model inefficacy, but it just made no sense. Through an intensive debugging session, I came to realize that it was actually a PyTorch bug specific to batch normalization when the batch size is one. As I went deeper into the research, I gained a better understanding of the technical aspects of machine learning and a clearer sense of my research objectives and purpose.

Making models reproducible is a really hard task, especially when configurations are complicated. In training a machine learning model, especially CNNs, we usually have a dozen tunable hyperparameters, sometimes more. The technicality of keeping track of them and changing them is already annoying, let alone reproducing them. Moreover, changing an implementation to an equivalent form might not always produce completely equivalent results. Two seemingly equivalent implementations of a function might have different implicit triggers of functionalities that are hooked to one but not the other. This can be especially pronounced in optimized libraries like PyTorch, where subtle differences in implementation can lead to significantly divergent outcomes. The complexity of research underscores the importance of meticulous tracking and understanding of every aspect of the model, affirming that reproducibility is a nuanced and demanding facet of machine learning research.

Reflecting on this summer’s research, I am struck by the depth and breadth of the learning that unfolded. I faced a delicate balance between pursuing big ideas and focusing on careful investigation, always keeping an eye on the small details that could lead to surprising insights. Most importantly, thanks to Prof. Tyrrell, Atsuhiro, Mauro, and Rosa for all the feedback and guidance. Together, they formed a comprehensive research experience for me. As I look to the future, I know that these lessons will continue to shape my thinking, guiding my ongoing work and keeping my curiosity alive.

MiWORD of the Day is… Residual!

Have you ever tried to assemble a Lego set and ended up with mysterious extra pieces? Or perhaps you have cleaned up after a big party and found some confetti hiding in the corners days later? Welcome to the world of “residuals”!

Residuals pop up everywhere. It’s an everyday term, but it’s fancier than just referring to the leftovers of a meal: it’s also used in regression models to describe the difference between observed and predicted values, and in finance to talk about what’s left of an asset’s value. However, none of that compares to the role residuals play in machine learning, particularly in training deep neural networks.

When you learn an approximation of a function from an input space to an output space using backpropagation, the weights are updated based on the learning rate and on gradients calculated through the chain rule. As a neural network gets deeper, a small value (usually much smaller than 1) gets multiplied many times before it reaches the earliest layers, making the network excessively hard to optimize. This phenomenon, prevalent in deep learning, is called the vanishing gradient problem.

However, notice that the deeper layers of a neural network are usually composed of mappings that are close to the identity. This is exactly where residual connections work their magic! Suppose the true mapping from input to output is h(x), and let the forward pass be f(x) + x. Then the function the network actually has to learn is f(x) = h(x) - x, which is close to a zero function. This makes f(x) far easier to learn under the vanishing gradient problem, since functions close to zero demand far less sensitivity to each parameter than the identity function does.
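Here is a tiny NumPy sketch of that idea, with invented shapes and weights: a residual block computes f(x) + x, and when the learnable part f is near zero, the whole block is near the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, W1, W2):
    """Forward pass f(x) + x, where f is a tiny two-layer MLP."""
    f = np.maximum(0.0, x @ W1) @ W2   # f(x): ReLU(x W1) W2
    return f + x                       # skip connection adds the input back

d = 4
x = rng.standard_normal((2, d))

# With near-zero weights, f is near the zero function,
# so the block is almost exactly the identity:
W1 = 0.001 * rng.standard_normal((d, d))
W2 = 0.001 * rng.standard_normal((d, d))
y = residual_block(x, W1, W2)
# y is almost exactly x, so even a deep stack of such "untrained" blocks
# passes signals (and gradients) through largely unchanged.
```

The design choice is the point: without the `+ x`, near-zero weights would wipe the signal out; with it, they leave the signal intact.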

Now before we dive too deep into the wizardry of residuals, should we use residual in a sentence?

Serious: Neuroscientists wanted to explore whether CNNs perform similarly to the human brain in visual tasks, and to this end, they simulated grasp planning using a computational model called the generative residual convolutional neural network.

Less serious: Mom: “What happened?”
Me: “Sorry Mom, but after my attempt to bake chocolate cookies, the residuals were a smoke-filled kitchen and a cookie-shaped piece of charcoal that even the dog wouldn’t eat.”

See you in the blogosphere,
Mason Hu

Lucie Yang’s STA299 Journey

Hello! My name is Lucie Yang, and I am excited to share my experience with my ROP project this summer! I’m heading into my second year, pursuing a Data Science specialist. While I have been interested in statistics for a long time, I was not sure exactly what field to pursue. Over the past year, I became fascinated with machine learning and decided to apply to Prof. Tyrrell’s posting, despite being in my first year and not having any previous experience with machine learning or medical imaging. To my surprise, I was accepted and thus began my difficult, yet incredibly rewarding journey at the lab.

I remember Prof. Tyrrell had warned me during my interview that the research process would be challenging for me, but still, I was excited and confident that I could succeed. The first obstacle I encountered was choosing a research project. Despite spending hours scrolling through lessons on Coursera and YouTube and reading relevant papers to build my understanding, I struggled to come up with a topic that was feasible, novel, and interesting. I would go to the weekly ROP meetings thinking I had come up with a brilliant idea, only to realize that there was some problem that I had not even considered. After finally settling on an adequate project, I was met with another major obstacle: actually implementing it.

My project was about accelerating the assessment of heterogeneity in an X-ray dataset using Fourier-transformed features. Past work in the lab had shown that cluster analysis of features extracted from CNN models could indicate dataset heterogeneity; I therefore wanted to explore whether the same would hold for Fourier-transformed features, and whether using them would be faster. With the help of a previous student’s code, implementing the CNN pipeline was relatively straightforward; however, I struggled to understand how to apply the Fast Fourier Transform to images and extract the features. As deadlines loomed and time was quickly ticking away, I was unsure whether my code was even correct and became very frustrated. Prof. Tyrrell and Mauro gave me immense help, refining my methodology and answering my many questions. After that, I was able to get back on track and, thankfully, completed the rest of my project in time.

I learned a lot from this journey, far more than I have in any class I’ve taken, from the exciting state-of-the-art technologies being developed to the process of conducting research and writing code for machine learning. Above all, I gained a deeper appreciation of the bumpy road of research, and I am incredibly grateful to have had the opportunity to get a taste of it. I am very thankful to all the helpful lab members, and I look forward to continuing my journey in data science and research in the coming years!

Lucie Yang

MiWORD of the Day is… Silhouette Score!

Silhouette score… is that some sort of way to measure whose silhouette looks better? Or how identifiable the silhouettes are? Well… kind of! It turns out that in statistics, silhouette score is a measure for how “good” a clustering algorithm is. It considers two factors: cohesion and separation. Particularly, how compact is the cluster? And how separated is the cluster from other clusters?

Let’s say you asked your friend to group a bunch of cats into 3 clusters based on where they were sitting on the floor, because you wanted to know whether the cats sit in groups or just sit randomly. How can we determine how “good” your friend’s clustering is? Let’s zoom in on one specific cat who happens to be placed in Cluster 1. We first look at the intra-cluster distance: the mean distance from this cat to all the other cats in Cluster 1. We then take the mean nearest-cluster distance: the mean distance from this cat to all the cats in the nearest cluster it is not a part of, either Cluster 2 or 3 in this case.

To have a “good” clustering algorithm, we want to minimize the intra-cluster distance and maximize the mean nearest-cluster distance. Together, this can be used to calculate our silhouette score for one cat. Then, we can repeat this for each cat and average the score for all cats to get the overall silhouette score. Silhouette score ranges from -1 to +1, and the higher the score, the better! A high score indicates that the cats are generally similar to the other cats in their clusters and distinct from the cats in other clusters. A score of 0 means that clusters are overlapping. So, if it turns out that the cats were sitting in distinct groups and your friend is good at clustering, we’d expect a high silhouette score.
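For the curious, here is a hand-rolled Python sketch of the computation just described, using invented 2-D “sitting positions” for nine cats in three groups: for each cat, a is the intra-cluster distance, b is the mean nearest-cluster distance, and the cat’s score is (b - a) / max(a, b).

```python
import numpy as np

def silhouette(points, labels):
    points, labels = np.asarray(points, float), np.asarray(labels)
    scores = []
    for i, (p, l) in enumerate(zip(points, labels)):
        d = np.linalg.norm(points - p, axis=1)   # distances to every cat
        same = (labels == l)
        same[i] = False                          # exclude the cat itself
        a = d[same].mean()                       # intra-cluster distance
        b = min(d[labels == m].mean()            # mean nearest-cluster distance
                for m in set(labels.tolist()) - {l})
        scores.append((b - a) / max(a, b))       # this cat's silhouette
    return float(np.mean(scores))                # average over all cats

# Three tight, well-separated groups of cats -> score close to +1
cats = [(0, 0), (0, 1), (1, 0),        # cluster 0
        (10, 10), (10, 11), (11, 10),  # cluster 1
        (20, 0), (20, 1), (21, 0)]     # cluster 2
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
score = silhouette(cats, labels)
```

(scikit-learn’s silhouette_score computes the same quantity, but rolling it by hand makes the two ingredients, cohesion and separation, explicit.)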

Now, to use it in a sentence!

Serious: I am unsure of how many clusters I should group my data into for k-means clustering… it seems like choosing 3 or 4 will give me the same silhouette score of 0.38!

Less serious (suggested to me by ChatGPT): I tried sorting my sock drawer by color, but it’s a bit tricky with all those shades of grey. I mean, I can’t even tell the difference between dark grey and mid grey. My sock drawer’s silhouette score is so low!

See you in the blogosphere!
Lucie Yang

Christine Wang’s STA299 Journey

Hi! My name is Christine Wang, and I’m finishing my third year at the University of Toronto pursuing a specialist in statistics with a focus on cognitive psychology. The STA299 journey through the whole year has been a really amazing and challenging experience.

My research project involved assessing whether the heterogeneity of medical images affects the clustering of image features extracted from a CNN model. Initially, I found it quite challenging to pin down the difference between my research and the previous work done by Mauro, who analyzed the impact of heterogeneity on the generalizability of CNNs by testing overall model performance on the test clusters. Thanks to the discussions in the weekly ROP meetings, I understood that I needed to retrain the CNN model on the images in each cluster of the training set to see how heterogeneity could affect the clustering of image features. By checking whether the retrained CNN models from each cluster performed differently, I was able to show that heterogeneity could indeed affect the clustering of image features.

However, the most challenging part of the research was not achieving the desired results, but interpreting what I could learn from them. For instance, even after obtaining results showing that the retrained models performed differently, I spent a lot of time trying to understand what the clusters represent and why some retrained models perform better than others. I am very grateful to Professor Pascal Tyrrell for helping me understand my project and advising me to check the between-cluster distances. This enabled me to interpret the results and identify a possible pattern: retrained models with similar performance come from clusters that are also close to each other. Further research is still required, however, because the two datasets I used were not large enough. Looking back, I realize it would have been better to use the dataset in our lab, as finding an appropriate dataset and code was very challenging. I would like to thank Mauro, Atsuhiro, and Tristal for their generous help in teaching me how to do feature extraction and cluster analysis.

Before starting the project, I was fascinated by the high accuracy and excellent performance of ML techniques. However, during the ROP journey, I realized that achieving high model performance is not the most important thing. As Professor Pascal mentioned, the most crucial aspect of doing research is truly understanding what we are doing and focusing on interpreting what we can learn from the results we obtain. It is not enough to just have tables and figures; we need to go further by choosing appropriate statistical analysis to understand our results.

MiWORD of The Day is … Feature Extraction!

Imagine you have a photo of a cat sitting in a garden. If you want to describe the cat to someone who has never seen it, you might say it has pointy ears, a furry body, and green eyes. These details are the features that make the cat unique and distinguishable.

Similarly, in medical imaging, ML algorithms like CNN are widely used to analyze images like X-rays or MRIs. The CNN works like a set of filters that look for specific features in the image, such as edges, corners, or textures, and then combines these features to create a representation of the image.

For example, when looking at a chest X-ray, a CNN can detect features like the shape of the lungs, blood vessels, and other structures. By analyzing these features, CNN can identify patterns that indicate the presence of a disease like pneumonia or lung cancer. The CNN can also analyze other medical images, like MRIs, to detect tumors, blood clots, or other abnormalities.

To perform feature extraction, CNN applies a series of convolutional filters to the image, each designed to detect a specific pattern or feature. The filters slide over the image, computing the dot product between the filter and the corresponding pixel values in the image to produce a new feature map. These feature maps are then passed through non-linear activation functions to increase the discriminative power of the network. CNN then down-samples the feature map to increase the robustness of the network to translation and rotation. This process is repeated multiple times in a CNN, with each layer learning more complex features based on the previous layers. The final output of the network is a set of high-level features that can be used to classify or diagnose medical conditions.
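Here is a bare-bones NumPy sketch of one such stage, with an invented 6×6 “image” and a hand-picked vertical-edge filter (a trained CNN would learn its filters instead): convolve, apply a ReLU activation, then 2×2 max-pool.

```python
import numpy as np

def conv2d(img, kernel):
    """Slide the filter over the image, taking the dot product at each spot."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Down-sample by keeping the max of each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # left half dark, right half bright

vertical_edge = np.array([[-1.0, 1.0],  # responds to dark-to-bright steps
                          [-1.0, 1.0]])

fmap = np.maximum(0.0, conv2d(image, vertical_edge))  # convolution + ReLU
features = max_pool(fmap)               # down-sample for robustness
```

The pooled feature map responds only where the dark/bright boundary sits, which is the sense in which the filter “extracts” an edge feature; stacking such stages yields the higher-level features described above.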

Now let’s use feature extraction in a sentence!

Serious: “How can we ensure that the features extracted by a model are truly representative of the underlying data and not biased towards certain characteristics or attributes?”

Less Serious:
My sister: “You know, finding the right filter for my selfie is like performing feature extraction on my face.”

Me: “I guess you’re just trying to extract the most Instagram-worthy features right?”

Alice Zhang’s STA299 Journey

Hi friends! My name is Alice Zhang. I am finishing my third year of undergrad pursuing a
statistical science specialist with a focus on genetics and biotechnology, as well as a biology minor. It was a blessing to take part in STA299Y ROP with Professor Tyrrell and his MiDATA lab. As this experience comes to an end, I would like to share about my incredible journey.

Coming into the lab, I held great interest but zero research experience and zero knowledge about machine learning. I remember being completely lost and worried in my very first lab meeting. Looking back, I’m actually quite proud of how far I’ve come. My project was to compare multiple-instance classifiers and single-instance classifiers for diagnosing knee recess distension ultrasounds. I also explored factors that may influence multiple-instance model training.

The start of my project was rather smooth compared to others since it was more application-based than theoretical. I was able to grasp key concepts through literature searches and gather usable models and datasets (thanks to Mauro) needed to begin the project. However, with a lack of research experience and weak background in programming, I soon faced obstacles, confusion, panic and doubts. I had the tools in hand, but the hard part was designing, running and interpreting appropriate experiments. How do I modify and apply the code to my ultrasound data? How do I fairly compare two dissimilar algorithms? How do I unbiasedly alter and compare training factors? How do I give rational interpretations of the outcomes and unusual observations?

As the project progressed, I constantly felt that I was falling behind; I was still doubting and modifying my experiments while my peers obtained results, and still training my models while others were starting the write-up. To be honest, I panicked in every ROP meeting, but with the support of Professor Tyrrell, the lab members and my peers, I was able to power through. I am so grateful for having Professor Tyrrell as my guide through the first doorstep of research. He taught me that research isn’t about finding and reporting a standard answer; it is a process of discovering and then solving problems, and there’s no template for it. I was constantly encouraged to reflect on the “what”, “how” and “why” of the process. I also greatly appreciate the help from Mauro, who prepared the dataset and spent many hours guiding me through programming and model training.

Progressing through the project, I was later able to solve problems and fix bugs independently. I started from zero and have now completed my very first research project in machine learning. It feels like I’ve raised my first “research baby”! I would like to once again thank Professor Tyrrell and the lab members for their support; I couldn’t have gained this marvellous learning experience without them.

Diana Escoboza’s ESC499 Journey

Hello there! My name is Diana Escoboza, and I’ve just finished my undergraduate studies at UofT in Machine Intelligence Engineering. I am very fortunate to have had Prof. Tyrrell as my supervisor while I worked on my engineering undergraduate thesis project, ESC499, during the summer. I believe such an experience is worth sharing!

My project consisted of training an algorithm to identify anatomical landmarks in ultrasounds of the elbow, knee, and ankle joints. In medical imaging, it is challenging to correctly label large amounts of data, since doing so requires experts whose time is limited and costly. For this reason, I wanted my project to compare the performance of different machine learning approaches when we have limited labelled data for training.

The approaches I worked on were reinforcement and semi-supervised learning. Reinforcement learning is based on learning optimal behaviour in an environment through decision-making. In this method, the model would ‘see’ a section of the image and choose a direction to move towards the target landmark. In semi-supervised learning, both labelled and unlabelled data are used for training, and it consists of feeding the entire image to the model for it to learn the target’s location. Finally, I analysed the performance of both architectures and the training resources used to determine the optimal architecture.

While working on my project, I sometimes got lost in the enthusiasm and possibilities and overestimated the time I had. Prof. Tyrrell was always very helpful, advising me throughout my progress to stay realistic about my limited time and resources while still giving me the freedom to pursue my interests. The team meetings not only provided help; they were also a time we would talk about AI research and have interesting discussions that would excite us about our projects and future possibilities. We also had a lot of support from the grad students in the lab, who were a great help whenever we hit obstacles. A big shout-out to Mauro for saving me when I was freaking out because my code wasn’t working and time was running out.

Overall, I am very grateful for the opportunity to work with such a supportive team and for everything I learned along the way. With Prof. Tyrrell, I gained a better understanding of scientific research and advanced my studies in machine learning. I want to thank the MiDATA team for all the help and for providing me with such a welcoming environment.