Who is your neighbor?

Classic Seth Rogen movie. Today we will be talking about good neighbors as a follow-up to my first post “What Cluster Are You From?”. If you want to learn a little about bad neighbors, watch the trailer to the movie Neighbors.


So let’s say you are working with a large amount of data that contains many, many variables of interest. In this situation you are most likely working with a multidimensional model. Multivariate analysis will help you make sense of multidimensional space and is simply defined as any analysis that incorporates more than one dependent variable (AKA response or outcome variable).


*** Stats jargon warning*** 
Multivariate analysis can include analysis of data covariance structures to better understand or reduce data dimensions (PCA, Factor Analysis, Correspondence Analysis), or the assignment of observations to groups using an unsupervised methodology (Cluster Analysis) or a supervised methodology (k-Nearest Neighbor, or k-NN). We will be talking about the latter today.


*** Stats-reduced safe return here***
Classification is simply the assignment of previously unseen entities (objects such as records) to a class (or category) as accurately as possible. In our case, you are fortunate to have a training set of entities or objects that have already been labelled or classified, and so this methodology is termed “supervised”. Cluster analysis is unsupervised learning, and we will talk more about it in a later post.


Let’s say, for example, you have made a list of all of your friends and labeled each one as “Super Cool”, “Cool”, or “Not cool”. How did you decide? You probably have a bunch of attributes or factors that you considered. If you have many, many attributes this process could be daunting. This is where k-nearest neighbor, or k-NN, comes in. It finds the most similar labelled items in terms of their attributes, looks at their labels, and gives the unassigned object the majority vote!


This is how it basically works:


1- Defines similarity (or closeness) and then, for a given object, measures how similar all the labelled objects from your training set are to it. These become the neighbors, who each get a vote.


2- Decides on how many neighbors get a vote. This is the k in k-NN.


3- Tallies the votes and voila – a new label! 


All of this is fun but will be made much easier using the k-NN algorithm and your trusty computer!
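To make those three steps concrete, here is a minimal from-scratch sketch in Python. The friend data and the (humour, kindness) scores are made up for illustration; a real analysis would have many more attributes per friend.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Step 1: define similarity -- here, plain Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training, new_point, k=3):
    """Label new_point by majority vote of its k nearest labelled neighbors."""
    # Sort all labelled training objects by distance to the new point
    neighbours = sorted(training, key=lambda item: euclidean(item[0], new_point))
    # Step 2: only the k closest neighbors get a vote
    votes = [label for _, label in neighbours[:k]]
    # Step 3: tally the votes -- the most common label wins
    return Counter(votes).most_common(1)[0][0]

# Hypothetical friends: (humour, kindness) scores and a coolness label
friends = [
    ((9, 8), "Super Cool"),
    ((8, 9), "Super Cool"),
    ((6, 5), "Cool"),
    ((5, 6), "Cool"),
    ((2, 1), "Not cool"),
    ((1, 2), "Not cool"),
]

# A new, unlabelled friend scoring (8, 8) lands nearest the "Super Cool" crowd
print(knn_classify(friends, (8, 8), k=3))  # -> Super Cool
```

Note that the choice of k matters: too small and one odd neighbor can swing the vote; too large and distant, dissimilar objects start voting too.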


So, now you have an idea of a supervised learning technique that will allow you to work with a multidimensional data set. Cool.


Listen to Frank Sinatra‘s The Girl Next Door to decompress and I’ll see you in the blogosphere…


Pascal Tyrrell

MiWord of the Day Is… Haptoglobin!

 

Just got back from the RSNA! Wow, what a big conference – 56,000 people this year. McCormick Place in Chicago, Illinois (where the conference is held) is so big it feels like an airport.


Love Chicago. Great city. 


Of course, I had the pleasure of attending a bunch of great presentations and today I will introduce you to one of them. Tina Binesh Marvasti (say that 7 times fast!) presented on the topic of Haptoglobin. No, not Hobgoblin (not sure who that is? See here) or his infamous green predecessor (see here). 


So, what is haptoglobin, you ask? It is a serum protein that binds free hemoglobin – resulting from the breakdown of red blood cells – and functions to prevent loss of iron (contained in the heme group) through the kidneys and to protect tissues from the highly reactive heme groups. It is essentially a housekeeping protein that helps recycle hemoglobin as part of the red blood cell life cycle. Now, what if your ability to clean up free hemoglobin were impaired? Well, quite simply, you would be putting at risk those sensitive tissues that come into contact with free hemoglobin.




One important example of this is vessel walls affected by atheroma (AKA plaque). Sometimes these atheromas can bleed (called intraplaque hemorrhage, or IPH), which worsens the whole situation. Typically, your body responds by sending in the clean-up crew, including the Hobgoblin (or haptoglobin – I always get these two confused).



When people have the recessive genotype (Hp 2-2) of the Hp gene, they produce less haptoglobin and are therefore at increased risk of damage from free hemoglobin (or, more specifically, the heme groups).



Tina and friends hypothesized the following: 



And she found that having the recessive Hp 2-2 genotype was associated with a higher prevalence of IPH in a group of 80 patients (average age of 73 yrs). She also found that the IPH volume of Hp 2-2 patients worsened over time.



So what is the take-home? The haptoglobin genotype is associated with IPH, which is a biomarker of high-risk vascular disease, and could identify populations at higher risk of developing cardiovascular events.

 

Now for the fun part (see the rules here), using Haptoglobin in a sentence by the end of the day:

Serious: Hey Bob, did you know that a recessive haptoglobin genotype may contribute to an increased risk of cerebrovascular disease?

Less serious: My GP suggested that based on my recessive hobgoblin genotype I should consider a healthier lifestyle. Funny, I always figured Doc Ock to be the one to watch for…

OK, watch the Spider-man 2 trailer to decompress and I’ll see you in the blogosphere…

 
Pascal Tyrrell