Breaking Up Is Hard to Do

Last week I met with Helen, a clinical investigator program radiology resident from our department, about her research (shout out to Dr Laurent Milot’s research group). When discussing predictors and outcomes for her retrospective study, it was suggested that some continuous variables be broken up into levels or categories based on given cut-points. This practice is often encountered in the world of medical research. The main reason? People in the medical community find it easier to understand results that are expressed as proportions, odds ratios, or relative risks. When working with continuous variables we end up talking about parameter estimates, beta weights, and such, which is not as “reader friendly”.


Unfortunately, as Neil Sedaka sang in his famous song Breaking Up Is Hard to Do, breaking up is costly: by breaking up continuous variables you pay a stiff penalty in both your ability to describe the relationship that you are interested in and the sample size your study requires (see loss of power).


You are now a newly minted research scientist (need a refresher? See Pocket Protector) and are interested in discovering relationships among variables or between predictors and outcomes. The more accurate your findings, the better the description of the relationships and the better the interpretations and conclusions you can make. The bottom line is that dichotomizing or categorizing a continuous measure will result in loss of information. Essentially, the “signal”, which is the information captured by your measure, will be reduced by categorization. Therefore, when you perform a statistical test that compares this signal to the “noise” or error of the model (observed differences between your patients, for example), you will find yourself at a disadvantage (loss of power). David Streiner (great author and great guy!) gives a more complete explanation in one of his papers.
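
To make the loss of power concrete, here is a minimal simulation sketch in Python. It is only an illustration under assumptions of my own choosing, not anything from Helen's study: a normally distributed predictor, a true correlation of 0.4 with the outcome, 50 patients per study, a median split as the cut-point, and an approximate significance test based on Fisher's z transform. The function names are made up for this example.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.96):
    """Approximate two-sided test at alpha = 0.05 via Fisher's z transform."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def simulate_power(n=50, rho=0.4, reps=2000, seed=1):
    """Estimate power with the predictor kept continuous vs. median-split.

    n, rho, reps and seed are illustrative assumptions, not study values.
    """
    rng = random.Random(seed)
    hits_continuous = 0
    hits_split = 0
    for _ in range(reps):
        # Simulate one study: x is the predictor, y the correlated outcome.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]

        # Break up the predictor at its median: "high" = 1, "low" = 0.
        cut = statistics.median(x)
        x_split = [1.0 if xi > cut else 0.0 for xi in x]

        hits_continuous += is_significant(pearson_r(x, y), n)
        hits_split += is_significant(pearson_r(x_split, y), n)
    return hits_continuous / reps, hits_split / reps


power_continuous, power_split = simulate_power()
print(f"Power, continuous predictor:   {power_continuous:.2f}")
print(f"Power, median-split predictor: {power_split:.2f}")
```

Both analyses see exactly the same simulated patients; the only difference is whether the predictor is kept as measured or reduced to "high" versus "low". Under these assumptions the median-split version should detect the relationship noticeably less often, which is the penalty described above. Try changing n or rho to see how the gap moves.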


Now, as we see in the funny movie with Vince Vaughn and Jennifer Aniston, The Break Up, there are times when categorization may make sense. For example, when the variable you are considering is not normally distributed (see Are You My Type?) or when the relationship that you are studying is not linear. We will talk about these situations in a later post.


Don’t forget: you will get further ahead if you keep your variables as continuous data whenever possible.




See you in the blogosphere,




Pascal Tyrrell