Ok, so you agree to dress your little friend before sending him/her out into the cold world of publication. But what is a p-value anyway? I realize that I am jumping the gun (pun intended) a little as it forces us to talk about inferential statistics – a challenging topic. So today I will only give you a small taste of what is to come. First, to get you in a good mood I want you to watch the trailer for the first of three hilarious Naked Gun movies.

We have already talked about research questions and today I would like to introduce you to their children the research hypotheses. Essentially they are a version of their parents that summarize the main elements of a study – sample, predictor and outcome variables – in such a way that you are able to perform a test of statistical significance. These hypotheses are not required for descriptive studies like the ones we have been discussing in our blog so far. For instance if we were to ask how many people who read this blog enjoyed the Naked Gun series of movies we would end up with a proportion. We could then simply describe our findings as discussed in my Ogive post.

But what if you wanted to know whether the proportion of gals differed from the proportion of guys who enjoyed the movies, as you suspect that the type of humor will please guys more than gals? As we are research scientists we would want to test this “hypothesis” in order to compare the findings among the groups: this is a test of statistical significance. The brilliant statistician Ronald Fisher championed this approach. Only a single hypothesis is required: the null hypothesis. It simply states that no association of interest exists. So in this case the null hypothesis states that, in the population of blog readers, whether you are a gal or a guy is not associated with whether you like the Naked Gun movies.

Break! Listen to the music of P-value Diddy (he has so many names already I thought it ok to add one more) with Jimmy Page from the Godzilla soundtrack.

Welcome back. So the null hypothesis is always assumed to be true until a statistical test gives you enough evidence to reject it. When you analyze your data and perform the test you will determine the probability of seeing an effect as big or bigger than that in your study by chance alone if the null hypothesis were true. That probability is the p-value. You would reject the null hypothesis if the p-value is less than a predetermined level of significance – typically 5% or 1 in 20.
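Want to see a p-value being born? Here is a minimal sketch using Fisher's exact test on a 2x2 table. The counts are made up for illustration (30 of 40 guys and 20 of 40 gals liking the movies), and the function name `fisher_exact_two_sided` is my own invention, not part of any library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (guys, gals); columns are liked / did not like.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Probability of x "liked" in row 1 when all margins are held fixed
        # (hypergeometric distribution), i.e. assuming the null hypothesis.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    probabilities = [table_probability(x) for x in range(smallest, largest + 1)]
    # Add up every table that is at least as unlikely as the one we observed.
    # The tiny tolerance guards against floating-point ties.
    return sum(p for p in probabilities if p <= observed * (1 + 1e-9))


# Hypothetical counts, for illustration only.
p_value = fisher_exact_two_sided(30, 10, 20, 20)
print(f"p-value = {p_value:.4f}")
print("Reject the null at the 5% level?", p_value < 0.05)
```

Fisher's exact test is only one of several tests you could run on a table like this. I picked it here because it needs nothing beyond the Python standard library.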

So what is a naked p-value? It is simply a p-value obtained from the statistical test you performed on the data from your study reported WITHOUT an effect size, its sign and precision. The effect size is simply an estimate of the size of the association that you are studying – for example, 25% more guys liked the movies as compared to the gals. The sign is simply the direction of the observed difference (are you comparing gals to guys or the other way around) and the precision is an estimate of how confident you are – generally reported as a confidence interval which we will talk about in a later post.

So what is the bottom line? **In order to keep your p-value warm you need to report it with the measure of the size of the association (effect size) and how confident you are about your answer.**
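As a small preview of what a dressed p-value travels with, here is a sketch that computes a difference in proportions (the effect size, with its sign) and an approximate 95% confidence interval (the precision). The counts are hypothetical, the function name `difference_in_proportions` is mine, and the simple Wald interval used here is just one common choice among several.

```python
from math import sqrt


def difference_in_proportions(liked_1, n_1, liked_2, n_2, z=1.96):
    """Return (difference, lower, upper) for group 1 minus group 2.

    Uses the simple Wald interval; z=1.96 gives roughly 95% confidence.
    """
    p1 = liked_1 / n_1
    p2 = liked_2 / n_2
    diff = p1 - p2  # the sign tells you which group is higher
    standard_error = sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    return diff, diff - z * standard_error, diff + z * standard_error


# Hypothetical counts: 30 of 40 guys and 20 of 40 gals liked the movies.
diff, lower, upper = difference_in_proportions(30, 40, 20, 40)
print(f"guys minus gals: {diff:+.2f} (95% CI {lower:+.3f} to {upper:+.3f})")
```

Swap the two groups around and the difference changes sign, which is exactly why the direction of the comparison has to be reported.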

In a subsequent post we will talk about another similar approach, Neyman-Pearson hypothesis testing, which involves two competing hypotheses (the null and the alternate hypotheses). This approach is deductive as opposed to Fisher’s inductive statistical testing approach. Both approaches are valid. It is simply a matter of determining which is more appropriate in a given situation.

See you in the blogosphere,

Pascal Tyrrell

## Are You My Type, Data?

So you have come up with a research question and now you must choose a method by which your responses will be obtained. For example, a question like ‘Are you a Trekky?’ leads to a simple yes/no answer. So, are you? No need to fess up. I understand. Don’t know what I am talking about? See the trailer for my favorite of the Star Trek movies: The Wrath of Khan Trailer.

What if you were to ask, ‘How much of a Trekky are you?’. You are no longer able to use a simple two-category response but one that uses a continuous scale.

An important distinction to remember when dealing with responses in research is that in general some will be categorical, such as favorite TV series, race, or marital status, and others continuous variables like blood pressure, cholesterol levels, or how much you enjoy Star Trek shows on a scale of 1 to 10 recorded on a 100 mm line. For those of you who would score high here listen to Santana – You are my kind as a reward.

This brings us to the important concept of the *level of measurement*. If you are working with named categories – race for example – then you have a nominal variable. Categories that have an order to them – education level for example – are ordinal variables. What if the interval between your responses is fixed and known? Then you have an interval variable – temperature in Celsius or Fahrenheit is a good example. However, is zero degrees Celsius the same as zero degrees Fahrenheit? No. The latter is much colder! The zero point on these scales is arbitrary. Now what if you are working in Kelvin which has a meaningful zero point? Then it is a ratio variable.
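To see why the zero point matters, here is a quick sketch with two made-up temperatures. Differences (intervals) behave consistently on every scale, but "twice as hot" only means something on a scale with a true zero. The helper names are mine.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    return celsius + 273.15


# Two hypothetical readings, in degrees Celsius.
cool_c, warm_c = 10.0, 20.0

ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# The same two temperatures give three different "ratios".
# Only the Kelvin one is physically meaningful.
print(f"Celsius ratio:    {ratio_celsius:.3f}")
print(f"Fahrenheit ratio: {ratio_fahrenheit:.3f}")
print(f"Kelvin ratio:     {ratio_kelvin:.3f}")
```

Is 20 degrees Celsius twice as hot as 10? Celsius says yes, Fahrenheit says no, and Kelvin says "barely warmer". That disagreement is the signature of an interval variable.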

Ok, so why the big deal? The important difference is between nominal/ordinal data and interval/ratio data. The latter two can be used in what is termed “parametric statistics”, which gives us measures of center (mean) and spread (standard deviation). We have already touched on this in previous posts. See here: Great Expectations. It makes no sense to talk about the average sex of a sample of students in your study. These data must be considered as frequencies in separate categories. We previously talked about this a little here: Ogive. This type of data leads to “non-parametric” analysis.
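In practice the distinction looks something like the sketch below: a mean and standard deviation for a continuous variable, and frequencies and proportions for a categorical one. All values are invented for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical sample of six students.
blood_pressure = [118, 125, 132, 121, 140, 128]  # continuous (ratio) variable
sex = ["F", "M", "F", "F", "M", "F"]             # categorical (nominal) variable

# Continuous data: measures of center and spread make sense.
bp_mean = mean(blood_pressure)
bp_sd = stdev(blood_pressure)  # sample standard deviation
print(f"blood pressure: mean = {bp_mean:.1f}, sd = {bp_sd:.1f}")

# Categorical data: count how many fall in each category instead.
counts = Counter(sex)
proportions = {category: n / len(sex) for category, n in counts.items()}
print("frequencies:", dict(counts))
print("proportions:", proportions)
```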

Enough already! I’ll let you get back to streaming Star Trek re-runs…

Next time let’s talk a little about parametric statistics and how they came to be. I’ll leave you with this quote as a teaser from one of the greatest statisticians to ever walk the earth – Ronald Fisher: **“The analysis of variance is not a mathematical theorem, but rather a convenient method of arranging the arithmetic.”**

Pascal Tyrrell

## Pick me! Pick me! Pick me!

*Meatballs*

that you have committed to?

But why would you do that? Let me tell you why. First listen to CeeLo Green to get pumped (yes, it is about firefighters but just pretend he is singing about volunteer scientists…).

**Create opportunities for yourself by volunteering**. You will be glad you did.

## Ogive? What the what? Oh, “jive”… right!

Ahhh, the 80’s. Interesting years to be in high school. I think I never quite fully recovered. I don’t wear corduroy pants anymore but the acid wash jean jacket… maybe. Not sure what I am talking about? Have a peek here: 80’s-fashion.

So in my last post we talked about the concept of expectation (see Great-expectations) and the importance of organizing our data. Ask me what I think is the most important step to understanding your data? Organizing and graphing it – always. It is such a simple thing to do and yet it gives you crazy perspective and insight for any analysis that may follow.

**The concept of a frequency distribution in statistics is paramount.** By organizing your data values into an appropriate number of classes we in fact make more explicit the information that is there in the data. The resulting frequency table can then provide us with some basic summary statistics such as class frequencies and proportions. By the way, classes have end marks. The upper and the lower. The average of these two marks is the mid-point and the interval is the difference between adjacent class mid-points. Lastly, the class mid-point plus or minus half the interval gives you the class boundaries… Boring? Maybe you need a break. Watch the trailer for the epic 1980 movie *Airplane!* to decompress a little: Airplane! movie trailer…
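Back from the break? Here is a small sketch of such a frequency table, following the definitions above for mid-points, interval and class boundaries. The readings and the class end marks are made up for illustration.

```python
# Hypothetical readings and class end marks (lower, upper).
readings = [12, 15, 21, 22, 25, 27, 28, 31, 33, 38, 19, 24]
classes = [(10, 19), (20, 29), (30, 39)]

# Mid-point: the average of the two end marks of a class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
# Interval: the difference between adjacent class mid-points.
interval = midpoints[1] - midpoints[0]

table = []
for (lower, upper), midpoint in zip(classes, midpoints):
    frequency = sum(lower <= r <= upper for r in readings)
    table.append({
        "class": f"{lower}-{upper}",
        "midpoint": midpoint,
        # Boundaries: mid-point plus or minus half the interval.
        "boundaries": (midpoint - interval / 2, midpoint + interval / 2),
        "frequency": frequency,
        "proportion": frequency / len(readings),
    })

for row in table:
    print(row)
```

Notice that the class boundaries (9.5, 19.5, 29.5, 39.5) sit halfway between the end marks of neighbouring classes, so every reading belongs to exactly one class.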

So what now? We need to present this data graphically. The first chart to think of is the bar chart. It is simply a plot of the frequency against class, where the class frequencies are represented by bars. Classes in this case are made up of SINGLE readings. How about an example using radiation counts?

If your classes are made up of a GROUP of readings then you would consider a histogram as in this example using velocity of light measurements.

Now if you were to join the mid-point of each class by a straight line you would obtain a frequency polygon. This would allow you to easily compare several distributions on a single graph.

Finally, if you were to plot the CUMULATIVE frequency against the upper class boundary you would produce a cumulative frequency polygon – AKA the “ogive” as it has the characteristic arch-like shape found in architecture.
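Here is a minimal sketch of the points you would plot for an ogive, using hypothetical class boundaries and frequencies. Starting the curve at zero on the lowest class boundary is a common convention that I am assuming here.

```python
from itertools import accumulate

# Hypothetical grouped data: three classes with these boundaries and counts.
lowest_boundary = 9.5
upper_boundaries = [19.5, 29.5, 39.5]
frequencies = [3, 6, 3]

# Running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Each point pairs an upper class boundary with the count of readings below it.
ogive_points = [(lowest_boundary, 0)] + list(zip(upper_boundaries, cumulative))

for boundary, count in ogive_points:
    print(f"readings below {boundary}: {count}")
```

Join these points with straight lines and you have your cumulative frequency polygon. The last point always lands on the total number of readings.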

If you ever find yourself using the term *ogive* in a public setting and getting blank stares from your friends then refer to the funny “jive” scene in the infamous movie *Airplane!* to defuse the situation: Airplane! – Jive Scene.

Hopefully, everyone will say: “Oh, *jive*. I get it!”…

Let’s talk a little about data types next time. Ok?

See you in the blogosphere…

Pascal Tyrrell

## Great Expectations and What the Dickens is Probability Distribution Anyway?

If you are feeling like Pip in Charles Dickens’ wonderful novel Great Expectations every time you think of statistics, you are not alone! Not sure who Pip is? Have a peek at the latest of many movies based on this book: Great Expectations trailer

Pip started life in a poor community raised by a much older cruel sister. He did, however, grow up to be a gentleman (and a scholar?) and come to realize that our great expectations in life won’t necessarily come true. We instead work hard all of our lives and ultimately have to accept what is. Getting too serious? Have a gander at Diggy Simmons music video “Great Expectations” to relax a bit: Diggy Simmons music video

Ok we’re back. So what is the link between Pip and statistics?

As a researcher we are often interested in “what to expect” in future experiments or trials. The methodology used to perform the research and analysis of results will help to obtain an estimate of the answer to your question – see my previous post if you are in the dark about this one (Allegory of the cave).

In statistics the term “expectation” is given a precise definition in terms of probabilities (the chance that something will happen – how likely is it that some event will happen). Thus, if we consider an experiment or trial as taking a variable x at random from some population of readings and recording its value then the value to expect for x is the mean µ of this population.

Here is the rub: the population mean is usually a quantity whose value we can NEVER determine exactly – it is the value to EXPECT. This is a VERY important concept in statistics.

*** Caution: stats talk below – skip if already feeling dizzy…

When we make predictions about future trials we have to keep in mind that we are working with a sample of results that will necessarily have a measure of uncertainty associated with them. By organizing our data into frequency tables we can then present its distribution graphically (e.g., a frequency curve or histogram) and get our first appreciation of where the center is (mean, median, mode) and how much scatter there is (variance and standard deviation). Finally, if we convert our frequency distributions to probability distributions (divide each class frequency by the sum of frequencies) we can obtain expected values from these distributions. Plural? There are different types? Yes, and we will chat about these in future posts…
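For the brave who kept reading, here is that last step as a small sketch: turn class frequencies into probabilities, then weight each class mid-point by its probability to get an expected value. The mid-points and frequencies are hypothetical.

```python
# Hypothetical grouped data: class mid-points and their frequencies.
midpoints = [14.5, 24.5, 34.5]
frequencies = [3, 6, 3]

# Probability distribution: each class frequency divided by the sum of frequencies.
total = sum(frequencies)
probabilities = [f / total for f in frequencies]

# Expected value: each mid-point weighted by its probability.
expected_value = sum(m * p for m, p in zip(midpoints, probabilities))

print("probabilities:", probabilities)
print("expected value:", expected_value)
```

Keep in mind this is an estimate from a sample, using mid-points as stand-ins for the readings in each class. The true population mean stays out of reach.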

*** Safe re-entry here:

I am ok with having to work with estimates and never knowing the truth. You? As Socrates once said (a long, long time ago!): **“The only true wisdom is in knowing you know nothing.”**

So what next? Maybe watch the movie “Great Expectations” this week-end and tell everyone that you were studying for your stats class. Let’s talk about organizing data next.

Enjoy the movie.

Pascal Tyrrell

## The Truth? You Can’t Handle the Truth!

In “A Few Good Men” Jack Nicholson growls “You can’t handle the truth” to Tom Cruise in his Academy Award-nominated performance. Watch a clip of his gritty performance: A few good men. Our pursuit of the truth leads to an interesting path indeed.

This series of posts aims to help you develop a scientific “sense”. Have a quick peek at my other posts (http://mivip-utoronto.blogspot.ca/) if you haven’t already and come back. So wanting to know the truth is something we all strive for on a daily basis. Finding the truth is another matter altogether and this philosophical conundrum has challenged many great minds for centuries.

The Roman Emperor Marcus Aurelius once stated many, many years ago: **“Everything we hear is an opinion, not a fact. Everything we see is a perspective, not the truth”**. Have a quick peek at the trailer for “Gladiator” to put you in the mood. Gladiator

Now Greek philosopher Plato, who predated Marcus a few centuries, got the ball rolling when he presented his *Allegory of the Cave*, in which he symbolically described his belief that the world revealed by our senses is not the real world but only a poor copy of it, and that the real world can only be apprehended intellectually. Plato used an analogy where we are represented as a gathering of people who live chained to the wall of a cave all of our lives, facing a blank wall. We watch shadows projected on the wall by things passing in front of a fire behind them, and begin to designate names to these shadows. The shadows are as close as we get to viewing reality.

**“Variety’s the very spice of life, That gives it all its flavor”**.

## To be, or not to be: what is in a research question?

**“The voyage of discovery lies not in seeking new horizons, but in seeing with new eyes.”**

Maybe by asking the right questions we can inch ever so slowly towards the truth that lies right in front of our own eyes! So take a fresh look at what and how you do all things scientific.

**P**atient, Population, Problem

**I**ntervention

**C**omparison (optional. PIO when absent!)

**O**utcome

- Is MR angiography more effective than a Doppler carotid ultrasound in diagnosing and describing carotid artery disease in obese middle-aged males and females?

or PIO – For a patient with (Problem), does (Intervention) affect (Outcome)?

- Is MR angiography effective in diagnosing and describing carotid artery disease in obese middle-aged males and females?

For **quantitative** statistical analysis you will want your question to be answerable by yes/no or a number. For **qualitative** analysis your question will typically start with: What is/are…?

## So you want to be a researcher? Get a pocket protector…

You are a student who wants to pad the resume with extracurricular activities – maybe thinking of a career in healthcare. What could you do? You’ve always heard of your “brainy” friends getting into research. But is it for you….

Faith’s post “Research Behind Research” from yesterday gives us a glimpse into how research may be less of a scary thing than most people think. Go have a quick read and come back.

Ok, why the pocket protector? Because it’s a start. “**Fake it until you become it**” as Amy Cuddy from Harvard University states in her awesome TED Talk (Cuddy TED talk).

So here is what I suggest to get started on your new research persona:

1- Buy, borrow, or make a pocket protector. Maybe get a shirt with a pocket too.

2- Set aside one hour a week to wear your shirt and pocket protector.

3- Find somewhere quiet but inviting with as few distractions as possible for your new activity.

4- Listen to John James “I wanna know” to get you motivated (I Wanna Know).

Now for the interesting part – how do I do research?

Research is a structured approach to discovery. You need to organize your thoughts and your methods – always. Use your time to figure out what methods work best for you. What are your preferred search engines? Do you always use Google? Do you Bing every now and again? How do you record your ideas, findings, links, articles, etc.? How often are you successful at finding the answer? Do you keep track of what you did when you were?

Though being organized will take you a long way, the most important component of research is the question that you are asking. Not as easy as you may think. As the well respected French anthropologist Claude Levi-Strauss suggested: “**The scientist is not a person who gives the right answers, he’s one who asks the right questions**.”

Start thinking of and asking questions – all the time. Take the time to answer some of them during your research hour. Do you find the way your question is structured helps in finding an answer? How about if your question is answerable by a yes/no? A number (average height for example)? Any easier than if your question starts with “What are…”?

Stay tuned as we will address all of these interesting challenges in this blog…