What’s the best way to begin the analysis of data so that you don’t get bogged down and you solve the problem as efficiently as possible?
The truth is that it takes discipline, but the method itself is quite easy.
This question relates directly to an important activity in Six Sigma project work that few people do well without a little guidance: data analysis.
My answer to this is that a practitioner will do a brilliant job if they approach the work like a detective.
A good detective doesn’t jump into an investigation without some preparation. They begin with their suspicions, then they investigate and interview to answer specific questions that will either confirm or rule out those suspicions. They would never interview a suspect without first knowing what they need to ask.
You need to do the same thing. The following describes how I go about doing this work quickly, effectively, and without unnecessary stress.
The goal here is to bring all of your Xs and Ys into the mix, so you begin by grabbing your data collection plan, identifying all of the variables you’ve included in the plan, and then listing everything you or the team suspect about those variables.
For example, let’s assume we are working on a problem in a café we own: some of our coffees are being served too cold. The Y in this case is ‘temperature’.
At this point we’ve collected data that includes the variables listed below for each cup of coffee:
- Temperature of coffee (the Y)
- Who made the coffee
- When it was made
- Type of coffee ordered
- Cup size
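To make the later steps concrete, here is a minimal sketch of how one row of this data might be represented. The field names, the 60 °C cut-off for ‘too cold’, and the noon split between AM and PM are all my assumptions for illustration; the data collection plan itself should define them.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed cut-off for a 'too cold' coffee; not a value taken from the plan.
TOO_COLD_BELOW_C = 60.0


@dataclass(frozen=True)
class CoffeeObservation:
    """One cup of coffee, with the Y and the Xs from the list above."""
    temperature_c: float   # temperature of coffee (the Y)
    barista: str           # who made the coffee
    made_at: datetime      # when it was made
    coffee_type: str       # type of coffee ordered
    cup_size: str          # cup size

    @property
    def is_defect(self) -> bool:
        # A defect is any cup below the assumed cut-off.
        return self.temperature_c < TOO_COLD_BELOW_C

    @property
    def period(self) -> str:
        # Assumed split: anything before noon counts as AM.
        return "AM" if self.made_at.hour < 12 else "PM"


# Made-up example row.
obs = CoffeeObservation(57.5, "Alex", datetime(2024, 3, 4, 8, 15), "latte", "small")
print(obs.is_defect, obs.period)
```

Deriving the defect flag and the AM/PM period from the raw values keeps the collected data unchanged while still giving each suspicion something to group by.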
I might then have the following suspicions about these variables and their relationships with the Y, or in some cases, each other.
A. Some of our baristas deliver more defective coffees, temperature-wise, than others
B. More of our ‘too cold’ coffee defects are produced in the busy times in the morning
C. The smaller cup coffees cool down more quickly than the larger ones
D. Milk-based coffees might not hold heat as long as the water-based coffees
These suspicions form the basis for identifying the questions we need to answer in our analysis.
All we do is look at our suspicions, and then think about them as questions.
You’ll notice that the questions begin to bring parameters into play such as proportions and averages.
A1. Do some baristas produce a higher proportion of defective coffees than others?
A2. Does any particular barista make coffees that are on average a lower temperature than the others?
B1. Do AM periods produce more defective coffees than other times of the day?
B2. Does the AM period on average produce coffee that is a lower temperature than other times of the day?
C1. Are the majority of our ‘defective’ coffees delivered in small cups?
C2. Is the average temperature of coffee when delivered in smaller cups less than for larger cups?
D1. Are the majority of our ‘defective’ coffees milk-based (e.g. latte, flat white)?
D2. Is the average temperature of milk-based coffees less than that of water-based coffees?
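The two parameters behind these questions, a proportion of defects and an average temperature, can be computed per group in a few lines. This is only a sketch: the temperatures are made up and the 60 °C cut-off is an assumption, not a figure from our café.

```python
from collections import defaultdict
from statistics import mean

TOO_COLD_BELOW_C = 60.0  # assumed cut-off for a 'too cold' coffee

# Made-up (cup size, temperature in °C) pairs for illustration only.
rows = [
    ("small", 58.0), ("small", 61.0), ("small", 57.0),
    ("large", 64.0), ("large", 59.0), ("large", 66.0),
]


def summarise(rows, threshold=TOO_COLD_BELOW_C):
    """Return the defect rate and mean temperature for each group."""
    by_group = defaultdict(list)
    for group, temperature in rows:
        by_group[group].append(temperature)
    return {
        group: {
            "n": len(temps),
            "defect_rate": sum(t < threshold for t in temps) / len(temps),
            "mean_temp": mean(temps),
        }
        for group, temps in by_group.items()
    }


summary = summarise(rows)
for group, stats in summary.items():
    print(group, stats)
```

The defect rate feeds the ‘1’ questions (A1, B1, C1, D1) and the mean temperature feeds the ‘2’ questions (A2, B2, C2, D2).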
Now we think about how we would test the data so that we can get definitive answers to our specific questions.
Notice that I’ve indicated how I would display the data as well as which statistical test I would use to obtain an answer.
A1, B1, C1, D1 – Compare defect rates for each individual factor (baristas, AM versus PM, small versus large cups, milk-based versus water-based coffees) – display in pie charts / analyse using a proportion test
A2 – Compare temperature values for individual baristas – display in box plots / analyse using one-way ANOVA
B2, C2, D2 – Compare average temperatures for each individual factor (AM versus PM, small versus large cups, milk-based versus water-based coffees) – display in box plots / analyse using two-sample t-tests
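For readers who want to see what sits behind two of these tests, here is a rough sketch using only the Python standard library. The counts and temperatures are made up. The proportion test is the two-group, pooled z-test version, so comparing more than two baristas at once would need something like a chi-square test instead. For the t-test I have used Welch's unequal-variance form, which is one common variant of the two-sample t-test. It returns only the statistic and its degrees of freedom, because the standard library has no t-distribution to turn them into a p-value; a statistics package would normally do that part.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_proportion_z_test(defects_a, n_a, defects_b, n_b):
    """Two-sided z-test for equal defect proportions (pooled estimate)."""
    p_a, p_b = defects_a / n_a, defects_b / n_b
    pooled = (defects_a + defects_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    va, vb = variance(sample_a) / n_a, variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# B1 (made-up counts): 18 defects in 60 AM cups versus 9 in 60 PM cups.
z, p_value = two_proportion_z_test(18, 60, 9, 60)
print(f"z = {z:.2f}, p = {p_value:.3f}")

# C2 (made-up temperatures in °C): small cups versus large cups.
small = [57.0, 58.5, 59.0, 60.5]
large = [62.0, 63.5, 61.0, 64.5]
t, df = welch_t(small, large)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here simply means the first sample (small cups) has the lower average temperature.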
All you do now is answer the questions you wrote down by carrying out the displays and tests you planned for them.
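As one instance of carrying out the plan, question A2 calls for a one-way ANOVA across baristas. The sketch below computes the F statistic by hand so the logic is visible. The barista names and temperatures are made up, and converting F into a p-value needs an F-distribution, which I have left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual cups around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up temperatures (°C) per hypothetical barista.
temps_by_barista = {
    "Alex": [61.0, 62.0, 63.0],
    "Sam": [58.0, 59.0, 60.0],
    "Jo": [64.0, 65.0, 66.0],
}
f, df_between, df_within = one_way_anova_f(list(temps_by_barista.values()))
print(f"F = {f:.1f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F says the baristas' averages differ by more than the cup-to-cup variation would explain, which is exactly what A2 asks.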
The golden rule is simply this:
Never begin any analysis without first knowing the question you are trying to answer!
That’s it, simple and effective, and extremely important if you are to avoid paralysis by analysis.