Why We Plan Data Collection the Way We Do

2 Jun 2020
The Data Worksheet Explains Why We Plan Data Collection the Way We Do

The end result of any data collection activity, assuming we do it correctly, is a consolidated data set that we can study and analyse to find relationships between variables.

The layout of that data set must match the way analysis packages such as SigmaXL and Minitab view the data; otherwise no analysis can be undertaken.

Those packages will simply not be able to recognise the different data types in the worksheet.

Matching the needs of those packages is quite simple: use the top row to name the variables in the data collection, i.e. the names of the Y and the Xs, and then place the data directly underneath those headings.

An effectively laid out worksheet looks like this.

The Y variable (your primary metric) is in column A.

All of the different Xs are contained in columns B to I inclusive and are a mix of numerical and categorical variables.
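
As a rough illustration of that layout, the sketch below reads a small worksheet whose first row names the variables and whose first column holds the Y. The column names and values are invented for the example, and it uses fewer Xs than the columns B to I described above; it is a minimal sketch of the header-row convention, not the worksheet from this article.

```python
import csv
import io

# Hypothetical worksheet: names and values are invented for illustration.
# Row 1 names the variables; the Y (primary metric) is the first column.
WORKSHEET = """\
cycle_time,queue_length,staff_on_duty,shift,branch
10.0,2,5,Day,North
12.0,4,4,Day,South
14.0,6,3,Night,North
16.0,7,3,Night,South
"""

def load_worksheet(text):
    """Return the column headings and the data rows (one dict per row)."""
    reader = csv.DictReader(io.StringIO(text))
    return reader.fieldnames, list(reader)

headers, rows = load_worksheet(WORKSHEET)
y_name, x_names = headers[0], headers[1:]  # Y first, then the Xs

print("Y:", y_name)
print("Xs:", x_names)
print("Rows of data:", len(rows))  # values are still strings at this point
```

Here `queue_length` and `staff_on_duty` stand in for numerical Xs, while `shift` and `branch` stand in for categorical Xs.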


The data collection plan that matches this assembly of data looks like this.

You'll notice the categorical Xs are listed as stratification variables because that's exactly what we will do to study their relationship with the primary metric: stratify the data and compare results from each grouping.
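
Stratifying by a categorical X can be sketched in a few lines. The example below groups a hypothetical primary metric (`cycle_time`) by a hypothetical categorical X (`shift`) and compares the group means. The names, the data, and the choice of the mean as the comparison statistic are assumptions made for illustration; a package such as SigmaXL or Minitab would normally do this step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one primary-metric value and one categorical X per row.
rows = [
    {"cycle_time": 10.0, "shift": "Day"},
    {"cycle_time": 12.0, "shift": "Day"},
    {"cycle_time": 14.0, "shift": "Night"},
    {"cycle_time": 16.0, "shift": "Night"},
]

def stratify(rows, y, x):
    """Group the Y values by each level of the categorical X and average them."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[x]].append(row[y])
    return {level: mean(values) for level, values in groups.items()}

means = stratify(rows, "cycle_time", "shift")
for level, value in means.items():
    print(level, value)
```

A visible gap between the group means is a prompt for further study, not proof of a relationship on its own.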

The sampling plan guides us in the number of rows of data (i.e. data points) we collect and assemble in the data file.

Numerical variables are listed as secondary metrics, which we study differently from categorical variables.

In most cases a correlation analysis is the primary strategy for looking at their relationship with the primary metric.
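
For readers who want to see what a correlation analysis computes, here is a minimal sketch of the Pearson correlation coefficient between one numerical X and the primary metric. The article does not say which correlation measure is used, so Pearson's r is an assumption here, and the column names and values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length columns with at least two rows")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a column with no variation has no defined correlation")
    return sxy / sqrt(sxx * syy)

# Hypothetical columns taken from the same rows of a worksheet.
queue_length = [2, 4, 6, 7]            # numerical X (secondary metric)
cycle_time = [10.0, 12.0, 14.0, 16.0]  # Y (primary metric)

r = pearson_r(queue_length, cycle_time)
print(round(r, 3))
```

The calculation only makes sense because both columns come from the same rows, so each X value is paired with the Y value collected at the same time.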


The key points are these:

(A) Our data collection (DC) plan is there to help us design the elements of the data worksheet.

(B) The list of variables in the DC plan - the primary metric, the categorical Xs and the numerical Xs - determines the column headings in the data worksheet.

(C) Every time we collect a data point for the primary metric, we also collect one data point for every other variable.

(D) The sampling plan guides us in how we collect the data and how many rows we collect.

(E) Because there can be a lot of variation in how people collect the numerical variables, we need to operationally define what those variables are and how they must be collected.

(F) Categorical variables don't need the same level of definition as the numerical ones, because they are observed data, which makes it easy for data collectors to be consistent in what they record.
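
Point (C) lends itself to a simple check before analysis: every row should hold a value for every column heading. The sketch below flags worksheet rows that are missing a value. The headings, the data, and the helper name `incomplete_rows` are hypothetical.

```python
# Hypothetical column headings taken from a data collection plan.
headers = ["cycle_time", "queue_length", "shift"]

# Hypothetical data rows as read from a worksheet (values still strings).
rows = [
    {"cycle_time": "10.0", "queue_length": "2", "shift": "Day"},
    {"cycle_time": "12.0", "queue_length": "", "shift": "Day"},
    {"cycle_time": "14.0", "queue_length": "6", "shift": "Night"},
    {"cycle_time": "16.0", "queue_length": "7"},
]

def incomplete_rows(headers, rows):
    """Return worksheet row numbers that lack a value for any variable."""
    bad = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header row
        if any(row.get(name) in (None, "") for name in headers):
            bad.append(number)
    return bad

missing = incomplete_rows(headers, rows)
print("Rows to revisit:", missing)
```

Rows flagged this way would need to be completed or excluded before stratifying or correlating, since both analyses pair each X value with the Y from the same row.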

For more information, check the data collection planning section in Process Mastery with Lean Six Sigma 2nd Edition.
