Expanding Our Thinking

Why We Plan Data Collection the Way We Do

2 Jun 2020

The Data Worksheet Explains Why We Plan Data Collection the Way We Do

THE END RESULT

The end result of any data collection activity, assuming we do it correctly, is a consolidation of data that we can study and analyse to find relationships between variables.

The layout of that data collection must match the way analysis packages such as SigmaXL and Minitab view the data; otherwise no analysis can be undertaken.

Those packages will simply not be able to recognise the different data types in the worksheet.

Matching the needs of those packages is quite simple: use the top row to name the variables in the data collection, i.e. the names of the Y and the Xs, and then place the data directly underneath those headings.

An effectively laid out worksheet looks like this.

The Y variable (your primary metric) is in column A.

All of the different Xs are contained in columns B to I inclusive and are a mix of numerical and categorical variables.
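As a rough illustration of that layout, here is a minimal Python sketch that uses only the standard library. The column names and values are invented for the example and are not taken from the worksheet described above. The top row names the variables, the data sits directly underneath, and each column is then classed as numerical or categorical depending on whether every cell parses as a number.

```python
import csv
import io

# Hypothetical worksheet: the first row names the variables (Y first, then the Xs)
# and the data sits directly underneath. All names and values are invented.
WORKSHEET = """cycle_time,shift,operator,temperature
12.5,Day,A,21.0
14.1,Night,B,19.5
11.8,Day,B,22.3
15.0,Night,A,18.9
"""


def is_number(text):
    """Return True if the cell text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(csv_text):
    """Map each column heading to 'numerical' or 'categorical'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kinds = {}
    for name in rows[0].keys():
        all_numeric = all(is_number(row[name]) for row in rows)
        kinds[name] = "numerical" if all_numeric else "categorical"
    return kinds


column_kinds = classify_columns(WORKSHEET)
for name, kind in column_kinds.items():
    print(f"{name}: {kind}")
```

This is only a simple check on the layout. It is not how SigmaXL or Minitab detect data types internally.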

THE DATA COLLECTION PLAN

The data collection plan that matches this assembly of data looks like this.

You'll notice the categorical Xs are listed as stratification variables, because that's exactly what we will do to study their relationship with the primary metric: stratify the data and compare the results from each grouping.
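Stratifying by a categorical X can be sketched in a few lines of Python. The data below is hypothetical, and comparing group means is just one simple way to compare results from each grouping; an analysis package would typically offer charts and formal tests as well.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (primary metric Y, value of one categorical X).
rows = [
    (12.5, "Day"),
    (14.1, "Night"),
    (11.8, "Day"),
    (15.0, "Night"),
]


def stratify(rows):
    """Group the Y values by category and return the mean of each group."""
    groups = defaultdict(list)
    for y, category in rows:
        groups[category].append(y)
    return {category: mean(values) for category, values in groups.items()}


group_means = stratify(rows)
for category, average in group_means.items():
    print(f"{category}: mean Y = {average:.2f}")
```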

The sampling plan guides us in the number of rows of data (i.e. data points) we collect and assemble in the data file.

Numerical variables are listed as secondary metrics, which we study differently from categorical variables.

In most cases a correlation analysis is the primary strategy for looking at their relationship with the primary metric.
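For readers who want to see what a correlation analysis computes, here is a minimal sketch of the Pearson correlation coefficient between one numerical X and the Y. Using Pearson's r is an assumption made for illustration; the text does not say which correlation measure is intended, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical columns: a numerical X and the primary metric Y.
temperature = [21.0, 19.5, 22.3, 18.9]
cycle_time = [12.5, 14.1, 11.8, 15.0]
print(f"r = {pearson_r(temperature, cycle_time):.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or no linear relationship.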

KEY POINTS ABOUT DATA COLLECTION PLANNING

The key points are these:

(A) Our data collection (DC) plan is there to help us design the elements of the data worksheet.

(B) The list of variables in the DC plan - the primary metric, the categorical Xs and the numerical Xs - determines the column headings in the data worksheet.

(D) The sampling plan guides us in how we collect the data and how many rows we collect.

(E) Because there can be a lot of variation in how people collect the numerical variables, we need to operationally define what those variables are and how they must be collected.

(F) Categorical variables don't need the same level of definition as numerical variables, because they are observed data, which makes it easy for data collectors to be consistent in what they record.
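Points (A) and (B) can be illustrated with a small Python sketch in which a DC plan, held as a plain dictionary, produces the heading row of the worksheet. The structure of the dictionary, its keys and the variable names are all assumptions made for this example; they are not a format used by any particular package.

```python
# Hypothetical DC plan: every key and variable name here is invented for illustration.
dc_plan = {
    "primary_metric": "cycle_time",                    # the Y
    "secondary_metrics": ["temperature"],              # numerical Xs
    "stratification_variables": ["shift", "operator"], # categorical Xs
}


def worksheet_headings(plan):
    """Return the top row of the worksheet: the Y first, then all of the Xs."""
    return [
        plan["primary_metric"],
        *plan["secondary_metrics"],
        *plan["stratification_variables"],
    ]


headings = worksheet_headings(dc_plan)
print(",".join(headings))
```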

For more information, check the data collection planning section in Process Mastery with Lean Six Sigma 2nd Edition.


© 2019-2025 by George Lee Sye (Soarent Publishing ABN: 89699416331) - All Rights Reserved; no part of this publication and the publications provided in this product may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise [except as required for the use of the purchaser of this product to complete the training course for which this is an accompaniment] without either the prior written permission of the copyright owner or a license permitting restricted copying issued by the copyright owner. This publication and the publications provided in this product may not be lent, resold, hired out or otherwise disposed of by way of trade in any form of binding or cover other than that in which it is published, without the prior consent of the copyright owner.