Gage R&R (Repeatability & Reproducibility) - A Guide

01. What is a Gage R&R study?

In a perfect world, we would make every feature of every part at exactly the Nominal or Target design value. Unfortunately, in the real world, parts vary in size from the nominal or target value, and we expect to see differences in measured values from part-to-part.

But when we measure a set of parts, we do not know whether the gage is actually detecting differences in part sizes or whether the differences in measured value are caused by an unreliable gage or an untrained operator.

A Gage R&R study helps us answer the question: where does the variation come from?

  • What fraction of this variation truly comes from differences in part sizes (Part Variation)?
  • What fraction of this variation comes from the gage and the operator?

For a measurement system to be considered reliable, most of the variation should come from differences in part sizes, with only a small fraction of variation coming from the gage and the operator. (Note: For the purpose of a Gage R&R study, we describe the measurement system as a combination of the gage, the operator, and the process by which the measurement is made.)

If a high fraction of the variation comes from the gage or the operator (i.e. the gage has poor R&R), parts that are close to the Upper Spec Limit (USL) or Lower Spec Limit (LSL), i.e. parts that fall in the yellow zone of the figure, are at risk of being misclassified as Pass or Fail.

[Figure: parts close to the LSL or USL are at risk of misclassification]

02. An Analogy

As an example, you may choose to use an ear thermometer (the gage) to measure your child's body temperature (the parameter being measured). The measurement method involves inserting the tip of the ear thermometer into the ear canal, and then pressing a button to record the temperature.

When you use the thermometer, you assume that:

  1. This method will detect any significant increase in the child's body temperature.
  2. The measurements will be nearly identical if you repeat the measurement (within a short interval of time), i.e. measurements are Repeatable across multiple trials.
  3. The results will be nearly identical if you and your spouse were to independently make this measurement, i.e. measurements are Reproducible across operators.

In the ideal case, if you and your spouse were to make measurements a few hours apart, the only reason for a change in reading should be an actual change in body temperature. How you position the thermometer or any inherent variability in the thermometer should have little to no impact on the measurement.

And this is why we run a Gage R&R study - to determine whether the difference in readings comes from the thermometer itself, the operator (person making the measurement), or from the patient's temperature (ideally all the variation comes from the patient's actual temperature).

[Figure: ear thermometer]

03. Prerequisites

Before we dive into the mechanics of a Gage R&R study, let's define some important terms.

  1. Bias: Bias is a consistent shift from the true value: the measured value is always higher, or always lower, than the actual value. Bias often appears as a gage wears.
  2. Linearity: Sometimes, the difference between the measured value and the true value increases or decreases over the range of measurements. For example, when we measure a 1.000" block, we might measure 1.0001" (i.e. an error of .0001"), but when we measure a 2.000" block, we might measure 2.00045" (i.e. an error of .00045"). In this case the error grows as the size of the feature increases, and the measurement system is not satisfactory.
  3. Stability: Stability is the behavior of a measurement system over time. To ensure stability, we monitor environmental parameters such as Temperature and Humidity, and calibrate a gage periodically.
  4. Calibration: To calibrate a gage, we measure a known value (e.g. a 1.000" block traceable to NIST) with the gage, and ensure that the reported value is within the acceptable tolerance range of the gage.

Before a Gage R&R study, we must ensure that the gage is unbiased, linear, stable, and calibrated.

A perfectly linear measurement system would have:

  • Zero bias (no consistent offset)
  • Perfect correlation between measured and actual values
  • Same accuracy across entire range

Example: For a ±0.010" tolerance

Ideally, the error would be less than 5% of the part tolerance (±0.0005"). The maximum error should be:

  • Consistent across the range
  • Free of any visible trend
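As a rough illustration, this check can be sketched in Python. The block sizes and readings are the two blocks from the Linearity definition above, and the 5% limit comes from the ±0.010" example. The helper name `linearity_errors` is hypothetical and not part of any library.

```python
def linearity_errors(reference, measured):
    """Return the measured-minus-reference error for each reference block."""
    return [m - ref for ref, m in zip(reference, measured)]

reference = [1.000, 2.000]      # known block sizes (inches)
measured = [1.0001, 2.00045]    # gage readings from the Linearity example
tolerance = 0.010               # part tolerance of +/-0.010"
limit = 0.05 * tolerance        # 5% of tolerance = 0.0005"

errors = linearity_errors(reference, measured)
within_limit = all(abs(e) <= limit for e in errors)
# An error that grows across the range points to a linearity problem.
error_growth = errors[-1] - errors[0]

print([round(e, 5) for e in errors], within_limit, round(error_growth, 5))
```

In this sketch both errors stay inside the 5% limit, yet the error grows by about .00035" between the two blocks, which is the kind of trend the checklist asks us to look for.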

04. Types of Gage R&R Studies

Gage R&R studies are broadly divided into two types depending on the type of measurement being made:

  1. Attribute Gage R&R: Go/No-Go, Thread Gages, Visual Appraisal to a Standard etc.
  2. Variable Gage R&R: Micrometers, Calipers, CMMs etc.

Variable Gage R&R studies are further sub-classified into:

  1. Crossed Gage R&R: In the most commonly used method, the study is crossed i.e., each operator measures each part or sample multiple times.
  2. Nested Gage R&R: In some cases, it is not possible for each operator to measure each part or sample multiple times (e.g. destructive tests), requiring a nested study in which each operator measures a different set of parts (i.e. the parts are "nested" under the operator).

[Figure: crossed and nested gage R&R studies]

05. Attribute Gage R&R Overview

A typical Attribute gage R&R study involves multiple operators, parts, and repeated measurements of the same part by each operator (we call these repeated measurements "trials").

It's common to have 3 Operators, 30 Parts, and 3 Trials on each part. Operators record a simple Pass/Fail (i.e. Attribute) measurement. (In more advanced cases, operators may need to classify parts into categories e.g. Class A, Class B, and Class C)

IMPORTANT: In an Attribute Gage R&R study, disagreement within each Operator's trials and disagreements across Operators will be apparent only in parts that are close to the Upper Spec Limit (USL) and Lower Spec Limit (LSL).

[Figure: attribute gage R&R data table]

06. Attribute Gage R&R Calculations

Each operator's appraisal (Pass/Fail) is compared to the appraisal (Pass/Fail) of the other two operators, and against a reference value. The reference value itself must be obtained from a reliable variable measurement system (e.g. a CMM, a color measurement device etc.) or from a master/expert appraiser.

Repeatability: For each operator, we count the number of parts for which the operator arrives at the same assessment in every trial (i.e. are each operator's measurements of the same part consistent?). We then divide this count by the total number of parts to arrive at the percentage repeatability. If an operator's measurements on the same part are consistent less than 90% of the time, the operator needs re-training.

Reproducibility: Next we count the number of parts for which the operators agree with each other's individual measurements (i.e. are measurements of the same part consistent across Operators?). We then divide this count by the total number of parts to arrive at the percentage reproducibility.
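The two percentages can be sketched as follows. The appraisal data is hypothetical (3 operators, 4 parts, 3 trials), and the sketch assumes that operators "agree" on a part only when every call by every operator is identical; other agreement definitions are possible.

```python
# Hypothetical appraisals: results[operator][part] = list of trial outcomes.
results = {
    "A": {1: ["Pass", "Pass", "Pass"], 2: ["Fail", "Fail", "Fail"],
          3: ["Pass", "Fail", "Pass"], 4: ["Pass", "Pass", "Pass"]},
    "B": {1: ["Pass", "Pass", "Pass"], 2: ["Fail", "Fail", "Fail"],
          3: ["Pass", "Pass", "Pass"], 4: ["Pass", "Pass", "Pass"]},
    "C": {1: ["Pass", "Pass", "Pass"], 2: ["Fail", "Fail", "Fail"],
          3: ["Fail", "Fail", "Fail"], 4: ["Pass", "Pass", "Pass"]},
}
parts = sorted(results["A"])

def repeatability_pct(trials_by_part):
    """Percent of parts on which one operator made the same call in every trial."""
    consistent = sum(1 for trials in trials_by_part.values() if len(set(trials)) == 1)
    return 100.0 * consistent / len(trials_by_part)

def reproducibility_pct(results, parts):
    """Percent of parts on which every call by every operator is identical."""
    agreed = 0
    for part in parts:
        calls = {call for op in results for call in results[op][part]}
        if len(calls) == 1:
            agreed += 1
    return 100.0 * agreed / len(parts)

repeatability = {op: repeatability_pct(results[op]) for op in results}
reproducibility = reproducibility_pct(results, parts)
print(repeatability, reproducibility)
```

In this made-up data only Part 3 causes disagreement, which mirrors the point that disagreements show up on borderline parts.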

07. Variable Gage R&R Overview

A typical Variable (crossed) Gage R&R involves multiple operators, parts, and repeated measurements of the same part by each operator (we call these repeated measurements "trials").

It's common to have 3 Operators (k), 10 Parts (n), and 3 Trials (r) on each part. Each operator measures each part three times, records the numeric (i.e. variable) measurement value, and we obtain a table like the one shown here.

[Figure: variable gage R&R table]

08. ANOVA (Analysis of Variance)

The first step is to calculate the total variation. But once we have the total variation, how do we know what fraction of the variation comes from the parts themselves, and what fraction comes from the operator or the gage?

This is where the ANOVA (Analysis of Variance) technique comes in handy. It allows us to partition the variation into factors.

Variability-Total = Variability-from-Parts + Variability-from-Operators + Variability-from-Equipment + Variability-from-Interaction-Between-Operator-And-Part

SSTotal = SSParts + SSOperator + SSEquipment + SSInteraction

09. Total Variation

To calculate the total variation, we calculate the sum of the square of the difference between each measured value and the overall average. We call this the Total Sum of Squares (SS-Total).

The steps to find total variation:

  1. Calculate the overall average of all measurements
  2. Find how much each measurement differs from this average
  3. Square these differences
  4. Add up all squared differences

For example:

  • The Square Error for "Operator A - Part 1 - Trial 1" is (1.02 - 1.008)² = (0.012)² = .000144
  • The Square Error for "Operator B - Part 8 - Trial 2" is (1.07 - 1.008)² = (0.062)² = .003844

[Figure: overall average]
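The four steps can be sketched in a few lines of Python. The readings are hypothetical (a small study with 2 operators, 3 parts, and 2 trials), not the table from this guide.

```python
# Hypothetical readings (2 operators x 3 parts x 2 trials), flattened.
readings = [1.02, 1.03, 0.98, 0.97, 1.05, 1.05,   # Operator A
            1.04, 1.03, 0.99, 1.00, 1.05, 1.03]   # Operator B

def ss_total(values):
    """Sum of squared differences between each value and the overall average."""
    overall_avg = sum(values) / len(values)
    return sum((v - overall_avg) ** 2 for v in values)

overall_avg = sum(readings) / len(readings)
print(round(overall_avg, 4), round(ss_total(readings), 6))
```

For these made-up readings the overall average is 1.02 and SS-Total comes to about 0.0088.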

10. ANOVA and Components of Variation

ANOVA uses a very simple trick to calculate the fraction of variability from Parts, Operators, and Equipment. We group the data set in different ways to examine whether the measurement differences are between parts (desired), or between operators (a reproducibility problem), or between trials (a repeatability problem).

There are three groupings of interest:

  1. Grouping by Parts
  2. Grouping by Operators (Reproducibility)
  3. Grouping by Operator and Part i.e. by Equipment (Repeatability)

11. Grouping by Parts

Ideally, all variation comes from the parts themselves. So we expect to see a high amount of variation between parts. To determine how much variation comes from the parts, we group the data by Part, and calculate the average for each Part. Then we compare each of these Part averages to the overall average.

The steps to calculate variation from actual part differences:

  1. Calculate average for each part
  2. Compare each part's average to the overall average
  3. Square the differences
  4. Sum up all squared differences

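This is the sum of the squared differences between each part average and the overall average. There are 10 parts, so we calculate 10 Square-Error terms. Because each part average is built from k*r = 9 measurements, we multiply the sum of these terms by 9 to arrive at SS-Parts, which puts it on the same scale as SS-Total.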

For example:

  • The Square Error for Part 1 is (1.048 - 1.008)² = (0.04)² = .0016
  • The Square Error for Part 7 is (.924 - 1.008)² = (-0.084)² = .007056

Note the negative sign in the error calculation for Part 7. We square the error values to eliminate the negative sign. Then we add all the squared values.

[Figure: calculating the part variation]
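A minimal sketch of the part grouping, using hypothetical readings (3 parts, each measured 4 times across operators and trials). It assumes the usual ANOVA convention of weighting each squared term by the number of readings behind the part average (k*r), so that SS-Parts is on the same scale as SS-Total.

```python
# Hypothetical readings grouped by part (all operators and trials pooled).
by_part = {
    1: [1.02, 1.03, 1.04, 1.03],
    2: [0.98, 0.97, 0.99, 1.00],
    3: [1.05, 1.05, 1.05, 1.03],
}

def ss_parts(by_part):
    """Weighted sum of squared (part average - overall average)."""
    all_values = [v for vals in by_part.values() for v in vals]
    overall_avg = sum(all_values) / len(all_values)
    total = 0.0
    for vals in by_part.values():
        part_avg = sum(vals) / len(vals)
        # Weight by the number of readings behind each part average (k * r).
        total += len(vals) * (part_avg - overall_avg) ** 2
    return total

print(round(ss_parts(by_part), 6))
```

With these made-up numbers SS-Parts is about 0.0078, most of the total variation in the same hypothetical data set, which is the pattern we hope to see.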

12. Grouping by Operator (Reproducibility)

To determine how much variation comes from the operators, we group the data by Operator and calculate the mean for each Operator. Then we compare each of these Operator averages to the overall average.

The steps to calculate variation from operator differences:

  1. Calculate average for each operator
  2. Compare each operator's average to the overall average
  3. Square the differences
  4. Sum up all squared differences

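This is the sum of the squared differences between each operator average and the overall average. There are 3 operators, so we calculate 3 Square-Error terms. Because each operator average is built from n*r = 30 measurements, we multiply the sum of these terms by 30 to arrive at SS-Operators, which puts it on the same scale as SS-Total. For example: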

  • The Square Error for Operator A is (0.999 - 1.008)² = (-0.009)² = .000081
  • The Square Error for Operator B is (1.015 - 1.008)² = (.007)² = .000049
  • The Square Error for Operator C is (1.010 - 1.008)² = (.002)² = .000004

[Figure: calculating the operator variation]
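The operator terms above can be reproduced with a short sketch. The averages are the rounded values quoted in the bullets, so the result is only approximate and is not meant to match the totals used in later examples. Multiplying by the 30 readings per operator is the usual ANOVA weighting and is an assumption here.

```python
overall_avg = 1.008
operator_avgs = {"A": 0.999, "B": 1.015, "C": 1.010}
n_parts, n_trials = 10, 3   # readings per operator = 10 * 3 = 30

squared_errors = {op: (avg - overall_avg) ** 2 for op, avg in operator_avgs.items()}
sum_of_squared_errors = sum(squared_errors.values())
# Weight by readings per operator so the result is on the SS-Total scale.
ss_operator = n_parts * n_trials * sum_of_squared_errors

print({op: round(se, 6) for op, se in squared_errors.items()})
print(round(sum_of_squared_errors, 6), round(ss_operator, 5))
```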

13. Grouping by Operator and Part i.e. Equipment (Repeatability)

Quantifying the equipment variation requires a different comparison. This time, to determine how much variation comes from the equipment, we group data by Operator and Part. This grouping gives us just the equipment variation because it is the same operator measuring the same part three times.

To calculate variation from the equipment:

  1. Group measurements by operator-part combination
  2. Calculate average for each combination
  3. Compare each measurement to its group average
  4. Square the differences
  5. Sum up all squared differences

This is the sum of the squared differences between each measurement and the corresponding operator-part average. There are 30 operator-part combinations with 3 trials each, so we calculate 90 Square-Error terms that we sum to arrive at SS-Equipment.

For example:

  • The Square Error for "Operator A - Part 1 - Trial 1" is (1.02 - 1.023)² = (-0.003)² = .000009
  • The Square Error for "Operator A - Part 1 - Trial 2" is (1.03 - 1.023)² = (0.007)² = .000049
  • The Square Error for "Operator A - Part 1 - Trial 3" is (1.02 - 1.023)² = (-0.003)² = .000009

[Figure: calculating the equipment variation]
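The within-group calculation can be sketched as below. The first group uses the Operator A, Part 1 readings from the bullets above; the second group is hypothetical and is included only to show the summation across groups.

```python
def within_group_ss(trials):
    """Sum of squared differences between each trial and its group average."""
    group_avg = sum(trials) / len(trials)
    return sum((t - group_avg) ** 2 for t in trials)

groups = {
    ("A", 1): [1.02, 1.03, 1.02],   # readings from the example above
    ("A", 2): [0.98, 0.97, 0.98],   # hypothetical
}

# SS-Equipment adds the within-group sums over every operator-part group.
ss_equipment = sum(within_group_ss(trials) for trials in groups.values())
print(round(within_group_ss(groups[("A", 1)]), 6), round(ss_equipment, 6))
```

The first group contributes about .000067, which matches the three bullet values added together (.000009 + .000049 + .000009).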

14. Interaction between Operator and Part i.e. Error Term

The interaction sum of squares is calculated by subtracting SSParts, SSOperator, and SSEquipment from SSTotal:

SSInteraction = SSTotal - SSParts - SSOperator - SSEquipment

This represents unexpected variation that occurs when specific operators measure specific parts.
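To see the subtraction at work, here is a sketch of the full partition on a hypothetical crossed study (2 operators, 3 parts, 2 trials). It assumes the usual ANOVA weighting of SS-Parts by k*r and SS-Operator by n*r, and it cross-checks the subtracted interaction against a direct calculation.

```python
from statistics import mean

# Hypothetical crossed study: data[(operator, part)] = list of trial readings.
data = {
    ("A", 1): [1.02, 1.03], ("A", 2): [0.98, 0.97], ("A", 3): [1.05, 1.05],
    ("B", 1): [1.04, 1.03], ("B", 2): [0.99, 1.00], ("B", 3): [1.05, 1.03],
}
operators = sorted({op for op, _ in data})
parts = sorted({p for _, p in data})
k, n, r = len(operators), len(parts), 2   # operators, parts, trials

all_values = [v for trials in data.values() for v in trials]
grand = mean(all_values)

def part_mean(part):
    return mean(v for (_, p), trials in data.items() if p == part for v in trials)

def operator_mean(operator):
    return mean(v for (op, _), trials in data.items() if op == operator for v in trials)

ss_total = sum((v - grand) ** 2 for v in all_values)
ss_parts = k * r * sum((part_mean(p) - grand) ** 2 for p in parts)
ss_operator = n * r * sum((operator_mean(op) - grand) ** 2 for op in operators)
ss_equipment = sum((v - mean(trials)) ** 2 for trials in data.values() for v in trials)

# Interaction by subtraction, as in the formula above.
ss_interaction = ss_total - ss_parts - ss_operator - ss_equipment

# Direct calculation for comparison: cell mean minus part and operator effects.
direct = r * sum(
    (mean(data[(op, p)]) - part_mean(p) - operator_mean(op) + grand) ** 2
    for op in operators for p in parts
)

print(round(ss_total, 6), round(ss_parts, 6), round(ss_operator, 6),
      round(ss_equipment, 6), round(ss_interaction, 6), round(direct, 6))
```

The subtracted and directly calculated interaction values agree, which only happens when the part and operator terms carry the weighting described in the lead-in.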

15. From Sum of Squares to Variance: A Simple Explanation

Next, we convert the Sum of Squares into Variance numbers so that:

  • The variation can be converted to our original measurement units (by taking the square root of the variance)
  • We account for study size
  • Values can be compared fairly
  • Values can be used for decision-making

Sum of Squares (SS)

Think of Sum of Squares as the "raw total" of all the differences we see. It's like adding up all the variation without considering how many measurements we took or who took them.

Example with Room Temperature:

You measure room temperature 3 times

Your readings: 71°F, 70°F, 69°F

The average is 70°F

Sum of Squares would just add up the squared differences:

(71-70)² + (70-70)² + (69-70)² = 1 + 0 + 1 = 2 deg²

Note: The units are the square of the original units (e.g. deg²)

Sum of Squares: Total accumulated differences.

Variance

Variance is like getting the "average effect" by considering:

  • How many measurements we took
  • How many operators were involved
  • How many parts we measured

Converting Sum of Squares to Variance

Using the same temperature example:

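We divide the Sum of Squares (2) by the number of measurements we took minus 1 (3-1 = 2)

Variance = 2 ÷ 2 = 1 deg²

Note: The units of variance are still the square of the original units (deg²). Taking the square root gives the standard deviation (here 1°F), which is in the original units (degrees).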

This gives us the average effect of the variation

Variance: Average effect of differences
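The room temperature example can be checked with Python's standard `statistics` module, which uses the same (n - 1) divisor. Strictly, the variance is still in squared units; its square root, the standard deviation, is back in degrees.

```python
from statistics import mean, variance

readings = [71, 70, 69]   # degrees F, from the example above
avg = mean(readings)
sum_of_squares = sum((x - avg) ** 2 for x in readings)   # units: deg^2
var = sum_of_squares / (len(readings) - 1)               # units: deg^2
std_dev = var ** 0.5                                     # units: deg

# statistics.variance divides by (n - 1) as well, so the two should match.
matches_library = var == variance(readings)
print(sum_of_squares, var, std_dev, matches_library)
```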

16. Converting Sum of Squares to Variance

Before we calculate the final variation percentages, we need to:

  • Calculate Sum of Squares (SS) for each source of variation
  • Convert these into Variance Components
  • Calculate the percentage contribution of each source

Calculating the Part Variance Component

Part Variance = (SS-Parts) / (n-1) / (r*k)

Calculating the Operator Variance Component

Operator Variance = (SS-Operator) / (k-1) / (r*n)

Calculating the Equipment Variance Component

Equipment Variance = SS-Equipment / (r-1) / (n*k)

Calculating the Interaction Variance Component

Interaction Variance = SS-Interaction / ((n-1)*(k-1)) / r

Where:

n = number of parts

r = number of trials

k = number of operators

17. Example Variance Calculation:

Given:

10 parts (n)

3 operators (k)

3 trials (r)

SS-Parts = 0.0160

SS-Operator = 0.0012

SS-Equipment = 0.0008

SS-Interaction = 0.0004

Calculate:

Part Variance = 0.0160 / (10-1) / (3*3) = 0.000198

Operator Variance = 0.0012 / (3-1) / (3*10) = 0.000020

Equipment Variance = 0.0008 / (3-1) / (10*3) = 0.000013

Interaction Variance = 0.0004 / ((10-1)*(3-1)) / 3 = 0.000007
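The same arithmetic as a small Python function (the name `variance_components` is hypothetical). It applies this guide's formulas as written; a full ANOVA variance-component estimate typically also subtracts the interaction or equipment mean square before dividing, an adjustment this sketch does not make.

```python
def variance_components(ss_parts, ss_operator, ss_equipment, ss_interaction, n, k, r):
    """Apply the guide's formulas; n = parts, k = operators, r = trials."""
    return {
        "part": ss_parts / (n - 1) / (r * k),
        "operator": ss_operator / (k - 1) / (r * n),
        "equipment": ss_equipment / (r - 1) / (n * k),
        "interaction": ss_interaction / ((n - 1) * (k - 1)) / r,
    }

components = variance_components(0.0160, 0.0012, 0.0008, 0.0004, n=10, k=3, r=3)
for name, value in components.items():
    print(f"{name}: {value:.6f}")
```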

18. Calculating Total Variance and Percentages

Total Variance

Total Variance = Part Variance + Operator Variance + Equipment Variance + Interaction Variance

Percentage Calculations

Part % = (Part Variance / Total Variance) x 100

Operator % = (Operator Variance / Total Variance) x 100

Equipment % = (Equipment Variance / Total Variance) x 100

Interaction % = (Interaction Variance / Total Variance) x 100

Gage R&R Percentage

Gage R&R % = ((Operator Variance + Equipment Variance + Interaction Variance) / Total Variance) x 100
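Continuing with the numbers from the example variance calculation, the percentages can be sketched as follows. The variance components are recomputed unrounded so that the percentages add up to 100.

```python
# Variance components from the example calculation (unrounded).
variances = {
    "Part": 0.0160 / 9 / 9,
    "Operator": 0.0012 / 2 / 30,
    "Equipment": 0.0008 / 2 / 30,
    "Interaction": 0.0004 / 18 / 3,
}
total_variance = sum(variances.values())
percentages = {name: 100 * v / total_variance for name, v in variances.items()}

# Gage R&R combines everything that is not part-to-part variation.
gage_rr_variance = variances["Operator"] + variances["Equipment"] + variances["Interaction"]
gage_rr_pct = 100 * gage_rr_variance / total_variance

for name, pct in percentages.items():
    print(f"{name} %: {pct:.1f}")
print(f"Gage R&R %: {gage_rr_pct:.1f}")
```

For these example values, parts account for roughly 82.9% of the total variance and Gage R&R for roughly 17.1%.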

19. Commonly used Acceptance Criteria

To evaluate a Gage R&R report, you must review the following results:

  • Total Gage R&R percentage
  • Repeatability percentage
  • Reproducibility percentage
  • Number of distinct categories
  • Part-to-part variation

20. Total Gage R&R percentage

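  • Less than 10%: Excellent measurement system (Can reliably detect small part differences. Appropriate for tight tolerance measurements)
  • 10-30%: Conditionally acceptable (May be acceptable depending on: Application criticality, Measurement cost, Process capability)
  • Over 30%: Needs improvement

Note: The 10% and 30% limits are most often quoted for a Gage R&R percentage computed from standard deviations (% of study variation). For a percentage computed from variances, the equivalent limits are 1% and 9% (the squares of 10% and 30%).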
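The three bands can be written as a small lookup. The function name `classify_gage_rr` is hypothetical, and the sketch simply applies the bands as listed to whatever percentage it is given; treating exactly 10% and exactly 30% as "conditionally acceptable" is an assumption.

```python
def classify_gage_rr(pct):
    """Map a total Gage R&R percentage to the bands listed above."""
    if pct < 10:
        return "Excellent measurement system"
    if pct <= 30:
        return "Conditionally acceptable"
    return "Needs improvement"

for pct in (4.0, 17.1, 42.0):
    print(pct, classify_gage_rr(pct))
```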

21. Number of Distinct Categories

The Number of Distinct Categories (ndc) tells us how many different groups of parts our measurement system can reliably distinguish. It's a practical way to understand if our measurement system is good enough for our needs.

Calculating NDC

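NDC = SQRT(2 x (Part Variance / Measurement Variance))

Here, Measurement Variance is the Gage R&R variance, i.e. Operator Variance + Equipment Variance + Interaction Variance.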

Evaluating NDC

An example: Imagine sorting marbles by size. If ndc = 2, your measurement system can only reliably tell if marbles are "small" or "large". But if ndc = 5, you can reliably sort marbles into 5 size groups, i.e. you can detect smaller differences between parts.

ndc = 2: You can only sort parts into "good" or "bad"

ndc = 5: You can track parts trending toward specification limits

ndc = 10: You can detect subtle process shifts
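As a sketch, ndc can be computed from the variance components of the earlier example calculation. Here the measurement variance is taken as the sum of the operator, equipment, and interaction variances, and truncating the result to a whole number is a common convention that is assumed rather than stated in this guide.

```python
import math

def number_of_distinct_categories(part_variance, measurement_variance):
    """ndc = sqrt(2 * part variance / measurement variance)."""
    return math.sqrt(2 * part_variance / measurement_variance)

# Variance components from the example calculation (unrounded).
part_variance = 0.0160 / 9 / 9
measurement_variance = 0.0012 / 2 / 30 + 0.0008 / 2 / 30 + 0.0004 / 18 / 3

ndc = number_of_distinct_categories(part_variance, measurement_variance)
print(round(ndc, 2), math.floor(ndc))
```

For the example values this gives an ndc of about 3.1, so the measurement system in that example could distinguish roughly 3 groups of parts.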

Automating Gage R&R Calculations

1factory's Gage Calibration and Gage R&R software automates and speeds up Gage R&R / MSA calculations (total variation, reproducibility, repeatability, NDC).