
Test Method Validation of Continuous Non-Destructive Measurements

Updated: Mar 14

Also known as Gage Repeatability and Reproducibility!

Gage Repeatability & Reproducibility

In the previous blog post about Test Method Validation, we discussed the different types of Test Method Validation, which depend on the type of data and on whether the test method is destructive. This post covers a variable, non-destructive test method, e.g., measuring an injection-molded part. This type of Test Method Validation (TMV) might be the best way to show what TMV is all about.

This post talks about:

  • Why you should perform an IQ

  • Why the measurement range is necessary

  • How to approach a GR&R

  • How to interpret the results of a GR&R (based on Minitab)

Why should you perform an IQ?

The equipment and fixtures used in the test method shall have an installation qualification (IQ) performed, and the software, including any spreadsheets used for calculations, shall be validated [1].

The GHTF defines Installation Qualification (IQ) as follows:

“establishing by objective evidence that all key aspects of the process (here test) equipment and ancillary system installation adhere to the manufacturer’s approved specification and that the recommendations of the supplier of the equipment are suitably considered.” [3].

So, an IQ basically asks whether the equipment is installed correctly.

However, installed correctly also means that the equipment has been “installed” within your quality management system (QMS), e.g., preventative maintenance, calibration, and other relevant subsystems. The same goes for the software and/or spreadsheets used for a test method.

Why is the measurement range important?

Let us first explain what we mean by a measurement range.

The measurement range is the range of values for which the method has been validated in terms of accuracy and precision [1].

There are two different ways to consider the measurement range in a gage R&R.

One is to consider the measurement range only for the specification range of a particular test method, and the other is to consider it for multiple specification ranges. The specification range is the design requirement that defines the allowable range within which the variable must be controlled [1]. Thus, if a test method is used for only one product requirement, the measurement range may be the specification range for that requirement. However, if multiple requirements are to be evaluated, the measurement range must encompass the specification ranges of all product requirements.

This may be somewhat confusing, so we recommend performing individual gage R&R studies for individual specification ranges. This may save you a challenging argument with the auditor.
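To illustrate the second case described above, here is a minimal Python sketch that derives a measurement range covering several specification ranges. The requirement names and limits are hypothetical values chosen for illustration only.

```python
def combined_measurement_range(spec_ranges):
    """Return (low, high) covering every (LSL, USL) pair in spec_ranges."""
    lows = [lsl for lsl, _ in spec_ranges]
    highs = [usl for _, usl in spec_ranges]
    return min(lows), max(highs)


# Hypothetical product requirements in mm, each given as (LSL, USL).
specs = {
    "width": (19.9, 20.1),
    "length": (49.8, 50.2),
}

measurement_range = combined_measurement_range(specs.values())
print(measurement_range)
```

With these assumed limits, a single study would have to cover everything from the lowest LSL to the highest USL, which is one reason why separate studies per specification range can be easier to defend.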

How to approach a Gage R&R?

So, we already know that we have variable data (as opposed to attribute) and that we test in a non-destructive way (as opposed to a destructive way) – that's quite a lot.

The next step is to design the study and decide on the number of the following three items:

1. operators (O),

2. parts (P) and

3. repetitions (R).

With the number of operators (O), we can assume that the more operators we use, the better we understand possible weaknesses. At least three operators must be selected from the pool of regular operators. Each operator must then independently perform the entire test method, including any necessary preparatory steps (e.g., calibration or sample preparation) [1].

The parts (P) should be representative of the measurement range. For a single specification range, select parts near the edges and the center of the specification range [1].

The number of repetitions (R) should satisfy the following equation to ensure the precision of the measurement device is estimated to be within 25% of its actual value [1]:


Three operators (O) should measure ten samples (P) in random order at least two times each (R) to satisfy the above equation. Aim for studies with 60 or more measurements.

Now that we know how many operators, samples, and repetitions we need, we can execute the actual measurements. The execution of the measurements should be randomized. We recommend you organize the results in the following way:
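One possible way to prepare a randomized data sheet is sketched below in Python. It assumes a long format with one row per measurement and a part order that is shuffled separately for each operator and trial. The column names and the randomization scheme are our assumptions for illustration, not a prescribed layout.

```python
import random


def build_run_sheet(operators, parts, repetitions, seed=None):
    """Create one row per measurement, with parts shuffled per operator and trial."""
    rng = random.Random(seed)  # a fixed seed makes the sheet reproducible
    rows = []
    run = 0
    for trial in range(1, repetitions + 1):
        for operator in operators:
            order = list(parts)
            rng.shuffle(order)  # randomize the order in which parts are measured
            for part in order:
                run += 1
                rows.append({
                    "run": run,
                    "trial": trial,
                    "operator": operator,
                    "part": part,
                    "measurement": None,  # filled in during execution
                })
    return rows


# Three operators, ten parts, two repetitions, as in the study design above.
sheet = build_run_sheet(["A", "B", "C"], list(range(1, 11)), 2, seed=1)
print(len(sheet))  # 3 * 10 * 2 = 60 rows
for row in sheet[:5]:
    print(row)
```

A long-format sheet like this can typically be pasted into statistical software as three columns (part, operator, measurement).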

After gathering the data, use statistical software to analyze it. We proceed using Minitab as our statistical software of choice.

How to interpret the results of a Gage R&R study (based on Minitab)?

When running an analysis in Minitab, we get quite a lot of tables and graphs. Let’s start by looking at the graphs that Minitab provides (see Figure 1).

In the top right, we see the Measurement by Part graph. It contains all measurements, so outliers would be apparent here. In the graph below it, we can also see that the three operators have different means and different standard deviations, with operator C having the smallest and operator B the largest standard deviation.

The next step is to check the ANOVA table Minitab provides. A p-value is calculated for each source (Part, Operator, and Part*Operator). A p-value smaller than or equal to 0.05 is considered statistically significant; that is, at a 95% confidence level, the source contributes a detectable share of the variation in the measurements.

NOTE: statistically significant only means that there is a detectable difference, but not that this difference has any technical meaning [1].
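To make the structure of the ANOVA table more tangible, here is a minimal Python sketch of a two-way crossed ANOVA with interaction for a balanced study. It follows the standard textbook formulas and is not taken from Minitab itself. The data are made-up values, and the sketch stops at the F statistics, because converting them to p-values requires the F distribution, which statistical software supplies.

```python
def crossed_anova(data):
    """Two-way crossed ANOVA with interaction for a balanced gage R&R study.

    data maps (part, operator) -> list of repeated measurements.
    Returns {source: (df, mean_square, F)}; F is None for repeatability.
    """
    parts = sorted({p for p, _ in data})
    operators = sorted({o for _, o in data})
    a, b = len(parts), len(operators)
    n = len(data[(parts[0], operators[0])])  # repetitions per cell

    grand = sum(sum(v) for v in data.values()) / (a * b * n)
    part_mean = {p: sum(sum(data[(p, o)]) for o in operators) / (b * n) for p in parts}
    oper_mean = {o: sum(sum(data[(p, o)]) for p in parts) / (a * n) for o in operators}
    cell_mean = {k: sum(v) / n for k, v in data.items()}

    # Sums of squares for each source of variation
    ss_part = b * n * sum((m - grand) ** 2 for m in part_mean.values())
    ss_oper = a * n * sum((m - grand) ** 2 for m in oper_mean.values())
    ss_int = n * sum(
        (cell_mean[(p, o)] - part_mean[p] - oper_mean[o] + grand) ** 2
        for p in parts for o in operators
    )
    ss_rep = sum((x - cell_mean[k]) ** 2 for k, v in data.items() for x in v)

    ms_part = ss_part / (a - 1)
    ms_oper = ss_oper / (b - 1)
    ms_int = ss_int / ((a - 1) * (b - 1))
    ms_rep = ss_rep / (a * b * (n - 1))

    # With the interaction in the model, Part and Operator are tested against it.
    return {
        "Part": (a - 1, ms_part, ms_part / ms_int),
        "Operator": (b - 1, ms_oper, ms_oper / ms_int),
        "Part*Operator": ((a - 1) * (b - 1), ms_int, ms_int / ms_rep),
        "Repeatability": (a * b * (n - 1), ms_rep, None),
    }


# Made-up data: 2 parts x 2 operators x 2 repetitions (far too small for a real study).
example = {
    (1, "A"): [10.0, 10.2], (1, "B"): [10.4, 10.6],
    (2, "A"): [12.0, 12.2], (2, "B"): [12.8, 13.0],
}
for source, (df, ms, f) in crossed_anova(example).items():
    print(source, df, round(ms, 4), None if f is None else round(f, 2))
```

The sketch assumes a balanced design, i.e., every operator measures every part the same number of times, and it will divide by zero if a mean square in a denominator is exactly zero.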

Next, we consider the Gage Evaluation provided by Minitab. The acceptance criterion for a gage R&R study is a Total Gage R&R %Tolerance of ≤ 30%. In Figure 3, the Total Gage R&R %Tolerance for this example is 99.92%, so the study failed. Looking at the individual components, reproducibility is currently the largest contributor to the poor GRR result. However, repeatability alone would not pass either.
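As a rough illustration of how a %Tolerance value can be derived, the Python sketch below converts ANOVA mean squares into variance components and compares them with the tolerance. It uses the standard formulas for a crossed study and assumes a study variation multiplier of 6 standard deviations. The mean squares and specification limits are made-up values, not the data behind Figure 3.

```python
import math


def variance_components(ms_part, ms_oper, ms_int, ms_rep, n_parts, n_operators, n_reps):
    """Estimate variance components from the mean squares of a crossed ANOVA."""
    def clip(value):
        return max(value, 0.0)  # negative estimates are set to zero

    repeatability = ms_rep
    operator = clip((ms_oper - ms_int) / (n_parts * n_reps))
    interaction = clip((ms_int - ms_rep) / n_reps)
    part = clip((ms_part - ms_int) / (n_operators * n_reps))
    reproducibility = operator + interaction
    return {
        "Repeatability": repeatability,
        "Reproducibility": reproducibility,
        "Total Gage R&R": repeatability + reproducibility,
        "Part-to-Part": part,
    }


def percent_tolerance(variance, lsl, usl, k=6.0):
    """Study variation (k standard deviations) as a percentage of the tolerance."""
    return 100.0 * k * math.sqrt(variance) / (usl - lsl)


# Made-up mean squares for 2 parts, 2 operators, 2 repetitions; limits are hypothetical.
components = variance_components(9.68, 0.72, 0.08, 0.02, n_parts=2, n_operators=2, n_reps=2)
for name, variance in components.items():
    pct = percent_tolerance(variance, lsl=9.0, usl=14.0)
    print(f"{name}: variance={variance:.4f}, %Tolerance={pct:.2f}")

total = percent_tolerance(components["Total Gage R&R"], lsl=9.0, usl=14.0)
print("pass" if total <= 30.0 else "fail")
```

Because the components are added as variances rather than as percentages, the %Tolerance values for repeatability and reproducibility do not sum to the total.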

The high value for reproducibility means that the operators have problems aligning with each other. We already discussed this, and it can be seen in the Measurement by Operator graph in Figure 1.

Now that we know that we failed and where the problem is coming from, we need to improve the test method by improving reproducibility.

Author: Simon Föger