In our experience, most people don’t realize that a measurement value doesn’t always accurately reflect that which it is supposed to be measuring. And most people are also far too optimistic about the measurement error involved.

THE IMPORTANCE OF MEASUREMENT SYSTEM EVALUATION

Many decisions regarding products and processes are based on measurement values. Unreliable measurement systems can lead to mistakenly adjusting processes and even to the unjustified release of products for mass production.

During the verification phase of a product’s development, all critical parameters need to be validated. It therefore follows that all measurement methods for those parameters also need to have been evaluated.

THE KEY ASPECTS

A measurement system evaluation covers the following aspects (see figure 1):

**Definition of the characteristic**

The measurement system assigns a value that needs to reflect a certain product or process feature. Sometimes a definition may seem straightforward when actually it isn’t.

Take, for example, a situation where we want to know the diameter of a product that’s designed to be perfectly round but is in fact oval: for an oval product, the diameter is not defined.

Similarly, the layer thickness of a product is not well defined if the thickness varies even across a small area. Repeated measurements will then show spread that is actually the result of a poor definition. A good definition is based on the function of the layer. For example, if it’s an electrical insulation layer, the minimum layer thickness would be appropriate.

**Discrimination**

This is the smallest readable unit, that is, the smallest scale unit of measurement or output of an instrument. It is also referred to as *resolution*.

**Stability**

The measurement system should perform consistently over time. In practice, measurement tools are re-calibrated at regular intervals to assure this stability. In fact, stability can be seen as an aspect of reproducibility.

**Linearity**

The performance should also be equal over the entire range of values that the system could be used to measure. After all, it’s not uncommon for larger values to have larger deviations than smaller ones!

**Accuracy – Systematic deviation**

This is the difference between the true value and the average of many repeated measurements. The true value is usually represented by a so-called golden device or sample used for calibration purposes.
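As a minimal sketch of this definition, the systematic deviation can be estimated by averaging repeated readings of a reference sample and subtracting its accepted value. The function name, the sign convention (average minus reference) and the readings below are illustrative assumptions, not data from this article.

```python
from statistics import mean


def systematic_deviation(measurements, reference_value):
    """Average of repeated measurements minus the reference ("true") value.

    Sign convention assumed here: a positive result means the system
    reads too high on average.
    """
    return mean(measurements) - reference_value


# Hypothetical readings of a golden sample with an accepted value of 100.0
readings = [100.2, 100.1, 100.3, 100.2, 100.2]
bias = systematic_deviation(readings, reference_value=100.0)
print(f"systematic deviation: {bias:+.2f}")
```

A large value relative to the area of interest would point towards re-calibration rather than towards a precision problem.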

**Precision – Random deviation**

This is the difference in results when the same item is measured again and again. This random deviation is divided into two categories, repeatability and reproducibility:

*Repeatability* is the spread of repeated measurements under almost identical conditions: the same person, same device, same position, same laboratory, at very short intervals, etc.

*Reproducibility* is the additional spread under different conditions. In most evaluations only one source of variation (the largest) is investigated. Typically this is the operator: the person executing the measurements. Other potentially large sources of variation are differences between measurement tools, laboratories, days, etc.

One experiment to determine these random deviations is to take 10 products or parts and 3 operators (or any other large source of variation), and have each operator measure each part three times. At the end of this article we give an example of such an experiment.
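The sketch below shows one way such an experiment could be analysed, assuming a two-way crossed analysis of variance with an operator-by-part interaction. The function name `gage_rr`, the choice to count the interaction as part of reproducibility, and the synthetic data are all assumptions made for illustration; they are not taken from the article's own analysis.

```python
import numpy as np


def gage_rr(data):
    """Estimate GR&R standard deviations from a crossed study.

    data has shape (parts, operators, repeats).
    """
    data = np.asarray(data, dtype=float)
    p, o, r = data.shape
    grand = data.mean()
    part_means = data.mean(axis=(1, 2))   # one mean per part
    oper_means = data.mean(axis=(0, 2))   # one mean per operator
    cell_means = data.mean(axis=2)        # one mean per part/operator pair

    # Sums of squares of the two-way crossed ANOVA with interaction
    ss_part = o * r * np.sum((part_means - grand) ** 2)
    ss_oper = p * r * np.sum((oper_means - grand) ** 2)
    ss_cell = r * np.sum((cell_means - grand) ** 2)
    ss_inter = ss_cell - ss_part - ss_oper
    ss_error = np.sum((data - cell_means[:, :, None]) ** 2)

    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    # Variance components; negative estimates are truncated at zero
    var_repeat = ms_error
    var_inter = max(0.0, (ms_inter - ms_error) / r)
    var_oper = max(0.0, (ms_oper - ms_inter) / (p * r))
    var_part = max(0.0, (ms_part - ms_inter) / (o * r))
    var_reprod = var_oper + var_inter  # assumption: interaction counted here

    return {
        "repeatability": float(np.sqrt(var_repeat)),
        "reproducibility": float(np.sqrt(var_reprod)),
        "grr": float(np.sqrt(var_repeat + var_reprod)),
        "part": float(np.sqrt(var_part)),
    }


if __name__ == "__main__":
    # Synthetic data: 10 parts, 3 operators, 3 repeats (not real measurements)
    rng = np.random.default_rng(0)
    parts = rng.normal(100.0, 2.0, size=(10, 1, 1))
    opers = rng.normal(0.0, 0.3, size=(1, 3, 1))
    noise = rng.normal(0.0, 0.5, size=(10, 3, 3))
    for name, value in gage_rr(parts + opers + noise).items():
        print(f"{name}: {value:.3f}")
```

With only three operators, the reproducibility estimate rests on very few degrees of freedom, so it should be read as a rough indication rather than a precise figure.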

**And finally…**

The evaluation is a matter of two steps: first, recommending a re-calibration if the systematic deviation is large; and second, assessing the acceptability of the system by comparing the resolution and the established random deviations with the area of interest. Depending on the measured characteristic, areas of interest could be:

- product parameter specifications: specification width, USL – LSL, or
- process variation of a process parameter: the *process spread*, i.e. 99% of the distribution, 5.15 * s
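The two areas of interest can be sketched as small helper functions. The factor 5.15 corresponds to the central 99% of a normal distribution, which is the assumption made below; the function names are hypothetical, and the limits 95 and 105 are borrowed from the example at the end of this article.

```python
from statistics import NormalDist


def spec_width(lsl, usl):
    """Specification width, USL - LSL."""
    return usl - lsl


def process_spread(s, coverage=0.99):
    """Width of the central `coverage` fraction of a normal distribution
    with standard deviation s (assumes normally distributed data)."""
    tail = (1.0 - coverage) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)
    return 2.0 * z * s


print(spec_width(95.0, 105.0))        # 10.0
print(round(process_spread(1.0), 2))  # 5.15
```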

The rules of thumb for assessing the acceptability are:

- Resolution < 10% * Area of interest
- Repeatability and Reproducibility, also called Gage Repeatability and Reproducibility (GR&R) :

Recognizing that measurement systems contribute to the variation observed in product and process parameters, and therefore evaluating these systems, can help avoid wrong decisions, and their potentially disastrous consequences!

**Measurement System Evaluation experiment**

During the verification of a new product design, one of the CTQ (Critical to Quality) parameters is the *length* of one of the key components.

The specifications of the length are 100 ± 5. A measurement system evaluation (MSE) has been carried out on the measurement system, where 10 different key components have all been measured by 3 different operators on 3 consecutive occasions. Table 1 shows the results of these 90 measurements.

Figure 2 shows a graphical representation of these results. The graph is interpreted as follows:

- The first square is the average of the three measurements by operator 1 on component 1. These 3 individual measurements are connected by the green bar.
- The second square is the average for the measurements of operator 2 on component 1, etc.
- The three operator averages are connected by the blue line.
- So the first series of 1, 2 & 3 on the x-axis represents all 9 measurements on component 1.
- Note that the green bars for the measurements of operators 2 and 3 on component 1 are hardly noticeable, due to the small differences in the repeated measurements (see Table 1).

Representing the data in this so-called multi-vari chart clearly reveals the different sources of variation. The green lines show Repeatability, the blue lines show Reproducibility (operator-to-operator variation), and by comparing group with group we can see variation between components.

But before reading on, a quick question: which source of variation is the largest?

Analysis of variance is used to determine the magnitude of these sources of variation, delivering the following results (note that standard deviations are added up quadratically in order to arrive at the overall GR&R):

The resolution of 0.1 is 1% of the area of interest, and hence acceptable.
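The arithmetic behind this check is short: the specification 100 ± 5 gives a width of 10, and 0.1 / 10 = 1%, well below the 10% rule of thumb. A minimal sketch, with hypothetical function names:

```python
def resolution_ratio(resolution, area_of_interest):
    """Resolution as a fraction of the area of interest."""
    return resolution / area_of_interest


def resolution_acceptable(resolution, area_of_interest, limit=0.10):
    """Rule of thumb: resolution must be below 10% of the area of interest."""
    return resolution_ratio(resolution, area_of_interest) < limit


lsl, usl = 95.0, 105.0  # specification 100 ± 5
ratio = resolution_ratio(0.1, usl - lsl)
print(f"{ratio:.0%}", resolution_acceptable(0.1, usl - lsl))  # 1% True
```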

When it comes to the GR&R, it turns out that the within-operator variation (the variation caused by an operator repeating the measurement) is by far the largest. The root cause will need to be investigated, but this could potentially be due to repositioning and realigning the component for the repeated measurement.

The deviation due to the measurement system alone makes up 48.7% of the specification window for this CTQ parameter. And that is, obviously, unacceptable!
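To make the percentage concrete, the sketch below combines standard deviations quadratically and expresses the resulting spread as a share of the specification window. It assumes that the GR&R spread is taken as 5.15 * s and compared with USL – LSL, which is consistent with the areas of interest described earlier but is not spelled out for this example. The function names are hypothetical, and the 3 and 4 in the first line are arbitrary illustration values.

```python
import math


def combine_std(*stds):
    """Add standard deviations quadratically (root sum of squares)."""
    return math.sqrt(sum(s * s for s in stds))


def grr_percent(grr_std, lsl, usl, k=5.15):
    """GR&R spread (k * s) as a percentage of the specification width."""
    return 100.0 * k * grr_std / (usl - lsl)


def implied_grr_std(percent, lsl, usl, k=5.15):
    """Standard deviation that corresponds to a given GR&R percentage."""
    return percent / 100.0 * (usl - lsl) / k


print(combine_std(3.0, 4.0))  # 5.0, not 7.0: variances add, not deviations

# Back-calculation from the 48.7% reported for the specification 100 ± 5
s_grr = implied_grr_std(48.7, 95.0, 105.0)
print(f"implied GR&R standard deviation: {s_grr:.3f}")
```

Under these assumptions, a GR&R standard deviation of just under 1 unit is enough to consume almost half of a 10-unit specification window.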