R&R (repeatability & reproducibility) studies are performed to learn about the performance of a measurement system, in terms of the variation you will see if you measure a product over and over again. This variation can have a single source, such as short-term variation, or multiple sources, such as short-term, long-term and variation due to different operators.
Products are often a factor, too. But since an R&R study is a study of measurement variation, knowledge of the product effect is not our primary concern here. Different products are only included because we tend to want to check whether the measurement variation differs from product to product (i.e. to study interactions between products and any other factor in the model), which is why the products used should vary widely.
For the moment, let's assume that measurement variation does not depend on product characteristics. To characterise measurement variation we could, as an experiment, take a random product (measurement vehicle) and measure it over and over again. The variation (standard deviation or variance) we see between those measurements is the main objective of our study. The next step is to compare the standard deviation(s) with, for example, the tolerance width W and calculate R&R = 6s/W * 100%, the so-called Gage R&R, or acceptance level.
A well-known criterion for this Gage R&R is: ‘accept’ if R&R<10%. This means that the variation in the measurement is small compared to the variation that is allowed in the products. But what if we find a value >10%, or worse still >>10%?
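To make this concrete, here is a minimal sketch in Python (not from the study itself; the measurement values and the tolerance width are invented for illustration) that estimates the Gage R&R from repeated measurements of one product and applies the 10% criterion:

```python
# Minimal sketch: Gage R&R from n repeated measurements of a single product.
# The data and the tolerance width W below are invented for illustration.
import statistics

measurements = [10.02, 9.98, 10.05, 9.97, 10.01, 10.03, 9.99, 10.04]
W = 1.0  # tolerance width (USL - LSL), assumed

s = statistics.stdev(measurements)  # sample standard deviation of the repeats
gage_rr = 6 * s / W * 100           # R&R = 6s/W * 100%

print(f"s = {s:.4f}, Gage R&R = {gage_rr:.1f}%")
print("accept" if gage_rr < 10 else "R&R too large: consider improving the system")
```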
One solution often used is to develop or buy a new measurement system, one that will have a Gage R&R <10%. There is, however, a second option that is often overlooked: do more measurements and take the average of those measurements as the output of the measurement system.
A simple example. Assume first that measurement variation is not a function of the product measured, so the R&R study is simply to "measure a random product over and over again (say, n times)". If S is the standard deviation of these measurements, and W the tolerance width, then G(1) = 6*S/W is the estimate of the Gage R&R if we use a single measurement as our measurement tool output.
However, if we decide to use the average of m measurements on the same product as our measurement tool output, the Gage R&R improves to G(m) = 6*S/(W*√m). For m = 9 this means an improvement of 1 - 1/√9, or 67%! The proof is given by a basic theorem in statistics, which states that the variance of the mean of m independent observations is σ²/m, with σ² the variance of a single observation.
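The effect of averaging is easy to check numerically. The sketch below (the single-measurement standard deviation S and tolerance width W are assumed values) tabulates G(m) and the improvement 1 - 1/√m for a few choices of m:

```python
# Sketch of the averaging effect: G(m) = G(1)/sqrt(m). Values are illustrative.
import math

def gage_rr(s: float, W: float, m: int = 1) -> float:
    """Gage R&R (%) when the reported value is the mean of m repeats."""
    return 6 * s / (W * math.sqrt(m)) * 100

s, W = 0.05, 1.0  # assumed single-measurement sd and tolerance width
for m in (1, 4, 9):
    improvement = (1 - 1 / math.sqrt(m)) * 100
    print(f"m={m}: G(m) = {gage_rr(s, W, m):.1f}%  (improvement {improvement:.0f}% vs m=1)")
```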
A more complex example. Very often we see that the measurement error of a measurement system depends on the person doing the measuring. A statistical model that describes the output y_ij for a single given product is:

y_ij = μ + α_i + ε_ij,

with μ the general mean of the measurement, α_i ~ N(0, σ_α²) the contribution of operator i to the measurement, and ε_ij ~ N(0, σ²) the random noise present at the jth measurement of operator i.
A standard Gage R&R estimates the variance components, and assumes that we will use a single measurement, taking a random operator and asking him/her to measure our product just once. The standard deviation of that single measurement is then equal to √(σ_α² + σ²) and the Gage R&R can be calculated if the tolerance width W is known. If the Gage R&R is sufficient, then… happy days! But if it isn't, we could define the measurement system output as the average of the observations taken from I operators with J repeats per operator. It can be shown that the standard deviation of that new (averaged) measurement value is equal to

√(σ_α²/I + σ²/(I*J)),

since the operator effect is averaged over I operators and the noise over all I*J measurements. This can be made as small as one likes simply by increasing I and J. By doing so, the Gage R&R can be improved to a value that is sufficiently small, and a variety of (I, J) combinations are often possible, in which case it is smart to choose the cheapest option available.
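To illustrate the whole procedure, the sketch below (the operator data and the tolerance width are hypothetical) estimates σ_α² and σ² from a small one-way random-effects layout via ANOVA, and then evaluates the Gage R&R of the averaged output for several (I, J) choices:

```python
# Sketch: estimate the variance components of the model y_ij = mu + alpha_i + eps_ij
# by one-way ANOVA, then compute the Gage R&R when the output is an average
# over I operators x J repeats. Data and tolerance width W are invented.
import math

W = 1.0
data = {  # 3 repeats by each of 3 operators on the same product (hypothetical)
    "op1": [10.02, 10.04, 10.03],
    "op2": [ 9.96,  9.98,  9.97],
    "op3": [10.01, 10.00, 10.02],
}

I = len(data)
J = len(next(iter(data.values())))
grand = sum(sum(reps) for reps in data.values()) / (I * J)
op_means = {op: sum(reps) / J for op, reps in data.items()}

ms_between = J * sum((m - grand) ** 2 for m in op_means.values()) / (I - 1)
ms_within = sum((y - op_means[op]) ** 2
                for op, reps in data.items() for y in reps) / (I * (J - 1))

var_eps = ms_within                              # sigma^2 (repeatability)
var_op = max((ms_between - ms_within) / J, 0.0)  # sigma_alpha^2 (operator effect)

def rr_averaged(i_ops: int, j_reps: int) -> float:
    """Gage R&R (%) for the mean over i_ops operators with j_reps repeats each."""
    sd = math.sqrt(var_op / i_ops + var_eps / (i_ops * j_reps))
    return 6 * sd / W * 100

for i_ops, j_reps in [(1, 1), (2, 2), (3, 3)]:
    print(f"I={i_ops}, J={j_reps}: Gage R&R = {rr_averaged(i_ops, j_reps):.1f}%")
```

With these (invented) numbers, the single-measurement R&R of about 19% drops to roughly 11% when three operators each measure three times, showing how replication can rescue an otherwise unacceptable measurement system.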
Conclusion. Gage R&R studies are often carried out assuming that a measurement system can only deliver its standard (single) measurement result. By making use of (smart) replications, and then taking the average of those results, the standard error of this averaged result can be decreased, improving the Gage R&R. So if the cost of doing replications is acceptable, there is no need to buy or develop a new measurement system.