www.dfss.nl

Design for Six Sigma


More factors than observations: what can we do?

In the age of the Internet of Things, connected systems and smartphone apps, the size of datasets is increasing enormously. The size of a dataset can be split into two dimensions: the number of observations and the number of factors (a.k.a. features, variables, parameters, items). Increasing the number of observations typically makes life easier: we get more information, which helps to build valid transfer functions. Increasing the number of factors (recording more information per observation) can also help, since it raises the likelihood that the useful information is recorded. However, it introduces a statistical problem once we have more factors than observations.

Figure 1. The size of a dataset is two dimensional: the number of factors and the number of observations

It is easy to see why this is a problem. Think of trying to fit a quadratic curve through only 2 points (observations). The quadratic model contains 3 coefficients (a constant, a linear and a quadratic coefficient) that we need to fit, which is more than the number of observations. As a result, there are infinitely many quadratic curves that pass exactly through the data, and we cannot select the best one based on prediction error, since each of them has zero error on the observed points. The same is true for a dataset that has more factors than observations. We cannot determine which model is best, because there are infinitely many combinations of coefficient values that fit exactly through the data.
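The two-point example can be checked numerically. The following is a minimal sketch using NumPy; the two data points and the variable names are hypothetical and chosen only for illustration. It computes one exact fit and then shifts the coefficients along the null space of the design matrix, which changes the curve but not the fit at the observed points.

```python
import numpy as np

# Two observations (hypothetical values, for illustration only).
x = np.array([1.0, 3.0])
y = np.array([2.0, 4.0])

# Design matrix for y = a + b*x + c*x**2:
# 2 rows (observations) but 3 columns (coefficients to fit).
X = np.column_stack([np.ones_like(x), x, x**2])

# One exact solution: the minimum-norm least-squares fit.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# X has rank 2, so the last right-singular vector spans its null space.
# Adding any multiple of it changes the coefficients but not the fitted values.
_, _, Vt = np.linalg.svd(X)
null_dir = Vt[-1]

for t in (0.0, 1.0, -2.0):
    candidate = coef + t * null_dir
    max_error = np.abs(X @ candidate - y).max()
    print(f"t={t:+.1f}  coefficients={candidate.round(3)}  max error={max_error:.1e}")
```

Every value of t gives a different quadratic with (numerically) zero error on the two points, so the data alone cannot rank these models.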

Figure 2. All these quadratic curves fit equally well through the 2 data points

In these situations, the traditional regression model fails. Fortunately, there are other options. First of all, we could inspect the structure in the observed data and try to describe all recorded items with fewer dimensions (dimension reduction). This is the idea of, e.g., Principal Component Analysis. However, the structure in the data may not always be described well by a limited number of components. Further, the interpretation of the results may be difficult.
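As an illustration of dimension reduction, here is a minimal sketch of Principal Component Analysis computed with a singular value decomposition in NumPy. The dataset is synthetic and hypothetical: 10 observations of 50 factors that are, by construction, driven by only 2 underlying components plus a little noise. Real data may not have such a clean low-dimensional structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_factors, n_latent = 10, 50, 2  # more factors than observations

# Synthetic data (an assumption for this sketch): every factor is a
# linear mix of 2 hidden components, plus small measurement noise.
latent = rng.normal(size=(n_obs, n_latent))
loadings = rng.normal(size=(n_latent, n_factors))
X = latent @ loadings + 0.05 * rng.normal(size=(n_obs, n_factors))

# PCA: center each factor, then take the SVD of the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained = s**2 / np.sum(s**2)

# Scores: the observations described in the first 2 components only.
scores = U[:, :n_latent] * s[:n_latent]

print("explained variance (first 4 components):", explained[:4].round(3))
print("reduced data shape:", scores.shape)
```

The scores matrix has 2 columns instead of 50, so an ordinary regression on the scores is no longer underdetermined. The price is interpretability: each component is a mixture of all 50 original factors.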

Another option is variable selection: if we can assume that only a few factors influence the response, it makes sense to build a regression model on only those few factors. Of course, the difficulty is to find those factors. Many variable selection techniques exist, such as stepwise regression (start with a few factors, then add or delete factors from the model in a structured way), all-subsets regression (try all possible regression models containing exactly k factors) or selection via lasso regularization (limiting the sum of the absolute values of the coefficients, thereby forcing many coefficients to become 0).
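To make the lasso idea concrete, below is a minimal sketch that solves the penalized form of the problem (squared error plus alpha times the sum of absolute coefficient values) with plain coordinate descent in NumPy. This is an illustrative implementation, not the method of any particular software package. The dataset, the choice of three "true" factors, the penalty alpha and the function names are all hypothetical.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 if it is within it."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_sweeps=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * sum(|b_j|)."""
    n_obs, n_factors = X.shape
    coef = np.zeros(n_factors)
    residual = y.copy()                      # equals y - X @ coef
    col_scale = (X**2).sum(axis=0) / n_obs   # x_j' x_j / n for each factor
    for _ in range(n_sweeps):
        for j in range(n_factors):
            if col_scale[j] == 0.0:
                continue
            # Correlation of factor j with the residual, factor j excluded.
            rho = X[:, j] @ residual / n_obs + col_scale[j] * coef[j]
            new_value = soft_threshold(rho, alpha) / col_scale[j]
            residual += X[:, j] * (coef[j] - new_value)
            coef[j] = new_value
    return coef

# Hypothetical dataset: 20 observations, 100 factors, 3 of which matter.
rng = np.random.default_rng(1)
n_obs, n_factors = 20, 100
X = rng.normal(size=(n_obs, n_factors))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize each factor
true_coef = np.zeros(n_factors)
true_coef[[3, 17, 42]] = [3.0, -2.0, 1.5]
y = X @ true_coef + 0.1 * rng.normal(size=n_obs)
y = y - y.mean()

coef = lasso_coordinate_descent(X, y, alpha=0.5)
selected = np.flatnonzero(coef)
print("selected factors:", selected)
print("their coefficients:", coef[selected].round(2))
```

The soft-threshold step is what sets coefficients exactly to 0: a factor only enters the model if its correlation with the current residual exceeds alpha. In practice alpha is usually tuned, for example by cross-validation, and the selected set can include factors that are not true causes.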

Stepwise regression depends strongly on its starting point, and may end up with a predictive model that is not very good. All-subsets regression tends to obtain better predictive models, but may take enormous amounts of computation time, especially if the number of possible factors is large (over 50). In our experience, lasso regularization works better for the selection of factors, since it does not depend on a starting point and does not require enormous computation time on typical datasets.
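The computational burden of all-subsets regression follows from simple counting: choosing k factors out of p gives "p choose k" candidate models. The short sketch below uses Python's standard library to show how fast this grows; the values of p and k are arbitrary examples.

```python
from math import comb

# Number of candidate models with exactly k factors out of p.
for p in (20, 50, 100):
    for k in (3, 5, 10):
        print(f"p={p:3d}, k={k:2d}: {comb(p, k):,} models")

# Without fixing k, every subset of the p factors is a candidate model.
all_subsets_50 = 2**50
print(f"all subsets of 50 factors: {all_subsets_50:,}")
```

Each candidate requires fitting a regression model, so the count translates directly into computation time.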

Note, however, that all of the aforementioned techniques are exploratory: we try to find a mathematical relation based on observed data; we do not try to confirm a hypothesis. The outcomes should therefore be interpreted with care. Selection of a factor does not imply that this factor actually caused the response to change; it may be a marker for another effect that was the true cause.

13 January, 2016 Erwin Stinstra

