Wednesday, March 10, 2010
   
  A-priori Sample Size for Multiple Regression

Sample size is extremely important in multiple regression analyses. Without an adequately large sample size, a research study may not possess sufficient statistical power to detect a significant effect. When this happens, a researcher may erroneously conclude that no significant effect exists in their study when, in fact, the sample size was simply not large enough to detect the hypothesized effect. It is thus always prudent for a researcher to conduct an a-priori sample size analysis before collecting data for his or her study.

Computing an a-priori sample size for multiple regression requires four input parameters: (1) the alpha (probability) level, (2) the number of predictors in the linear model (not including the intercept), (3) the anticipated effect size (f²), and (4) the desired statistical power level. Once these four input parameters are known, an a-priori sample size for multiple regression can be computed using the following method:

  1. Set the initial value of the denominator degrees of freedom equal to the number of predictors + 1.
  2. Estimate the critical value of the Fisher F-distribution using the number of predictors, the denominator degrees of freedom, and the probability level.
  3. Compute the noncentrality parameter (lambda).
  4. Estimate the value of the noncentral F-distribution using the number of predictors, the denominator degrees of freedom, lambda, and the critical F-value.
  5. Calculate the current model power by computing the cumulative area under the normal curve from zero to the noncentral F-value.
  6. If the observed statistical power is less than the desired power level, increment the denominator degrees of freedom and repeat steps 2 through 5.
  7. The a-priori sample size is equal to the number of predictors + the denominator degrees of freedom + 1.
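
The iterative search described in the steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the calculator's actual implementation: all function names are hypothetical, and for steps 4 and 5 it substitutes a common alternative, taking power as the area of the noncentral F-distribution above the critical F-value, rather than the normal-curve computation described in step 5. It also assumes an effect size greater than zero.

```python
import math

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b), by continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Symmetry keeps the continued fraction in its fast-converging region.
        return 1.0 - betainc(b, a, 1.0 - x)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        # Even and odd coefficients of the continued fraction (modified Lentz).
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            c = 1.0 + num / c
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_front) * h / a

def f_cdf(x, d1, d2):
    """Cumulative distribution function of the central F-distribution."""
    return betainc(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))

def f_critical(alpha, d1, d2):
    """Step 2: critical F-value with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ncf_cdf(x, d1, d2, lam):
    """Noncentral F CDF as a Poisson-weighted sum of incomplete beta terms.

    Assumes lam > 0.
    """
    y = d1 * x / (d1 * x + d2)
    half, total = lam / 2.0, 0.0
    for j in range(2000):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * betainc(d1 / 2.0 + j, d2 / 2.0, y)
        if j > half and weight < 1e-15:
            break
    return total

def regression_power(alpha, predictors, f2, df2):
    """Steps 2 to 5 for one value of the denominator degrees of freedom."""
    n = predictors + df2 + 1
    lam = f2 * n                                   # step 3: lambda = f^2 * n
    crit = f_critical(alpha, predictors, df2)      # step 2
    # Assumption: power is the noncentral F area above the critical value.
    return 1.0 - ncf_cdf(crit, predictors, df2, lam)

def a_priori_sample_size(alpha, predictors, f2, power, max_df2=100000):
    """Steps 1, 6 and 7: increase df2 until the desired power is reached."""
    if f2 <= 0.0:
        raise ValueError("effect size f2 must be positive")
    df2 = predictors + 1                           # step 1
    while regression_power(alpha, predictors, f2, df2) < power:
        df2 += 1                                   # step 6
        if df2 > max_df2:
            raise ValueError("desired power not reached within max_df2")
    return predictors + df2 + 1                    # step 7
```

A call such as `a_priori_sample_size(0.05, 3, 0.15, 0.80)` returns the smallest sample size for which the computed power reaches the desired level. Because the power step here differs from the one described above, the result may differ slightly from the online calculator's.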


FORMULAE

There are several formulae involved in the computation of an a-priori sample size for multiple regression. These formulae are detailed below.

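F-distribution probability density function:

$$ f(x; d_1, d_2) = \frac{1}{x \, B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \sqrt{\frac{(d_1 x)^{d_1} \, d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}} $$

Where d1 and d2 are the numerator and denominator degrees of freedom, x is the F-value at which the density is evaluated, and B is the Beta function.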
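
As an illustration, the F-distribution density can be evaluated with the standard library alone. This is a minimal sketch: the function names `log_beta` and `f_pdf` are hypothetical, and the computation is done in logarithms so that large degrees of freedom do not overflow.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def f_pdf(x, d1, d2):
    """Density of the F-distribution with d1 and d2 degrees of freedom.

    Returns 0.0 for x <= 0; the boundary point x = 0 is not treated
    specially in this sketch.
    """
    if x <= 0.0:
        return 0.0
    # log of sqrt((d1*x)^d1 * d2^d2 / (d1*x + d2)^(d1 + d2))
    log_root = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_root - math.log(x) - log_beta(d1 / 2.0, d2 / 2.0))
```

For d1 = d2 = 2 the density reduces to 1 / (1 + x)², which gives a quick check: `f_pdf(1.0, 2, 2)` should be close to 0.25.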

Noncentrality parameter for the F-distribution (lambda):

$$ \lambda = f^2 \, n $$

Where f² is the effect size and n is the sample size.
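
A short sketch of this computation, expressing the sample size in terms of the number of predictors and the denominator degrees of freedom (n = predictors + denominator degrees of freedom + 1). The function name `noncentrality` is hypothetical.

```python
def noncentrality(f2, predictors, df2):
    """Noncentrality parameter lambda = f^2 * n, with n = predictors + df2 + 1."""
    n = predictors + df2 + 1
    return f2 * n
```

For example, with f² = 0.15, 3 predictors, and 73 denominator degrees of freedom, n is 77 and lambda is 0.15 × 77 = 11.55.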

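Noncentral F-distribution probability density function:

$$ p(f) = \sum_{k=0}^{\infty} \frac{e^{-\lambda/2} \, (\lambda/2)^k}{B\left(\frac{v_2}{2}, \frac{v_1}{2} + k\right) k!} \left(\frac{v_1}{v_2}\right)^{\frac{v_1}{2} + k} \left(\frac{v_2}{v_2 + v_1 f}\right)^{\frac{v_1 + v_2}{2} + k} f^{\frac{v_1}{2} - 1 + k} $$

Where v1 and v2 are the numerator and denominator degrees of freedom, λ is the noncentrality parameter, f is the Fisher F-value, and B is the Beta function.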
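
The noncentral F density is an infinite series, so in practice it is truncated once the terms become negligible. The following is a minimal sketch of one way to do that; the function names, the truncation rule, and the default limits are assumptions made for illustration.

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def ncf_pdf(f, v1, v2, lam, max_terms=2000, tol=1e-16):
    """Noncentral F density, summing the series until terms are negligible."""
    if f <= 0.0:
        return 0.0
    half = lam / 2.0
    total = 0.0
    for k in range(max_terms):
        # Log of the Poisson weight e^(-lam/2) * (lam/2)^k / k!
        if half > 0.0:
            log_w = -half + k * math.log(half) - math.lgamma(k + 1)
        elif k == 0:
            log_w = 0.0          # lam = 0: only the k = 0 term survives
        else:
            break
        log_term = (log_w
                    - log_beta(v2 / 2.0, v1 / 2.0 + k)
                    + (v1 / 2.0 + k) * math.log(v1 / v2)
                    + ((v1 + v2) / 2.0 + k) * math.log(v2 / (v2 + v1 * f))
                    + (v1 / 2.0 - 1.0 + k) * math.log(f))
        term = math.exp(log_term)
        total += term
        # Stop once past the Poisson mode and the terms no longer matter.
        if k > half and term < tol * total:
            break
    return total
```

With λ = 0 the series collapses to its first term, which is the central F-distribution density, so `ncf_pdf(1.0, 2, 2, 0.0)` should be close to 0.25.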

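Normal curve cumulative distribution function:

$$ \Phi(x) = \frac{1}{2} \left[ 1 + \operatorname{erf}\left( \frac{x - \mu}{\sigma \sqrt{2}} \right) \right] $$

Where μ is the mean, σ is the standard deviation, and erf is the Error function.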
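
Python's standard library provides the Error function as `math.erf`, so the normal cumulative distribution function can be sketched in a few lines. The function name `normal_cdf` and its default arguments (the standard normal curve) are assumptions made for illustration.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative area under the normal curve from minus infinity to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
```

The area between two points, such as from zero to some value x, is the difference of two such calls: `normal_cdf(x) - normal_cdf(0.0)`.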

REFERENCES

Cohen, J., Cohen, P., West, S.G., and Aiken, L.S. (2003) "Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd Edition)", Lawrence Erlbaum Associates, Mahwah, NJ
Cohen, J. (1988) "Statistical Power Analysis for the Behavioral Sciences (2nd Edition)", Lawrence Erlbaum Associates, Hillsdale, NJ
 
 
 
 
Copyright © 2007-2010 by Daniel S. Soper. All rights reserved.