#### Title

An Empirical Comparison of Four Data Generating Procedures in Parametric and Nonparametric ANOVA

#### Date of Award

5-1-2011

#### Degree Name

Doctor of Philosophy

#### Department

Educational Psychology

#### First Advisor

Headrick, Todd

#### Second Advisor

Sheng, Yanyan

#### Abstract

The purpose of this dissertation was to empirically investigate the Type I error and power rates of four data transformations that produce a variety of non-normal distributions. Specifically, the transformations investigated were (a) the g-and-h, (b) the generalized lambda distribution (GLD), (c) the power method, and (d) the Burr families of distributions in the context of between-subjects and within-subjects analysis of variance (ANOVA). The traditional parametric F tests and their nonparametric counterparts, the Kruskal-Wallis (KW) and Friedman (FR) tests, were selected for this investigation. The four data transformations produce non-normal distributions that have either valid or invalid probability density functions (PDFs). Specifically, the data generating procedures will produce distributions with valid PDFs if and only if the transformations are strictly increasing; otherwise, the distributions are considered to be associated with invalid PDFs. As such, the primary objective of this study was to isolate and investigate the behaviors of the four data transformation procedures themselves while holding all other conditions constant (i.e., sample sizes, effect sizes, correlation levels, skew, kurtosis, random seed numbers, etc., all remain the same). The overall results of the Monte Carlo study generally suggest that, when the distributions have valid PDFs, the Type I error and power rates for the parametric (or nonparametric) tests were similar across all four data transformations. It is noted that there were some dissimilar results when the distributions were very skewed and near their associated boundary conditions for a valid PDF. These dissimilarities were most pronounced in the context of the KW and FR tests.
In contrast, when the four transformations produced distributions with invalid PDFs, the Type I error and power rates were more frequently dissimilar for both the parametric F and nonparametric (KW, FR) tests. The dissimilarities were most pronounced when the distributions were skewed and heavy-tailed. For example, in the context of a parametric between-subjects design, four groups of data were generated with (a) sample sizes of 10, (b) standardized effect size of 0.50 between groups, (c) skew of 2.5 and kurtosis of 60, (d) power method transformations generating distributions with invalid PDFs, and (e) g-and-h and GLD transformations both generating distributions with valid PDFs. The power results associated with the power method transformation showed that the F-test (KW test) was rejecting at a rate of .32 (.86). On the other hand, the power results associated with both the g-and-h and GLD transformations showed that the F-test (KW test) was rejecting at a rate of approximately .19 (.26). The primary recommendation of this study is that researchers conducting Monte Carlo studies in the context described herein should use data transformation procedures that produce valid PDFs. This recommendation is important to the extent that researchers using transformations that produce invalid PDFs increase the likelihood of limiting their study to the data generating procedure being used; i.e., Type I error and power results may be substantially disparate between different procedures. Further, it is also recommended that g-and-h, GLD, Burr, and fifth-order power method transformations be used if it is desired to generate distributions with extreme skew and/or heavy tails, whereas third-order polynomials should be avoided in this context.
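
The validity condition stated in the abstract (a transformation yields a valid PDF if and only if it is strictly increasing) can be illustrated for a third-order polynomial of a standard normal variable, p(z) = c0 + c1·z + c2·z² + c3·z³. The following is a minimal sketch, not the dissertation's own code: the function name `cubic_is_strictly_increasing` is hypothetical, and the coefficient sets are made-up values chosen only to show one passing and one failing case. It checks the slightly stricter condition that the derivative is positive for every real z.

```python
def cubic_is_strictly_increasing(c1, c2, c3):
    """Return True if p(z) = c0 + c1*z + c2*z**2 + c3*z**3 has p'(z) > 0 for all real z.

    c0 only shifts the curve, so it plays no role in the check.
    """
    # p'(z) = a*z**2 + b*z + c
    a, b, c = 3.0 * c3, 2.0 * c2, c1
    if a == 0.0:
        # A linear derivative changes sign somewhere, so it must be a positive constant.
        return b == 0.0 and c > 0.0
    # An upward-opening quadratic with a negative discriminant never reaches zero.
    return a > 0.0 and (b * b - 4.0 * a * c) < 0.0


# Hypothetical coefficient sets (c1, c2, c3), for illustration only.
examples = {
    "identity (normal)": (1.0, 0.0, 0.0),
    "mild non-normality": (0.9, 0.1, 0.02),
    "derivative changes sign": (0.2, 0.5, 0.05),
}
for label, (c1, c2, c3) in examples.items():
    print(label, cubic_is_strictly_increasing(c1, c2, c3))
```

A coefficient set that fails this check corresponds to what the abstract calls a distribution associated with an invalid PDF.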

#### Access

This dissertation is Open Access and may be downloaded by anyone.