The purpose of this study was to investigate the Type I error rates of the IRT likelihood ratio (IRT-LR) statistic and the Mantel test in detecting differential item functioning (DIF). A Monte Carlo study with multiple replications was conducted on simulated data sets. The final study design comprised 18 conditions [3 (sample size) x 3 (group mean difference) x 2 (DIF detection method)]. WinGen3 was used to simulate examinee abilities and to generate the response data sets. MULTILOG and DIFAS were used to conduct the IRT-LR and Mantel DIF analyses, respectively. Results indicated that when the groups had equal ability distributions, the Mantel test and the IRT-LR test performed similarly under all testing conditions and controlled Type I error better than when the group distributions differed. Large sample sizes and the presence of a group mean difference tended to inflate the Type I error rates of both DIF detection tests. IRT-LR had higher Type I error rates than the Mantel test under large-sample conditions and under group mean difference conditions.
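The Type I error rate examined here is the proportion of replications in which an item that has no DIF is nevertheless flagged at the nominal level α. The sketch below illustrates that logic for the Mantel test only, in plain Python; it is not the author's procedure, since the study itself used WinGen3, DIFAS, and MULTILOG. It assumes a graded response model for data generation, a hypothetical 10-item test with four score categories and made-up item parameters, matching on the total test score (studied item included), α = .05, and a focal-group mean lowered by `impact` standard deviations. The function names `grm_response`, `mantel_chi2`, and `type_i_error_rate` are illustrative. The statistic follows the usual one-degree-of-freedom Mantel form: the focal group's summed item scores are compared with their expected value within each score stratum, and the squared total difference is divided by the summed variance.

```python
import math
import random


def grm_response(theta, a, thresholds, rng):
    """Draw one graded-response-model item score in 0..len(thresholds)."""
    u = rng.random()
    # P(score >= k) = 1 / (1 + exp(-a * (theta - b_k))). With increasing
    # thresholds these probabilities decrease, so counting how many exceed u
    # yields a score with exactly those cumulative probabilities.
    return sum(u < 1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds)


def mantel_chi2(item_scores, groups, strata):
    """Mantel (1963) chi-square statistic (1 df) for one polytomous item.

    item_scores: studied-item score of each examinee
    groups:      "R" (reference) or "F" (focal) for each examinee
    strata:      matching-variable value (here, total score) of each examinee
    """
    cells = {}
    for y, g, s in zip(item_scores, groups, strata):
        cells.setdefault(s, []).append((y, g))
    sum_f = sum_e = sum_v = 0.0
    for cell in cells.values():
        n = len(cell)
        n_f = sum(1 for _, g in cell if g == "F")
        n_r = n - n_f
        if n < 2 or n_f == 0 or n_r == 0:
            continue  # stratum carries no information on group differences
        s1 = sum(y for y, _ in cell)      # sum of item scores, both groups
        s2 = sum(y * y for y, _ in cell)  # sum of squared item scores
        sum_f += sum(y for y, g in cell if g == "F")  # observed focal sum
        sum_e += n_f * s1 / n                         # its expectation under H0
        sum_v += n_r * n_f * (n * s2 - s1 * s1) / (n * n * (n - 1))
    if sum_v == 0.0:
        return 0.0
    return (sum_f - sum_e) ** 2 / sum_v


def type_i_error_rate(n_per_group, impact, replications=200, alpha=0.05, seed=1):
    """Share of DIF-free replications in which the Mantel test flags item 0."""
    rng = random.Random(seed)
    # Hypothetical 10-item test, four categories (scores 0-3); parameters are
    # identical in both groups, so every flag of item 0 is a Type I error.
    items = [(1.0 + 0.1 * j, [-1.0, 0.0, 1.0]) for j in range(10)]
    flags = 0
    for _ in range(replications):
        scores, groups, totals = [], [], []
        for group, mean in (("R", 0.0), ("F", -impact)):
            for _ in range(n_per_group):
                theta = rng.gauss(mean, 1.0)
                resp = [grm_response(theta, a, b, rng) for a, b in items]
                scores.append(resp[0])
                groups.append(group)
                totals.append(sum(resp))  # matching on total score
        chi2 = mantel_chi2(scores, groups, totals)
        p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi2(1)
        if p_value < alpha:
            flags += 1
    return flags / replications
```

For instance, `type_i_error_rate(500, 0.5)` would estimate the flag rate for a hypothetical 500 examinees per group with a 0.5 SD mean difference; an estimate well above α would indicate inflation of the kind the abstract reports. The IRT-LR test is not sketched because it requires fitting compact and augmented IRT models, which is beyond a short self-contained example.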
Published in: American Journal of Applied Psychology (Volume 5, Issue 6)
DOI: 10.11648/j.ajap.20160506.11
Page(s): 38-43
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2016. Published by Science Publishing Group
Keywords: Differential Item Functioning, Monte Carlo, Polytomous Items, Type I Error
APA Style
Safiye Bilican Demir. (2016). Comparison of DIF Detection Performances of Mantel Test and Likelihood Ratio Test. American Journal of Applied Psychology, 5(6), 38-43. https://doi.org/10.11648/j.ajap.20160506.11
@article{10.11648/j.ajap.20160506.11,
  author = {Safiye Bilican Demir},
  title = {Comparison of DIF Detection Performances of Mantel Test and Likelihood Ratio Test},
  journal = {American Journal of Applied Psychology},
  volume = {5},
  number = {6},
  pages = {38-43},
  doi = {10.11648/j.ajap.20160506.11},
  url = {https://doi.org/10.11648/j.ajap.20160506.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajap.20160506.11},
  year = {2016}
}
TY  - JOUR
T1  - Comparison of DIF Detection Performances of Mantel Test and Likelihood Ratio Test
AU  - Safiye Bilican Demir
Y1  - 2016/11/25
PY  - 2016
N1  - https://doi.org/10.11648/j.ajap.20160506.11
DO  - 10.11648/j.ajap.20160506.11
T2  - American Journal of Applied Psychology
JF  - American Journal of Applied Psychology
JO  - American Journal of Applied Psychology
SP  - 38
EP  - 43
PB  - Science Publishing Group
SN  - 2328-5672
UR  - https://doi.org/10.11648/j.ajap.20160506.11
VL  - 5
IS  - 6
ER  -