Raters' Rating Quality in Assessing Students' Assignment: An Application of Multi-Facet Rasch Measurement
Authors
Jabatan Ilmu Pendidikan, Institut Pendidikan Guru Kampus Tengku Ampuan Afzan (Malaysia)
Article Information
DOI: 10.47772/IJRISS.2025.910000193
Subject Category: Social science
Volume/Issue: 9/10 | Page No: 2330-2342
Publication Timeline
Submitted: 2025-10-07
Accepted: 2025-10-14
Published: 2025-11-07
Abstract
Assessing students' assignments is essential because it reflects students' understanding and achievement. This study evaluates the marking quality of lecturers at the Teacher Education Institute (IPG) using the Multi-Facet Rasch Measurement (MFRM) model. Two hundred thirty-two students from the Postgraduate Diploma in Education (PDPP) program submitted written assignments, which were assessed by experienced lecturers. The analysis was conducted in FACETS 4.1.1 software with three facets: candidates, raters, and assessment criteria. The findings indicate that the instrument demonstrates construct validity and meets the unidimensionality requirement. Rater reliability was high (0.81), and the rater separation index (2.09) exceeded the commonly used threshold of 2.0, indicating stability in the marks awarded by lecturers. However, two raters were flagged, one showing a misfit pattern and the other an overfit pattern, suggesting inconsistencies in their scoring. The Wright map and the analysis of unexpected responses also revealed differences in severity among raters and potential bias. These findings can help the IPG strengthen its monitoring of inter-rater reliability and marking consistency. The study also demonstrates that MFRM provides comprehensive, quantitative evidence for analyzing rater consistency. MFRM is thus a suitable alternative for overcoming the limitations of Classical Test Theory (CTT), particularly in analyses involving multiple raters.
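As context for the statistics reported above, the three-facet form of MFRM used in this study (candidates, raters, criteria) is conventionally written as a logit model. The formulation below is the standard Linacre parameterization, given here for orientation; the symbols are conventional notation rather than the authors' own:

$$\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k$$

where $B_n$ is the ability of candidate $n$, $D_i$ the difficulty of assessment criterion $i$, $C_j$ the severity of rater $j$, and $F_k$ the threshold of rating category $k$ relative to category $k-1$. The reported figures are also mutually consistent under the usual Rasch relation between separation $G$ and reliability $R$, $R = G^2/(1+G^2)$: a rater separation of $G = 2.09$ gives $R = 2.09^2/(1+2.09^2) \approx 0.81$, matching the quoted rater reliability.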
Keywords
Rating quality, lecturers, MFRM