Raters' Rating Quality in Assessing Students' Assignment: An Application of Multi-Facet Rasch Measurement

Authors

Chan Kuan Loong

Jabatan Ilmu Pendidikan, Institut Pendidikan Guru Kampus Tengku Ampuan Afzan (Malaysia)

Abdul Jalil Mohamad

Jabatan Ilmu Pendidikan, Institut Pendidikan Guru Kampus Tengku Ampuan Afzan (Malaysia)

Mohd Syafiq bin Zainuddin

Jabatan Ilmu Pendidikan, Institut Pendidikan Guru Kampus Tengku Ampuan Afzan (Malaysia)

Navindran a/l Ramanujan

Jabatan Ilmu Pendidikan, Institut Pendidikan Guru Kampus Tengku Ampuan Afzan (Malaysia)

Ikmal Hisham bin Mat Idera

Jabatan Ilmu Pendidikan, Institut Pendidikan Guru Kampus Tengku Ampuan Afzan (Malaysia)

Article Information

DOI: 10.47772/IJRISS.2025.910000193

Subject Category: Social science

Volume/Issue: 9/10 | Page No: 2330-2342

Publication Timeline

Submitted: 2025-10-07

Accepted: 2025-10-14

Published: 2025-11-07

Abstract

Assessing students' assignments is essential because it reflects students' understanding and achievement. This study evaluates the marking quality of lecturers at the Teacher Education Institute (IPG) using the Multi-Facet Rasch Measurement (MFRM) model. Two hundred thirty-two students from the Postgraduate Diploma in Education (PDPP) program submitted written assignments, which were assessed by experienced lecturers. The analysis was conducted with the FACETS 4.1.1 software and involved three facets: candidates, raters, and assessment criteria. The findings indicate that the instrument is valid in terms of construct and meets the unidimensionality requirement. Rater reliability was high (0.81), and the rater separation index (2.09) exceeded the accepted threshold, indicating stability in the marks awarded by lecturers. However, two raters showed misfit and overfit patterns respectively, suggesting inconsistency in their scoring. The Wright map and the analysis of unexpected responses also revealed differences in severity among raters and potential bias. These findings can help the IPG strengthen its monitoring of inter-rater reliability and marking consistency. The study also demonstrates that MFRM provides comprehensive, quantitative evidence for analyzing rater consistency, making it a suitable alternative to the Classical Test Theory (CTT) statistical model, particularly in analyses involving multiple raters.
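
For readers less familiar with the model, the three-facet MFRM described above has a standard formulation (following Linacre's many-facet extension of the Rasch model; the notation below is conventional and not taken from the article itself):

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - C_j - D_i - F_k

where B_n is the ability of candidate n, C_j is the severity of rater j, D_i is the difficulty of assessment criterion i, and F_k is the difficulty of the step from rating category k-1 to category k. The rater severity estimates, fit statistics, and the separation and reliability indices reported above are all derived from this model.

A FACETS analysis of such a design is driven by a plain-text specification file. The fragment below is a minimal sketch of such a file for the candidate-rater-criterion design; the title, element labels, rating-scale range (R10), and data lines are illustrative assumptions, not the study's actual specification.

Title = PDPP assignment marking quality   ; illustrative title
Facets = 3                                ; candidates, raters, assessment criteria
Positive = 1                              ; higher measure = more able candidate
Noncenter = 1                             ; candidate facet floats; raters and criteria centered at 0
Models = ?,?,?,R10                        ; any candidate-rater-criterion combination; 0-10 scale is an assumed range
Labels =
1, Candidates
1 = Candidate001
2 = Candidate002
*
2, Raters
1 = Rater01
2 = Rater02
*
3, Criteria
1 = Content
2 = Organization
*
Data =
1, 1, 1, 7                                ; candidate 1, rater 1, criterion 1, score 7
1, 1, 2, 6
2, 2, 1, 8

Each data line pairs one candidate, one rater, and one criterion with the awarded score; from these observations FACETS estimates all facet measures jointly, which is what allows rater severity and misfit to be separated from candidate ability.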

Keywords

Rating quality, lecturers, MFRM
