Evaluating Visual Accuracy in AI-Generated Images of Malaysian-themed Icons

Authors

Azahar Harun

Graphic Design Department, Faculty of Art and Design, Universiti Teknologi MARA Cawangan Melaka, Melaka (Malaysia)

Mohd Zaki Mohd Fadil

Graphic Design Department, Faculty of Art and Design, Universiti Teknologi MARA Cawangan Melaka, Melaka (Malaysia)

Tengku Shahril Norzaimi Tengku Hariffadzillah

Graphic Design Department, Faculty of Art and Design, Universiti Teknologi MARA Cawangan Melaka, Melaka (Malaysia)

Article Information

DOI: 10.47772/IJRISS.2026.10100164

Subject Category: Social science

Volume/Issue: 10/1 | Page No: 2059-2072

Publication Timeline

Submitted: 2026-01-14

Accepted: 2026-01-19

Published: 2026-01-28

Abstract

Generative AI models are becoming widely accessible, enabling users from diverse backgrounds to create unprecedented artwork. However, this accessibility raises questions about how accurately these models portray real-world subjects. This paper therefore examines three prominent generative AI models (Midjourney, DALL-E, and Stable Diffusion) to evaluate how well they generate images of specific Malaysian-themed icons. Images were generated from simple text prompts; the process was recorded, and the outputs were evaluated by an expert panel using the Visual Appeal Rating Scale (VARS), which comprises eight criteria: Reliability, Consistency, Credibility, Professionalism, Aesthetics, Artistry, Harmony, and Balance. The results indicate notable differences in model performance depending on subject complexity. Midjourney ranked first (Overall Mean: 3.25) and performed especially well on culinary imagery, attaining "Near Perfect" expert agreement on the aesthetics of the Nasi Lemak images. Stable Diffusion placed a close second (Overall Mean: 3.23), demonstrating proficiency with intricate structural geometry (Landmarks) and portraiture; however, its higher scores frequently coincided with only "Slight" agreement, indicating considerable subjectivity in the assessment of its technical performance. DALL-E ranked third as a generalist model, yielding balanced but frequently contested ratings among the experts. A significant "Cultural Accuracy Gap" was identified across all models: specific cultural icons (Politician) and intricate architecture (Landmark) were considerably harder to represent than broad subjects (Food). DALL-E was notably unable to depict the Malaysian politician because of ethical concerns.
The study indicates that the existing generative AI models are specialized rather than universal; achieving high visual fidelity necessitates the deliberate selection of the model most appropriate for the specific aesthetic or structural requirements of the assignment.
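
The agreement labels quoted in the abstract ("Slight", "Near Perfect") are the kind of verbal bands usually attached to a kappa-type agreement coefficient. The abstract does not state which statistic or thresholds the study used, so the following is only a minimal sketch, assuming Cohen's kappa for two raters and the commonly used Landis and Koch bands, with "Near Perfect" standing in for their top band. The function names and the ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items (nominal categories)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined,
        # so this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa to a verbal band (assumed thresholds, not confirmed by the paper)."""
    if kappa < 0:
        return "Poor"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Near Perfect"


# Hypothetical ratings from two experts for ten images on one criterion.
expert_1 = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
expert_2 = [1, 1, 2, 2, 2, 2, 3, 3, 3, 1]
kappa = cohen_kappa(expert_1, expert_2)
print(round(kappa, 3), agreement_label(kappa))
```

Cohen's kappa treats scores as unordered categories and handles exactly two raters. For an ordinal rating scale or a larger panel, a weighted kappa, Fleiss' kappa, or Krippendorff's alpha may be the more suitable choice.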

Keywords

Creative arts, Digital images, Generative AI models, Inter-Rater Reliability
