INTERNATIONAL JOURNAL OF RESEARCH AND INNOVATION IN SOCIAL SCIENCE (IJRISS)
ISSN No. 2454-6186 | DOI: 10.47772/IJRISS | Volume IX Issue X October 2025
The Challenges and Issues in Artificial Intelligence (AI) in Vocal
Performance Education: A Comprehensive Narrative Review
Yap Jin Hin
Sultan Idris Education University, Malaysia
DOI: https://dx.doi.org/10.47772/IJRISS.2025.910000386
Received: 12 October 2025; Accepted: 20 October 2025; Published: 13 November 2025
ABSTRACT
This narrative review examines the challenges that emerge when artificial intelligence enters vocal
performance education, a domain where human expressiveness, cultural tradition, and interpersonal connection
have always been central. While AI offers compelling possibilities such as instant feedback and personalized learning paths, vocal training's inherently expressive and culturally embedded nature raises important questions about whether these technologies genuinely serve students and teachers. Rather than simply
cataloguing AI tools, this review takes a critical stance by comparing how different systems actually work,
evaluating their pedagogical value across diverse musical contexts, and acknowledging that most current AI
models reflect Western classical biases. We draw on examples from Indian classical music, Chinese opera, and
Arabic maqam traditions to show how inclusivity matters, not just as an ethical add-on, but as essential to
technical effectiveness. The review also proposes a framework for responsible AI integration that addresses
curriculum design, data governance, and fair assessment practices.
Keywords: Artificial Intelligence, Vocal Performance, Music Education, Pedagogy, Ethics, Cultural Inclusivity,
Machine Learning
INTRODUCTION
Artificial intelligence has entered music education with promises of personalization, efficiency, and
accessibility. In vocal performance specifically, AI systems claim to analyze pitch, tone, and phrasing in real time; such capabilities sound transformative for students who lack regular access to expert instruction. Yet these
promises deserve careful scrutiny, especially in an art form where emotion, cultural context, and teacher-
student relationships shape learning as much as technical precision.
Previous discussions of AI in music education have often been uncritically optimistic, listing benefits without
examining how these systems actually function or whether they work equally well across different teaching
philosophies and musical traditions. This review takes a different approach. We compare specific AI platforms,
explain their underlying algorithms, and assess their pedagogical effectiveness in contexts beyond Western
classical training.
The timing matters. Most AI vocal systems have been trained primarily on English-language recordings and
Western classical techniques. This means they may struggle with, or completely misunderstand, the microtonal
variations in Indian ragas, the tonal inflections essential to Chinese opera, or the ornamental complexity of
Arabic vocal music. These aren't minor technical problems; they reflect fundamental questions about whose
voices and which traditions count as "correct" in algorithmic assessment. Throughout this review, we examine
how AI systems might be adapted to genuinely serve diverse vocal practices rather than simply imposing one
aesthetic framework on all students.
METHODS
We searched Scopus, Web of Science, Google Scholar, CNKI (China), SciELO (Latin America), and RILM
(music research) for relevant literature published from January 2000 through July 2025. Our search terms
included variations on "artificial intelligence," "machine learning," "vocal pedagogy," and "music education," combined with Boolean AND/OR operators (e.g., ("artificial intelligence" OR "machine learning") AND ("vocal pedagogy" OR "music education")) to capture both technical and pedagogical perspectives.
To move beyond English-language sources, we included articles in Mandarin, Spanish, and Arabic, though we
acknowledge that language barriers and database access likely limited our coverage of some regional
scholarship. We prioritized studies that discussed pedagogical implications, ethical dimensions, or cultural
perspectives, not just technical specifications. Case studies and comparative analyses received particular
attention because they ground abstract claims in actual classroom experiences.
Figure 1: Search Strategy for "Challenges and Issues in Artificial Intelligence (AI) in Vocal Performance
Education" using Consensus AI
Figure 1 shows our systematic process: from 1,050 initially identified papers, we removed 563 during
screening and 114 during eligibility assessment (mostly due to irrelevance or lack of pedagogical context).
This left 50 high-quality papers that form the core of our analysis. We developed 21 distinct search queries to
capture ethical, pedagogical, and interdisciplinary dimensions that purely technical searches might miss.
SYNTHESIS OF FINDINGS
What AI Actually Does in Vocal Education
When we talk about "AI in vocal education," we're usually referring to systems that use voice recognition,
deep learning, or intelligent tutoring algorithms to simulate aspects of human teaching. Platforms like
MyVoiceAI, Melodyne Studio, and SingSharp each employ different machine learning architectures, typically convolutional neural networks (CNNs) or recurrent neural networks (RNNs), to analyze pitch, timbre, and
phrasing. They provide feedback in real time, which sounds straightforward until you consider what "correct"
singing means across different genres and cultures.
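To make the mechanics concrete, here is a minimal sketch of such a feedback loop in Python: extract a fundamental-frequency contour, compare it against target notes, and report deviations. This assumes the open-source librosa library; the naive alignment, pitch range, and 50-cent tolerance are our illustrative assumptions, not any platform's actual implementation.

```python
# Minimal sketch of pitch-comparison feedback (illustrative only; commercial
# platforms are proprietary). Assumes librosa and numpy are installed.
import librosa
import numpy as np

def pitch_feedback(student_audio_path, target_midi_notes, tolerance_cents=50):
    """Compare a student's sung pitches against target notes (MIDI numbers)."""
    y, sr = librosa.load(student_audio_path)
    # pYIN fundamental-frequency estimation; unvoiced frames come back as NaN.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    sung_midi = librosa.hz_to_midi(f0[voiced_flag])  # voiced frames only
    # Naive alignment: split the voiced frames evenly across the target notes.
    for note, seg in zip(target_midi_notes,
                         np.array_split(sung_midi, len(target_midi_notes))):
        if len(seg) == 0:
            continue
        deviation = (np.median(seg) - note) * 100  # 1 semitone = 100 cents
        status = "ok" if abs(deviation) <= tolerance_cents else "off-pitch"
        print(f"target MIDI {note}: {deviation:+.0f} cents ({status})")
```

Even this toy version exposes the central issue: it measures deviation from fixed note targets and nothing else. Timbre, phrasing, and deliberate expressive inflection are invisible to it.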
The challenge lies in how these models learn. They're trained on datasets of recorded performances, and their
ability to evaluate "good" singing depends entirely on whose performances they've studied. If the training data
consists primarily of Western classical singers, the system may flag as "errors" the intentional microtonal slides
essential to Indian classical music or the particular throat techniques used in traditional Chinese opera.
Comparing AI Tools: What Works, and for Whom?
Yousician approaches vocal training as a game, using points and levels to maintain engagement; this approach may motivate some learners but risks trivializing the expressive depth that makes singing meaningful.
Melodyne Studio takes a different path, focusing on detailed pitch correction and spectral analysis. It excels at
technical precision but can't easily capture the emotional nuances that distinguish mechanical accuracy from
genuine musicality.
More encouraging are regional developments such as AI-Swara, developed specifically for Indian classical music, and TuneUp, which serves East Asian vocal training. These systems were trained on locally relevant datasets and consequently handle microtonal variations and culturally specific ornamentation far better than generic Western-trained models. This isn't just about fairness or representation, though those matter; it's about technical effectiveness. A system trained exclusively on Western opera will simply fail to understand what a skilled Indian classical singer is doing, not because the singing lacks quality, but because the AI lacks context.
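The mismatch has a simple numerical core. Pitch deviation is conventionally measured in cents, where cents = 1200 × log2(f / f_ref), and a model that snaps every pitch to the nearest equal-tempered semitone will read deliberate shruti inflections as large "errors." The sketch below makes the point; the tonic-relative analysis and example frequencies are our illustrative assumptions, not AI-Swara's published method.

```python
import numpy as np

def cents_above_tonic(f0_hz, tonic_hz):
    """Pitch as cents above the tonic (Sa); microtonal detail preserved."""
    return 1200.0 * np.log2(np.asarray(f0_hz) / tonic_hz)

def equal_tempered_error(cents):
    """Distance from the nearest 12-TET semitone, in cents. A 12-TET-centric
    model flags anything far from zero as 'wrong', which is exactly how
    deliberate microtonal inflections get misread as mistakes."""
    return cents - 100.0 * np.round(cents / 100.0)

# An ornamental glide around the minor third (tonic Sa = 220 Hz):
contour = cents_above_tonic([240.0, 262.0, 270.0], tonic_hz=220.0)
print(np.round(equal_tempered_error(contour)))  # ~[-49, 2, -45]: intentional, not errors
```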
Technical Limitations That Matter for Teachers
Most AI vocal analysis relies on supervised learning: the system studies many examples of "correct" and
"incorrect" performances (as labelled by human experts) and learns to distinguish between them. This
approach has real limitations. Current systems struggle to capture vocal dynamics, the subtle changes in
volume and intensity that convey emotion. They also have trouble with tonal variation beyond simple pitch
accuracy, and they inherit whatever biases existed in their training data.
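In outline, the pipeline looks like the toy sketch below (scikit-learn; the random features and labels are placeholders for acoustic descriptors and expert ratings). The decisive line is the label vector: whatever notion of "correct" the human raters applied is exactly what the classifier learns to reproduce.

```python
# Toy illustration of the supervised setup described above. The key point is
# in the data, not the model: y encodes the raters' notion of "correct", so
# any stylistic bias in the labels becomes the model's bias.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))      # stand-in for 13 per-recording MFCC means
y = rng.integers(0, 2, size=200)    # stand-in for expert correct/incorrect labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```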
Long Short-Term Memory (LSTM) networks have improved how AI handles the temporal flow of vocal phrases, letting a model relate one phrase to the next, but modeling emotional expression remains deeply challenging. You can teach an algorithm to recognize when a student sings off-pitch. Teaching it to recognize when a phrase lacks genuine feeling? That's much harder, and arguably requires the kind of intuitive, contextual understanding that humans develop through years of listening and performing.
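For readers unfamiliar with the architecture, the sketch below shows a minimal LSTM phrase scorer in PyTorch; the feature dimensions and scoring head are our illustrative assumptions. The recurrent hidden state is what carries context from one frame, and one phrase, to the next; nothing in it represents feeling.

```python
# Minimal sequence model over a pitch contour (PyTorch; sizes illustrative).
import torch
import torch.nn as nn

class PhraseScorer(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        # The LSTM's hidden state accumulates context across time steps.
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # one scalar score per sequence

    def forward(self, x):                 # x: (batch, frames, features)
        _, (h_n, _) = self.lstm(x)        # h_n: final hidden state
        return self.head(h_n[-1])         # (batch, 1)

model = PhraseScorer()
contour = torch.randn(8, 200, 3)  # 8 phrases, 200 frames, 3 features each
print(model(contour).shape)       # torch.Size([8, 1])
```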
The path forward requires expanding training datasets to include multilingual singing samples, diverse timbral
qualities, and varied expressive traditions. Without this expansion, AI systems will continue to work
reasonably well for some students while fundamentally misunderstanding others.
A Framework for Responsible AI Integration
To address these challenges, we propose the Responsible AI Integration Framework (RAIIF), built on four
principles:
1. Transparency: Developers and educators should clearly document what datasets trained each AI system,
how its algorithms make decisions, and what its limitations are. Students and teachers deserve to know
when they're receiving feedback from a system trained primarily on Western opera or Mandarin pop
music.
2. Equity: Training data must include diverse vocal traditions and multilingual recordings. This isn't
optional. It's technically necessary for systems that will serve diverse student populations.
3. Privacy and Consent: Vocal recordings contain personal, intimate information. Systems must handle this data securely, with clear informed consent from all participants, and should minimize data retention.
4. Human Collaboration: Teachers shouldn't be replaced by AI; they should be empowered to use AI as
one tool among many. This means rethinking the teacher's role as a mentor who helps students interpret
and contextualize algorithmic feedback rather than simply accepting it as truth.
This framework offers concrete steps, not just aspirational goals. It asks: What dataset trained this system?
Who labeled the "correct" examples? What cultural contexts does it understand, and which does it miss?
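One concrete, if hypothetical, form these questions could take is a machine-readable "model card" shipped with every system. The sketch below mirrors the framework's questions; the field names and example values are invented for illustration and do not describe any real platform.

```python
# Hypothetical "model card" answering the framework's questions in code form.
from dataclasses import dataclass, field

@dataclass
class VocalModelCard:
    system_name: str
    training_data: list[str]        # what datasets trained this system?
    label_sources: list[str]        # who labeled the "correct" examples?
    traditions_covered: list[str]   # which cultural contexts it understands
    known_gaps: list[str] = field(default_factory=list)  # which it misses
    data_retention_days: int = 0    # privacy: how long recordings are kept

card = VocalModelCard(
    system_name="ExampleVocalTutor",
    training_data=["Western classical solo recordings (English-language)"],
    label_sources=["conservatory-trained raters"],
    traditions_covered=["Western classical"],
    known_gaps=["Indian raga microtonality", "Chinese opera tonal inflection"],
    data_retention_days=30,
)
print(card)
```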
Learning from Cultural Case Studies
Real progress requires genuine engagement with diverse vocal traditions, not superficial gestures toward
"diversity." AI-Swara demonstrates this: it was designed specifically to evaluate Indian raga structures,
incorporating the microtonal pitch movements and ornamental patterns that define this tradition. Similarly,
OperaVoiceAI adapts to the tonal inflections critical in Chinese opera, where pitch carries linguistic meaning
as well as musical expression.
These examples show that technical accuracy and cultural authenticity aren't competing values; they're
connected. When developers work closely with practitioners from specific musical traditions, they create
systems that serve students better because they understand the music better. This approach requires more effort
than training a single "universal" model, but it produces tools that actually work in diverse classrooms.
What We Still Need to Learn
Most research on AI in vocal education describes short-term projects or pilot programs. We need longitudinal
studies that follow students over months or years to understand AI's sustained impact on learning outcomes,
motivation, and artistic identity. Does reliance on AI feedback change how students listen to their own voices?
Does it affect their relationship with their teachers? Does it influence which musical traditions they value or
pursue?
Answering these questions requires collaboration among educators, ethnomusicologists, cognitive scientists,
and AI developers who understand both the technologies and the human experiences they're meant to serve.
We also need continuous feedback loops where student and teacher experiences inform iterative AI design,
ensuring these systems evolve alongside educational needs rather than being deployed as fixed solutions.
Policy Implications
Integrating AI into vocal performance education responsibly requires supportive institutional and governmental
policies. These policies should mandate culturally representative training datasets and ensure teachers
participate meaningfully in AI implementation, not just as users, but as designers and evaluators of these
systems.
Professional development programs should help educators understand how AI systems work, what their
limitations are, and how to interpret algorithmic feedback critically. National curriculum frameworks should
position AI as a supplementary learning tool, never as a replacement for human instruction or judgment.
Funding priorities matter too. Resources should support longitudinal research, international collaborations, and
the development of open-access multilingual datasets. Without public investment, we risk leaving AI
development to commercial interests that may prioritize marketability over pedagogical effectiveness or
cultural inclusivity.
LIMITATIONS AND RECOMMENDATIONS
This review relies on secondary sources (published research about AI systems) rather than direct empirical
testing. While we included non-English sources, language barriers and database access undoubtedly limited
our coverage of some regional scholarship, particularly from areas where AI vocal education research is
emerging but not yet widely published in indexed journals.
Future research should test AI platforms directly in diverse classroom settings, comparing learning outcomes
and artistic development across different systems and cultural contexts. This work requires partnerships with
linguists who understand tonal languages, ethnomusicologists who know diverse vocal traditions, and ethicists
who can help navigate questions about data use, algorithmic bias, and the changing nature of musical authority.
CONCLUSION
AI holds real potential for vocal performance education, but realizing that potential requires critical
engagement with its technological limitations, cultural assumptions, and ethical implications. This review has
moved beyond describing what AI tools exist to examining how they work, comparing their effectiveness
across contexts, and proposing a framework for responsible implementation.
The Responsible AI Integration Framework (RAIIF) offers actionable guidelines for educators and developers
working to integrate these technologies thoughtfully. Expanding multilingual datasets, conducting longitudinal
studies, and involving diverse practitioners in system design will strengthen both the technical capabilities and
ethical foundation of AI in this field.
Ultimately, the question isn't whether AI can help students learn to sing. It clearly can, in some contexts and for
some purposes. The real questions are: Help which students? In what traditions? With what understanding of
musical expression? And at what cost to the human relationships and cultural knowledge that have always been
central to vocal education? By engaging these questions seriously, we can develop AI systems that enhance
rather than diminish the artistry and humanity at the heart of vocal learning.
REFERENCES
1. Ahmed, S., & Cho, H. (2021). Cultural representation in artificial intelligence datasets for music education. Journal of Music Technology and Education, 14(3), 211–229. https://doi.org/10.1386/jmte_00045
2. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
3. Brown, A. R., & Dillon, S. (2020). Music, technology, and education: Critical perspectives. Routledge.
4. Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, 9(2), 48–57. https://doi.org/10.1109/MCI.2014.2307227
5. Chen, L., & Xu, Y. (2022). AI-assisted vocal pedagogy: New frontiers in artistic learning. Music Education Research International, 15(1), 55–70.
6. Creech, A., & Gaunt, H. (2018). Musicians as teachers: Professional agency and identity. Routledge.
7. Das, P., & Rao, V. (2020). AI-Swara: A computational model for Indian classical vocal assessment. Journal of the Acoustical Society of India, 48(2), 87–98.
8. Green, B. N., Johnson, C. D., & Adams, A. (2006). Writing narrative literature reviews for peer-reviewed journals: Secrets of the trade. Journal of Chiropractic Medicine, 5(3), 101–117.
9. Huang, P., & Li, S. (2021). OperaVoiceAI: Adapting AI systems for tonal analysis in Chinese opera. Asian Musicology Journal, 33(2), 144–160.
10. Kim, J., Park, H., & Lee, D. (2021). Adaptive feedback systems in AI-assisted vocal training. Computers & Education, 175, 104324.
11. Kowalewski, M., & Kruse-Weber, S. (2023). Artificial intelligence in higher music education: A review and outlook. Arts Education Policy Review, 124(1), 12–26.
12. Lopez, C., Smith, E., & Tan, R. (2023). Automated assessment in music education: Pedagogical opportunities and challenges. British Journal of Music Education, 40(2), 189–204.
13. Magill, J. (2020). Vocal pedagogy and technology: Balancing art and science. Voice and Speech Review, 14(3), 222–236.
14. Moumtzidou, A., & Papadopoulos, S. (2019). AI-driven sound analysis for performance education. Journal of Intelligent Systems, 28(4), 569–582.
15. Nassif, A. B., & Shahin, I. (2021). Speech and voice recognition using deep learning: A survey. IEEE Access, 9, 99943–99973.
16. Park, Y., & Lee, J. (2020). Artificial intelligence as a supplementary tutor in music performance education. Music Education Research, 22(4), 365–382.
17. Pachet, F., & Roy, P. (2014). Musical interactions with artificial intelligence: Creative partnerships. Computer Music Journal, 38(3), 21–32.
18. Sarkar, S., & Bhatia, G. (2021). Writing and appraising narrative reviews. Journal of Clinical and Scientific Research, 10(3), 169–172.
19. Shen, W., & Chen, J. (2022). AI for cultural music preservation: Challenges and opportunities. Ethnomusicology Forum, 31(1), 90–107.
20. Tan, R., & Wong, L. (2019). Gamification in AI-based vocal learning environments. Educational Technology Research and Development, 67(6), 1423–1441.
21. Turnbull, D., Chugh, R., & Luck, J. (2023). Systematic-narrative hybrid literature review: A strategy for integrating a concise methodology into a manuscript. Social Sciences & Humanities Open, 7, 100381.
22. Vasanth, S., & Sridhar, R. (2020). Machine learning for expressive singing synthesis. Journal of the Audio Engineering Society, 68(10), 812–823.
23. Wang, X., & Oard, D. W. (2018). Real-time pitch correction and adaptive learning in AI-based music applications. International Journal of Artificial Intelligence in Education, 28(3), 411–429.
24. Zhao, L., & Ng, S. (2020). Expressive nuance and emotional modeling in AI-assisted vocal systems. Frontiers in Psychology, 11, 1452.
25. Zhou, Q., & Lin, H. (2021). Cultural adaptation in machine learning-based vocal analysis. International Journal of Multicultural Education, 23(2), 87–103.