Deepfake Speech Detection – A Literature Review
Authors
Department of Artificial Intelligence, Reva University, Bengaluru (India)
Article Information
DOI: 10.51584/IJRIAS.2025.10100000118
Subject Category: Computer Science
Volume/Issue: 10/10 | Page No: 1341-1351
Publication Timeline
Submitted: 2025-10-25
Accepted: 2025-10-31
Published: 2025-11-12
Abstract
Deepfake audio technology and its potential for misuse pose significant challenges to information integrity, identity protection, and public trust. This paper reviews detection methods for deepfake speech and their implications. First, we examine the emerging threat of AI-driven scams, particularly the use of Large Language Models (LLMs) to automate voice-based fraud, including phone scams and virtual kidnapping. As these technologies spread, voice cloning can be exploited to deceive victims into revealing sensitive information, undermining public safety and trust.
We also assess the state of deepfake detection through a systematic review of ten key studies, focusing on common feature extraction techniques: Mel-Frequency Cepstral Coefficients (MFCCs), spectrogram-based features, pause characteristics, and advanced deep learning methods. MFCCs remain foundational and are complemented by newer techniques such as spectrogram analysis and deep learning models, yet challenges persist in dataset variability, generalization, and adversarial robustness. Furthermore, ethical concerns about the misuse of deepfake technologies, such as spreading misinformation or violating privacy, highlight the need for a more robust ethical framework. Future research must prioritize hybrid detection systems that combine deep learning with real-time operational capabilities while accounting for the ethical and adversarial aspects of this evolving technology. This dual analysis of threats and detection methods aims to guide the development of more effective, ethically sound detection systems.
This research calls for interdisciplinary collaboration to address both the technical and ethical challenges posed by these advanced AI systems, emphasizing the necessity for diversified datasets, real-time detection, and robust defenses against adversarial threats.
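To make one of the feature families named above concrete, the following is a minimal sketch of how pause characteristics might be computed from a waveform. It is an illustration only and is not taken from any of the reviewed studies: the function names `frame_rms` and `pause_features`, the 20 ms frame size, the RMS silence threshold of 0.01, and the 100 ms minimum pause length are all assumptions chosen for readability.

```python
import math
from typing import Dict, List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def pause_features(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: float = 20.0,       # assumed frame size
    silence_rms: float = 0.01,    # assumed silence threshold (signal in [-1, 1])
    min_pause_ms: float = 100.0,  # assumed shortest run that counts as a pause
) -> Dict[str, float]:
    """Summarise silent runs: how many, how long on average, and what share."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    energies = frame_rms(samples, frame_len)
    min_frames = max(1, math.ceil(min_pause_ms / frame_ms))

    pauses = []  # length of each qualifying pause, in frames
    run = 0
    # The trailing sentinel is never "silent", so it closes a final silent run.
    for energy in energies + [float("inf")]:
        if energy < silence_rms:
            run += 1
        else:
            if run >= min_frames:
                pauses.append(run)
            run = 0

    frame_s = frame_len / sample_rate
    total_pause_s = sum(pauses) * frame_s
    total_s = len(energies) * frame_s
    return {
        "pause_count": float(len(pauses)),
        "mean_pause_s": total_pause_s / len(pauses) if pauses else 0.0,
        "pause_ratio": total_pause_s / total_s if total_s else 0.0,
    }
```

The three values returned here would typically be only a small part of a detector's input, fed to a classifier alongside spectral features such as MFCCs. A fixed energy threshold is the simplest possible choice and would likely need to be adapted to recording conditions in practice.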
Keywords
Deepfake audio, deepfake detection, voice cloning
References
1. P. Gupta, et al., "A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection," IEEE Trans. Audio, Speech, Lang. Process., vol. 32, pp. 1234-1246, 2024.
2. X. Wu, et al., "ASVspoof 2021: Accelerating Progress in Spoofed and Deepfake Speech Detection," Proc. Interspeech, 2021.
3. S. Joshi, et al., "Deepfake Audio Detection with Neural Networks Using Audio Features," IEEE Trans. Audio, Speech, Lang. Process., vol. 30, pp. 567-579, 2022.
4. Patel, et al., "Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion," IEEE Access, vol. 11, pp. 987-1001, 2023.
5. R. Kumar, et al., "AntiDeepFake: AI for Deep Fake Speech Recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 33, pp. 98-110, 2024.
6. L. Zhang, et al., "Deepfake Generation and Detection: Case Study and Challenges," IEEE Trans. Audio, Speech, Lang. Process., vol. 31, pp. 410-423, 2023.
7. F. Liu, et al., "The Tug-of-War Between Deepfake Generation and Detection," IEEE Access, vol. 12, pp. 214-228, 2024.
8. S. Gong, et al., "Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap," IEEE Trans. Audio, Speech, Lang. Process., vol. 29, pp. 745-758, 2021.
9. N. Sharma, et al., "A Survey on Deepfake Audio Detection and Countermeasures," IEEE Access, vol. 11, pp. 950-964, 2024.
10. S. Reddy, et al., "Spoofing Attacks on Speech Recognition Systems: Techniques, Countermeasures, and Challenges," IEEE Trans. Audio, Speech, Lang. Process., vol. 29, pp. 329-340, 2022.
11. ASVspoof 2019: Automatic Speaker Verification Spoofing and Countermeasures Challenge: Evaluation Plan and Baselines, 2019.
12. ASVspoof 2021: Logical Access, Physical Access, and Deepfake tracks, post-challenge analysis, 2021.
13. J.-M. Kim, et al., "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks," 2021.
14. Y. Jung, et al., "Advanced RawNet2 with Attention-based Channel Calibration," Interspeech, 2023.
15. Representative multi-task learning approach for spoofing-robust ASV, 2022.
16. Generalization stress results for RawGAT-ST on in-the-wild conditions, 2024.
17. European Union, "Artificial Intelligence Act," Art. 50 (Deepfake Transparency), 2024.
18. IEEE, "Ethically Aligned Design" and IEEE P7001 Transparency, 2020–2023.
19. NIST, "AI Risk Management Framework 1.0" and "NIST AI 100-4: Synthetic Content," 2023–2024.
20. Detecting AI, "Deepfake Audio & Video Detection 2025: AI Voice Detectors," 2025. [Online]. Available: https://detecting-ai.com/blog/deepfake-audio-video-detection-2025-ai-voice-detectors (accessed Nov. 4, 2025).
21. GAFA, "Deepfake Fraud Case Studies 2025," 2025. [Online]. Available: https://gafa.org.in/deepfake-fraud-case-studies-2025/ (accessed Nov. 4, 2025).