AI & ML Enabled Video Analysis and Interpretation
Authors
B. Tech (CSE) Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)
B. Tech (CSE) Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)
B. Tech (CSE) Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)
B. Tech (CSE) Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)
Project Supervisor, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida, UP (India)
Project Supervisor, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida, UP (India)
Article Information
DOI: 10.51584/IJRIAS.2025.10120067
Subject Category: Computer Science
Volume/Issue: 10/12 | Page No: 801-809
Publication Timeline
Submitted: 2025-12-24
Accepted: 2025-12-29
Published: 2026-01-16
Abstract
Video content is now pervasive on learning platforms, in business settings, and across social media, and analyzing it by hand has become impractical at that scale. This paper describes a framework that uses AI and machine learning to simplify video understanding, whether the user uploads their own footage or shares a link to an online video.
The system analyzes the visual content on screen together with the audio track and combines both into a coherent summary. It uses a Transformer-based model to capture how different moments in a video relate to one another and what they mean in context. Once the summary is generated, a lightweight language model lets the user converse about the video, asking questions and receiving answers that reflect its content.
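To make the idea of relating different moments in a video more concrete, the following is a minimal sketch of single-head self-attention over fused audio-visual features, followed by a simple top-k segment selection. It is an illustration under stated assumptions, not the framework described in this paper: the function names, the feature dimensions, the concatenation-based fusion, the identity projections, and the random placeholder features are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(visual, audio):
    """Concatenate per-segment visual and audio features (assumed fusion scheme)."""
    return np.concatenate([visual, audio], axis=1)

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections.

    A trained Transformer would use learned query/key/value projections;
    they are omitted here to keep the sketch short.
    """
    d = x.shape[1]
    weights = softmax(x @ x.T / np.sqrt(d), axis=-1)  # (T, T); each row sums to 1
    return weights @ x, weights

def select_segments(weights, k):
    """Score each segment by the average attention it receives; keep the top k in time order."""
    scores = weights.mean(axis=0)
    top = np.argsort(scores)[-k:]
    return np.sort(top), scores

# Placeholder features for 8 video segments (random stand-ins, not real data).
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 16))  # hypothetical 16-dim visual embedding per segment
audio = rng.normal(size=(8, 4))    # hypothetical 4-dim audio embedding per segment

fused = fuse(visual, audio)
context, weights = self_attention(fused)
selected, scores = select_segments(weights, k=3)
print("segments kept for the summary:", selected)
```

In a full system, the placeholder arrays would be replaced by embeddings from real visual and audio encoders, and the selected segments (or their transcripts) would be passed to a language model to produce the textual summary.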
Keywords
Video Analysis, Video Summarization, Artificial Intelligence, Machine Learning