AI & ML Enabled Video Analysis and Interpretation

Vivek Chauhan; Vivek Sharma; Yash Rajput; Shani Rathore; Mr. Suman Kumar Jha; Badal Bhushan

doi:10.51584/IJRIAS.2025.10120067

Authors

Vivek Chauhan

B. Tech (CSE), Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)

Vivek Sharma

B. Tech (CSE), Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)

Yash Rajput

B. Tech (CSE), Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)

Shani Rathore

B. Tech (CSE), Final Year Student, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida (India)

Mr. Suman Kumar Jha

Project Supervisor, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida, UP (India)

Badal Bhushan

Project Supervisor, Dept. of Computer Science & Engineering, IIMT College of Engineering, Greater Noida, UP (India)

Article Information

DOI: 10.51584/IJRIAS.2025.10120067

Subject Category: Computer Science

Volume/Issue: 10/12 | Page No: 801-809

Publication Timeline

Submitted: 2025-12-24

Accepted: 2025-12-29

Published: 2026-01-16

Abstract

Video content is now pervasive on learning platforms, in business settings, and across social media, and analyzing it manually has become impractical. This paper describes a framework that uses AI and machine learning to simplify video understanding for both uploaded footage and videos shared by link.
The system analyzes the visual content and the audio track together and combines the two into a coherent summary. A Transformer-based model captures how different moments in a video relate to one another and what they mean in context. Once the summary is produced, a lightweight language model lets the user hold a conversation about the video, asking questions and receiving answers that reflect its content.
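
The idea of relating moments in a video to one another can be illustrated with a minimal sketch of scaled dot-product self-attention over fused per-segment features. This is not the authors' implementation: the function names (`fuse`, `self_attention_weights`, `select_segments`), the concatenation-based fusion, and the rule of keeping the segments that receive the most attention are all assumptions made for illustration, and the feature vectors are hand-made rather than produced by real visual or audio encoders.

```python
import math


def fuse(visual, audio):
    """Concatenate per-segment visual and audio feature vectors (assumed fusion)."""
    return [v + a for v, a in zip(visual, audio)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(features):
    """Row i holds how strongly segment i attends to every segment."""
    d = len(features[0])
    rows = []
    for q in features:
        # Scaled dot product between the query segment and every key segment.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in features
        ]
        rows.append(softmax(scores))
    return rows


def select_segments(visual, audio, k):
    """Return the indices of the k most-attended segments, in temporal order."""
    feats = fuse(visual, audio)
    weights = self_attention_weights(feats)
    n = len(feats)
    # Average attention each segment receives from all segments (hypothetical score).
    received = [sum(weights[i][j] for i in range(n)) / n for j in range(n)]
    top = sorted(range(n), key=lambda j: received[j], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    # Toy features: one row per video segment.
    visual = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]]
    audio = [[0.5], [0.4], [0.1], [0.6]]
    print(select_segments(visual, audio, k=2))
```

In a full system the selected segments (or their transcripts) would then be passed to a summarizer and a question-answering language model; those stages are omitted here because the abstract does not specify them.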

Keywords

Video Analysis, Video Summarization, Artificial Intelligence, Machine Learning
