Scalability: To provide an easily extensible architecture where new object categories can be introduced
by retraining on custom datasets without altering the entire model pipeline.
Environmental Robustness: To make the detection system adaptable to changing environmental factors
like lighting, motion blur, and occlusion, ensuring stable performance in practical applications.
User-Friendly Interface: To design a responsive, browser-based web interface using Flask that allows
users to perform object detection seamlessly on live webcam input or on uploaded videos and images.
Secure Deployment: To implement authentication and access control, ensuring that system usage remains
secure and consistent with ethical AI principles [3].
LITERATURE SURVEY
Research in real-time object detection has evolved significantly with the advancement of deep learning
frameworks. The study by R. K. et al. [1] presented a comparative analysis of YOLO-based architectures,
emphasizing how YOLOv8 improves upon earlier versions by adopting a decoupled head, an enhanced backbone,
and an anchor-free detection approach. Together, these changes improve both speed and accuracy for small and large objects alike.
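To make the anchor-free idea concrete: rather than refining predefined anchor boxes, each grid location regresses distances from its centre to the four box edges. The sketch below assumes YOLOv8-style (left, top, right, bottom) distances expressed in units of the feature-map stride. It omits the class scores and the distribution-based box regression the real head uses, and the function names are hypothetical.

```python
# Simplified sketch of anchor-free box decoding (distances to box edges).
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float


def anchor_points(grid_w: int, grid_h: int, stride: int) -> list[tuple[float, float]]:
    """Centre of every grid cell, in input-image pixels (row-major order)."""
    return [((gx + 0.5) * stride, (gy + 0.5) * stride)
            for gy in range(grid_h) for gx in range(grid_w)]


def decode_ltrb(point: tuple[float, float],
                distances: tuple[float, float, float, float],
                stride: int) -> Box:
    """Turn predicted (left, top, right, bottom) distances, given in stride
    units, into an absolute (x1, y1, x2, y2) box around the grid point."""
    cx, cy = point
    left, top, right, bottom = (d * stride for d in distances)
    return Box(cx - left, cy - top, cx + right, cy + bottom)
```

For a 2 x 2 grid at stride 32, the cell centred at (48, 16) with predicted distances (1.0, 0.5, 0.5, 1.0) decodes to the box (16, 0, 64, 48). No anchor shapes or sizes need to be tuned for the dataset.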
Mohammed Kawser Jahan et al. [2] proposed an enhanced YOLOv8 model for improving online platform safety
by detecting objectionable content in live streams. Their work introduced fine-tuning techniques and
hyperparameter optimization that significantly reduced false positives. Integrating YOLOv8 with edge AI
technology showed that the model can operate efficiently on resource-constrained hardware.
U. Dwivedi et al. [3] explored moving object detection using YOLOv5 and incorporated tracking based on
Kalman filtering and DeepSORT. This enabled continuous tracking of objects across video frames, a concept that
forms the foundation for integrating detection and tracking in future iterations of our system.
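The Kalman-filter component of such a tracker can be sketched as a constant-velocity filter over a detection's centre point. This is a simplified illustration, not the tracker used in [3]: DeepSORT's filter also tracks box aspect ratio and height and adds appearance-based association, both omitted here. The class name and the noise values are assumptions.

```python
# Constant-velocity Kalman filter over a box centre (illustrative sketch).
import numpy as np


class CenterKalman:
    """Tracks state [x, y, vx, vy] from noisy (x, y) detections."""

    def __init__(self, x: float, y: float, dt: float = 1.0,
                 process_var: float = 1.0, meas_var: float = 10.0):
        self.state = np.array([x, y, 0.0, 0.0])          # start at rest
        self.P = np.eye(4) * 100.0                        # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)    # constant-velocity motion
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)    # only x, y are observed
        self.Q = np.eye(4) * process_var                  # process noise
        self.R = np.eye(2) * meas_var                     # measurement noise

    def predict(self) -> np.ndarray:
        """Project the state one frame ahead; returns the predicted centre."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, zx: float, zy: float) -> np.ndarray:
        """Correct the prediction with a detected centre; returns the estimate."""
        z = np.array([zx, zy])
        innov = z - self.H @ self.state                   # measurement residual
        S = self.H @ self.P @ self.H.T + self.R           # residual covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.state = self.state + K @ innov
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2]
```

In use, `predict()` is called once per frame and `update()` whenever the detector returns a matching box. When a detection is missed (for example, during a brief occlusion), the prediction alone carries the track forward.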
S. Borkar et al. [4] proposed a reinforcement learning approach combined with YOLOv4 to dynamically improve
detection accuracy over time. Their hybrid framework demonstrated adaptability to environmental changes by
rewarding the model for successful detections.
Afdhal et al. [5] applied YOLOv8 to self-driving cars, emphasizing performance in mixed traffic environments.
Their research validated YOLOv8's scalability and real-time efficiency, demonstrating its suitability for
safety-critical applications such as automated driving.
Problem Statement
Traditional object detection methods are computationally expensive and inefficient in real-time scenarios. They
rely heavily on handcrafted feature extraction and often fail in dynamic environments with changing lighting,
occlusion, or motion. There is therefore a pressing need for an automated, deep-learning-based system that can
detect and classify multiple objects in real time from various input sources such as webcams, videos, and images,
while maintaining low latency and offering user accessibility through a web platform [1] [2].
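The low-latency requirement reduces to simple arithmetic: to sustain F frames per second, each frame must be captured, processed, and rendered within 1000/F ms, which is about 33.3 ms at 30 FPS. The sketch below times a detection callable against that budget. The 30 FPS default target and the `detect` callable are assumptions. Comparing the mean latency is also a simplification, because occasional slow frames (tail latency) affect perceived smoothness as well.

```python
# Checking measured per-frame latency against a real-time frame budget.
import time
from statistics import mean
from typing import Callable, Sequence


def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available per frame to sustain target_fps."""
    return 1000.0 / target_fps


def measure_latencies_ms(detect: Callable[[object], object],
                         frames: Sequence[object]) -> list[float]:
    """Wall-clock time of each detect(frame) call, in milliseconds.
    `detect` is a hypothetical stand-in for the model's inference call."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        detect(frame)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def is_realtime(latencies_ms: Sequence[float], target_fps: float = 30.0) -> bool:
    """True if the mean per-frame latency fits inside the frame budget."""
    return mean(latencies_ms) <= frame_budget_ms(target_fps)
```

For example, `is_realtime([20.0, 25.0, 30.0])` returns `True` (a 25 ms mean against a 33.3 ms budget). A model that needs 200 ms per frame reaches only 5 FPS and fails the check.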
Existing System
Existing systems such as SSD, Faster R-CNN, and early YOLO versions have contributed significantly to object
detection. However, most of them rely on anchor-based detection and, in two-stage models, region proposal
mechanisms that increase inference time [3]. Two-stage systems such as Faster R-CNN offer high precision but
are poorly suited to real-time use. Single-stage detectors such as SSD and YOLOv3 improved speed but still
struggle with small objects and cluttered scenes.
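Anchor-based detectors differ from the anchor-free approach of YOLOv8 in one key step: they must first decide which predefined anchors are responsible for each ground-truth object, typically by intersection-over-union (IoU). The sketch below shows a simplified version of that matching step. The 0.5 threshold and the function names are assumptions, and real detectors such as SSD add further rules (for example, forcing each object to take its best-overlapping anchor) that are omitted here.

```python
# Simplified IoU-based anchor assignment, as used in anchor-based detectors.
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """For each anchor, return the index of its best-overlapping ground-truth
    box if that IoU reaches pos_thresh, otherwise -1 (background)."""
    assignments = []
    for anchor in anchors:
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        assignments.append(best_idx if best_iou >= pos_thresh else -1)
    return assignments
```

With a 32 x 32 anchor, an 8 x 8 object inside it reaches an IoU of only 0.0625. Under this simple rule, the object is therefore treated as background, which helps explain why small objects are difficult for anchor-based models.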
Moreover, current detection solutions are not easily accessible to non-technical users. They often require
installing specialized software or deep learning libraries locally. The absence of an integrated, browser-based
interface further limits their usability in practical applications such as surveillance or live analytics [4].
Disadvantages of the Existing System
Limited Real-Time Performance: Conventional models struggle to maintain high frame rates in real-time
scenarios [2].