Combinatorial Video Captioning With Deep Learning

Authors

  • Angadi Ahalya and Cheelam Aishwarya Laxmi, B.Tech Students, Department of CSE, Bhoj Reddy Engineering College for Women, India

Abstract

Video captioning is the task of generating natural language descriptions for videos by analyzing visual scenes, objects, and actions. Unlike video subtitling, which transcribes spoken dialogue, video captioning provides a comprehensive interpretation of all visual elements. Traditional approaches relied on rule-based and feature-based methods, which struggled with complex videos due to their rigidity and lack of contextual understanding.
Modern techniques leverage deep learning models, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to extract video features and generate captions. Recent advancements focus on weakly supervised dense video captioning, which generates descriptions without predefined key events. This approach is particularly useful for long, untrimmed videos where multiple overlapping events occur, improving event recognition and caption accuracy. By combining event captioning with caption localization, this method enhances both contextual understanding and flexibility in video captioning tasks.
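The CNN-plus-RNN pipeline described above can be sketched in miniature: per-frame CNN features are pooled into a single video vector, which initialises a recurrent decoder that emits caption tokens greedily. This is an illustrative toy, not the paper's implementation; the vocabulary, feature sizes, and randomly initialised weights below are all stand-ins for what a trained system would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; a real captioner learns embeddings over thousands of words.
VOCAB = ["<bos>", "<eos>", "a", "dog", "runs"]
V = len(VOCAB)
FEAT = 8   # per-frame CNN feature size (real CNN backbones emit 512-2048 dims)
HID = 8    # decoder hidden size (kept equal to FEAT so pooled features can seed it)

# Random weights stand in for trained parameters (assumed, for illustration only).
W_xh = rng.normal(scale=0.1, size=(V, HID))     # token one-hot row -> hidden input
W_hh = rng.normal(scale=0.1, size=(HID, HID))   # hidden-to-hidden recurrence
W_hy = rng.normal(scale=0.1, size=(HID, V))     # hidden state -> vocabulary logits

def encode(frames):
    """Mean-pool per-frame CNN features into one video vector (simplest encoder)."""
    return frames.mean(axis=0)

def decode(video_vec, max_len=5):
    """Greedy Elman-RNN decoding: start from <bos>, stop at <eos> or max_len."""
    h = np.tanh(video_vec)              # seed the hidden state from the video vector
    token = VOCAB.index("<bos>")
    caption = []
    for _ in range(max_len):
        h = np.tanh(W_xh[token] + h @ W_hh)   # one recurrent step
        logits = h @ W_hy
        token = int(np.argmax(logits))        # greedy choice; beam search is common too
        if VOCAB[token] == "<eos>":
            break
        caption.append(VOCAB[token])
    return caption

frames = rng.normal(size=(16, FEAT))    # 16 frames of fake CNN features
caption = decode(encode(frames))
```

With untrained weights the emitted tokens are arbitrary, but the control flow matches the standard encoder-decoder recipe: visual encoding, sequential generation, and a stop condition. Dense captioning extends this by localising multiple event segments and running a decoder per segment.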

Published

2025-04-24

Section

Articles

How to Cite

Combinatorial Video Captioning With Deep Learning. (2025). International Journal of Engineering and Science Research, 15(2s), 21-26. https://www.ijesr.org/index.php/ijesr/article/view/259