Code for paper "Event-centric hierarchical representation for dense video captioning" (TCSVT2020)

ttengwang/ECHR

Event-Centric Hierarchical Representation for Dense Video Captioning

In this paper, we propose an event-centric hierarchical representation for dense video captioning. We enhance the event-level representation by capturing rich relationships between events in terms of both temporal structure and semantic meaning. We then develop a caption generator with late fusion that produces surrounding-event-aware and topic-aware sentences, conditioned on a hierarchical representation of visual cues from the scene level, the event level, and the frame level.
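To make the idea of late fusion concrete, here is a minimal sketch in plain Python. It is not the model from the paper or from this repo: it only illustrates the general pattern of scoring the next word separately for each level (scene, event, frame) and combining the resulting distributions afterwards. The function names, the toy vocabulary, the scores, and the uniform weights are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(level_scores, weights=None):
    """Fuse per-level word scores after each level is scored on its own.

    level_scores: dict mapping a level name to a list of scores over the vocabulary.
    weights: optional dict of per-level weights (hypothetical; uniform by default).
    """
    names = sorted(level_scores)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    vocab_size = len(level_scores[names[0]])
    fused = [0.0] * vocab_size
    for name in names:
        probs = softmax(level_scores[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p
    return fused


# Toy example: made-up scores for a three-word vocabulary.
vocab = ["man", "plays", "guitar"]
scores = {
    "scene": [0.2, 0.1, 1.5],
    "event": [0.3, 2.0, 0.4],
    "frame": [0.1, 1.8, 0.9],
}
fused = late_fusion(scores)
best_word = vocab[fused.index(max(fused))]
```

With uniform weights the fused vector is still a probability distribution, so the next word can be picked from it directly. In a trained captioner the per-level scores would come from learned decoders rather than hand-written lists.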

This repo contains the main code for the experiments on the ActivityNet Captions dataset.

Usage

  • Install Python 2.7, PyTorch 0.4, and CUDA 10.0. Then run pip install -r environment.txt.
  • Prepare the video and annotation data. Please refer to url.
  • Training scripts are in the experiments folder.

Reference

@ARTICLE{Wang2020echr,
  author={T. {Wang} and H. {Zheng} and M. {Yu} and Q. {Tian} and H. {Hu}},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Event-Centric Hierarchical Representation for Dense Video Captioning}, 
  year={2020}}

Acknowledgement

This code is based on ImageCaptioning.Pytorch.
