stoneyezhenxu/Multimodal_Video_Classification


1、Introduction

This project designs and implements a "General Multi-modal Video Classification Framework", covering the full training and prediction pipeline. The source code is based on TensorFlow 1.14.

The supported features are as follows:

  • Video category (cate) classification (multi-class)
  • Video tag classification (multi-label)
  • Text vector aggregation: TextCNN and Bi-LSTM
  • Video frame and audio frame aggregation: NeXtVLAD and TRN
  • Adversarial perturbation (improves the robustness and generalization of the model)
  • Multi-GPU training on a single machine
  • Evaluation metrics: Precision, Recall, F1, GAP and mAP
  • Multi-task learning: joint cate & tag classification
  • Multi-modal video embedding generation, which can be used for similar-video recall and other downstream tasks
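For reference, GAP and mAP can be computed as below. This is a minimal pure-Python sketch, assuming the non-interpolated AP definition and the YouTube-8M style GAP (the top-k predictions of every video are pooled into one ranked list, and recall is normalized by all ground-truth positives). The function names and the `top_k=20` default are illustrative assumptions, not taken from `eval_metrics.py`.

```python
from typing import List, Optional, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int],
                      num_positives: Optional[int] = None) -> float:
    """Non-interpolated AP: sum of precision@rank at every positive hit, divided by num_positives."""
    if num_positives is None:
        num_positives = sum(labels)
    if num_positives == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / num_positives


def global_average_precision(predictions: List[List[float]], labels: List[List[int]],
                             top_k: int = 20) -> float:
    """GAP: pool each video's top-k (score, label) pairs, then compute one AP over the pool."""
    pooled_scores, pooled_labels, num_positives = [], [], 0
    for pred_row, label_row in zip(predictions, labels):
        num_positives += sum(label_row)  # every ground-truth positive counts toward recall
        top = sorted(range(len(pred_row)), key=lambda c: pred_row[c], reverse=True)[:top_k]
        pooled_scores.extend(pred_row[c] for c in top)
        pooled_labels.extend(label_row[c] for c in top)
    return average_precision(pooled_scores, pooled_labels, num_positives)


def mean_average_precision(predictions: List[List[float]], labels: List[List[int]]) -> float:
    """mAP: mean of per-class AP over the videos; classes without positives are skipped."""
    aps = []
    for c in range(len(predictions[0])):
        col_labels = [row[c] for row in labels]
        if sum(col_labels) > 0:
            aps.append(average_precision([row[c] for row in predictions], col_labels))
    return sum(aps) / len(aps) if aps else 0.0
```

GAP ranks predictions across all videos at once, so it rewards well-calibrated scores, while mAP evaluates each class independently.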

2、Framework

The architecture of the "General Multi-modal Video Classification Framework" consists of two stages:

Stage I: Multi-modal feature representation

Stage II: Multi-modal feature fusion and classification

[Figure: overall framework]
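Since Stage II feeds both the cate (multi-class) and tag (multi-label) heads, multi-task training needs a combined objective. A minimal sketch, assuming softmax cross-entropy for cate, sigmoid cross-entropy averaged over tags, and a hypothetical `tag_weight` balancing factor; the repository's actual loss weighting may differ.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Multi-class (cate) loss: -log softmax(logits)[target], computed via log-sum-exp."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sigmoid_cross_entropy(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Multi-label (tag) loss: mean binary cross-entropy with logits, in the stable form."""
    total = sum(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
                for z, y in zip(logits, targets))
    return total / len(logits)


def multitask_loss(cate_logits: Sequence[float], cate_label: int,
                   tag_logits: Sequence[float], tag_labels: Sequence[int],
                   tag_weight: float = 1.0) -> float:
    """Hypothetical weighted sum of the cate and tag losses."""
    return (softmax_cross_entropy(cate_logits, cate_label)
            + tag_weight * sigmoid_cross_entropy(tag_logits, tag_labels))
```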

2.1 Modal Aggregation Module

2.1.1. NeXtVLAD

[Figure: NeXtVLAD]
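The NeXtVLAD aggregation can be sketched as follows: each frame feature is expanded by a factor λ, split into G low-dimensional groups, each group is weighted by a sigmoid attention and soft-assigned to K clusters, and the weighted residuals to the cluster centers are summed over frames and groups. This pure-Python version is a simplified sketch under stated assumptions: biases, batch normalization, the final normalization, and the dimension-reduction layer are omitted, and the parameter names (`w_expand`, `w_attn`, `w_assign`, `centers`) are hypothetical rather than taken from `models.py`.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (no bias)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def nextvlad(frames: Matrix, w_expand: Matrix, w_attn: Matrix,
             w_assign: List[Matrix], centers: Matrix) -> Matrix:
    """
    frames:   M x N frame-level features
    w_expand: (lam*N) x N expansion matrix
    w_attn:   G x (lam*N) group attention weights
    w_assign: G matrices, each K x (lam*N), for soft cluster assignment
    centers:  K x D cluster centers, with D = lam*N / G
    returns:  K x D aggregated descriptor (un-normalized)
    """
    groups = len(w_attn)
    num_clusters, dim = len(centers), len(centers[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x in frames:
        x_exp = matvec(w_expand, x)  # expand N -> lam*N
        if len(x_exp) != groups * dim:
            raise ValueError("expanded size must equal groups * center dim")
        for g in range(groups):
            attn = sigmoid(sum(a * v for a, v in zip(w_attn[g], x_exp)))  # group attention
            assign = softmax(matvec(w_assign[g], x_exp))  # soft assignment over K clusters
            sub = x_exp[g * dim:(g + 1) * dim]  # g-th low-dimensional group vector
            for k in range(num_clusters):
                weight = attn * assign[k]
                for j in range(dim):
                    vlad[k][j] += weight * (sub[j] - centers[k][j])
    return vlad
```

Splitting into groups is what keeps the output at K x (λN/G) instead of K x N, which is the main saving NeXtVLAD offers over plain NetVLAD.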

2.1.2. TRN

[Figure: TRN]
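TRN (Temporal Relation Network) reasons over ordered frame tuples at several time scales: for each scale d, a function g is applied to the concatenation of every temporally ordered d-frame tuple, the outputs are summed and passed through h, and the scale outputs for d = 2..N are added. The sketch below assumes `g` and `h` are arbitrary callables (MLPs in practice) supplied per scale; the optional `max_tuples` subsampling is a hypothetical parameter standing in for the tuple sampling usually done to bound cost.

```python
import itertools
import random
from typing import Callable, Dict, List, Optional

Vector = List[float]
RelationFn = Callable[[Vector], Vector]


def _add(acc: Optional[Vector], v: Vector) -> Vector:
    return list(v) if acc is None else [a + b for a, b in zip(acc, v)]


def relation_at_scale(frames: List[Vector], d: int, g: RelationFn, h: RelationFn,
                      max_tuples: Optional[int] = None, seed: int = 0) -> Vector:
    """T_d = h(sum over ordered d-frame tuples of g(concat(tuple frames)))."""
    tuples = list(itertools.combinations(range(len(frames)), d))  # indices stay in temporal order
    if max_tuples is not None and len(tuples) > max_tuples:
        tuples = random.Random(seed).sample(tuples, max_tuples)
    summed = None
    for idx in tuples:
        concat = [v for i in idx for v in frames[i]]
        summed = _add(summed, g(concat))
    return h(summed)


def multiscale_trn(frames: List[Vector], gs: Dict[int, RelationFn], hs: Dict[int, RelationFn],
                   max_tuples: Optional[int] = None) -> Vector:
    """Multi-scale TRN: T_2 + T_3 + ... + T_N, with separate g/h per scale."""
    total = None
    for d in range(2, len(frames) + 1):
        total = _add(total, relation_at_scale(frames, d, gs[d], hs[d], max_tuples))
    return total
```

Using `itertools.combinations` keeps every tuple in frame order, which is what lets g learn order-dependent relations.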

2.1.3. TextCNN

[Figure: TextCNN]
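TextCNN turns a sequence of word embeddings into a fixed-length vector by sliding filters of one or more widths over the sequence, applying ReLU, and max-pooling over time, yielding one feature per filter. A minimal pure-Python sketch with hypothetical function names; real implementations batch this as a 2-D convolution and usually pad short inputs.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def conv_max_pool(embeddings: Matrix, kernel: Matrix, bias: float = 0.0) -> float:
    """One filter of width len(kernel): ReLU(conv) at each valid position, then max over time."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are >= 0; a sequence shorter than the filter also yields 0.0 here
    for t in range(len(embeddings) - width + 1):
        s = bias
        for k in range(width):
            s += sum(w * e for w, e in zip(kernel[k], embeddings[t + k]))
        best = max(best, max(s, 0.0))
    return best


def text_cnn(embeddings: Matrix, kernels: List[Matrix],
             biases: Optional[List[float]] = None) -> Vector:
    """Concatenate the max-pooled response of every filter (filters may have different widths)."""
    biases = biases if biases is not None else [0.0] * len(kernels)
    return [conv_max_pool(embeddings, k, b) for k, b in zip(kernels, biases)]
```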

2.1.4. Bi-LSTM

[Figure: Bi-LSTM]
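A Bi-LSTM runs one LSTM forward over the token embeddings and a second LSTM over the reversed sequence, then combines both. The sketch below implements a standard LSTM cell in pure Python and, as an assumption, represents the text by concatenating the two final hidden states; the repository may pool differently (for example mean pooling over time). All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Vector]]  # gate -> (W, b)
GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _affine(w: Matrix, b: Vector, v: Vector) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def make_lstm_params(input_dim: int, hidden: int, seed: int = 0, scale: float = 0.1) -> Params:
    """Uniform random weights of shape hidden x (input_dim + hidden), zero biases."""
    rng = random.Random(seed)
    return {
        gate: ([[rng.uniform(-scale, scale) for _ in range(input_dim + hidden)]
                for _ in range(hidden)], [0.0] * hidden)
        for gate in GATES
    }


def lstm_step(params: Params, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    xh = x + h  # concatenate current input with previous hidden state
    i, f, o = ([_sigmoid(z) for z in _affine(*params[k], xh)] for k in ("i", "f", "o"))
    g = [math.tanh(z) for z in _affine(*params["g"], xh)]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params: Params, seq: List[Vector], hidden: int) -> Vector:
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        h, c = lstm_step(params, x, h, c)
    return h  # final hidden state


def bilstm_encode(fw: Params, bw: Params, seq: List[Vector], hidden: int) -> Vector:
    """Concatenate the final forward state with the final state of a pass over the reversed seq."""
    return run_lstm(fw, seq, hidden) + run_lstm(bw, seq[::-1], hidden)
```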

2.2 Multi-modal Fusion

2.2.1. GateFusion Block

[Figure: gate fusion]
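The README does not spell out the GateFusion formula, so the following is only one common form of gated multi-modal fusion, offered as an assumption: each modality vector (already projected to a shared dimension D) is scaled element-wise by a sigmoid gate computed from the concatenation of all modalities, and the gated vectors are summed. The function and parameter names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gated_fusion(features: List[Vector], gate_weights: List[Matrix],
                 gate_biases: List[Vector]) -> Vector:
    """
    features:     M modality vectors, each of length D
    gate_weights: M matrices, each D x (M*D)
    gate_biases:  M vectors of length D
    returns:      sum over m of sigmoid(W_m [x_1; ...; x_M] + b_m) * x_m (element-wise)
    """
    concat = [v for x in features for v in x]  # joint context from every modality
    dim = len(features[0])
    fused = [0.0] * dim
    for x, w, b in zip(features, gate_weights, gate_biases):
        for j in range(dim):
            z = sum(wi * ci for wi, ci in zip(w[j], concat)) + b[j]
            fused[j] += x[j] / (1.0 + math.exp(-z))  # x_j * sigmoid(z)
    return fused
```

Computing each gate from all modalities lets, for example, strong audio evidence suppress noisy text features for the same video.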

2.2.2. SE-Gate Block

[Figure: SE-Gate block]
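An SE (squeeze-and-excitation) gate re-weights the channels of a feature vector: a bottleneck projection with ReLU, an expansion back to the input size, a sigmoid, and an element-wise product with the input. This is a minimal sketch of that standard form, assuming no batch normalization and no biases; the names `w_reduce` and `w_expand` and the reduction ratio r are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def se_gate(x: Vector, w_reduce: Matrix, w_expand: Matrix) -> Vector:
    """
    x:        input vector of length D
    w_reduce: (D // r) x D bottleneck ("squeeze") projection
    w_expand: D x (D // r) "excitation" projection
    returns:  x * sigmoid(w_expand @ relu(w_reduce @ x))
    """
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w_expand]
    return [v * g for v, g in zip(x, gates)]
```

The bottleneck (D // r units instead of D) keeps the gate cheap compared with a full D x D context-gating layer.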

2.3 Adversarial Perturbation Loss

[Figure: adversarial perturbations]
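Adversarial perturbation loss is commonly implemented with the Fast Gradient Method (FGM): take the gradient g of the loss with respect to the input features (or embeddings), add r_adv = ε · g / ‖g‖₂, and train on the clean loss plus the loss at the perturbed input. The sketch below assumes FGM and uses a logistic-regression model so the input gradient is analytic; the model, `epsilon` and `adv_weight` are illustrative assumptions, and where the repository applies the perturbation is not stated in the README.

```python
import math
from typing import List, Tuple

Vector = List[float]


def logistic_loss_and_input_grad(w: Vector, b: float, x: Vector, y: int) -> Tuple[float, Vector]:
    """Binary cross-entropy of sigmoid(w.x + b) and its gradient with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    loss = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))  # stable BCE with logits
    p = 1.0 / (1.0 + math.exp(-z))
    grad_x = [(p - y) * wi for wi in w]  # dL/dx = (sigmoid(z) - y) * w
    return loss, grad_x


def fgm_perturbation(grad: Vector, epsilon: float) -> Vector:
    """r_adv = epsilon * g / ||g||_2 (zero vector if the gradient vanishes)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def adversarial_training_loss(w: Vector, b: float, x: Vector, y: int,
                              epsilon: float = 1.0, adv_weight: float = 1.0):
    """Returns (total, clean, adversarial) losses; r_adv is treated as a constant."""
    clean_loss, grad_x = logistic_loss_and_input_grad(w, b, x, y)
    r_adv = fgm_perturbation(grad_x, epsilon)
    x_adv = [xi + ri for xi, ri in zip(x, r_adv)]
    adv_loss, _ = logistic_loss_and_input_grad(w, b, x_adv, y)
    return clean_loss + adv_weight * adv_loss, clean_loss, adv_loss
```

In a TensorFlow graph the same idea corresponds to wrapping the perturbation in `tf.stop_gradient` so that only the model parameters are updated.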

3、Directory Structure

├── README.md           --> documentation
├── requirements.txt    --> environment dependencies
├── scripts
│   ├── infer.sh        --> prediction pipeline
│   └── train.sh        --> training pipeline
└── src
    ├── data.py         --> data processing
    ├── eval_metrics.py --> evaluation metrics
    ├── models.py       --> implementation of each model
    ├── train.py        --> entry point for training and prediction
    ├── utils.py        --> multi-GPU utilities
    └── video_model.py  --> overall model framework
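`utils.py` is listed as the multi-GPU helper. A common TensorFlow 1.x pattern for single-machine multi-GPU training builds one model "tower" per GPU, computes gradients on each, averages them per variable, and applies the average once with `optimizer.apply_gradients`. The pure-Python sketch below shows only the averaging step, assuming gradients arrive as `(gradient, variable_name)` pairs in the same order on every tower; it is not taken from `utils.py`.

```python
from typing import List, Sequence, Tuple

Grad = List[float]


def average_gradients(tower_grads: Sequence[Sequence[Tuple[Grad, str]]]) -> List[Tuple[Grad, str]]:
    """Average each variable's gradient across towers.

    tower_grads: one list per GPU tower of (gradient, variable_name) pairs,
                 listed in the same variable order on every tower.
    """
    averaged = []
    for grads_and_vars in zip(*tower_grads):  # the same variable, one entry per tower
        names = {name for _, name in grads_and_vars}
        if len(names) != 1:
            raise ValueError(f"towers disagree on variable order: {sorted(names)}")
        grads = [g for g, _ in grads_and_vars]
        avg = [sum(vals) / len(grads) for vals in zip(*grads)]  # element-wise mean
        averaged.append((avg, grads_and_vars[0][1]))
    return averaged
```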

4、How to run

4.1. Install the environment

pip install -r requirements.txt

4.2. Training process

cd Multimodal_Video_Classification   # run from the repository root
sh scripts/train.sh

4.3. Prediction process

cd Multimodal_Video_Classification   # run from the repository root
sh scripts/infer.sh

5、Experimental Results

[Figure: experimental results]

Tips: For any other inquiries, please contact me by email: [email protected]

About

Designed and implemented a "General Multi-modal Video Classification Framework" based on the TensorFlow platform.
