This project designs and implements a "General Multi-modal Video Classification Framework", covering the full training and prediction pipeline. The source code is based on TensorFlow 1.14.
The supported features are as follows:
- Video category classification (multi-class)
- Video tag classification (multi-label)
- Text feature aggregation: TextCNN and Bi-LSTM
- Video frame and audio frame aggregation: NeXtVLAD and TRN
- Adversarial perturbation, which improves the robustness and generalization of the model (see the sketch after this list)
- Multi-GPU training on a single machine
- Evaluation metrics: Precision, Recall, F1, GAP, and mAP (a GAP sketch also follows this list)
- Multi-task learning: joint category & tag classification
- Multi-modal video embedding generation, which can be used for similar-video recall and other downstream tasks
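
Adversarial perturbation of the kind listed above is commonly realized in the FGSM style: perturb an input embedding along the gradient of the clean loss and add a loss term on the perturbed copy. The sketch below is a minimal illustration in TensorFlow 1.x, not the repository's actual API; the names `build_logits`, `clean_loss`, and `epsilon` are hypothetical.

```python
import tensorflow as tf  # tensorflow==1.14

def adversarial_loss(embedding, clean_loss, build_logits, labels, epsilon=1.0):
    """FGSM-style adversarial term: perturb `embedding` along the loss
    gradient and re-score it with the classifier.

    All names are illustrative; `build_logits` is assumed to rebuild the
    classification head with shared (reused) variables.
    """
    grad = tf.stop_gradient(tf.gradients(clean_loss, embedding)[0])
    perturbation = epsilon * tf.nn.l2_normalize(grad, axis=-1)
    adv_logits = build_logits(embedding + perturbation)
    # Multi-label (tag) head, hence a sigmoid cross-entropy loss.
    return tf.losses.sigmoid_cross_entropy(labels, adv_logits)
```

The adversarial term is typically added to the clean loss with a small weight, so the model learns to be stable under worst-case local perturbations of its inputs.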
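Of the evaluation metrics, GAP (Global Average Precision, popularized by the YouTube-8M benchmark) is the least standard. Below is a minimal NumPy sketch, assuming score and binary-label matrices of shape (num_examples, num_classes); the function name and the `top_k=20` default are illustrative assumptions, not taken from this repository.

```python
import numpy as np

def global_average_precision(predictions, labels, top_k=20):
    """GAP: pool each example's top-k predictions, re-rank them globally,
    and compute average precision over the pooled list."""
    scores, hits = [], []
    for preds, gts in zip(predictions, labels):
        top = np.argsort(preds)[::-1][:top_k]   # top-k classes per example
        scores.extend(preds[top])
        hits.extend(gts[top])
    order = np.argsort(scores)[::-1]            # global re-ranking
    hits = np.asarray(hits, dtype=np.float64)[order]
    total_positives = max(float(np.sum(labels)), 1.0)
    precisions = np.cumsum(hits) / (np.arange(len(hits)) + 1.0)
    return float(np.sum(precisions * hits) / total_positives)
```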
The architecture of the "General Multi-modal Video Classification Framework" consists of two stages:
Stage I: multi-modal feature representation
Stage II: multi-modal feature fusion and classification (see the sketch below)
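
To make the two stages concrete, the sketch below shows, in TensorFlow 1.x (matching the project's platform), how Stage II might fuse the per-modality vectors produced in Stage I and branch into the two task heads used for multi-task learning. All function and tensor names are illustrative assumptions, not the repository's actual API.

```python
import tensorflow as tf  # tensorflow==1.14

def fuse_and_classify(text_vec, video_vec, audio_vec,
                      num_cates, num_tags, hidden_size=1024):
    """Stage II sketch: concatenate Stage-I modality vectors, project them
    into a shared multi-modal embedding, then branch into two heads."""
    fused = tf.concat([text_vec, video_vec, audio_vec], axis=-1)
    embedding = tf.layers.dense(fused, hidden_size, activation=tf.nn.relu)
    cate_logits = tf.layers.dense(embedding, num_cates)  # multi-class (softmax) head
    tag_logits = tf.layers.dense(embedding, num_tags)    # multi-label (sigmoid) head
    # `embedding` doubles as the multi-modal video embedding for recall tasks.
    return embedding, cate_logits, tag_logits
```

Sharing the fused embedding across both heads is what enables the joint category & tag training listed above, and the same embedding can be exported for similar-video recall.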

├── README.md            --> documentation
├── requirements.txt     --> environment dependencies
├── scripts
│   ├── infer.sh         --> prediction pipeline
│   └── train.sh         --> training pipeline
└── src
    ├── data.py          --> data processing
    ├── eval_metrics.py  --> evaluation metrics
    ├── models.py        --> implementation of each model
    ├── train.py         --> entry point for training and prediction
    ├── utils.py         --> multi-GPU utilities
    └── video_model.py   --> the overall model framework
Install dependencies:
pip install -r requirements.txt

Train:
cd ~
sh scripts/train.sh

Predict:
cd ~
sh scripts/infer.sh
Tip: For any other inquiries, please contact me at [email protected].