You can use the code in this repo to generate an MSCOCO evaluation server submission with CIDEr=0.96+ in just a few hours.
No fine-tuning required. No fancy tricks. Just train three end-to-end review networks and do an ensemble.
- Feature extraction: 2 hours in parallel
- Single model training: 6 hours
- Ensemble model training: 30 minutes
- Beam search for caption generation: 3 hours in parallel
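
To illustrate the last two steps, here is a minimal Python sketch of beam search over an ensemble. It is not the code in this repo. The `Model` interface, the toy models, and the uniform averaging of next-token probabilities are all assumptions made for illustration. Since the pipeline includes an ensemble training step, the actual combination is presumably learned rather than a plain average.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a model maps a token prefix to a
# next-token probability distribution.
Model = Callable[[Tuple[str, ...]], Dict[str, float]]


def ensemble_probs(models: Sequence[Model], prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Uniformly average the next-token distributions of all models (an assumption)."""
    avg: Dict[str, float] = {}
    for model in models:
        for token, p in model(prefix).items():
            avg[token] = avg.get(token, 0.0) + p / len(models)
    return avg


def beam_search(models: Sequence[Model], beam_size: int = 3,
                max_len: int = 20, eos: str = "</s>") -> List[str]:
    """Return the highest-scoring token sequence under the averaged ensemble."""
    beams: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]  # (prefix, log-probability)
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, p in ensemble_probs(models, prefix).items():
                if p > 0.0:
                    candidates.append((prefix + (token,), score + math.log(p)))
        # Keep only the best `beam_size` expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len still compete
    if not finished:
        return []
    return list(max(finished, key=lambda c: c[1])[0])


# Toy stand-ins for trained captioning models (hypothetical).
def toy_model_1(prefix: Tuple[str, ...]) -> Dict[str, float]:
    return {"a": 0.9, "b": 0.1} if not prefix else {"</s>": 1.0}


def toy_model_2(prefix: Tuple[str, ...]) -> Dict[str, float]:
    return {"a": 0.3, "b": 0.7} if not prefix else {"</s>": 1.0}
```

Calling `beam_search([toy_model_1, toy_model_2], beam_size=2)` scores each hypothesis by summed log-probabilities under the averaged distribution. Captions for different images are independent of one another, which is consistent with this step being run in parallel.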
 
Below is a comparison with other state-of-the-art systems (those with corresponding published papers) on the MSCOCO evaluation server:
| Model | BLEU-4 | METEOR | ROUGE-L | CIDEr | Fine-tuned | Task specific features | 
|---|---|---|---|---|---|---|
| Attention | 0.537 | 0.322 | 0.654 | 0.893 | No | No | 
| MS Research | 0.567 | 0.331 | 0.662 | 0.925 | No | Yes | 
| Google NIC | 0.587 | 0.346 | 0.682 | 0.946 | Yes | No | 
| Semantic Attention | 0.599 | 0.335 | 0.682 | 0.958 | No | Yes | 
| Review Net | 0.597 | 0.347 | 0.686 | 0.969 | No | No | 
In the directory image_caption_online, you can use the code to reproduce our evaluation server results.
In the directory image_caption_offline, you can rerun experiments in our paper using offline evaluation.
Predicting comments for a piece of source code is another interesting task. In the repo we also release a dataset with train/dev/test splits, along with the code of a review network.
Check out the directory code_caption.
Below is a comparison with baselines on the code captioning dataset:
| Model | LLH | CS-1 | CS-2 | CS-3 | CS-4 | CS-5 | 
|---|---|---|---|---|---|---|
| LSTM Language Model | -5.34 | 0.2340 | 0.2763 | 0.3000 | 0.3153 | 0.3290 | 
| Encoder-Decoder | -5.25 | 0.2535 | 0.2976 | 0.3201 | 0.3367 | 0.3507 | 
| Encoder-Decoder (Bidir) | -5.19 | 0.2632 | 0.3068 | 0.3290 | 0.3442 | 0.3570 | 
| Attentive Encoder-Decoder (Bidir) | -5.14 | 0.2716 | 0.3152 | 0.3364 | 0.3523 | 0.3651 | 
| Review Net | -5.06 | 0.2889 | 0.3361 | 0.3579 | 0.3731 | 0.3840 | 
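
The table above does not define its columns. The sketch below assumes that LLH is log-likelihood and that CS-k is a top-k character-savings metric: for each target token, find the smallest number of typed prefix characters `n` such that the target appears among the model's top-k candidates matching that prefix, and count the remaining characters as saved. This reading, and the function names `chars_needed` and `cs_k`, are assumptions for illustration. The paper has the exact definition.

```python
from typing import List, Sequence, Tuple


def chars_needed(ranked_tokens: Sequence[str], target: str, k: int) -> int:
    """Smallest prefix length n such that target is in the top-k candidates
    that start with target[:n]. `ranked_tokens` is sorted by model probability."""
    for n in range(len(target) + 1):
        matching = [t for t in ranked_tokens if t.startswith(target[:n])]
        if target in matching[:k]:
            return n
    # Target never reaches the top-k (e.g. out of vocabulary): no characters saved.
    return len(target)


def cs_k(predictions: List[Tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of characters saved over all (ranking, target) pairs."""
    saved = 0
    total = 0
    for ranked_tokens, target in predictions:
        saved += len(target) - chars_needed(ranked_tokens, target, k)
        total += len(target)
    return saved / total if total else 0.0


# Toy ranking (hypothetical): the model prefers "return" over "range".
example = [(["return", "result", "range", "self"], "range")]
```

With this toy ranking, `cs_k(example, 1)` needs the two characters `ra` before `range` becomes the top candidate, so three of five characters are saved. With `k=3` the target is already among the top three before any character is typed. Under this reading, larger `k` can only increase the score, which matches the trend across the CS-1 to CS-5 columns.
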
This repo contains the code and data used in the following paper:
Review Networks for Caption Generation
Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen
NIPS 2016