
Commit 12f67cd

[feat] Visual Genome Support (#140)
* [feature] Extractor for various filetypes
* [feat] Add builder for Visual Genome
  - Fixes #82
  - Automatically downloads the features and other files required for the dataset
  - Extracts them as well
* [chores] Extra entries in .gitignore as per the new scripts
* [feat] Support for loading _info.npy files for each image
* [feat] Load jsonl files in the image database and scene graph database
* [feat] Visual Genome dataset, with various options for loading scene graphs etc.
  - Scene graphs, info about features, objects and relationships can be loaded separately
  - QA is loaded by default
* [chores] Update README and docs
* [fix] Address review comments
1 parent b60e31d commit 12f67cd
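The bullets above mention per-image `_info.npy` files and jsonl-backed image and scene-graph databases. As a rough, hypothetical sketch only (the file paths and record layout below are assumptions, not Pythia's actual loader code), such files can be read with the standard json and numpy APIs:

```python
import json
import numpy as np

# Hypothetical paths -- the real layout is produced by the dataset builder
# added in this commit and may differ.
SCENE_GRAPH_JSONL = "data/visual_genome/scene_graphs.jsonl"
FEATURE_INFO_NPY = "data/visual_genome/features/1_info.npy"

def iter_jsonl(path):
    """Yield one JSON record per non-empty line of a .jsonl file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Each record would describe one image's scene graph (objects, relationships, ...).
for record in iter_jsonl(SCENE_GRAPH_JSONL):
    print(sorted(record.keys()))
    break

# _info.npy files usually hold pickled per-image metadata, so allow_pickle
# is needed when loading them with NumPy.
info = np.load(FEATURE_INFO_NPY, allow_pickle=True)
print(info)
```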

File tree

20 files changed (+575 lines, -70 lines)


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 *.swp
 .idea/*
 **/__pycache__/*
+**/output/*
 data/.DS_Store
 docs/build
 results/*

README.md

Lines changed: 11 additions & 8 deletions
@@ -97,13 +97,15 @@ wget imdb_link
 tar xf [imdb].tar.gz
 ```

-| Dataset | Key | Task | ImDB Link | Features Link | Features checksum |
-|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|
-| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` |
-| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` |
-| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` |
-| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! |
-| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`|
+| Dataset | Key | Task | ImDB Link | Features Link | Features checksum | Notes|
+|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|-----|
+| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` ||
+| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` ||
+| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` ||
+| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! | |
+| VisualGenome | visual_genome | vqa | Automatically downloaded | Automatically downloaded | Coming soon! | Also supports scene graphs|
+| CLEVR | clevr | vqa | Automatically downloaded | Automatically downloaded | | |
+| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`| |

 After downloading the features, verify the download by checking the md5sum using

@@ -119,8 +121,9 @@ supported by the models in Pythia's model zoo.

 | Model | Key | Supported Datasets | Pretrained Models | Notes |
 |--------|-----------|-----------------------|-------------------|-----------------------------------------------------------|
-| Pythia | pythia | vqa2, vizwiz, textvqa | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
+| Pythia | pythia | vqa2, vizwiz, textvqa, visual_genome | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
 | LoRRA | lorra | vqa2, vizwiz, textvqa | [textvqa](https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth) | |
+| CNN LSTM | cnn_lstm | clevr | | Features are calculated on the fly. |
 | BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and haven't been tested thoroughly. |
 | BUTD | butd | coco | [coco](https://dl.fbaipublicfiles.com/pythia/pretrained_models/coco_captions/butd.pth) | |
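The README above tells users to verify the downloaded feature archives against the md5 checksums listed in the table. As an illustrative alternative to the `md5sum` command (not part of the repo, just a sketch using the COCO checksum value from the table above), the same check can be done in Python with hashlib:

```python
import hashlib

# Expected checksum taken from the README table above (COCO features archive).
EXPECTED_MD5 = "ab7947b04f3063c774b87dfbf4d0e981"
ARCHIVE = "coco.tar.gz"  # downloaded from the Features Link column

def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if md5sum(ARCHIVE) == EXPECTED_MD5:
    print("checksum OK")
else:
    print("checksum mismatch -- re-download the archive")
```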

docs/source/tutorials/concepts.md

Lines changed: 37 additions & 33 deletions
@@ -31,33 +31,37 @@ to refer it in the command line arguments.
 Following table shows the tasks and their datasets:

 ```eval_rst
-+--------+------------+------------------------+
-|**Task**| **Key**    | **Datasets**           |
-+--------+------------+------------------------+
-| VQA    | vqa        | VQA2.0, VizWiz, TextVQA|
-+--------+------------+------------------------+
-| Dialog | dialog     | VisualDialog           |
-+--------+------------+------------------------+
-| Caption| captioning | MS COCO                |
-+--------+------------+------------------------+
++--------+------------+---------------------------------------------+
+|**Task**| **Key**    | **Datasets**                                |
++--------+------------+---------------------------------------------+
+| VQA    | vqa        | VQA2.0, VizWiz, TextVQA, VisualGenome, CLEVR|
++--------+------------+---------------------------------------------+
+| Dialog | dialog     | VisualDialog                                |
++--------+------------+---------------------------------------------+
+| Caption| captioning | MS COCO                                     |
++--------+------------+---------------------------------------------+
 ```

 Following table shows the inverse of the above table, datasets along with their tasks and keys:

 ```eval_rst
-+--------------+---------+-----------+--------------------+
-| **Datasets** | **Key** | **Task**  |**Notes**           |
-+--------------+---------+-----------+--------------------+
-| VQA 2.0      | vqa2    | vqa       |                    |
-+--------------+---------+-----------+--------------------+
-| TextVQA      | textvqa | vqa       |                    |
-+--------------+---------+-----------+--------------------+
-| VizWiz       | vizwiz  | vqa       |                    |
-+--------------+---------+-----------+--------------------+
-| VisualDialog | visdial | dialog    | Coming soon!       |
-+--------------+---------+-----------+--------------------+
-| MS COCO      | coco    | captioning|                    |
-+--------------+---------+-----------+--------------------+
++--------------+---------------+-----------+--------------------+
+| **Datasets** | **Key**       | **Task**  |**Notes**           |
++--------------+---------------+-----------+--------------------+
+| VQA 2.0      | vqa2          | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| TextVQA      | textvqa       | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| VizWiz       | vizwiz        | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| VisualDialog | visdial       | dialog    | Coming soon!       |
++--------------+---------------+-----------+--------------------+
+| VisualGenome | visual_genome | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| CLEVR        | clevr         | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| MS COCO      | coco          | captioning|                    |
++--------------+---------------+-----------+--------------------+
 ```

 ## Models
@@ -75,17 +79,17 @@ reference in configuration and command line arguments. Following table shows each model's
 key name and datasets it can be run on.

 ```eval_rst
-+-----------+---------+-----------------------+
-| **Model** | **Key** | **Datasets**          |
-+-----------+---------+-----------------------+
-| LoRRA     | lorra   | textvqa, vizwiz       |
-+-----------+---------+-----------------------+
-| Pythia    | pythia  | textvqa, vizwiz, vqa2 |
-+-----------+---------+-----------------------+
-| BAN       | ban     | textvqa, vizwiz, vqa2 |
-+-----------+---------+-----------------------+
-| BUTD      | butd    | coco                  |
-+-----------+---------+-----------------------+
++-----------+---------+--------------------------------------+
+| **Model** | **Key** | **Datasets**                         |
++-----------+---------+--------------------------------------+
+| LoRRA     | lorra   | textvqa, vizwiz                      |
++-----------+---------+--------------------------------------+
+| Pythia    | pythia  | textvqa, vizwiz, vqa2, visual_genome |
++-----------+---------+--------------------------------------+
+| BAN       | ban     | textvqa, vizwiz, vqa2                |
++-----------+---------+--------------------------------------+
+| BUTD      | butd    | coco                                 |
++-----------+---------+--------------------------------------+
 ```

 ```eval_rst
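To make the key combinations in the tables above concrete, here is a purely illustrative argparse sketch (not Pythia's actual command-line parser; the validation logic is an assumption) that checks a task/dataset/model selection against the combinations documented in this commit:

```python
import argparse

# Key combinations as documented in the tables above (post-commit state).
TASK_DATASETS = {
    "vqa": ["vqa2", "textvqa", "vizwiz", "visual_genome", "clevr"],
    "dialog": ["visdial"],
    "captioning": ["coco"],
}
MODEL_DATASETS = {
    "lorra": ["textvqa", "vizwiz"],
    "pythia": ["textvqa", "vizwiz", "vqa2", "visual_genome"],
    "ban": ["textvqa", "vizwiz", "vqa2"],
    "butd": ["coco"],
}

parser = argparse.ArgumentParser(description="Illustrative key validation only")
parser.add_argument("--tasks", choices=sorted(TASK_DATASETS), required=True)
parser.add_argument("--datasets", required=True)
parser.add_argument("--model", choices=sorted(MODEL_DATASETS), required=True)
args = parser.parse_args()

# Cross-check the dataset key against both the task table and the model table.
if args.datasets not in TASK_DATASETS[args.tasks]:
    parser.error(f"dataset '{args.datasets}' is not listed under task '{args.tasks}'")
if args.datasets not in MODEL_DATASETS[args.model]:
    parser.error(f"model '{args.model}' is not documented for dataset '{args.datasets}'")
print(f"OK: task={args.tasks} dataset={args.datasets} model={args.model}")
```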

docs/source/tutorials/pretrained_models.md

Lines changed: 13 additions & 11 deletions
@@ -6,17 +6,19 @@ predictions for EvalAI evaluation. This section expects that you have already in
 required data as explained in [quickstart](./quickstart).

 ```eval_rst
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| Model  | Model Key | Supported Datasets    | Pretrained Models                                 | Notes                                                     |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| Pythia | pythia    | vqa2, vizwiz, textvqa | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_  | VizWiz model has been pretrained on VQAv2 and transferred |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| LoRRA  | lorra     | vqa2, vizwiz, textvqa | `textvqa`_                                        |                                                           |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| BAN    | ban       | vqa2, vizwiz, textvqa | Coming soon!                                      | Support is preliminary and haven't been tested throughly. |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| BUTD   | butd      | coco                  | `coco`_                                           |                                                           |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| Model  | Model Key | Supported Datasets                    | Pretrained Models                                 | Notes                                                     |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| Pythia | pythia    | vqa2, vizwiz, textvqa, visual_genome, | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_  | VizWiz model has been pretrained on VQAv2 and transferred |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| LoRRA  | lorra     | vqa2, vizwiz, textvqa                 | `textvqa`_                                        |                                                           |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| CNNLSTM| cnn_lstm  | clevr                                 |                                                   | Features are calculated on the fly in this one.           |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| BAN    | ban       | vqa2, vizwiz, textvqa                 | Coming soon!                                      | Support is preliminary and haven't been tested throughly. |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| BUTD   | butd      | coco                                  | `coco`_                                           |                                                           |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+

 .. _vqa2 train+val: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth
 .. _vqa2 train only: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth
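Each entry in the Pretrained Models column points at a PyTorch `.pth` checkpoint served from the link targets above. A minimal inspection sketch (hypothetical; not how Pythia itself restores checkpoints, and the checkpoint's internal layout is an assumption) that downloads the `vqa2 train only` file and lists its top-level keys:

```python
import urllib.request

import torch

# URL copied from the link targets above.
URL = "https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth"
LOCAL = "pythia.pth"

urllib.request.urlretrieve(URL, LOCAL)

# map_location="cpu" lets the checkpoint load on machines without a GPU.
checkpoint = torch.load(LOCAL, map_location="cpu")

if isinstance(checkpoint, dict):
    print("top-level keys:", list(checkpoint.keys()))
else:
    print("loaded object of type:", type(checkpoint))
```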
