
Commit 12f67cd

[feat] Visual Genome Support (#140)
* [feature] Extractor for various filetypes
* [feat] Add builder for Visual Genome
  - Fixes #82
  - Automatically downloads the features and other files required for the dataset
  - Extracts them as well
* [chores] Extra entries in .gitignore as per the new scripts
* [feat] Support for loading _info.npy files for each image
* [feat] Load jsonl files in the image database and scene graph database
* [feat] Visual Genome dataset, with various options for loading scene graphs etc.
  - Scene graphs, info about features, objects and relationships can be loaded separately
  - QA is loaded by default
* [chores] Update README and docs
* [fix] Address review comments
1 parent b60e31d commit 12f67cd
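The bullets above mention per-image `_info.npy` files and jsonl-backed image and scene-graph databases. As a rough, hypothetical sketch only (the file paths and record layout below are assumptions, not Pythia's actual loader code), such files can be read with the standard json and numpy APIs:

```python
import json
import numpy as np

# Hypothetical paths -- the real layout is produced by the dataset builder
# added in this commit and may differ.
SCENE_GRAPH_JSONL = "data/visual_genome/scene_graphs.jsonl"
FEATURE_INFO_NPY = "data/visual_genome/features/1_info.npy"

def iter_jsonl(path):
    """Yield one JSON record per non-empty line of a .jsonl file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Each record would describe one image's scene graph (objects, relationships, ...).
for record in iter_jsonl(SCENE_GRAPH_JSONL):
    print(sorted(record.keys()))
    break

# _info.npy files usually hold pickled per-image metadata, so allow_pickle
# is needed when loading them with NumPy.
info = np.load(FEATURE_INFO_NPY, allow_pickle=True)
print(info)
```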

File tree

20 files changed (+575 lines, -70 lines)


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 *.swp
 .idea/*
 **/__pycache__/*
+**/output/*
 data/.DS_Store
 docs/build
 results/*

README.md

Lines changed: 11 additions & 8 deletions
@@ -97,13 +97,15 @@ wget imdb_link
 tar xf [imdb].tar.gz
 ```

-| Dataset | Key | Task | ImDB Link | Features Link | Features checksum |
-|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|
-| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` |
-| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` |
-| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` |
-| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! |
-| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`|
+| Dataset | Key | Task | ImDB Link | Features Link | Features checksum | Notes|
+|--------------|-----|-----|-----------------------------------------------------------------------------------|---------------------------------------------------------------------------------|---------|-----|
+| TextVQA | textvqa | vqa | [TextVQA 0.5 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz) | [OpenImages](https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz) | `b22e80997b2580edaf08d7e3a896e324` ||
+| VQA 2.0 | vqa2 | vqa | [VQA 2.0 ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vqa.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981` ||
+| VizWiz | vizwiz | vqa | [VizWiz ImDB](https://dl.fbaipublicfiles.com/pythia/data/imdb/vizwiz.tar.gz) | [VizWiz](https://dl.fbaipublicfiles.com/pythia/features/vizwiz.tar.gz) | `9a28d6a9892dda8519d03fba52fb899f` ||
+| VisualDialog | visdial | dialog | Coming soon! | Coming soon! | Coming soon! | |
+| VisualGenome | visual_genome | vqa | Automatically downloaded | Automatically downloaded | Coming soon! | Also supports scene graphs|
+| CLEVR | clevr | vqa | Automatically downloaded | Automatically downloaded | | |
+| MS COCO | coco | captioning | [COCO Caption](https://dl.fbaipublicfiles.com/pythia/data/imdb/coco_captions.tar.gz) | [COCO](https://dl.fbaipublicfiles.com/pythia/features/coco.tar.gz) | `ab7947b04f3063c774b87dfbf4d0e981`| |

 After downloading the features, verify the download by checking the md5sum using

@@ -119,8 +121,9 @@ supported by the models in Pythia's model zoo.

 | Model | Key | Supported Datasets | Pretrained Models | Notes |
 |--------|-----------|-----------------------|-------------------|-----------------------------------------------------------|
-| Pythia | pythia | vqa2, vizwiz, textvqa | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
+| Pythia | pythia | vqa2, vizwiz, textvqa, visual_genome | [vqa2 train+val](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth), [vqa2 train only](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth), [vizwiz](https://dl.fbaipublicfiles.com/pythia/pretrained_models/vizwiz/pythia_pretrained_vqa2.pth) | VizWiz model has been pretrained on VQAv2 and transferred |
 | LoRRA | lorra | vqa2, vizwiz, textvqa | [textvqa](https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth) | |
+| CNN LSTM | cnn_lstm | clevr | | Features are calculated on the fly. |
 | BAN | ban | vqa2, vizwiz, textvqa | Coming soon! | Support is preliminary and haven't been tested thoroughly. |
 | BUTD | butd | coco | [coco](https://dl.fbaipublicfiles.com/pythia/pretrained_models/coco_captions/butd.pth) | |
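The README above tells users to verify the downloaded feature archives against the md5 checksums listed in the table. As an illustrative alternative to the `md5sum` command (not part of the repo, just a sketch using the COCO checksum value from the table above), the same check can be done in Python with hashlib:

```python
import hashlib

# Expected checksum taken from the README table above (COCO features archive).
EXPECTED_MD5 = "ab7947b04f3063c774b87dfbf4d0e981"
ARCHIVE = "coco.tar.gz"  # downloaded from the Features Link column

def md5sum(path, chunk_size=1 << 20):
    """Compute the md5 digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if md5sum(ARCHIVE) == EXPECTED_MD5:
    print("checksum OK")
else:
    print("checksum mismatch -- re-download the archive")
```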

docs/source/tutorials/concepts.md

Lines changed: 37 additions & 33 deletions
@@ -31,33 +31,37 @@ to refer it in the command line arguments.
 Following table shows the tasks and their datasets:

 ```eval_rst
-+--------+------------+------------------------+
-|**Task**| **Key**    | **Datasets**           |
-+--------+------------+------------------------+
-| VQA    | vqa        | VQA2.0, VizWiz, TextVQA|
-+--------+------------+------------------------+
-| Dialog | dialog     | VisualDialog           |
-+--------+------------+------------------------+
-| Caption| captioning | MS COCO                |
-+--------+------------+------------------------+
++--------+------------+---------------------------------------------+
+|**Task**| **Key**    | **Datasets**                                |
++--------+------------+---------------------------------------------+
+| VQA    | vqa        | VQA2.0, VizWiz, TextVQA, VisualGenome, CLEVR|
++--------+------------+---------------------------------------------+
+| Dialog | dialog     | VisualDialog                                |
++--------+------------+---------------------------------------------+
+| Caption| captioning | MS COCO                                     |
++--------+------------+---------------------------------------------+
 ```

 Following table shows the inverse of the above table, datasets along with their tasks and keys:

 ```eval_rst
-+--------------+---------+-----------+--------------------+
-| **Datasets** | **Key** | **Task**  |**Notes**           |
-+--------------+---------+-----------+--------------------+
-| VQA 2.0      | vqa2    | vqa       |                    |
-+--------------+---------+-----------+--------------------+
-| TextVQA      | textvqa | vqa       |                    |
-+--------------+---------+-----------+--------------------+
-| VizWiz       | vizwiz  | vqa       |                    |
-+--------------+---------+-----------+--------------------+
-| VisualDialog | visdial | dialog    | Coming soon!       |
-+--------------+---------+-----------+--------------------+
-| MS COCO      | coco    | captioning|                    |
-+--------------+---------+-----------+--------------------+
++--------------+---------------+-----------+--------------------+
+| **Datasets** | **Key**       | **Task**  |**Notes**           |
++--------------+---------------+-----------+--------------------+
+| VQA 2.0      | vqa2          | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| TextVQA      | textvqa       | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| VizWiz       | vizwiz        | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| VisualDialog | visdial       | dialog    | Coming soon!       |
++--------------+---------------+-----------+--------------------+
+| VisualGenome | visual_genome | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| CLEVR        | clevr         | vqa       |                    |
++--------------+---------------+-----------+--------------------+
+| MS COCO      | coco          | captioning|                    |
++--------------+---------------+-----------+--------------------+
 ```

 ## Models
@@ -75,17 +79,17 @@ reference in configuration and command line arguments. Following table shows each model's
 key name and datasets it can be run on.

 ```eval_rst
-+-----------+---------+-----------------------+
-| **Model** | **Key** | **Datasets**          |
-+-----------+---------+-----------------------+
-| LoRRA     | lorra   | textvqa, vizwiz       |
-+-----------+---------+-----------------------+
-| Pythia    | pythia  | textvqa, vizwiz, vqa2 |
-+-----------+---------+-----------------------+
-| BAN       | ban     | textvqa, vizwiz, vqa2 |
-+-----------+---------+-----------------------+
-| BUTD      | butd    | coco                  |
-+-----------+---------+-----------------------+
++-----------+---------+--------------------------------------+
+| **Model** | **Key** | **Datasets**                         |
++-----------+---------+--------------------------------------+
+| LoRRA     | lorra   | textvqa, vizwiz                      |
++-----------+---------+--------------------------------------+
+| Pythia    | pythia  | textvqa, vizwiz, vqa2, visual_genome |
++-----------+---------+--------------------------------------+
+| BAN       | ban     | textvqa, vizwiz, vqa2                |
++-----------+---------+--------------------------------------+
+| BUTD      | butd    | coco                                 |
++-----------+---------+--------------------------------------+
 ```

 ```eval_rst
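To make the key combinations in the tables above concrete, here is a purely illustrative argparse sketch (not Pythia's actual command-line parser; the validation logic is an assumption) that checks a task/dataset/model selection against the combinations documented in this commit:

```python
import argparse

# Key combinations as documented in the tables above (post-commit state).
TASK_DATASETS = {
    "vqa": ["vqa2", "textvqa", "vizwiz", "visual_genome", "clevr"],
    "dialog": ["visdial"],
    "captioning": ["coco"],
}
MODEL_DATASETS = {
    "lorra": ["textvqa", "vizwiz"],
    "pythia": ["textvqa", "vizwiz", "vqa2", "visual_genome"],
    "ban": ["textvqa", "vizwiz", "vqa2"],
    "butd": ["coco"],
}

parser = argparse.ArgumentParser(description="Illustrative key validation only")
parser.add_argument("--tasks", choices=sorted(TASK_DATASETS), required=True)
parser.add_argument("--datasets", required=True)
parser.add_argument("--model", choices=sorted(MODEL_DATASETS), required=True)
args = parser.parse_args()

# Cross-check the dataset key against both the task table and the model table.
if args.datasets not in TASK_DATASETS[args.tasks]:
    parser.error(f"dataset '{args.datasets}' is not listed under task '{args.tasks}'")
if args.datasets not in MODEL_DATASETS[args.model]:
    parser.error(f"model '{args.model}' is not documented for dataset '{args.datasets}'")
print(f"OK: task={args.tasks} dataset={args.datasets} model={args.model}")
```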

docs/source/tutorials/pretrained_models.md

Lines changed: 13 additions & 11 deletions
@@ -6,17 +6,19 @@ predictions for EvalAI evaluation. This section expects that you have already in
 required data as explained in [quickstart](./quickstart).

 ```eval_rst
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| Model  | Model Key | Supported Datasets    | Pretrained Models                                 | Notes                                                     |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| Pythia | pythia    | vqa2, vizwiz, textvqa | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_  | VizWiz model has been pretrained on VQAv2 and transferred |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| LoRRA  | lorra     | vqa2, vizwiz, textvqa | `textvqa`_                                        |                                                           |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| BAN    | ban       | vqa2, vizwiz, textvqa | Coming soon!                                      | Support is preliminary and haven't been tested throughly. |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
-| BUTD   | butd      | coco                  | `coco`_                                           |                                                           |
-+--------+-----------+-----------------------+---------------------------------------------------+-----------------------------------------------------------+
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| Model  | Model Key | Supported Datasets                    | Pretrained Models                                 | Notes                                                     |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| Pythia | pythia    | vqa2, vizwiz, textvqa, visual_genome, | `vqa2 train+val`_, `vqa2 train only`_, `vizwiz`_  | VizWiz model has been pretrained on VQAv2 and transferred |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| LoRRA  | lorra     | vqa2, vizwiz, textvqa                 | `textvqa`_                                        |                                                           |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| CNNLSTM| cnn_lstm  | clevr                                 |                                                   | Features are calculated on the fly in this one.           |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| BAN    | ban       | vqa2, vizwiz, textvqa                 | Coming soon!                                      | Support is preliminary and haven't been tested throughly. |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+
+| BUTD   | butd      | coco                                  | `coco`_                                           |                                                           |
++--------+-----------+---------------------------------------+---------------------------------------------------+-----------------------------------------------------------+

 .. _vqa2 train+val: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia_train_val.pth
 .. _vqa2 train only: https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth
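Each entry in the Pretrained Models column points at a PyTorch `.pth` checkpoint served from the link targets above. A minimal inspection sketch (hypothetical; not how Pythia itself restores checkpoints, and the checkpoint's internal layout is an assumption) that downloads the `vqa2 train only` file and lists its top-level keys:

```python
import urllib.request

import torch

# URL copied from the link targets above.
URL = "https://dl.fbaipublicfiles.com/pythia/pretrained_models/vqa2/pythia.pth"
LOCAL = "pythia.pth"

urllib.request.urlretrieve(URL, LOCAL)

# map_location="cpu" lets the checkpoint load on machines without a GPU.
checkpoint = torch.load(LOCAL, map_location="cpu")

if isinstance(checkpoint, dict):
    print("top-level keys:", list(checkpoint.keys()))
else:
    print("loaded object of type:", type(checkpoint))
```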
