Note: This codebase has only been tested on Debian-based Linux systems. CUDA is required to install the dependencies and run experiments.
The Open-Insect dataset with GBIF images is publicly available at Open-Insect on Hugging Face.
To download the images and generate metadata for training:
- Change `download_dir` to the directory where you want the downloaded dataset to be saved.
- `resize_size` is the length of the smaller edge of the image after resizing. Change `resize_size` accordingly. The default value is 224.
- If you do not want to resize the images, simply delete `--resize_size 224` from the command. Without resizing, the downloaded images will require approximately 6 TB of storage.
- Run `bash download.sh`
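To make the meaning of `resize_size` concrete, here is a minimal sketch of the arithmetic. It assumes the download script preserves the aspect ratio when resizing, which is not stated explicitly above; the function name `resized_dimensions` is hypothetical and not part of this codebase.

```python
def resized_dimensions(width: int, height: int, resize_size: int = 224) -> tuple[int, int]:
    """Return (width, height) after scaling so the smaller edge equals resize_size.

    Assumption: the aspect ratio is preserved, so only the smaller edge is
    pinned to resize_size and the longer edge scales proportionally.
    """
    if min(width, height) <= 0 or resize_size <= 0:
        raise ValueError("width, height, and resize_size must be positive")
    scale = resize_size / min(width, height)
    if width <= height:
        # Width is the smaller edge: pin it and scale the height.
        return resize_size, max(1, round(height * scale))
    # Height is the smaller edge: pin it and scale the width.
    return max(1, round(width * scale)), resize_size
```

With the default of 224, a 1000x500 image would come out as 448x224 under this assumption.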
Once downloading finishes,
- Images will be saved under `<download_dir>/images`.
- Metadata for training and evaluation of each region will be saved as

      <download_dir>/metadata/<region>
      ├── test_id.csv
      ├── test_ood_local.csv
      ├── test_ood_non-local.csv
      ├── test_ood_non-moth.csv
      ├── train_aux.csv
      ├── train_id.csv
      ├── val_id.csv
      └── val_ood.csv

- Change `data_dir`, `imglist_pth`, and `pre_size` in the configs under `configs/datasets` accordingly before training or evaluation.
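Before editing the configs, it can help to confirm that the download produced every metadata file listed above. The following is a minimal sketch of such a check; the helper name `missing_metadata_files` is hypothetical and not part of this codebase, and the file names are taken from the listing above.

```python
from pathlib import Path

# File names copied from the metadata listing in this README.
EXPECTED_METADATA_FILES = [
    "test_id.csv",
    "test_ood_local.csv",
    "test_ood_non-local.csv",
    "test_ood_non-moth.csv",
    "train_aux.csv",
    "train_id.csv",
    "val_id.csv",
    "val_ood.csv",
]


def missing_metadata_files(download_dir, region):
    """Return the expected metadata files that are absent for one region."""
    region_dir = Path(download_dir) / "metadata" / region
    return [name for name in EXPECTED_METADATA_FILES
            if not (region_dir / name).is_file()]
```

An empty list means all eight files are present, for example `missing_metadata_files(".", "c-america") == []`.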
The C-America O-BCI dataset is hosted separately at Open-Insect-BCI on Hugging Face.
Run
python download_bci.py --download_dir .
to download the BCI dataset to the current directory, or change the download_dir accordingly.
- Images will be saved under `<download_dir>/images/bci`.
- Metadata will be saved as `<download_dir>/metadata/c-america/test_ood_bci.txt`.
Run the following commands to install dependencies.
conda create -n oi_env python=3.10
conda activate oi_env
pip install -e .
pip install libmr
The default batch size is 512 and the default number of workers is 16. With these settings, models can be trained on one RTX 8000 GPU with 48 GB of memory, 16 CPUs (for the 16 workers), and 100 GB of CPU memory in total.
| Argument | Description | Possible Values / Examples |
|---|---|---|
| `REGION` | Specifies the geographical region of the dataset to evaluate. | `ne-america` (Northeastern America), `w-europe` (Western Europe), `c-america` (Central America) |
| `METHOD` | The training method. | See the list METHOD Options below. |
| `POSTHOC_METHOD` | The post-hoc open-set detection method applied to the trained classifier. | See the list POSTHOC_METHOD Options below. |
| `NETWORK` | The backbone network used in the model. | See the list NETWORK Options below. |
| `CHECKPOINT_DIR` | Path to the directory to save the trained model checkpoints. | Example: `$HOME/weights` |
METHOD Options (See Table 2 in the paper for more details of the methods)
- `basics` - the basic classifier trained with Cross Entropy loss on the closed-set data only
- `conf_branch` - ConfBranch
- `logitnorm` - LogitNorm
- `godin` - GODIN
- `rotpred` - RotPred
- `oe` - OE
- `udg` - UDG
- `mixoe` - MixOE
- `energy` - Energy
- `extended` - Extended
- `novel_branch` - NovelBranch
POSTHOC_METHOD Options (See Table 2 in the paper for more details of the post-hoc methods)
- Generic post-hoc methods
  - `openmax` - OpenMax
  - `msp` - MSP
  - `temperature_scaling` - TempScale
  - `odin` - ODIN
  - `mds` - MDS
  - `mds_ensemble` - MDSEns
  - `rmds` - RMDS
  - `gram` - Gram
  - `ebo` - EBO
  - `gradnorm` - GradNorm
  - `react` - ReAct
  - `mls` - MLS
  - `klm` - KLM
  - `vim` - VIM
  - `knn` - k-Nearest Neighbor in feature space
  - `dice` - DICE
  - `rankfeat` - RankFeat
  - `ash` - ASH
  - `she` - SHE
  - `neco` - NECO
  - `fdbd` - FDBD
  - `rp_msp`, `rp_odin`, `rp_ebo`, `rp_gradnorm` - RP_MSP, RP_ODIN, RP_EBO, RP_GradNorm
  - `nci` - NCI
- Post-hoc methods for a specific training method
  - `conf_branch` - To be used with `METHOD`: `conf_branch`
  - `godin` - To be used with `METHOD`: `godin`
  - `rotpred` - To be used with `METHOD`: `rotpred`
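The pairing rule for method-specific post-hoc methods can be checked before launching an evaluation. The sketch below encodes only the three pairings listed above; the function `check_posthoc` is hypothetical and not part of this codebase, and it makes no claim about which generic post-hoc methods work with which training methods.

```python
# Post-hoc methods that the list above ties to one specific training method.
METHOD_SPECIFIC_POSTHOC = {
    "conf_branch": "conf_branch",
    "godin": "godin",
    "rotpred": "rotpred",
}


def check_posthoc(method, posthoc_method):
    """Raise ValueError if a method-specific post-hoc method is mispaired."""
    required = METHOD_SPECIFIC_POSTHOC.get(posthoc_method)
    if required is not None and method != required:
        raise ValueError(
            f"POSTHOC_METHOD '{posthoc_method}' is meant to be used with "
            f"METHOD '{required}', got '{method}'"
        )
```

For example, `check_posthoc("basics", "msp")` passes silently, while `check_posthoc("basics", "godin")` raises a `ValueError`.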
NETWORK Options
- `conf_branch` - for `METHOD`: `conf_branch`
- `godin_net` - for `METHOD`: `godin`
- `rot_net` - for `METHOD`: `rotpred`
- `udg_net` - for `METHOD`: `udg`
- `extended_net` - for `METHOD`: `extended` or `novel_branch`
- `resnet50` - for all other methods
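The method-to-network pairing above can be written down as a small lookup table. This is a sketch only: the function `network_for` is hypothetical, and it covers just the methods named in the METHOD Options list (so it does not handle `arpl` or `opengan`, which have their own evaluation scripts).

```python
# Methods from the METHOD Options list in this README.
KNOWN_METHODS = {
    "basics", "conf_branch", "logitnorm", "godin", "rotpred", "oe",
    "udg", "mixoe", "energy", "extended", "novel_branch",
}

# Methods that need a dedicated network, per the NETWORK Options list.
NETWORK_FOR_METHOD = {
    "conf_branch": "conf_branch",
    "godin": "godin_net",
    "rotpred": "rot_net",
    "udg": "udg_net",
    "extended": "extended_net",
    "novel_branch": "extended_net",
}


def network_for(method):
    """Return the NETWORK value that matches a METHOD from the list above."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown METHOD: {method}")
    # Every listed method without a dedicated network uses resnet50.
    return NETWORK_FOR_METHOD.get(method, "resnet50")
```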
Run the following command to train from scratch. The model checkpoint will be saved in ${CHECKPOINT_DIR}/${REGION}/${METHOD}/train_from_scratch/s${RANDOM_SEED}.
bash scripts/train.sh REGION METHOD NETWORK CHECKPOINT_DIR
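To locate a checkpoint after training from scratch, the directory pattern above can be assembled programmatically. This is a minimal sketch; the helper name `scratch_checkpoint_dir` is hypothetical, and the random seed is assumed to be known to the caller (where `RANDOM_SEED` is set is not described here).

```python
from pathlib import Path


def scratch_checkpoint_dir(checkpoint_dir, region, method, random_seed):
    """Build ${CHECKPOINT_DIR}/${REGION}/${METHOD}/train_from_scratch/s${RANDOM_SEED}."""
    return (Path(checkpoint_dir) / region / method
            / "train_from_scratch" / f"s{random_seed}")
```

For instance, `scratch_checkpoint_dir("weights", "c-america", "basics", 0)` yields `weights/c-america/basics/train_from_scratch/s0`.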
Run the following command to fine-tune the CHECKPOINT. The fine-tuned model will be saved in CHECKPOINT_DIR.
bash scripts/finetune.sh REGION METHOD NETWORK CHECKPOINT_DIR CHECKPOINT
To evaluate ARPL, run
bash scripts/eval_arpl.sh REGION arpl msp arpl_net CHECKPOINT_DIR
To evaluate OpenGAN, run
bash scripts/eval_opengan.sh REGION opengan opengan resnet50 CHECKPOINT_DIR
For all other methods, run
bash scripts/eval.sh REGION METHOD POSTHOC_METHOD NETWORK CHECKPOINT_DIR
For example, to evaluate the basic classifier for Central America with MSP using the checkpoint saved under $HOME/weights, run
bash scripts/eval.sh c-america basics msp resnet50 $HOME/weights
See scripts/test_eval_script.sh for more examples.
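When scripting many evaluations, the three command forms above can be assembled by one helper. The sketch below only builds the argument list and does not run anything; `eval_command` is a hypothetical name, and for `arpl` and `opengan` it deliberately ignores the caller's post-hoc method and network in favor of the fixed values shown in the commands above.

```python
def eval_command(region, method, posthoc_method, network, checkpoint_dir):
    """Return the argv list for the evaluation script that matches METHOD."""
    if method == "arpl":
        # ARPL has its own script with fixed post-hoc method and network.
        script = "scripts/eval_arpl.sh"
        args = [region, "arpl", "msp", "arpl_net", checkpoint_dir]
    elif method == "opengan":
        # OpenGAN has its own script with fixed post-hoc method and network.
        script = "scripts/eval_opengan.sh"
        args = [region, "opengan", "opengan", "resnet50", checkpoint_dir]
    else:
        script = "scripts/eval.sh"
        args = [region, method, posthoc_method, network, checkpoint_dir]
    return ["bash", script, *args]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.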
- OpenMax: This method requires predictions of the test set to cover all training species. Otherwise, the following error will occur:
RuntimeError: torch.cat(): expected a non-empty list of Tensors.
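The coverage condition can be checked up front so that the failure is easier to diagnose. The following is a sketch under two assumptions not stated above: predictions are integer class indices, and training species are indexed `0` to `num_classes - 1`. The function name `missing_predicted_classes` is hypothetical.

```python
def missing_predicted_classes(predictions, num_classes):
    """Return the class indices that never appear among the predictions.

    Assumption: classes are labeled 0..num_classes-1 and each prediction
    is one of those integer indices.
    """
    predicted = {int(p) for p in predictions}
    return sorted(set(range(num_classes)) - predicted)
```

A non-empty result means some training species is never predicted, which is the situation described above as triggering the error.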
Checkpoints can be downloaded from https://huggingface.co/yuyan-chen/open-insect-model-weights or by running
python download_pretrained_weights.py --weight_dir WEIGHT_DIR
Here are some minimal examples to test this codebase. First, activate the virtual environment by
conda activate oi_env
then download the pretrained weights by running
python download_pretrained_weights.py
The model should be saved under weights/c-america_resnet50_baseline.pth.
To test training methods that do not require auxiliary data, run
bash scripts/examples/train.sh
To test training methods that require auxiliary data, run
bash scripts/examples/train_with_aux_data.sh
The training and validation accuracies are expected to be 0 after 2 epochs because there is only 1 image per species in the training set and the model is trained from scratch.
Run
bash scripts/examples/eval.sh
The output will be saved as output/open-insect-example/base/msp.csv. You can compare the output with output/open-insect-example/base/msp_expected.csv.
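The comparison with the expected output can be automated. Below is a minimal sketch that compares two CSV files cell by cell, treating numeric cells as equal within a small relative tolerance; the function name `csv_files_match` and the tolerance are assumptions, and nothing is assumed about the column layout of `msp.csv`.

```python
import csv
import math


def _cells_match(a, b, rel_tol):
    """Compare two CSV cells: exact match, or numeric match within tolerance."""
    if a == b:
        return True
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        # At least one cell is not numeric and the strings differ.
        return False


def csv_files_match(path_a, path_b, rel_tol=1e-6):
    """Return True if both CSV files have the same shape and matching cells."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a, rows_b = list(csv.reader(fa)), list(csv.reader(fb))
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        if not all(_cells_match(a, b, rel_tol) for a, b in zip(row_a, row_b)):
            return False
    return True
```

For example, `csv_files_match("output/open-insect-example/base/msp.csv", "output/open-insect-example/base/msp_expected.csv")` returns `True` when the two files agree up to the tolerance.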
This codebase is built using OpenOOD. We sincerely appreciate their efforts in making this valuable resource publicly available.