Accuracy benchmarks
The code in `benchmark/accuracy` and the module `src/AccuracyBenchmark.jl` support a variety of benchmarks of Celeste's accuracy. Each accuracy benchmark consists of running Celeste on the images of a single field (technically five images, one per band) and comparing the inferred parameters for all sources present to known "ground truth" values. An accuracy benchmark has the following components:
- We start with a ground truth catalog.
- We get a set of band images corresponding to this catalog.
- We run Celeste on these images, with particular initialization values corresponding to the given images, generating a prediction catalog.
- We compare one or more prediction catalogs to the ground truth, summarizing accuracy for each parameter.
All catalogs are stored in a common CSV format; see `AccuracyBenchmark.read_catalog()` and `AccuracyBenchmark.write_catalog()`.
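As a rough illustration of that format in use, here is a minimal sketch; the module path and exact signatures are assumptions, so consult `src/AccuracyBenchmark.jl` for the real interface:

```julia
# Hypothetical round trip through the common CSV catalog format.
# Assumptions (not confirmed): AccuracyBenchmark is accessible as a submodule
# of Celeste, read_catalog(path) returns a tabular catalog (e.g. a DataFrame),
# and write_catalog(path, catalog) writes one back out in the same format.
import Celeste.AccuracyBenchmark

catalog = AccuracyBenchmark.read_catalog("output/prior_<hash>.csv")  # placeholder filename
# ... inspect or modify the catalog here ...
AccuracyBenchmark.write_catalog("output/prior_copy.csv", catalog)
```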
At each step we have some choices:
The command

```
$ julia write_ground_truth_catalog_csv.jl [coadd|prior]
```

writes a ground truth catalog to the `output` subdirectory.
- `coadd` uses the SDSS Stripe82 "coadd" catalog, already pulled from the SDSS CasJobs server (in FITS format). By default, this uses the coadd file for the 4263/5/119 RCF stored under `test/data`, but you can specify a path manually as the next command-line argument (see the example below). (Use `make RUN=... CAMCOL=... FIELD=...` in the `test/data` directory to pull these files from the SDSS server.)
- `prior` draws 500 random sources from the Celeste prior (with a few added prior distributions for parameters which don't have a prior specified in Celeste).
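For example, both modes can be invoked as follows; the FITS path passed to `coadd` in the second command is a hypothetical stand-in for a coadd catalog you pulled yourself:

```
# Default Stripe82 coadd catalog under test/data:
$ julia write_ground_truth_catalog_csv.jl coadd
# Hypothetical manually specified coadd FITS path:
$ julia write_ground_truth_catalog_csv.jl coadd path/to/your_coadd_catalog.fits
# 500 sources drawn from the Celeste prior:
$ julia write_ground_truth_catalog_csv.jl prior
```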
For Stripe82, one can use real SDSS imagery which has already been downloaded (under `test/data`).
For any ground truth catalog, one can generate synthetic imagery in two ways:
- Using GalSim, with the `benchmark/galsim/galsim_field.py` script. See the README in that directory for more details (setting up GalSim is nontrivial). This process will generate a FITS file under `benchmark/galsim/output`.
- By drawing from the Celeste likelihood model (implemented in `src/Synthetic.jl`), and inserting synthetic light sources into template imagery (metadata only; the pixels are all new), using the command `$ julia generate_synthetic_field.jl <ground truth CSV>` (see the sketch after this list).
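As a sketch of the second option, with output filenames following the `<hash>` naming convention used in the examples at the end of this page:

```
# Draw a ground truth catalog from the prior, then render synthetic imagery
# from it using the Celeste likelihood model:
$ julia write_ground_truth_catalog_csv.jl prior
$ julia generate_synthetic_field.jl output/prior_<hash>.csv
```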
The command

```
$ julia sdss_rcf_to_csv.jl
```

will read the Stripe82 "primary" catalog (from pre-downloaded FITS files) and write it in CSV form to the `output` subdirectory. This is useful for two things:
- Initializing Celeste for a run on Stripe82 imagery.
- Comparing Celeste's accuracy to the "primary" catalog.
You can also specify a different RCF and omit the ground truth catalog, to run Celeste outside of Stripe82 and compare to primary.
The script `run_celeste_on_field.jl` will run Celeste on given images, writing predictions to a new catalog under the `output` subdirectory.
- The default behavior is to read Stripe82 SDSS images. You can specify a JLD file containing imagery to use instead with `--images-jld <filename>`.
- By default, Celeste detects sources on the images. To skip this and initialize sources from an existing CSV catalog, use `--initialization-catalog <filename>`. Sources will be initialized only with a noisy position, so you can pass a ground truth catalog for synthetic imagery without "cheating". Alternatively, if you pass `--use-full-initialization`, Celeste will be initialized with all information from the given catalog (see the combined example after this list).
- The script supports single (default) or joint inference (`--joint`).
- The script writes predictions in the common catalog CSV format.
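For instance, a hypothetical invocation combining the options above, running joint inference on synthetic imagery with noisy-position initialization from a ground truth catalog (filenames follow the `<hash>` naming used in the examples below):

```
# Hypothetical combination of the flags documented above: joint inference on
# synthetic imagery, initialized (noisy positions only) from a ground truth catalog.
$ julia run_celeste_on_field.jl --joint \
    --images-jld output/prior_<hash>_synthetic_<hash>.jld \
    --initialization-catalog output/prior_<hash>.csv
```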
The command

```
$ julia benchmark/accuracy/score_predictions.jl \
    <ground truth CSV> <predictions CSV> [predictions CSV]
```

compares one or two prediction catalogs to a ground truth catalog, summarizing their performance (and comparing to each other, if two are given). You can also use

```
$ julia benchmark/accuracy/score_uncertainty.jl \
    <ground truth CSV> <Celeste predictions CSV>
```

to examine the distribution of errors relative to posterior SDs (for those parameters with a posterior distribution).
Here are some examples of use (commands are relative to the `benchmark/accuracy` directory):
- To run Celeste on real Stripe82 imagery, using "primary" predictions for initialization (as in real runs), and compare Celeste's accuracy to that of Stripe82 primary:

  ```
  $ julia write_ground_truth_catalog_csv.jl coadd
  $ julia sdss_rcf_to_csv.jl \
      --objid-csv output/coadd_for_4263_5_119_<hash>.csv
  $ julia run_celeste_on_field.jl --use-full-initialization \
      output/sdss_4263_5_119_primary_<hash>.csv
  $ julia score_predictions.jl \
      output/coadd_for_4263_5_119_<hash>.csv \
      output/sdss_4263_5_119_primary_<hash>.csv \
      output/sdss_4263_5_119_predictions_<hash>.csv
  $ julia score_uncertainty.jl \
      output/coadd_for_4263_5_119_<hash>.csv \
      output/sdss_4263_5_119_predictions_<hash>.csv
  ```
- To run Celeste on GalSim imagery from a "prior" ground truth catalog, using partial information from the ground truth catalog for initialization, and compare single to joint inference:

  ```
  $ julia write_ground_truth_catalog_csv.jl prior
  # go to benchmark/galsim/ and generate synthetic imagery from the above-generated catalog
  $ julia run_celeste_on_field.jl \
      output/prior_<hash>.csv --images-jld output/prior_<hash>_synthetic_<hash>.jld
  $ julia run_celeste_on_field.jl \
      output/prior_<hash>.csv --images-jld output/prior_<hash>_synthetic_<hash>.jld --joint
  $ julia score_predictions.jl \
      output/prior_<hash>.csv \
      output/prior_<hash>_images_<hash>_predictions_<first hash>.csv \
      output/prior_<hash>_images_<hash>_predictions_<second hash>.csv
  ```
- To run Celeste on another SDSS RCF using "primary" predictions both for initialization and as ground truth:

  ```
  $ julia sdss_rcf_to_csv.jl
  $ julia run_celeste_on_field.jl --use-full-initialization \
      output/sdss_4263_5_119_primary_<hash>.csv
  $ julia score_predictions.jl \
      output/sdss_4263_5_119_primary_<hash>.csv \
      output/sdss_4263_5_119_predictions_<hash>.csv
  ```