Tutorial

日本語

In this tutorial is described a concrete example usage of the analysis pipeline distributed at the HSC Pipeline Binary Distribution site. Before reading this page, install the pipeline first.

Emplace images

Crate a data folder

Create a folder at a place you like. All input data and products will be placed in it. I hereafter assume it ~/HSC. To create it, use the script create-rootdir.sh instead of mkdir, for a few files have to be put in the data folder at this stage.

[user@example ~]$ # Set up hscPipe
[user@example ~]$ setup-hscpipe
[user@example ~]$ create-rootdir.sh ~/HSC /path/to/ps1_pv3_3pi_20170110 /path/to/STRAY_LIGHT

↑ The second argument ps1_pv3_3pi_20170110 is the path to the reference catalog folder. This path (REFDIR) is such that REFDIR/{number}.fits and REFDIR/config.py exists. This folder (REFDIR) must be named ps1_pv3_3pi_20170110 since the pipeline assumes it by default.

↑ The third argument STRAY_LIGHT is the path to the y-band stray light file. You can omit this argument in this tutorial since the input data does not contain y-band images.

In addition, execute the following command to install transmission curves.

[user@example ~]$ installTransmissionCurves.py ~/HSC

Download input images

Download images for tutorial from the data list in the pipeline distribution site. When the download completes, expand the archive in an arbitrary directory:

[user@example ~]$ tar xvaf rawdata.tar.xz

Move the images

Emplace the downloaded images in ~/HSC:

[user@example ~]$ ## Don't forget to set up hscPipe if you haven't.
[user@example ~]$ # setup-hscpipe
[user@example ~]$ cd rawdata/
[user@example rawdata]$ # Move the images
[user@example rawdata]$ hscIngestImages.py ~/HSC  */*.fits --mode=move

↑ The destination path (~/HSC) must be an absolute path. Otherwise the command may not work.

Verify

The images moved to ~/HSC have been registered in ~/HSC/registry.sqlite3. Look into it to check what images there are.

[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> .header on
sqlite> SELECT visit, filter, field, count(visit)
    FROM raw
    GROUP BY visit, filter, field;
sqlite> .q

If you get something like the following list, the emplacement was successful.

visit|filter|field|count(visit)
905496|HSC-G|DEN_C|112
905498|HSC-G|DEN_C|112
905500|HSC-G|DEN_C|112
905502|HSC-G|DEN_C|112
905504|HSC-G|DEN_C|112
907002|HSC-G|DOMEFLAT|112
907004|HSC-G|DOMEFLAT|112
907006|HSC-G|DOMEFLAT|112
907008|HSC-G|DOMEFLAT|112
907010|HSC-G|DOMEFLAT|112

↑If you see smaller figures in the rightmost column, some images are lacking.

You can notice that there are DOMEFLATs with visit=907002..907010:2 (“first..last:interval”), and DEN_Cs with visit=905496..905504:2.

If you want to know the sqlite3 command, google SQLite or SQL.

Create data for calibration

Create flat data

From the dome flats create flat data. (As for the visit numbers of the dome flats, see above)

[user@example ~]$ constructFlat.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=907002..907010:2 $(: or --id field=DOMEFLAT filter=HSC-G) \
    --config \
        isr.doBias=False \
        isr.doDark=False \
    --batch-type=smp --cores=4
List of outputs

↑ "--cores=4" signifies “ by 4 threads.” Dividing the amount of memory by 4GB provides a good indication for the number of threads.

Pass an arbitrary rerun name into --rerun. Data analysis goes in sequence: reduce A to B, then B to C, ... We call this sequence A→B→C→... “rerun”.

For further explanation of options, see Command line options.

The averaged flat data are written under the directory you passed in by --rerun=. To be more precise:

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/FLAT/2014-02-03/HSC-G
FLAT-2014-02-03-HSC-G-000.fits  FLAT-2014-02-03-HSC-G-056.fits
FLAT-2014-02-03-HSC-G-001.fits  FLAT-2014-02-03-HSC-G-057.fits
FLAT-2014-02-03-HSC-G-002.fits  FLAT-2014-02-03-HSC-G-058.fits
             :                               :

Normally, we should look into each of the FITS files, but let us minimally check that all of the 112 CCDs have been output:

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/FLAT/2014-02-03/HSC-G/FLAT-*.fits |
    wc -l
112

Under $HOME/HSC/rerun/tutorial have also been output intermediate files in addition to these final outputs. Look into them if you are interested.

After constructing flat, you have to register it in order that the analysis pipeline can find it in subsequent stages.

[user@example ~]$ ingestCalibs.py $HOME/HSC \
    @<(find $HOME/HSC/rerun/tutorial/FLAT -name "*.fits") \
    --calib=$HOME/HSC/CALIB \
    --validity=100 \
    --mode=link

--validity=100 means that this flat can be used to process exposures taken ±100 days around the date of the flat.

That is all for flat-making. We have here neglected biases, darks and fringes. You may create them in need with similar commands (constructBias.py, constructDark.py, constructFringe.py)

Create sky background

We tend to overestimate the sky level around large galaxies. Too high sky subtracted results in artificial moats encircling such galaxies. We will fill up these moats in the stacking step, and in so doing we need to re-estimate the sky level with a very large mesh. This process needs an average sky pattern pre-computed. We call it "a global sky pattern".

We create a global sky pattern from object frames, or images that record the actual starry sky. Invoke the following program. (As for the range of visit, see above)

[user@example ~]$ constructSky.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 $(: or --id field=DEN_C)\
    --config \
        isr.doBias=False \
        isr.doDark=False \
    --batch-type smp --cores=4

After constructing sky, you have to register it in order that the analysis pipeline can find it in subsequent stages.

[user@example ~]$ ingestCalibs.py $HOME/HSC \
    @<(find $HOME/HSC/rerun/tutorial/SKY -name "*.fits") \
    --calib=$HOME/HSC/CALIB \
    --validity=100 \
    --mode=link

Analyze individual CCD images

singleFrameDriver

For users familiar with old versions of the pipeline: We no longer need to setup the reference catalog. create-rootdir.sh has already created a symbolic link to the reference catalog.

Invoke the analysis program that processes CCD images as below. (As for the range of visit, see above)

[user@example ~]$ singleFrameDriver.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 $(: or --id field=DEN_C)\
    --config \
        processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
    --batch-type=smp --cores=4
List of outputs

↑ If you are using the binary distributed in this site, and if this command do not work with error messages “pthread_create ...”, then try env OMP_NUM_THREADS=1 singleFrameDriver.py $HOME/HSC... instead of the command above.

For further explanation of options, see Command line options.

When singleFrameDriver.py finishes, its output files are under $HOME/HSC/rerun/tutorial. The images: ↓

[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/
[user@example corr]$ ls CORR-*
CORR-0905496-000.fits  CORR-0905498-070.fits  CORR-0905502-036.fits
CORR-0905496-001.fits  CORR-0905498-071.fits  CORR-0905502-037.fits
CORR-0905496-002.fits  CORR-0905498-072.fits  CORR-0905502-038.fits
    :                      :                       :

And the object catalog (FITS BinTable) ↓

[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/output/
[user@example output]$ ls SRC-*
SRC-0905496-000.fits  SRC-0905498-070.fits  SRC-0905502-036.fits
SRC-0905496-001.fits  SRC-0905498-071.fits  SRC-0905502-037.fits
SRC-0905496-002.fits  SRC-0905498-072.fits  SRC-0905502-038.fits
   :                     :                     :

We should look into these things by our own eyes, but let us only check the number of files here, too.

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/SRC-*.fits |
    wc -l
520

There are only 520 = 5×104 files.

The number of CCDs per shot, excluding those for auto-focus, is not 112 but 104. Therefore this is normal.

Sky correction

We re-estimate the sky level here. See this section for explanation.

[user@example ~]$ skyCorrection.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 $(: or --id field=DEN_C)\
    --batch-type=smp --cores=4

The output from this command will be used as the input of stacking.

Stack

Create a skymap

Before mosaicking, create a “tract”. A tract means a region of the sky that is (gnomonically) projected to a single plane. Different tracts are projected to different planes, so that they have different WCS.

In a wide survey, we cut the sky into predefined chunks, and treat each chunk as a tract. The collection of the tracts are called a "sky map". But it is inconvenient for us to use tracts that are cut out of the sky simply offhand, when we want to make a mosaic from data obtained by dithering a small region about a field of view. The region would probably be separated into several tracts.

And so, we will define a special tract that encompasses all the images we have to make a mosaic from. The sky map comprising such tracts is called "Discrete sky map".

[user@example ~]$ makeDiscreteSkyMap.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 $(: or --id field=DEN_C)
List of outputs

↑ If the following message is output finally:

makeDiscreteSkyMap INFO: tract 0 has corners (134.970, 22.961), (132.538, 22.961), (132.517, 25.201), (134.992, 25.201) (RA, Dec deg) and 12 x 12 patches

then tract 0 is defined properly.

Mosaic

Make a mosaic of tract 0. Mosaicking only determines positions, but it does not create images.

[user@example ~]$ mosaic.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 tract=0 $(: or --id field=DEN_C filter=HSC-G tract=0)
List of outputs

Calibrate images

By mosaicking, magnitude zero-points and WCS have been determined. We make CCD images that include the calibration. (You can skip this stage if you do not need calibrated CCD images.)

[user@example ~]$ calibrateExposure.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 tract=0 $(: or --id field=DEN_C tract=0) \
    -j 4 # by 4 threads
List of outputs

↑In so doing, warnings might occur “NoResults: No locations for get (data that are on the edge of the FOV)” This means that mosaicking could not determine the positions of those CCDs on the edge of the tract. We can ignore it.

The calibrated images are output in↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/0000/CALEXP-*.fits

Calibrate catalogs

Make catalogs that include the calibration, in the same way. (You can skip this stage if you do not need calibrated catalogs.)

[user@example ~]$ calibrateCatalog.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 tract=0 $(: or --id field=DEN_C tract=0) \
    --config doApplyCalib=False \
    -j 4 # by 4 threads
List of outputs

The calibrated catalogs are output in↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/0000/CALSRC-*.fits

Stack

Though mosaicking has been done, coadd images have not yet been made. We make coadd images by calling stack.py

[user@example ~]$ coaddDriver.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id filter=HSC-G tract=0 \
    --selectId visit=905496..905504:2 ccd=0..103 $(: or --selectId field=DEN_C filter=HSC-G) \
    --batch-type=smp --cores=4
List of outputs

The coadd images are output in↓

[user@example ~]$ cd $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0
[user@example 0]$ ls */calexp-*.fits
1,3/calexp-HSC-G-0-1,3.fits    5,3/calexp-HSC-G-0-5,3.fits
1,4/calexp-HSC-G-0-1,4.fits    5,4/calexp-HSC-G-0-5,4.fits
1,5/calexp-HSC-G-0-1,5.fits    5,5/calexp-HSC-G-0-5,5.fits
            :                              :

This many files are created though only the tract 0 is stacked. In fact, a tract is split into small patches since the tract is so large that it can hardly be loaded into memory. Because the patches are simply split from a single large image, we can, using ds9 for example, view all the patches joined at a time (Beware that it requires more than 16GB of memory)。

[user@example 0]$ ds9 -mosaic wcs */calexp-*.fits

The whole tract will look like: stacked image (8MB; downscaled by 102)

You can see that "det-*.fits" files coexist with the image files "calexp-*.fits". ↓

[user@example ~]$ cd $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0
[user@example 0]$ ls */det-*.fits
1,3/det-HSC-G-0-1,3.fits    5,3/det-HSC-G-0-5,3.fits
1,4/det-HSC-G-0-1,4.fits    5,4/det-HSC-G-0-5,4.fits
1,5/det-HSC-G-0-1,5.fits    5,5/det-HSC-G-0-5,5.fits
            :                              :

These files contain "detections" of objects in the stacked images. The objects have not been measured for now. They will be measured in the "multiband analysis" below.

Multiband analysis

Though the tutorial data contains only a single band, we force our way to multiband analysis: for this is a tutorial. Let us merge respective detections of objects that have been found in multiple bands, and measure the merged detections.

[user@example ~]$ multiBandDriver.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id tract=0 filter=HSC-G \
        $(: filter=HSC-G^HSC-I^HSC-Z e.g. if there are several bands) \
    --batch-type=smp --cores=4
List of outputs

The result of measuring each band independently with a common merged catalog↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/meas-*.fits

The result of measuring each band with positions (etc.) of objects fixed at common values↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/forced_src-*.fits

That is all.

(Excursus) Example of a script

Suppose you want a large, whole image instead of chopped-up patches. Since hscPipe is a huge aggregate of Python libraries, you can use the libraries in Python scripts.

As an example, I paste below a script to make a large image from patches.

#!/usr/bin/env python
# encoding: utf-8

# A library of AFW (Application FrameWork), related to images
import lsst.afw.image as afwImage
# A library of AFW, related to geometry
import lsst.afw.geom  as afwGeom

import sys

def main():
    """
    Arrange patches to make a large image, and save it in "large.fits"

    make-large-image PATCHES...
        (e.g. make-large-image *,*.fits)
    """

    if len(sys.argv) <= 1:
        print "Input 1 patch or more."
        return 1

    largeImage = makeLargeImageFromPatches(sys.argv[1:])

    print "Saving (Wait a while)..."
    largeImage.writeFits("large.fits")


def makeLargeImageFromPatches(filenames):
    """
    @param filenames List of file names
    @return afwImage.ExposureF
    """

    # Loading all image would be dangerous by memory limitation.
    # Load metric information only.
    bboxes = []
    for filename in filenames:
        print "Loading a bounding box:", filename, "..."
        bboxes.append(
            # Get the bounding box of this patch
            # in the parent (= the tract)
            afwImage.ExposureF(filename).getBBox(afwImage.PARENT)
        )

    # Calculate the bbox that includes all the bboxes.
    #     Return values of getMin/Max method family are inclusive
    #     (range = [min, max]) cf. getBegin/End (used below)
    x0 = min(bbox.getMinX() for bbox in bboxes)
    y0 = min(bbox.getMinY() for bbox in bboxes)
    x1 = max(bbox.getMaxX() for bbox in bboxes)
    y1 = max(bbox.getMaxY() for bbox in bboxes)

    width  = x1 - x0 + 1
    height = y1 - y0 + 1

    # Prepare a large image
    print "Preparing a large image: ({}, {})".format(width, height)
    exposure = afwImage.ExposureF(width, height)

    # Set the position of this image in the tract
    exposure.setXY0(afwGeom.Point2I(x0, y0))

    # Extract the image data from the exposure
    largeImage = exposure.getMaskedImage()

    # Paste patches into largeImage
    for filename in filenames:
        print "Loading a patch:", filename, "..."

        # Load a patch again
        patch = afwImage.ExposureF(filename)
        bbox = patch.getBBox(afwImage.PARENT)

        # The position of this patch within largeImage:
        #     Return values of getBegin/End method family are one past the end
        #     (range = [begin,end) ) cf. getMin/Max (above)
        xp0 = bbox.getBeginX() - x0
        yp0 = bbox.getBeginY() - y0
        xp1 = bbox.getEndX  () - x0
        yp1 = bbox.getEndY  () - y0

        # Paste the patch (As for the notation in [], see numpy's document)
        largeImage[xp0:xp1, yp0:yp1] = patch.getMaskedImage()[:]

        # Blindly trust that all the patches have the same WCS information etc,
        # and set arbitrary one of them in the large image.
        if not exposure.hasWcs() and patch.hasWcs():
            exposure.setCalib   (patch.getCalib   ())
            exposure.setFilter  (patch.getFilter  ())
            exposure.setMetadata(patch.getMetadata())
            exposure.setWcs     (patch.getWcs     ())

    return exposure


if __name__ == "__main__":
    main()

↑Save the script in make-large-image.py, and you can use it as follows:

[user@example ~]$ setup-hscpipe
[user@example ~]$ python make-large-image.py \
    $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/calexp-*.fits

↑ Execute it in a machine with more than 16GB of memory. Or, Linux may force a double suicide with the script. Viewing the output large image requires that amount of memory, after all.

Caution: ↑The above script is not very good at coping with the mask layer. It may be that meanings of mask bits (“Saturated”, “Cosmic ray”, &c) in one patch are different from those in another patch. The meanings of the mask bits are inscribed in the FITS header of an extension EXTNAME=MASK. Correctly, you should make the script read the FITS header to permutate the mask bits. (This is beyond the scope of this tutorial.)

Command line options

Command lines of hscPipe commands have a common syntax. Let us take singleFrameDriver.py for example:

[user@example ~]$ singleFrameDriver.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 \
    --config \
        processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
    --batch-type=smp --cores=4

The first thing you have to know is the last line. Commands named **Driver.py are, basically, commands to post jobs into a job queue (Torque PBS &c.). But if you do not have such a system, you can execute the job immediately on the local computer, by the --batch-type=smp option. You have to specify the number of threads to use by --cores=.

[user@example ~]$ singleFrameDriver.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 \
    --config \
        processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
    --batch-type=smp --cores=4

Next, let us focus on the --config option. You can change configuration of analysis here. To know available parameters and their default values, append --show config to the command line and execute it.

If you change the configuration, the program will refuse to run.

If you change the configuration, the program will refuse to run.

This is so important that it is worth saying twice. The program will refuse to run because it is undesirable that there coexist data made with different analysis parameters in a single rerun.

When the program does not run, append --clobber-config to the command line. Thereby the cached configuration will be overwritten.

[user@example ~]$ singleFrameDriver.py $HOME/HSC \
    --rerun=tutorial \
    --calib=$HOME/HSC/CALIB \
    --id visit=905496..905504:2 \
    --config \
        processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
    --batch-type=smp --cores=4

Lastly, --id. Put conditions here with which to search for input data. Multiple conditions specified will be connected by AND. For instance, if you specify --id field="DEN_C" filter="HSC-G", then all data will be selected that are in the field "DEN_C" and of the band "HSC-G". "field" and "filter" are column names of tables in the registry.

Though it is intuitive to use field &c. instead of specifying visits directly, you will be confused which data were input or not if, e.g., FITS headers should be wrong. Be careful.