Tutorial

日本語

In this tutorial is described a concrete example usage of the analysis pipeline distributed at the HSC Pipeline Binary Distribution site. Before reading this page, install the pipeline first.

Emplace images

Crate a data folder

Create a folder at a place you like. All input data and products will be placed in it. I hereafter assume it ~/HSC.

[user@example ~]$ mkdir ~/HSC
[user@example ~]$ # Cast a spell to initiate this folder
[user@example ~]$ echo 'lsst.obs.hsc.HscMapper' > ~/HSC/_mapper

Download input images

Download images for tutorial from the data list in the pipeline distribution site. When the download completes, expand the archive in an arbitrary directory:

[user@example ~]$ tar xvaf rawdata.tar.xz

Move the images

Emplace the downloaded images in ~/HSC:

[user@example ~]$ # Set up hscPipe
[user@example ~]$ setup-hscpipe
[user@example ~]$ cd rawdata/
[user@example rawdata]$ # Move the images
[user@example rawdata]$ hscIngestImages.py ~/HSC  */*.fits --mode=move --create

↑ The destination path (~/HSC) must be an absolute path. Otherwise the command may not work.

Verify

The images moved to ~/HSC have been registered in ~/HSC/registry.sqlite3. Look into it to check what images there are.

[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> .header on
sqlite> SELECT visit, filter, field, count(visit)
    FROM raw
    GROUP BY visit, filter, field;
sqlite> .q

If you get something like the following list, the emplacement was successful.

visit|filter|field|count(visit)
905496|HSC-G|DEN_C|112
905498|HSC-G|DEN_C|112
905500|HSC-G|DEN_C|112
905502|HSC-G|DEN_C|112
905504|HSC-G|DEN_C|112
907002|HSC-G|DOMEFLAT|112
907004|HSC-G|DOMEFLAT|112
907006|HSC-G|DOMEFLAT|112
907008|HSC-G|DOMEFLAT|112
907010|HSC-G|DOMEFLAT|112

↑If you see smaller figures in the rightmost column, you lack some images.

You can notice that there are DOMEFLATs with visit=907002..907010:2 (“first..last:interval”), and DEN_Cs with visit=905496..905504:2.

If you want to know the sqlite3 command, google SQLite or SQL.

Create data for calibration

Create flat data

From the dome flats create flat data. (As for the visit numbers of the dome flats, see above)

[user@example ~]$ reduceFlat.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=calib_tutorial \
    --id visit=907002..907010:2 \
    --detrendId calibVersion=calib_tutorial \
    --config isr.doBias=False \
        isr.doDark=False \
        isr.doFringe=False \
    --do-exec --mpiexec "-n 4"

Note carefully: Values of --calib and calibVersion must match the regular expression [A-Za-z0-9_+-]+. It must not, for example, contain an period.

↑ "-n 4" signifies “ by 4 threads.” Dividing the amount of memory by 4GB provides a good indication for the number of threads.

Pass an arbitrary rerun name into --rerun. Data analysis goes in sequence: reduce A to B, then B to C, ... We call this sequence A→B→C→... “rerun”.

For further explanation of options, see Command line options.

The averaged flat data are written under the directory you passed in by --calib=. To be more precise:

[user@example ~]$ ls $HOME/HSC/CALIB_tutorial/FLAT/2014-02-03/HSC-G/calib_tutorial
FLAT-000.fits  FLAT-023.fits  FLAT-046.fits  FLAT-069.fits  FLAT-092.fits
FLAT-001.fits  FLAT-024.fits  FLAT-047.fits  FLAT-070.fits  FLAT-093.fits
FLAT-002.fits  FLAT-025.fits  FLAT-048.fits  FLAT-071.fits  FLAT-094.fits
    :              :              :              :              :

Normally, we should look into each of the FITS file, but let us minimally check that all of the 112 CCDs have been output:

[user@example ~]$ ls $HOME/HSC/CALIB_tutorial/FLAT/2014-02-03/HSC-G/calib_tutorial/FLAT-*.fits |
    wc -l
112

In addition, under $HOME/HSC/rerun/(rerun name) have been output intermediate files other than these final outputs. Look into them if you are interested.

After the data are created, register their information in the registry..

[user@example ~]$ genCalibRegistry.py \
    --root=$HOME/HSC/CALIB_tutorial \
    --camera=HSC \
    --create

↑Beware that no process below will succeed if you forget to do this.

That is all for flat-making. We have here neglected biases, darks and fringes. You may create them in need with similar commands (reduceBias.py, reduceDark.py, reduceFringe.py)

Analyze CCDs

Before proceeding, you must specify astrometry_net_data in order for the astrometry.

[user@example ~]$ setup astrometry_net_data sdss-dr9-fink-v5b

Next, invoke the analysis program that processes CCDs. (As for the range of visit, see above)

[user@example ~]$ reduceFrames.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2 \
    --config processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
        processCcd.isr.doFringe=False \
    --do-exec --mpiexec "-n 4"
List of outputs

↑ If you are using the binary distributed in this site, and if this command do not work with error messages “pthread_create ...”, then try OMP_NUM_THREADS=1 reduceFrames.py $HOME/HSC... instead of the command above.

↑ The rerun name was --rerun=calib_tutorial when we made flats. Now we use --rerun=tutorial. This means that we finish the old rerun (workflow) and start a new one from this point.

For further explanation of options, see Command line options.

When reduceFrames.py finishes, the output files are under $HOME/HSC/rerun/(rerun name): the images ↓

[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/
[user@example corr]$ ls CORR-*
CORR-0905496-000.fits  CORR-0905498-070.fits  CORR-0905502-036.fits
CORR-0905496-001.fits  CORR-0905498-071.fits  CORR-0905502-037.fits
CORR-0905496-002.fits  CORR-0905498-072.fits  CORR-0905502-038.fits
    :                      :                       :

And the object catalog (FITS BinTable) ↓

[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/output/
[user@example output]$ ls SRC-*
SRC-0905496-000.fits  SRC-0905498-070.fits  SRC-0905502-036.fits
SRC-0905496-001.fits  SRC-0905498-071.fits  SRC-0905502-037.fits
SRC-0905496-002.fits  SRC-0905498-072.fits  SRC-0905502-038.fits
   :                     :                     :

We should look into these things by our own eyes, but let us only check the number of files here, too.

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/SRC-*.fits |
    wc -l
520

There are only 520 = 5×104 files.

The number of CCDs per shot is, excluding those for auto-focus, not 112 but 104. Therefore this is normal.

Stack

Create a skymap

Before mosaicking, create a “tract”. We cut the sky into chunks, and call each region a tract. A tract is projected into a plane. (Or, different tracts have different WCS.)

Though this is useful in a wide survey, it is inconvenient for us to use tracts that are cut from the sky simply offhand when we want to make a mosaic from data obtained by dithering a small region about a field of view.

And so, we will define a special tract that encompasses all the images we have to make a mosaic from: Discrete Sky Map.

[user@example ~]$ makeDiscreteSkyMap.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2
List of outputs

↑ If the following message is output finally:

2015-04-08T06:32:05: makeDiscreteSkyMap: tract 0 has corners (134.913, 23.031), (132.683, 23.031), (132.665, 25.084), (134.931, 25.084) (RA, Dec deg) and 11 x 11 patches

then tract 0 is defined properly.

Mosaic

Make a mosaic of tract 0. Mosaicking only determines positions, but it does not create images.

[user@example ~]$ mosaic.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2 ccd=0..103 tract=0
List of outputs

Calibrate images

By mosaicking, magnitude zero-points and WCS have been determined. We make CCD images that include the calibration. (You can skip this stage if you do not need calibrated CCD images.)

[user@example ~]$ calibrateExposure.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2 ccd=0..103 tract=0 \
    -j 4 # by 4 threads
List of outputs

↑In so doing, warnings might occur “Cannot open wcs files (that are on the edge of the FOV)” This means that mosaicking could not determine the positions of those CCDs on the edge of the tract. We can ignore it.

The calibrated images are output in↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/0000/CALEXP-*.fits

Calibrate catalogs

Make catalogs that include the calibration, in the same way. (You can skip this stage if you do not need calibrated catalogs.)

[user@example ~]$ calibrateCatalog.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2 ccd=0..103 tract=0 \
    --config doApplyCalib=False \
    -j 4 # by 4 threads
List of outputs

The calibrated catalogs are output in↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/0000/CALSRC-*.fits

Stack

Though mosaicking has been done, coadd images have not yet been made. We make coadd images by calling stack.py

[user@example ~]$ stack.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id filter=HSC-G tract=0 \
    --selectId visit=905496..905504:2 ccd=0..103 \
    --do-exec --mpiexec "-n 4"
List of outputs

The coadd images are output in↓

[user@example ~]$ cd $HOME/HSC/rerun/tutorial/deepCoadd/HSC-G/0
[user@example 0]$ ls *.fits
0,4.fits   10,5.fits  3,10.fits  4,5.fits   5,9.fits   7,3.fits  8,8.fits
0,5.fits   10,6.fits  3,2.fits   4,6.fits   6,1.fits   7,4.fits  8,9.fits
 :           :         :          :          :          :         :

This many files are created though only the tract 0 is stacked. In fact, a tract is split into a small patches since the tract is so large that it can hardly be loaded into memory. Because the patches are simply split from a single large image, we can, using ds9 for example, view all the patches joined at a time (Beware that it requires more than 16GB of memory)。

[user@example 0]$ ds9 -mosaic wcs *.fits

The whole tract will look like: stacked image (8MB; downscaled by 102)

In addition, objects are measured at the same time as stacking. The catalogs are↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/src-*.fits

Multiband analysis

Though the tutorial data contains only a single band, we force our way to multiband analysis: for this is a tutorial. Let us merge respective catalogs that have been made in multiple bands, and measure the merged catalog again.

[user@example ~]$ multiBand.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id tract=0 filter=HSC-G \
        $(: filter=HSC-G^HSC-I^HSC-Z e.g. if there are several bands) \
    --clobber-config \
    --do-exec --mpiexec "-n 4"
List of outputs

The result of measuring each band independently with a common merged catalog↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/meas-*.fits

The result of measuring each band with positions (etc.) of objects fixed at common values↓

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/forced_src-*.fits

Analyze CCDs again with reference to the coadds

We project back to CCDs the positions of objects detected in the coadd images (to be precise, in the merged catalog), and measure them again in the non-stacked CCDs.

[user@example ~]$ forcedCcd.py $HOME/HSC \
        --calib=$HOME/HSC/CALIB_tutorial \
        --rerun=tutorial \
        --id visit=905496..905504:2 ccd=0..103 tract=0 \
        --clobber-config \
        --do-exec --mpiexec "-n 4"
List of outputs

The resultant catalogs are written in:

[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/tract0/FORCEDSRC-*.fits

That is all.

(Excursus) Example of a script

Suppose you want a large, whole image instead of chopped-up patches. Since hscPipe is a huge aggregate of Python libraries, you can use the libraries in Python scripts.

As an example, I paste below a script to make a large image from patches.

#!/usr/bin/env python
# encoding: utf-8

# A library of AFW (Application FrameWork), related to images
import lsst.afw.image as afwImage
# A library of AFW, related to geometry
import lsst.afw.geom  as afwGeom

import sys

def main():
    """
    Arrange patches to make a large image, and save it in "large.fits"

    make-large-image PATCHES...
        (e.g. make-large-image *,*.fits)
    """

    if len(sys.argv) <= 1:
        print "Input 1 patch or more."
        return 1

    largeImage = makeLargeImageFromPatches(sys.argv[1:])

    print "Saving (Wait a while)..."
    largeImage.writeFits("large.fits")


def makeLargeImageFromPatches(filenames):
    """
    @param filenames List of file names
    @return afwImage.ExposureF
    """

    # Loading all image would be dangerous by memory limitation.
    # Load metric information only.
    bboxes = []
    for filename in filenames:
        print "Loading a bounding box:", filename, "..."
        bboxes.append(
            # Get the bounding box of this patch
            # in the parent (= the tract)
            afwImage.ExposureF(filename).getBBox(afwImage.PARENT)
        )

    # Calculate the bbox that includes all the bboxes.
    #     Return values of getMin/Max method family are inclusive
    #     (range = [min, max]) cf. getBegin/End (used below)
    x0 = min(bbox.getMinX() for bbox in bboxes)
    y0 = min(bbox.getMinY() for bbox in bboxes)
    x1 = max(bbox.getMaxX() for bbox in bboxes)
    y1 = max(bbox.getMaxY() for bbox in bboxes)

    width  = x1 - x0 + 1
    height = y1 - y0 + 1

    # Prepare a large image
    print "Preparing a large image: ({}, {})".format(width, height)
    exposure = afwImage.ExposureF(width, height)

    # Set the position of this image in the tract
    exposure.setXY0(afwGeom.Point2I(x0, y0))

    # Extract the image data from the exposure
    largeImage = exposure.getMaskedImage()

    # Paste patches into largeImage
    for filename in filenames:
        print "Loading a patch:", filename, "..."

        # Load a patch again
        patch = afwImage.ExposureF(filename)
        bbox = patch.getBBox(afwImage.PARENT)

        # The position of this patch within largeImage:
        #     Return values of getBegin/End method family are one past the end
        #     (range = [begin,end) ) cf. getMin/Max (above)
        xp0 = bbox.getBeginX() - x0
        yp0 = bbox.getBeginY() - y0
        xp1 = bbox.getEndX  () - x0
        yp1 = bbox.getEndY  () - y0

        # Paste the patch (As for the notation in [], see numpy's document)
        largeImage[xp0:xp1, yp0:yp1] = patch.getMaskedImage()[:]

        # Blindly trust that all the patches have the same WCS information etc,
        # and set arbitrary one of them in the large image.
        if not exposure.hasWcs() and patch.hasWcs():
            exposure.setCalib   (patch.getCalib   ())
            exposure.setFilter  (patch.getFilter  ())
            exposure.setMetadata(patch.getMetadata())
            exposure.setWcs     (patch.getWcs     ())

    return exposure


if __name__ == "__main__":
    main()

↑Save the script in make-large-image.py, and you can use it as follows:

[user@example ~]$ setup-hscpipe
[user@example ~]$ python make-large-image.py \
    $HOME/HSC/rerun/tutorial/deepCoadd/HSC-G/0/*.fits

↑ Execute it in a machine with more than 16GB of memory. Or, Linux may force a double suicide with the script. Viewing the output large image requires that amount of memory, after all.

Caution: ↑The above script is not very good at coping with the mask layer. It may be that in one patch, the bits of the mask layer are assigned meanings (“Saturated&rquo;, “Cosmic ray&rquo;, &c) that are different from those assigned in another patch. The meanings of the mask bits are inscribed in the FITS header of an extension EXTNAME=MASK. Correctly, you should make the script read the FITS header to permutate the mask bits. (This is beyond the scope of this tutorial.)

Command line options

Command lines of hscPipe commands have a common syntax. Let us take reduceFrames.py for example:

[user@example ~]$ reduceFrames.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2 \
    --config processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
        processCcd.isr.doFringe=False \
    --do-exec --mpiexec "-n 4"

The first thing you have to know is the last line. Commands named reduce** are, basically, commands to post jobs into a job queue (Torque PBS &c.). But if you do not have such a system, you can execute the job immediately on the local computer, by the --do-exec option.

However, if --do-exec only was given, the job is executed with a single thread. You should specify --mpiexec "-n {number-of-threads}". This is in fact the option string to hand over to mpiexec (google it). Note, if you do not use --do-exec but post the job into a queue, you must not specify the number of threads by --mpiexec.

[user@example ~]$ reduceFrames.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2 \
    --config processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
        processCcd.isr.doFringe=False \
    --do-exec --mpiexec "-n 4"

Next, let us focus on the --config option. You can change configuration of analysis here. To know available parameters and their default values, append --show config to the command line and execute it.

If you change the configuration, the program will refuse to run.

If you change the configuration, the program will refuse to run.

This is so important that it is worth saying twice. The program will refuse to run because it is undesirable that there coexist data made with different analysis parameters in a single rerun.

When the program does not run, append --clobber-config to the command line. Thereby the cached configuration will be overwritten.

[user@example ~]$ reduceFrames.py $HOME/HSC \
    --calib=$HOME/HSC/CALIB_tutorial \
    --rerun=tutorial \
    --id visit=905496..905504:2 \
    --config processCcd.isr.doBias=False \
        processCcd.isr.doDark=False \
        processCcd.isr.doFringe=False \
    --do-exec --mpiexec "-n 4"

Lastly, --id. Put conditions here with which to search for input data. Multiple conditions specified will be connected by AND. For instance, if you specify --id field="DEN_C" filter="HSC-G", then all data will be selected that are in the field "DEN_C" and of the band "HSC-G". "field" and "filter" are column names of tables in the registry.

Though it is intuitive to use field &c. instead of specifying visits directly, you will be confused which data were input or not if, e.g., FITS headers should be wrong. Be careful.