This tutorial describes a concrete example of using the analysis pipeline distributed at the HSC Pipeline Binary Distribution site. Before reading this page, install the pipeline first.
Create a folder at a place you like. All input data and products will be placed in it. This tutorial hereafter assumes it is ~/HSC.
To create it, use the script create-rootdir.sh instead of mkdir, because a few files have to be placed in the data folder at this stage.
[user@example ~]$ # Set up hscPipe
[user@example ~]$ setup-hscpipe
[user@example ~]$ create-rootdir.sh ~/HSC /path/to/ps1_pv3_3pi_20170110 /path/to/STRAY_LIGHT
↑ The second argument, ps1_pv3_3pi_20170110, is the path to the reference catalog folder. This path (REFDIR) is such that REFDIR/{number}.fits and REFDIR/config.py exist. The folder (REFDIR) must be named ps1_pv3_3pi_20170110, since the pipeline assumes this name by default.
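For instance, a correctly laid-out REFDIR looks roughly like the following. (The numbered file names below are hypothetical; the actual {number} values depend on how the catalog is sharded.)
[user@example ~]$ ls /path/to/ps1_pv3_3pi_20170110
123456.fits  123457.fits  123458.fits  ...  config.py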
↑ The third argument, STRAY_LIGHT, is the path to the y-band stray light files. You can omit this argument in this tutorial, since the input data contain no y-band images.
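In that case the command above simply loses its third argument:
[user@example ~]$ create-rootdir.sh ~/HSC /path/to/ps1_pv3_3pi_20170110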
In addition, execute the following command to install transmission curves.
[user@example ~]$ installTransmissionCurves.py ~/HSC
Download the tutorial images from the data list on the pipeline distribution site. When the download completes, expand the archive in an arbitrary directory:
[user@example ~]$ tar xvaf rawdata.tar.xz
Ingest the downloaded images into ~/HSC:
[user@example ~]$ ## Don't forget to set up hscPipe if you haven't.
[user@example ~]$ # setup-hscpipe
[user@example ~]$ cd rawdata/
[user@example rawdata]$ # Move the images
[user@example rawdata]$ hscIngestImages.py ~/HSC */*.fits --mode=move
↑ The destination path (~/HSC) must be an absolute path; otherwise the command may not work.
The images moved to ~/HSC have been registered in ~/HSC/registry.sqlite3. Look into it to check what images are there.
[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> .header on
sqlite> SELECT visit, filter, field, count(visit) FROM raw GROUP BY visit, filter, field;
sqlite> .q
If you get something like the following list, the ingestion was successful.
visit|filter|field|count(visit)
905496|HSC-G|DEN_C|112
905498|HSC-G|DEN_C|112
905500|HSC-G|DEN_C|112
905502|HSC-G|DEN_C|112
905504|HSC-G|DEN_C|112
907002|HSC-G|DOMEFLAT|112
907004|HSC-G|DOMEFLAT|112
907006|HSC-G|DOMEFLAT|112
907008|HSC-G|DOMEFLAT|112
907010|HSC-G|DOMEFLAT|112
↑ If you see smaller numbers in the rightmost column, some images are missing.
You can see that there are DOMEFLATs with visit=907002..907010:2 (“first..last:interval”) and DEN_C frames with visit=905496..905504:2.
To learn more about the sqlite3 command, search for SQLite or SQL.
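For example, the following query (plain SQL, nothing pipeline-specific) lists the first few rows of one visit, CCD by CCD. It assumes the raw table has a ccd column; adjust the column names to whatever .header on reveals in your registry.
[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> .header on
sqlite> SELECT visit, ccd, filter FROM raw WHERE visit = 905496 ORDER BY ccd LIMIT 5;
sqlite> .q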
Create flat data from the dome flats. (For the visit numbers of the dome flats, see above.)
[user@example ~]$ constructFlat.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=907002..907010:2 $(: or --id field=DOMEFLAT filter=HSC-G) \
        --config \
            isr.doBias=False \
            isr.doDark=False \
        --batch-type=smp --cores=4
↑ "--cores=4
" signifies “ by 4 threads.” Dividing the amount of memory by 4GB provides a good indication for the number of threads.
Pass an arbitrary rerun name into --rerun
.
Data analysis goes in sequence: reduce A to B, then B to C, ...
We call this sequence A→B→C→... “rerun”.
For further explanation of options, see Command line options.
The averaged flat data are written under the directory you specified with --rerun=. To be more precise:
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/FLAT/2014-02-03/HSC-G
FLAT-2014-02-03-HSC-G-000.fits  FLAT-2014-02-03-HSC-G-056.fits
FLAT-2014-02-03-HSC-G-001.fits  FLAT-2014-02-03-HSC-G-057.fits
FLAT-2014-02-03-HSC-G-002.fits  FLAT-2014-02-03-HSC-G-058.fits
   :                               :
Normally we should inspect each of the FITS files, but here let us minimally check that all of the 112 CCDs have been output:
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/FLAT/2014-02-03/HSC-G/FLAT-*.fits | wc -l
112
Intermediate files have also been output under $HOME/HSC/rerun/tutorial, in addition to these final products. Look into them if you are interested.
After constructing the flats, you have to register them so that the analysis pipeline can find them in subsequent stages.
[user@example ~]$ ingestCalibs.py $HOME/HSC \
@<(find $HOME/HSC/rerun/tutorial/FLAT -name "*.fits") \
--calib=$HOME/HSC/CALIB \
--validity=100 \
--mode=link
--validity=100 means that this flat can be used to process exposures taken within ±100 days of the date of the flat. For example, the flat made above (dated 2014-02-03) is valid for exposures taken between 2013-10-26 and 2014-05-14.
That is all for flat-making. We have neglected biases, darks, and fringes here. You can create them when needed with similar commands (constructBias.py, constructDark.py, constructFringe.py).
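For reference, a bias run would look like the sketch below. BIAS_VISITS is a hypothetical placeholder for the visit numbers of your bias frames; the tutorial data contain none, so this is illustration only.
[user@example ~]$ constructBias.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=BIAS_VISITS $(: hypothetical placeholder) \
        --batch-type=smp --cores=4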
We tend to overestimate the sky level around large galaxies. Subtracting too much sky results in artificial moats encircling such galaxies. We will fill in these moats in the stacking step, and to do so we need to re-estimate the sky level with a very large mesh. This process needs a pre-computed average sky pattern, which we call a "global sky pattern".
We create a global sky pattern from object frames, that is, images that record the actual starry sky. Invoke the following program. (For the range of visit, see above.)
[user@example ~]$ constructSky.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=905496..905504:2 $(: or --id field=DEN_C) \
        --config \
            isr.doBias=False \
            isr.doDark=False \
        --batch-type smp --cores=4
After constructing the sky pattern, you have to register it so that the analysis pipeline can find it in subsequent stages.
[user@example ~]$ ingestCalibs.py $HOME/HSC \
@<(find $HOME/HSC/rerun/tutorial/SKY -name "*.fits") \
--calib=$HOME/HSC/CALIB \
--validity=100 \
--mode=link
For users familiar with old versions of the pipeline: we no longer need to set up the reference catalog. create-rootdir.sh has already created a symbolic link to it.
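You can verify the link if you like. We assume here that create-rootdir.sh placed it under ~/HSC/ref_cats; check your actual directory layout if the path differs.
[user@example ~]$ ls -l ~/HSC/ref_cats
ps1_pv3_3pi_20170110 -> /path/to/ps1_pv3_3pi_20170110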
Invoke the analysis program that processes CCD images, as below. (For the range of visit, see above.)
[user@example ~]$ singleFrameDriver.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=905496..905504:2 $(: or --id field=DEN_C) \
        --config \
            processCcd.isr.doBias=False \
            processCcd.isr.doDark=False \
        --batch-type=smp --cores=4
↑ If you are using the binary distributed on this site, and this command fails with error messages like “pthread_create ...”, then try
env OMP_NUM_THREADS=1 singleFrameDriver.py $HOME/HSC...
instead of the command above.
For further explanation of options, see Command line options.
When singleFrameDriver.py finishes, its output files are under $HOME/HSC/rerun/tutorial.
The images: ↓
[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/
[user@example corr]$ ls CORR-*
CORR-0905496-000.fits  CORR-0905498-070.fits  CORR-0905502-036.fits
CORR-0905496-001.fits  CORR-0905498-071.fits  CORR-0905502-037.fits
CORR-0905496-002.fits  CORR-0905498-072.fits  CORR-0905502-038.fits
   :                      :                      :
And the object catalogs (FITS BinTable): ↓
[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/output/
[user@example output]$ ls SRC-*
SRC-0905496-000.fits  SRC-0905498-070.fits  SRC-0905502-036.fits
SRC-0905496-001.fits  SRC-0905498-071.fits  SRC-0905502-037.fits
SRC-0905496-002.fits  SRC-0905498-072.fits  SRC-0905502-038.fits
   :                     :                     :
We should inspect these with our own eyes, but here, too, let us only check the number of files.
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/SRC-*.fits | wc -l
520
There are only 520 = 5 × 104 files. The number of CCDs per shot, excluding those used for auto-focus, is not 112 but 104, so this is normal.
We re-estimate the sky level here. See this section for explanation.
[user@example ~]$ skyCorrection.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=905496..905504:2 $(: or --id field=DEN_C) \
        --batch-type=smp --cores=4
The output from this command will be used as the input to the stacking step.
Before mosaicking, create a “tract”. A tract is a region of the sky that is (gnomonically) projected onto a single plane. Different tracts are projected onto different planes, so they have different WCS.
In a wide survey, we cut the sky into predefined chunks and treat each chunk as a tract. The collection of tracts is called a "sky map". But simply using tracts cut out of the sky in this offhand way is inconvenient when we want to make a mosaic from data obtained by dithering a small region about a single field of view: the region would probably be split across several tracts.
We therefore define a special tract that encompasses all the images we want to mosaic. The sky map comprising such tracts is called a "discrete sky map".
[user@example ~]$ makeDiscreteSkyMap.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=905496..905504:2 $(: or --id field=DEN_C)
↑ If a message like the following is output at the end:
makeDiscreteSkyMap INFO: tract 0 has corners (134.970, 22.961), (132.538, 22.961), (132.517, 25.201), (134.992, 25.201) (RA, Dec deg) and 12 x 12 patches
then tract 0 is defined properly.
Make a mosaic of tract 0. Mosaicking only determines positions; it does not create images.
[user@example ~]$ mosaic.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=905496..905504:2 tract=0 $(: or --id field=DEN_C filter=HSC-G tract=0)
Mosaicking has determined the magnitude zero points and WCS. We now make CCD images that include this calibration. (You can skip this stage if you do not need calibrated CCD images.)
[user@example ~]$ calibrateExposure.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=905496..905504:2 tract=0 $(: or --id field=DEN_C tract=0) \
        -j 4 # by 4 threads
↑ In doing so, warnings like “NoResults: No locations for get ...” might occur for data on the edge of the field of view. They mean that mosaicking could not determine the positions of those CCDs on the edge of the tract. You can ignore them.
The calibrated images are output here: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/0000/CALEXP-*.fits
Make catalogs that include the calibration in the same way. (You can skip this stage if you do not need calibrated catalogs.)
[user@example ~]$ calibrateCatalog.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id visit=905496..905504:2 tract=0 $(: or --id field=DEN_C tract=0) \
        --config doApplyCalib=False \
        -j 4 # by 4 threads
The calibrated catalogs are output here: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/0000/CALSRC-*.fits
Though mosaicking has been done, coadd images have not yet been made. We make the coadd images by calling coaddDriver.py.
[user@example ~]$ coaddDriver.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id filter=HSC-G tract=0 \
        --selectId visit=905496..905504:2 ccd=0..103 $(: or --selectId field=DEN_C filter=HSC-G) \
        --batch-type=smp --cores=4
The coadd images are output here: ↓
[user@example ~]$ cd $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0
[user@example 0]$ ls */calexp-*.fits
1,3/calexp-HSC-G-0-1,3.fits  5,3/calexp-HSC-G-0-5,3.fits
1,4/calexp-HSC-G-0-1,4.fits  5,4/calexp-HSC-G-0-5,4.fits
1,5/calexp-HSC-G-0-1,5.fits  5,5/calexp-HSC-G-0-5,5.fits
   :                            :
This many files are created even though only tract 0 was stacked. In fact, a tract is split into small patches, because a whole tract is so large that it can hardly be loaded into memory. Because the patches are simply cut out of a single large image, we can view all the patches joined together, using ds9 for example (beware that this requires more than 16 GB of memory).
[user@example 0]$ ds9 -mosaic wcs */calexp-*.fits
The whole tract will look like this: stacked image (8 MB; downscaled by 10²)
You can see that "det-*.fits" files coexist with the image files "calexp-*.fits": ↓
[user@example ~]$ cd $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0
[user@example 0]$ ls */det-*.fits
1,3/det-HSC-G-0-1,3.fits  5,3/det-HSC-G-0-5,3.fits
1,4/det-HSC-G-0-1,4.fits  5,4/det-HSC-G-0-5,4.fits
1,5/det-HSC-G-0-1,5.fits  5,5/det-HSC-G-0-5,5.fits
   :                         :
These files contain the "detections" of objects in the stacked images. The objects have not been measured yet; they will be measured in the "multiband analysis" below.
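If you are curious, you can peek at one of these detection catalogs from Python. This is a minimal sketch, assuming the hscPipe Python environment; the patch name is just one of those listed above.
[user@example 0]$ python
>>> import lsst.afw.table as afwTable
>>> det = afwTable.SourceCatalog.readFits("1,3/det-HSC-G-0-1,3.fits")
>>> print len(det)   # the number of detected footprints in this patch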
Though the tutorial data contain only a single band, we push on to multiband analysis anyway, since this is a tutorial. Let us merge the detections of objects found in the respective bands, and measure the merged detections.
[user@example ~]$ multiBandDriver.py $HOME/HSC \
        --rerun=tutorial \
        --calib=$HOME/HSC/CALIB \
        --id tract=0 filter=HSC-G \
        $(: filter=HSC-G^HSC-I^HSC-Z e.g. if there are several bands) \
        --batch-type=smp --cores=4
The results of measuring each band independently against the common merged catalog: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/meas-*.fits
The results of measuring each band with the positions (etc.) of the objects fixed at common values: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/forced_src-*.fits
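These, too, are FITS BinTables readable with afw.table. For example, to see what columns a forced-photometry catalog has (a sketch, run from within one of the patch directories listed above):
[user@example 0]$ python
>>> import lsst.afw.table as afwTable
>>> cat = afwTable.SourceCatalog.readFits("1,3/forced_src-HSC-G-0-1,3.fits")
>>> print cat.getSchema()   # lists every measured column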
That is all.
Suppose you want one large, whole image instead of chopped-up patches. Since hscPipe is a huge collection of Python libraries, you can use those libraries in your own Python scripts.
As an example, here is a script that makes a large image from the patches.
#!/usr/bin/env python
# encoding: utf-8

# A library of AFW (Application FrameWork), related to images
import lsst.afw.image as afwImage
# A library of AFW, related to geometry
import lsst.afw.geom as afwGeom

import sys

def main():
    """
    Arrange patches to make a large image, and save it in "large.fits"
        make-large-image PATCHES...
        (e.g. make-large-image *,*.fits)
    """
    if len(sys.argv) <= 1:
        print "Input 1 patch or more."
        return 1

    largeImage = makeLargeImageFromPatches(sys.argv[1:])

    print "Saving (wait a while)..."
    largeImage.writeFits("large.fits")
    return 0

def makeLargeImageFromPatches(filenames):
    """
    @param filenames  List of file names
    @return afwImage.ExposureF
    """
    # Loading all the images at once would be dangerous because of memory
    # limitations. Load the geometric information only.
    bboxes = []
    for filename in filenames:
        print "Loading a bounding box:", filename, "..."
        bboxes.append(
            # Get the bounding box of this patch
            # in the parent (= the tract)
            afwImage.ExposureF(filename).getBBox(afwImage.PARENT)
        )

    # Calculate the bbox that includes all the bboxes.
    # Return values of the getMin/Max method family are inclusive
    # (range = [min, max]), cf. getBegin/End (used below)
    x0 = min(bbox.getMinX() for bbox in bboxes)
    y0 = min(bbox.getMinY() for bbox in bboxes)
    x1 = max(bbox.getMaxX() for bbox in bboxes)
    y1 = max(bbox.getMaxY() for bbox in bboxes)

    width = x1 - x0 + 1
    height = y1 - y0 + 1

    # Prepare a large image
    print "Preparing a large image: ({}, {})".format(width, height)
    exposure = afwImage.ExposureF(width, height)

    # Set the position of this image in the tract
    exposure.setXY0(afwGeom.Point2I(x0, y0))

    # Extract the image data from the exposure
    largeImage = exposure.getMaskedImage()

    # Paste patches into largeImage
    for filename in filenames:
        print "Loading a patch:", filename, "..."
        # Load a patch again
        patch = afwImage.ExposureF(filename)
        bbox = patch.getBBox(afwImage.PARENT)

        # The position of this patch within largeImage:
        # Return values of the getBegin/End method family are one past the end
        # (range = [begin, end)), cf. getMin/Max (above)
        xp0 = bbox.getBeginX() - x0
        yp0 = bbox.getBeginY() - y0
        xp1 = bbox.getEndX() - x0
        yp1 = bbox.getEndY() - y0

        # Paste the patch (as for the notation in [], see numpy's documentation)
        largeImage[xp0:xp1, yp0:yp1] = patch.getMaskedImage()[:]

        # Blindly trust that all the patches have the same WCS information etc.,
        # and set an arbitrary one of them in the large image.
        if not exposure.hasWcs() and patch.hasWcs():
            exposure.setCalib(patch.getCalib())
            exposure.setFilter(patch.getFilter())
            exposure.setMetadata(patch.getMetadata())
            exposure.setWcs(patch.getWcs())

    return exposure

if __name__ == "__main__":
    sys.exit(main())
↑ Save the script as make-large-image.py, and you can use it as follows:
[user@example ~]$ setup-hscpipe
[user@example ~]$ python make-large-image.py \
        $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/calexp-*.fits
↑ Execute it on a machine with more than 16 GB of memory; otherwise the Linux out-of-memory killer may take down the script, and possibly other processes with it. Viewing the resulting large image requires that amount of memory, after all.
Caution: ↑ The above script does not cope well with the mask layer. The meanings of the mask bits (“saturated”, “cosmic ray”, etc.) in one patch may differ from those in another patch. The meanings of the mask bits are recorded in the FITS header of the extension EXTNAME=MASK. To be correct, you should make the script read the FITS header and permute the mask bits accordingly. (This is beyond the scope of this tutorial.)
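Rather than parsing the FITS header by hand, you can read the mask-bit assignments of a patch from Python. A minimal sketch, assuming the hscPipe environment; getMaskPlaneDict() maps the mask plane names to their bit indices.
[user@example 0]$ python
>>> import lsst.afw.image as afwImage
>>> exp = afwImage.ExposureF("1,3/calexp-HSC-G-0-1,3.fits")
>>> print exp.getMaskedImage().getMask().getMaskPlaneDict()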
hscPipe commands share a common command-line syntax. Let us take singleFrameDriver.py as an example:
[user@example ~]$ singleFrameDriver.py $HOME/HSC \
--rerun=tutorial \
--calib=$HOME/HSC/CALIB \
--id visit=905496..905504:2 \
--config \
processCcd.isr.doBias=False \
processCcd.isr.doDark=False \
--batch-type=smp --cores=4
The first thing to know is the last line. Commands named *Driver.py are, basically, commands that submit jobs to a job queue (Torque PBS etc.). If you do not have such a system, you can execute the job immediately on the local computer with the --batch-type=smp option. You must then specify the number of threads to use with --cores=.
Next, let us focus on the --config option. Here you can change the configuration of the analysis. To see the available parameters and their default values, append --show config to the command line and execute it.
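For example, to check the ISR flags we overrode earlier, you can filter the dump with grep. A sketch; the exact output format may differ between versions.
[user@example ~]$ singleFrameDriver.py $HOME/HSC --rerun=tutorial --show config \
        | grep isr.doBias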
If you change the configuration, the program will refuse to run.
If you change the configuration, the program will refuse to run.
This is so important that it is worth saying twice. The program refuses to run because it is undesirable for data made with different analysis parameters to coexist within a single rerun.
When the program refuses to run, append --clobber-config to the command line. The cached configuration will then be overwritten.
Lastly, --id. Here you put the conditions with which to search for input data. Multiple conditions are connected by AND. For instance, if you specify --id field="DEN_C" filter="HSC-G", then all data in the field "DEN_C" and taken in the band "HSC-G" will be selected. "field" and "filter" are column names of tables in the registry.
Though it is intuitive to use field etc. instead of specifying visits directly, you may lose track of which data were actually input if, for example, the FITS headers are wrong. Be careful.
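If in doubt, you can check in advance which visits a condition will match by querying the registry directly, using the same raw table we inspected earlier:
[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> SELECT DISTINCT visit FROM raw WHERE field = 'DEN_C' AND filter = 'HSC-G';
sqlite> .q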