This tutorial describes a concrete example of using the analysis pipeline distributed at the HSC Pipeline Binary Distribution site. Before reading this page, install the pipeline first.
Create a folder anywhere you like. All input data and products will be placed in it. Hereafter I assume it is ~/HSC.
[user@example ~]$ mkdir ~/HSC
[user@example ~]$ # Cast a spell to initialize this folder
[user@example ~]$ echo 'lsst.obs.hsc.HscMapper' > ~/HSC/_mapper
Download the tutorial images from the data list on the pipeline distribution site. When the download completes, expand the archive in any directory:
[user@example ~]$ tar xvaf rawdata.tar.xz
Ingest the downloaded images into ~/HSC:
[user@example ~]$ # Set up hscPipe
[user@example ~]$ setup-hscpipe
[user@example ~]$ cd rawdata/
[user@example rawdata]$ # Move the images
[user@example rawdata]$ hscIngestImages.py ~/HSC */*.fits --mode=move --create
↑ The destination path (~/HSC) must be an absolute path. Otherwise the command may not work.
The images moved into ~/HSC have been registered in ~/HSC/registry.sqlite3. Query it to check which images are present.
[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> .header on
sqlite> SELECT visit, filter, field, count(visit) FROM raw GROUP BY visit, filter, field;
sqlite> .q
If you get a list like the following, the ingestion was successful.
visit|filter|field|count(visit)
905496|HSC-G|DEN_C|112
905498|HSC-G|DEN_C|112
905500|HSC-G|DEN_C|112
905502|HSC-G|DEN_C|112
905504|HSC-G|DEN_C|112
907002|HSC-G|DOMEFLAT|112
907004|HSC-G|DOMEFLAT|112
907006|HSC-G|DOMEFLAT|112
907008|HSC-G|DOMEFLAT|112
907010|HSC-G|DOMEFLAT|112
↑ If you see smaller numbers in the rightmost column, some images are missing.
You can see that there are DOMEFLAT exposures with visit=907002..907010:2 (the notation is “first..last:interval”, so this expands to 907002, 907004, ..., 907010), and DEN_C exposures with visit=905496..905504:2.
If you want to learn more about the sqlite3 command, search for SQLite or SQL.
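For example, to list which CCDs were ingested for one visit (a sketch, assuming the raw table also carries a ccd column, as a standard HSC registry does):

[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> SELECT visit, ccd FROM raw WHERE visit = 905496 LIMIT 5;
sqlite> .q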
Create flat-field data from the dome flats. (For the visit numbers of the dome flats, see above.)
[user@example ~]$ reduceFlat.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=calib_tutorial \
--id visit=907002..907010:2 \
--detrendId calibVersion=calib_tutorial \
--config isr.doBias=False \
isr.doDark=False \
isr.doFringe=False \
--batch-type=smp --cores=4
↑ Note carefully: the values of --calib and calibVersion must match the regular expression [A-Za-z0-9_+-]+. They must not, for example, contain a period.
↑ "--cores=4" signifies “ by 4 threads.” Dividing the amount of memory by 4GB provides a good indication for the number of threads.
Pass an arbitrary rerun name to --rerun. Data analysis proceeds in sequence: reduce A to B, then B to C, and so on. We call such a sequence A→B→C→... a “rerun”.
For further explanation of options, see Command line options.
The averaged flat data are written under the directory you passed to --calib=. To be more precise:
[user@example ~]$ ls $HOME/HSC/CALIB_tutorial/FLAT/2014-02-03/HSC-G/calib_tutorial
FLAT-000.fits  FLAT-023.fits  FLAT-046.fits  FLAT-069.fits  FLAT-092.fits
FLAT-001.fits  FLAT-024.fits  FLAT-047.fits  FLAT-070.fits  FLAT-093.fits
FLAT-002.fits  FLAT-025.fits  FLAT-048.fits  FLAT-071.fits  FLAT-094.fits
:              :              :              :              :
Normally we should inspect each of the FITS files, but let us at least check that all 112 CCDs have been output:
[user@example ~]$ ls $HOME/HSC/CALIB_tutorial/FLAT/2014-02-03/HSC-G/calib_tutorial/FLAT-*.fits | wc -l
112
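To inspect one of them by eye, ds9 will do (FLAT-050.fits here is just an arbitrary pick from the 112 files):

[user@example ~]$ ds9 $HOME/HSC/CALIB_tutorial/FLAT/2014-02-03/HSC-G/calib_tutorial/FLAT-050.fits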
In addition, intermediate files other than these final outputs are written under $HOME/HSC/rerun/(rerun name). Look into them if you are interested.
After the data are created, register their information in the calibration registry:
[user@example ~]$ genCalibRegistry.py \
--root=$HOME/HSC/CALIB_tutorial \
--camera=HSC \
--create
↑ Beware: none of the processes below will succeed if you forget this step.
That is all for flat-making. We have neglected biases, darks, and fringes here. You can create them as needed with similar commands (reduceBias.py, reduceDark.py, reduceFringe.py), as sketched below.
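For example, biases would be built along the following lines. This is only a sketch by analogy with reduceFlat.py above: I assume reduceBias.py accepts the same options, and since the tutorial data contain no BIAS visits, substitute your own visit numbers. Remember to run genCalibRegistry.py again afterwards.

[user@example ~]$ reduceBias.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=calib_tutorial \
--id visit=<your bias visits> \
--detrendId calibVersion=calib_tutorial \
--batch-type=smp --cores=4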
Before proceeding, you must set up astrometry_net_data so that astrometric calibration can run.
[user@example ~]$ setup astrometry_net_data sdss-dr9-fink-v5b
Next, invoke the analysis program that processes individual CCDs. (For the visit range, see above.)
[user@example ~]$ reduceFrames.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 \
--config processCcd.isr.doBias=False \
processCcd.isr.doDark=False \
processCcd.isr.doFringe=False \
--batch-type=smp --cores=4
↑ If you are using the binary distributed on this site and this command fails with error messages like “pthread_create ...”, then try OMP_NUM_THREADS=1 reduceFrames.py $HOME/HSC ... instead of the command above.
↑ The rerun name was --rerun=calib_tutorial when we made the flats; now we use --rerun=tutorial. This means that we finish the old rerun (workflow) and start a new one from this point.
For further explanation of options, see Command line options.
When reduceFrames.py finishes, the output files are under $HOME/HSC/rerun/(rerun name). The images: ↓
[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/
[user@example corr]$ ls CORR-*
CORR-0905496-000.fits  CORR-0905498-070.fits  CORR-0905502-036.fits
CORR-0905496-001.fits  CORR-0905498-071.fits  CORR-0905502-037.fits
CORR-0905496-002.fits  CORR-0905498-072.fits  CORR-0905502-038.fits
:                      :                      :
And the object catalog (FITS BinTable) ↓
[user@example ~]$ cd $HOME/HSC/rerun/tutorial/00762/HSC-G/output/
[user@example output]$ ls SRC-*
SRC-0905496-000.fits  SRC-0905498-070.fits  SRC-0905502-036.fits
SRC-0905496-001.fits  SRC-0905498-071.fits  SRC-0905502-037.fits
SRC-0905496-002.fits  SRC-0905498-072.fits  SRC-0905502-038.fits
:                     :                     :
We really should inspect these with our own eyes, but here too let us only check the number of files.
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/SRC-*.fits | wc -l
520
There are only 520 = 5 × 104 files. The number of CCDs per exposure, excluding those used for auto-focus, is not 112 but 104, so this is as expected (5 visits × 104 CCDs = 520).
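If you also want to peek inside one of these catalogs, the pipeline's own table library can read them. A minimal sketch in Python (the file name is taken from the listing above; I assume lsst.afw.table is importable once hscPipe is set up):

#!/usr/bin/env python
# Read a source catalog written by the pipeline (a FITS BinTable)
import lsst.afw.table as afwTable

catalog = afwTable.SourceCatalog.readFits("SRC-0905496-000.fits")
print "Number of sources:", len(catalog)
# List the names of the measured quantities
print sorted(catalog.getSchema().getNames())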
Before mosaicking, create a “tract”. We cut the sky into chunks and call each region a tract. Each tract is projected onto its own plane (in other words, different tracts have different WCSs).
Though this is useful for a wide survey, tracts cut from the sky at predefined positions are inconvenient when we want to make a mosaic from data obtained by dithering a small region about a single field of view.
So we define a special tract that encompasses all the images we want to mosaic: a discrete sky map.
[user@example ~]$ makeDiscreteSkyMap.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2
↑ If a message like the following is output at the end:
2015-04-08T06:32:05: makeDiscreteSkyMap: tract 0 has corners (134.913, 23.031), (132.683, 23.031), (132.665, 25.084), (134.931, 25.084) (RA, Dec deg) and 11 x 11 patches
then tract 0 is defined properly.
Make a mosaic of tract 0. Mosaicking only determines positions (the astrometric and photometric solutions); it does not create images.
[user@example ~]$ mosaic.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 ccd=0..103 tract=0
Mosaicking has determined the magnitude zero points and WCSs. Now we make CCD images that include this calibration. (You can skip this stage if you do not need calibrated CCD images.)
[user@example ~]$ calibrateExposure.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 ccd=0..103 tract=0 \
-j 4    # by 4 threads
↑ In so doing, warnings like “Cannot open wcs files” may occur for CCDs on the edge of the field of view. This means that mosaicking could not determine the positions of those CCDs on the edge of the tract. You can ignore them.
The calibrated images are output to: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/corr/0000/CALEXP-*.fits
In the same way, make catalogs that include the calibration. (You can skip this stage if you do not need calibrated catalogs.)
[user@example ~]$ calibrateCatalog.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 ccd=0..103 tract=0 \
--config doApplyCalib=False \
-j 4    # by 4 threads
The calibrated catalogs are output to: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/output/0000/CALSRC-*.fits
Though mosaicking is done, the coadd images have not yet been made. We make them by calling stack.py:
[user@example ~]$ stack.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id filter=HSC-G tract=0 \
--selectId visit=905496..905504:2 ccd=0..103 \
--batch-type=smp --cores=4
The coadd images are output to: ↓
[user@example ~]$ cd $HOME/HSC/rerun/tutorial/deepCoadd/HSC-G/0
[user@example 0]$ ls */calexp-*.fits
0,4/calexp-HSC-G-0-0,4.fits   5,1/calexp-HSC-G-0-5,1.fits
0,5/calexp-HSC-G-0-0,5.fits   5,10/calexp-HSC-G-0-5,10.fits
0,6/calexp-HSC-G-0-0,6.fits   5,2/calexp-HSC-G-0-5,2.fits
:                             :
This many files are created even though only tract 0 is stacked. In fact, a tract is split into small patches, since a whole tract is so large that it can hardly be loaded into memory. Because the patches are simply cut from a single large image, we can view all the patches joined together, using ds9 for example (beware that this requires more than 16 GB of memory).
[user@example 0]$ ds9 -mosaic wcs */calexp-*.fits
The whole tract will look like this: stacked image (8 MB; downscaled by 10²)
Though the tutorial data contain only a single band, we push on to multiband analysis anyway, since this is a tutorial. Let us merge the catalogs made in the respective bands, and measure against the merged catalog again.
[user@example ~]$ multiBand.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id tract=0 filter=HSC-G \
$(: filter=HSC-G^HSC-I^HSC-Z e.g. if there are several bands) \
--batch-type=smp --cores=4
The results of measuring each band independently with the common merged catalog: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/meas-*.fits
The results of measuring each band with object positions (etc.) fixed at common values, i.e. forced measurements: ↓
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/forced_src-*.fits
We project the positions of objects detected in the coadd images (to be precise, in the merged catalog) back onto the CCDs, and measure them again in the unstacked CCD images.
[user@example ~]$ forcedCcd.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 ccd=0..103 tract=0 \
--batch-type=smp --cores=4
The resulting catalogs are written to:
[user@example ~]$ ls $HOME/HSC/rerun/tutorial/00762/HSC-G/tract0/FORCEDSRC-*.fits
That is all.
Suppose you want a single large image instead of chopped-up patches. Since hscPipe is a huge collection of Python libraries, you can use those libraries in your own Python scripts.
As an example, here is a script that makes a large image from the patches.
#!/usr/bin/env python
# encoding: utf-8

# A library of AFW (Application FrameWork), related to images
import lsst.afw.image as afwImage
# A library of AFW, related to geometry
import lsst.afw.geom as afwGeom

import sys


def main():
    """
    Arrange patches to make a large image, and save it in "large.fits"
        make-large-image PATCHES...
        (e.g. make-large-image *,*.fits)
    """
    if len(sys.argv) <= 1:
        print "Input 1 patch or more."
        return 1

    largeImage = makeLargeImageFromPatches(sys.argv[1:])

    print "Saving (Wait a while)..."
    largeImage.writeFits("large.fits")


def makeLargeImageFromPatches(filenames):
    """
    @param filenames  List of file names
    @return afwImage.ExposureF
    """
    # Loading all the images at once would be dangerous because of
    # memory limitations. Load the geometric information only.
    bboxes = []
    for filename in filenames:
        print "Loading a bounding box:", filename, "..."
        bboxes.append(
            # Get the bounding box of this patch
            # in the parent (= the tract)
            afwImage.ExposureF(filename).getBBox(afwImage.PARENT)
        )

    # Calculate the bbox that includes all the bboxes.
    # Return values of the getMin/Max method family are inclusive
    # (range = [min, max]), cf. getBegin/End (used below)
    x0 = min(bbox.getMinX() for bbox in bboxes)
    y0 = min(bbox.getMinY() for bbox in bboxes)
    x1 = max(bbox.getMaxX() for bbox in bboxes)
    y1 = max(bbox.getMaxY() for bbox in bboxes)

    width = x1 - x0 + 1
    height = y1 - y0 + 1

    # Prepare a large image
    print "Preparing a large image: ({}, {})".format(width, height)
    exposure = afwImage.ExposureF(width, height)

    # Set the position of this image in the tract
    exposure.setXY0(afwGeom.Point2I(x0, y0))

    # Extract the image data from the exposure
    largeImage = exposure.getMaskedImage()

    # Paste patches into largeImage
    for filename in filenames:
        print "Loading a patch:", filename, "..."
        # Load the patch again
        patch = afwImage.ExposureF(filename)
        bbox = patch.getBBox(afwImage.PARENT)

        # The position of this patch within largeImage:
        # Return values of the getBegin/End method family are one past
        # the end (range = [begin, end)), cf. getMin/Max (above)
        xp0 = bbox.getBeginX() - x0
        yp0 = bbox.getBeginY() - y0
        xp1 = bbox.getEndX() - x0
        yp1 = bbox.getEndY() - y0

        # Paste the patch (as for the notation in [], see numpy's documentation)
        largeImage[xp0:xp1, yp0:yp1] = patch.getMaskedImage()[:]

        # Blindly trust that all the patches have the same WCS information etc.,
        # and set an arbitrary one of them in the large image.
        if not exposure.hasWcs() and patch.hasWcs():
            exposure.setCalib(patch.getCalib())
            exposure.setFilter(patch.getFilter())
            exposure.setMetadata(patch.getMetadata())
            exposure.setWcs(patch.getWcs())

    return exposure


if __name__ == "__main__":
    main()
↑ Save the script as make-large-image.py, and you can use it as follows:
[user@example ~]$ setup-hscpipe
[user@example ~]$ python make-large-image.py \
$HOME/HSC/rerun/tutorial/deepCoadd-results/HSC-G/0/*/calexp-*.fits
↑ Execute it on a machine with more than 16 GB of memory; otherwise Linux may kill the script (and take other processes with it). Viewing the resulting large image requires that much memory anyway.
Caution: ↑ The above script does not cope well with the mask layer. The bits of the mask layer may be assigned meanings (“saturated”, “cosmic ray”, etc.) in one patch that differ from those assigned in another. The meanings of the mask bits are recorded in the FITS header of the extension with EXTNAME=MASK. To do this correctly, you should make the script read the FITS header and permute the mask bits. (This is beyond the scope of this tutorial.)
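If you want to check how the mask bits are assigned in a given patch before attempting that, here is a minimal sketch. As an assumption on my part, I use the Mask.getMaskPlaneDict() accessor of the underlying afw library rather than parsing the FITS header directly:

#!/usr/bin/env python
# Print the mask-plane assignment (plane name -> bit index) of one patch
import sys
import lsst.afw.image as afwImage

mask = afwImage.ExposureF(sys.argv[1]).getMaskedImage().getMask()
print mask.getMaskPlaneDict()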
The command lines of hscPipe commands share a common syntax. Let us take reduceFrames.py as an example:
[user@example ~]$ reduceFrames.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 \
--config processCcd.isr.doBias=False \
processCcd.isr.doDark=False \
processCcd.isr.doFringe=False \
--batch-type=smp --cores=4
The first thing you need to know is the last line. Commands named reduce* are, basically, commands that submit jobs to a job queue (Torque PBS, etc.). If you do not have such a system, you can execute the job immediately on the local computer with the --batch-type=smp option. Specify the number of threads to use with --cores=.
[user@example ~]$ reduceFrames.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 \
--config processCcd.isr.doBias=False \
processCcd.isr.doDark=False \
processCcd.isr.doFringe=False \
--batch-type=smp --cores=4
Next, let us focus on the --config option. Here you can change the configuration of the analysis. To see the available parameters and their default values, append --show config to the command line and execute it.
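For example, appending it to the command we have been using (the parameter list is long, so a pager helps):

[user@example ~]$ reduceFrames.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 \
--show config | less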
If you change the configuration, the program will refuse to run.
If you change the configuration, the program will refuse to run.
This is so important that it is worth saying twice. The program refuses to run because it is undesirable for data made with different analysis parameters to coexist within a single rerun.
When the program refuses to run for this reason, append --clobber-config to the command line. The cached configuration will then be overwritten.
[user@example ~]$ reduceFrames.py $HOME/HSC \
--calib=$HOME/HSC/CALIB_tutorial \
--rerun=tutorial \
--id visit=905496..905504:2 \
--config processCcd.isr.doBias=False \
processCcd.isr.doDark=False \
processCcd.isr.doFringe=False \
--clobber-config \
--batch-type=smp --cores=4
Lastly, --id. Here you specify the conditions with which to search for the input data. Multiple conditions are connected by AND. For instance, if you specify --id field="DEN_C" filter="HSC-G", then all data in the field "DEN_C" and of the band "HSC-G" will be selected. "field" and "filter" are column names of tables in the registry.
Though it is more intuitive to use field etc. than to specify visits directly, you may lose track of which data were actually input if, for example, the FITS headers are wrong. Be careful.
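To check in advance which visits such a condition would select, you can query the registry directly (the table and column names are the ones we saw earlier in this tutorial):

[user@example ~]$ sqlite3 ~/HSC/registry.sqlite3
sqlite> SELECT DISTINCT visit FROM raw WHERE field = 'DEN_C' AND filter = 'HSC-G';
sqlite> .q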