5.9 The Off-line Pipe-line: Version 2

Version 2 steps away from its 100% bespoke ancestry to incorporate software that was written to perform a specific and isolated task, this is in contrast to the LT PL code that was originally written to perform the job as part of a much larger program. The non-bespoke software is easily integrated into the PL due to the modular nature of code. A removed procedure is easily replaced with an external software library routine or program call. However some recoding is necessary to provided the variables required by the new code.

5.9.1 Object Detection

During the coding of version 2 of the PL a new version of Source Extractor (SExtractor, see Section 5.9.1) was released (SExtractor V2.2.1), this release maintains to have fixed several bugs (most of which seem to have been invisible to the user) and improved the program’s stability. A more in-depth investigation of this software and its use by the astronomical community, using the SExtractor newsgroup (see Andreon et al.2000Bertin1999), revealed that the package had been extensively tested and was in large scale use by the community. Sabbey et al. (2001) use the package in a reduction PL to detect objects. Since the SExtractor software had been widely tested and accepted, and its detection of objects is quicker when tested against the online PL routines it was integrated into the PL.

It should also be noted that the ING WFS have also started using SExtractor (see McMahon et al.2001).

Software for Source Extraction
The analysis of an image by SExtractor (Bertin1999) is done in five steps, summarised here they are distilled from Bertin (1999); Bertin and Arnouts (1996).
  • Background Estimation: SExtractor computes a local background estimator in each square of a grid which covers the whole frame, the grid size can be user set. A histogram is then produced with the mean value being the global background for non-crowded fields (in a similar manner to the LT online PL). Crowded fields have their background calculated using a mode estimation:
    mode  = 2.5× median - 1.5 × mean.
    (5.25)
    The field is considered non-crowded if the background standard deviation (s) changes by less 20% when the histogram is clipped such that all values fall within 3s.
  • Detection: Objects are detected using thresholding, again in a similar manner to the LT PL. 8-Connected and contiguous pixels, above the threshold, is considered a detection, false detections are limited by convolving the image frame with a convolution mask such as a point spread function Gaussian for stellar detections.
  • De-blending: The separation of 2 independent, yet over lapping objects is performed by searching for “saddle points” in their profiles. The de-blending is then accomplished using the convolution mask as a template for “hidden” data.
  • Cleaning: Filtering “fake” objects, (e.g., objects are often found in the wings of a bright stellar detection due to there being a brighter local background there) is performed by stacking detections. Each object has its brightness calculated with its neighbours removed, if this value is still above the threshold then it is considered a real detection.
  • Star/Galaxy Separation: SExtractor uses a neural net to compute whether an object is stellar (returning 1) or a galaxy (returning 0). This value is recorded by the LT PL in the data file returned to the astronomer, this is at the request of the LT astronomy-board of directors, however no research has been carried out into its accuracy. However independent research has been carried out (see e.g.,  Andreon et al.2000).

5.9.2 CFITSIO

By the end of 1998 the FITSIO package (see Section 3.1.1) had evolved to a stage where it superceded the expectations of the LT PL FITS software modus operandi. Pence (1998) outlines some of the advantages and advances in the software library, some which are integrated into the discussion below.

The FITSIO library has been re-developed and re-written in C and is known as “CFITSIO”, allowing easier compilation and calling routines for C developers than the FORTRAN 77 wrappers. CFITSIO is optimized for maximum input/output performance achieving through-puts of the order of 10MB/s on current generation workstations. An important new feature only available in CFITSIO is the ability to read/write FITS files directly in computer memory, Pence stresses that this can greatly reduce the amount of time necessary for PL processing as the need to read/write temporary FITS files on magnetic disk is eliminated. The ability to read compressed FITS files directly (i.e., without uncompressing) is now implemented and with FITS files readily compressing by 1/3 this easily reduces storage requirements.

The decision to incorporate the CFITSIO package into the LT PL was taken at this point. While a substantial amount of code was removed from the individual pipes, and replaced with external subroutine calls, the transition was easily accomplished due to the structured nature of the PL code. It is possible to install only the CFITSIO code for the routines that will be used, however since both the CS and online PL have also made the transition to the new CFITSIO software the decision was made to install the whole package.

5.9.3 World Canvas Construction

Version 1 showed that the construction of a world co-ordinate system in the data files could not be left solely to the telescope pointing. In an effort to improve upon this an attempt was made at writing a program to compare a star catalogue to the FITS image in hand. The catalogue chosen for this purpose is the USNO A2.0 catalogue.
The United States Naval Observatory Catalogue
The USNO-A2.0 is a catalogue of 526,280,881 stars (Monet et al.1998), and is based on a re-reduction of the Precision Measuring Machine (PMM) scans that were the basis for the USNO-A1.0 catalogue. The A2.0 is used as an astrometric catalogue for the LT PL. The major difference between A2.0 and A1.0 is that A1.0 used the Guide Star Catalogue (see Lasker et al.1990) as its reference frame whereas A2.0, uses the Hipparcos reference frame (Monet1998) (see Urban et al.1997). The astrometric values in A2.0 are in J2000 format and cover stars down to 20 magnitude.

Currently under construction is the USNO-B (see Monet2000), it will include proper motions and star/galaxy classifications. This may be an upgrade option for the LT PL when completed.

The physical size of the catalogue is of the order of 8 Giga-bytes, it is constructed of 24 “cat” files written in binary, which contain the data, and 1 accelerator file written in plain text that outlines the regions covered by each cat file, thus allowing the correct zone to be rapidly accessed.

Pattern Matching
What follows is a discussion behind the algorithm designed to match the FITS image to the catalogue.

As a first attempt it is assumed that there is no rotation inherent in the file so that North is directly up and East points left, with pixel numbers increasing to the top and left of the image. Even though the pointing of the telescope may be off it is the only reference point available, therefore it must be used. The FITS keywords for RA and Dec are read and compared with the catalogue accelerator file. The correct cat file may then be accessed.

Firstly the RA and Dec of the telescope pointing are taken as the reference points and given the pixel values of the image centre, with the CCD pixel scale known the conversion of the catalogue sources to pixel values can also be easily made. The number of stellar sources in the fits file is then calculated, the top 10 brightest are chosen if there is a large number to minimise computing time. From the catalogue file the true star pattern may then be ascertained by calculating,

i= sum 10
   (C - di)2
i=0
(5.26)
where C is the pattern centre, and assumed in the case of the catalogue to be the telescope pointing, and di is the distance to the object star in question. Only the catalogue’s brightest 10 stars within the telescopes field of view, plus a small tolerance, are included in the calculation. Equation (5.26) may now be minimised by moving C through each pixel of a square grid whose side is the size of the worst recorded telescope pointing inaccuracy (~ 25''). The grid square with the minimum value for Equation (5.26) is then taken as the image reference point from which all other distances can be calculated.

This approach has met with limited success, the routine is initially, and most crucially, used on the standard star frames of the observations. This is because the locations of these stars is well documented and most accurately needed to generate photometric zeropoints. The test data primarily used the Landolt (1992) PG0918 field located at ~09h21m28s.30 +02o46'03''.58 (J2000) containing 4 stars which describes a square pattern on the sky with an additional star offset at both the top right and bottom left of this square. The procedure works for approximately half of the 29 standards observed, with 6 of the remaining showing significant skew or rotation which was not accounted for. Four of the files which did not map but do not show rotation are Z band images, notably only 3 Z band images were mapped. There appears no clear reason for this lack of reduction in the Z band.

5.9.4 The LT Photometric Catalogue

The LT photometric catalogue is a simple ASCII text file of standard stars which contains the following data:
  • Star name: Purely for user efficiency, not required by the PL code
  • RA & Dec: Of the stellar source
  • USN0 A2.0 Number: The unique catalogue number of the star which can be used to locate it in the catalogue
  • Magnitude: The magnitude of the source in all LT filters
  • Error: The error on the magnitude in all LT filters

This catalogue is required as the USN0 A2.0 catalogue only contains magnitudes in B and V, and the standard magnitude is required to generate accurate zeropoints.

5.9.5 The Optimal Extraction Technique for Imaging Photometry

Aperture photometry will not work if the flux from one star contaminates the aperture of the star of interest. To overcome this profile fitting has been developed allowing overlapping profiles to be separated. Profile fitting is analytical however and it is unclear what effect mismatching an analytical function to an observed object has, although the scheme is quick to use. Naylor (1998a) presents a method to overcome the problems presented by aperture and profile fitting photometry, that of optimal photometry.
The “Optimal” Extraction Algorithms
The following description is distilled from, and as such draws upon the nomenclature of, Eaton et al. (1999); Horne (1986); Naylor (1998a,b), but is presented as a simplified interface to the process of the optimal photometry used in the LT PL.

A general formula for calculating the total flux within an aperture is:

     sum 
F =    wi,j(Di,j- Si,j)
     i,j
(5.27)
where the sum is over all pixels in the aperture, Di,j is the total data counts in a pixel and Si,j is the total sky counts in a pixel. wi,j is the weight for each pixel, needed because the flux intensity is not constant across a given pixel. For aperture photometry the weights are simple, 1 within the aperture and 0 outside it. The variance (the square of the error) is then simply the sum of the variance of each pixel
          sum 
var[F] =    Vi,j
         i,j
(5.28)
where V i,j is the variance in each pixel and the error is the square root of the signal in that pixel. However to get optimum signal-to-noise, there must be a more complex weighting procedure. Optimization of the weights depends on the spatial distribution of the detected flux, the PSF (Pi,jI). As pointed out earlier this should be thought of as a measure of the probability that a photon is detected in pixel i,j rather than some arbitrary position or the fraction of light that falls on a given pixel. Therefore:
     D   -  S
fi,j =--i,j-I--i,j,
        P i,j
(5.29)
Where fi,j is the flux in a given pixel. Thus each pixel can be used as a measure of the total flux in the profile. In the absence of noise Equation (5.29) would be exact, this however is not the case and the weighting factor must be included, it is also necessary to divide by the sum of the weights to keep the normalisation correct, this gives;
      sum 
     --i,j wi,j-(Di,j---Si,j)/PiI,j
F =           sum   wi,j        .
               i,j
(5.30)
It is now left to choose the weights which minimise the variance of F. Since F is a weighted average of Equation (5.29), and fi,j have identical mean values but different variances, the variance of F can be minimised by choosing weights that are inversely proportional to the variance of the individual random variables:
         [            ]
-1--= var  (Di,j---Si,j)  = (-Vi,j)--.
wi,j            PIi,j         P E  2
                             i,j
(5.31)
Where Pi,jE is an estimate of the PSF and has been used in place of the true PSF (Pi,jI). Equation (5.30) may now be re-written as
     sum 
    ---i,j PiE,j(Di,j--Si,j)/Vi,j
F =      sum    ( E )2
          i,j  Pi,j  /Vi,j
(5.32)
This can be simplified to (compare with Equation (5.23));
F =  sum   W   (D   -  S  ),
          i,j  i,j    i,j
     i,j
(5.33)
where Wi,j is equal to
           P E/V
Wi,j = ----(i,j--i),j----.
        sum     PE  2/Vi,j
         i,j   i,j
(5.34)
The variance of the measured flux is then (compare with Equation (5.28)) given by
         sum   Vi,j-   sum     2
var[F ] =    Pi2,j =     W i,jVi,j.
         i,j        i,j
(5.35)
Practically there will be a difference between the estimated PSF (Pi,jE), and the actual PSF (Pi,jI), meaning that Equation (5.33) and Equation (5.34) will not give a good flux estimate. However Naylor (1998a) shows that this mismatch can be eliminated and states that this immunity from mis-matching the PSF is one of the key advantages of the optimal extraction technique over profile fitting.

To demonstrate this consider a mismatch between the PSFs, an estimate of the flux can be given by substituting the actual PSF for (Di,j - Si,j) in Equation (5.32), compare with Equation (5.23)):

       sum     E  I
F = T --i,j P(-i,jP)i,j/Vi,j.
       sum      E  2
        i,j P i,j  /Vi,j
(5.36)
If denominator and numerator are equal, then whilst the actual flux will be wrong the ratio of flux between two stars will be correct and differential photometry can be achieved, in direct analogy with Section 5.8.3. However this will, in general, not be the case as the equation contains the variances which will be different for different stars, differential photometry is therefore not valid. The only solution to this is to use the same weights for all stars whence differential photometry will then be valid. This method however will not be optimal for all targets so the way to ensure the best flux ratios is to use the weights of the faintest star of interest.

The flux produced by the optimal photometry routines, which is sufficient for differential photometry, must be corrected to the true flux level. This is termed a “profile correction” and carried out in the following way; Along with optimal photometry each object is additionally processed using aperture photometry. The aperture size used is 23 FWHM, which is empirically shown to produce the best signal-to-noise for aperture photometry by Naylor (1998a). The aperture signal-to-noise (S/N) of each object is calculated and those ten objects with the highest values selected. Those objects then have their optimal flux to aperture flux ratio calculated and averaged. This value is then used to correct all optimally calculated fluxes.

The algorithms necessary to perform optimal photometry have been coded in to routines by Naylor (1998a). These routines were initially incorporated into the LT PL. However during implementation of the PL it became apparent that several sections of the code were liable to problems (e.g., numerical overflows and underflows, memory allocation errors and unexpected input parameters). Over a serious of months and in collaboration with Tim Naylor, the code was refined to guard against these errors, the “official” code now incorporates these changes.

The original optimal photometry package was written to perform time-series photometry and heavily incorporated a programing environment known as ARK: The ARK is a collection of common user programs originally written by the Oxford High Energy Astrophysics Group and incorporated the data reduction system, for spectroscopic data, used by Alan (Smale), Robin (Corbet) and Koji (Mukai); hence the name ARK. To enable automation, and operation on the LT computer suite new code which dynamically extracts data from FITS files and removes all trace of the ARK format from all procedures was implemented. This involved completely re-writing the command level code and editing the routines which implement the optimal photometry algorithms.

This program is the only part of the LT PL not to be written in the C programming language and is instead written in FORTRAN 95. This decision was taken for a multitude of reasons. (i) The original code was written in FORTRAN 95 by Naylor, and those routines have been tested, a re-write in C would have involved substantial re-testing. It is however possible to call FORTRAN routines from a C program. (ii) The Solaris operating environment allows complete integration of FORTRAN 95 and C, to the point where the internal maths libraries etc. are the same routines-written in C with FORTRAN callable wrappers, this is invisible to the coder. The integration is not yet implemented in the PL. (iii) FORTRAN is faster for numerical work. The complete photometry package is of the order of 5000 lines of FORTRAN 95 code.

5.9.6 Airmass curve creation

Whilst Version 1.0 of the PL was able to accurately construct airmass curves it was unable to make the leap to scientifically removing points, or counter any change of photometricity throughout the night.

Version 2.0 attempts to counter for changing photometricity by adding a further correction. Once the standard airmass curve has been constructed and rogue points removed, still by hand at this stage, the residuals of the airmass curve about the least-squares-fit are plotted against the Modified Julian Date (MJD), a decimal number that uniquely identifies a period of time. If the photometricity of the night has remained constant the graph will display a straight, horizontal, line through zero. Any deviation from this shows a change in the nights photometric conditions. To first order the photometric conditions might change linearly from the start of the night to the end of the night. By fitting a least-squares-fit to this plot of (zeropoint versus airmass)-residuals against MJD an ordinate axis value can be interpolated from the MJD of an object. This ordinate axis result is a secondary zeropoint (in units of magnitude) and can be added to the primary zeropoint to correct for any change in photometricity before the magnitude is calculated.

While a linear change in photometric conditions can be countered in this manner a more diverse change is more difficult. Primarily a polynomic function is difficult to fit with so few points; keep in mind that the number of points will match the number of standards taken each night.

5.9.7 Conclusion: Version 2

The most substantial change in version 2 has been the implementation of optimal photometry which is able to overcome the problems presented by aperture photometry and profile fitting. The addition of CFITSIO to the code has proved to give faster and more efficient manipulation of FITS.