2.1 Why Automate Data Management?

A tidal wave of data is approaching astronomy, requiring radical new approaches to database management and data reduction (Lawrence et al. 2000). Foreseeable explosion points are:

  • 2002, when the Liverpool Telescope (LT; see section 2.2) is commissioned. In a typical night a robotic telescope may take several hundred frames (Bode 1995), easily pushing the raw-data storage requirement per night into the realm of Gigabytes.
  • 2003 and 2006, when the United Kingdom Infrared Telescope Wide Field Camera (UKIRT WFCAM) and the Visible and Infrared Survey Telescope for Astronomy (VISTA) come on-line and start producing over a Terabyte of data every night (Lawrence et al. 2000).

Bridger et al. (1998) stress that reducing data in near-real time has many benefits. It makes more efficient use of telescope time, because it is a crucial component in allowing observers to assess the quality of data while observing is still taking place, and the speed of reduction leads to a higher publication rate for the data. They also point out that there is a great desire in the astronomical community to expand the automatic reduction of data.

Typically data will be in the form of images or spectra. A typical image is digitized to 16 bits, or 2^16 = 65536 levels, which is two bytes per pixel at eight bits to the byte. With commonplace CCD arrays of 2048 × 2048 pixels a single image is therefore 8 Megabytes in size. To maximize productivity, pipeline reduction should be used for all but the most demanding of programmes (Bohannan 2001), and ideally the data should be fully reduced on-site (Bode 1995). Among the reasons for this are: (i) it removes the need to transfer data from the acquisition site to the reduction site, usually via DAT tape, a step that instantly doubles the storage space needed because most observatories also archive the data for a period; (ii) it removes the need to reduce the data manually, saving time in producing and analysing results; (iii) it makes electronic transfer of data more practical, since fully reduced data need only be in the form of a text file, e.g., listing the object and magnitude, which is only a few tens of Kbytes, whereas a reduced image can be >8 Megabytes in size.
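The image-size arithmetic above can be checked with a short sketch. The frame dimensions and bit depth are those quoted in the text; the nightly frame count of 300 is an assumption standing in for "several hundred frames", and header overhead and compression are ignored.

```python
def image_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one frame in bytes, ignoring any header overhead."""
    bits_per_byte = 8
    return width_px * height_px * bits_per_pixel // bits_per_byte


BYTES_PER_MEGABYTE = 2 ** 20  # binary Megabyte, matching the 8 Megabyte figure
BYTES_PER_GIGABYTE = 2 ** 30

# Values quoted in the text: a 2048 x 2048 CCD digitized to 16 bits.
frame_bytes = image_size_bytes(2048, 2048, 16)
frame_megabytes = frame_bytes / BYTES_PER_MEGABYTE

# Assumption for illustration only: 300 frames in one night.
frames_per_night = 300
night_gigabytes = frames_per_night * frame_bytes / BYTES_PER_GIGABYTE

print(f"One frame: {frame_megabytes:.1f} Megabytes")
print(f"One night of {frames_per_night} frames: {night_gigabytes:.2f} Gigabytes")
```

Under these assumptions a single frame comes to 8 Megabytes and a 300-frame night to a little over 2 Gigabytes of raw data, consistent with the per-night estimate given for robotic telescopes.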

Add to this already huge demand on storage the need to generate and archive intermediate images, which are created during data reduction (and can number 3-4 images per single observation), and storage soon becomes a burden. There is always the possibility that, because mistakes have been made or new scientific aims have come to light, the reduction will need to be redone from some intermediate state. Astronomers are therefore generally unwilling to delete intermediate files, given the time it takes to create them, until they become obsolete (i.e., the work is peer reviewed and published).
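To give a feel for how quickly intermediate images inflate the archive, the following sketch totals one night's storage for 3 and 4 intermediates per observation. It assumes every intermediate image is the same size as a raw 16-bit 2048 × 2048 frame and that 300 frames are taken per night; both are illustrative assumptions, and in practice intermediates stored in another pixel format may be larger or smaller.

```python
FRAME_BYTES = 2048 * 2048 * 2  # one 16-bit 2048 x 2048 frame (two bytes per pixel)
BYTES_PER_GIGABYTE = 2 ** 30


def nightly_storage_bytes(frames: int, intermediates_per_frame: int) -> int:
    """Bytes needed for the raw frames plus their intermediate images.

    Assumes each intermediate image is the same size as a raw frame.
    """
    images_per_observation = 1 + intermediates_per_frame  # raw + intermediates
    return frames * images_per_observation * FRAME_BYTES


frames_per_night = 300  # assumption for illustration only
raw_only = nightly_storage_bytes(frames_per_night, 0)

for n_intermediates in (3, 4):
    total = nightly_storage_bytes(frames_per_night, n_intermediates)
    overhead_percent = 100 * (total - raw_only) / raw_only
    print(
        f"{n_intermediates} intermediates per observation: "
        f"{total / BYTES_PER_GIGABYTE:.2f} Gigabytes per night "
        f"({overhead_percent:.0f}% on top of the raw data)"
    )
```

With these numbers the nightly total grows from roughly 2.3 Gigabytes of raw data to between about 9 and 12 Gigabytes once intermediates are kept, which is why retaining them until publication is costly.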

This immense amount of data is indicative of a new style of science that is fast emerging, one that requires the manipulation and reduction of vast datasets. This may be because the problem requires massive analysis (e.g., the spatial clustering power spectrum of a billion galaxies), because time-critical events or objects need to be found (e.g., fast response to gamma-ray bursters), or perhaps because examination and exploration are required to create new ideas (Lawrence et al. 2000). If no new research is done in this area, the technology and techniques currently used to manipulate data will rapidly become outdated. The effort invested in this field must be on a similar scale to that devoted to creating the data; otherwise the astronomical community will build up a large backlog of data.

One way to help deal with this problem is automated data reduction: a reliable system that carries out fast, reproducible, standard data reduction at the facility which creates the data, producing results for immediate scientific analysis and removing the need to store intermediate images, whose volume can be up to 400% of that of the raw data.