
VLA CASA Calibration Pipeline

by Gustaaf Van Moorsel last modified Feb 09, 2017 by Drew Medlin

General Description

The VLA calibration pipeline performs basic flagging and calibration using CASA. It is currently designed to work for Stokes I continuum data, but may work in other circumstances as well.  Starting with the D-configuration, Semester 2013A, a version of the calibration pipeline (the VLA scripted calibration pipeline) has been run automatically at the completion of all astronomical scheduling blocks (SBs) except for P-band observations, and the resulting calibration tables and flags are archived for future use. Beginning September 2015, we are moving to the CASA pipeline that is integrated with CASA releases and shares an infrastructure with the ALMA pipeline.  This pipeline includes improved diagnostic plots compared with the scripted pipeline, as well as per-spectral window reference images of calibrators.  Investigators are notified when the calibrated data are ready for download, and detailed quality assurance checks can be performed by NRAO staff upon request.  The calibrated visibility data are retained on disk for 15 days after the pipeline has completed, to enable investigators to download and image them at their home institution.  Calibrated data can also be recreated after this nominal time period by re-generating the measurement set and applying the saved calibration and flag tables. This webpage is primarily relevant to the CASA Integrated Pipeline that is available with CASA 4.7.1; however, much of it also applies to the versions available for CASA 4.5.3 and 4.3.1.

The VLA calibration pipeline runs on each completed SB separately; there is currently no provision for it running on collections of SBs.  The pipeline relies entirely on correct scan intents to be defined in each SB.  In order for the pipeline to run successfully on an SB it must contain, at minimum, scans with the following intents:

  1. a flux density calibrator scan that observes one of the primary calibrators (this will also be used as the delay and bandpass calibrator if no bandpass or delay calibrator is defined)
  2. complex gain calibrator scans.

The SB may also contain scans to be used specifically for bandpass and delay calibration, if desired.  However, if multiple fields are defined as bandpass or delay calibrators, only the first one will be used by the pipeline.  Note that a single scan or field may have multiple intents specified.  Scans intended to set attenuators or requantizer gains should carry a setup intent only; they must not have scan intents that may result in their being used for calibration.
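The minimum intent requirements above can be expressed as a simple validity check. The sketch below is illustrative only, not pipeline code: the intent strings (CALIBRATE_FLUX, CALIBRATE_PHASE, SYSTEM_CONFIGURATION) and the check_sb_intents helper are assumptions chosen for the example.

```python
# Illustrative check of the minimum scan-intent requirements described
# above.  The intent strings and this helper are assumptions for the
# sketch, not part of the VLA pipeline.

def check_sb_intents(scans):
    """scans: list of (scan_id, set_of_intent_strings).  Returns a list
    of problems; an empty list means the SB meets the minimum."""
    all_intents = set()
    for _, intents in scans:
        all_intents |= intents
    problems = []
    if "CALIBRATE_FLUX" not in all_intents:
        problems.append("no flux density calibrator scan")
    if "CALIBRATE_PHASE" not in all_intents:
        problems.append("no complex gain calibrator scan")
    # Setup scans must not carry any calibration intent (see text above).
    for scan_id, intents in scans:
        if "SYSTEM_CONFIGURATION" in intents and any(
                i.startswith("CALIBRATE") for i in intents):
            problems.append("setup scan %d also has a calibration intent"
                            % scan_id)
    return problems
```

A scan may legitimately carry several intents (e.g. flux and bandpass calibration on the same field), which is why the check works on sets per scan.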

In overview, the VLA calibration pipeline does the following:

  • Loads the data into a CASA measurement set (MS), applies Hanning smoothing to them, and obtains information about the observing set-up from the MS
  • Applies online flags and other deterministic flags (shadowed data, end channels of sub-bands, etc.)
  • Prepares models for primary flux density calibrators
  • Derives pre-determined calibrations (antenna position corrections, gain curves, atmospheric opacity corrections, requantizer gains, etc.)
  • Iteratively determines initial delay and bandpass calibrations, including flagging of RFI and some automated identification of system problems
  • Derives initial gain calibration, and derives the spectral index of the bandpass calibrator
  • Runs RFI flagging on the data with the initial calibration applied
  • Derives final delay, bandpass, and gain calibrations, and applies them to the data
  • Runs the RFI flagging algorithm on the target data
  • Runs "statwt" to calculate data weights based on the RMS noise
  • Creates diagnostic images of calibrators

For more information about the pipeline calibration, please see the slides from the 2016 Data Reduction Workshop describing the scripted pipeline for a general overview; the CASA integrated pipeline performs the same steps, but includes more diagnostic plots than the scripted pipeline.

The VLA pipeline CASAguide provides an example run with detailed explanations of how to run, interpret, and modify the VLA pipeline and its products.

For many Stokes I continuum observations the VLA calibration pipeline will produce well-calibrated data that may only need minor further flagging.  In some circumstances more attention may be required, for example: S-band observations of fields close to the geostationary satellite belt. For more information about additional flagging, please see the CASA Integrated Pipeline Special Topics page.

For more detailed information about each task of the pipeline, you may examine the help file in CASA for each pipeline task. For example, in CASA, you could type:

help hifv_importdata

to see the CASA help file and available options for the import data step of the VLA pipeline. A complete list of tasks used by the VLA (and ALMA) pipeline may be found by typing tasklist at the CASA prompt and looking under the "User defined tasks" for all of the hifv_*, hifa_*, hsd_*, hif_*, and h_* tasks (not all of these are used by the VLA pipeline).

Please submit comments, questions, and suggestions via the Pipeline Department of the NRAO Helpdesk.


Obtaining the Pipeline

Current version (CASA-Integrated):

The pipeline is now part of every other CASA release, starting with CASA 4.5.3: please see the Obtaining CASA webpage and download the desired VLA Pipeline version of CASA. The current version is CASA 4.7.1, which uses Pipeline Cycle4-R2-B. Once downloaded, please see the instructions below for running the pipeline.

Scripted pipeline:

The scripted pipeline may continue to receive updates alongside the integrated versions, for those who require greater flexibility to modify their data reduction procedures within the overall calibration framework developed by NRAO. If you are interested in obtaining the scripted pipeline, please see the Scripted Pipeline page.


Running the Pipeline


Stokes I Continuum


To run the CASA Integrated Pipeline on datasets that contain only continuum data (64 MHz or 128 MHz spectral windows):

    • Put your data (SDM-BDF or measurement set) in its own directory for processing.  For example:
    mkdir myVLAdata
    mv mySDM myVLAdata/
    cd myVLAdata
    • Start CASA from myVLAdata. This is important: do not try to run the pipeline from a different directory by giving it the full path to a dataset, as some of the CASA tasks require the MS to be co-located with its associated gain tables; also, do not try to run the pipeline from inside the SDM-BDF or MS directories themselves.  It is also important to start a fresh instance of CASA from the directory that will contain the SDM-BDF or MS, rather than reusing an existing instance and "cd"-ing to a new directory from within CASA, as the output plots will otherwise end up in the wrong place and may overwrite your previous pipeline results.

      To start the current, released version of CASA with the pipeline from the NMASC, type:


        For all other installations, type:

    /path/to/installation/bin/casa --pipeline
    • From the CASA prompt, type:
    import pipeline.recipes.hifv as hifv

    and then:

    hifv.hifv(['mySDM'])

    • The pipeline will then proceed to calibrate your data, including a step to Hanning smooth the data (this can be important for strong, narrow-band RFI, but there are situations where it is not desirable: at low frequencies in A-configuration it may increase bandwidth smearing, and for spectral line observations it will degrade the spectral resolution). If you would like to run the pipeline without Hanning smoothing, please see the special topics page and follow steps 5 and 6 of the additional flagging instructions in place of step 3 above.
    • Go and make some coffee.
    • The pipeline will automatically generate a weblog as it progresses: look for a directory named "pipeline-YYYYMMDDTHHMMSS" (where YYYYMMDDTHHMMSS is the date and time of the pipeline execution) and navigate to "pipeline-YYYYMMDDTHHMMSS/html/index.html" once it has been created.



Spectral Line Data

We are currently working on the heuristics for spectral line observations with this new version of the pipeline. Until these are ready and tested, we recommend using the scripted pipeline for spectral line data. Please see the Scripted Pipeline page for help modifying the scripts for spectral line compatibility.

Mixed set-ups

In the case where a mixed continuum/spectral line set-up has been used, or multiple receiver bands have been observed, a single pipeline heuristic (e.g., gain calibration solution interval) may not be appropriate for the entire dataset.  In this case, the MS can be split by correlator set-up/receiver band (typically by selecting on spectral windows or scans) after applying the online flags, and the pipeline then run on the split datasets individually.
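One way to organize such a split is to group spectral windows by receiver band and build a selection string for each group. The sketch below is illustrative: the split_selections_by_band helper and the band labels are invented for the example, and in practice the spw-to-band mapping comes from listobs or the pipeline weblog.

```python
from collections import defaultdict

# Sketch: build per-band spw selection strings for splitting an MS, as
# described above.  split_selections_by_band is an invented helper.

def split_selections_by_band(spw_to_band):
    """Map {spw_id: band_name} to {band_name: "spw,spw,..."}."""
    groups = defaultdict(list)
    for spw_id in sorted(spw_to_band):
        groups[spw_to_band[spw_id]].append(str(spw_id))
    return {band: ",".join(ids) for band, ids in groups.items()}
```

Each resulting selection string could then be passed to CASA's split task (spw parameter), and the pipeline run on each split MS individually.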

Rerunning the pipeline

After running the pipeline, you may find that additional flagging is required (making it necessary to re-derive the calibration), or you may wish to rerun the pipeline on the raw data with different parameters. Instructions for both cases may be found on the special topics page and in the slides from the 2016 Data Reduction Workshop presentation describing extra flagging.


Special Cases


Incorrect scan intents

If you would like to run the VLA calibration pipeline on your data, but the scan intents were set up incorrectly at the time of observation, or you wish to use different sources as specific calibrators, we recommend using the Scripted Pipeline.

Features and limitations

This section will be updated as the pipeline continues to be developed. The comments below apply to the following pipeline version: r39437 (Pipeline-Cycle4-R2-B).

  • The VLA Calibration Pipeline produces a dataset that can be used as a starting point for self-calibration and imaging of science targets.  Note that the pipeline does not do any self-calibration of fields that are not specified as calibrators.
  • Although the CASA task "fluxscale" uses medians to do the flux density bootstrapping, which is fairly robust, in some cases (especially at high frequencies, for datasets that may have suffered from pointing or other problems) the default flux density scale may be uncertain.  For projects that require very accurate flux density scales, it is advisable to run the scripted pipeline.
  • The pipeline effectively calibrates each spectral window separately.  This means that the signal-to-noise ratio (s/n) on the calibrators needs to be sufficiently high that solutions can be obtained within the solution intervals specified in the pipeline.  For the delay calibrator, this means s/n > 3 per integration time, t(int); for the bandpass calibrator, s/n > 5 is required for solution times up to 10*t(int); for the gain calibrator, s/n > 3 is required for solution times up to 10*t(int), and s/n > 5 for a scan average.  Following the guidelines in the VLA Observational Status Summary for the strength of gain calibrators as a function of observing bandwidth at the high-frequency end of Q-band will be reasonably safe for all frequencies, although very narrow spectral channels may be problematic.  In these cases the data may need to be calibrated by hand, using polynomial bandpass fitting instead.
  • The pipeline is not compatible with P-band observations (even if the SB is P-band only) due to the lack of P-band models in CASA. In this case the pipeline would assume a 1 Jy, flat-spectrum model, making all flux densities derived by the pipeline arbitrary.
  • The CASA pipeline is not designed to handle spectral line observations, polarization calibration, or OTF mosaicking, though we are working on adding these capabilities in the future.
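The signal-to-noise thresholds quoted in the third bullet above can be gathered into a small checker. This sketch assumes the idealized sqrt(time) scaling of s/n for a point-like calibrator; the calibrator_snr_ok function is invented for illustration and is not part of the pipeline.

```python
import math

# Illustrative checker for the per-spectral-window s/n thresholds quoted
# above, assuming s/n grows as sqrt(time) for a point-like calibrator.

def calibrator_snr_ok(snr_per_integration, n_integrations_per_scan):
    snr_10tint = snr_per_integration * math.sqrt(10)  # s/n within 10*t(int)
    snr_scan = snr_per_integration * math.sqrt(n_integrations_per_scan)
    return {
        "delay": snr_per_integration > 3,   # s/n > 3 per t(int)
        "bandpass": snr_10tint > 5,         # s/n > 5 within 10*t(int)
        "gain_short": snr_10tint > 3,       # s/n > 3 within 10*t(int)
        "gain_scan": snr_scan > 5,          # s/n > 5 on a scan average
    }
```

A calibrator with s/n = 4 per integration comfortably passes all four thresholds, whereas s/n = 1 per integration fails the delay and bandpass requirements even with long scans.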


Applying Pipeline Calibration to Raw Data


To apply the calibration and flag tables produced by the pipeline, we recommend using the same version of CASA used by the pipeline, as well as the same version of the pipeline. The versions of both the pipeline and CASA may be confirmed via the Observation Overview Page, which should be included in the weblog.tgz file available for download as part of the pipeline products. We recommend starting with a fresh SDM-BDF, though a fresh MS should work, too. There may be small differences in the final result or statistics if online flags were applied when requesting the MS.

In order for the pipeline to work properly, please follow the steps below to prepare your directory and calibration files for application. The pipeline products you should have ready are the raw SDM-BDF, unknown.session_1.caltables.tar.gz, *.ms.flagversions.tar.gz, *.ms.calapply.txt, and casa_piperestorescript.py. To ensure everything is correctly placed for the script to apply calibration, please follow these steps:


  1. Create a directory where you will work; call it something like "restoration":
     mkdir restoration
  2. Go into your restoration directory and create three new directories named exactly as follows:
     mkdir rawdata
     mkdir working
     mkdir products
  3. Place the raw SDM-BDF into the "rawdata" directory:
     mv /path/to/fresh/data/SDM-BDF rawdata/
  4. Place unknown.session_1.caltables.tar.gz, mySDM.ms.flagversions.tar.gz, and mySDM.ms.calapply.txt into the "products" directory:
     mv *.tar.gz products/
     mv *.txt products/
  5. Place the casa_piperestorescript.py file into the "working" directory:
     mv casa_piperestorescript.py working/
  6. Edit the casa_piperestorescript.py file to include "../rawdata/" before the name of the SDM-BDF (mySDM) in the call to hif_restoredata.
  7. From the "working" directory, start CASA.
  8. Start the restore script from the CASA prompt.
  9. Enjoy calibrated data once the process finishes.