# VLA Scripted Calibration Pipeline

## General Description

The VLA calibration pipeline performs basic flagging and calibration using CASA. It is designed to work for Stokes I continuum data, but may work in other circumstances as well.  Starting with the D-configuration, Semester 2013A, the scripted pipeline was run automatically at the completion of all astronomical scheduling blocks (SBs) until 2015-09-08 (at which time the CASA Integrated Pipeline was used), and the resulting calibration tables and flags are archived.  The pipeline products undergo quality assurance checks by NRAO staff, and investigators are notified when the calibrated data are ready for download.  The calibrated visibility data are retained on disk for 15 days after the pipeline has completed, to enable investigators to download and image at their home institution.  Calibrated data can also be provided after this nominal time period, by re-generating the measurement set and applying the saved calibration and flag tables. Development of this Scripted Pipeline is no longer ongoing, but the versions available below may still be useful.

The VLA calibration pipeline runs on each completed SB separately; there is currently no provision for running it on collections of SBs.  The pipeline relies entirely on correct scan intents being defined in each SB.  In order for the pipeline to run successfully on an SB, it must contain, at minimum, scans with the following intents:

1. a flux density calibrator scan that observes one of the primary calibrators (this will also be used as the delay and bandpass calibrator if no bandpass or delay calibrator is defined)
2. complex gain calibrator scans.

The SB may also contain scans to be used specifically for bandpass and delay calibration, if desired.  However, if multiple fields are defined as bandpass or delay calibrators, only the first one will be used by the pipeline.  Note that a single scan or field may have multiple intents specified.  Scans intended to be used to set attenuators or requantizer gains should have the setup intent; they must not have scan intents that may result in their being used for calibration.

In overview, the VLA calibration pipeline does the following:

• Loads the data into a CASA measurement set (MS), applies Hanning smoothing to them, and obtains information about the observing set-up from the MS
• Applies online flags and other deterministic flags (shadowed data, end channels of sub-bands, etc.)
• Prepares models for primary flux density calibrators
• Derives pre-determined calibrations (antenna position corrections, gain curves, atmospheric opacity corrections, requantizer gains, etc.)
• Iteratively determines initial delay and bandpass calibrations, including flagging of RFI and some automated identification of system problems
• Derives initial gain calibration, and derives the spectral index of the bandpass calibrator
• Derives final delay, bandpass, and gain calibrations, and applies them to the data
• Runs the RFI flagging algorithm on the target data
• Runs "statwt" to calculate data weights

For many Stokes I continuum observations the VLA calibration pipeline will produce well-calibrated data that may only need minor further flagging.  In some circumstances more attention may be required, for example, S-band observations of fields close to the geostationary satellite belt. For more information about additional flagging, please see the Application of Pipeline Calibration slides from the 2014 Data Reduction Workshop.

## Obtaining Scripts

The Scripted VLA calibration pipeline comprises a set of Python scripts that may be downloaded and run using CASA 5.0.0 (pipeline v1.4.0), CASA 4.7.2 (pipeline v1.3.11), CASA 4.7.1 (pipeline v1.3.10), CASA 4.7.0-1 or 4.7.0 (pipeline v1.3.9), CASA 4.6.0 (pipeline v1.3.8), CASA 4.5.3 (pipeline v1.3.7), CASA 4.5.0 (pipeline v1.3.5), CASA 4.4.0 (pipeline v1.3.4), CASA 4.3.1 (pipeline v1.3.3), CASA 4.2.2 (pipeline v1.3.1), or CASA 4.1.0 (pipeline v1.2.0) at your home institution. For information about updates and changes made to either pipeline version 1.3.4 or 1.3.3, please download the tar file and read the "changes" file included. You can obtain the latest version of CASA from the Obtaining CASA webpage.  Make sure you download at least version 4.1.0 or 4.2.2, as earlier versions do not include the ability to do requantizer gain corrections.  Instructions for obtaining and installing the pipeline are as follows:

1. Choose a directory on your machine where the scripts will live; in the example below, we use /home/mymachine/pipe_scripts
2. Download the tar file that matches your CASA version into that directory and unpack it:
• EVLA_pipeline1.4.0.tar (for CASA 5.0.0; uses Perley-Butler 2013 flux density scale, usescratch=T or F is an option at startup, uses the new (mstransform based) tasks split & hanningsmooth.) Shows amplitude and phase vs. frequency plots for each field in the weblog.
• EVLA_pipeline1.3.11.tar (for CASA 4.7.2; uses Perley-Butler 2013 flux density scale, usescratch=T or F is an option at startup, uses the new (mstransform based) tasks split & hanningsmooth.) Shows amplitude and phase vs. frequency plots for each field in the weblog.
• EVLA_pipeline1.3.10.tar (for CASA 4.7.1; uses Perley-Butler 2013 flux density scale, usescratch=T or F is an option at startup, uses the new (mstransform based) tasks split & hanningsmooth.) Shows amplitude and phase vs. frequency plots for each field in the weblog.
• EVLA_pipeline1.3.9.tar (for CASA 4.7.0-1 or CASA 4.7.0; uses Perley-Butler 2013 flux density scale, usescratch=T or F is an option at startup, uses the new (mstransform based) tasks split & hanningsmooth.) Shows amplitude and phase vs. frequency plots for each field in the weblog.
• EVLA_pipeline1.3.8.tar (for CASA 4.6.0; uses Perley-Butler 2013 flux density scale, usescratch=T or F is an option at startup, uses the new (mstransform based) tasks split & hanningsmooth.)
• EVLA_pipeline1.3.7.tar (for CASA 4.5.3; uses Perley-Butler 2013 flux density scale, usescratch=T or F is an option at startup, uses split2)
• EVLA_pipeline1.3.5.tar (for CASA 4.5.0; uses Perley-Butler 2013 flux density scale, usescratch=T or F is an option at startup)
• EVLA_pipeline1.3.4.tar (for CASA 4.4.0; uses Perley-Butler 2013 flux density scale, usescratch=T)
• EVLA_pipeline1.3.3.tar (for CASA 4.3.1; uses Perley-Butler 2013 flux density scale)
• EVLA_pipeline1.3.1.tar (for CASA 4.2.2; uses Perley-Butler 2013 flux density scale)
• EVLA_pipeline1.2.0.tar (for CASA 4.1.0; uses Perley-Butler 2010 flux density scale, only for data taken prior to 2014-11-06: see known issues page)
3. Edit the file EVLA_pipeline.py, replacing the "pipepath" variable with the full path name of the location of the scripts.
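As a convenience, the CASA-to-pipeline pairing in the list above can be written as a small lookup. This is only a sketch: the table is copied from the list, while the names `PIPELINE_TARBALLS` and `tarball_for` are hypothetical and not part of the pipeline scripts.

```python
# CASA version -> pipeline tar file, copied from the list above.
PIPELINE_TARBALLS = {
    "5.0.0": "EVLA_pipeline1.4.0.tar",
    "4.7.2": "EVLA_pipeline1.3.11.tar",
    "4.7.1": "EVLA_pipeline1.3.10.tar",
    "4.7.0-1": "EVLA_pipeline1.3.9.tar",
    "4.7.0": "EVLA_pipeline1.3.9.tar",
    "4.6.0": "EVLA_pipeline1.3.8.tar",
    "4.5.3": "EVLA_pipeline1.3.7.tar",
    "4.5.0": "EVLA_pipeline1.3.5.tar",
    "4.4.0": "EVLA_pipeline1.3.4.tar",
    "4.3.1": "EVLA_pipeline1.3.3.tar",
    "4.2.2": "EVLA_pipeline1.3.1.tar",
    "4.1.0": "EVLA_pipeline1.2.0.tar",
}


def tarball_for(casa_version):
    """Return the tar file listed for this CASA version (hypothetical helper)."""
    try:
        return PIPELINE_TARBALLS[casa_version]
    except KeyError:
        raise ValueError(
            "No scripted pipeline release is listed for CASA %s" % casa_version)
```

For example, `tarball_for("4.7.2")` returns the v1.3.11 tar file, while a CASA version that is not in the list raises an error rather than guessing a nearby release.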

While development of this Scripted Pipeline has stopped, the above versions may be useful as they may be edited to suit specific setups or science goals. The pipeline is optimized for the calibration of Stokes I continuum datasets.  However, with some simple modifications to the main driver script, it may also work for spectral line datasets, depending on the strength of your calibrators. Please see the known issues page for information about problems that may cause the pipeline to fail.

## Running the Pipeline

### Stokes I Continuum

To run the script on datasets that contain only continuum data (64 MHz or 128 MHz spectral windows):

1. Put your data (SDM-BDF or measurement set) in its own directory for processing.  For example:
mkdir myVLAdata
mv mySDM myVLAdata/
cd myVLAdata
2. Start casapy from myVLAdata (this is important — do not try to run the pipeline from a different directory by giving it the full path to a dataset, as some of the CASA tasks require the MS to be co-located with its associated gain tables; also, do not try to run the pipeline from inside the SDM-BDF or MS directories themselves).  It is also important that a fresh instance of CASA is started from the directory that will contain the SDM-BDF and MS, rather than using an existing instance of CASA and using "cd" to move to a new directory from within CASA, as the output plots will then end up in the wrong place and potentially overwrite your previous pipeline results.
3. From the CASA prompt, type:
execfile('/home/mymachine/pipe_scripts/EVLA_pipeline.py')
4. The pipeline will then prompt you for the SDM-BDF name; if you have only an MS, give it the root name of the MS (i.e., omit any '.ms').
5. The pipeline will then prompt you for whether or not you want to use Hanning smoothing (this can be important for strong, narrow-band RFI, but there are some situations where it is not desirable: for low frequencies in A-configuration it may increase bandwidth smearing, and for spectral line observations it will make the spectral resolution worse).
6. Go and make some coffee.
7. The pipeline will automatically generate a QA2 score of Pass, Partial, or Fail for the pipeline as a whole and for each step along the way. For some basic help interpreting your QA2 score, please see our QA2 interpretation page.
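The directory requirements in the steps above can be checked before starting CASA. The following is a minimal sketch, not part of the pipeline: `check_run_directory` is a hypothetical helper, and it assumes that an SDM-BDF directory contains a file named ASDM.xml and that an MS directory contains a file named table.dat.

```python
import os


def check_run_directory(workdir, dataset_name):
    """Return a list of problems found; an empty list means the layout looks fine.

    dataset_name is the SDM-BDF name, or the MS root name without '.ms'.
    """
    problems = []
    workdir = os.path.abspath(workdir)

    # The dataset should sit directly inside the working directory.
    candidates = (dataset_name, dataset_name + ".ms")
    if not any(os.path.isdir(os.path.join(workdir, c)) for c in candidates):
        problems.append("%s (or %s.ms) not found in %s"
                        % (dataset_name, dataset_name, workdir))

    # Assumption: an SDM contains ASDM.xml and an MS contains table.dat,
    # so finding either file here means we are inside the dataset itself.
    for marker in ("ASDM.xml", "table.dat"):
        if os.path.isfile(os.path.join(workdir, marker)):
            problems.append("%s looks like an SDM-BDF or MS directory" % workdir)
            break
    return problems
```

Calling `check_run_directory(os.getcwd(), 'mySDM')` from the shell's Python before launching casapy gives an early warning if the data are not where the pipeline expects them.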

### Spectral Line Data

If your calibrators are strong enough that the heuristics in the VLA calibration pipeline will work on narrower bandwidths, then some simple edits to the master EVLA_pipeline.py script are all that is needed for it to work on a spectral line dataset. In particular, you will want to comment out the call to flagdata() with mode='rflag' in the EVLA_pipe_targetflag.py script. This call runs the target through an algorithm that searches for and flags RFI (rflag), and may therefore remove your spectral line as well.  You may also want to answer "n" to the Hanning smoothing option, and depending on the strength of your line, you may want to modify the inputs to "statwt" to exclude channels containing line emission.
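When excluding line channels, it can help to build the line-free channel selection programmatically. The sketch below produces a string in CASA's usual spw:channel selection syntax; `line_free_channels` is a hypothetical helper, and how the string is passed to "statwt" (for example through a parameter such as fitspw, if your version of the task provides it) is an assumption that should be checked against the task help for your CASA version.

```python
def line_free_channels(spw, nchan, line_first, line_last):
    """Build a CASA-style 'spw:ch0~ch1;ch2~ch3' string that skips the line.

    Channels are zero-based; line_first..line_last is inclusive.
    """
    if not 0 <= line_first <= line_last < nchan:
        raise ValueError("line channels must lie inside 0..nchan-1")
    ranges = []
    if line_first > 0:
        # Channels below the line.
        ranges.append("%d~%d" % (0, line_first - 1))
    if line_last < nchan - 1:
        # Channels above the line.
        ranges.append("%d~%d" % (line_last + 1, nchan - 1))
    if not ranges:
        raise ValueError("no line-free channels remain")
    return "%d:%s" % (spw, ";".join(ranges))
```

For a 64-channel spectral window 3 with line emission in channels 20 to 44, `line_free_channels(3, 64, 20, 44)` gives `'3:0~19;45~63'`.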

### Mixed set-ups

In the case where a mixed continuum/spectral line set-up has been used, or multiple receiver bands have been observed, it may be the case that a single pipeline heuristic (e.g., gain calibration solution interval) is not appropriate for the entire dataset.  In this case, the MS can be split by correlator set-up/receiver band (typically specified by selecting on spectral windows or scans) after applying the online flags, and the pipeline run on the split datasets individually.  To do this:

1. Copy a version of EVLA_pipeline.py into your local directory being used for data reduction
2. Edit EVLA_pipeline.py, commenting out all "execfile" calls *after* EVLA_pipe_flagall.py
3. Run the pipeline on the full SDM-BDF/MS through EVLA_pipe_flagall.py, to apply all online flags.  Include Hanning smooth at this point, if you are going to use it: execfile('EVLA_pipeline.py')
4. Using the <SDMname>.listobs output, identify the groups of spws and/or scans to split (e.g., all spws associated with a particular observing band, or all spws with a particular spectral set-up)
5. Run CASA task "split" to separate the main MS into multiple MSs
6. Put the new MSs in their own directories (each will have their own pipeline run, so they need to be separated to avoid overwriting files)
7. cd to one of the directories with a split MS
8. Run the pipeline on this MS as usual, but *without Hanning smoothing* (if Hanning smoothing is going to be used at all, it has already been applied)

Note that because the online flags have already been applied, the "flagall" step of the pipeline will produce some error messages.  These can be ignored.
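The bookkeeping for steps 4 to 6 above can be sketched as follows. The band labels are read by hand from the listobs output; `spw_selections` and `split_plan` are hypothetical helpers, and the output naming scheme is only an illustration.

```python
def spw_selections(spw_bands):
    """Map each band label to a comma-separated spw selection string.

    spw_bands: dict of spw ID -> band label, taken from the listobs output.
    """
    groups = {}
    for spw in sorted(spw_bands):
        groups.setdefault(spw_bands[spw], []).append(str(spw))
    return dict((band, ",".join(ids)) for band, ids in groups.items())


def split_plan(ms_name, spw_bands):
    """Return (outputvis, spw) pairs, one per band, each in its own directory."""
    root = ms_name[:-3] if ms_name.endswith(".ms") else ms_name
    plan = []
    for band, spw in sorted(spw_selections(spw_bands).items()):
        outputvis = "%s_%s/%s_%s.ms" % (root, band, root, band)
        plan.append((outputvis, spw))
    return plan

# Inside CASA, after creating each output directory, every entry could be
# handed to the split task, for example:
#   split(vis='mySDM.ms', outputvis=outputvis, spw=spw, datacolumn='data')
```

Selecting by scan instead of spw follows the same pattern, with the scan list taking the place of the spw string.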

### Rerunning the pipeline

After running the pipeline, you may find that additional flagging is required (resulting in the need to re-derive calibration), or you may wish to rerun the pipeline on the raw data with different parameters. Instructions for both of these cases may be found on the special topics page and in the Application of Pipeline Calibration slides from the 2014 Data Reduction Workshop.

## Special Cases

### Incorrect scan intents

If you would like to run the VLA calibration pipeline on your data but you set up the scan intents incorrectly at observe time, all is not lost.  As long as you specified at least one scan as a flux density calibrator, it doesn't matter which, and you observed one of the standard flux density calibrators for which the pipeline has models (3C48, 3C138, 3C147, or 3C286), the pipeline can be modified easily to run on your dataset.  The master script, EVLA_pipeline.py, calls a number of sub-scripts that perform the various tasks.  Run the pipeline through EVLA_pipe_msinfo.py, and then re-set the following string and python list variables to refer to the correct field and scan IDs that you want to use for each.  For example:

flux_field_select_string='2'
bandpass_scan_select_string='8'
bandpass_field_select_string='4'
delay_scan_select_string='8'
delay_field_select_string='4'
calibrator_scan_select_string='4,5,7,8,10,11,12'
calibrator_field_select_string='1,2,3,4,5,6,7'
phase_scan_list=[1,3,5,7,9,11,13,15]

Note that only one bandpass calibrator and only one delay calibrator may be specified; they can be the same field ID or different ones.  The two variables "calibrator_scan_select_string" and "calibrator_field_select_string" must contain the scan numbers and field IDs of ALL calibrators in the dataset, including the flux density calibrator, bandpass, delay, and complex gain.  Once these variables are set to your satisfaction, restart the pipeline from EVLA_pipe_flagall.py.
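Before restarting, the consistency rules just described can be checked with a few lines of Python. This is a sketch only: `check_overrides` is a hypothetical helper, and it assumes the selection strings are plain comma-separated IDs (no ranges such as '4~8').

```python
def parse_ids(select_string):
    """Turn '4,5,7' into the set {4, 5, 7}."""
    return set(int(x) for x in select_string.split(",") if x.strip())


def check_overrides(v):
    """v: dict of pipeline variable name -> value. Return a list of problems."""
    problems = []
    # Exactly one bandpass field and one delay field.
    for name in ("bandpass_field_select_string", "delay_field_select_string"):
        if len(parse_ids(v[name])) != 1:
            problems.append("%s must name exactly one field" % name)
    # Every calibrator field must appear in the full calibrator field list.
    cal_fields = parse_ids(v["calibrator_field_select_string"])
    for name in ("flux_field_select_string",
                 "bandpass_field_select_string",
                 "delay_field_select_string"):
        if parse_ids(v[name]) - cal_fields:
            problems.append("%s is not in calibrator_field_select_string" % name)
    # Bandpass and delay scans must appear in the full calibrator scan list.
    cal_scans = parse_ids(v["calibrator_scan_select_string"])
    for name in ("bandpass_scan_select_string", "delay_scan_select_string"):
        if parse_ids(v[name]) - cal_scans:
            problems.append("%s is not in calibrator_scan_select_string" % name)
    return problems
```

An empty list means the variables satisfy the rules stated above; it does not confirm that the chosen fields and scans are scientifically sensible.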

If there is no scan with a "calibrate flux" intent, or if a standard flux density calibrator was not observed at all but the flux density and spectral index of another calibrator is known, then it may still be possible to use the pipeline, with a little more work.  Please submit a helpdesk ticket for such situations.

### Features and limitations

This section will be updated as the pipeline continues to be developed.  The comments below apply to the following pipeline versions: 1.1.3, 1.1.4, 1.2.0, 1.3.0, 1.3.1, 1.3.3, 1.3.4, and 1.3.5.

• The VLA Calibration Pipeline produces a dataset that can be used as a starting point by the user for self-calibration and imaging of science targets.  Note that the pipeline does not do any self-calibration of fields that are not specified as calibrators.
• Although the CASA task "fluxscale" uses medians to do the flux density bootstrapping, which is fairly robust, in some cases (especially at high frequencies, for datasets that may have suffered from pointing or other problems) the default flux density scale may be uncertain.  For projects that may require very accurate flux density scales, it is advisable to run the pipeline through the script EVLA_pipe_fluxgains.py, which produces the gain table used for the flux density bootstrapping, and then flag this gain table to exclude bad times/antennas using the CASA tasks "plotcal" or "plotms".  The script EVLA_pipe_fluxflag.py can help by setting up the initial parameters for displaying the relevant gain table in plotcal, which can then be run interactively by the user.  After the flagging, the rest of the pipeline can run using the script EVLA_pipe_restart.py.
• The pipeline effectively calibrates each spectral window separately.  This means that the signal-to-noise ratio (s/n) on the calibrators needs to be sufficiently high that solutions can be obtained within the solution intervals specified in the pipeline scripts.  For the delay calibrator, this means a s/n>3 per integration time, t(int); for the bandpass calibrator, a s/n>5 is required for solution times up to 10*t(int); for the gain calibrator a s/n>3 is required for solution times up to 10*t(int), and >5 for a scan average.  Following the guidelines for the strength of gain calibrators as a function of observing bandwidth in the VLA  Observational Status Summary for the high frequency end of Q-band will be reasonably safe for all frequencies, although very narrow spectral channels may be problematic.  In these cases the data may need to be calibrated by hand, using polynomial bandpass fitting instead.
• The pipeline is not compatible with P band observations (even if the SB is P band only) due to the lack of models in CASA for P band. For this case, the pipeline would assume a 1 Jy, flat spectral index model making all flux densities derived by the pipeline arbitrary.
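The signal-to-noise requirements above can be turned into a rough feasibility check. The sketch below assumes thermal-noise-limited data, so that s/n grows as the square root of the number of integrations averaged; that scaling and the helper names are assumptions made for illustration, not part of the pipeline.

```python
import math


def integrations_needed(snr_per_integration, required_snr):
    """Smallest number of integrations to average to reach required_snr.

    Assumes s/n scales as sqrt(number of integrations averaged).
    """
    if snr_per_integration <= 0:
        raise ValueError("s/n per integration must be positive")
    ratio = required_snr / float(snr_per_integration)
    return max(1, int(math.ceil(ratio ** 2)))


def solint_ok(snr_per_integration, required_snr, max_integrations=10):
    """True if the requirement is met within max_integrations * t(int)."""
    needed = integrations_needed(snr_per_integration, required_snr)
    return needed <= max_integrations
```

For instance, a bandpass calibrator with s/n of 2 per integration would need about 7 integrations to reach s/n of 5, which fits within the 10*t(int) limit, whereas one with s/n of 1 per integration would not.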

## Applying Pipeline Calibration to Raw Data

To apply the calibration and flag tables produced by the pipeline, we recommend using the same version of CASA used by the pipeline as well as the same version of the pipeline scripts. The versions of both the pipeline and CASA used may be confirmed via the Splash Page of the weblog as well as in the CASA log file, both of which should be included in the weblog.tgz file available for download as part of the pipeline products. However, CASA 4.2.2 may be used to apply calibration derived using CASA 4.1.0 scripts (CASA 4.2.2 can NOT be used to run the 4.1.0 scripts, due to incompatibilities in the task input parameters). The instructions are the same for applying existing calibration and flag tables to a fresh MS in CASA 4.4.0, CASA 4.3.1, CASA 4.2.2, and CASA 4.1.0. The scripts may reside anywhere on your computer so long as the path is known. We recommend starting with a fresh SDM-BDF, though a fresh MS should work, too. There may be small differences in the final result or statistics if online flags were applied when requesting the MS.

### Define the pipepath variable

You will first need to edit two of the pipeline script python files, EVLA_pipeline.py and EVLA_pipe_restore.py, to give the correct path to your copy of the pipeline scripts.

1. Open "EVLA_pipeline.py" to change a variable named "pipepath" to point to the location of your local set of scripts. Change
pipepath='/lustre/aoc/cluster/pipeline/script/prod/'
to
pipepath='/path/on/your/computer/to/downloaded/pipeline/scripts/'
2. Open "EVLA_pipe_restore.py" and find the line
execfile(pipepath+'EVLA_pipe_startup.py')
3. Above this line, add in the pipepath definition pointing to your copy of the scripts:
pipepath='/path/on/your/computer/to/downloaded/pipeline/scripts/'

Please be careful of indentation when editing the scripts.
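If you prefer not to edit the files by hand, the two changes can be sketched as text transformations. Both functions below are hypothetical helpers operating on the script contents as a string; they assume the scripts contain a pipepath assignment and the execfile line in the forms quoted above, and they preserve the existing indentation.

```python
import re


def set_pipepath(script_text, new_path):
    """Replace the first pipepath assignment in the EVLA_pipeline.py text."""
    pattern = re.compile(r"^([ \t]*)pipepath\s*=.*$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: "%spipepath=%r" % (m.group(1), new_path),
        script_text, count=1)
    if count == 0:
        raise ValueError("no pipepath assignment found")
    return new_text


def add_pipepath_before_startup(script_text, new_path):
    """Insert a pipepath definition above the EVLA_pipe_startup.py execfile line."""
    lines = script_text.splitlines(True)
    for i, line in enumerate(lines):
        if "execfile" in line and "EVLA_pipe_startup.py" in line:
            # Reuse the indentation of the execfile line.
            indent = line[:len(line) - len(line.lstrip())]
            lines.insert(i, "%spipepath=%r\n" % (indent, new_path))
            return "".join(lines)
    raise ValueError("startup execfile line not found")
```

Read each script with `open(...).read()`, apply the matching function, and write the result back; keeping a copy of the original files is advisable.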

### Prepare data

In order for the pipeline scripts to work properly, please follow the steps below to prepare your directory and calibration files for application.

1. Create a directory and place the SDM-BDF in it, along with "flagtables.tgz", "caltables.tgz", and "pipeline_shelf.restore" (all of which should have been downloaded). The weblog may be useful, but it doesn't need to be in this directory.
2. Start CASA from the directory where you placed the SDM-BDF, tgz files, and the .restore file.
3. Run the "EVLA_pipe_restore.py" script you downloaded with the other pipeline scripts: this will look for the pipeline_shelf.restore file and set needed environment variables. At the CASA prompt, type (if you get "not a valid SDM" warning, re-enter your SDM-BDF name):
execfile('/path/on/your/computer/to/downloaded/pipeline/scripts/EVLA_pipe_restore.py')
4. If you are starting with an SDM-BDF (recommended), the pipeline scripts will import the SDM-BDF and create a Measurement Set (MS). To do this, run the following script (If a MS already exists, skip this step and go to No. 5):
execfile('/path/on/your/computer/to/downloaded/pipeline/scripts/EVLA_pipe_import.py')
5. The pipeline Hanning smoothes the data by default: applying calibration obtained from the pipeline to non-Hanning smoothed data is incorrect! If your MS is NOT Hanning smoothed yet, set the following pipeline variable:
myHanning = 'y'
6. Run the Hanning smoothing script from the pipeline script set:
execfile('/path/on/your/computer/to/downloaded/pipeline/scripts/EVLA_pipe_hanning.py')
7. In a new window, navigate to the directory where you are running CASA. If a MS-name.ms.flagversions directory has been created, you will want to remove this since you will use the pipeline's flagversion file instead.
8. Untar the flagtables.tgz file. This should produce a directory with the same name as the MS, but with an extra ".flagversions" at the end, such as:  SB-name.ms.flagversions. If it does not, you will need to create a directory, MS-name.ms.flagversions, such that MS-name matches the full name of your MS, and put the contents of flagtables.tgz in this new directory.
9. Untar the caltables.tgz file.  This should produce a "final_caltables" directory. Move the contents out of this directory so the .g, .b, and .k calibration tables are in the same directory as the .ms and .ms.flagversions directories.
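The file handling in the last three steps (removing any existing .flagversions directory, then unpacking the two tgz files) can be sketched with the Python standard library. `unpack_products` is a hypothetical helper; it should be run outside CASA, in the "new window" mentioned above, and only on archives you downloaded yourself.

```python
import os
import shutil
import tarfile


def unpack_products(workdir, ms_name):
    """Lay out the pipeline flag and calibration tables next to the MS.

    ms_name is the full MS name, e.g. 'mySDM.ms'.
    Returns (moved_tables, flagversions_found).
    """
    # Remove any .flagversions directory created during import.
    flagdir = os.path.join(workdir, ms_name + ".flagversions")
    if os.path.isdir(flagdir):
        shutil.rmtree(flagdir)

    # Unpack both archives into the working directory.
    for name in ("flagtables.tgz", "caltables.tgz"):
        with tarfile.open(os.path.join(workdir, name)) as tar:
            tar.extractall(workdir)

    # Move the calibration tables out of final_caltables.
    caldir = os.path.join(workdir, "final_caltables")
    moved = []
    if os.path.isdir(caldir):
        for entry in sorted(os.listdir(caldir)):
            shutil.move(os.path.join(caldir, entry),
                        os.path.join(workdir, entry))
            moved.append(entry)

    # If this is False, the flag tables unpacked under a different name
    # and must be moved by hand into <ms_name>.flagversions.
    return moved, os.path.isdir(flagdir)
```

If the second returned value is False, follow the manual instructions above for creating the MS-name.ms.flagversions directory.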

### Apply calibration and flagging

• To restore the flags to their state before the pipeline applied the final calibration tables, run the following (note the CASA-version-specific calls to flagmanager!):
# With CASA 4.5.3 and later versions of CASA:
flagmanager(vis=ms_active,mode='restore',versionname='applycal_1')
# With CASA 4.3.1 and earlier versions of CASA:
flagmanager(vis=ms_active,mode='restore',versionname='before_applycal_1')
• Apply the final pipeline calibration by running:
execfile('/path/on/your/computer/to/downloaded/pipeline/scripts/EVLA_pipe_applycals.py')
• If desired, run RFLAG on the target. Plots produced during these steps should look the same as (or at least very close to) the plots in the weblog.
execfile('/path/on/your/computer/to/downloaded/pipeline/scripts/EVLA_pipe_targetflag.py')
• Run statwt.
execfile('/path/on/your/computer/to/downloaded/pipeline/scripts/EVLA_pipe_statwt.py')
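The version-dependent flag version name used in the restore step can be selected with a small helper. This is a sketch using only the two cases stated above; `flag_version_name` is hypothetical, and for CASA versions between 4.3.1 and 4.5.3 it deliberately refuses to guess.

```python
def version_tuple(casa_version):
    """Convert '4.7.0-1' to (4, 7, 0), ignoring any patch suffix."""
    return tuple(int(p) for p in casa_version.split("-")[0].split("."))


def flag_version_name(casa_version):
    """Return the flagmanager versionname to restore for this CASA version."""
    v = version_tuple(casa_version)
    if v >= (4, 5, 3):
        return "applycal_1"
    if v <= (4, 3, 1):
        return "before_applycal_1"
    # The instructions above do not cover versions in between.
    raise ValueError(
        "flag version name not documented for CASA %s; "
        "list the saved versions with flagmanager(mode='list')" % casa_version)
```

Inside CASA, the result could then be passed as `flagmanager(vis=ms_active, mode='restore', versionname=flag_version_name('4.7.2'))`.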

## Pipeline Calibration Reports

A series of detailed evaluations of the pipeline results along with comparisons between images that have been created with data calibrated by the VLA pipeline and data that was calibrated by hand are now available.   For a full list of all the reports, see this link.