VLA Imaging Pipeline; CASA version 6.1.2

General Description

A prototype VLA imaging pipeline has been developed that is functional for the basic use case of making VLA continuum images using the aggregate bandwidth available in an observation. This imaging pipeline is built on the foundation provided by the ALMA imaging pipeline, but is optimized to support the VLA. The current imaging choices are made automatically by the pipeline, and not all of them will be optimal for every data set. The imaging pipeline is still at an early stage of development, and problems may occur beyond those in the current list of known issues.

  • A single aggregate continuum image is produced per observed band per target present in the input data.
  • Mosaics are not currently supported, thus each field of a mosaic will be imaged individually.
  • Image sizes are limited to at most 16384x16384 pixels, meaning that images from the A and B configurations at low frequencies will not completely encompass the primary beam.
  • Cleaning is currently done without a mask, cleaning down to the 5-sigma level using the nsigma parameter of tclean.
  • Robust=0.5 is adopted as a default.
  • Targets with significant extended emission (as determined by a ratio of visibility amplitudes from short and mid-baselines) will have the inner 5% of the uvrange omitted from imaging to avoid deconvolution errors from poorly sampled structure.
  • Pixel sizes are chosen to sample the synthesized beam with 5 pixels across its minor axis; this drops to 4 pixels when the image size has been mitigated (reduced to stay within the size limit).
  • Nterms=2 is used when the fractional bandwidth of an image is greater than 10%.
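To make the numeric rules in the list above concrete, here is a minimal sketch in plain Python. It is not the pipeline's implementation: the function names are hypothetical, and the fractional bandwidth is assumed to be (fmax - fmin) divided by the centre frequency. Only the thresholds (16384 pixels, 5 or 4 pixels per beam, 10%) come from the text.

```python
# Hypothetical illustration of the imaging choices listed above.
# Not the pipeline's code; only the numeric thresholds come from the text.

MAX_IMSIZE = 16384  # maximum image size in pixels per side


def cell_size_arcsec(beam_minor_arcsec, mitigated=False):
    """Pixel size that puts 5 pixels (4 if the image size was mitigated)
    across the minor axis of the synthesized beam."""
    pixels_per_beam = 4 if mitigated else 5
    return beam_minor_arcsec / pixels_per_beam


def choose_nterms(freq_min_ghz, freq_max_ghz):
    """Return 2 when the fractional bandwidth exceeds 10%, otherwise 1.
    Assumption: fractional bandwidth = (fmax - fmin) / centre frequency."""
    centre = 0.5 * (freq_min_ghz + freq_max_ghz)
    frac_bw = (freq_max_ghz - freq_min_ghz) / centre
    return 2 if frac_bw > 0.10 else 1


def cap_imsize(npix):
    """Clip a requested image size to the 16384-pixel limit."""
    return min(npix, MAX_IMSIZE)
```

For example, a full 1-2 GHz L-band image has a fractional bandwidth of about 67%, so it would be imaged with nterms=2, whereas a 10.0-10.5 GHz image (about 5%) would use nterms=1.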

Please see the Known Issues below. For questions, suggestions, and other issues encountered with the pipeline, please submit a ticket to the Pipeline Department of the NRAO Helpdesk.


Pipeline Requirements

  1. The VLA imaging pipeline is currently designed to run on a single calibrated SB. However, it may successfully run on a collection of calibrated SBs. We recommend only attempting to image a collection of SBs with the same targets, bands, and correlator setups. Multi-SB functionality is not fully validated.
  2. The pipeline relies on correctly set scan intents. We therefore recommend that every observer ensures that the scan intents are correctly specified in the Observation Preparation Tool (OPT) during the preparation of the SB (see OPT manual for details). For the imaging pipeline to run, the OBSERVE_TARGET intent is required. Without this intent, there will be no science data to image. Other intents can be present in the MS, but the second stage of the pipeline, following import, will split off the science target data.
  3. You may run the imaging pipeline on datasets that either have both the DATA and CORRECTED columns or have the CORRECTED column split from the original MS into a new MS (where it will now be the DATA column). If your dataset only has the DATA column, the hif_mstransform task will report an error and not create a _target.ms file. However, the pipeline will proceed with the rest of the imaging tasks normally.
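The two requirements above (an OBSERVE_TARGET intent and a usable data column) can be checked before starting a long run. The sketch below is a hypothetical pre-flight helper: it takes the scan intents and the main-table column names as plain lists, so the logic runs anywhere. Note that in a measurement set the corrected column is named CORRECTED_DATA. The casatools calls shown in the comments are one way to obtain those lists inside CASA; they are not executed here.

```python
# Hypothetical pre-flight check for the requirements listed above.
# Inside CASA (not run here), the inputs could be gathered with casatools:
#   from casatools import msmetadata, table
#   msmd = msmetadata(); msmd.open('myCaldMS.ms'); intents = list(msmd.intents()); msmd.close()
#   tb = table(); tb.open('myCaldMS.ms'); colnames = tb.colnames(); tb.close()


def preflight(intents, colnames):
    """Return a list of warnings; an empty list means no problem was found."""
    warnings = []
    # VLA intents typically look like 'OBSERVE_TARGET#UNSPECIFIED'.
    if not any("OBSERVE_TARGET" in intent for intent in intents):
        warnings.append("no OBSERVE_TARGET intent: there is no science data to image")
    if "CORRECTED_DATA" not in colnames:
        warnings.append(
            "no CORRECTED_DATA column: fine only if the calibrated data were "
            "split into DATA; otherwise hif_mstransform will not create a _target.ms"
        )
    return warnings
```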

 


Obtaining and Running the Imaging Pipeline in CASA

The imaging pipeline can take from a few hours to a few days to complete, depending on the specifics of the data. We give abbreviated instructions here for starting and running CASA; for full instructions, see the VLA Calibration pipeline.

You should start CASA in the same directory in which the data you plan to work on reside. To start CASA with the pipeline from your own installation, type:

#In a Terminal
casa --pipeline


If you are running CASA on the New Mexico Array Science Center (NMASC) computers, you can start the latest CASA with pipeline using the command

#In a Terminal
casa-pipe

The imaging pipeline (unlike the calibration pipeline) can be sped up by running it in parallel with mpicasa. Note that mpicasa only works in Linux.

#In a Terminal
mpicasa -n X <path_to_casa>/casa --pipeline 

#At NMASC
mpicasa -n X /home/casa/packages/pipeline/casa-6.1.2-7-pipeline-2020.1.0.36/bin/casa --pipeline

where 'X' is the number of processing cores. Note that one core is always used to manage the processes, so mpicasa -n 9 will use 9 cores, 8 of which process the data. However, memory usage increases when using mpicasa: depending on the image size, the number of threads, and the amount of memory available on your computer or computing node, you could run out of memory and begin using swap space, which will slow the imaging process to a crawl.
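The core and memory bookkeeping described above can be estimated with a little arithmetic. This hypothetical sketch counts worker processes (one of the n mpicasa processes only manages the others) and the memory taken by a single square single-precision image plane. tclean typically keeps several such images per process (for example residual, model, and PSF), so real usage is a multiple of this figure; the exact multiple is not stated here.

```python
# Back-of-envelope helpers for planning an mpicasa run (illustrative only).


def processing_cores(n):
    """With 'mpicasa -n n', one process manages and n - 1 process the data.
    This sketch rejects n < 2, which would leave no worker."""
    if n < 2:
        raise ValueError("need at least one manager and one worker process")
    return n - 1


def image_plane_gib(npix, bytes_per_pixel=4):
    """Memory for one npix x npix float32 image plane, in GiB."""
    return npix * npix * bytes_per_pixel / 2**30
```

For a maximum-size 16384x16384 image, image_plane_gib(16384) gives exactly 1.0 GiB per plane, which is why large images combined with many mpicasa processes can exhaust the memory of a node.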

Ensure that your process can run for the full duration without interruption. Also make sure that there is enough space in your directory, as the data volume will increase by about a factor of four. There are several ways to run the imaging pipeline: you can run it standalone on a previously calibrated dataset, or you can run it as a combined calibration and imaging pipeline run.
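Because a measurement set is a directory, its size has to be summed over all files before comparing it with the free disk space. The sketch below is a hypothetical helper using only the Python standard library; it conservatively reads the factor-of-four growth mentioned above as "keep about four times the MS size free", which is an assumption.

```python
import os
import shutil


def directory_size_bytes(path):
    """Total size of all regular files below path (an MS is a directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def enough_disk_space(ms_path, workdir=".", growth_factor=4.0):
    """Return (ok, needed_bytes, free_bytes), assuming roughly
    growth_factor times the MS size must be available in workdir."""
    needed = growth_factor * directory_size_bytes(ms_path)
    free = shutil.disk_usage(workdir).free
    return free >= needed, needed, free
```

For example, enough_disk_space('myCaldMS.ms') run from your working directory returns whether the check passes, together with the estimated and available byte counts.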

The CASA homepage has more information on using CASA at NRAO.  


Now that CASA is open, we can start running the imaging pipeline with one of several methods. In these first methods, we assume that a previously calibrated measurement set is available. This could either be from a calibration pipeline run by the user or a restored measurement set from the new NRAO archive.

Method 1: Pipeline Script

You can use a pipeline script, for example a 'casa_imaging_pipescript.py' file. For this to work, 'casa_imaging_pipescript.py' must be in the same directory where you start CASA with the pipeline. Once CASA is started (same steps as above), type:

#In CASA
execfile('casa_imaging_pipescript.py')

In the following script, a calibrated measurement set is a required input. Simply replace myCaldMS with the name of the measurement set desired for imaging.

Example Imaging pipeline script:

__rethrow_casa_exceptions = True
context = h_init()
context.set_state('ProjectSummary', 'observatory', 'Karl G. Jansky Very Large Array')
context.set_state('ProjectSummary', 'telescope', 'EVLA')
try:
    hifv_importdata(vis=['myCaldMS.ms'])
    hif_mstransform(pipelinemode="automatic")
    hif_checkproductsize(maximsize=16384)
    hif_makeimlist(specmode='cont')
    hif_makeimages(hm_masking='none', hm_cyclefactor=3.0)
    hifv_exportdata(imaging_products_only=True)
finally:
    h_save()

If one simply wants to add the imaging commands to an existing calibration script, then the commands hif_mstransform to hif_makeimages should be inserted into the calibration script, before the hifv_exportdata call.
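If you image many measurement sets, it can be convenient to generate the script above instead of editing myCaldMS by hand. The sketch below is a hypothetical helper that fills the MS name into a copy of the example imaging script and writes casa_imaging_pipescript.py, which can then be run in CASA with execfile as shown above. It only writes text; the pipeline tasks themselves are executed later inside CASA.

```python
# Hypothetical helper: write casa_imaging_pipescript.py for a given MS,
# using the example imaging-only script from this page as a template.

TEMPLATE = """__rethrow_casa_exceptions = True
context = h_init()
context.set_state('ProjectSummary', 'observatory', 'Karl G. Jansky Very Large Array')
context.set_state('ProjectSummary', 'telescope', 'EVLA')
try:
    hifv_importdata(vis=[{vis!r}])
    hif_mstransform(pipelinemode="automatic")
    hif_checkproductsize(maximsize=16384)
    hif_makeimlist(specmode='cont')
    hif_makeimages(hm_masking='none', hm_cyclefactor=3.0)
    hifv_exportdata(imaging_products_only=True)
finally:
    h_save()
"""


def write_pipescript(vis, path="casa_imaging_pipescript.py"):
    """Fill in the measurement set name, write the script, and return its text."""
    script = TEMPLATE.format(vis=vis)
    with open(path, "w") as f:
        f.write(script)
    return script
```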

 

Method 2: Recipes

The Recipe Reducer is an alternative method to running the imaging recipe on calibrated data.

# In CASA
import pipeline.recipereducer
pipeline.recipereducer.reduce(vis=['myCaldMS.ms'], procedure='procedure_hifv_contimage.xml', loglevel='summary')

Users should be cautioned that the recipe reducer creates a non-unique weblog directory name, and that this method also includes a call to hifv_exportdata, which will package the pipeline products into a 'products' directory one level up. This may not be desired by all users.

 

Method 3: One Stage at a Time

You may notice that 'casa_imaging_pipescript.py' is a list of specific CASA pipeline tasks called in order to form the default pipeline. If desired, one could run these tasks one at a time in CASA, for example to inspect intermediate pipeline products.

If you need to exit CASA between stages, you can restart the pipeline where you left off. However, in order for this to work, none of the files can be moved to other directories.

First, use the CASA pipeline task h_resume after starting CASA again. This will set up the environment again for the pipeline to work. Type:

# In CASA
h_resume()

Now, you may start the next task in your list.
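To run the stages one at a time, you need the individual task calls from your script. The hypothetical helper below uses Python's standard ast module to pull out each call made inside the script's try block, in order, preserving multi-line calls and skipping commented-out lines such as #hifv_exportdata. It assumes the script follows the structure of the examples on this page.

```python
import ast


def list_pipeline_calls(script_text):
    """Return the source text of each call statement inside the script's
    top-level try block, in execution order."""
    tree = ast.parse(script_text)
    calls = []
    for node in tree.body:
        if isinstance(node, ast.Try):
            for stmt in node.body:
                if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
                    calls.append(ast.get_source_segment(script_text, stmt))
    return calls
```

For example, printing the result of list_pipeline_calls(open('casa_imaging_pipescript.py').read()) gives the task calls that can be pasted into CASA one by one after h_init() or h_resume().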

 

Running the Calibration Pipeline with the Imaging Pipeline

 

Method 1: Pipeline script

You can also run the calibration pipeline and the science target imaging together. Below we show an example casa_calibration_and_imaging_pipescript.py that has the target imaging pipeline commands included at the end.

__rethrow_casa_exceptions = True
context = h_init()
context.set_state('ProjectSummary', 'observatory', 'Karl G. Jansky Very Large Array')
context.set_state('ProjectSummary', 'telescope', 'EVLA')
try:
    hifv_importdata(vis=['mySDM'], createmms='automatic',\
                    asis='Receiver CalAtmosphere', ocorr_mode='co',\
                    nocopy=False, overwrite=False)
    hifv_hanning(pipelinemode="automatic")
    hifv_flagdata(tbuff=0.0, flagbackup=False, scan=True, fracspw=0.05,\
                  intents='*POINTING*,*FOCUS*,*ATMOSPHERE*,*SIDEBAND_RATIO*,\
                  *UNKNOWN*, *SYSTEM_CONFIGURATION*, *UNSPECIFIED#UNSPECIFIED*',\
                  clip=True, baseband=True, shadow=True, quack=True, edgespw=True,\
                  autocorr=True, hm_tbuff='1.5int', template=True, online=True)
    hifv_vlasetjy(fluxdensity=-1, scalebychan=True, spix=0, reffreq='1GHz')
    hifv_priorcals(tecmaps=False)
    hifv_testBPdcals(weakbp=False)
    hifv_checkflag(pipelinemode="automatic")
    hifv_semiFinalBPdcals(weakbp=False)
    hifv_checkflag(checkflagmode='semi')
    hifv_semiFinalBPdcals(weakbp=False)
    hifv_solint(pipelinemode="automatic")
    hifv_fluxboot2(fitorder=-1)
    hifv_finalcals(weakbp=False)
    hifv_applycals(flagdetailedsum=True, gainmap=False, flagbackup=True,\
                   flagsum=True)
    hifv_targetflag(intents='*CALIBRATE*,*TARGET*')
    hifv_statwt(datacolumn='corrected')
    hifv_plotsummary(pipelinemode="automatic")
    hif_makeimlist(nchan=-1, calcsb=False, intent='PHASE,BANDPASS', robust=-999.0,\
                   parallel='automatic', per_eb=False, calmaxpix=300,\
                   specmode='cont', clearlist=True)
    hif_makeimages(tlimit=2.0, hm_perchanweightdensity=False, hm_npixels=0,\
                   hm_dogrowprune=True, hm_negativethreshold=-999.0, calcsb=False,\
                   hm_noisethreshold=-999.0, hm_fastnoise=True, hm_masking='none',\
                   hm_minpercentchange=-999.0, parallel='automatic', masklimit=4,\
                   hm_nsigma=0.0, target_list={}, hm_minbeamfrac=-999.0,\
                   hm_lownoisethreshold=-999.0, hm_growiterations=-999,\
                   overwrite_on_export=True, cleancontranges=False,\
                   hm_sidelobethreshold=-999.0)
    # Science target imaging pipeline commands
    hif_mstransform(pipelinemode="automatic")
    hif_checkproductsize(maximsize=16384)
    hif_makeimlist(specmode='cont')
    hif_makeimages(hm_masking='none', hm_cyclefactor=3.0)
    #hifv_exportdata(gainmap=False, exportmses=False, exportcalprods=False)
finally:
    h_save()

Method 2: Recipes

Similar to running just the imaging pipeline via the Recipe Reducer, there is a procedure to run calibration+imaging via this method as well.

# In CASA
import pipeline.recipereducer
pipeline.recipereducer.reduce(vis=['mySDM'], procedure='procedure_hifv_calimage_cont.xml', loglevel='summary')

The same caveats about the non-unique weblog directory name and the call to hifv_exportdata described previously also apply here.

 

Method 3: One Stage at a Time

As noted in the imaging-only pipeline and the VLA calibration pipeline, you may also run the pipeline one stage at a time, with the ability to resume if it's necessary to exit CASA.


What you get: Pipeline Products

VLA pipeline output includes data products such as calibrated visibilities, a weblog, and all calibration tables. Note that the automated execution at NRAO will also run an additional data packaging step (hifv_exportdata) which moves most of the files to an upper level '../products' directory. This step is omitted in the manual execution and all products remain within the 'root' directory where the pipeline was executed.

The most important pipeline products include:

  • Science target images for each band and each target (files start with 'oussid*' in the root directory). These include the tt0, tt1, pb (primary beam profile), clean mask, alpha (spectral index), and alpha.error (spectral index uncertainty) files. Note that when a very large number of pixels is used for the image (typically A and B configuration and/or high-frequency data), images loaded in CASAviewer or CARTA may appear blank and simply need to be zoomed in to find your target(s).
  • A weblog, supplied as a compressed tarball weblog.tgz; when extracted, it has the form pipeline-YYYYMMDDTHHMMSSS/html/index.html, where YYYYMMDDTHHMMSSS is the pipeline execution timestamp (multiple pipeline executions will result in multiple weblogs). The weblog contains information on the pipeline processing steps, with diagnostic plots and statistics. The weblog images of each target field are unlikely to show detail in your observed targets, given the size of the images that may be created and the limited size of the weblog images. An example is given in the VLA pipeline CASA guide.
  • The CASA logger messages, casapy-YYYYMMDD-HHMMSS.log (in pipeline-YYYYMMDDTHHMMSSS/html/).
  • 'casa_pipescript.py' (in pipeline-YYYYMMDDTHHMMSSS/html/), the script containing the pipeline heuristic sequence and parameters that were actually executed. This file can be used to modify and re-execute the pipeline (see section The casa_pipescript.py file). Note that we also refer to a casa_imaging_pipescript.py; this name simply differentiates a script that runs calibration pipeline commands (possibly along with imaging) from one that runs only imaging. The file created by the pipeline is always called casa_pipescript.py, regardless of the name of your own script.
  • 'casa_commands.log' (in pipeline-YYYYMMDDTHHMMSSS/html/), which contains the actual CASA commands that were generated by the pipeline heuristics (see section The casa_commands.log file).
  • The output from CASA's task listobs is available at 'pipeline-YYYYMMDDTHHMMSSS/html/sessionSession_default/mySDM.ms/listobs.txt' and contains the characteristics of the observations (scans, source fields, spectral setup, antenna positions, and general information).
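Since every execution creates its own timestamped weblog directory, it helps to pick the newest one automatically. This hypothetical sketch scans a directory (after weblog.tgz has been extracted, if applicable) for pipeline-YYYYMMDDTHHMMSSS directories and returns the path to the newest index.html. Because the timestamp has a fixed-width, most-significant-first format, sorting by name is assumed to sort by time. Directories that do not match the timestamp pattern, such as a weblog written by the recipe reducer, are ignored.

```python
import re
from pathlib import Path

# Matches pipeline-YYYYMMDDTHHMMSSS style directory names.
TIMESTAMPED = re.compile(r"^pipeline-\d{8}T\d+$")


def latest_weblog(root="."):
    """Return the index.html path of the newest timestamped weblog, or None."""
    runs = [
        p for p in Path(root).glob("pipeline-*")
        if p.is_dir() and TIMESTAMPED.match(p.name)
    ]
    if not runs:
        return None
    newest = max(runs, key=lambda p: p.name)
    return newest / "html" / "index.html"
```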

 

 

The Imaging Pipeline casa_imaging_pipescript.py File

The sequence of pipeline heuristic steps is listed in the 'casa_pipescript.py' script located in the pipeline-YYYYMMDDTHHMMSSS/html directory (where YYYYMMDDTHHMMSSS is the timestamp of the execution). Note that no matter what you call your script, the pipeline will create a file called casa_pipescript.py in this directory as a record of which pipeline functions were run.

A typical 'casa_imaging_pipescript.py' has the following structure (where myCaldMS is a placeholder for the name of the calibrated measurement set; the generated script will contain the name of the one that was processed):

__rethrow_casa_exceptions = True
context = h_init()
context.set_state('ProjectSummary', 'observatory', 'Karl G. Jansky Very Large Array')
context.set_state('ProjectSummary', 'telescope', 'EVLA')
try:
    hifv_importdata(vis=['myCaldMS.ms'])
    hif_mstransform(pipelinemode="automatic")
    hif_checkproductsize(maximsize=16384)
    hif_makeimlist(specmode='cont')
    hif_makeimages(hm_masking='none', hm_cyclefactor=3.0)
    #hifv_exportdata(imaging_products_only=True)
finally:
    h_save()

(Note that executions at NRAO may show small differences, e.g., an additional final hifv_exportdata step, commented out in the example above, that packages the products to be stored in the NRAO archive.)

The above is, in fact, a standard user 'casa_imaging_pipescript.py' file for the current CASA and pipeline version (download to edit and run yourself) that can be used for general pipeline processing after inserting the correct myCaldMS filename in hifv_importdata.

The imaging pipeline run can be modified by adapting this script; at present, there are only limited options that should be altered. The script can then be (re-)executed via:

# In CASA
execfile('casa_imaging_pipescript.py')

 

The casa_commands.log File

casa_commands.log is another useful file in pipeline-YYYYMMDDTHHMMSSS/html (where YYYYMMDDTHHMMSSS is the timestamp of the pipeline execution); it lists all the individual CASA commands that the pipeline heuristic tasks (hif and hifv) produced. Note that 'casa_commands.log' is not itself executable, but it contains all the CASA tasks and associated parameters needed to trace back the individual data reduction steps.
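A quick way to get an overview of casa_commands.log is to count how often each CASA task was called. The sketch below assumes, without confirmation from this page, that each command starts at the beginning of a line as name(, that continuation lines are indented, and that annotation lines begin with '#'. Check the result against your own log before relying on it.

```python
import re
from collections import Counter

# Assumed format: a CASA call starts at column 0 as "taskname(".
CALL = re.compile(r"^([A-Za-z_]\w*)\(")


def count_casa_tasks(log_text):
    """Count calls per CASA task name in the text of casa_commands.log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, count_casa_tasks(open('casa_commands.log').read()).most_common(5) would list the five most frequently called tasks.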

The Pipeline Weblog

Information on the pipeline run can be inspected through a weblog that is launched by pointing a web browser to file:///<path to your working directory>/pipeline-YYYYMMDDTHHMMSSS/html/index.html. The weblog contains statistics and diagnostic plots for the SDM-BDF as a whole and for each stage of the pipeline. The weblog is the first place to check if a pipeline run was successful and to assess the quality of the calibration.
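The file:/// address above can also be built programmatically, which avoids mistakes when typing long paths. This sketch uses pathlib's as_uri, which needs an absolute path, hence the resolve(). The run directory name is a placeholder for your own timestamped weblog directory.

```python
from pathlib import Path


def weblog_uri(workdir, run_dir):
    """Build the file:/// URI of a weblog's index.html.
    run_dir is the pipeline-YYYYMMDDTHHMMSSS directory name (placeholder)."""
    index = Path(workdir).resolve() / run_dir / "html" / "index.html"
    return index.as_uri()
```

The resulting URI can be pasted into the browser, or opened directly with the standard library's webbrowser.open(uri).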

An example walkthrough of a calibration pipeline weblog is provided in the VLA Pipeline CASA guide. A similar walkthrough for the imaging pipeline is under construction.

Note that we regularly test the weblog on Firefox. Other browsers may not display all items correctly.

 

 


 

Quality (QA) Assessment Scores

Each pipeline stage has a quality assessment (QA) score assigned to it. The values range from 0 to 1:

Score        Rating           Color and symbol
0.90-1.00    Standard/Good    green
0.66-0.90    Below-Standard   blue; also shown as a question mark symbol
0.33-0.66    Warning          amber; cross symbol
0.00-0.33    Error            red; cross symbol

We recommend that all pipeline stages and the relevant products are checked. Below-Standard and Warning scores should receive extra scrutiny. The QA section at the bottom of the weblog page of each stage provides more information about the origin of each score. Errors usually indicate very serious issues with the data or processing and should always be resolved. The QA scores for the imaging pipeline are not yet mature. Currently, the most relevant QA scores are those associated with hif_makeimages(), where the score is dictated by the S/N in the image. Low S/N will receive a low score, but that may be expected depending on the properties of your data.
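The score ranges in the table above translate directly into a lookup. In this sketch, a score that sits exactly on a boundary (for example 0.90) is assigned to the higher category; the table does not say which way boundaries fall, so that is an assumption. The second helper (hypothetical, like the first) picks out the stages below Standard/Good that deserve extra scrutiny.

```python
def qa_category(score):
    """Map a QA score in [0, 1] to its rating; boundaries go to the higher rating."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("QA scores lie between 0 and 1")
    if score >= 0.90:
        return "Standard/Good"
    if score >= 0.66:
        return "Below-Standard"
    if score >= 0.33:
        return "Warning"
    return "Error"


def needs_attention(stage_scores):
    """Return {stage: rating} for every stage scoring below Standard/Good."""
    return {
        stage: qa_category(score)
        for stage, score in stage_scores.items()
        if score < 0.90
    }
```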

Examples of QA scores are provided in the VLA Pipeline CASA guide.

 


 

Known Issues

This section will be updated as the pipeline continues to be developed. The comments below are general to most pipeline versions, including the current production pipeline used for VLA data. The current production version is CASA 6.1.2.

  • W-projection is not used, which may yield poor results for extended configurations at low frequencies (primarily S-band and L-band).
  • The image size limitation will also result in the primary beam (and side lobes) not being imaged for low-frequency data (primarily S-band and L-band). Therefore, strong sources outside the primary beam will not be deconvolved and may yield poor results.
  • The weblog for hif_checkproductsize will show '-1 GB' for many entries. These correspond to currently unused modes (cube images and the maximum allowed size of products). Mitigation is currently only done to limit the image size to a maximum of 16384x16384 pixels.
  • When importing calibrated data (typically with imaging-only recipes), hifv_importdata will show a warning because a HISTORY table from the calibration is present; this warning can be ignored.
  • Ephemeris targets (i.e., Solar System objects) have not been validated with the imaging pipeline and are not expected to work at present.


 


The National Radio Astronomy Observatory and Green Bank Observatory are facilities of the U.S. National Science Foundation operated under cooperative agreement by Associated Universities, Inc.