VLA Imaging Pipeline 2021.2.0.128; CASA version 6.2.1
General Description
A VLA imaging pipeline is available for VLA continuum data using the aggregate bandwidth available in an observation. This imaging pipeline is built on the foundation provided by the ALMA imaging pipeline, but is optimized to support the VLA. The current imaging pipeline parameters may not be optimal for all datasets, but are applicable to all the bands supported by Science Ready Data Products processing. The imaging pipeline is still in an early state, and problems may occur beyond the current list of known issues. Some current features and limitations are as follows:
- A single aggregate (all spws and bandwidth combined) continuum image is produced per observed band for each target present in the input data.
- Mosaics are not currently supported, thus each field of a mosaic will be imaged individually.
- Image sizes are limited to at most 16384x16384 pixels, meaning that images from A- and B-configuration observations at low frequencies will not completely encompass the primary beam.
- Cleaning is done using auto-masking as part of tclean, cleaning down to the 4-sigma level using the nsigma parameter of tclean.
- Robust=0.5 is adopted as a default.
- Targets with significant extended emission (as determined by a ratio of visibility amplitudes from short and mid-baselines) will have the inner 5% of the uvrange omitted from imaging to avoid deconvolution errors from poorly sampled structure.
- Pixel sizes are chosen to sample the synthesized beam with 5 pixels across the minor axis; this specification drops to 4 pixels when the image size is mitigated.
- Nterms=2 is used when the fractional continuum bandwidth is > 10% in a single band.
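Two of the heuristics above (the nterms choice and the pixel-size rule) can be sketched in a few lines of Python. This is illustrative only; the pipeline's actual logic lives in the hif_* tasks, and the function names here are hypothetical:

```python
def fractional_bandwidth(f_low_hz, f_high_hz):
    """Fractional bandwidth relative to the band center."""
    center = 0.5 * (f_low_hz + f_high_hz)
    return (f_high_hz - f_low_hz) / center

def choose_nterms(f_low_hz, f_high_hz):
    """nterms=2 when the fractional continuum bandwidth exceeds 10%."""
    return 2 if fractional_bandwidth(f_low_hz, f_high_hz) > 0.10 else 1

def cell_arcsec(beam_minor_arcsec, mitigated=False):
    """Pixel size sampling the beam minor axis with 5 (or 4, when
    the image size is mitigated) pixels."""
    return beam_minor_arcsec / (4.0 if mitigated else 5.0)

# Example: 2-4 GHz S-band has ~67% fractional bandwidth, so nterms=2,
# and a 1.0" beam minor axis gives a 0.2" pixel.
print(choose_nterms(2e9, 4e9))   # 2
print(cell_arcsec(1.0))          # 0.2
```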
Please see the Known Issues below. For questions, suggestions, and other issues encountered with the pipeline, please submit a ticket to the Pipeline Department of the NRAO Helpdesk.
Pipeline Requirements
- The VLA imaging pipeline is currently designed to run on a single calibrated SB. However, it may successfully run on a collection of calibrated SBs. We recommend only attempting to image SBs with the same targets, bands, and correlator setups. Multi-SB functionality is not fully validated and is not likely to work with all data.
- The pipeline relies on correctly set scan intents. We therefore recommend that every observer ensures that the scan intents are correctly specified in the Observation Preparation Tool (OPT) during the preparation of the SB (see OPT manual for details). For the imaging pipeline to run, the OBSERVE_TARGET intent is required. Without this intent, there will be no science data to image. Other intents can be present in the MS, but in the second stage of the imaging pipeline, following import (or following the calibration portion of the calibration+imaging recipe), the science target data will be split off from the calibration sources.
- The imaging pipeline recipes expect that datasets will have both the DATA and CORRECTED columns present. If an MS has only the DATA column (having previously been split from a calibrated measurement set), you should run the pipeline using a CASA pipescript with the hif_mstransform task removed or commented out.
Obtaining and Running the Imaging Pipeline in CASA
The imaging pipeline can take a few hours to a few days to complete depending on the specifics of the data and whether parallel processing is used. We give abbreviated instructions here for starting and running CASA; for full instructions, see the VLA Calibration pipeline.
You should start CASA in the same directory that contains the data you plan to work on. To start CASA with the pipeline from your own installation, type (assuming that the executables are in your system PATH environment variable):
#In a Terminal
casa --pipeline
If you are running CASA on the New Mexico Array Science Center (NMASC) computers, you can start the latest CASA with pipeline using the command
#In a Terminal
casa-pipe
The imaging pipeline (unlike the calibration pipeline) can be sped up by running it in parallel with mpicasa. Note that mpicasa only works on Linux.
#In a Terminal
<path_to_casa>/mpicasa -n X <path_to_casa>/casa --pipeline
#At NMASC
export CASAPATH=/home/casa/packages/pipeline/casa-6.2.1-7-pipeline-2021.2.0.128/
$CASAPATH/bin/mpicasa -n X $CASAPATH/bin/casa --pipeline
where 'X' is the number of processing cores. Note that one core is always used for the management of the processes, so mpicasa -n 9 will use 9 cores, 8 of which are used for processing the data. However, when using mpicasa, memory usage will increase; depending on image size, number of threads, and the amount of memory available on your computer (or compute node), you could run out of memory and begin using swap space, which will slow the imaging process to a crawl.
Ensure that your process can run for the full duration without interruption. Also, make sure that there is enough space in your working directory, as the data volume will increase by about a factor of two or more depending on the image sizes (which depend on band and configuration). There are several ways to run the imaging pipeline: you can run it standalone on a previously calibrated dataset, or you can run it as a combined calibration and imaging pipeline run.
The CASA homepage has more information on using CASA at NRAO.
Now that CASA is open, the imaging pipeline can be started with one of several methods. The first methods below assume that a previously calibrated measurement set is available. This could either be from a calibration pipeline run by the user or a measurement set restored from the new NRAO archive.
Method 1: Pipeline Script
You can use a pipeline script, for example the 'casa_imaging_pipescript.py' file. For this to work, 'casa_imaging_pipescript.py' must be in the same directory where you start CASA with the pipeline. Once CASA is started (same steps as above) type:
#In CASA
execfile('casa_imaging_pipescript.py')
In the following script, a calibrated measurement set is a required input. Simply replace myCaldMS with the name of the measurement set desired for imaging.
Example Imaging pipeline script:
context = h_init()
context.set_state('ProjectSummary', 'observatory', 'Karl G. Jansky Very Large Array')
context.set_state('ProjectSummary', 'telescope', 'EVLA')
try:
    hifv_importdata(vis=['myCaldMS.ms'])
    hifv_flagdata(intents='*POINTING*,*FOCUS*,*ATMOSPHERE*,*SIDEBAND_RATIO*,*UNKNOWN*, *SYSTEM_CONFIGURATION*, *UNSPECIFIED#UNSPECIFIED*', quack=False, autocorr=False, baseband=False, edgespw=False, clip=False, online=False, shadow=False, scan=True)
    hif_mstransform(pipelinemode="automatic")
    hif_checkproductsize(maximsize=16384)
    hif_makeimlist(specmode='cont')
    hif_makeimages(hm_cyclefactor=3.0)
    hifv_pbcor(pipelinemode="automatic")
    #hifv_exportdata(imaging_products_only=True)
finally:
    h_save()
If one simply wants to add the imaging commands to an existing calibration script, then the commands from hif_mstransform through hif_makeimages should be inserted into the calibration script after calibrator imaging.
Method 2: Recipes
The Recipe Reducer is an alternative method to running the imaging recipe on calibrated data.
# In CASA
import pipeline.recipereducer
pipeline.recipereducer.reduce(vis=['myCaldMS.ms'],procedure='procedure_hifv_contimage.xml',loglevel='summary')
Users should be cautioned that the recipe reducer creates a weblog directory with a non-unique name, and this method also includes a call to hifv_exportdata, which will package the calibration products into a 'products' directory one level up. This may not be desired by all users.
Method 3: One Stage at a Time
You may notice that the 'casa_imaging_pipescript.py' is a list of specific CASA pipeline tasks being called in order to form the default pipeline. If desired, one could run each of these tasks one at a time in CASA, for example to inspect intermediate pipeline products.
If you need to exit CASA between stages, you can restart the pipeline where you left off. However, in order for this to work, none of the files can be moved to other directories.
First, use the CASA pipeline task h_resume after starting CASA again. This will set up the environment again for the pipeline to work. Type:
# In CASA
h_resume()
Now, you may start the next task in your list.
Running the Calibration Pipeline with the Imaging Pipeline
Method 1: Pipeline script
You can also run both the calibration pipeline and include the science target imaging. Below we show an example casa_calibration_and_imaging_pipescript.py that has the target imaging pipeline commands included at the end.
context = h_init()
context.set_state('ProjectSummary', 'observatory', 'Karl G. Jansky Very Large Array')
context.set_state('ProjectSummary', 'telescope', 'EVLA')
try:
    hifv_importdata(vis=['mySDM'], createmms='automatic',\
        asis='Receiver CalAtmosphere', ocorr_mode='co',\
        nocopy=False, overwrite=False)
    hifv_hanning(pipelinemode="automatic")
    hifv_flagdata(hm_tbuff='1.5int', fracspw=0.01,\
        intents='*POINTING*,*FOCUS*,*ATMOSPHERE*,*SIDEBAND_RATIO*,\
        *UNKNOWN*, *SYSTEM_CONFIGURATION*, *UNSPECIFIED#UNSPECIFIED*')
    hifv_vlasetjy(pipelinemode="automatic")
    hifv_priorcals(pipelinemode="automatic")
    hifv_testBPdcals(pipelinemode="automatic")
    hifv_checkflag(checkflagmode='bpd-vla')
    hifv_semiFinalBPdcals(pipelinemode="automatic")
    hifv_checkflag(checkflagmode='allcals-vla')
    hifv_solint(pipelinemode="automatic")
    hifv_fluxboot(pipelinemode="automatic")
    hifv_finalcals(pipelinemode="automatic")
    hifv_applycals(pipelinemode="automatic")
    hifv_checkflag(checkflagmode='target-vla')
    hifv_targetflag(intents='*TARGET*')
    hifv_statwt(pipelinemode="automatic")
    hifv_plotsummary(pipelinemode="automatic")
    hif_makeimlist(intent='PHASE,BANDPASS')
    hif_makeimages(hm_masking='centralregion')
    #Science target imaging pipeline commands
    hif_mstransform(pipelinemode="automatic")
    hif_checkproductsize(maximsize=16384)
    hif_makeimlist(specmode='cont')
    hif_makeimages(hm_cyclefactor=3.0)
    hifv_pbcor(pipelinemode="automatic")
    #hifv_exportdata(pipelinemode="automatic")
finally:
    h_save()
Method 2: Recipes
Similar to running just the imaging pipeline via the Recipe Reducer, there is a procedure to run calibration+imaging via this method as well.
# In CASA
import pipeline.recipereducer
pipeline.recipereducer.reduce(vis=['mySDM'],procedure='procedure_hifv_calimage_cont.xml',loglevel='summary')
The same caveats about the non-unique weblog directory name and the call to hifv_exportdata described previously also apply here.
Method 3: One Stage at a Time
As noted in the imaging-only pipeline and the VLA calibration pipeline, you may also run the pipeline one stage at a time, with the ability to resume if it's necessary to exit CASA.
What you get: Pipeline Products
VLA imaging pipeline output includes data products such as primary beam corrected images and spectral index images. Note that the automated execution at NRAO will also run an additional data packaging step (hifv_exportdata) which moves most of the files to an upper level '../products' directory. This step is omitted from the pipescript method, and all products remain within the 'root' directory where the pipeline was executed.
The most important pipeline products include:
- Science target images for each band and each target (files start with 'oussid*' in the root directory). These include the primary beam corrected tt0 image, tt1 (not primary beam corrected), pb (primary beam profile), clean mask, alpha (spectral index), and alpha.error (spectral index uncertainty) files. Note that when a very large number of pixels are used for the image (typically A and B configuration and/or high frequency data), images loaded in CASAviewer or CARTA may appear blank and simply need to be zoomed in to find the source(s).
- A weblog that is supplied as a compressed tarball weblog.tgz. When extracted, it has the form pipeline-YYYYMMDDTHHMMSSS/html/index.html, where the YYYYMMDDTHHMMSSS stands for the pipeline execution time stamp (multiple pipeline executions will result in multiple weblogs). The weblog contains information on the pipeline processing steps with diagnostic plots and statistics. The images for each target field in the weblog will not likely show detail for your observed target fields given the size of the images that might be created and the limited size of the weblog images. A VLA imaging pipeline CASA guide is under construction.
- The casapy-YYYYMMDD-HHMMSS.log CASA logger messages (in pipeline-YYYYMMDDTHHMMSSS/html/).
- 'casa_pipescript.py' (in pipeline-YYYYMMDDTHHMMSSS/html/), the script with the actually executed pipeline heuristic sequence and parameters. This file can be used to modify and re-execute the pipeline (see section The casa_pipescript.py file). Note that we also refer to a casa_imaging_pipescript.py; this is simply to differentiate between a script that runs calibration pipeline commands (possibly along with imaging) and one that runs only imaging. This file is created by the pipeline and is always called casa_pipescript.py regardless of the filename of your script.
- 'casa_commands.log' (in pipeline-YYYYMMDDTHHMMSSS/html/), which contains the actual CASA commands that were generated by the pipeline heuristics (see section The casa_commands.log file).
- The output from CASA's task listobs is available at 'pipeline-YYYYMMDDTHHMMSSS/html/sessionSession_default/mySDM.ms/listobs.txt' and contains the characteristics of the observations (scans, source fields, spectral setup, antenna positions, and general information).
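For orientation, the tt0, tt1, and alpha products are related through the Taylor expansion used by tclean's mtmfs deconvolver, I(nu) ~ I_tt0 + I_tt1*(nu-nu0)/nu0, so the spectral index is the ratio tt1/tt0 (meaningful only where tt0 is well above the noise). A minimal sketch of that relation, with an illustrative function name:

```python
def spectral_index(tt0, tt1):
    """Spectral index alpha = tt1/tt0 for a pixel value (or arrays).

    Only meaningful where tt0 is well above the image noise.
    """
    return tt1 / tt0

# A pixel with tt0 = 2.0 Jy/beam and tt1 = -1.4 Jy/beam has alpha = -0.7
print(spectral_index(2.0, -1.4))   # -0.7
```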
The Imaging Pipeline casa_imaging_pipescript.py File
The sequence of pipeline heuristic steps is listed in the 'casa_pipescript.py' script that is located in the pipeline-YYYYMMDDTHHMMSSS/html directory (where YYYYMMDDTHHMMSSS is the timestamp of the execution). Note that no matter what you call your script, the pipeline will create a file called casa_pipescript.py in the aforementioned directory as a record of which pipeline functions were run.
A typical 'casa_imaging_pipescript.py' has the following structure (where myCaldMS is again a placeholder and will be replaced by the name of the calibrated measurement set that was processed):
context = h_init()
context.set_state('ProjectSummary', 'observatory', 'Karl G. Jansky Very Large Array')
context.set_state('ProjectSummary', 'telescope', 'EVLA')
try:
    hifv_importdata(vis=['myCaldMS.ms'])
    hifv_flagdata(intents='*POINTING*,*FOCUS*,*ATMOSPHERE*,*SIDEBAND_RATIO*,*UNKNOWN*, *SYSTEM_CONFIGURATION*, *UNSPECIFIED#UNSPECIFIED*', quack=False, autocorr=False, baseband=False, edgespw=False, clip=False, online=False, shadow=False, scan=True)
    hif_mstransform(pipelinemode="automatic")
    hif_checkproductsize(maximsize=16384)
    hif_makeimlist(specmode='cont')
    hif_makeimages(hm_cyclefactor=3.0)
    hifv_pbcor(pipelinemode="automatic")
    #hifv_exportdata(imaging_products_only=True)
finally:
    h_save()
(Note that executions at NRAO may show small differences, e.g., an additional final hifv_exportdata (commented out in example above) step that packages the products to be stored in the NRAO archive.)
The above is, in fact, a standard user 'casa_imaging_pipescript.py' file for the current CASA and pipeline version (download to edit and run yourself) that can be used for general pipeline processing after inserting the correct myCaldMS filename in hifv_importdata.
The call to hifv_flagdata is there in case additional flagging needs to be added to the flagging template; all other flagging modes are turned off. This task will be replaced by a target-specific task in the next pipeline release.
The hifv_pbcor call after hif_makeimages will perform the primary beam correction. This is needed because tclean by default will not do the primary beam correction on wideband images. Note that this primary beam correction is approximate and uses the primary beam determined from the center of the band. If more accurate correction is required, please see the CASA task widebandpbcor.
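The frequency dependence that a band-center correction ignores can be illustrated with a simple model. This sketch assumes a Gaussian beam shape and the commonly quoted rule of thumb of FWHM ~ 42 arcmin / freq_GHz for the VLA primary beam; neither is the pipeline's actual beam model:

```python
import math

def pb_fwhm_arcmin(freq_ghz):
    """Approximate VLA primary beam FWHM (rule of thumb, not exact)."""
    return 42.0 / freq_ghz

def pb_gain(offset_arcmin, freq_ghz):
    """Gaussian primary beam response at an offset from the pointing center."""
    fwhm = pb_fwhm_arcmin(freq_ghz)
    return math.exp(-4.0 * math.log(2.0) * (offset_arcmin / fwhm) ** 2)

# Across 2-4 GHz S-band, the response 7 arcmin from the pointing center
# differs substantially between the band edges, so a single band-center
# correction cannot be exact everywhere in the image.
print(round(pb_gain(7.0, 2.0), 3))
print(round(pb_gain(7.0, 4.0), 3))
```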
The imaging pipeline run can be modified by adapting this script. At present there are limited options that should be altered. The script can then be (re-)executed via:
# In CASA
execfile('casa_imaging_pipescript.py')
The casa_commands.log File
casa_commands.log is another useful file in pipeline-YYYYMMDDTHHMMSSS/html (where YYYYMMDDTHHMMSSS is the timestamp of the pipeline execution) that lists all the individual CASA commands that the pipeline heuristics (hifv) tasks produced. Note that 'casa_commands.log' is not executable itself, but contains all the CASA tasks and associated parameters to trace back the individual data reduction steps.
The Pipeline Weblog
Information on the pipeline run can be inspected through a weblog that is launched by pointing a web browser to file:///<path to your working directory>/pipeline-YYYYMMDDTHHMMSSS/html/index.html. The weblog contains statistics and diagnostic plots for the SDM-BDF as a whole and for each stage of the pipeline. The weblog is the first place to check if a pipeline run was successful and to assess the quality of the calibration.
An example walkthrough of a calibration pipeline weblog is provided in the VLA Pipeline CASA guide. A similar walkthrough for the imaging pipeline is under construction.
Note that we regularly test the weblog on Firefox. Other browsers may not display all items correctly.
Quality Assessment (QA) Scores
Each pipeline stage has a quality assessment (QA) score assigned to it. The values range from 0 to 1, where
0.9-1.0 Standard/Good (green color)
0.66-0.90 Below-Standard (blue color; also shown as a question mark symbol)
0.33-0.66 Warning (amber color; cross symbol)
0.00-0.33 Error (red color; cross symbol)
We recommend that all pipeline stages and the relevant products are checked. Below-standard and Warning scores should receive extra scrutiny. The QA section at the bottom of the weblog of each stage will provide more information about the particular origin of each score. Errors are usually very serious issues with the data or processing and should be resolved in any case. The QA scores for the imaging pipeline are not currently in a mature state. Currently the most relevant QA scores will be associated with hif_makeimages() where the scores will be dictated by the S/N ratio in the image. Low S/N will get a low score, but that may be expected depending on the properties of your data.
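As a reading aid, the score bands above can be expressed as a simple lookup. This is a sketch; the exact boundary handling in the weblog may differ:

```python
def qa_category(score):
    """Map a 0-1 QA score to the weblog category bands listed above."""
    if score >= 0.9:
        return "Standard/Good"     # green
    if score >= 0.66:
        return "Below-Standard"    # blue / question mark
    if score >= 0.33:
        return "Warning"           # amber / cross
    return "Error"                 # red / cross

print(qa_category(0.95))   # Standard/Good
print(qa_category(0.5))    # Warning
```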
Examples for QA scores are provided in the Pipeline CASAguide.
Known Issues
This section will be updated as the pipeline continues to be developed. The comments below are general to most pipeline versions, including the current production pipeline used for VLA data. The current production version is CASA 6.2.1.
- W-projection is not used for extended configurations; low frequencies (primarily S-band and L-band) may yield poor results.
- Primary beam correction is done only on the tt0 image, and does not use the CASA task widebandpbcor which corrects for the spectral dependence of the primary beam and also corrects the .alpha images produced by tclean.
- The image size limitation will also result in the primary beam (and side lobes) not being imaged for low-frequency data (primarily S-band and L-band). Therefore, strong sources outside the primary beam will not be deconvolved and may yield poor results.
- The weblog for hif_checkproductsize will show '-1 GB' for many entries. These correspond to currently unused modes for cube images and the maximum allowed product size. Mitigation is currently only done to limit the image size to a maximum of 16384x16384 pixels.
- For imported calibrated data (typically imaging-only recipes), hifv_importdata will show a warning because a HISTORY table from the calibration is present; this warning can be ignored.
- Ephemeris targets (i.e., Solar System objects) have not been validated with the imaging pipeline and are not expected to work at present.
Previous Imaging Pipeline Releases: