
Scripted Access to the NRAO Archive

The web-based interface to the NRAO archive can fulfill the needs of many users. However, the browser interface can be difficult (or unusable) for users who need to run more sophisticated queries or to collate and filter search results. Thus, we have developed the ability to query the NRAO archive using Virtual Observatory (VO) protocols. This enables users to interact with the archive metadata in ways that are inefficient or impossible with the web-based interface.

The scripted interface conforms to the VO Table Access Protocol (TAP) standard and can be queried with TAP-capable VO clients. We have primarily tested the service using the Python pyVO package, but it should also work with other VO-enabled tools (e.g., TOPCAT). To access the service, use the server address listed below (this is not a web page but the address of the VO service).

Jupyter Notebook Demonstrating TAP Access

TAP Server address: https://data-query.nrao.edu/tap

Downloads are not yet possible through this scripted interface. Thus, users will need to use the scripted interface to identify the data they are looking for and then download those products through the web interface.
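As a starting point, the following is a minimal sketch of connecting to the service with pyVO and fetching a few rows to inspect the available columns. It assumes pyVO is installed; the table name `tap_schema.obscore` is an assumption that should be confirmed against the table list the service reports, and the helper names `build_preview_query` and `run_query` are hypothetical.

```python
TAP_URL = "https://data-query.nrao.edu/tap"  # TAP server address given above
OBSCORE_TABLE = "tap_schema.obscore"  # assumption: confirm against the service's table list


def build_preview_query(table=OBSCORE_TABLE, limit=5):
    """Return an ADQL query that fetches a few rows for inspection."""
    limit = int(limit)
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"SELECT TOP {limit} * FROM {table}"


def run_query(adql, url=TAP_URL):
    """Run a synchronous TAP query and return the result as an astropy Table."""
    # Imported here so the query builder can be used without pyVO installed.
    import pyvo

    service = pyvo.dal.TAPService(url)
    return service.search(adql).to_table()
```

Calling `run_query(build_preview_query())` should return a small astropy Table whose `colnames` attribute lists the columns described further down this page. The result contains metadata only; the data products themselves still have to be downloaded through the web interface.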

 

Getting Started for Users New to VO

We have created a demonstration Jupyter Notebook that includes querying the archive and filtering/collecting the results. Click here to download the demonstration notebook. The example demonstrates querying by position and by radius from that position. The search returns all 'scans' within the archive that contain the queried coordinates. We then show how to identify the unique Execution Blocks ([E]VLA, VLBA, ALMA) that contain the source and how to sum the on-source observing time in a given Execution Block. This lets users obtain the information more efficiently than through the web interface.
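The workflow described above can be sketched roughly as follows; this is not the notebook's code, only an illustration under stated assumptions. The table name `tap_schema.obscore` is assumed, the function names are hypothetical, and the cone search shown here matches scans whose pointing position (`s_ra`, `s_dec`) falls inside the search circle.

```python
from collections import defaultdict


def build_cone_query(ra_deg, dec_deg, radius_deg, table="tap_schema.obscore"):
    """Return ADQL selecting scans pointed within radius_deg of (ra_deg, dec_deg).

    The default table name is an assumption; check it against the service.
    """
    return (
        f"SELECT * FROM {table} "
        "WHERE CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1"
    )


def time_per_execution_block(rows):
    """Sum t_exptime over scans, keyed by obs_publisher_did.

    rows is any iterable of mapping-like records, for example the rows of
    an astropy Table or a list of dicts.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["obs_publisher_did"]] += float(row["t_exptime"])
    return dict(totals)
```

The keys of the returned dictionary are the unique Execution Blocks that match the search, and each value is the summed exposure time in the unit the service reports for `t_exptime` (seconds in the ObsCore standard). Each key can then be used to locate the Execution Block in the archive web interface.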

 

Columns Returned

The TAP interface returns many columns; we describe them briefly here:

Most useful columns:
s_ra - right ascension
s_dec - declination
s_fov - field of view (may not be accurate for non-VLA data)
obs_publisher_did - file set ID, or execution block name (a unique name to locate in archive web interface)
target_name - name of target in NRAO archive
t_min - observation start time (MJD)
t_max - observation end time (MJD)
t_exptime - exposure time of scan
freq_min - minimum frequency (Hz)
freq_max - maximum frequency (Hz)

em_min - minimum wavelength (meters)
em_max - maximum wavelength (meters)
facility_name - NRAO
instrument_name - VLBA, VLA, ALMA, GBT
pol_states - polarization products available
dataproduct_type - visibility or image
calib_level - calibration level of data (currently all are 1)
access_estsize - size of data in archive (now stored as int64 so that accurate sizes are reported)
configuration - Array configuration (VLA and ALMA)
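Two of the units above often need converting before the values are convenient to read: times are given as MJD, and the spectral coverage is given both as frequency (Hz) and as wavelength (meters). A small sketch using only the Python standard library follows; the function names are hypothetical, and the time conversion ignores leap seconds and time-scale differences, so treat it as approximate.

```python
from datetime import datetime, timedelta, timezone

MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)  # MJD 0.0
SPEED_OF_LIGHT_M_S = 299_792_458.0


def mjd_to_datetime(mjd):
    """Convert a Modified Julian Date (e.g. t_min or t_max) to a datetime."""
    return MJD_EPOCH + timedelta(days=float(mjd))


def freq_hz_to_wavelength_m(freq_hz):
    """Convert a frequency in Hz to a wavelength in meters."""
    return SPEED_OF_LIGHT_M_S / float(freq_hz)
```

Because wavelength is inversely proportional to frequency, the shortest wavelength of an observation corresponds to its highest frequency, so `freq_hz_to_wavelength_m` applied to `freq_max` would be expected to reproduce `em_min`, and applied to `freq_min` to reproduce `em_max`.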

New Columns Added in May 2024:

num_antennas - number of antennas in observation
max_uv_dist - maximum uv-distance in meters (unused for VLBA)
spw_names - name of each spectral window, including the spw ID expected once the data are filled to a Measurement Set (MS) (comma separated string)
center_frequencies - center of each spectral window (comma separated string, units of Hz)
bandwidths - bandwidth of each spectral window (comma separated string, units of Hz)
nums_channels - number of channels in each spectral window (comma separated string)
spectral_resolutions - spectral resolution of each spectral window (comma separated string, units of Hz)
aggregate_bandwidth - entire bandwidth covered by the data
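Since several of these columns pack one value per spectral window into a comma separated string, they usually need to be split before use. The sketch below shows one way to do that; the function names are hypothetical, and it assumes that the entries appear in the same spectral-window order in every column and that the value passed in is a plain string (empty or missing values may need extra handling).

```python
def parse_float_list(value):
    """Split a comma separated string such as '1.0e9, 2.0e9' into floats."""
    if value is None:
        return []
    return [float(item) for item in str(value).split(",") if item.strip()]


def spectral_windows(row):
    """Combine the per-window columns of one result row into a list of dicts.

    Assumes spw_names, center_frequencies and bandwidths list the windows
    in the same order; zip() stops at the shortest list if they differ.
    """
    names = [n.strip() for n in str(row["spw_names"]).split(",") if n.strip()]
    centers = parse_float_list(row["center_frequencies"])
    widths = parse_float_list(row["bandwidths"])
    return [
        {"name": name, "center_hz": center, "bandwidth_hz": width}
        for name, center, width in zip(names, centers, widths)
    ]
```

The same `parse_float_list` helper applies to `nums_channels` and `spectral_resolutions`, for example to keep only results whose finest spectral resolution is below some threshold.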

 

Columns that are less useful at present:

obs_id - science product locator
em_res_power - spectral resolving power (unused)
t_resolution - time resolution (integration time)
access_url - where to get the data (currently just the archive webpage)
obs_collection - Archive Science Product Locator
em_xel - currently not used
o_ucd - currently not used
s_region - currently not used
s_resolution - currently not used
access_format - data format (visibility or image)