DiFX Correlator
Introduction
The VLBA correlator is situated in the DSOC, at the end of the data path. Its role is to reproduce the signals recorded at the VLBA stations and any others involved in the observation, and to combine them in two-station baseline pairs, to yield the visibility function which is the fundamental measurement produced by the VLBA. VLBA observations are processed using the DiFX software correlator. DiFX was developed at Swinburne University in Melbourne, Australia (Deller et al. 2007), and adapted to the VLBA operational environment by NRAO staff (Brisken 2008). Subsequent references to "DiFX" apply specifically only to this VLBA implementation.
We encourage users to include the following text in the Acknowledgments section of any publication arising from VLBA observations made since December 2009:
This work made use of the DiFX software correlator developed at Swinburne University of Technology as part of the Australian Major National Research Facilities program.
... and to cite the following paper by the developers: Deller, et al. 2011, PASP, 123, 275.
Software correlation is especially well suited to applications like VLBI with bandwidth-limited data-transmission systems and non-realtime processing. Among its several advantageous aspects are: (1) flexible allocation of processing resources to support correlation of varying numbers of stations, frequency and time resolution, and various special processing modes, with no fundamental fixed limits other than the finite performance of the processing cluster; (2) optimization of resource usage to minimize processing time; (3) integration of control and processing functions; (4) continuously scalable, incremental upgrade paths; and (5) relatively straightforward implementation of special modes and tests. These and other virtues of software correlation are discussed in more detail by Deller et al. (2007).
Despite the absence of fixed limits cited in item (1) above, guidelines have been established for the extremes of spectral resolution, integration period, and output rate, for routine DiFX processing, as specified in the appropriate sections below. Exceptions will be considered for proposals including a sufficiently compelling scientific justification.
The VLBA DiFX correlator is not configured to process data from a single antenna, nor is a multi-station autocorrelation-only mode available.
Operation of DiFX is governed primarily by an observation description in VEX format (currently vex1.5). This format is used for both station and correlator control functions in a number of VLBI arrays, and the VLBA scheduling program SCHED (Walker 2011) has been producing it for many years.
The VLBA and HSA stations currently record data exclusively on Mark 6 disk modules and DiFX fully supports Mark 6 data. DiFX also has limited support for data recorded on Mark 5 disk modules, as recorded by a Mark 5A, Mark 5B, Mark 5B+, or Mark 5C recorder. Support for VDIF format is currently incomplete but includes those versions created by the VLBA RDBE and the VLA WIDAR correlator. Modes recorded at EVN stations are also fully supported.
Correlator output is written according to the FITS Interferometry Data Interchange Convention (FITS-IDI; Greisen 2009). In addition to the fundamental visibility function measurements and associated meta-data, the FITS files include amplitude and phase calibration measurements, weather data, and editing flags, all derived from data logged at the observing stations. An up-to-date release of AIPS is required to handle DiFX data properly.
Conversion of DiFX correlator output to the Mark 4 format that is used primarily in analysis of geodetic observations is also available. To enable this additional output, a SCHED parameter CORDFMT=MARK4 should be specified.
Spectral Resolution
DiFX allows quite flexible selection of the desired number of "spectral points" spanning each individual data channel. Any number that can be factored as \(2^n \cdot 5^m\) can be specified, subject to these limitations:
- A maximum of 4096 points per channel, for routine DiFX processing.
- A maximum total of 132,096 points, summed over all channels and polarization products, for compatibility with AIPS.
- A minimum spectral resolution of 2 Hz.
The number of spectral points must be the same for all data channels at any given time, although multiple passes are possible with different sets of channels. The actual spectral resolution obtained, and the statistical independence of the spectral points, depend on subsequent smoothing and other processing.
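The limits above can be checked before a proposal is submitted. The following is a minimal sketch, not part of DiFX or SCHED: the function names are hypothetical, and it assumes that the AIPS total is the product of spectral points, data channels, and polarization products, and that the spectral resolution is the data channel bandwidth divided by the number of spectral points.

```python
# Limits quoted in the text above for routine DiFX processing.
MAX_POINTS_PER_CHANNEL = 4096
MAX_TOTAL_POINTS = 132_096
MIN_RESOLUTION_HZ = 2.0


def is_2n5m(n):
    """Return True if n can be written as 2**a * 5**b with a, b >= 0."""
    if n < 1:
        return False
    for prime in (2, 5):
        while n % prime == 0:
            n //= prime
    return n == 1


def spectral_setup_problems(n_points, n_channels, n_pol_products, bandwidth_hz):
    """List the limits violated by a requested setup (empty list = acceptable).

    Assumptions (illustrative, not taken from DiFX itself): the total is
    n_points * n_channels * n_pol_products, and the spectral resolution is
    bandwidth_hz / n_points.
    """
    problems = []
    if not is_2n5m(n_points):
        problems.append("not of the form 2^n * 5^m")
    if n_points > MAX_POINTS_PER_CHANNEL:
        problems.append("more than 4096 points per channel")
    if n_points * n_channels * n_pol_products > MAX_TOTAL_POINTS:
        problems.append("more than 132,096 points in total")
    if bandwidth_hz / n_points < MIN_RESOLUTION_HZ:
        problems.append("spectral resolution finer than 2 Hz")
    return problems


if __name__ == "__main__":
    # 4096 points on 8 channels of 32 MHz with 4 polarization products.
    print(spectral_setup_problems(4096, 8, 4, 32e6))
    # 3000 contains a factor of 3, so it cannot be requested.
    print(spectral_setup_problems(3000, 1, 1, 1e6))
```

An empty list means that none of the three routine limits is exceeded under these assumptions; it does not replace the checks made by SCHED or by VLBA operations.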
DiFX also supports "spectral zooming": selection of a subset of correlated spectral points from any or all data channels. Only the selected spectral points are included in the output dataset. This capability is of value mainly in maser studies, where a recorded data channel may be much wider than the maser emission, in two main categories of observations: (1) Maser astrometry with in-beam continuum calibrators. Wideband observing is required for maximum sensitivity on the calibrators, while zooming allows high spectral resolution at the frequencies where maser emission appears. (2) Multiple maser transitions. When wideband data channels are used to cover a large number of widely separated maser transitions, spectral zooming allows the empty portions of the high-resolution spectrum to be discarded.
Spectral zooming does not work with mixed-sideband observing. This situation can arise in HSA or global observations where some telescopes require upper sideband and others require lower sideband. In proposing observations that will use spectral zooming, the required number of spectral points before zooming should be specified in the Proposal Submission Tool. Currently, the location and width of the "zoom" bands must be communicated directly to VLBA operations before correlation.
Integration Period
DiFX accommodates a nearly continuous range of correlator integration periods over the range of practical interest. Individual integrations are quantized in multiples of the indivisible internal FFT interval, which is equal to the number of spectral points requested, divided by the data channel bandwidth.
For most cases, with low to moderate spectral resolution, and/or wide data channels, the FFT intervals are fairly short, and it is straightforward to find an integration period in any desired range that is an optimal integral multiple of the FFT interval. ("Optimal" refers here to the performance of DiFX.) Extreme cases of very high spectral resolution (many spectral points across a narrow data channel, giving a resolution finer than about 100 Hz) imply FFT intervals long enough that only limited choices of integral multiples are available.
For flexibility in these situations (although the option exists in all cases), integration periods other than an integral multiple of the FFT interval can be approximated, in a long-term mean, by an appropriate sequence of nearby optimal integral multiples. In this case, output records are time-tagged as if correlated with exactly the requested period.
SCHED accepts an additional parameter so that users can indicate that the requested integration period is to be implemented exactly, as described above. Otherwise, the nearest optimal integral multiple of the FFT interval is passed to the correlator.
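The quantization described in this section can be illustrated numerically. This is a rough sketch under stated assumptions, not the algorithm used by DiFX or SCHED: the function names are hypothetical, "optimal" is simplified to "nearest integral multiple", and the sequence generator is only one possible way to approximate a requested period in the long-term mean.

```python
def fft_interval_s(n_points, bandwidth_hz):
    """FFT interval: spectral points requested divided by channel bandwidth."""
    return n_points / bandwidth_hz


def nearest_multiple(requested_s, n_points, bandwidth_hz):
    """Return (k, period_s): the nearest integral multiple of the FFT interval."""
    k = max(1, round(requested_s * bandwidth_hz / n_points))
    return k, k * n_points / bandwidth_hz


def multiples_sequence(requested_s, n_points, bandwidth_hz, n_records):
    """FFT counts per output record whose running sum tracks the requested period.

    Hypothetical scheme: record i ends at the FFT boundary closest to
    i * requested_s, so the mean period converges to the requested value.
    """
    counts = []
    ffts_so_far = 0
    for i in range(1, n_records + 1):
        target = round(i * requested_s * bandwidth_hz / n_points)
        counts.append(target - ffts_so_far)
        ffts_so_far = target
    return counts


if __name__ == "__main__":
    # Moderate resolution: 256 points across 32 MHz gives an 8 microsecond FFT.
    print(fft_interval_s(256, 32e6), nearest_multiple(2.0, 256, 32e6))
    # Extreme resolution: 4096 points across 62.5 kHz gives a 65.536 ms FFT.
    print(fft_interval_s(4096, 62.5e3), nearest_multiple(1.0, 4096, 62.5e3))
    print(multiples_sequence(1.0, 4096, 62.5e3, 4))
```

In the first case a 2-second period is itself an exact multiple of the FFT interval. In the second, the nearest single multiple is 15 FFT intervals (about 0.983 s), whereas alternating between 15 and 16 intervals keeps the mean period close to the requested 1 s.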
Pulsar Gating and Binning
DiFX supports the following three time-based selection modes to facilitate pulsar observations. In all cases, a pulsar spin ephemeris must be provided by the user. Except for certain applications of mode 3, the ephemeris must be capable of predicting the absolute rotation phase of the pulsar. Pulsar modes incur a minimum correlation-time penalty of about 50%. High output data rates may require greater correlator resource allocations.
- Binary Gating: A simple pulse-phase driven on-off accumulation window can be specified, with "on" and "off" phases. Such gating increases the signal to noise ratio of pulsar observations by a factor of typically 3 to 6, and can also be used to search for off-pulse emission.
- Matched-filter Gating: If the pulse profile at the observation frequency is well understood and the pulse phase is very well predicted by the provided pulse ephemeris, additional signal to noise over binary gating can be attained by appropriately scaling the correlation coefficients as a function of pulse phase. Depending on the pulse shape, additional gains of up to 50% in sensitivity over binary gating can be realized.
- Pulsar Binning: This mode entails generating a separate visibility spectrum for each requested range of pulse phase. There is no explicit limit on the number of pulse phase bins supported, but output data rates grow with the number of bins. Currently AIPS does not support databases with multiple phase bins. Until post-processing support is available, a separate FITS file will be produced for each pulsar phase bin.
Details of pulsar observing, including practical aspects of using the pulsar modes, and limitations imposed by operations, are documented by Brisken & Deller (2010).
Multiple Phase Centers
The field of view in VLBI observations is very small, around \(10^{-4}\) of the primary antenna beam area. This restricted interferometer beam arises in the correlation process from smearing at positions away from the correlation phase center, due to averaging in time (with, typically, a 2-second period) and/or across bandwidth ("chromatic aberration" over, typically, 0.5 MHz spectral resolution). Thus, imaging of targets that are widely spaced in the primary beam requires multiple processing passes in typical correlator implementations. If the visibilities are maintained at high time and frequency resolution, it is possible to perform a u-v shift after correlation, essentially repointing the correlated dataset to a new phase center. However, this approach would require prohibitively large visibility datasets.
DiFX implements multiple u-v shifts inside the correlator, to generate as many phase centers as are necessary, in a single correlation pass. The output consists of one dataset of normal size for each phase center. This mode consumes around three times the correlator resources of a normal continuum correlation, due to the need for finer frequency resolution before the u-v shift, but the additional cost is only weakly dependent on the number of phase centers. For reasonable spectral and temporal resolution requirements (for example, adequate for smearing < 10% at the 50% contour of the VLBA primary beam), 200 phase centers require only 20% more correlator time than 2 phase centers. Extremely high spectral and/or temporal resolution (e.g. for shifts even closer to the edge of the primary beam) carries a higher overhead per additional phase center. This mode thus should be requested only for imaging of three or more sources within any single antenna pointing. The correlator output rate expands proportionally to the number of phase centers.
Correlator memory limits the product of baselines, spectral points, and phase centers for one correlator pass. The current limit is approximately 600 phase centers for the 10 element VLBA at 2 Gbps record rate (512 MHz polarization-summed bandwidth) for dual polarization products. Two correlator passes may be necessary for 600 phase centers with dual polarization products using the 4 Gbps record rate. Full polarization products reduce the maximum number of phase centers per correlator pass by a factor of 2. An unlimited number of phase centers can ultimately be achieved in multiple correlation passes, regardless of record rate or polarization setup.
Multiple phase-center correlation is requested in the NRAO Proposal Submission Tool by setting the "Number of Fields" item in the resource section to the maximum number of phase centers required for any antenna pointing specified in a given resource. The requested spectral resolution and integration time should correspond to the desired initial number of spectral points per data channel (required to minimize bandwidth smearing) and the desired integration between u-v shifts (to minimize time smearing). A resulting expanded output data rate that exceeds the current limit, as well as any required multiple passes, must be justified specifically in the proposal.
SCHED includes facilities to support specification of the actual phase center locations to be used in correlation.
For more details on wide-field imaging techniques, see Bridle & Schwab (1999), and Garrett et al. (1999).
Output Rate
Correlation parameters should result in an output rate less than 10 MBytes per second (of observing time) for routine DiFX processing; higher rates may be considered if required and adequately justified. Observers should ensure that their data-analysis facilities can handle the dataset volumes that will result from the correlation parameters they specify.
An approximate parametrization of the output rate is given by
\[R = 4 \cdot \frac{N_{\rm stn} \cdot (N_{\rm stn}+1) \cdot N_{\rm sbb} \cdot N_{\rm spc}}{T_{\rm int}}\cdot N_{\rm ppb} \cdot N_{\rm phc} \cdot p\]
where the rate \(R\) is in Byte/s;
\( N_{\rm stn} , \; N_{\rm sbb} , \; N_{\rm spc} \) are the numbers of observing stations, data channels, and spectral points per data channel, respectively;
\(T_{\rm int}\) is the correlator integration period;
\(N_{\rm ppb}\) is the number of pulsar phase bins; and
\(N_{\rm phc}\) is the number of phase centers.
The polarization factor \(p=1\) for single-polar, or dual-polar parallel-hand output; or \(p=2\) for cross-polar, four-Stokes processing.
Output data rates are also estimated by SCHED.
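As a quick check before running SCHED, the parametrization above can be evaluated directly. The sketch below simply transcribes the formula; the function name is hypothetical, and the comparison assumes that 10 MBytes means \(10^7\) bytes.

```python
# Assumption: the routine limit of 10 MBytes/s is read as 10 * 10**6 bytes/s.
ROUTINE_LIMIT_BYTES_PER_S = 10e6


def output_rate_bytes_per_s(n_stn, n_sbb, n_spc, t_int_s, n_ppb=1, n_phc=1, p=1):
    """Approximate correlator output rate R in bytes per second.

    n_stn: observing stations, n_sbb: data channels,
    n_spc: spectral points per data channel, t_int_s: integration period (s),
    n_ppb: pulsar phase bins, n_phc: phase centers,
    p: 1 for single or dual parallel-hand output, 2 for four-Stokes processing.
    """
    return 4 * n_stn * (n_stn + 1) * n_sbb * n_spc / t_int_s * n_ppb * n_phc * p


if __name__ == "__main__":
    # Ten stations, 8 data channels, 256 spectral points, 2 s integrations,
    # full polarization, no pulsar binning, a single phase center.
    rate = output_rate_bytes_per_s(10, 8, 256, 2.0, p=2)
    print(f"{rate / 1e6:.2f} MB/s", rate < ROUTINE_LIMIT_BYTES_PER_S)
```

Because the rate scales linearly with the number of phase centers and pulsar bins, the same configuration with 12 phase centers would already exceed the routine limit under this assumption.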
Source Position
The most accurate possible source positions are needed for generating the proper correlator models for data processing. NRAO maintains a list of milliarcsecond positions for strong sources that appear in astrometric VLBI or connected-element interferometer catalogs. Positions generally are taken from the schedule file, so it is essential that the schedule file have the most accurate possible source positions. To keep fringe rate decorrelation low, source positions for correlation should be accurate to:
\[\sigma_{\Theta} \; ({\rm arcsec}) < \frac{22}{t_{\rm int} \cdot \nu_{\rm obs}}\]
where \(t_{\rm int}\) is the correlator integration time in seconds and \(\nu_{\rm obs}\) is the observing frequency in GHz. However, it is desired that positions be better than this by a factor of at least 3-5, to provide the best results. When phase-referencing (see below), and even more so for astrometry, the source position errors are a larger problem and should be kept as low as possible (fraction of a mas is best). The correlator model is very detailed, and used to best advantage when source positions are as accurate as possible.
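The position requirement can be evaluated for a planned setup as follows. This is a minimal sketch with hypothetical function names; it only transcribes the inequality above and the recommended margin of a factor of 3 to 5.

```python
def max_position_error_arcsec(t_int_s, nu_obs_ghz):
    """Upper limit on the source position error for correlation, in arcsec."""
    return 22.0 / (t_int_s * nu_obs_ghz)


def recommended_error_range_arcsec(t_int_s, nu_obs_ghz):
    """Limit tightened by the suggested factors of 5 and 3, as (low, high)."""
    limit = max_position_error_arcsec(t_int_s, nu_obs_ghz)
    return limit / 5.0, limit / 3.0


if __name__ == "__main__":
    # Example values: 2 s integrations at an observing frequency of 8.4 GHz.
    print(max_position_error_arcsec(2.0, 8.4))
    print(recommended_error_range_arcsec(2.0, 8.4))
```

For 2-second integrations at 8.4 GHz the limit is about 1.3 arcsec, so positions good to roughly 0.3 to 0.4 arcsec meet the recommended margin; phase-referencing and astrometry need far better positions, as noted above.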
An accurate source position service is also available to obtain positions accurate enough for correlation.