
Computing Resources


1. Overview


This document describes acceptable use of NRAO computing facilities for the purpose of calibrating and imaging EVLA and VLBA observations at the New Mexico Array Science Center (NMASC).  The document describes the process of requesting an account, requesting resources, and accessing data.  In addition, it enumerates available NRAO hardware and software and the limits on the volume and duration of resource requests.

Users of NRAO computing resources must abide by the Acceptable Use Policy. Sharing of assigned user accounts is not permitted.

Resource request types and prioritization

The NMASC has finite computing resources; in the event of oversubscription, resources will be granted in the following priority order:

1.    Archive requests for data retrieval.
2.    Pipeline reprocessing requests to re-run NRAO-supplied EVLA and ALMA pipelines with modified parameters.
3.    Batch (script) submission to execute a user-defined pipeline.
4.    Interactive use for direct CASA or AIPS interaction.

In the event that all resources are in use, new tier 1 jobs will move to the top of the queue, followed by tier 2, then tier 3, and finally tier 4.  NRAO expects roughly 50% of the compute resources to be available for tier 4 interactive use.  The distribution of job types is expected to change as more observers adopt the pipeline reprocessing and batch processing modes.

Over time the NRAO will examine finer-grained prioritization, particularly within the batch and interactive queues, based on science rank, data size, time since observation, or other parameters.


2. User Accounts and Remote Access

Requesting an account

A valid entry in the my.nrao.edu User Database (UserDB) is required for account access.  Please ensure your email address therein is correct before requesting an account.

To request a computer account, perform the following steps:

  1. Ensure your default email address is correct at my.nrao.edu
  2. Submit a ticket at https://help.nrao.edu.
    1. To log in, use the same user ID and password as when accessing the Proposal Submission Tool or the Observation Preparation Tool.
    2. Under 'Select a department' choose "VLA/GBT/VLBA Archive and Data Retrieval" which will ensure the ticket is directed to the appropriate group.
    3. Indicate how long you will need the account; the default is one month: 2 weeks for processing plus a 2-week grace period to transfer data products.

A unique UNIX-based computer account will be created upon receipt of the request ticket.  The account name will be ‘nm-<#ID>’, where <#ID> is your numeric UserID in the UserDB; the account password will be the same as the one used above to submit the request.  For example, a UserID of 4386 yields the account name nm-4386, which is used as the example account throughout this document.

You will receive an automated email delivered to the address registered in the UserDB when the account is created.  The email will include your account name, account expiration time and a pointer to this documentation.

The assigned account allows access to the NMASC ssh portal, authenticated ftp (sftp) server, the processing cluster master server and any assigned cluster nodes.  It does not grant access to other NMASC systems or staff machines.

Requesting Resources

Cluster node reservation requests must be issued on the cluster master node, nmpost-master.  This server is not directly accessible from outside the NRAO.  To access it from a system outside the NRAO you must first log in to the ssh portal (ssh <username>@ssh.aoc.nrao.edu); from there you can ssh to nmpost-master (ssh nmpost-master).
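
For example, using the example account nm-4386, the two hops look like this (run the first command on your local machine and the second on the ssh portal):

ssh nm-4386@ssh.aoc.nrao.edu
ssh nmpost-master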

To reserve a cluster node for interactive use, run the command nodescheduler on nmpost-master.  The command takes two arguments: the number of days to reserve the node(s) and the number of nodes.  Please limit the number of nodes to 1 unless you can show a need for multiple servers.  For example:

nmpost-master% nodescheduler --request 14 1

would reserve one node for 14 days.  Running nodescheduler with no arguments will print usage information.  Once a node has been assigned (which could take anywhere from seconds to hours depending on demand) you will receive an email listing which node you were assigned and how to release it when you are done.  Please release the node when you are done with it or will not be using it for an extended period of time (days).

Once a node has been reserved, your account, and only your account, has access to it.  For information on available software (e.g. CASA, AIPS, Miriad) see https://science.nrao.edu/facilities/vla/docs/manuals/computing-resources/software

3. Connecting via VNC

Accessing the Cluster Remotely with VNC

While ssh alone works fine if you are on the internal NRAO network, if you are trying to display graphics from a remote site we recommend using VNC.

Connect to the NRAO

From your local machine, log in to the ssh portal ssh.aoc.nrao.edu with your username (e.g. nm-4386).  Skip this section if you are physically at the NRAO.

For Linux and Mac Machines

ssh nm-4386@ssh.aoc.nrao.edu

For Windows Machines

Install PuTTY, fill in the Host Name field and click Open.

Start the VNC Server

From the ssh portal, or some other NRAO machine, log in to the node assigned to you (e.g. nmpost050)

ssh nmpost050

and start a VNC server with the following command

vncserver

The first time you run this, it should prompt you to set a VNC password.  Do not use the same password as your user account.  The system should then return something like:

New 'nmpost050:1 (nm-4386)' desktop is nmpost050:1

The 1 in this example is your session number.  You will need this number later when you use your VNC client.
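
If the default desktop size does not suit your screen, common VNC servers (TigerVNC and TightVNC, for example) accept a -geometry option when starting the server; whether the implementation installed on the cluster nodes supports it is an assumption you should verify:

vncserver -geometry 1920x1080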

Connect to the VNC Server

The VNC client used to connect to the VNC server differs depending on the OS you are using (Linux/RHEL, Linux/Ubuntu, MacOS, Windows).

Linux (RHEL, CentOS, SL, OEL, Debian)

If your local machine is an RHEL or Debian derivative, use vncviewer to start the VNC connection like so (assuming the session number is 1)

vncviewer -via nm-4386@ssh.aoc.nrao.edu nmpost050:1

If you are physically at the NRAO, skip the "-via" syntax like so

vncviewer nmpost050:1

Linux (Ubuntu)

If your local machine is Linux/Ubuntu, use remmina to start the VNC connection like so (assuming the session number is 1)

Launch the remmina program and select Connection -> New

 

Set the Name to something descriptive like NRAO Cluster, change the Protocol to VNC - Virtual Network Computing, set the Server to the node assigned to you followed by a colon and the session number (e.g. nmpost050:1), set the User name (e.g. nm-4386).   Then select the SSH tab

 

Check the box for Enable SSH tunnel, select Custom and set it to ssh.aoc.nrao.edu, set the User name (e.g. nm-4386), click on Save and then Connect.

 

 

Mac

If your local machine is a Mac, open a Terminal by opening the Applications folder, then the Utilities folder, then double-clicking on the Terminal application.  In the terminal, start an SSH tunnel.  In the following examples, 5901 is derived by adding 5900 to the session number from above.

ssh -L 5901:nmpost050:5901 nm-4386@ssh.aoc.nrao.edu

Leave that terminal running in the background.  Then in the Finder, pull down the Go menu and choose Connect to Server.  For the server address, specify:

vnc://localhost:5901

You will be challenged for the VNC password you set up (likely at the time you launched the vnc server).

 

Windows

If your local machine is Windows, use a VNC client like the Java Viewer from TightVNC with the following setup.  The port number can be found by adding 5900 to the session number, so in the above example, with a session number of 1, the port will be 5901.  If you are physically at the NRAO, leave the "SSH Server" line blank.

End the VNC Server

Commands that run in this VNC session will continue to run even after you close your local VNC client.  Once all processes are done, you should shut down your VNC server by connecting via ssh to the nmpost cluster node again and running (assuming the session number is 1)

vncserver -kill :1
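
If you have lost track of which sessions you have running, some VNC servers (TigerVNC, for example) provide a listing command; this assumes the installed implementation supports it:

vncserver -list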

4. Resource Limits and Data Retention

Allocation and limits on processing resources apply to both NRAO staff and observers.

Quota Limits

Beginning June 22nd, 2017, quotas will be enforced via native Lustre filesystem quotas based on the user's default group.

Users are limited to 5TB of space in their /lustre/aoc/observers/<username> data area on the shared Lustre filesystem.  User data in large project areas is accounted for via separate project quotas and does not count toward the user's quota.

You can view your current usage with lfs quota -g <group> /lustre/aoc where <group> is your account name; observer account names and their default group are the same string.

The following shows the quota for the account nm-4386.

# lfs quota -g nm-4386 /lustre/aoc
Disk quotas for group nm-4386 (gid 24386):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /lustre/aoc 5455723856* 5368709120 6442450944 6d22h49m58s  527602       0       0       -

To accommodate brief processing spikes, Lustre incorporates a one-week grace period during which usage can increase by 20% to 6TB.  Attempts to write above 6TB, or failure to reduce usage back below 5TB after the grace period, will trigger an I/O error.

The above example shows the account is over the 5TB quota (trailing "*") but below the 6TB hard limit, with 6 days and 22 hours remaining in the one-week grace period.
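
If you need to reduce usage, the standard du command can show which directories dominate your footprint.  For example, for the example account nm-4386:

du -sh /lustre/aoc/observers/nm-4386/*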

Cluster Resource limits

For interactive sessions users are limited to one compute node. Interactive session nodes are assigned for 1 to 14 days and can only be accessed by the reserving account.

During periods of increased pressure, users may be asked to release idle nodes to allow broader community access.

For batch requests, users are asked to limit in-flight batch jobs to one node's worth of jobs.


Requests for increased access duration, storage space, or compute nodes should be submitted as a ticket to https://help.nrao.edu (Data archive and Download department) and will be reviewed by designated NMASC or NAASC staff on a case-by-case basis.

Data Retention

External accounts, along with any data products, will be removed two weeks after the completion of a processing request.  For interactive sessions, the account and data products will be removed two weeks after the end of the session.  You will receive an email warning prior to account deletion.  In the event of multiple processing requests, the account expiration date will be set by the last request to complete.

Large Proposals

Observers associated with large proposals (greater than 200 hours) who plan to use NRAO computing resources are encouraged to request the creation of a project area on the Lustre filesystem and a Unix group for shared data access among proposal members.

Project area size limits will be negotiated with NRAO at the time of the request to match project data rates and imaging plans, but will typically be 10TB.  Quotas for projects are enforced via the project group.
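
Project usage can be checked in the same way as personal usage by substituting the project's Unix group for your own in the lfs quota command; the group name vlass below is only illustrative:

lfs quota -g vlass /lustre/aoc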



5. Available Hardware Resources

The NMASC has a post-processing environment comprising a processing cluster to support CASA execution and a data storage cluster supporting the Lustre filesystem.  Resource allocations of NRAO facilities are limited by the available nodes (servers) in the processing cluster and the total space within the storage cluster.

Processing Cluster

The NMASC has a 60-node compute cluster.  Each node has dual 2.6GHz, 8-core Intel E5-2670 Sandy Bridge processors.  Memory ranges from 64GB to 256GB.  The compute nodes have no local disk storage; instead they are connected to a distributed parallel filesystem (Lustre) via a 40Gbit InfiniBand network.  The compute cluster supports automatic EVLA pipeline processing, archive retrievals, batch processing requests and interactive processing sessions.

Lustre Filesystem

Lustre is a distributed parallel filesystem commonly used in HPC environments.  Each cluster node sees the same shared filesystem.  The NMASC Lustre filesystem is made up of 8 storage servers, each with four RAID arrays (32 arrays in total), each 44TB in size.  The total storage volume is currently 1.4PB.  Individual nodes can read and write to the Lustre filesystem in excess of 1GByte/sec.  The entire filesystem can sustain roughly 15GByte/sec aggregate I/O.

The Lustre filesystem appears as /lustre/aoc on all Lustre enabled client computers.  As of March 2016, there are 200 such computers.

Public Workstations

The NMASC has 10 workstations for local visitors.  The systems vary in processor, memory and storage since work is expected to be done mostly on the compute cluster, but all have 10Gbit connections to the Lustre filesystem.  Instructions for reserving workstations can be found in the Computing section of the Visiting the DSOC web page.

6. Data Storage and Retrieval

Data Storage

Observer accounts reside on the Lustre filesystem in /lustre/aoc/observers/<user>.  This area is also where you should store all data products and scratch files.  Lustre is a shared resource amongst all staff and observers, so we ask that everyone keep their usage as far below the 5TByte limit as possible.

Please see the following special instructions for retrieving data directly from the archive into your home area.

Data Retrieval

The NMASC supports the following methods for securely transferring data to remote facilities and has plans to support XSEDE's Globus Connect platform.  In the following examples, <user> is your nm-#### account name and your data area is /lustre/aoc/observers/<user>.

SFTP

SFTP is an encrypted file transfer protocol.  You can access your home area from remote facilities via sftp sftp.aoc.nrao.edu and log in with your NMASC account name.  From there sftp behaves much like any ftp client.

The example below would connect user nm-1234 to the sftp server.  The current directory would be nm-1234's home lustre area.

sftp nm-1234@sftp.aoc.nrao.edu
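
Once connected, the usual ftp-style commands (cd, ls, get, mget) apply.  For example, to download a hypothetical file products.tar from the data subdirectory:

sftp> cd data
sftp> get products.tar
sftp> quit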

SCP

SCP is an encrypted copy program that can transfer files between remote hosts.  The format is scp <user>@<remote host>:/<remote path> <local path>.  From your machine you would run scp <user>@ssh.aoc.nrao.edu:/lustre/aoc/observers/<user>/<path to files> <local path>

The example below would copy all files ('*') in nm-1234's data directory to the current directory ('.') on your machine.

scp nm-1234@ssh.aoc.nrao.edu:/lustre/aoc/observers/nm-1234/data/* .
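
To copy an entire directory tree rather than individual files, scp accepts a recursive flag; for example, to pull the whole data directory:

scp -r nm-1234@ssh.aoc.nrao.edu:/lustre/aoc/observers/nm-1234/data .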

LFTP

LFTP is a more sophisticated file transfer client which, among other things, can use multiple channels to speed up transfers.

lftp -u <user> sftp://sftp.aoc.nrao.edu
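
After logging in, lftp's mirror command can pull a whole directory using several parallel transfers.  The remote directory name data, the local name data_copy, and the parallelism of 4 below are illustrative:

lftp -u nm-1234 -e "mirror --parallel=4 data data_copy; quit" sftp://sftp.aoc.nrao.edu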

RSYNC

RSYNC is a versatile file-copying tool that only copies necessary files, that is, the ones that are missing from your local copy.  This is useful if, for example, you have deleted some files from your local copy of your JobID and want to copy just those missing files.

The example below would copy all the files in nm-1234's data directory to a local directory.  Without the trailing '/' rsync would copy the directory and its contents; with a trailing '/' it copies only the contents of the directory.  Adding '--delete' to the argument list will keep the two areas exactly in sync by removing files from your system's copy if they have been removed from the NMASC copy.

rsync -vaz nm-1234@ssh.aoc.nrao.edu:/lustre/aoc/observers/nm-1234/data/ .
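
For example, to keep a local copy exactly in sync with the NMASC copy, deleting local files that no longer exist at the NMASC:

rsync -vaz --delete nm-1234@ssh.aoc.nrao.edu:/lustre/aoc/observers/nm-1234/data/ .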

Browser Access

Go to https://archive.nrao.edu/observers/<user>.  You will need to log in using your observer user name and my.nrao.edu password.  This allows you to navigate through your filesystem in a browser to view and download files.

GlobusOnline

NMASC will be adding a Globus Connect portal in coming months.

7. Software

For detailed information regarding data processing refer to the CASA and AIPS documentation pages.

The following information addresses details specific to executing CASA or AIPS at the NMASC by visiting observers with accounts such as nm-####.

CASA

    • To start the current release version of CASA, type
casa
    • CASA derivatives, such as plotms or the viewer, can be started outside of the casa session, by typing
casaplotms
casaviewer
    • To start the current release version of CASA with integrated pipeline tasks, type
casa-pipe

See the CASA pipeline webpage for more details.

To see a list of other available CASA versions, type:

casa -ls

To start one of these versions, type casa -r <version>, where <version> is the release number as it appears in the list.  For instance, to start CASA 4.4.0, type:

casa -r 4.4.0-el6

 

AIPS

When your nm-#### account is created, the necessary files to run AIPS are also created by default.  Therefore, to run aips on a cluster node, type

aips tv=local

8. Reporting Problems

Please report any problems through the NRAO Helpdesk.  To log in, use the same user ID and password as when accessing the Proposal Submission Tool or the Observation Preparation Tool.  Note that this is a different account name from, but the same password as, your temporary computer account.

9. Access Control Lists

Access Control List overview

Access Control Lists (ACLs) can be used to grant specific users or groups access to files or directories, enabling small teams of observers to create ad hoc sharing rules without exposing their data to the public.  ACLs behave much like the standard Unix read, write, and execute settings for owner, group, and other, but can be applied to any number of users or groups and can be flagged to apply to all new files or sub-directories within a directory.

When observer accounts are created, ACLs are added to the home directory for the observers-mgr group and the apache account to enable Data Analyst support access, Archive Access Tool direct write capability, and https-based data retrieval back to a home institution.

Access Control List details

ACLs are set with the command setfacl and can be queried with the command getfacl.

Typical format for setfacl is:

setfacl -[R]{m|x} [default:]{u|g}:{<username>|<groupname>}:<mask> <path>

Where -R specifies recursive operation, -m or -x selects modify or remove, the optional default: prefix makes the ACL the default for new files, u or g defines whether the ACL applies to a user (<username>) or a group (<groupname>), <mask> is the standard r,w,x permission set, and <path> is a file or directory.

To enable access to an existing directory one would typically run setfacl twice: once to set the ACL on all existing files and sub-directories, and once to set the default on the existing directory and sub-directories so that new files properly inherit the ACL.  The first execution is needed because the default mode only applies ACLs to directories; new files will inherit the ACL but existing files will not.
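
A minimal sketch of this two-step pattern, reusing the example accounts from the examples below (granting nm-4386 read/execute access to nm-6889's data directory):

setfacl -R -m u:nm-4386:rx ~nm-6889/data
setfacl -R -m default:u:nm-4386:rx ~nm-6889/data

The first command applies the ACL to the directory and everything already in it; the second sets it as a default so that files and sub-directories created later inherit it.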

Note that it is possible to have multiple default ACLs on a directory.  All ACLs flagged as default on a directory will be applied to subsequently created sub-directories or files.  Directories must include the execute (x) bit to enable traversal.

ACLs can only be set by the owner of the file or directory, or by system administrators as root, similar to the rules for chmod.

Once ACLs have been set on a file or directory, it is best to continue to use ACLs for permissions instead of chmod, as chmod can sometimes have unintended effects on existing ACLs.  You can use ls -l to see whether a file or directory has ACLs: look for the + sign at the end of the permissions section.  For example:

drwxrws---+ 2 nm-4386 nm-4386 4096 May 25 16:57 data/

Access Control List Examples

Setting ACL with setfacl

To set an ACL to allow observer nm-4386 read/execute access to the home directory of observer nm-6889 do:

setfacl -m u:nm-4386:rx ~nm-6889

Note the above only applies to the directory itself; it will have no effect on existing or new files or sub-directories.  The example for getfacl below shows this ACL.

 

To set a default ACL allowing the observers-mgr group (i.e. data analysts) read/write/execute access to the data subdirectory in the nm-6889 home directory do:

setfacl -m default:g:observers-mgr:rwx ~nm-6889/data

Note the above will not affect existing files or sub-directories but all new files and sub-directories will inherit the ACL.

To remove the above acl do:

setfacl -x default:g:observers-mgr ~nm-6889/data

 

To provide user nm-6889 read access to all existing files in the JVLA VLASS project opt_scripts directory do:

setfacl -m u:nm-6889:r /lustre/aoc/projects/vlass/opt_scripts/*

Note the above will only apply to existing files. 

 

The following default ACL would have to be set on the parent directory to enable access to subsequently created files:

setfacl -m default:u:nm-6889:r /lustre/aoc/projects/vlass/opt_scripts

To remove the above acl do:

setfacl -x default:u:nm-6889 /lustre/aoc/projects/vlass/opt_scripts

 

Querying ACL with getfacl

To view the ACLs on user nm-6889's home directory do:

getfacl ~nm-6889

 

  

Below is the output of the above query with the ACLs pointed out; all other lines merely reflect the standard Unix permissions applied to the directory:

>getfacl ~nm-6889
getfacl: Removing leading '/' from absolute path names 
# file: lustre/aoc/observers/nm-6889 
# owner: nm-6889 
# group: nm-6889 
# flags: -s- 
user::rwx   
user:nm-4386:r-x                   <------user level r-x ACL set for user nm-4386 
group::--- 
group:obs-apache:r-x               <----- group level r-x ACL set for group obs-apache 
group:observers-mgr:rwx            <----- group level rwx ACL set for group observers-mgr 
mask::rwx 
other::--- 
default:user::rwx 
default:group::--- 
default:group:obs-apache:r-x       <----- group level r-x default ACL set for group obs-apache 
default:group:observers-mgr:rwx    <----- group level rwx default ACL set for group observers-mgr 
default:mask::rwx 
default:other::---