# Data Storage and Retrieval

Contributors: jrobnett, ebryer, mhatz, pmurphy

### Data Storage

Observer accounts reside on the Lustre filesystem in /lustre/naasc/observers/<account name>. This area is also where you should store all data products and scratch files. Lustre is a resource shared among all staff and observers, so we ask that everyone keep their usage as far below the 3 TByte limit as possible.
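As a rough way to see how close an area is to the limit, the sketch below sums a directory's size with du and compares it to a threshold. The helper names usage_kb and under_limit are hypothetical, and the script assumes the 3 TByte limit is counted in binary units. The filesystem may account for usage differently, so treat the result as an estimate.

```shell
# usage_kb DIR: print the total size of DIR in KiB (du -sk is POSIX).
usage_kb() {
    du -sk "$1" | awk '{print $1}'
}

# under_limit DIR LIMIT_KB: succeed if DIR uses less than LIMIT_KB KiB.
under_limit() {
    [ "$(usage_kb "$1")" -lt "$2" ]
}

# Assumption: 3 TByte = 3 * 1024^3 KiB (binary units).
LIMIT_KB=$((3 * 1024 * 1024 * 1024))

# Example, with cv-1234 standing in for your account name:
#   under_limit /lustre/naasc/observers/cv-1234 "$LIMIT_KB" && echo "under the limit"
```

Running du over a large area can take a while, so it may be preferable to check individual subdirectories rather than the whole account.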

Please consult the NAASC Data Analysts in order to retrieve data directly from the archive into your home area.

### Data Retrieval

The NAASC supports the following methods for securely transporting data to remote facilities and plans to support XSEDE's Globus Connect platform. In the following examples, <user> is your cv-#### account name, and your data area is /lustre/naasc/observers/<user>.

#### SFTP

SFTP acts like an encrypted version of ftp. You can access your home area from remote facilities by running sftp sftp.cv.nrao.edu and logging in with your NAASC account name. From there, sftp behaves much like any ftp client.

The example below would connect user cv-1234 to the sftp server. The current directory would be cv-1234's home Lustre area.

sftp cv-1234@sftp.cv.nrao.edu
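For repeated downloads, the same session can be scripted. The following is a minimal sketch, not a supported NAASC procedure: write_batch and run_batch are hypothetical helpers, results.tar is a made-up file name, and SHOW is an illustrative variable that prints the command instead of running it when set to echo. Batch mode (-b) cannot prompt for a password, so the sketch assumes key-based authentication is available for your account.

```shell
# write_batch FILE: write a hypothetical list of sftp commands to FILE.
write_batch() {
    cat > "$1" <<'EOF'
cd data
ls -l
get results.tar
bye
EOF
}

# run_batch USER FILE: run the commands in FILE non-interactively.
# Assumes key-based authentication, since -b disables password prompts.
run_batch() {
    ${SHOW:-} sftp -b "$2" "$1@sftp.cv.nrao.edu"
}

# Example:
#   write_batch fetch.sftp
#   run_batch cv-1234 fetch.sftp
```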

#### SCP

SCP is an encrypted copy command that can transfer files between remote hosts. The general format is scp <user>@<remote machine>:<remote path> <local path>. From your machine you would run scp <user>@ssh.cv.nrao.edu:<relative path to files> <local path>, where the remote path is relative to your home area.

The example below would copy all files ('*') in cv-1234's data directory to the current directory ('.') on your machine.

scp cv-1234@ssh.cv.nrao.edu:data/* .
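Two variations may be useful: quoting the remote path so that the wildcard is expanded on the remote side rather than by your local shell, and adding -r to copy a directory recursively. The sketch below wraps both in hypothetical helper functions (fetch_files, fetch_dir). SHOW is an illustrative variable that prints the command instead of running it when set to echo.

```shell
# fetch_files USER: copy every file in the remote data directory here.
# The remote path is quoted so the local shell does not try to expand '*'.
fetch_files() {
    ${SHOW:-} scp "$1@ssh.cv.nrao.edu:data/*" .
}

# fetch_dir USER: copy the data directory itself, including subdirectories.
fetch_dir() {
    ${SHOW:-} scp -r "$1@ssh.cv.nrao.edu:data" .
}

# Example:
#   fetch_dir cv-1234
```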

#### LFTP

LFTP is a more sophisticated file transfer client than classic ftp which, among other things, can use multiple connections to speed up transfers.

lftp -u <user> sftp://sftp.cv.nrao.edu
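To illustrate the multiple-connection feature, the sketch below uses lftp's mirror command with --parallel to download a directory several files at a time. The helper name mirror_data, the choice of four parallel transfers, and the local target ./data are assumptions for illustration. SHOW prints the command instead of running it when set to echo.

```shell
# mirror_data USER: download the remote data directory into ./data,
# transferring up to four files at a time (the count is an assumption).
mirror_data() {
    ${SHOW:-} lftp -u "$1" -e "mirror --parallel=4 data ./data; quit" sftp://sftp.cv.nrao.edu
}

# Example:
#   mirror_data cv-1234
```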

#### RSYNC

RSYNC is a versatile file-copying tool that copies only the files that are needed, that is, the ones that are missing from or differ from your local copy. This is useful if, for example, you have deleted some files from your local copy of your JobID and want to copy just those missing files.

By default, and in the examples shown here, rsync uses ssh as the underlying transport protocol.  It works best if you have your identity already cached in your ssh-agent.

The example below would copy all the files in cv-1234's data directory to a local directory. With the trailing '/' on the remote path, rsync copies only the contents of the remote "data" directory into the local working directory. Without it, rsync copies the directory itself along with its contents, so your local working directory ends up with a "data" subdirectory. Adding '--delete' to the argument list keeps the two areas exactly in sync by removing files from your local copy if they have been removed from the remote copy.

rsync -av cv-1234@ssh.cv.nrao.edu:data/ .
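Because '--delete' removes local files, it can help to preview a transfer before running it. The sketch below uses rsync's --dry-run option for that. The helper names preview_sync and apply_sync are hypothetical, and SHOW is an illustrative variable that prints the command instead of running it when set to echo.

```shell
# preview_sync USER: list what rsync would copy or delete, changing nothing.
preview_sync() {
    ${SHOW:-} rsync -av --dry-run --delete "$1@ssh.cv.nrao.edu:data/" .
}

# apply_sync USER: perform the same transfer for real.
apply_sync() {
    ${SHOW:-} rsync -av --delete "$1@ssh.cv.nrao.edu:data/" .
}

# Example:
#   preview_sync cv-1234    # inspect the list of changes
#   apply_sync cv-1234      # then run the transfer
```

Keeping the preview and the real run as separate functions with identical arguments makes it less likely that the two commands drift apart.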

#### Browser Access

You can use the bulk.cv.nrao.edu web server to retrieve tarred files after your account is closed but before it is deleted.

Go to https://bulk.cv.nrao.edu/observers/<user-account>. You will need to log in using the observer account name and my.nrao.edu password. This will allow you to navigate through your filesystem in a browser and view files.

#### GlobusOnline

The NAASC will be investigating a Globus Connect portal in coming months.
