Moving Data to/from SCG

Globus

Globus is well suited to moving large amounts of data, whether that means large total size, large counts of small files, or both. It offers an easy-to-use web interface and a personal connect client that can be installed on your laptop, desktop, or any other system you have access to, and it provides easy access and security using your Stanford login. The Globus tools can be used to move, copy, or sync data, and they will retry in the background on errors.

Almost all parts of SCG are accessible through Globus, using these collection names & IDs:

SCG Path    Name                      (UUID)                                  Sharable?
~           SRCC SCG Home             (2e23906b-0608-45bb-b344-393b8706e862)  No
/labs       SRCC SCG Lab Storage      (3257fc54-9071-42fa-88ca-6097b2679b9a)  Yes
/projects   SRCC SCG Project Storage  (2a975852-c740-4ff0-b8df-bc66d4888fc9)  Yes
/public     SRCC SCG Public           (9299a0f9-06db-4109-910b-d3b590be2440)  Yes
/reference  SRCC SCG Reference Data   (670e9ef9-70c9-46df-b213-f878e36cfe72)  No
/storage    SRCC SCG Other Storage    (87ab9eb7-a4cb-4bf1-811c-c2cebac2a695)  Yes
/gssc       Stanford GSSC Storage     (ce40ee4c-bf06-4847-bd24-f4b41a6f2581)  Yes

(The old SCG Cluster Storage endpoint is now deprecated, and should not be used.)

Collection names above are links. Click the link to be taken to the Globus web app and load the collection. To transfer files, click on the button “Open in File Manager”. To make a share or view your existing shares, click on the “Collections” tab.
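
For scripted transfers, the same collections can also be addressed by UUID from the command line. The following is a minimal sketch, assuming the separately installed Globus CLI (the globus command) and a prior "globus login"; neither is covered on this page. The local collection ID and both paths are hypothetical placeholders, and how paths map inside the lab collection is an assumption. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: copy a local directory into SCG lab storage with the Globus CLI.
# Assumes the Globus CLI is installed and "globus login" has been run.

# UUID of "SRCC SCG Lab Storage" from the table above.
LAB_COLLECTION="3257fc54-9071-42fa-88ca-6097b2679b9a"
# Hypothetical placeholder: the UUID of your own Globus Connect Personal collection.
LOCAL_COLLECTION="${LOCAL_COLLECTION:-00000000-0000-0000-0000-000000000000}"
# Hypothetical paths; adjust to your own data and lab directory.
SRC_PATH="/~/mydata/"
DST_PATH="/ruthm/mydata/"

# Print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# List the top level of the lab collection to confirm access.
run globus ls "${LAB_COLLECTION}:/"

# Submit a recursive transfer that re-copies only files whose checksums differ.
run globus transfer --recursive --sync-level checksum \
    --label "mydata to SCG" \
    "${LOCAL_COLLECTION}:${SRC_PATH}" "${LAB_COLLECTION}:${DST_PATH}"
```

Printing first makes it easy to confirm the collection IDs and paths before any data moves.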


SCG OnDemand File App

The SCG OnDemand File App (https://ondemand.scg.stanford.edu/) offers an intuitive interface to navigate SCG storage and upload or download files. It also includes built-in tools to view and edit files in the web browser.


Samba

The Samba server at samba.scg.stanford.edu presents SCG storage to Stanford campus networks and the VPN, which makes it easy to mount the storage as a shared drive on your local system. Basic instructions and troubleshooting steps for each major operating system are below, or, if you are feeling lucky and aren’t using Windows, you can try the direct link.

Linux

Open a terminal and run kinit SUNetID@stanford.edu, replacing SUNetID with your SUNetID. For example,

griznog@gambusia:~$ kinit griznog@stanford.edu
Password for griznog@stanford.edu: 
griznog@gambusia:~$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: griznog@stanford.edu

Valid starting       Expires              Service principal
07/15/2019 13:08:37  07/16/2019 14:08:27  krbtgt/stanford.edu@stanford.edu
  renew until 07/22/2019 13:08:27
griznog@gambusia:~$ 
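
If you connect often, you may want to request a ticket only when the cached one is missing or expired. A minimal sketch, assuming the standard Kerberos client tools (klist and kinit) are installed; the SUNETID variable and the ensure_ticket function are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: request a Kerberos ticket only when no valid one is cached.
# SUNETID is a placeholder; it defaults to the local user name here.
SUNETID="${SUNETID:-$(id -un)}"
PRINCIPAL="${SUNETID}@stanford.edu"

ensure_ticket() {
    # "klist -s" prints nothing; it exits 0 only if the ticket cache is
    # readable and not expired.
    if klist -s 2>/dev/null; then
        echo "Valid Kerberos ticket found."
    else
        echo "No valid ticket; running kinit for ${PRINCIPAL}."
        kinit "${PRINCIPAL}"
    fi
}

# Call the function when you are ready to authenticate:
# ensure_ticket
```

Note that this check does not confirm which principal the cached ticket belongs to; run klist without options to see that.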

In Linux, open Nautilus or your distribution’s file manager app and, in the path area or under a menu selection like Connect to server..., enter

smb://samba.scg.stanford.edu/

and you should be presented with a window displaying the available network shares. The special share labeled with your SUNetID is your SCG $HOME directory.

Mac OS X

Open a terminal and run kinit SUNetID@stanford.edu, replacing SUNetID with your SUNetID.

Open Finder, select Connect to server and enter

smb://samba.scg.stanford.edu/

and you should be presented with a Finder window displaying the available network shares. The special share labeled with your SUNetID is your SCG $HOME directory.

Windows

Open the File Explorer, and enter the following path in the “address bar”:

\\samba.scg.stanford.edu\

At the login prompt, use sunetid@stanford.edu as the username, and use your SUNet password as the password. In other words, if your SUNetID is abc, use abc@stanford.edu as your username, and use your SUNet password as your password.

If login still fails, you will need to do some one-time configuration to tell your computer how to authenticate properly. Run these commands in the Command Prompt application (which you will need to run as Administrator):

ksetup /addkdc stanford.edu krb5auth1.stanford.edu
ksetup /addkdc stanford.edu krb5auth2.stanford.edu
ksetup /addkdc stanford.edu krb5auth3.stanford.edu
ksetup /addhosttorealmmap .stanford.edu stanford.edu

The first three commands will give you a warning: “Your realm name stanford.edu has lowercase letters.” This is expected, and you should say “Yes” at the prompt.

Once those commands have run, you will need to restart your computer. After that, you should be able to log in.


rsync

rsync is a time-tested tool for moving data both between remote systems and locally. With a large number of options and features, it’s impossible to completely cover all potential uses of rsync, but we are able to show how we recommend using rsync with SCG.

Basic rsync usage is…

rsync [options] [user@host:]/path/to/source [user@host:]/path/to/target

Note that only one of source and target can be a remote host. An example of copying files from my local system to SCG (including our preferred options) is…

rsync -rltp --chmod Dg+s -v --partial --progress /mydrive/mydata griznog@login.scg.stanford.edu:/labs/ruthm/

The above example includes the following options:

  • -r : Recurse into directories
  • -l : Copy symlinks as symlinks
  • -t : Preserve modification times
  • -p : Preserve permissions
  • --chmod Dg+s : Set the “setgid” bit on all directories. This is needed in SCG /labs and /projects storage for permissions to work properly.
  • -v : Display a list of every file that is transferred; in general, be more verbose about what rsync is doing.
  • --partial : Keep partially transferred files if the transfer fails, so that running the command again can reuse them.
  • --progress : Show a progress bar with transfer speed for each file transferred. This is in addition to the verbosity of -v.
  • /mydrive/mydata : A source directory. Note that including a trailing /, e.g., /mydrive/mydata/, will cause rsync to work on the contents of the directory rather than the directory itself. This is a subtle difference that can lead to confusion on the target copy.
  • griznog@login.scg.stanford.edu:/labs/ruthm/ : A target location. The trailing slash on a target has no significance.
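
The effect of a trailing slash on the source is easiest to see by trying it. The sketch below works entirely in a temporary local directory, so nothing is copied to SCG; the directory names are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: show how a trailing slash on the source changes the result.
# Works entirely in a temporary directory; nothing is copied to SCG.
demo="$(mktemp -d)"
mkdir -p "$demo/mydata" "$demo/without_slash" "$demo/with_slash"
echo "example" > "$demo/mydata/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory itself is copied.
    rsync -rltp "$demo/mydata" "$demo/without_slash/"
    # Trailing slash: only the contents of the directory are copied.
    rsync -rltp "$demo/mydata/" "$demo/with_slash/"
    # Compare the two results.
    find "$demo/without_slash" "$demo/with_slash" -type f
else
    echo "rsync is not installed; skipping demo."
fi
```

The first copy ends up at without_slash/mydata/file.txt, while the second ends up at with_slash/file.txt.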

Some other interesting and useful rsync options are:

  • --delete : Useful when running rsync to update a remote copy where you want to delete any files that have been deleted on the local copy.
  • --remove-source-files : In cases where rsync is being used to quickly clean up data, for instance to reduce usage against a quota, this option removes each file once it has been successfully copied, so you do not have to wait until the entire rsync completes and then delete the files manually. It does not remove directories.
  • --dry-run : Show what rsync would do, but don’t actually do any copy or removal. Useful to test with --delete or --remove-source-files before running a potentially destructive rsync command.
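
A cautious pattern is to run the same command twice, first with --dry-run and then without it. The sketch below illustrates this for --delete, using temporary local directories as stand-ins for a real source and an SCG target; the file names are made up.

```shell
#!/bin/sh
# Sketch: preview a --delete run before doing it for real.
# Uses temporary local directories in place of a real source and SCG target.
work="$(mktemp -d)"
mkdir -p "$work/src" "$work/dst"
echo "keep" > "$work/src/keep.txt"
echo "keep" > "$work/dst/keep.txt"
echo "old"  > "$work/dst/stale.txt"   # exists only on the target

if command -v rsync >/dev/null 2>&1; then
    # Preview: lists what would be copied or deleted, but changes nothing.
    rsync -rltp --delete --dry-run -v "$work/src/" "$work/dst/"
    if [ -f "$work/dst/stale.txt" ]; then
        echo "After dry run: stale.txt is still present"
    fi

    # Real run: stale.txt is removed from the target.
    rsync -rltp --delete -v "$work/src/" "$work/dst/"
    if [ ! -e "$work/dst/stale.txt" ]; then
        echo "After real run: stale.txt has been deleted"
    fi
else
    echo "rsync is not installed; skipping demo."
fi
```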

sftp

sftp provides a secure, encrypted analog to ftp for any remote site where ssh access is available. Example usage:

griznog@lepomis:~$ sftp griznog@login.scg.stanford.edu
Connected to login.scg.stanford.edu.
sftp> ls
Desktop    Documents  Downloads  Logs       Music      Pictures   Projects   
Public     Scratch    Templates  Videos     Working    bin        myfile     
ondemand   rpmbuild   
sftp> help
Available commands:
bye                                Quit sftp
cd path                            Change remote directory to 'path'
chgrp grp path                     Change group of file 'path' to 'grp'
chmod mode path                    Change permissions of file 'path' to 'mode'
chown own path                     Change owner of file 'path' to 'own'
df [-hi] [path]                    Display statistics for current directory or
                                   filesystem containing 'path'
exit                               Quit sftp
get [-afPpRr] remote [local]       Download file
reget [-fPpRr] remote [local]      Resume download file
reput [-fPpRr] [local] remote      Resume upload file
help                               Display this help text
lcd path                           Change local directory to 'path'
lls [ls-options [path]]            Display local directory listing
lmkdir path                        Create local directory
ln [-s] oldpath newpath            Link remote file (-s for symlink)
lpwd                               Print local working directory
ls [-1afhlnrSt] [path]             Display remote directory listing
lumask umask                       Set local umask to 'umask'
mkdir path                         Create remote directory
progress                           Toggle display of progress meter
put [-afPpRr] local [remote]       Upload file
pwd                                Display remote working directory
quit                               Quit sftp
rename oldpath newpath             Rename remote file
rm path                            Delete remote file
rmdir path                         Remove remote directory
symlink oldpath newpath            Symlink remote file
version                            Show SFTP version
!command                           Execute 'command' in local shell
!                                  Escape to local shell
?                                  Synonym for help
sftp> 
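
The same commands can also be sent without typing them at the prompt. A minimal sketch, assuming sftp reads its commands from standard input when that is not a terminal and that your login method still works in that mode; REMOTE, USER, and the file name are hypothetical placeholders. By default the script only prints what it would send; set DRY_RUN=0 to connect.

```shell
#!/bin/sh
# Sketch: run a fixed list of sftp commands without typing them interactively.
# REMOTE is a hypothetical placeholder; replace USER with your SUNetID.
REMOTE="${REMOTE:-USER@login.scg.stanford.edu}"

# The commands to send; all of them appear in the help listing above.
SFTP_COMMANDS='lcd /tmp
put myfile
ls -l myfile
bye'

if [ "${DRY_RUN:-1}" = "0" ]; then
    # Feed the command list to sftp on standard input.
    printf '%s\n' "$SFTP_COMMANDS" | sftp "$REMOTE"
else
    echo "Would send these commands to sftp ${REMOTE}:"
    printf '%s\n' "$SFTP_COMMANDS"
fi
```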

scp

scp provides a secure/encrypted analog to cp which works with remote sources or targets. Example usage:

griznog@lepomis:~$ scp myfile griznog@login.scg.stanford.edu:
myfile                                               100%    0     0.0KB/s   00:00    

Useful options are -r for recursion (to copy directories) and -v for verbose output.
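
The sketch below shows both directions: a recursive upload and a single-file download. REMOTE, USER, and all paths are hypothetical placeholders. The commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: recursive upload and single-file download with scp.
REMOTE="${REMOTE:-USER@login.scg.stanford.edu}"   # USER is a placeholder

# Print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Upload a directory and everything below it (hypothetical paths).
run scp -r /mydrive/mydata "${REMOTE}:/labs/ruthm/"

# Download one file from the remote home directory into the current directory.
run scp "${REMOTE}:myfile" .
```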