Introduction

If you are moving data to or from the cloud (including Google Drive), the best way to do it from SCG is with rclone. rclone supports many kinds of data sources and destinations, including most types of cloud storage.

To use rclone, run module load rclone in your Terminal (or job script).
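As a minimal sketch, the start of a job script might load the module and then confirm that rclone is actually available before doing any work. The variable name rclone_ready is only illustrative, and the check for the module command is an assumption that lets the same script run harmlessly on machines without environment modules.

```shell
#!/bin/bash
# "module" only exists on systems with environment modules (such as SCG),
# so only call it when the shell knows about it.
if type module >/dev/null 2>&1; then
    module load rclone
fi

# Confirm that rclone is now on the PATH before relying on it.
if command -v rclone >/dev/null 2>&1; then
    rclone version
    rclone_ready=yes
else
    echo "rclone is not available in this shell" >&2
    rclone_ready=no
fi
```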

The first time you want to use a remote, you must configure it. Do this by running rclone config, choosing to add a New remote, and then following the instructions. Each kind of remote will have a different setup procedure.
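Because rclone config is interactive, a job script cannot easily run it. One possible approach is to check that the remote already exists and stop early if it does not. The sketch below relies on rclone listremotes, which prints each configured remote on its own line followed by a colon. The function name remote_exists and the remote name my-drive are hypothetical.

```shell
#!/bin/bash
# Return success if a remote with the given name is already configured.
# -F: match literally, -x: match the whole line, -q: print nothing.
remote_exists() {
    rclone listremotes | grep -Fxq "$1:"
}

if remote_exists my-drive; then
    echo "Remote my-drive is configured."
else
    echo "Remote my-drive not found; run 'rclone config' first." >&2
fi
```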

Here are some instructions for certain cloud services commonly used with SCG:

Medicine Box

Medicine Box is the default Box instance used within the School of Medicine. It allows High Risk data and PHI. Because of this, rclone and SCG do not work with Medicine Box. If you need to move data from SCG to Medicine Box, the only option is to transfer the data to a macOS or Windows machine first, and then move it to Box from there.

Google Drive

Google Drive has two components:

  • Each Stanford user has their own, personal Google Drive space.

  • Google Shared Drives (formerly known as “Team Drive”) provides a common space for access to files, both for people within a Google Group, and for other users, including Google Service Accounts.

NOTE: Google Drive has a number of limitations:

  • Shared Drives may contain a maximum of 750,000 inodes. Each file and directory consumes one “inode” (just like with Oak storage).

  • Each user and Service Account may only upload 750 GiB of data per day. This is a hard limit and cannot be increased.

  • There are also limits on how many operations a user or Service Account may perform within Google Drive. These limits are not clearly documented. Accessing Google Drive too frequently may result in errors from the Google Drive service. rclone will recognize the situation and attempt to slow down, but may end up returning errors to the user. When that happens, waiting one day and re-trying the operation normally resolves the problem. Using your own OAuth client_id might also help, as that allows you to request API quota increases.
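The first two limits can be checked with a little arithmetic before you start a large upload. The sketch below is one way to do that, not an official procedure. It counts the files and directories in a local tree, works out the average rate that would use up 750 GiB in a day, and wraps an upload that stays under that rate. The function names and the 8M cap are assumptions chosen for illustration. --bwlimit and --drive-stop-on-upload-limit are rclone flags, but check rclone's documentation for the version installed on your system.

```shell
#!/bin/bash
# Count every file and directory below a local directory. Each one becomes
# a separate item in the Shared Drive and counts toward the 750,000 limit.
# (Assumes no file names contain newline characters.)
count_entries() {
    find "$1" -mindepth 1 | wc -l | tr -d '[:space:]'
}

# Average transfer rate that would use up the whole 750 GiB daily quota:
# 750 GiB = 750 * 1024 * 1024 KiB, and one day has 86400 seconds.
quota_kib_per_sec=$(( 750 * 1024 * 1024 / 86400 ))
echo "750 GiB/day is about ${quota_kib_per_sec} KiB/s"

# Upload with a bandwidth cap below that rate (8M means 8 MiB/s).
# --drive-stop-on-upload-limit makes rclone stop, rather than keep retrying,
# once Google Drive reports that the daily upload limit has been reached.
upload_within_quota() {
    local src="$1" dest="$2"
    rclone copy --bwlimit 8M --drive-stop-on-upload-limit "$src" "$dest"
}
```

For example, count_entries /path/to/data prints the number of items an upload of that directory would add, and upload_within_quota /path/to/data my-shared:data starts the capped upload (both paths and the remote name my-shared are placeholders).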

If you would like to use rclone to access your personal Google Drive and one or more Shared Drives, you will need to configure a separate remote for each.

The rclone documentation page for Google Drive is https://rclone.org/drive/. Before setting up a Google Drive remote with rclone, please review this entire section.

Before you begin configuring the remote, you should decide if you want to use your own OAuth client_id, and if you want to use a Service Account. Google Drive authenticates both the user performing an action and the software being used to perform the action.

  • The OAuth client_id is used to authenticate the client (in this case, rclone) to Google Drive. rclone ships with an OAuth client_id that is shared by all users of rclone across the world. If you plan on using rclone regularly with Google Drive, you should consider getting your own OAuth client_id. Doing so means the actions of others will not affect your API quota, and gives you access to request API quota increases from Google.

  • A Service Account decouples the Google Drive access from your Google account. When you authenticate to Google as yourself, actions performed using rclone are performed as you. If your Google account becomes disabled, or you upload too much data in a day (either through rclone or elsewhere), the rclone remote would stop working. If you place your rclone configuration in a shared location, then others could use the rclone configuration to perform actions in Google Drive as you. This can be avoided by creating a Google Service account, which acts as a separate user for access to Google Drive. However, a Service Account may only be used with Shared Drives.

To use a custom OAuth client_id or Service Account, you will need a GCP (Google Cloud) project. This means that a PTA is required, but Google Drive is not a charged service, so you should not see any charges from your GCP project (unless, of course, you start using it for something else).

If you are a user of SCG, you can set up a Google Cloud project through SCG. Otherwise, you can set up a Google Cloud project through University IT. Afterwards, you can use the following pages to set up an OAuth client_id and a Service Account:

You should now be prepared to create the Google Drive remote(s) in rclone. Remember that each Drive (your personal Drive and each Shared Drive) needs to be created as a separate remote.

When configuring a Google Drive remote, you will first be asked for a Client ID and Client Secret. If you chose to create an OAuth client_id, enter the ID and secret here.

For scope of access, you should either choose “drive” (for read-write access) or “drive.readonly” (for read-only access). Read-only access may be useful for workflows that only download data from Drive.

The root folder ID should be left blank.

If you chose to use a Service Account, you will get a JSON file containing Service Account credentials. This is the time to provide the path to the JSON file. If you are not using a Service Account, the file path should be left blank.

You should choose to not edit advanced config.

If you chose not to use a Service Account, you will now be asked whether to use auto config. This is where you authenticate to Google and give rclone access to either your Drive or a Shared Drive. In general, you should choose “No”. You will be given a (long) URL to enter into a web browser, which will log you in to Google and give you a (shorter) code to give to rclone.

You may now be asked to configure the remote as a team drive (rclone's older name for a Shared Drive). If you used a Service Account, you should choose “Yes”. If you authenticated as yourself, you can choose “Yes” to connect to a Shared Drive, or “No” to connect to your personal Google Drive. When you choose “Yes”, you will see a list of Shared Drives you can access. If you are using a Service Account and cannot see the Shared Drive you want, make sure the Service Account’s email address has access to the Shared Drive.
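For a Service Account, the same answers can in principle be supplied in a single command with rclone config create, because no browser login is involved. The sketch below is an illustration under several assumptions. It assumes a reasonably recent rclone that accepts key=value arguments, and that you already know the ID of the Shared Drive. The function name, the remote name, the key file path, and the drive ID are all placeholders. The keys scope, service_account_file, and team_drive correspond to the scope, Service Account file, and team drive questions in the interactive setup.

```shell
#!/bin/bash
# Create a Google Drive remote that uses a Service Account and points
# at one Shared Drive, without going through the interactive prompts.
#   $1 = name for the new remote
#   $2 = path to the Service Account JSON credentials file
#   $3 = ID of the Shared Drive
create_shared_drive_remote() {
    local name="$1" key_file="$2" drive_id="$3"
    rclone config create "$name" drive \
        scope=drive \
        service_account_file="$key_file" \
        team_drive="$drive_id"
}
```

It would be called as create_shared_drive_remote my-shared /path/to/key.json DRIVE_ID. If the command does not behave as expected with your rclone version, fall back to the interactive rclone config procedure described above.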

At this point, Google Drive remote configuration is complete! To see the list of folders in your Drive, use the lsd sub-command, like so:

rclone lsd my-drive:

(The above command assumes you named your Drive remote “my-drive”.)

Other services

To see the full list of cloud services rclone supports, see the rclone documentation.

When using rclone, if it asks you to use auto-config, you should normally say “No”. SCG is a remote/headless machine and choosing auto-config will normally launch a web browser, which won’t be able to come up via a terminal session. If you are using a virtual desktop via OnDemand, however, you can say “Yes” because rclone will be able to open a browser.

Here are quick links to remote-specific instructions that will be useful to SCG users:

Once your remote has been configured, you should check the rclone docs to see what commands are available. In the Subcommands section, we suggest reading up on the lsd, ls, copy, and sync subcommands.
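To give a feel for how those four subcommands fit together, here is a small sketch. The function names, the remote name my-drive, and the folder names are hypothetical. The important difference is that copy only adds or updates files at the destination, while sync also deletes files at the destination that are missing from the source, so it is worth previewing with --dry-run.

```shell
#!/bin/bash
# List what is on the remote, then download one folder.
#   $1 = remote name, $2 = folder on the remote, $3 = local destination
fetch_folder() {
    local remote="$1" folder="$2" dest="$3"
    # lsd shows directories only; ls shows files with their sizes.
    rclone lsd "${remote}:" || return 1
    rclone ls "${remote}:${folder}" || return 1
    # copy never deletes anything at the destination.
    rclone copy "${remote}:${folder}" "$dest"
}

# sync makes the destination match the source, deleting extra files there,
# so preview the changes with --dry-run before running it for real.
#   $1 = source, $2 = destination
preview_sync() {
    rclone sync --dry-run "$1" "$2"
}
```

For example, fetch_folder my-drive results ./results downloads the results folder, and preview_sync ./results my-drive:results reports what a sync in the other direction would change without modifying anything.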