Transferring files with remote computers

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How do I transfer files to (and from) the cluster?

Objectives
  • Transfer files to and from a computing cluster.

Performing work on a remote computer is not very useful if we cannot get files to or from the cluster. There are several options for transferring data between computing resources using CLI and GUI utilities, a few of which we will cover.

Download Files from the Internet with wget and git

One of the most straightforward ways to download files is to use either wget or git. These are usually installed in most Linux shells; on macOS and in Git Bash, wget may need to be installed separately. Any file that can be downloaded in your web browser through a direct link can be downloaded using wget, which makes it a quick way to fetch datasets or source code. The syntax is wget followed by the URL of the file.

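git can be used to download files and code from a repository, such as one hosted on GitHub. You can copy an entire repository into your current directory with git clone followed by the repository URL.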
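As a minimal sketch of both commands, the snippet below prints the wget and git clone invocations instead of running them, so it works without network access. The two URLs are hypothetical placeholders, not real datasets or repositories; remove the echo and substitute a real URL to perform the download.

```shell
# Hypothetical URLs for illustration only; substitute your own.
file_url="https://example.org/data/measurements.csv"
repo_url="https://github.com/example-user/example-repo.git"

# For a plain direct link, wget saves to the last component of the URL path.
saved_as=$(basename "$file_url")
echo "wget $file_url    # would save to ./$saved_as"

# git clone creates a directory named after the repository, minus ".git".
clone_dir=$(basename "$repo_url" .git)
echo "git clone $repo_url    # would create ./$clone_dir/"
```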

Transferring Single Files with scp

To copy a single file to or from the cluster, we can use scp (“secure copy”). The syntax can be a little complex for new users, but we’ll break it down. The scp command is a relative of the ssh command we used to access the system, and can use the same public-key authentication mechanism.

To upload to another computer, the template command is

[you@laptop:~]$ scp local_file SUNetID@login.farmshare.stanford.edu:remote_destination

in which @ and : are field separators and remote_destination is a path relative to your remote home directory, or a new filename if you wish to change it, or both a relative path and a new filename. If you don’t have a specific folder in mind you can omit the remote_destination and the file will be copied to your home directory on the remote computer (with its original name). If you include a remote_destination, note that scp interprets this the same way cp does when making local copies: if it exists and is a folder, the file is copied inside the folder; if it exists and is a file, the file is overwritten with the contents of local_file; if it does not exist, it is assumed to be a destination filename for local_file.
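Because scp treats the destination the same way cp does, the three cases above can be tried locally without touching the cluster. This is only a sketch using cp in a throwaway directory; the file and folder names are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
echo "new data" > "$work/local_file"
mkdir "$work/existing_dir"
echo "old data" > "$work/existing_file"

# Case 1: destination exists and is a folder -> the file lands inside it.
cp "$work/local_file" "$work/existing_dir"
# Case 2: destination exists and is a file -> its contents are overwritten.
cp "$work/local_file" "$work/existing_file"
# Case 3: destination does not exist -> it becomes the new filename.
cp "$work/local_file" "$work/renamed_copy"

ls "$work/existing_dir"      # lists local_file
cat "$work/existing_file"    # prints: new data
cat "$work/renamed_copy"     # prints: new data
```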

Transferring a Directory with scp

To transfer an entire directory, we add the -r flag for “recursive”: copy the item specified, and every item below it, and every item below those… until it reaches the bottom of the directory tree rooted at the folder name you provided.

[you@laptop:~]$ scp -r local_dir SUNetID@login.farmshare.stanford.edu:

Caution

For a large directory – either in size or number of files – copying with -r can take a long time to complete.
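Before starting a recursive copy, it can help to check how much data is involved. The sketch below builds a small throwaway directory as a stand-in for local_dir and reports its file count and total size with find and du; the contents are invented for illustration.

```shell
# Build a small stand-in for local_dir in a temporary location.
local_dir=$(mktemp -d)
mkdir -p "$local_dir/results/run1"
echo "a" > "$local_dir/notes.txt"
echo "b" > "$local_dir/results/summary.txt"
echo "c" > "$local_dir/results/run1/output.txt"

# Count regular files anywhere below the directory
# (tr strips the padding some versions of wc add).
n_files=$(find "$local_dir" -type f | wc -l | tr -d ' ')
echo "files to transfer: $n_files"

# Total size of the tree, in human-readable units.
du -sh "$local_dir"
```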

When using scp, you may have noticed that a : always follows the remote computer name. A string after the : specifies the remote directory you wish to transfer the file or folder to, including a new name if you wish to rename the remote material. If you leave this field blank, scp defaults to your home directory and the name of the local material to be transferred.

On Linux computers, / is the separator in file or directory paths. A path starting with a / is called absolute, since there can be nothing above the root /. A path that does not start with / is called relative, since it is not anchored to the root.
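The rule above can be expressed as a one-line check on the first character of a path. This is a minimal sketch; path_kind is a hypothetical helper name and the example paths are placeholders.

```shell
# Classify a path by whether it starts with "/".
path_kind() {
    case "$1" in
        /*) echo "absolute" ;;
        *)  echo "relative" ;;
    esac
}

path_kind /home/user/data      # prints: absolute
path_kind data/results.csv     # prints: relative
```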

If you want to upload a file to a location inside your home directory – which is often the case – then you don’t need a leading /. After the :, you can type the destination path relative to your home directory. If your home directory is the destination, you can leave the destination field blank, or type ~ – the shorthand for your home directory – for completeness.

With scp, a trailing slash on the target directory is optional, and has no effect. A trailing slash on a source directory is important for other commands, like rsync.

Transferring Data with rsync

As you gain experience with transferring files, you may find the scp command limiting. The rsync utility provides advanced features for file transfer and is typically faster than both scp and sftp. It is especially useful for transferring large and/or many files and for synchronizing folder contents between computers.

The syntax is similar to scp. To transfer to another computer with commonly used options:

[you@laptop:~]$ rsync -avP local_file SUNetID@login.farmshare.stanford.edu:

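The options are:

  • -a (archive): preserve file timestamps, permissions, and folders, among other things; implies recursion
  • -v (verbose): print verbose output to help monitor the transfer
  • -P (partial/progress): keep partially transferred files in case of an interruption and display the progress of the transfer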

To recursively copy a directory, we can use the same options:

[you@laptop:~]$ rsync -avP local_dir SUNetID@login.farmshare.stanford.edu:~/

As written, this will place the local directory and its contents under your home directory on the remote system. If a trailing slash is added to the source, a new directory corresponding to the transferred directory will not be created, and the contents of the source directory will be copied directly into the destination directory.
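The effect of the trailing slash is easiest to see by trying both forms locally; the same rule applies when the destination is a remote system. The sketch below assumes rsync is installed and uses throwaway directories with made-up names.

```shell
# Local demonstration of the trailing-slash rule.
if command -v rsync >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir -p "$demo/local_dir" "$demo/dest_a" "$demo/dest_b"
    echo "data" > "$demo/local_dir/file.txt"

    # No trailing slash: the directory itself is created under the destination.
    rsync -a "$demo/local_dir" "$demo/dest_a/"
    # Trailing slash: only the contents are copied into the destination.
    rsync -a "$demo/local_dir/" "$demo/dest_b/"

    ls "$demo/dest_a"    # lists local_dir
    ls "$demo/dest_b"    # lists file.txt
else
    echo "rsync is not installed; skipping the demonstration"
fi
```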

To download a file or directory, we simply swap the source and destination:

[you@laptop:~]$ rsync -avP SUNetID@login.farmshare.stanford.edu:remote_dir ./
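When you are unsure what a given rsync command will copy, the -n (--dry-run) option lists the files that would be transferred without moving any data. The sketch below tries this locally, assuming rsync is installed; the directory names are placeholders.

```shell
# Preview a transfer without copying anything.
if command -v rsync >/dev/null 2>&1; then
    preview=$(mktemp -d)
    mkdir -p "$preview/src" "$preview/dest"
    echo "data" > "$preview/src/file.txt"

    # -n (--dry-run) reports what would be sent but leaves the destination alone.
    rsync -avn "$preview/src/" "$preview/dest/"

    # The destination is still empty after a dry run, so this prints nothing.
    ls -A "$preview/dest"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

Dropping the -n from the same command then performs the real transfer.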

Transferring Data with Globus

While scp and rsync are excellent tools for quick transfers from your local machine to HPC filesystems, sometimes you need to transfer many gigabytes or even terabytes of data. In this case, scp and rsync may not be robust enough to complete your data transfer; they require a constant connection. If there is an interruption in your connection then you will need to rerun the command, often from the beginning to ensure that files were not corrupted during the disconnect. This is where Globus comes in.

Globus is a not-for-profit service developed and operated by the University of Chicago. Globus allows you to move, share, and access data on any system where the Globus software is installed. Globus can be installed on your laptop or lab computer, and many universities and national labs have Globus installed for their core facilities and HPC systems.

Globus has three main advantages over scp and rsync:

Installing Globus on Your Local Machine

To use Globus to transfer data between your local machine and FarmShare, you will first need to install Globus on your own computer.

  1. Go to https://www.globus.org/globus-connect-personal.
  2. Click the “INSTALL NOW” link for the version of Globus Connect Personal that matches your operating system (e.g. macOS, Windows, Linux).
  3. Follow the installation instructions for your operating system.

How to Transfer Data from Your Local Machine to FarmShare

Once Globus Connect Personal is installed on your local machine, you can transfer data between your computer and FarmShare.

  1. In your local machine’s web browser, go to https://app.globus.org/. You may need to authenticate with your SUNetID or Cardinal Key first (select “Stanford University” from the dropdown menu that appears).
  2. In the top right-hand corner, click the Panels icon that has two panels (the middle one).
  3. Above the left panel, click the Search field next to “Collection”.
  4. In the screen that opens, click “Your Collections” and select the collection that represents your local machine.
  5. Navigate to the files/folders you want to transfer and select them.
  6. Above the right hand panel, click the Search field next to “Collection”.
  7. Start typing “FarmShare” and select that collection once it appears.
  8. Navigate to the desired destination for your files.
  9. Once your source files and desired destination are selected, you can begin the transfer. Click the “Start” button over the left panel.

Transferring Data to Your Local Machine

The above steps can also be used to transfer data from FarmShare to your local machine. Simply select the files you want to transfer from FarmShare and the desired destination on your local machine, and then click the “Start” button over the right-side “FarmShare” panel.

Transferring Data with FarmShare OnDemand

FarmShare OnDemand is a web interface to the FarmShare HPC system. OnDemand allows you to manage your files, access a shell session, and use interactive apps like JupyterLab, RStudio, and VS Code. We will only discuss OnDemand’s File Manager below, but we will cover the other features in-depth in the next section.

Managing Files with the FarmShare OnDemand File Manager

To create, edit or move files, click on the Files menu from the Dashboard page. A drop-down menu will appear, listing your most common storage locations on FarmShare: $HOME, Class Directories, Group Directories, and $SCRATCH.

Choosing one of the file spaces opens the File Explorer in a new browser tab. The files in the selected directory are listed.

There are two sets of buttons in the File Explorer.

[Figure: OnDemand File Explorer buttons next to filename]

Those buttons allow you to View, Edit, Rename, Download, or Delete a file.

[Figure: OnDemand File Explorer buttons in top right menu]

Button             Function
Open in Terminal   Open a terminal window on FarmShare in a new browser tab
Refresh            Refresh the list of directory contents
New File           Create a new, empty file
New Directory      Create a new subdirectory
Upload             Copy a file from your local machine to FarmShare
Download           Download selected files to your local machine
Copy/Move          Copy or move selected files (after navigating to a different directory)
Delete             Delete selected files
Change directory   Change your current working directory
Copy path          Copy the current working directory path to your clipboard
Show Dotfiles      Toggle the display of dotfiles (files starting with a ., which are usually hidden)
Show Owner/Mode    Toggle the display of owner and permission settings

Working with Windows

Transferring text files from a Windows system to a Unix system (Mac, Linux, BSD, Solaris, etc.) can cause problems, because Windows encodes line endings slightly differently from Unix and adds an extra character to every line.

On a Unix system, every line in a file ends with a \n (newline). On Windows, every line in a file ends with a \r\n (carriage return + newline).

Though most modern programming languages and software handle this correctly, in some rare instances you may run into an issue. The solution is to convert the file from Windows to Unix line endings with the dos2unix command.

You can identify if a file has Windows line endings with cat -A filename. A file with Windows line endings will have ^M$ at the end of every line. A file with Unix line endings will have $ at the end of a line.

To convert the file, just run dos2unix filename. (Conversely, to convert back to Windows format, you can run unix2dos filename.)
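If dos2unix is not installed, the same check and conversion can be approximated with standard tools. The sketch below creates a small file with Windows line endings, counts the affected lines with grep, and strips the carriage returns with tr. Note that tr removes every carriage return in the file, not only those at line ends, so this is a stand-in for dos2unix rather than an exact equivalent. The filenames are placeholders.

```shell
work=$(mktemp -d)
# Create a two-line file with Windows (CRLF) line endings.
printf 'first line\r\nsecond line\r\n' > "$work/windows.txt"

# Store a literal carriage return so grep can search for it.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" "$work/windows.txt")
echo "lines containing a carriage return: $crlf_lines"     # prints 2

# Delete every carriage return and write the result to a new file.
tr -d '\r' < "$work/windows.txt" > "$work/unix.txt"

# grep exits non-zero when nothing matches, hence the "|| true".
remaining=$(grep -c "$cr" "$work/unix.txt" || true)
echo "lines still containing a carriage return: $remaining"  # prints 0
```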

Key Points

  • wget and git clone download files from the internet.

  • scp and rsync transfer files to and from your computer.

  • You can use Globus or FarmShare OnDemand to transfer data through a GUI.