Command Line Training¶
To make the most out of computing infrastructures like the ScienceCloud, ScienceCluster, as well as the Supercomputer - Alps, you may find it helpful to learn to use the Command Line Interface (CLI) for scientific computing.
All our computing services run on open source, Linux-based operating systems. You may have heard of Ubuntu, a popular Linux distribution that also powers massive supercomputers. See Context below for more information about open source and scientific computing.
These training materials should help you:
- Learn the fundamentals of the command line and how it relates to your research computing workflows.
- Understand the structure of the Linux filesystem.
- Become familiar with the basic commands and syntax of a shell programming language (which will transfer to multiple shells and operating systems).
Bash
Please note: although many of these concepts apply across shell languages, the provided training materials use the Bash shell language. Using a different shell on your personal computer may result in slight variations from the examples.
For the best experience, consider taking the Command Line training workshop.
How do I work through these materials?¶
The terminal application you use to access a command line on your local computer depends on your operating system; here are the default options:
- MacOS and Linux: use the Terminal application
- Windows: use PowerShell, or consider WSL or Multipass - both will give you the ability to install an Ubuntu Linux virtual machine.
Either take the training workshop to gain access to an Ubuntu Linux virtual machine, or use a terminal application on your local machine.
From these terminal applications you can either use your operating system's default shell (which may result in slightly different outputs than shown in these examples), or you can use the ssh command/program to connect to a machine that uses Bash (the shell language used to develop these materials).
For the course, as well as whenever you use the ScienceCloud and the ScienceCluster, you will use ssh to connect to a virtual machine (VM).
What is the "command line"?¶
The "command line" interface (often abbreviated "CLI") is a system that allows users to interact with a computer using typed commands.
Learning how to use a CLI will not only give you greater skills with computers, it will allow you to customize your research workflows so that you can make optimal use of the most powerful computing infrastructures.
Filesystems¶
What is a "filesystem"?¶
At an abstract level, one could model a computer as a machine that necessarily includes:
- Datasets and a system for storing such data
- Programs and applications that both run the computer system itself and manipulate the available data
All data for a computer (i.e., datasets, user software, operating system software, etc.) is stored within what is called a filesystem. It is the filesystem that dictates how data is structured on any storage device (e.g., a hard drive, a USB stick, etc.).
Importantly, there are multiple types of filesystems, and not all filesystems are compatible with all computer operating systems.
Examples of filesystems include:
- vfat: an older filesystem used by MS DOS
- ntfs: the default filesystem for Windows
- ext4: the default filesystem in most GNU/Linux distributions; used for ScienceCloud volumes
- apfs: the MacOS filesystem
Structure¶
Although not all filesystems are identical, many of them share a similar hierarchical tree structure.
In a hierarchical tree filesystem everything starts from the root directory, which is represented in Bash and other command line languages as /.
⚠️ The / Character
The / character alone represents the entire root directory and all its subdirectories. If a command acts or operates on the / symbol, especially recursively, then it will affect the entire filesystem.
Here's a diagram of a sample filesystem:
/
├─ bin/
│ └─ ...
├─ home/
│ ├─ user/
│ │ ├─ Documents/ ← this is an example directory!
│ │ │ └─ example.txt ← this is an example file!
│ │ └─ Pictures/
│ │ └─ photo.png
│ └─ second_user/
│ └─ ...
├─ sbin/
│ └─ ...
├─ var/
│ └─ ...
└─ .../
Within a hierarchical filesystem, a directory is a "branch" on the hierarchical tree. When using a GUI to control a computer's files, directories are commonly represented as folders. Thus, files can be thought of as being located at (or within) a specific directory (just as files can be considered as being within folders on a graphical desktop).
Directories themselves can have directories within or under them, which are called subdirectories.
Some familiar directories/locations you will see in many filesystems are:
- bin/: includes system user command binaries ("bin" is short for "binaries")
- home/: includes all user home directories for the system
- sbin/: includes essential system command binaries
- var/: includes "variable length" types of files (e.g., logs, temporary files, etc.)
There are many other directories/locations you'll find across operating systems. It's important to remember: not all locations in the filesystem are safe to freely alter. Changing files in certain locations can lead to operating system failure or corruption.
Dotfiles¶
In order to help keep filesystems as accident-proof as possible, filesystems make use of dotfiles. A dotfile is exactly what the name states: a file (or a directory) that begins with a . character.
Dotfiles are hidden by default: they are not displayed unless you take specific action (e.g., use the -a flag with the ls command).
Dot Directories
Directories can also start with . (dot directories). As with dotfiles, they are hidden by default. Otherwise, they act like and can be treated like standard directories.
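As a minimal sketch of this behaviour, the commands below create a throwaway directory with mktemp (so nothing in your home directory is touched) and place one regular file and one dotfile inside it. The file names and the sandbox variable are made up for illustration.

```shell
# Work in a temporary sandbox directory so no real files are affected
sandbox=$(mktemp -d)
touch "$sandbox/visible.txt" "$sandbox/.hidden.txt"

# Plain ls skips names that begin with a dot
plain_listing=$(ls "$sandbox")
echo "$plain_listing"

# ls -a also lists dotfiles (plus the special entries . and ..)
full_listing=$(ls -a "$sandbox")
echo "$full_listing"
```

The first listing should show only visible.txt, while the second should additionally include .hidden.txt.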
Paths¶
As noted, a directory is a "branch" within a filesystem where files (or other subdirectories) can be located; i.e., a directory is a location in a filesystem.
To refer to any location (i.e., directory or file) within a filesystem, a path to the location of interest is used. There are two types of paths:
- Absolute paths: include the entire location of a directory or file starting from the root directory; absolute paths always start with /
- Relative paths: include the location of a directory or file in relation to (i.e., relative to) the user's current location in the filesystem (see below)
From the sample filesystem above, an example of an absolute path to a file is:
/home/user/Documents/example.txt
To reiterate: all absolute paths start with /. The same / character is also used in paths (both absolute and relative) to distinguish between depths or levels of the hierarchical tree.
An example of a relative path is:
Documents/example.txt
In contrast to absolute paths, relative paths never begin with /. They describe the path to a file or directory with reference to your current working directory, which is the current location of your session within the filesystem.
The current working directory in the Documents/example.txt example is (with reference to the sample filesystem diagram) the /home/user/ directory.
How do you know your current location? The command prompt tells you, or you can use the pwd command.
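The difference between the two path types can be tried out safely with a sketch like the following. It rebuilds a small piece of the sample filesystem inside a temporary directory (the sandbox variable and the file content are assumptions for illustration), then reads the same file once through a relative path and once through an absolute path.

```shell
# Build a small tree resembling the sample filesystem inside a sandbox
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/user/Documents"
echo "hello" > "$sandbox/home/user/Documents/example.txt"

# Make the sandboxed home directory the current working directory
cd "$sandbox/home/user"
pwd

# Relative path: resolved from the current working directory
relative_result=$(cat Documents/example.txt)

# Absolute path: starts with / and works from any location
absolute_result=$(cat "$sandbox/home/user/Documents/example.txt")

echo "$relative_result"
echo "$absolute_result"
```

Both commands read the same file, so both variables should hold the same text. The relative form only works while the current working directory is the sandboxed home/user directory.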
Further info:
Paths to files and directories are formatted identically, though some programmers prefer to write directory paths with a trailing / character. In most cases it is equivalent to include the final / character. However, some command line tools will interpret a path with a trailing / character differently (e.g., rsync).
Beginning on the CLI¶
Command Prompt¶
When you arrive at a CLI you see what is called the command prompt. It is designed to help communicate who and where you are on a system. It often looks something like this:
username@hostname:~$
The values username and hostname in this example are specifically chosen as these are two of the principal values that comprise the command prompt.
Piece by piece, the example command prompt includes:
- the username is your current authenticated username on the computer
- the hostname is the name of the computer to which your command line session is connected
- the @ character connects the username with the hostname
- the : separates the username@hostname information from the displayed location within the computer's filesystem
- the ~ is the special symbol used to denote the home directory for the user; this is often the default location when starting a command line session on a machine
  - the ~ symbol will change to show path locations as you navigate through a filesystem (e.g., with cd)
  - in other words, this part of the command prompt shows your current location within the filesystem
- the $ denotes the end of the command prompt; your typed commands will come afterwards
Further info: The specifics of your command prompt may vary according to your operating system.
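To see where these pieces come from, the following sketch looks up each value with a standard command and assembles them in the same layout. It is only an approximation of a real prompt: the variable names are invented for illustration, and the full path is printed rather than the ~ abbreviation.

```shell
# Look up the pieces that a typical prompt displays
current_user=$(id -un)      # authenticated username
current_host=$(uname -n)    # name of the machine (hostname)
current_dir=$(pwd)          # current location in the filesystem

# Assemble them in the username@hostname:location$ layout
prompt_like="${current_user}@${current_host}:${current_dir}\$"
echo "$prompt_like"
```

In Bash, the actual prompt layout is controlled by the PS1 variable, which is why prompts can look different from one system to the next.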
What is a "shell"?¶
When inputting commands into a command prompt, what exactly happens with/to/from those commands?
To answer this question, it's necessary to understand that a computer's operating system is the entirety of the software (sometimes called the "software stack") that makes the computer functional.
Within the operating system exists a variety of software types, including:
- a "kernel": the software that directly controls hardware processes (e.g., memory management, process scheduling, etc.); one of the most commonly encountered kernels is Linux
- system libraries and utilities: collections of code and programs that allow installed applications to interact with the hardware via the kernel; these include the command line programs mentioned below (e.g., ls, cp, etc.)
- user space programs: the software that the user can customize then utilize for their tasks
The shell is one of these user space programs. It's the specific software that interprets your commands then executes them. There are a variety of shells used across operating systems:
- Bash: the default shell for many Linux distributions
- ScienceCloud and ScienceCluster users will use Bash
- Zsh: the default shell for MacOS, but can also be used in Linux
- PowerShell: the default shell for Windows
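Encouragingly, these shells share many concepts and a broadly similar command structure, meaning the skills involved in using one shell language will largely translate to other shells (and operating systems), although individual commands and syntax details can differ (particularly in PowerShell).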
Syntax¶
Command Structure¶
The basic structure of a shell command is as follows:
<command> [-optional_arguments] <required_arguments>
- the <command> is the specific command you're using (e.g., ls, cd)
- the [-optional_arguments] are inputted via flags; as they are optional they are therefore never strictly required
- the <required_arguments> are the specific inputs to the <command> you're using, often paths to files or directories
If a command's required arguments are omitted, the command will either fail or fall back to a default value. Accordingly, it's best to familiarize yourself, at least a little, with every command you run.
Flags¶
The [-optional_arguments] of a command are inputted via flags. Flags come in 2 types, short and long:
- Short flags use single letters, a single hyphen -, and can be combined; for example: ls -alh is the same as ls -a -l -h (and any combination of the single letter flags).
- Long flags use full words, double hyphens, and must be written individually; for example: ls --all --human-readable -l (which is the same as ls -ahl)
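The command structure and the equivalence of combined and separate short flags can be checked with a small sketch like this one. The sandbox directory and file names are assumptions for illustration. Only short flags are used because long flags such as --all are not available in every ls implementation.

```shell
# Create a sandbox directory with one regular file and one dotfile
sandbox=$(mktemp -d)
touch "$sandbox/data.csv" "$sandbox/.config"

# <command> [-optional_arguments] <argument>
# ls        -a -l -h              "$sandbox"
separate_flags=$(ls -a -l -h "$sandbox")

# The same three short flags combined behind a single hyphen
combined_flags=$(ls -alh "$sandbox")
echo "$combined_flags"

# Compare the two outputs as text
if [ "$separate_flags" = "$combined_flags" ]; then
    flags_match="yes"
else
    flags_match="no"
fi
echo "Identical output: $flags_match"
```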
Special Symbols¶
There are a number of special characters in most shell languages (including Bash) that reduce how much you need to type. Here are a selection of them:
- /: the symbol for the root directory of the filesystem and the delimiter between directories and subdirectories (i.e., depths of the filesystem tree)
- ~: an abbreviation for the home directory (i.e., shorthand for the path to the home directory for the user)
- .: refers to the current working directory; can be used in the same way as a file path
- ..: refers to the parent of the current working directory; can be used in the same way as a file path
- |: called the pipe character, it forwards the textual output from one command directly into another command as input; e.g., with the grep command
- >: called the redirection operator, it "redirects" the textual output of a command to write to a new text file or overwrite the existing text file (or value)
  - ⚠️ use the > character carefully as it will overwrite existing files/values by default!
- >>: a variation on the redirection operator that appends textual output of a command to a text file (rather than overwriting)
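The following sketch exercises several of these symbols inside a temporary directory, so the overwriting behaviour of > can be observed without risk. The file name notes.txt and the variable names are assumptions for illustration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# > writes output to a file, replacing any existing content
echo "first line" > notes.txt
echo "replacement line" > notes.txt    # "first line" is now gone

# >> appends to the file instead of overwriting it
echo "appended line" >> notes.txt

# . is the current working directory, so ./notes.txt is the same file
cat ./notes.txt

# | passes the output of one command to the next command as input;
# grep -c counts the lines that contain the pattern
match_count=$(cat notes.txt | grep -c "line")
echo "Lines containing 'line': $match_count"

# ~ expands to the path of your home directory
echo ~
```

After these commands, notes.txt should contain two lines: the replacement line and the appended line.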
Fundamentals¶
File Permissions¶
Before operating any commands on files in a filesystem, it's first helpful to understand permissions.
Permissions are the concept in an operating system and filesystem that allow multi-user functionality in a safe, secure, and accident-reduced way.
Without the appropriate permissions, you (as a user) may not be able to:
- read a file/directory
- write to (i.e., change) a file/directory
- execute (i.e., run) a file
File permissions are structured so that multiple users on the same machine can have a unified, accident-protected, and secure way to manage their files.
The easiest way to see file permissions (in your current working directory) is to run ls -l. The output should resemble the following (fabricated example):
drwxrwxrwx 2 user group 4096 Jan 01 00:00 Documents
-rwxr--r-- 1 second_user second_group 4096 Jan 01 00:00 example.txt
The first 10 characters of each line share a common format:
- the first character will be a d for directory or - for not a directory
- the next 9 characters are separated into 3 sets of 3 characters; each set of characters is identical in format, defining read (r), write (w), and execute (x) permissions for: user, group, and other
  - a value of - means that specific permission is not assigned; a value of r, w, or x indicates the specific permission is assigned
- the first column of numbers (2 and 1) is the number of hard links to the entry; for a directory, this count generally grows with the number of subdirectories it contains
- the next 2 columns denote the user and group assignment for the file/directory
  - the assigned user has read, write, execute permissions defined via the first series of 3 characters
  - the assigned group has read, write, execute permissions defined in the second series of 3 characters
  - users on the machine that are not the assigned user and are also not a member of an entry's assigned group have permissions defined in the third series of 3 characters (i.e., other)
This text-based diagram may be helpful:
1 2 3 4 5 6 7 8 9 10
| | | | | | | | | |
- r w x r - - r - -
^ ^ ^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | | | +--- `other` execute (x)
| | | | | | | | +----- `other` write (w)
| | | | | | | +------- `other` read (r)
| | | | | | +--------- `group` execute (x)
| | | | | +----------- `group` write (w)
| | | | +------------- `group` read (r)
| | | +--------------- `user` execute (x)
| | +----------------- `user` write (w)
| +------------------- `user` read (r)
+--------------------- entry type:
                         d = directory
                         - = regular file
In each position, the letter in parentheses is shown when that permission is assigned; a - in that position means it is not. In this example, user has read, write, and execute permissions, while group and other have read permission only.
The principal commands to edit permissions and ownership values are chmod and chown.
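As a hedged illustration of reading and changing permissions, the sketch below creates a file in a temporary directory, sets its permissions with chmod, and inspects the first 10 characters of the ls -l output. The file name and variable names are assumptions. chown is not shown because changing ownership generally requires elevated privileges.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/example.txt"

# Set permissions explicitly: user rwx, group r, other r
chmod u=rwx,g=r,o=r "$sandbox/example.txt"
perms_before=$(ls -l "$sandbox/example.txt" | cut -c1-10)
echo "$perms_before"     # -rwxr--r--

# Remove execute from the user and read from other
chmod u-x,o-r "$sandbox/example.txt"
perms_after=$(ls -l "$sandbox/example.txt" | cut -c1-10)
echo "$perms_after"      # -rw-r-----
```

Here cut -c1-10 simply keeps the entry type character and the 9 permission characters, dropping the remaining columns of the listing.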
The special command sudo can be prepended to any other commands to "elevate" the command so it's treated as having been run by the special root user. The root user is a default user written into the operating system that has complete control over all aspects of a filesystem.
sudo Access
Due to the security issues and accident-potential associated with sudo and root permissions, only specific systems from Science IT allow sudo access. Please plan your workflow accordingly:
- ScienceCloud VMs, launched and managed by a user, come equipped with root access by default (secured by default using SSH keys).
- ScienceCluster and the Alps System do not allow users sudo and root permissions.
File Types¶
When operating on a command line it's helpful to categorize files into 2 types:
- Binary files: require a specific program/application to be used or read; e.g., .mp3, .pdf, .doc
- Text files: as the name states, they contain purely alphanumeric text and can be edited interactively
To confirm a file's type, use the file command.
To open a binary file you execute it using the command corresponding to its required program; for example:
libreoffice example_libreoffice_file.odt
Editing Text Files¶
There are several ways to edit text directly from the command line. Some popular full-terminal text editors include:
- nano: the default editor on most GNU/Linux systems; beginner-friendly and easy to use
- pico: similar to nano but more lightweight
- vi: a powerful and efficient UNIX editor; has a steeper learning curve that may be challenging for beginners
For beginners, it's helpful to know how to start and stop nano:
- To start nano, simply execute the command nano and your terminal application will move to the nano interface creating a blank document
  - To edit a specific text file with nano, run nano <path_to_text_file>
- You can freely type with your cursor in this interface as well as paste text copied from your local computer
- When you are finished editing you can exit:
  - Press control + X to initiate the exit procedure
  - When asked Save modified buffer? type y to confirm that you want to save the changes (or n to cancel without saving)
  - When prompted for the File name: ... either update the file name or press enter to confirm the inputted file name
Commands¶
While there are innumerable commands on any command line, here are commands (with useful flags as noted) to consider for research computing:
The -h / --help flag
For many, but not all, commands the -h/--help flag is conventionally used to display the help dialogue for a command.
Metadata¶
- man: opens the manual for a command; i.e., it's used on other commands; e.g., man man
- ls: lists the content of a directory; common flags: -a, -l, -h
- lsblk: lists the storage devices on the system
- df: displays usage of the storage devices; common flags: -h
- ps: displays process statistics; common flags: aux, -ef
- top and htop: used for monitoring and benchmarking
Viewing Files¶
- file: confirm a file type
- cat: print the content of a text file
- echo: prints a character string of interest
  - echo $USER: prints your username, where $USER is an environment variable storing the name of the currently logged-in user
- less: open a text file to read it in your terminal; type q to exit
- tail and head: print the end/beginning of a file, respectively; common flags: -n <number>
- grep: stands for "global regular expression print"
  - used to find specific character strings within text, grep <pattern> <filename>
  - often being fed data via the | operator: ps aux | grep ssh, to list all processes containing "ssh"
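A short sketch of these viewing commands on a made-up five-line text file follows. The file name, its contents, and the variable names are assumptions for illustration only.

```shell
sandbox=$(mktemp -d)

# Write a five-line sample file; printf repeats its format for each argument
printf '%s\n' "sample_1 control" "sample_2 treated" "sample_3 control" \
    "sample_4 treated" "sample_5 control" > "$sandbox/samples.txt"

head -n 2 "$sandbox/samples.txt"      # first two lines
tail -n 1 "$sandbox/samples.txt"      # last line

# grep keeps only the lines that match the pattern
treated_lines=$(grep "treated" "$sandbox/samples.txt")
echo "$treated_lines"

# Piping: filter with grep, then keep only the last matching line
last_control=$(cat "$sandbox/samples.txt" | grep "control" | tail -n 1)
echo "$last_control"
```

The final pipeline should print sample_5 control, since that is the last line containing the word "control".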
Filesystem Navigation¶
- pwd: prints the current working directory
- cd: changes the current directory to a directory of your choosing
  - cd -: brings you to the previous directory you were in
  - cd ..: moves you one directory level up from your current location
  - cd ~: will always bring you to $HOME
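The navigation commands can be practised in a sandbox, as in this sketch. The directory names project and data are invented for illustration, and basename is used only to print the last component of the current path.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"

# Start in the innermost directory and confirm the location
cd "$sandbox/project/data"
pwd

# Move one level up, from data to project
cd ..
after_up=$(basename "$(pwd)")
echo "$after_up"

# Return to the previous directory; cd - also prints it, which is discarded here
cd - > /dev/null
after_back=$(basename "$(pwd)")
echo "$after_back"
```

The two echo commands should print project and then data, showing that cd - toggles back to wherever you were before the last cd.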
Moving and Copying¶
- cp: copy files and directories
  - usage: cp [options] <source> <destination>
  - common flags: -r for recursive (i.e., apply to a directory and its contents)
- mv: moves files from one location to another; also used for renaming
  - usage: mv [options] <source> <destination>
  - mv is always recursive!
  - mv -i: prompt before overwrite
- mkdir and rmdir: make and remove an empty directory, respectively
- rm: remove files and directories
  - rm -i: prompt before each removal, giving you a chance to confirm
  - common flags: -r for recursive (i.e., apply to a directory and its contents)
  - ⚠️ the rm command does not move files to a trash bin or temporary location, it immediately removes them; use with caution
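The sketch below strings these commands together in a temporary directory, where deleting files is harmless. All directory and file names are assumptions for illustration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir results                          # make an empty directory
echo "draft" > results/report.txt      # put one file inside it

cp -r results results_backup           # -r copies the directory and its contents
mv results/report.txt results/final_report.txt   # rename a file
mv results archive                     # rename a whole directory; no -r needed

rm archive/final_report.txt            # permanently deletes the file
rmdir archive                          # succeeds because archive is now empty

remaining=$(ls)
echo "$remaining"
```

Only results_backup (still holding its copy of report.txt) should remain at the end. Outside a sandbox, consider rm -i until you are comfortable with the commands.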
File Transfer¶
See our documentation on scp and rsync.
To make a "clone" of a remote git repository, you can use:
git clone git@gitlab.uzh.ch:project/folder.git
Permissions¶
- chmod: change the permissions of files and directories
  - usage: chmod [options] <mode> <file>
  - common modes: [ugo]±r, [ugo]±w, [ugo]±x
- chown: change ownership of files and directories
  - usage: chown <user>:<group> <file>
- sudo: prepended to a command to execute it as the superuser (i.e., "superuser do")
  - requires the current user to have sudo/root permissions
  - requires authentication
Connecting to Remote Computers¶
ssh: stands for "secure shell" and is the principal tool used to establish secure connections to remote machines
See our documentation on ssh, ssh key generation, and more.
Context¶
Open Source Operating Systems¶
By working with virtual machines on the command line for scientific research, you will by default be exposed to an entire open-source operating system. Very often it will be a distribution of Linux called Ubuntu, but there are many variants (e.g., Debian, Fedora, Arch).
'Distributions'
A "distribution" of Linux means a version of an operating system based on the Linux kernel. All Linux distributions share the same kernel but differ in other parts of the software stack. At Science IT, the recommended and default version is Ubuntu. It is widely considered one of the most user friendly distributions, especially for beginners.
Open-source operating systems (and communities) like these form the basis of large scale scientific (and non-scientific) computing.
As a researcher via the command line you can, for example, install software to customize your runtime environment then share your software stack setup with other researchers so they can replicate your work on their own computing hardware.
Moreover, by using open-source operating systems and software, researchers support UZH's commitment to Open Science.
Scripting¶
Here's an example of a for-loop in Bash, which prints the squares of the integers from 1 to 10.
for i in $(seq 1 10); do echo $((i*i)); done
Once you can run commands one at a time, the next step is to write shell scripts: small files that tell the computer to run those commands automatically.
First, create the file you want to run, in this case, called squares.sh.
echo 'for i in $(seq 1 10); do echo $((i*i)); done' > squares.sh
Then, run it:
bash squares.sh
Shell scripts form the basis for extending your control of a computer so the machine acts according to your instructions without requiring your presence. In other words, they let you automate computers to run workflows for you.
Of particular note, shell scripts (Bash) are how users submit jobs in cluster environments.
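As a slightly larger sketch of the same idea, the script below accepts the upper limit as an argument instead of hard-coding 10. The script name squares_upto.sh and the variable names are assumptions for illustration. It is written with a "here document" rather than echo so that it can span several lines.

```shell
sandbox=$(mktemp -d)

# Write the script with a here document; quoting 'EOF' stops the current
# shell from expanding $1 and $((...)) while the file is being written
cat > "$sandbox/squares_upto.sh" << 'EOF'
#!/bin/bash
# Usage: bash squares_upto.sh <limit>
limit=${1:-10}            # first argument, defaulting to 10 when omitted
for i in $(seq 1 "$limit"); do
    echo $((i*i))
done
EOF

# Run the script and ask for the first three squares
script_output=$(bash "$sandbox/squares_upto.sh" 3)
echo "$script_output"
```

Called with 3, the script should print 1, 4, and 9 on separate lines. Called without an argument, it falls back to the limit of 10 used in the one-line example.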