ScienceCluster Training
Science IT Reference
1. Science IT Overview
- We provide infrastructure:
    - ScienceCloud
    - ScienceCluster
    - Eiger (CSCS)
- We provide support, training, and consulting:
    - Application support
    - Training to use infrastructure
    - Consulting examples:
        - Specialized advice for Science IT hardware
        - Assistance with workflows or scripts
        - Scaling up compute work from laptop to cluster
        - Code optimization, including porting to GPU or enabling parallelization
2. Science IT Resources
We manage multiple resource services via an on-premises data center. From this data center, we serve the ScienceCloud and ScienceCluster (as well as ScienceApps).
2.1. ScienceCloud versus ScienceCluster
The ScienceCloud and ScienceCluster are different infrastructure services, each with its own advantages and disadvantages. You can find more information here on which service is right for you.
- ScienceCloud
    - Offers virtual machines that are controlled by the user
    - Great for interactive use
    - The user has root (sudo) privileges and can customize system software
    - Multiple operating systems are available: Ubuntu or other Linux distributions
- ScienceCluster
    - A shared cluster environment with compute resources managed by SLURM
    - Great for large batches of jobs (up to thousands of jobs submitted at a time)
    - A user can install software only in their user directories; commonly used software is maintained by Science IT
    - Only one operating system is available: Ubuntu Linux
2.2 Hardware Services Summary
- ScienceCloud – 20000+ vCPUs (1 vCPU = 1 core)
    - 1-32 vCPUs, 4-256GB RAM
    - GPU VMs, Nvidia Tesla T4 (1 GPU + 8-32 vCPUs)
- ScienceCluster (SLURM)
    - CPU nodes:
        - high-memory nodes – 4TB RAM, 256 vCPUs
        - "parallel" / MPI / Infiniband nodes – 48 vCPUs, 380GB RAM (for multi-node parallel jobs)
        - generic / "standard" CPU nodes – 2 vCPUs + 8GB RAM; 8 vCPUs + 32GB RAM; or 32 vCPUs + 128GB RAM
    - GPU nodes:
        - Nvidia Tesla T4 (16GB GPU RAM)
        - Nvidia Tesla V100 (16 or 32GB GPU RAM)
        - Nvidia Tesla A100 (80GB GPU RAM)
Science IT also manages two related infrastructure services:
- ScienceApps – web-based interactive application portal (with Jupyter, RStudio, MATLAB, Spark) to the ScienceCluster.
- "Supercomputer" – Eiger partition of the Alps supercomputer (CSCS, Lugano)
- UZH pay-per-use access to Eiger, managed by Science IT (nodes with 128 cores; 2 x AMD Epyc Rome).
- minimum job size is one full node, UZH share is ~200 nodes
- CSCS supercomputer time (GPU nodes of Piz Daint) also available via scientific research proposals (cscs.ch)
3. Connecting to the ScienceCluster
3.1 How to log in
Connecting to the ScienceCluster requires the `ssh` tool, which is available in the command line terminals of Linux, macOS, and Windows 10+. From one of these terminals, run the following command using your specific UZH shortname:

```
ssh shortname@cluster.s3it.uzh.ch
```
Upon a successful login, you will arrive at one of the three login nodes. The login nodes are where you manage your code and data on the cluster and where you submit your jobs to be run on the compute resources (i.e., the powerful computers in the cluster). Do not execute or run your code or scripts directly on the login nodes: doing so compromises the integrity of the system and harms your own and other users' experience on the cluster.
Note
Windows users may benefit from using a full Ubuntu subsystem on their local machine. Consider using the Windows Subsystem for Linux.
3.2 ScienceCluster Storage
Details on the ScienceCluster filesystem can be found here.
Practice the file transfer commands by transferring this `training.tar` file from your local machine to your scratch filesystem (`~/scratch` or `/scratch/$USER`) on ScienceCluster:

```
scp local_path_to_file/training.tar shortname@cluster.s3it.uzh.ch:scratch
```
Once you have placed the `training.tar` file in a directory within your `scratch` folder, expand the contents of the file by running the following command from the same directory:

```
tar xvf training.tar
```

You can also run `tar tvf training.tar` to view the `.tar` contents without expanding the files.
4. Submitting Jobs using SLURM
4.1 What is SLURM
What is SLURM? It's the Simple Linux Utility for Resource Management, which means it's the tool that you use to submit your jobs to the cluster in a fair and organized way. Without a workload management system like SLURM, clusters could not be shared fairly or efficiently.
Fun fact: the acronym is derived from a popular cartoon series by Matt Groening titled Futurama. Slurm is the name of the most popular soda in the galaxy.
4.2 How does SLURM work?
SLURM manages the jobs that are submitted to the cluster. As a user, you will use SLURM to submit jobs that run your code on your data. Nothing computationally intensive should be run on the login nodes; these nodes are for text editing, file/code/data management, basic transfers, and job submissions.
The SLURM server sits between the login nodes and the powerful computing resources on the cluster, acting as a "Workload Manager Server".
- When a user submits a job:
    - If resources are available, the job starts immediately
    - If resources are in use, the job waits in the queue (status = pending)
- Therefore, SLURM does the following:
    - Allocates the requested resources for each job
    - Allows finite resources to be shared among users
    - Attempts to share resources fairly
To see all jobs currently running on the cluster, use `squeue`. To see the current state of all nodes, use `sinfo`.
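For example, to see only your own jobs rather than the entire queue, you can filter by user:

```
squeue -u $USER    ## show only your jobs
sinfo              ## show the state of all partitions/nodes
```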
4.3 How to submit a compute job
To submit a job using SLURM, you need a submission script that contains the commands you want to run. If you are running a script in R, Python, or another data analysis language, you will use the submission script to execute your code.
A sample submission script titled `hello.sh` can be found within the `training.tar` file. The script is as follows:
```
#!/bin/bash

### Comment lines start with ## or #+space
### Slurm option lines start with #SBATCH

### Here are the SBATCH parameters that you should always consider:
#SBATCH --time=0-00:05:00   ## days-hours:minutes:seconds
#SBATCH --mem 3000M         ## 3GB ram (hardware ratio is < 4GB/core)
#SBATCH --ntasks=1          ## Not strictly necessary because default is 1
#SBATCH --cpus-per-task=1   ## Use greater than 1 for parallelized jobs

### Here are other SBATCH parameters that you may benefit from using, currently commented out:
###SBATCH --job-name=hello1 ## job name
###SBATCH --output=job.out  ## standard out file

echo 'hello starting.'
hostname ## Prints the system hostname
date     ## Prints the system date
echo 'finished'
```
You'll note that the script has three basic parts:

- a shebang line (`#!/bin/bash`) that tells the system it's a Bash script
- the `SBATCH` parameter section, which is where you request your resources of interest
- the area where you execute your commands of interest (i.e., where you will execute your code)
Every submission script must contain all three parts. Every time you submit a job, you should update the `SBATCH` parameters to request the resources you need, and you should update the code that's being executed.
You submit submission scripts using the `sbatch` command. For example, use the following command to run the `hello.sh` script:

```
sbatch hello.sh
```

The command prints a confirmation:

```
Submitted batch job <number>
```

where `<number>` is the job's assigned ID. The `hello.sh` script only uses standard Bash commands (e.g., `hostname`) to print metadata about the node in the cluster where the job ran.
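After submitting, you can check on the job and inspect its results; by default, SLURM writes a job's standard output to a `slurm-<number>.out` file in the directory you submitted from:

```
squeue -j <number>        ## check the job's state while it is pending/running
cat slurm-<number>.out    ## view the job's output once it has finished
```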
More details on job submissions can be found here.
Warning
Consider using a job array if you need to submit a large set of jobs (e.g., 500+), both for the integrity of the system and for your own efficiency; see the array sketch below. If each of your individual jobs runs for only a few minutes, especially I/O-heavy jobs, consider combining them into a single job (run sequentially within one script).
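For illustration, a minimal job array sketch (the `process.py` script and the input file naming scheme are hypothetical stand-ins for your own workload):

```
#!/bin/bash
#SBATCH --time=0-00:10:00   ## days-hours:minutes:seconds
#SBATCH --mem 3000M
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --array=1-500       ## run 500 array tasks in one submission

## Each task receives its own index via SLURM_ARRAY_TASK_ID;
## here it selects a (hypothetical) input file for that task.
echo "Task ${SLURM_ARRAY_TASK_ID} starting on $(hostname)"
python3 ./process.py "input_${SLURM_ARRAY_TASK_ID}.dat"
```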
5. Customizing job submissions for your code
5.1 Modules and the Runtime Environment
With the `hello.sh` script you can practice submitting a simple script that uses only Bash commands and therefore needs no customization.
However, nearly all users will need to customize the software environment that their job uses (called the "runtime environment"); i.e., you will need to create your custom Python or R environment so your job can access the modules/libraries that you need to run your code and process your data.
The module system provides this capability. The "modules" on the ScienceCluster are simply existing installations of the most common software used by researchers. To see what software is available, run `module av`. Software listed with a `(D)` is the default version, loaded when you do not specify a version number.
In addition to providing you with the common software for running your research code, the module system is also used when requesting advanced hardware resources like GPUs. Specialized hardware requires specialized software that must be available at runtime. When requesting such hardware, make sure to load the accompanying software module that supports it. See this GPU job submission section for a specific example.
To load a module, use `module load <name>`. For example, to load the default version of Anaconda, use `module load anaconda3`. You can specify a version from those listed via `module av` with the following syntax: `module load anaconda3/2024.02-1`.
To list your currently loaded modules, use `module list`. You can unload specific modules using `module unload <name>`, and you can clear all loaded modules using `module purge`.
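For example, a short module session might look like this:

```
module load anaconda3/2024.02-1   ## load a specific Anaconda version
module list                       ## confirm which modules are loaded
module unload anaconda3           ## unload just that module
module purge                      ## or clear everything that is loaded
```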
For the more technically curious users: loading a module changes/sets environment variables for your session that give you access to the requested software. In other words, it does something similar to the following Bash commands:

```
export PATH=directory_of_application_executable:$PATH
export LD_LIBRARY_PATH=directory_of_some_library_to_be_used:$LD_LIBRARY_PATH
```
5.2 Virtual Environments
The most common use of the cluster's module system is to create and use virtual environments.
What is a virtual environment, and why/how is it useful?

- A tool that allows users to install their desired software in a contained and manageable way (i.e., in your user space on ScienceCluster)
- Often used for data analysis languages like Python or R
- Specific package versions and dependencies can be installed inside a virtual environment and catalogued for greater reproducibility
- Each virtual environment is completely independent of the others, which allows specific versions of packages and/or multiple versions of packages to be handled across different environments
On the ScienceCluster, the default virtual environment tools for new users are Conda and Mamba. A specific guide on using Conda/Mamba can be found here.
Note
For R users, Conda/Mamba is the recommended way for creating your R environment.
For complete control over your software environment, including operating system libraries, use Singularity; a tutorial can be found here, with references found here.
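As a taste of the Singularity workflow, a minimal sketch (assuming Singularity is available in your session; the container image is an arbitrary example from Docker Hub):

```
singularity pull docker://python:3.11-slim                 ## build a local .sif image
singularity exec python_3.11-slim.sif python3 --version    ## run a command inside it
```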
5.3 Creating a Virtual Environment
To create your first virtual environment with Conda/Mamba, follow the directions here.
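As a quick sketch (the environment name `myenv` matches the example in the next section; the Python version and packages shown are illustrative):

```
module load anaconda3
conda create -n myenv python=3.11 numpy   ## create the environment with example packages
source activate myenv                     ## activate it
```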
5.4 Integrating the Virtual Environment into your Submission Script
If you would like to use a customized environment in your submission script, simply load the environment before the command that runs your code.
For example, to run the `getpi.py` script (within the `training.tar` file) from an `sbatch` submission using the `myenv` environment created here, you would change your `hello.sh` script to the following:
```
#!/bin/bash

### Here are the SBATCH parameters that you should always consider:
#SBATCH --time=0-00:05:00   ## days-hours:minutes:seconds
#SBATCH --mem 3000M         ## 3GB ram (hardware ratio is < 4GB/core)
#SBATCH --ntasks=1          ## Not strictly necessary because default is 1
#SBATCH --cpus-per-task=1   ## Use greater than 1 for parallelized jobs

module load anaconda3
source activate myenv

python3 ./getpi.py 100
```
Submitting the script with `sbatch` would run the `getpi.py` script for `100` iterations using the `myenv` environment, with the Python version selected during environment creation.
For reference, the `getpi.py` script estimates the value of Pi using a Gregory-Leibniz series. You can change `100` to another (higher) positive integer value to achieve a better approximation of Pi.
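Specifically, the Gregory-Leibniz series is π/4 = 1 − 1/3 + 1/5 − 1/7 + ⋯, so each additional iteration nudges the running estimate closer to Pi (though convergence is slow).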
6. Advanced Topics
6.1 GPUs
GPUs can be requested by adding the following parameter to your Slurm submission script:

```
#SBATCH --gpus=1
```
You can also request a specific GPU type. For example:

- `#SBATCH --gpus=T4:1` for a T4
- `#SBATCH --gpus=V100:1` for a V100
    - When requesting a specific amount of memory for V100 GPUs, add the corresponding constraint: `#SBATCH --constraint=GPUMEM32GB`
- `#SBATCH --gpus=A100:1` for an A100
Instead of specifying the GPU type in your `#SBATCH` parameters, you can load one of the GPU flavor modules before submitting your script. If you load a GPU flavor module, you only need to specify `#SBATCH --gpus=1` in your script.
The following GPU flavor modules are available:

```
module load gpu
module load multigpu
module load t4
module load v100
module load v100-32g
module load a100
```
You need to load one of those modules in order to get access to the GPU-specific modules such as `cuda`, `cudnn`, and `nccl`. Most common GPU applications (TensorFlow, PyTorch) use the CUDA GPU runtime library, which will need to be loaded either via `module load cuda` (optionally specifying a version) or loaded/installed via your Conda or Mamba (or Singularity) environment. Please note that CUDA applications are evolving quickly, so there may sometimes be incompatibilities.
Note
Flavor modules are optional. If you do not need any of the GPU-specific modules and you only need a single GPU of any type, you can simply include `#SBATCH --gpus=1` in your submission script.
Warning
It is recommended to load GPU flavor modules outside of batch scripts, as they set constraints that may interfere with resource allocation for job steps. In other words, run `module load ...` before submitting your job via `sbatch`.
Additional notes on GPU job submissions can be found here, and a full Python TensorFlow example can be found here.
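Putting this together, a minimal GPU submission sketch (assuming the `myenv` environment from above; `train.py` is a hypothetical stand-in for your own GPU-enabled code):

```
#!/bin/bash
#SBATCH --time=0-01:00:00   ## days-hours:minutes:seconds
#SBATCH --mem 8000M
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --gpus=T4:1         ## request one T4 GPU

module load anaconda3
source activate myenv

python3 ./train.py          ## hypothetical GPU-enabled script
```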
6.2 Benchmarking
Benchmarking is the process of running your code to determine the resources it needs to run effectively. Spend at least a small amount of time benchmarking your code before scaling it across your entire dataset; otherwise, your resource requests are made with incomplete information and you may request an inappropriate amount of resources.
Please keep in mind the following notes when selecting your resource requests for each job submission:
- Requesting more than 1 core when you haven't specifically integrated multi-CPU tools into your code will not make your code run faster.
- Requesting more memory than required will also not make your code run faster.
- Only request the CPU and memory resources that you know your code requires.
- Request an amount of time based on your best estimates of how long the code will need plus a small buffer.
- When possible, implement checkpointing in your code so you can easily restart the job and continue where your code stopped.
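One way to gather this information is to inspect a finished job's accounting record with `sacct` (replace `<number>` with the job ID; the field list shown is one reasonable selection):

```
sacct -j <number> --format=JobID,Elapsed,TotalCPU,MaxRSS,State
```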
Science IT teaches a semester course called Scientific Workflows wherein we teach the basics of monitoring and benchmarking. We'll notify users via our Newsletter every time this course is offered, so make sure to read your monthly newsletters from us.
In the meantime, feel free to refer to the Scientific Workflow Course GitLab Repo.