Can I use more than one CPU for my job?¶
Not every software package can make use of multiple CPUs (a trait called multithreading), so it is important to determine whether the package you are using can do so before allocating more CPUs to it. Otherwise, you will just be wasting CPU cycles and money.
There are two ways to determine whether your software package can make use of multiple CPUs:
- Find it in a list of known multithreaded software
- Run it with multiple CPUs and analyze the resulting job statistics
Known multithreaded software packages¶
Below is a list of some commonly used software packages known to be multithreaded:
- bwa mem
- samtools
- STAR
- CIBERSORTx
- BLAST+
- HISAT2
- Trinity
- SPAdes
- GATK
- Picard
- Kraken2
- Cufflinks
- Salmon
- Kallisto
- Clustal
- RSEM
- SRA Toolkit
- PLINK
- Trimmomatic
- Prokka
- VCFtools
- BCFtools
- Bowtie2
- FastQC
- Omega
You can also check your software package's documentation for command-line parameters that include the words `thread` or `cpu`.
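For example, many tools print their available options with a `--help` flag, so you can scan that output for thread-related parameters. A minimal sketch, using `mycommand` as a placeholder for your program:

mycommand --help 2>&1 | grep -Ei 'thread|cpu'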
Run with multiple CPUs and analyze the job statistics¶
To set up the multiple-CPU run, add the `--cpus-per-task=2` parameter either to your `sbatch` command line or to an `#SBATCH` line in your script, and run your job.
sbatch [other-slurm-arguments-if-any] --cpus-per-task=2 [your-script]
The number of “CPUs per task” you request as above is available within your job as the environment variable `${SLURM_CPUS_PER_TASK}`, so you can pass it along to whatever argument your software provides to control the number of CPUs/threads it uses:
mycommand -arg1 -arg2 --threads=${SLURM_CPUS_PER_TASK} ...
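Putting these pieces together, a minimal job script might look like the sketch below; `mycommand` and its `--threads` option are placeholders for your actual program and its thread-count flag:

```
#!/bin/bash
#SBATCH --job-name=cpu-test
#SBATCH --cpus-per-task=2

# Pass the allocated CPU count through to the program's thread option.
mycommand -arg1 -arg2 --threads=${SLURM_CPUS_PER_TASK}
```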
When that job is complete, use the `seff` (Slurm efficiency) tool to analyze the job statistics, running it with the job number assigned to your job:
seff [your-job-number]
Job ID: [your-job-number]
Cluster: scg
User/Group: bettingr/upg_bettingr
State: COMPLETED (exit code 0)
Nodes: 2
CPU Utilized: 08:04:12
CPU Efficiency: 98.22% of 08:13:00 core-walltime
Memory Utilized: 5.42 GB (estimated maximum)
Memory Efficiency: 45.13% of 12.00 GB (1.00 GB/core)
If the CPU Efficiency is close to 100% (as it is in the example above), then your software is using the extra CPU you gave it, and you can assign multiple CPUs to future jobs. If that number is nearer to 50%, then the package ignored your extra CPU and ran on a single one; you should not bother adding more CPUs to these jobs.
If you have determined that your job can make use of extra CPUs, you can experiment to find the best number of CPUs to use. Ideally, a job with 2 CPUs will run twice as fast as one with 1, and a job with 4 CPUs will run 4 times as fast, but this linear scaling rarely plays out in practice. Try different numbers to see where the linearity breaks down; that point is your optimal number of CPUs.
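One way to run that experiment, sketched below assuming your job script is named `my-job.sh` (a placeholder), is to submit the same script with a range of CPU counts and compare the resulting walltimes and efficiencies with `seff`:

```
# Submit the same script with increasing CPU counts.
for n in 1 2 4 8 16; do
    sbatch --cpus-per-task=$n my-job.sh
done
# When the jobs finish, run seff on each job ID and note where the
# walltime stops halving as the CPU count doubles.
```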
One final reminder: the more CPUs you request, the harder it will be for Slurm to schedule your job. If you request more than 24, the number of nodes which can support that request goes down considerably. You might choose to run your jobs with fewer than your optimal number of CPUs to make it easier to find the resources for your job requests.