How do I choose the memory allocation for my job?

Memory resource allocation is a balancing act between two competing forces. The first force wants you to give your job as much memory as possible, because if it runs out of memory, it dies. The second force wants you to give your job as little memory as possible, because a smaller job is easier to schedule on a node alongside other jobs that also need memory. What you want to do is balance these forces so that your job has just as much memory as it needs, but no more.

Here is how you can achieve this balance:

  1. Run your job with a lot of memory.
  2. See how much memory was used.
  3. Run the rest of your jobs with just a little more memory than you found in Step 2.

Below we will go through these steps.

Run with a lot of memory.

Start by running some of your jobs with a lot of memory. What is “a lot of memory,” you ask? The documentation for the software package you are running might have some good suggestions for how much memory your jobs need. In the absence of that, try running a job with 32GB of memory per core (if you request multiple cores, multiply 32GB by the number of cores to get your total memory size). If your job dies from insufficient memory, double the memory allotment; continue doubling until the job runs to completion.
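The seff tool used below implies a Slurm cluster, so an initial calibration run might request its memory with #SBATCH directives like these. This is only a sketch: the job name and the program being run are placeholders for your own.

```
#!/bin/bash
#SBATCH --job-name=mem-calibration
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=32G    # starting point; double this if the job dies from lack of memory

# Placeholder for your actual program.
./your-analysis.sh
```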

If you can, try running a few jobs on different input datasets to account for any variability in memory usage that comes from the inputs.

See how much memory was used.

Use the seff tool to see how much of the memory allotted to your job was actually used by your software. When you run this command with a job ID, it will show you the resource statistics for that job:

seff [your-job-number]

Job ID: [your-job-number]
Cluster: scg
User/Group: bettingr/upg_bettingr
State: COMPLETED (exit code 0)
Nodes: 1
CPU Utilized: 08:04:12
CPU Efficiency: 98.22% of 08:13:00 core-walltime
Memory Utilized: 14.44 GB (estimated maximum)
Memory Efficiency: 45.13% of 32.00 GB (1.00 GB/core)

Look for the Memory Utilized and Memory Efficiency lines in the output. The first of these will tell you how much memory was used, and the second will tell you what percentage of the allotted memory your program actually used. A good target for Memory Efficiency is about 85-95% of the memory given.
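If you are checking several jobs, you can filter the seff output down to just these two lines with grep. As a self-contained illustration, the sample output from above is fed in directly here; in practice you would pipe the output of seff [your-job-number] into the same grep:

```shell
grep '^Memory' <<'EOF'
Job ID: [your-job-number]
Cluster: scg
State: COMPLETED (exit code 0)
Memory Utilized: 14.44 GB (estimated maximum)
Memory Efficiency: 45.13% of 32.00 GB (1.00 GB/core)
EOF
```

This prints only the Memory Utilized and Memory Efficiency lines.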

Run future jobs with a little more memory than was used.

Use the figures from the job runs in the step above to come up with a memory size for your future jobs. If you ran multiple jobs in this calibration experiment, use the largest usage among them. Once you have a figure, give it about a 5-10% pad in case a future run takes more.
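This calculation can be sketched in the shell. The usage figures below are hypothetical calibration results in GB; the snippet takes the largest one, pads it by 10%, and rounds up to a whole gigabyte:

```shell
# Hypothetical Memory Utilized figures (in GB) from three calibration runs.
printf '%s\n' 14.44 12.80 13.95 |
  awk 'max < $1 { max = $1 }          # track the largest observed usage
       END {
         padded = max * 1.10          # add a 10% safety pad
         gb = int(padded)
         if (gb < padded) gb = gb + 1 # round up to a whole GB
         printf "request %d GB\n", gb
       }'
```

With these numbers the largest run used 14.44 GB, so this prints "request 16 GB".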

If the example above were from your job, you would see that the program used less than half of the 32GB memory allotment it was given. Future runs could probably complete successfully with only 16GB of memory.

Once you have chosen a memory allotment and have started running the bulk of your jobs, continue to check the Memory Efficiency of some of them to confirm that their memory usage is still in line with expectations.