# Running Jobs (Slurm)

# Queueing System

<span style="font-family:arial, helvetica, sans-serif;">Greenplanet uses the [Slurm](http://slurm.schedmd.com/slurm.html) queue and job scheduler. We are currently running version 20.02.5, which has reasonably easy-to-follow [documentation](https://slurm.schedmd.com/archive/slurm-20.02.5/). There is also a [Cheat Sheet](http://www.schedmd.com/slurmdocs/pdfs/summary.pdf) of command options. A nice introduction is available from the [Slurm User Group 2017 conference](https://slurm.schedmd.com/SLUG17/SlurmOverview.pdf).</span>

<span style="font-family:arial, helvetica, sans-serif;">If you are used to different system (PBS, LoadLeveler, SGE, etc.), see [this PDF](http://slurm.schedmd.com/rosetta.pdf) that shows corresponding command line tools and job script options. Wrapper scripts for PBS commands (qstat, qsub, etc.) are also installed, which will let you use job submission scripts from an ancient time when we ran Torque/Maui.</span>

You can use the `sview` command to open the Slurm GUI. For a list of partitions, see [this page](https://knowledge.ps.uci.edu/books/running-jobs-slurm/page/partitions).

<span style="font-family:arial, helvetica, sans-serif;">If you need to run an interactive shell as a job, for instance to run matlab or test other code interactively, you should use this command (leave off --x11 if X11 graphics are not required):</span>

```bash
srun --pty --x11 -t 300 -n 1 -p ilg2.3 bash -i
```

<div id="bkmrk-%28-t-is-walltime%2C-adj"><span style="font-family:arial, helvetica, sans-serif;line-height:1.6;">(-t is walltime, adjust as required. -p is partition which is the queue name or a comma seperated list of partitions. If you need to run something on a different partition you can replace **<span style="font-family:'courier new', courier, monospace;">ilg2.3</span>** with the desired [partition](https://knowledge.ps.uci.edu/books/running-jobs-slurm/page/partitions).</span></div><div id="bkmrk-%C2%A0"> </div><div id="bkmrk-for-non-interactive-"><span style="font-family:arial, helvetica, sans-serif;">For non-interactive jobs the normal way to run a job is with the **<span style="font-family:'courier new', courier, monospace;">sbatch</span>** command supplied by the path to a slurm job script.</span></div>

# Partitions

Some partitions are limited to certain groups (see the Notes column).

<div class="content clearfix" id="bkmrk-greenplanet-partitio"><div class="content clearfix"><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><table style="width:1200px;"><caption>**Greenplanet Partitions (gplogin2/gplogin3)  
\[Partitions in <span style="color:rgb(26,82,120);">blue</span> are best for general use\] \[Partitions in <span style="color:rgb(186,55,42);">red</span> are only on the side being upgraded to Rocky 8/Alma 9 (gplogin1)\]**

</caption><thead><tr><th scope="col">Partition</th><th scope="col">Nodes</th><th scope="col">CPU Arch</th><th scope="col">GFLOPS/core</th><th scope="col">Cores/node</th><th scope="col">RAM</th><th scope="col">Disk</th><th scope="col">Notes</th></tr></thead><tbody><tr><td><span style="color:rgb(186,55,42);">**atlas**</span></td><td>**8**</td><td>**Intel (Westmere) 2.4/2.9 GHz**</td><td>**~11**</td><td>**8-12**</td><td>**24G**</td><td>**0.2-7T**</td><td>**ATLAS (Taffard, Whiteson)**</td></tr><tr><td><span style="color:rgb(26,82,120);">**brd2.4**</span></td><td><span style="color:rgb(26,82,120);">**6**</span></td><td><div><span style="color:rgb(26,82,120);">**Intel (Broadwell) 2.4 GHz**</span></div></td><td> </td><td><span style="color:rgb(26,82,120);">**28 (4 nodes)**</span>  
<span style="color:rgb(26,82,120);">**20 (2 nodes)**</span></td><td><span style="color:rgb(26,82,120);">**128G (28C)**</span>  
<span style="color:rgb(26,82,120);">**256G (20C)**</span></td><td><span style="color:rgb(26,82,120);">**372G (28C)**</span>  
<span style="color:rgb(26,82,120);">**1.1T (20C)**</span></td><td><span style="color:rgb(26,82,120);">**c-14-10,c-14-12 (28C)**</span>  
<span style="color:rgb(26,82,120);">**c-19-\[411-414\] (20C)**</span></td></tr><tr><td>cas2.5</td><td>4</td><td>Intel (Cascade Lake) 2.5 GHz</td><td> </td><td>40</td><td>192G</td><td>400G SSD</td><td>Wodarz</td></tr><tr><td>cas3.6</td><td>1</td><td>Intel (Cascade Lake) 3.6 GHz</td><td> </td><td>16</td><td>1.5T</td><td>1.7T SSD</td><td>Primeau</td></tr><tr><td><span style="color:rgb(26,82,120);">**has2.5**</span></td><td><span style="color:rgb(26,82,120);">**4**</span></td><td><span style="color:rgb(26,82,120);">**Intel (Haswell) 2.5 GHz** </span></td><td> </td><td><span style="color:rgb(26,82,120);">**24**</span></td><td><span style="color:rgb(26,82,120);">**128G**</span></td><td><span style="color:rgb(26,82,120);">**372G**</span></td><td> </td></tr><tr><td><span style="color:rgb(26,82,120);">**ilg2.3**</span></td><td><span style="color:rgb(26,82,120);">**46**</span></td><td><span style="color:rgb(26,82,120);">**AMD (Interlagos) 6276 2.3 GHz**</span></td><td><span style="color:rgb(26,82,120);">**18.4**</span></td><td><span style="color:rgb(26,82,120);">**32\***</span></td><td><span style="color:rgb(26,82,120);">**128G**</span></td><td><span style="color:rgb(26,82,120);">**900G**</span></td><td><span style="color:rgb(26,82,120);">**\*32 FP cores, 64 Integer cores**</span></td></tr><tr><td>m-c1.9</td><td>10</td><td>AMD (Magny-Cours) 6168 1.9 GHz<span style="white-space:pre;"> </span></td><td>7.6</td><td>48</td><td>64G</td><td>900G</td><td> </td></tr><tr><td>m-c2.2</td><td>2</td><td>AMD (Magny-Cours) 6174 2.2 GHz</td><td>8.8</td><td>48</td><td>64-128G</td><td>1.4T</td><td> </td></tr><tr><td><span style="color:rgb(26,82,120);">**nes2.8**</span></td><td><span style="color:rgb(26,82,120);">**214**</span></td><td><span style="color:rgb(26,82,120);">**Intel (Nehalem/Westmere) 2.7/2.8 GHz**</span></td><td><span style="color:rgb(26,82,120);">**~11**</span></td><td><span style="color:rgb(26,82,120);">**8-12**</span></td><td><span style="color:rgb(26,82,120);">**12-48G**</span></td><td><span style="color:rgb(26,82,120);">**100-500G**</span></td><td> </td></tr><tr><td><span style="color:rgb(26,82,120);">**sib2.9**</span></td><td><span style="color:rgb(26,82,120);">**45**</span></td><td><span style="color:rgb(26,82,120);">**Intel (Sandy-/Ivy-Bridge) 2.8/2.9 GHz**</span></td><td><span style="color:rgb(26,82,120);">**~23**</span></td><td><span style="color:rgb(26,82,120);">**16-20**</span></td><td><span style="color:rgb(26,82,120);">**32-128G**</span></td><td><span style="color:rgb(26,82,120);">**900G+**</span></td><td> </td></tr><tr><td>sky2.4</td><td>13</td><td>Intel (Skylake) 2.4 GHz</td><td> </td><td>40</td><td>96G</td><td>220-740G</td><td>Moore &amp; Randerson nodes</td></tr><tr><td>sky3.0</td><td>7</td><td>Intel (Skylake) 3.0 GHz</td><td> </td><td>24</td><td>384G</td><td>1.6T SSD</td><td>Lowengrub</td></tr><tr><td>wodarz</td><td>4</td><td>Intel (Broadwell) 2.4 GHz</td><td> </td><td>28</td><td>128G</td><td> </td><td>Wodarz</td></tr><tr><td><span style="color:rgb(186,55,42);">**gpu**</span></td><td><span style="color:rgb(186,55,42);">**2**</span></td><td><span style="color:rgb(186,55,42);">**Intel (Haswell) 2.4 GHz + 8 Nvidia TitanX GPUs**</span></td><td> </td><td> </td><td> </td><td> </td><td><span style="color:rgb(186,55,42);">**Mobley, Poulos**</span></td></tr><tr><td><span style="color:rgb(186,55,42);">**knl1.3**</span></td><td><span style="color:rgb(186,55,42);">**2**</span></td><td><span style="color:rgb(186,55,42);">**Intel (Knights Landing)**</span></td><td> </td><td><span 
style="color:rgb(186,55,42);">**64**</span></td><td><span style="color:rgb(186,55,42);">**48G**</span></td><td> </td><td><span style="color:rgb(186,55,42);">**4 threads/core, no Infiniband**</span></td></tr></tbody></table>

There is also a special partition called "scavenge" that is intended to make use of otherwise idle nodes. Scavenge contains all the free-access nodes, but jobs may be preempted (killed) by higher-priority jobs. It must be used with the QOS "scavenger".
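For example (a sketch; the job script name is a placeholder):

```bash
sbatch --partition=scavenge --qos=scavenger myjob.sh
```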

# Batch Matlab

<p id="bkmrk-this-is-example-is-b">This is example is based on using gplogin2, the login node for the new side of&nbsp;cluster.</p>
<p id="bkmrk-we-use-lmod-on-gplog">We use lmod on gplogin2/3 as opposed to environment-modules on gplogin1.</p>
<p id="bkmrk-all-examples-below-a">All examples below are created in your home directory via copy-and-paste into your GP shell account:</p>
<p id="bkmrk-consider-the-followi">Consider the following example, matlab_example.m</p>
<pre id="bkmrk-cat-%3C%3C-%27eof%27-%3E-%7E%2Fmat"><code class="language-bash">cat &lt;&lt; 'EOF' &gt; ~/matlab_example.m
[X,Y] = meshgrid(-2:.2:2);
Z = X .* exp(-X.^2 - Y.^2);
surf(X,Y,Z);
print('example-plot','-dpng');
exit;
EOF</code></pre>
<p id="bkmrk-running-the-above-ma" class="callout danger">Running the above matlab example <strong>*WITHOUT*</strong> Slurm: (this is how many people run on the login node which is <strong>BAD</strong>!)</p>
<pre id="bkmrk-cat-%3C%3C-%27eof%27-%3E-%7E%2Frun"><code class="language-bash">cat &lt;&lt; 'EOF' &gt; ~/run.sh
#!/bin/bash

ml purge
ml matlab/R2017b

matlab -nodisplay -nodesktop -nosplash &lt; matlab_example.m
EOF

chmod 755 ~/run.sh &amp;&amp;  ~/run.sh
</code></pre>
<p id="bkmrk-running-the-above-ma-1" class="callout success">running the above matlab example <strong>*WITH*</strong> Slurm:</p>
<pre id="bkmrk-cat-%3C%3C-%27eof%27-%3E-%7E%2Frun-1"><code class="language-bash">cat &lt;&lt; 'EOF' &gt; ~/runv2.sh
#!/bin/bash

#SBATCH --job-name=my_matlab_job
#SBATCH --output=my_matlab_job.out
#SBATCH --error=my_matlab_job.err
#SBATCH --partition=brd2.4,has2.5,ilg2.3,m-c1.9,m-c2.2,nes2.8,sib2.9
#SBATCH --time=00:01:00
#SBATCH --nodes=1
#SBATCH --ntasks=16

ml purge
ml matlab/R2017b

matlab -nodisplay -nodesktop -nosplash &lt; matlab_example.m
EOF

chmod 755 ~/runv2.sh</code></pre>
<div class="content clearfix" id="bkmrk-to-submit-the-job">
<div class="content clearfix">
<div class="content clearfix">
<div class="field field-name-body field-type-text-with-summary field-label-hidden">
<div class="field-items">
<div class="field-item even">
<div style="font-family: monospace, monospace; font-size: small;"><span style="font-family: arial, helvetica, sans-serif;">To submit the job</span></div>
<div style="font-family: monospace, monospace; font-size: small;"><br></div>
</div>
</div>
</div>
</div>
</div>
</div>
<pre id="bkmrk-sbatch-runv2.sh"><code class="language-bash">sbatch runv2.sh</code></pre>
<div class="content clearfix" id="bkmrk-this-will-create-a-f">
<div class="content clearfix">
<div class="content clearfix">
<div class="field field-name-body field-type-text-with-summary field-label-hidden">
<div class="field-items">
<div class="field-item even">
<div style="font-family: monospace, monospace; font-size: small;">
<div style="font-family: monospace, monospace; font-size: small;"><span style="font-family: arial, helvetica, sans-serif;">This will create a file in your home directory</span></div>
<div style="font-family: monospace, monospace; font-size: small;"><br></div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<pre id="bkmrk-example-plot.png"><code class="language-bash">example-plot.png</code></pre>
<p id="bkmrk-the-hardest-part-is-"><span style="font-family: arial, helvetica, sans-serif;">The hardest part is determining how much resources your computation/simulation will need.</span></p>
<p id="bkmrk-one-has-to-pick-an-p"><span style="font-family: arial, helvetica, sans-serif;">One has to pick an partition based on the computation. Usually people will want Intel CPUs, but we have AMD CPUs as an option.</span></p>
<div class="content clearfix" id="bkmrk-partitions-for-gp1-a">
<div class="content clearfix">
<div class="field field-name-body field-type-text-with-summary field-label-hidden">
<div class="field-items">
<div class="field-item even">
<div style="font-family: monospace, monospace; font-size: small;">
<div style="font-family: monospace, monospace; font-size: small;">
<ul class="menu clearfix">
<li class="first leaf menu-mlid-578"><span style="font-family: arial, helvetica, sans-serif;"><a href="https://knowledge.ps.uci.edu/books/running-jobs-slurm/page/partitions">Partitions</a></span></li>
</ul>
</div>
</div>
<div style="font-family: monospace, monospace; font-size: small;"><span style="font-family: arial, helvetica, sans-serif;">The resources for your computation/simulation needs to be determined emperically.</span></div>
<div style="font-family: monospace, monospace; font-size: small;"><hr><u><strong><span style="font-family: arial, helvetica, sans-serif;">Method 1: Make a Slurm submission script and guestimate your resources required (CPU cores, number of nodes, walltime etc.). </span></strong></u></div>
</div>
</div>
</div>
</div>
</div>
<p id="bkmrk-submit-the-job-via-s"><span style="font-family: arial, helvetica, sans-serif;">Submit the job via sbatch and then analyze the effiency of the job with seff and refine your scheduling parameters on the next run.</span></p>
<pre id="bkmrk-%24-seff-10773-job-id%3A"><code class="language-bash">$ seff 10773
Job ID: 10773
Cluster: blueplanet
User/Group: santucci/staff
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 32
CPU Utilized: 00:00:20
CPU Efficiency: 1.56% of 00:21:20 core-walltime
Job Wall-clock time: 00:00:40
Memory Utilized: 448.57 MB
Memory Efficiency: 2.74% of 16.00 GB
</code></pre>
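For example, the report above shows the job used about one core's worth of CPU for 20 seconds and under 0.5 GB of the 16 GB requested, so the next submission could ask for far less (a sketch; exact values are a judgment call):

```bash
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
```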
<p id="bkmrk-if-you-want-to-see-w"><span style="font-family: arial, helvetica, sans-serif;">If you want to see which node was selected for the job look at the epilog output</span></p>
<pre id="bkmrk-%24-cat-slurm.epilog-1"><code class="language-bash">$ cat slurm.epilog-10773 
-------------- slurm.epilog ---------------
Job ID:    10773
User:      santucci
Group:     staff
Job Name:  my_matlab_job
Partition: has2.5
QOS:       normal 
Account:   staff
Reason:    None,c-19-293
Nodelist:  c-19-293
Command:   /data11/home/santucci/runv2.sh
WorkDir:   /data11/home/santucci
BatchHost: c-19-293
</code></pre>
<div class="content clearfix" id="bkmrk-">
<div class="content clearfix">
<div class="field field-name-body field-type-text-with-summary field-label-hidden">
<div class="field-items">
<div class="field-item even">
<div style="font-family: monospace, monospace; font-size: small;"><hr></div>
</div>
</div>
</div>
</div>
</div>
<p id="bkmrk-method-2%3A-request-an"><u><strong>Method 2:<span style="font-family: arial, helvetica, sans-serif;"> Request an interactive shell and experiment to determine how much memory is required and how long it needs to run</span>.</strong></u></p>
<div class="content clearfix" id="bkmrk-%C2%A0">
<div class="content clearfix">
<div class="content clearfix">
<div class="field field-name-body field-type-text-with-summary field-label-hidden">
<div class="field-items">
<div class="field-item even">
<div style="font-family: monospace, monospace; font-size: small;"><br></div>
<div style="font-family: monospace, monospace; font-size: small;">
<div style="font-family: monospace, monospace; font-size: small;">&nbsp;</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<pre id="bkmrk-srun---pty---x11--t-"><code class="language-bash">srun --pty --x11 -t 300 -n 1 -p &lt;partition-list&gt; bash -i</code></pre>
<div class="content clearfix" id="bkmrk-recommendations-on-p">
<div class="content clearfix">
<div class="field field-name-body field-type-text-with-summary field-label-hidden">
<div class="field-items">
<div class="field-item even">
<div style="font-family: monospace, monospace; font-size: small;"><br></div>
<div style="font-family: monospace, monospace; font-size: small;"><span style="font-family: arial, helvetica, sans-serif;">recommendations on profile are available @ </span><a href="https://www.nccs.nasa.gov/user_info/slurm/determine_memory_usage">https://www.nccs.nasa.gov/user_info/slurm/determine_memory_usage</a></div>
<div>&nbsp;</div>
<div>If new to Slurm please see <a href="https://knowledge.ps.uci.edu/books/running-jobs-slurm/page/queueing-system">this page</a>.</div>
<hr></div>
</div>
</div>
</div>
</div>
<p id="bkmrk-here-are-two-quick-r">Here are two quick reference guides that you will want to have handy:</p>
<p id="bkmrk-https%3A%2F%2Fslurm.schedm"><cite class="m_4124538184945677148gmail-m_-6430851990854406648gmail-iUh30"><a href="https://slurm.schedmd.com/pdfs/summary.pdf">https://slurm.schedmd.com/pdfs<wbr>/summary.pdf</a></cite></p>
<p id="bkmrk-https%3A%2F%2Fwww.chpc.uta"><cite class="m_4124538184945677148gmail-m_-6430851990854406648gmail-iUh30"><a href="https://www.chpc.utah.edu/presentations/SlurmCheatsheet.pdf">https://www.chpc.utah.edu/pres<wbr>entations/SlurmCheatsheet.pdf</a></cite></p>
<p id="bkmrk-credit%3A-inspiration-"><cite class="m_4124538184945677148gmail-m_-6430851990854406648gmail-iUh30">Credit: inspiration for this example comes from </cite><a href="https://it.math.ncsu.edu/hpc/slurm/batch/matlab">https://it.math.ncsu.edu/hpc/slurm/batch/matlab</a></p>

# Interactive Matlab

Remember that gplogin\* (the login nodes) are a shared resource, so you shouldn't run Matlab directly on a login node. It often goes unnoticed until it becomes a problem for someone else, so we need everyone to get in the habit of requesting an interactive session for Matlab when not using the Slurm scheduler.

<div class="gmail_default" id="bkmrk-for-interactive-sess">for interactive sessions on Intel</div>```gmail-bash
srun --pty --x11 -t 24:00:00 -n 1 --mem=8192 -p brd2.4,has2.5,nes2.8,sib2.9,sky2.4 bash -i
```

For interactive sessions on AMD:

```bash
srun --pty --x11 -t 24:00:00 -n 1 --mem=8192 -p ilg2.3,m-c1.9,m-c2.2 bash -i
```

<div class="gmail_default" id="bkmrk-after-you-get-a-shel"><div>after you get a shell</div></div>```bash
ml purge
ml matlab
matlab &
```
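When you are done, quit Matlab and then exit the interactive shell so the node allocation is released:

```bash
exit   # ends the srun-launched shell and with it the interactive job
```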