{"id":5923,"date":"2020-07-15T12:09:21","date_gmt":"2020-07-15T11:09:21","guid":{"rendered":"https:\/\/research.reading.ac.uk\/act\/?post_type=kbe_knowledgebase&#038;p=5923"},"modified":"2026-01-09T18:24:41","modified_gmt":"2026-01-09T18:24:41","slug":"racc-slurm-commands-and-resource-allocation-policy","status":"publish","type":"kbe_knowledgebase","link":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/","title":{"rendered":"Reading Academic Computing Cluster &#8211; Slurm commands and resource allocation policy"},"content":{"rendered":"<p>(work in progress)<\/p>\n<h2>Resources and limits<\/h2>\n<p><strong>Partitions:<\/strong><\/p>\n<p>In SLURM, partitions are (possibly overlapping) groups of nodes. Partitions are similar to queues in some other batch systems, e.g. in SGE on met-cluster and maths-cluster. The default partition is called \u2018cluster\u2019 and it is has 24 hours default time limit, and the maximum time limit is 30 days, but we do not recommend running jobs that long. There is also the &#8216;limited&#8217; partition, with maximum time limit of 24 hours. 
The partition &#8216;limited&#8217; allows access to some of the &#8216;project&#8217; nodes.<\/p>\n<table style=\"border-collapse: collapse;width: 100%;height: 125px\">\n<tbody>\n<tr style=\"height: 25px\">\n<td style=\"width: 33.3333%;height: 25px\">partition<\/td>\n<td style=\"width: 33.3333%;height: 25px\">limits<\/td>\n<td style=\"width: 33.3333%;height: 25px\">description<\/td>\n<\/tr>\n<tr style=\"height: 25px\">\n<td style=\"width: 33.3333%;height: 25px\">cluster<\/td>\n<td style=\"width: 33.3333%;height: 25px\">default 24 hours, maximum 30 days; default 1 GB per CPU core<\/td>\n<td style=\"width: 33.3333%;height: 25px\">the default partition<\/td>\n<\/tr>\n<tr style=\"height: 25px\">\n<td style=\"width: 33.3333%;height: 25px\">limited<\/td>\n<td style=\"width: 33.3333%;height: 25px\">maximum 24 hours<\/td>\n<td style=\"width: 33.3333%;height: 25px\">allows access to some of the &#8216;project&#8217; nodes<\/td>\n<\/tr>\n<tr style=\"height: 25px\">\n<td style=\"width: 33.3333%;height: 25px\">gpu<\/td>\n<td style=\"width: 33.3333%;height: 25px\"><\/td>\n<td style=\"width: 33.3333%;height: 25px\"><\/td>\n<\/tr>\n<tr style=\"height: 25px\">\n<td style=\"width: 33.3333%;height: 25px\">gpu_limited<\/td>\n<td style=\"width: 33.3333%;height: 25px\"><\/td>\n<td style=\"width: 33.3333%;height: 25px\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 33.3333%\">project<\/td>\n<td style=\"width: 33.3333%\"><\/td>\n<td style=\"width: 33.3333%\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 33.3333%\">custom project partitions<\/td>\n<td style=\"width: 33.3333%\"><\/td>\n<td style=\"width: 33.3333%\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The cluster currently consists of over 50 compute nodes, with core counts ranging from 8 to 24 per node. There is also a node with 4 GPU devices.<\/p>\n<p><strong>Time and memory limits:<\/strong><\/p>\n<p>The default time limit in the \u2018cluster\u2019 partition is 24 hours and the default memory limit is 1 GB per CPU core. The maximum time limit is 30 days; no maximum memory limit is set, so memory is limited only by the hardware capacity. 
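<\/p>\n<p>As an illustrative sketch (the resource figures and the task name \u2018my_model\u2019 below are placeholders, not recommendations), a batch script requesting an explicit partition, time limit and memory limit could look like this:<\/p>\n<pre class=\"toolbar-hide:true show-title:false nums:false nums-toggle:false lang:default highlight:0 decode:true\">#!\/bin\/bash\r\n#SBATCH --partition=cluster   # the default partition\r\n#SBATCH --time=02:00:00       # 2 hours instead of the 24 hour default\r\n#SBATCH --mem-per-cpu=2G      # 2 GB per core instead of the 1 GB default\r\n#SBATCH --ntasks=1\r\n\r\n.\/my_model                    # placeholder for the actual production task<\/pre>\n<p>Requesting limits close to the job\u2019s real needs helps the scheduler start your jobs sooner and protects your fair-share priority.<\/p>\n<p>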
<strong>Users are expected to properly estimate their CPU and memory requirements. Over-allocating resources will prevent other users from accessing unused memory and CPU time.<\/strong> Also, previously consumed resources are used to compute the user\u2019s fair-share priority factor, so over-allocated jobs will reduce the priority of the user\u2019s future jobs.<\/p>\n<p>The CPU and memory limits are strictly enforced by the scheduler. Tasks are limited to the requested number of CPU cores; e.g. if you request one CPU core and run a parallel job, all your threads or processes will run on a single CPU core. Processes that exceed their memory allocation will be killed by the scheduler.<\/p>\n<p><strong>Fair share policies:<\/strong><\/p>\n<p>Jobs waiting to be scheduled are ordered according to a multifactor priority. The fair-share component is based on<\/p>\n<ul>\n<li>the amount of resources currently allocated to the user, i.e. the CPUs and memory being used by the user\u2019s running jobs,<\/li>\n<li>the resources consumed by the user in the past,<\/li>\n<\/ul>\n<p>and the job component depends on<\/p>\n<ul>\n<li>the resource request of the job &#8211; short jobs and small-memory jobs will start faster,<\/li>\n<li>the partition, and the quality of service (QOS) factor (not yet implemented).<\/li>\n<\/ul>\n<h2>Power saving<\/h2>\n<p>Inactive nodes are automatically shut down. The \u2018~\u2019 in the node status shown by the command \u2018sinfo\u2019 means that the node is switched off. When you submit a job, such nodes will be switched on automatically, with only a short delay before the job starts running, to allow the servers to boot. It takes 6 minutes to boot all 17 nodes, but it is much quicker when fewer nodes need to be started.<\/p>\n<p>Note that just after a node has powered up, MPI jobs might run into problems related to the automounter and sssd. 
A workaround is to add a dummy job step at the start of the batch script, e.g. \u2018cd\u2019; it may fail, but it forces the home directory to be automounted, and the production task will then run fine.<\/p>\n<h2>SLURM commands<\/h2>\n<p><strong>Overview:<\/strong><\/p>\n<p>Available cluster resources can be displayed with the command <span class=\"theme:terminal lang:default decode:true crayon-inline\">sinfo<\/span><\/p>\n<p>Running jobs can be displayed with the command <span class=\"theme:terminal lang:default decode:true crayon-inline\">squeue<\/span> (more detail with the -l flag)<\/p>\n<p>Job accounting data can be obtained with the command <span class=\"theme:terminal lang:default decode:true crayon-inline\">sacct<\/span><\/p>\n<p>Batch jobs are submitted using the command <span class=\"theme:terminal lang:default decode:true crayon-inline\">sbatch<\/span><\/p>\n<p>The commands <span class=\"theme:terminal lang:default decode:true crayon-inline\">salloc<\/span> and <span class=\"theme:terminal lang:default decode:true crayon-inline\">srun<\/span> allow tasks to be run interactively on the compute nodes (this is not the kind of interactive session known to met-cluster users)<\/p>\n<p>Jobs can be killed with <span class=\"theme:terminal lang:default decode:true crayon-inline\">scancel<\/span><\/p>\n<p><strong>Monitoring cluster resources with \u2018sinfo\u2019:<\/strong><\/p>\n<p>As \u2018cluster\u2019 is the default partition, it is convenient to display the resources for just this partition by adding \u2018-p cluster\u2019 to the \u2018sinfo\u2019 command. 
By default, nodes that are in the same state are grouped together.<\/p>\n<pre class=\"toolbar-hide:true show-title:false nums:false nums-toggle:false lang:default highlight:0 decode:true\">$ sinfo -p cluster\r\n\r\nPARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST\r\ncluster*     up   infinite      7  idle~ compute-0-[5-11]\r\ncluster*     up   infinite      1    mix compute-0-0\r\ncluster*     up   infinite      4  alloc compute-0-[1-4]<\/pre>\n<p>The above output shows that nodes 5-11 are idle, and \u2018~\u2019 means they are switched off to save power. Nodes 1-4 are fully allocated, meaning they will not be available for new jobs until the jobs currently running on them have finished. The mix state for the compute-0-0 node means that some of the cores on the node are in use and some of them are free.<\/p>\n<p>Further details can be displayed using the \u2018-o\u2019 flag. See the manual page, \u2018<a href=\"https:\/\/slurm.schedmd.com\/sinfo.html\">man sinfo<\/a>\u2019, for more details on format specifiers. In this example, the number of CPU cores is displayed with the command:<\/p>\n<pre class=\"toolbar-hide:false show-title:false nums:false nums-toggle:false lang:default highlight:0 decode:true\">$ sinfo -p cluster -o \"%P %.6t %C\"\r\nPARTITION  STATE CPUS(A\/I\/O\/T)\r\ncluster*  idle~ 0\/112\/0\/112\r\ncluster*    mix 8\/8\/0\/16\r\ncluster*  alloc 64\/0\/0\/64<\/pre>\n<p>A\/I\/O\/T stands for Allocated\/Idle\/Other\/Total. The idle, switched-off nodes have 112 cores available. There is a node with 8 cores allocated, and another 8 cores idle. 
In total, there are 120 cores (8 + 112) available for new jobs.<\/p>\n<p>Nodes can be listed individually by adding the \u2018-N\u2019 flag:<\/p>\n<pre class=\"top-set:false bottom-set:false toolbar-hide:false show-title:false marking:false ranges:false nums:false nums-toggle:false wrap-toggle:false plain:false plain-toggle:false copy:false popup:false expand-toggle:false decode-attributes:false trim-whitespace:false trim-code-tag:false mixed:false lang:sh highlight:0 decode:true show_mixed:false\">$ sinfo -p cluster -N -o \"%N %.6t %C\"\r\nNODELIST  STATE CPUS(A\/I\/O\/T)\r\ncompute-0-0  alloc 16\/0\/0\/16\r\ncompute-0-1  alloc 16\/0\/0\/16\r\ncompute-0-2  alloc 16\/0\/0\/16\r\ncompute-0-3  alloc 16\/0\/0\/16\r\ncompute-0-4  idle~ 0\/16\/0\/16\r\ncompute-0-5  idle~ 0\/16\/0\/16\r\ncompute-0-6  idle~ 0\/16\/0\/16\r\ncompute-0-7  idle~ 0\/16\/0\/16\r\ncompute-0-8   mix  8\/8\/0\/16\r\ncompute-0-9  idle~ 0\/16\/0\/16\r\ncompute-0-10  idle~ 0\/16\/0\/16\r\ncompute-0-11  idle~ 0\/16\/0\/16<\/pre>\n<p><strong>Monitoring jobs with squeue<\/strong><\/p>\n<p>More informative output formatting can be achieved with the following:<\/p>\n<p><span class=\"nums:false nums-toggle:false lang:default highlight:0 decode:true crayon-inline \">squeue -o &#8220;%.18i %.8u %.8a %.9P %q %.8j %.8T %.12M %.12l %.5C %.10R %p&#8221;<\/span><\/p>\n<p>It might be a good idea to add this as an alias in your .bashrc.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(work in progress)  Resources and limits  Partitions:    In SLURM, partitions are (possibly overlapping) groups of nodes. Partitions are similar to queues in some other batch systems, e.g. in SGE on met-cluster and maths-cluster. 
The default partition is called \u2018cluster\u2019 and it is has 24 hours default time limit, and the<\/p>\n","protected":false},"author":361,"featured_media":223,"template":"","meta":{"_acf_changed":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"__cvm_playback_settings":[],"__cvm_video_id":"","_links_to":"","_links_to_target":""},"kbe_taxonomy":[],"kbe_tags":[],"class_list":["post-5923","kbe_knowledgebase","type-kbe_knowledgebase","status-publish","has-post-thumbnail","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.8.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Reading Academic Computing Cluster - Slurm commands and resource allocation policy - Academic Computing Team<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reading Academic Computing Cluster - Slurm commands and resource allocation policy - Academic Computing Team\" \/>\n<meta property=\"og:description\" content=\"(work in progress) Resources and limits Partitions:  In SLURM, partitions are (possibly overlapping) groups of nodes. Partitions are similar to queues in some other batch systems, e.g. in SGE on met-cluster and maths-cluster. 
The default partition is called \u2018cluster\u2019 and it is has 24 hours default time limit, and the\" \/>\n<meta property=\"og:url\" content=\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/\" \/>\n<meta property=\"og:site_name\" content=\"Academic Computing Team\" \/>\n<meta property=\"article:modified_time\" content=\"2026-01-09T18:24:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/research.reading.ac.uk\/act\/wp-content\/uploads\/sites\/2\/2017\/03\/cluster-1-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"128\" \/>\n\t<meta property=\"og:image:height\" content=\"128\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/\",\"url\":\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/\",\"name\":\"Reading Academic Computing Cluster - Slurm commands and resource allocation policy - Academic Computing 
Team\",\"isPartOf\":{\"@id\":\"https:\/\/research.reading.ac.uk\/act\/#website\"},\"datePublished\":\"2020-07-15T11:09:21+00:00\",\"dateModified\":\"2026-01-09T18:24:41+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/research.reading.ac.uk\/act\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Knowledgebase\",\"item\":\"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Reading Academic Computing Cluster &#8211; Slurm commands and resource allocation policy\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/research.reading.ac.uk\/act\/#website\",\"url\":\"https:\/\/research.reading.ac.uk\/act\/\",\"name\":\"Academic Computing Team\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/research.reading.ac.uk\/act\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/research.reading.ac.uk\/act\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/research.reading.ac.uk\/act\/#organization\",\"name\":\"University of 
Reading\",\"url\":\"https:\/\/research.reading.ac.uk\/act\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\/\/research.reading.ac.uk\/act\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/research.reading.ac.uk\/act\/wp-content\/uploads\/sites\/2\/2017\/08\/cropped-University_of_Reading_shield-1.png\",\"contentUrl\":\"https:\/\/research.reading.ac.uk\/act\/wp-content\/uploads\/sites\/2\/2017\/08\/cropped-University_of_Reading_shield-1.png\",\"width\":512,\"height\":512,\"caption\":\"University of Reading\"},\"image\":{\"@id\":\"https:\/\/research.reading.ac.uk\/act\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Reading Academic Computing Cluster - Slurm commands and resource allocation policy - Academic Computing Team","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/","og_locale":"en_GB","og_type":"article","og_title":"Reading Academic Computing Cluster - Slurm commands and resource allocation policy - Academic Computing Team","og_description":"(work in progress) Resources and limits Partitions:  In SLURM, partitions are (possibly overlapping) groups of nodes. Partitions are similar to queues in some other batch systems, e.g. in SGE on met-cluster and maths-cluster. 
The default partition is called \u2018cluster\u2019 and it is has 24 hours default time limit, and the","og_url":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/","og_site_name":"Academic Computing Team","article_modified_time":"2026-01-09T18:24:41+00:00","og_image":[{"width":128,"height":128,"url":"https:\/\/research.reading.ac.uk\/act\/wp-content\/uploads\/sites\/2\/2017\/03\/cluster-1-1.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_misc":{"Estimated reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/","url":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/","name":"Reading Academic Computing Cluster - Slurm commands and resource allocation policy - Academic Computing Team","isPartOf":{"@id":"https:\/\/research.reading.ac.uk\/act\/#website"},"datePublished":"2020-07-15T11:09:21+00:00","dateModified":"2026-01-09T18:24:41+00:00","breadcrumb":{"@id":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/racc-slurm-commands-and-resource-allocation-policy\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/research.reading.ac.uk\/act\/"},{"@type":"ListItem","position":2,"name":"Knowledgebase","item":"https:\/\/research.reading.ac.uk\/act\/knowledgebase\/"},{"@type":"ListItem","position":3,"name":"Reading Academic Computing Cluster &#8211; Slurm commands and resource allocation 
policy"}]},{"@type":"WebSite","@id":"https:\/\/research.reading.ac.uk\/act\/#website","url":"https:\/\/research.reading.ac.uk\/act\/","name":"Academic Computing Team","description":"","publisher":{"@id":"https:\/\/research.reading.ac.uk\/act\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/research.reading.ac.uk\/act\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-GB"},{"@type":"Organization","@id":"https:\/\/research.reading.ac.uk\/act\/#organization","name":"University of Reading","url":"https:\/\/research.reading.ac.uk\/act\/","logo":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/research.reading.ac.uk\/act\/#\/schema\/logo\/image\/","url":"https:\/\/research.reading.ac.uk\/act\/wp-content\/uploads\/sites\/2\/2017\/08\/cropped-University_of_Reading_shield-1.png","contentUrl":"https:\/\/research.reading.ac.uk\/act\/wp-content\/uploads\/sites\/2\/2017\/08\/cropped-University_of_Reading_shield-1.png","width":512,"height":512,"caption":"University of 
Reading"},"image":{"@id":"https:\/\/research.reading.ac.uk\/act\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/kbe_knowledgebase\/5923","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/kbe_knowledgebase"}],"about":[{"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/types\/kbe_knowledgebase"}],"author":[{"embeddable":true,"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/users\/361"}],"version-history":[{"count":6,"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/kbe_knowledgebase\/5923\/revisions"}],"predecessor-version":[{"id":7984,"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/kbe_knowledgebase\/5923\/revisions\/7984"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/media\/223"}],"wp:attachment":[{"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/media?parent=5923"}],"wp:term":[{"taxonomy":"kbe_taxonomy","embeddable":true,"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/kbe_taxonomy?post=5923"},{"taxonomy":"kbe_tags","embeddable":true,"href":"https:\/\/research.reading.ac.uk\/act\/wp-json\/wp\/v2\/kbe_tags?post=5923"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}