Research projects can purchase hardware and gain access to dedicated paid-for resources for batch computing. There are two options:
1. projects with smaller budgets and standard requirements can pay for a standard compute node, or in some cases for a fraction of a compute node;
2. projects with large budgets and specialist requirements, e.g. GPUs, a lot of RAM or a low-latency network, can purchase custom hardware.
Idle time on the contributed hardware might be shared with other users; however, this will be arranged in such a way that inconvenience for the members of the paying group is minimal.
The level of software support for paying projects on RACC is the same as for users of the free partitions, i.e. it is provided on a best-effort basis. Even for in-warranty hardware supported by the supplier, RACC itself is based on free software. Also, on RACC we focus on capacity and efficiency rather than reliability. The cluster configuration might be adjusted to changing research requirements, and tests might sometimes be carried out on the production system. A higher level of support and reliability is provided by the Research Cloud Service.
New: requests for node purchases can now be submitted using the RACC Node Request Form, available in the Self Service Portal.
The basic way to gain access to high performance hardware on RACC is to purchase one or more standard compute nodes. For accounting purposes, projects own the particular hardware. In the cluster, however, these customer-owned nodes form a larger cluster partition shared by all groups who purchased hardware, and each group has access to resources equivalent to its purchase. This offers more flexibility in sharing resources: we can adjust the proportional allocations to maximise resource utilisation and throughput, so that researchers can access more resources at once when they need them.
The plan to purchase new nodes in March 2020 was not realised because of COVID-19 and the campus lockdown. In March 2020 the specification of the standard node was 2 x AMD EPYC 7402 2.8 GHz with 24 cores each and 512 GB of RAM, and the cost of one server was around £6900 plus VAT. We decided not to rush into purchasing nodes at the end of 2020, but to wait a little longer, because the more efficient 3rd-generation AMD EPYC processors were due to become available in early 2021. It is now possible to order nodes with the new specification, as planned: two AMD EPYC 7413 24-core CPUs and 512 GB of RAM per node. The cost is around £9000 per node. However, we are purchasing quad-server nodes, so we need requests for 4 servers before the nodes can be ordered. The quote, with more hardware details, can be viewed on request, and it will need to be updated before each purchase. The supplier’s pricing might change a little, but we are not planning to change the node specification until a substantially better option becomes available. To request nodes, please use the RACC Node Request Form available in the Self Service Portal, or raise a ticket to discuss your requirements before ordering.
The warranty on these servers will be 5 years, so purchasing a node gives 5 years of access to the proportional amount of resources in the project partition. After that, the nodes will become part of the free partition and will be useful to all cluster users for another couple of years. In this way, buying these nodes also helps to keep the cluster running in the future. The current free nodes are old and inefficient, and they will eventually need to be decommissioned.
The cost of one server can be shared by several projects (the cost of a 4-way system is typically shared by several projects anyway). In such cases, please let us know about your requirements as soon as possible. There might be some waiting time before your request is matched with other requests so that a server can be purchased.
To start creating the new partition, we need a sufficient number of projects willing to purchase nodes. IT is not able to purchase the nodes without confirmation that the cost will be covered by research grants, and we need a reasonably sized pool of identical nodes to start with; otherwise sharing nodes, scheduling jobs, etc. will not work well. We therefore encourage you to participate in the initial purchase in June 2021.
We propose a shared partition, rather than access restricted to the purchased servers, because this model has several advantages. Sharing resources gives access to more CPUs than you purchased, while idle time on your hardware can be used by others. Because a project does not rely on a single server, a server failure does not mean that the project loses all access to computing. Sharing nodes also gives better power efficiency, so it is a more environmentally friendly approach to computing. The nodes are automatically powered on and off as they are needed. At less busy times, in a shared partition, a number of users can share the nodes that happen to be already powered on, instead of frequently powering up their own nodes or, worse, keeping them always running, idle or underutilised.
Each group has a separate partition; however, all these partitions share the same pool of purchased standard compute nodes (the exception being projects with custom hardware). Each project partition has specific resource limits, proportional to the capacity of the hardware purchased by the project. Access to each project partition is controlled via a security group named in the format racc-<project name>, and project managers can add and remove members of this group.
To submit a job to the project partition, you just need to specify the partition. This can be done from the command line:
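The proportional rule can be illustrated with a small worked example. This is a sketch under stated assumptions, not the formula RACC uses: it assumes each purchased server counts as the 2021 standard spec (2 CPUs x 24 cores, i.e. 96 virtual CPUs with SMT, and 512 GB of RAM) and that limits scale linearly with the number of whole servers. The function name project_limits is hypothetical.

```shell
#!/bin/bash
# Sketch: estimate a project's resource limits from the number of standard
# servers purchased. ASSUMPTION: limits scale linearly with the 2021 standard
# node spec; the real limits are set by the RACC administrators.

VCPUS_PER_SERVER=96     # 2 CPUs x 24 cores x 2 hardware threads (SMT)
MEM_GB_PER_SERVER=512   # 512 GB of RAM per server

# project_limits N: print the limits for N whole servers in cpu=...,mem=... form
project_limits() {
    local servers=$1
    echo "cpu=$(( servers * VCPUS_PER_SERVER )),mem=$(( servers * MEM_GB_PER_SERVER ))G"
}
```

For example, project_limits 4 prints cpu=384,mem=2048G, i.e. 384 virtual CPUs and 2 TB of RAM.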
sbatch -p <project name> batch_script.sh
or in the batch script with the directives:
#SBATCH --partition=<project name>
For example, if the project name is mpecdt, we submit with the command:
sbatch -p mpecdt batch_script.sh
or in the script we add the directive:
#SBATCH --partition=mpecdt
Viewing the resource limits
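A complete job script for the mpecdt example might look like the sketch below. Only the partition directive comes from this page; the job name, task count, memory and time limit are illustrative placeholders, not RACC defaults or recommendations. Submit it with sbatch batch_script.sh.

```shell
#!/bin/bash
# Project partition (the mpecdt example above):
#SBATCH --partition=mpecdt
# Placeholder resource requests: adjust them to your job.
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Slurm sets SLURM_JOB_PARTITION inside a running job; outside Slurm the
# fallback text is used instead.
partition="${SLURM_JOB_PARTITION:-not running under Slurm}"
echo "Job running in partition: ${partition}"
```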
The resource limits associated with the project can be checked with the command:
sacctmgr show qos <project name> format=Name,MaxTRESPU%20,grpTRES%20
$ sacctmgr show qos mpecdt format=Name,MaxTRESPU%20,grpTRES%20
      Name            MaxTRESPU              GrpTRES
---------- -------------------- --------------------
    mpecdt                            cpu=384,mem=2T
Note that the CPU number shown is for virtual CPUs. Because of simultaneous multithreading (SMT), it is double the number of physical cores, so the number of physical cores is half the number shown.
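To read the limit in physical cores, halve the virtual CPU count. The sketch below assumes the limit string has the cpu=N,... form shown in the example output and two hardware threads per core; the function name physical_cores is hypothetical.

```shell
#!/bin/bash
# Sketch: convert the cpu= value of a GrpTRES string (virtual CPUs) into
# physical cores. ASSUMPTION: 2 hardware threads per core (SMT).

# physical_cores "cpu=384,mem=2T" prints the physical core count
physical_cores() {
    local tres=$1 vcpus
    vcpus=${tres#*cpu=}   # keep the text after "cpu="
    vcpus=${vcpus%%,*}    # drop everything from the first comma onwards
    echo $(( vcpus / 2 ))
}
```

The -n (no header) and -P (parsable) options of sacctmgr print the bare value, so the function can be fed directly, e.g. physical_cores "$(sacctmgr -n -P show qos mpecdt format=GrpTRES)". For the mpecdt example this gives 192 physical cores.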
Project managers can add and remove users allowed to use project resources by adding and removing users from the relevant Active Directory group. The group names are in the format racc-<project name> (if needed, we can add a nested group to racc-<project name>). Group membership can be managed using the Group Manager software, available from Apps Anywhere (works on campus), and in a web browser at https://rds.act.reading.ac.uk (see Accessing Windows Servers and Desktops from off campus). In Group Manager, managers see only the groups they have permission to manage.
Activating changes in group membership requires reloading the Slurm configuration. This happens periodically, and also when a new user logs in to RACC, so in some cases there might be a delay before the change takes effect.
We have compared the performance and power efficiency of the new nodes with the old nodes. As the example old node we took the 10-year-old Xeon X5650, because we have quite a few of them and we would like to retire them as soon as the new nodes are purchased. Comparing performance per core on fully utilised CPUs, for typical jobs running on RACC, the new processors are consistently over 2.5 times faster (for one Meteorology model the improvement factor is 2.6, and for the ‘Physics’ section of the Passmark test suite it is 2.78). Power consumption of the old 12-core servers is over 250 W per node; power consumption of the new 48-core nodes is slightly over 500 W per node. In summary: the cores are over 2.5 times faster, rack space is reduced by a factor of 10 (running server rooms is expensive), and for the same amount of computing, power consumption is reduced by a factor of 5 (this saves money for the university and also reduces the environmental impact). This is illustrated in the following diagram:
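The rack space and power factors follow from the figures in the paragraph above. The sketch below reproduces the arithmetic, using the conservative 2.5x per-core speed-up and taking 250 W and 500 W as round values for "over 250 W" and "slightly over 500 W". Bash arithmetic is integer only, so the speed-up is written as 25/10.

```shell
#!/bin/bash
# Sketch: reproduce the old-vs-new node comparison with integer arithmetic,
# using the round figures quoted in the text.
OLD_CORES=12; OLD_WATTS=250      # Xeon X5650 server
NEW_CORES=48; NEW_WATTS=500      # AMD EPYC 7413 server
SPEEDUP_NUM=25; SPEEDUP_DEN=10   # per-core speed-up of 2.5

# Old servers needed to match one new server: (48 cores x 2.5) / 12 cores = 10
old_equiv=$(( NEW_CORES * SPEEDUP_NUM / (OLD_CORES * SPEEDUP_DEN) ))

# Power of those old servers relative to one new server: (10 x 250 W) / 500 W = 5
power_ratio=$(( old_equiv * OLD_WATTS / NEW_WATTS ))

echo "One new server does the work of ${old_equiv} old servers"
echo "and draws 1/${power_ratio} of their combined power"
```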
For large computational projects, with an amount of hardware that could form a separate cluster, or with specialised hardware such as GPUs, separate project partitions can be created. In such cases, confirmation from DTS must be obtained before any hardware is purchased.
Active meteorology paid-for projects should use their dedicated resources for interactive and batch computing on RACC. The sizes of the CPU allocations on RACC are the same as they were on met-cluster (however, in some cases we now count hardware threads rather than CPU cores, so the numbers will be doubled). The existing ‘project’ compute nodes are out of warranty, and this project partition will be gradually reduced in size and then decommissioned as the legacy subscriptions expire.
To submit a job to the privileged ‘project’ partition, you need to specify the account name, i.e. your project name, and select the partition ‘project’. This can be done from the command line:
sbatch -A <project name> -p project batch_script.sh
or, by specifying the account and the partition in the batch script with the directives:
#SBATCH --account=<project name>
#SBATCH --partition=project
For a custom project partition, replace the partition ‘project’ with the name of your custom project partition (which is the same as the account name). Note that specifying the account with -A <project name> without adding -p project will run the job in the free partition ‘cluster’ (which can sometimes be useful). Such a job will enjoy the higher priority given by the ‘project’ quality of service (QOS) associated with the account, and it will count towards the account’s CPU limit.
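A job script using both directives might look like the sketch below. The account name myproject is a hypothetical placeholder for your project name, and the task count and time limit are illustrative, not recommended values.

```shell
#!/bin/bash
# ASSUMPTION: "myproject" is a placeholder; replace it with your project
# (account) name. For a custom project partition, use its name instead of
# 'project'.
#SBATCH --account=myproject
#SBATCH --partition=project
# Placeholder resource requests: adjust them to your job.
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Slurm sets these variables inside a running job.
account="${SLURM_JOB_ACCOUNT:-not running under Slurm}"
partition="${SLURM_JOB_PARTITION:-not running under Slurm}"
echo "Account: ${account}, partition: ${partition}"
```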
The accounts (projects) available to the user can be listed with the command:
sacctmgr show assoc user=<username> format=User,Account,QOS
The project allocation and the list of allowed users can be checked with the command:
sacctmgr show assoc Account=<project> format=Account,User,GrpTRES
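The two commands above can be combined to show the allocation of every account you have access to. This is a sketch: the function name show_my_allocations is hypothetical, and it relies on the -n (no header) and -P (parsable) output options of sacctmgr.

```shell
#!/bin/bash
# Sketch: for each Slurm account the user belongs to, print the account's
# allowed users and group resource limit (GrpTRES).
show_my_allocations() {
    local user=${1:-$USER} acct
    # -n: no header, -P: parsable output, so only bare account names appear
    sacctmgr -n -P show assoc user="$user" format=Account | sort -u |
    while read -r acct; do
        [ -n "$acct" ] || continue   # skip empty lines
        echo "== ${acct} =="
        sacctmgr show assoc Account="$acct" format=Account,User,GrpTRES
    done
}
```

After defining the function (for example by sourcing the file), run show_my_allocations, or show_my_allocations <username> for another user.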
A job might not start in the paid-for allocation because some resources are already in use, or because the job is larger than the account’s CPU allocation. In those cases the reason shown for the pending job will be ‘AssocGrpCpuLimit’. For pending (queued) jobs, the project account can be changed. If you have access to another account, you can try:
scontrol update job <job number> Account=<another project>
Alternatively, you can reset the job to the default free account ‘shared’ and the partition ‘cluster’:
scontrol update job <job number> Account=shared Partition=cluster QOS=normal
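If several jobs are stuck, the reset can be applied to all of them. The sketch below is illustrative; the function names blocked_job_ids and move_blocked_jobs are hypothetical. It lists your pending jobs with squeue (%i is the job ID, %r the pending reason, -h suppresses the header) and runs the scontrol command above for each job whose reason is AssocGrpCpuLimit.

```shell
#!/bin/bash
# Sketch: move pending jobs held by the account CPU limit to the free
# account 'shared' and partition 'cluster'.

# blocked_job_ids: read "JOBID REASON" lines on stdin and print the IDs of
# jobs whose reason is AssocGrpCpuLimit.
blocked_job_ids() {
    awk '$2 == "AssocGrpCpuLimit" { print $1 }'
}

# move_blocked_jobs: reset each blocked pending job of the current user.
move_blocked_jobs() {
    local job
    squeue -u "$USER" -t PENDING -h -o "%i %r" | blocked_job_ids |
    while read -r job; do
        scontrol update job "$job" Account=shared Partition=cluster QOS=normal
    done
}
```

To check which jobs would be moved before changing anything, run squeue -u "$USER" -t PENDING -h -o "%i %r" | blocked_job_ids first, then call move_blocked_jobs.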