There are three options for priority (paid-for) access to RACC resources: 1) the ‘project’ partition for the legacy met-cluster projects, based on hardware purchased by those projects; 2) a similar, new project partition consisting of new nodes purchased by projects (no nodes have been purchased yet); 3) custom project partitions for some large projects with custom hardware requirements, e.g. with GPUs.
Idle time on the contributed hardware may be shared with other users. This will be arranged so that inconvenience to the paying users is minimal.
The level of support for the paying projects on RACC is the same as for users of the free partitions, i.e. it is on a best-effort basis. Even where hardware is in warranty and supported by the supplier, RACC is based on free software. Also, in the case of RACC we focus on capacity and efficiency, not on reliability. The cluster configuration might be adjusted to changing research requirements, and sometimes tests might be carried out on the production system. A higher level of support and reliability is provided by the Research Cloud Service.
The active paid-for meteorology projects should now use the dedicated resources for interactive and batch computing on the RACC. met-cluster now has only one compute node (so it is not really a cluster anymore!) and is not suitable for running compute-intensive jobs. The sizes of the CPU allocations on RACC are the same as they were on met-cluster (however, in some cases we now count hardware threads rather than CPU cores, so the numbers are doubled). The existing ‘project’ compute nodes are out of warranty, and this project partition will be gradually reduced in size and then decommissioned as the legacy subscriptions expire.
When possible, some of the login nodes will be set aside for exclusive use by priority users, i.e. members of the paying projects. Priority users have access to both ‘project’ and ‘free’ login nodes, and login session scheduling is performed over the pool consisting of both groups of nodes. Both the priority and free nodes have a similar spec. The benefit of the priority nodes is that, because they are used by a smaller number of users, the CPU and memory load is expected to be lower. As a new session is scheduled on the node with the lowest load, priority users will typically be using the priority nodes, but when the load on the priority nodes is high they might be logged in on any available node.
To submit a job to the privileged ‘project’ partition, one needs to specify the account name, i.e. their project name, and select the partition ‘project’. This can be done from the command line:
sbatch -A <project name> -p project batch_script.sh
or, by specifying the account and the partition in the batch script with the directives:
#SBATCH --account=<project name>
#SBATCH --partition=project
Note that specifying the account with -A <project name> without adding -p project will result in the job running in the free partition ‘cluster’ (in some cases this might be useful). Such a job will enjoy the higher priority given by the ‘project’ quality of service (QOS) associated with the account, and it will count towards the account’s CPU limit.
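Put together, a minimal batch script using these directives might look like the sketch below. The project name, resource requests, and job payload are placeholders to be replaced with your own:

```shell
#!/bin/bash
#SBATCH --account=myproject    # placeholder: use your own project name
#SBATCH --partition=project
#SBATCH --ntasks=1             # assumed resource request, for illustration only
#SBATCH --time=01:00:00

# The actual job payload goes here; a placeholder command is used below.
echo "Job started on $(hostname)"
```

The #SBATCH lines are read by the scheduler at submission time and ignored as comments when the script body runs on the compute node.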
The accounts (projects) available to the user can be listed with the command:
sacctmgr show assoc user=<username> format=User,Account,QOS
The project allocation and the list of allowed users can be checked with the command:
sacctmgr show assoc Account=<project> format=Account,User,GrpTRES
It might happen that a job does not start in the paid-for allocation because some resources are already in use, or because the job is larger than the account’s CPU allocation. In those cases the job’s pending reason will be ‘AssocGrpCpuLimit’. For pending (queued) jobs, the project account can be changed. If you have access to another account you can try:
scontrol update job <job number> Account=<another project>
Or, you can reset the job to the default free account ‘local’ and the partition ‘cluster’:
scontrol update job <job number> Account=local Partition=cluster QOS=normal
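Before moving a pending job as above, it can be useful to confirm why it is queued. One way, using standard squeue output options (the job number below is a placeholder), is:

```shell
# Show the job's state and the scheduler's reason for holding it; a job
# blocked by the account's CPU limit reports the reason AssocGrpCpuLimit.
squeue -j 123456 --Format=JobID,State,Reason
```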
We are planning to purchase new nodes and create a new high-performance partition consisting of hardware funded by research projects. The goal is a uniform pool of nodes, not servers matched to individual project budgets. The warranty on these servers will be 5 years, so purchasing a node will give 5 years of access to a proportional share of the resources in the project partition. After that time the nodes will become part of the free partition and will be useful to all cluster users for another couple of years. In this way, buying these nodes will help ensure the cluster keeps running in the future. The current nodes are old and inefficient, and eventually they will need to be replaced.
The spec of each new node is: 2 x AMD EPYC 7402 2.8 GHz CPUs with 24 cores each and 512 GB of RAM. The cost of one server is around £6900 plus VAT. It is under discussion whether the cost of one server can be shared by several projects, similar to the former processing units system in the Met department.
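As a rough illustration of what a shared purchase would mean per core, the sketch below does simple arithmetic on the figures quoted above (ex VAT); any actual cost split between projects would be agreed separately:

```shell
# Back-of-the-envelope cost per core, using the figures quoted above (ex VAT).
NODE_COST=6900   # GBP per server
CORES=48         # 2 x 24-core AMD EPYC 7402
echo "Approximate cost per core over the 5-year warranty: GBP $((NODE_COST / CORES))"
```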
To start creating this partition we need a sufficient number of projects willing to purchase nodes. IT cannot purchase the nodes without confirmation that the cost will be covered by research grants, and we need a reasonably sized pool of identical nodes to start with; otherwise sharing nodes, scheduling jobs, etc. will not work well.
Access will be organized in a similar way as for the legacy met projects described above. There will be one shared partition, with project allocations proportional to the number of contributed nodes. We can adjust the proportional allocations to maximize resource utilization and throughput, so that projects can have access to more resources at once when they need them.
A shared partition, rather than access restricted to the purchased servers, is proposed because this model has a number of advantages. Sharing resources allows access to more CPUs than the number you purchased, while idle time on your hardware can be used by others. When not relying on a single server, a server failure will not mean that the project loses all access to computing. Sharing nodes also offers better power efficiency, so it is a more environmentally friendly approach to computing. The nodes are automatically powered on and off as they are needed. At less busy times, in a shared partition, a number of users can share the nodes that happen to be already powered on, instead of frequently powering up their own nodes or, worse, keeping them always running, idle or underutilized.
For large computational projects, with an amount of hardware that could form a separate cluster, or with specialized hardware, e.g. with GPUs, separate project partitions can be created. In such cases confirmation from IT must be obtained before any hardware is purchased.