Tutorial for setting up Rose/Cylc in order to run JULES on CEDA JASMIN

    (last updated 19 February 2025)

        1. First, complete the JASMIN setup at the following site:
          https://help.jasmin.ac.uk/category/158-getting-started
          If you have connection problems to ‘login-01.jasmin’, see this page: https://help.jasmin.ac.uk/article/848-login-problems
        2. Request group workspace privilege and make sure that you receive group workspace privilege for ‘jules’ at this site: https://accounts.jasmin.ac.uk
          If you work at the University of Reading, then you should also request & receive group workspace privilege for ‘landsurf_rdg’ at the same time.
          If you don’t work at the University of Reading, it would be very useful to have access to another group workspace, so talk to your project leader about an appropriate group workspace.
          The second group workspace is useful both for (i) having access to long-term disk-space storage, and for (ii) SLURM accounting for access to the LOTUS2 Cylc8 batch-processing cluster.
          You may need to wait until your JASMIN account has been created before you can request these privileges.
          Granting of a privilege can take some time (e.g., days).
          You should receive an email confirming that you have been granted access to each group workspace.
          After you receive that email, you can check your access to these group workspaces with these commands:
          ls -ltr /gws/nopw/j04/jules
          ls -ltr /gws/nopw/j04/landsurf_rdg
          Run these commands from sci-ph-01.jasmin or cylc2.jasmin (skip ahead to step 4 if you need connection details). With successful access to the GWS, you should see a list of the subdirectories of the jules GWS (and likewise for landsurf_rdg).
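          The access check above can also be scripted. The following is a minimal sketch, not part of the original suite: the function name check_gws and the GWS_ROOT override variable are hypothetical, and the default root is simply the /gws/nopw/j04 path used in this tutorial.

```shell
# Report whether each named group workspace (GWS) directory is accessible.
# GWS_ROOT is a hypothetical override (useful for testing); by default the
# tutorial's /gws/nopw/j04 root is used.
check_gws() {
    local root="${GWS_ROOT:-/gws/nopw/j04}"
    local rc=0
    local name
    for name in "$@"; do
        if [ -d "$root/$name" ] && [ -r "$root/$name" ] && [ -x "$root/$name" ]; then
            echo "OK: $root/$name"
        else
            echo "NO ACCESS: $root/$name"
            rc=1
        fi
    done
    # Non-zero return code if any workspace was not accessible.
    return "$rc"
}
```

          For example, on cylc2.jasmin you could type: check_gws jules landsurf_rdg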
        3. Ask your project leader for help in getting an MOSRS account.
        4. Once you do the initial JASMIN setup and once you have an MOSRS account, do the steps: (a)  ‘Configuring your own laptop or desktop machine’, (b) ‘Logging on to JASMIN’, and (c) ‘Modify configuration files on cylc2.jasmin’  (skipping the ‘Running the suite’ tutorial),  in the following guide:
          https://code.metoffice.gov.uk/trac/jules/wiki/RoseJULESonJASMIN
          When you do the ‘Modify configuration files on cylc2.jasmin’ section of that guide, you should use cylc2.jasmin (as that guide indicates) rather than sci-ph-01.jasmin, sci-ph-02.jasmin, or cylc1.jasmin. This is contrary to previous instructions: users who had switched to sci-ph-01.jasmin or sci-ph-02.jasmin should now revert to cylc2.jasmin. We now use cylc2.jasmin because the graphical user interface and X-windows have been enabled there, and because cylc2.jasmin can access the LOTUS2 cluster and Cylc8 (unlike cylc1.jasmin). The GUI, X-windows, and Cylc8 are necessary for this work.
          Once you do all of this, you should be able to ssh -AX from your laptop to cylc2.jasmin without entering your password. This works best on campus at the University of Reading. Using a VPN off campus should also work, as long as the network setup identified in step #1 (above) finds that you are a ‘.ac.uk’ client.
        5. If you are a University of Reading undergraduate or postgraduate student, then you might not be able to use the VPN. Furthermore, new campus hosts do not get an external DNS record that JASMIN can verify. An alternative is to use JASMIN’s NX Service, the University of Reading’s Linux Managed Desktop Service (NX), or ssh to the University of Reading’s externally visible ssh server, arc-ssh; the latter two have public DNS records. To access these systems from off campus, a user needs to put in a request on the Self Service Portal.
        6. The Linux Managed Desktop Service (NX) (see step#5) is also very useful to all users who might want to use the cylc GUI, since this now uses a browser window opening from JASMIN.
        7. The ability to login to cylc2.jasmin via login-01.jasmin (as in step #4, above) is often useful, since this login-01.jasmin access means that access to xfer1.jasmin or xfer2.jasmin is also available. Access to these xfer* transfer nodes is useful if the user wants to transfer files to their local computers from JASMIN. But sometimes, the connection to login-01.jasmin is not working, or the user doesn’t have their access to the VPN or the Linux Managed Desktop service working. In this case, it is sometimes useful to use login-02.jasmin instead of login-01.jasmin in order to connect to cylc2.jasmin, since login-02.jasmin doesn’t require the VPN or the Linux Managed Desktop. However, using login-01.jasmin through the University of Reading’s servers is CEDA JASMIN’s preferred method of connection. Furthermore, if you use login-02.jasmin, then some of the JASMIN capabilities (like full data-transfer privileges) might not be available.
        8. Once you have your jules group workspace privilege, when logged in to cylc2.jasmin.ac.uk, type rosie checkout u-al752 at the command line. This might take 2-3 minutes. The rosie go GUI doesn’t currently work, as it is being reimplemented for Cylc8.  You won’t be able to commit any changes to this suite since you don’t have permission. However, if you make a copy of this suite, by typing rosie copy u-al752 instead of rosie checkout u-al752, then you will have a new suite with a different suite-number for which you have permissions to commit changes. You don’t need to make a different-numbered copy of the suite right now, as we will do that in step #21 of this tutorial. This suite, u-al752, was originally developed by Karina Williams (Met Office) and Anna Harper (U. Exeter), and it runs JULES models of (up to) 75 different FLUXNET2015 sites around the world. The TRAC site for this suite from Karina and Anna is here, where example plots generated on MONSooN can be downloaded and viewed. Patrick McGuire (U. Reading) ported this suite from MONSooN to CEDA JASMIN.
        9. In the ~/roses/u-al752 directory, you can view the various Rose/Cylc setup files like suite.rc, rose-suite.conf, rose-suite.info, site/suite.rc.CEDA_JASMIN, app/jules/rose-app.conf, and app/fcm_make/file/fcm-make.cfg . You can use vi or more to view these files. Study the files for a little while. Note that all the JULES namelists are condensed to the file: app/jules/rose-app.conf .
        10. To find out about the JULES/FLUXNET suite settings, the first file to look at is rose-suite.conf, where we can set the number of spinups, state whether we want prescribed datasets, and give the paths to the output and plots folders, the FLUXNET datasets, etc. The second file to check is suite.rc, where the list of FLUXNET sites and the choices for prescribed datasets, TRIFFID, phenology, and soil carbon pools are set. The version of the FLUXNET dataset used by the suite is also given in suite.rc. It is important to realize that the second file (suite.rc) reads in the values defined in the first file (rose-suite.conf). The third file to check is app/jules/rose-app.conf , which holds all the namelists and is where the parameters for the JULES run are set. More information on the configuration of prescribed datasets for FLUXNET sites (ancillaries, driving datasets, LAI, etc.) is given in app/jules/opt/ . By searching for key words in the files above (in the recommended order), many questions can be answered. The parameters in rose-app.conf can be looked up on the JULES User Guide website for the relevant JULES version (here: version 7.3). For instance, ‘ignition_method’ can be looked up by following this link:
          http://jules-lsm.github.io/vn7.3/namelists/jules_vegetation.nml.html#JULES_VEGETATION::ignition_method
        11. Now change your rose-suite.conf file with vi so that it has the right myusername (in two different places):
          OUTPUT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/jules_output'
          PLOT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/peg/plots'
          Also, change the location:
          LOCATION='CEDA_JASMIN'
          and the SLURM account should be defined as the group workspace that you have access to
          (Please don’t use the jules group workspace here. Please don’t use SLURM_ACCT='jules'.):
          SLURM_ACCT='landsurf_rdg'
          and the suite_data:
          SUITE_DATA='/gws/nopw/j04/jules/pmcguire/fluxnet/kwilliam/suite_data'
          and the JULES FCM/MOSRS archive location so that it uses code compatible with LOTUS2/Cylc8:
          #JULES_FCM='fcm:jules.x_tr'
          #JULES_REVISION='26897'
          JULES_FCM='fcm:jules.x_br/dev/patrickmcguire/vn7.4_intel_rocky9'
          JULES_REVISION=''
          This JULES FCM/MOSRS branch differs from the JULES 7.4 trunk in that it uses a different curl library path. If you use JULES7.7 or later, this is already accounted for in the trunk. However, if you do use the trunk, remember that it is important to use jules.x_tr instead of jules.xm_tr.
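          The username substitution in this step can be done with sed instead of vi. This is a minimal sketch under one assumption: that your rose-suite.conf contains the literal placeholder myusername in its paths, as shown above. If your checked-out copy contains some other username, edit it by hand instead. The function name set_suite_user is hypothetical.

```shell
# Replace the '/myusername/' placeholder in a rose-suite.conf file with a
# real username, keeping a backup copy alongside as <file>.bak.
# Usage: set_suite_user <conf-file> <username>
set_suite_user() {
    conf="$1"
    user="$2"
    cp -p "$conf" "$conf.bak"
    sed "s|/myusername/|/$user/|g" "$conf.bak" > "$conf"
    # Show the two edited settings so they can be checked by eye.
    grep -E '^(OUTPUT_FOLDER|PLOT_FOLDER)=' "$conf"
}
```

          For example: set_suite_user ~/roses/u-al752/rose-suite.conf "$USER" (this assumes that $USER is your JASMIN username). The other settings in this step (LOCATION, SLURM_ACCT, SUITE_DATA, JULES_FCM, JULES_REVISION) still need to be edited by hand.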
        12. The current policy at CEDA JASMIN (November 2020) is that:
          “For testing new workflows and for new JASMIN users, the testing queue test should be used…” See:
          https://help.jasmin.ac.uk/article/4881-lotus-queues
          So, to follow this policy, if you are a new user or if you are testing this workflow, in the file ~/roses/u-al752/site/suite.rc.CEDA_JASMIN , you can change:
          [[JULES_CEDA_JASMIN]]
          inherit = None, JASMIN_LOTUS
          [[[directives]]]
          --partition = standard
          --qos = short
          to:
          [[JULES_CEDA_JASMIN]]
          inherit = None, JASMIN_LOTUS
          [[[directives]]]
          --partition = test
          --constraint = "intel"
          and:
          [[PLOTTING_CEDA_JASMIN]]
          [[[directives]]]
          --time = 08:00:00
          --partition = standard
          --qos = standard
          to:
          [[PLOTTING_CEDA_JASMIN]]
          [[[directives]]]
          --time = 04:00:00
          --partition = test
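          After editing, it can help to list the scheduler directives that are actually in the site file. This is a small sketch (the function name show_directives is hypothetical) that prints, with line numbers, only the directive names mentioned in this step.

```shell
# Print the SLURM directive lines (partition, qos, constraint, time) found
# in a Cylc site file, prefixed with their line numbers.
# Usage: show_directives <site-file>
show_directives() {
    grep -nE -- '^[[:space:]]*--(partition|qos|constraint|time)[[:space:]]*=' "$1"
}
```

          For example: show_directives ~/roses/u-al752/site/suite.rc.CEDA_JASMIN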
        13. JASMIN has switched its batch queues from LSF to SLURM and LOTUS2. The batch queue information for this u-al752 suite has been updated recently. The MPI libraries for SLURM for JULES on JASMIN were updated in ~/roses/u-al752/site/suite.rc.CEDA_JASMIN .
        14. Currently, the standard queue with qos = short is used. Sometimes the waiting time for the short qos is long, but switching to the standard qos doesn’t do much to reduce the queuing time. Previously, we had switched to using the AMD processor type, since it was no longer essential to keep the intel processor constraint. The LOTUS2 cluster has mostly/exclusively AMD processors, unlike the former LOTUS cluster. Previously, if JULES was run on a different processor type than it was compiled on, there might have been issues with the compiler-optimization flags and the processor’s instruction set. This issue was fixed in the summer of 2023, and a background compile of JULES on the cylc2 VM will now run properly on AMD batch nodes. This isn’t so important anymore, though, since the modules being loaded have been updated for LOTUS2.
        15. For this u-al752 suite, since there are so many FLUXNET sites and a separate JULES Cylc8 task for each site, it is important to use the Cylc8 GUI instead of the Cylc8 TUI. The Cylc8 GUI requires that JASMIN open a local browser window, which is resource intensive.
          Therefore, in order to use the Cylc8 GUI without overloading the JASMIN connection, if you don’t already have a JASMIN NX session open, then now is the time to open a JASMIN NX session (see step#5 above).
          After you have a JASMIN NX session open, then open an xterminal in the NX session, and type:
          ssh -AX cylc2
          cylc --version
          Since we’re now using Cylc8 instead of Cylc7, this should return a version number starting with 8 (e.g. 8.4.0).
          cd ~/roses/u-al752
          cylc install u-al752
          If all looks well, hit the ‘y’ response. And then, type:
          cylc play u-al752/run1
          cylc gui
          This will pop up a browser window, and you should see your job being submitted in sequence. If for some reason it’s not playing, then press the little play button (which has the shape of a triangle). The fcm_make task should take about 10 minutes to complete. The various JULES tasks can take up to 1-2 hours to complete, and then there is the Python plotting routine, which is called when all the JULES tasks are finished. Wait for all this to finish. You can go away for a while or overnight, if necessary. If it is still not finished, you can left-mouse-click over the various entries and look at the job.err or job.out log files, etc.
          If something goes wrong and you need to make changes to the suite, first make the changes, and then type:
          cylc reinstall u-al752/run1
          cylc reload u-al752/run1
          and then go to your Cylc8 browser window in your NX session and left-mouse-click on the failed task, and retrigger the task.
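          The Cylc version check at the start of this step can be wrapped in a small guard, so that you notice straight away if Cylc7 is the active version. This is a minimal sketch; the function name is_cylc8 is hypothetical, and it assumes only that cylc --version prints a bare version string such as 8.4.0.

```shell
# Succeed only if the version string has major version 8.
# With no argument, ask the installed cylc (assumes 'cylc' is on PATH).
# Usage: is_cylc8 [version-string]
is_cylc8() {
    version="${1:-$(cylc --version 2>/dev/null || true)}"
    case "$version" in
        8.*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

          For example: is_cylc8 || echo "Cylc 8 is not the active version"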
        16. You can now study the NETCDF output files (with ncinfo or python, etc.) in the location that you set in your rose-suite.conf file. If you use python, you might need to use sci-ph-01.jasmin or sci-ph-02.jasmin instead of cylc2.jasmin in order to get the proper python libraries working. The output location is, e.g.:
          OUTPUT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/jules_output'
          The plots are where you set them to be in your rose-suite.conf file, e.g. in:
          PLOT_FOLDER='/work/scratch-pw2/myusername/fluxnet/run11a/peg/plots'
          You can view these PDF plots by first typing module load jaspy and then typing display filename.pdf. You can also use ncview to study the NETCDF files as well.
          You can compare your PDF plots to those available at the FLUXNET TRAC site listed above.
          Here are a few examples from the FLUXNET TRAC site; for the most up-to-date ones and for other plotted variables, check at the FLUXNET TRAC site.
          FLUXNET EXAMPLE PLOT: Available Soil Moisture (top layer, JULES model only)
          FLUXNET EXAMPLE PLOT: Latent Heat Flux (LE)
          FLUXNET EXAMPLE PLOT: Sensible Heat Flux (SH)
          FLUXNET EXAMPLE PLOT: Gross Primary Production
          Note: If you get some weird NETCDF/iris error messages in the make_plots task, you might consider changing
          ~/cylc-run/u-al752/run1/bin/make_plots.py so that it uses parallel=False. Currently, the JASMIN version of u-al752 is set to do make_plots in serial mode instead of parallel mode. This can be fixed.
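          Before opening individual files, a quick count of the output files can show whether every site produced output. This is a rough sketch, assuming that the JULES output files carry the .nc extension; the function name summarise_nc is hypothetical.

```shell
# Count the NetCDF (*.nc) files under a directory and list any empty ones.
# Usage: summarise_nc <output-folder>
summarise_nc() {
    dir="$1"
    total=$(find "$dir" -type f -name '*.nc' | wc -l | tr -d ' ')
    echo "NetCDF files: $total"
    # Zero-length files usually mean that a task failed before writing data.
    find "$dir" -type f -name '*.nc' -size 0 | sort | sed 's/^/EMPTY: /'
}
```

          For example: summarise_nc /work/scratch-pw2/myusername/fluxnet/run11a/jules_output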
        17. You can also look at the log files in:
          ~/cylc-run/u-al752/run1/log/job/1
          There is one jules log subdirectory for each site, i.e.:
          ~/cylc-run/u-al752/run1/log/job/1/jules_at_neu_presc0
          In that subdirectory, the log files are
          ~/cylc-run/u-al752/run1/log/job/1/jules_at_neu_presc0/01/job.err
          ~/cylc-run/u-al752/run1/log/job/1/jules_at_neu_presc0/01/job.out
          There are also log files for the fcm_make app:
          ~/cylc-run/u-al752/run1/log/job/1/fcm_make/01/job.err
          ~/cylc-run/u-al752/run1/log/job/1/fcm_make/01/job.out
          There are also log files for the make_plots app:
          ~/cylc-run/u-al752/run1/log/job/1/make_plots/01/job.err
          ~/cylc-run/u-al752/run1/log/job/1/make_plots/01/job.out
          You can view these with vi or more. If you see anything in the log files that looks like there was something wrong, then you should investigate.
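          With one log subdirectory per site, checking every job.err by hand is slow. The sketch below lists only the job.err files that mention "error" (case-insensitive). The function name find_error_logs is hypothetical, and the search word is only a heuristic: a failure might be logged with different wording.

```shell
# List the job.err files under a Cylc log directory that contain the word
# "error" (case-insensitive), one path per line, sorted.
# Usage: find_error_logs <log-directory>
find_error_logs() {
    find "$1" -type f -name job.err -exec grep -il "error" {} \; | sort
}
```

          For example: find_error_logs ~/cylc-run/u-al752/run1/log/job/1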
        18. You might want to transfer some PDF plots (for example) from CEDA JASMIN to your own computer. One simple way to do this is to pull them (from CEDA JASMIN to your own Linux or Macintosh machine),
          by first typing similar commands to these on CEDA JASMIN:
          mkdir ~/run11a_plots
          cp -pr /work/scratch-pw2/myusername/fluxnet/run11a/peg/plots ~/run11a_plots
          We first had to copy the files from the scratch drive to your home directory (or alternatively to a group workspace), since transferring files to external computers doesn’t work from the scratch disk.
          Then type these commands on your own machine:
          mkdir run11a_plots
          scp -pr xfer1.jasmin.ac.uk:run11a_plots run11a_plots
        19. The plotting (as of 24 September 2018) doesn’t work for GPP for the US_WCr and ZM_Mon sites on CEDA JASMIN. These 2 sites have been disabled automatically. We think that the cause of this problem with these two sites is that a new version of iris for python (2.1) was installed in July, overriding the previous version 1.13. The function in the new version of iris on JASMIN that is called ‘aggregate_by’ doesn’t seem to handle the multi-year gaps of data in GPP for US_WCr or ZM_Mon very well.
        20. You can look at the Python code (as defined in your rose-suite.conf file). It’s copied in:
          ~/cylc-run/u-al752/run1/bin/fluxnet_evaluation.py
          The JULES source code (FORTRAN) is copied in the subdirectories of:
          ~/cylc-run/u-al752/run1/share/fcm_make/preprocess/
        21. The next part of the tutorial is to make a modification to your suite and then use fcm to check in your changes to MOSRS. It can be a trivial modification, like changing your plotting path in rose-suite.conf .
          But first (since you don’t have permission to commit changes to the original suite u-al752 trunk version), you need to use
          rosie copy u-al752
          to check out a copy of the suite (with a new suite-number) to your account on JASMIN. Remember that both rosie go and rose edit are not available. Note the new suite number in your roses directory.
          Then you can make a change to the plotting path, for example.
          Next, you can do fcm diff -r HEAD in your roses/suite-number directory. It should show the changes you made.
          Finally, you can check-in (a.k.a. ‘commit’) these changes to MOSRS with fcm ci in your roses/suite-number directory. You need to add a comment to the log file before fcm ci will let you proceed. Some general information about version control can be found (for example) at: https://svnbook.red-bean.com/en/1.8/svn.basic.version-control-basics.html .