diff --git a/software/workflows/rp.rst b/software/workflows/rp.rst index bea51c29..c75bd772 100644 --- a/software/workflows/rp.rst +++ b/software/workflows/rp.rst @@ -10,29 +10,25 @@ Overview Scientific productivity can be enhanced through workflow management tools, relieving large High Performance Computing (HPC) system users from the tedious tasks of scheduling and designing the complex -computational execution of scientific applications. This report presents a study on the usage of ensemble -workflow tools to accelerate science using the Frontier supercomputing systems. This technical report -aims to connect science domain simulations using Oak Ridge Leadership Computing Facility (OLCF) supercomputing -platforms with ensemble workflow methods in order to accelerate HPC-enabled discovery -and boost scientific impact. We present the coupling, porting and installation of Radical-Cybertools on -two applications: Chroma and NAMD. https://www.osti.gov/biblio/2575304 +computational execution of scientific applications. This user documentation page presents several examples on +the usage of ensemble workflow tools to accelerate science using the Frontier supercomputing system. +This page presents the coupling, porting and installation of Radical-Cybertools on two applications: Chroma +and NAMD. The content of this page is adapted from a technical report where additional information and detail +can be found: https://www.osti.gov/biblio/2575304 Introduction ============= -This technical guide provides guidance for OLCF users implementing RP workflow tool on Frontier. As -the guidelines and software matures and evolves, our team will deliver biannual updates to the policies and -best practices. The document offers comprehensive technical and scientific guidelines for adopting and -configuring RP on the Frontier supercomputer, complementing RP's platform-specific documentation. 
We -include essential information on data management strategies and OLCF ensemble policies, while highlighting -our solutions and multi-track capabilities for installation and usability. +This guide provides summary guidance for OLCF users implementing the RADICAL-Pilot (RP) workflow tool on Frontier. +The source `technical report `__ offers additional comprehensive technical and scientific +guidelines for adopting and configuring RP on the Frontier supercomputer, complementing RP's platform-specific documentation. +The report includes essential information on data management strategies and OLCF ensemble policies, while highlighting +solutions and multi-track capabilities for installation and usability. RP is an ensemble tool that leverages Python-based scripts for efficient job launching, scheduling, error management, and resource allocation. Its application-agnostic design provides customizable workflows for domain-specific requirements. RP's multi-level metadata management system organizes execution data -in structured directories. While workflow tools often struggle to adapt to specific production systems and -facility policies this technical paper addresses platform heterogeneity by documenting our experience integrating, -porting, and running RP on Frontier. +in structured directories. RP demonstrates exceptional error reporting capabilities, enabling rapid job relaunch and preventing execution hangs during ensemble operations. Its efficient restart options maintain minimal overhead across @@ -40,47 +36,26 @@ our flagship applications detailed in this document. Previous publications on OL established portability as a versatile ensemble tool Titov et al. 2024; Titov et al. 2022; Merzky et al. 2021; Merzky, Turilli, and Jha 2022; Turilli et al. 2021. 
-INSTALLATION OF THE RADICAL-PILOT TOOL +Installation of the RADICAL-Pilot Tool ====================================== -Workflow management is a strategic approach that assists organizing and optimizing model runs on large -heterogeneous High Performance Computing (HPC) systems. At OLCF we cater to these workflow needs -and feature demands by providing complex workflow tools with state-of-the-art management capabilities. -RADICAL-Pilot has showcased the ability to simplify the computational runs on Frontier and is widely -used across platforms and scientific groups. The source materials from the developers reside here: -https://radicalpilot.readthedocs.io/en/stable/supported/frontier.html - -A user's guide is provided to encapsulate directions and practices on installing the RADICAL-Cybertools -stack (RCT) on Frontier with the pip install command. OLCF supports Python virtual environment usageincluded -with instructions for the execution environment- by creating a virtual environment with venv: +Frontier supports Python virtual environment usage: .. code-block:: console $ export PYTHONNOUSERSITE=True - $ module load cray−python/3.11.7 - $ python3 −m venv ve.rp + $ module load cray-python + $ python3 -m venv ve.rp $ source ve.rp/bin/activate -Subsequently, install RP in the activated corresponding virtual environment: +Subsequently, install RP in the newly created and activated virtual environment: .. code-block:: console $ pip install radical.pilot -An alternate way to install RP manually is the following user-based installation method for Frontier: - -.. code-block:: console - - $ module load cray−python/3.11.7 - $ python −m venv ve.rp - $ source ve.rp/bin/activate - $ pip install −U pip - -Use the pip install –user pip command if any errors appear. Passing the –user option to python --m pip install will install a package just for the current user, rather than for all users of the system. 
 The latest versions of RCT tools are within development branches, and include the latest fixes, updates and -new features. These versions are considered unstable and they are optional for users. +new features. These versions are considered unstable and they are optional for users, but could be installed if desired: .. code-block:: console @@ -88,15 +63,17 @@ new features. These versions are considered unstable and they are optional for u $ pip install git+https://github.com/radical-cybertools/radical.gtod.git@devel $ pip install git+https://github.com/radical-cybertools/radical.pilot.git@devel -Run the command ``radical-stack`` to verify the success of the installation. -RP application (i.e., Python application using RP as a pilot-based runtime system) can be launched as -a regular Python script: ``python rp_app.py`` (or ``./rp_app.py`` if it includes a corresponding shebang, -e.g., #!/usr/bin/env python). To keep it running in the background the following command is recommended. +Run the command ``radical-stack`` to verify the success of the installation. This should print the corresponding Python +and RP versions that have been installed. + +Running Overview +^^^^^^^^^^^^^^^^ + +An RP application (i.e., Python application using RP as a pilot-based runtime system) can be launched as +a regular Python script: ``python rp_app.py``. To keep it running in the background the following command is recommended. ``nohup python rp_app.py > OUTPUT 2>&1