Difference between revisions of "Slurm Services"

From wiki.hpc.mk
 

Latest revision as of 10:12, 31 August 2021

Contents
  1. Documentation for creating SLURM Workload Manager
  2. Required installation and predefined environment



Documentation for creating SLURM Workload Manager

Setting up a scheduler in a heterogeneous cluster environment starts with choosing the main (master) node, which forwards user-defined scripts, in the form of jobs, to the machines defined in the cluster. Several platforms exist for building a cluster environment; this document covers the SLURM Workload Manager.
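
To make the forwarding step concrete, here is a minimal Python sketch of how a user script might be handed to the scheduler through Slurm's `sbatch` command. The helper names (`build_submit_command`, `parse_job_id`, `submit`) are illustrative assumptions and not part of this wiki's setup. The sketch relies on `sbatch --parsable`, which prints the job ID, optionally followed by `;` and the cluster name.

```python
import subprocess


def build_submit_command(script_path):
    """Return the argv list for submitting a batch script with sbatch."""
    # --parsable makes sbatch print only the job ID (optionally ";cluster").
    return ["sbatch", "--parsable", script_path]


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from `sbatch --parsable` output."""
    first_field = sbatch_output.strip().split(";")[0]
    return int(first_field)


def submit(script_path):
    """Submit the script and return its job ID.

    Assumes a working Slurm installation with sbatch on the PATH.
    """
    result = subprocess.run(
        build_submit_command(script_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_id(result.stdout)
```

On a node where Slurm is installed, calling `submit("job.sh")` with a hypothetical script `job.sh` would return the ID that the scheduler assigned to the job.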

The advantages of using Slurm as a task environment are:

  • Support for high-performance cluster systems and multiprocessor tasks - Slurm can start, run and monitor parallel tasks implemented with the Message Passing Interface (MPI) on a subset of the allocated nodes, and it uses resources (nodes) efficiently according to user-specific policies,
  • Task profiling - Periodic sampling of each resource assigned to a specific task (CPU time, RAM, power consumption, network resources and disk space usage),
  • Support for the MapReduce+ algorithm,
  • Support for task sequences, i.e. one task can be split into several sub-tasks that run in parallel to use the given resources more efficiently,
  • Database integration - where all user parameters and settings are stored,
  • Use of graphics resources to perform tasks - Many optional settings allow additional graphics resources to be assigned to a specific task or tasks, which helps advanced machine learning algorithms run better.
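
Several of the features listed above map onto `#SBATCH` directives in a batch script. The following Python sketch renders such a script as text; it is an illustration under stated assumptions, not the configuration used on this cluster. The function name `render_batch_script`, the job name and the command are hypothetical. The directives themselves (`--job-name`, `--ntasks`, `--gres`, `--array`, `--time`) are standard `sbatch` options, and `--gres=gpu:N` only works if GPU resources have been configured in Slurm.

```python
def render_batch_script(job_name, command, ntasks=1, gpus=0, array=None, time_limit=None):
    """Return the text of a Slurm batch script with #SBATCH directives."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        # Number of parallel tasks, e.g. MPI ranks.
        f"#SBATCH --ntasks={ntasks}",
    ]
    if gpus > 0:
        # Request graphics resources; assumes GPUs are defined as GRES.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    if array is not None:
        # Split one job into several sub-tasks (a job array).
        lines.append(f"#SBATCH --array={array}")
    if time_limit is not None:
        lines.append(f"#SBATCH --time={time_limit}")
    lines.append("")
    # srun launches the command on the allocated resources.
    lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"
```

For example, `render_batch_script("train", "python3 train.py", ntasks=4, gpus=1, array="0-3")` produces a script that requests four tasks and one GPU for each of four array sub-tasks.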


Required installation and predefined environment

The initial installation used an Ubuntu 20.04 Linux environment with the necessary installation packages. The server-side installation includes:

  • Slurmctld - The main process, which assigns tasks to the nodes and manages their execution. It also monitors the active nodes (machines) in the cluster,
  • Slurmdbd - The process that records user data, user rules and policies, and the allowed execution times,
  • Slurmd - The process that controls the child Slurm sub-processes and handles further communication with the other elements of the system.
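
After installation, one way to confirm that the three daemons are running is to query their service state. The sketch below is a minimal Python example and rests on an assumption the text does not state: that each daemon runs as a systemd unit named after the daemon itself. The helper names `status_command` and `check_daemons` are hypothetical.

```python
import shutil
import subprocess

# Daemon names and roles as described in the list above.
SERVER_DAEMONS = {
    "slurmctld": "assigns tasks to nodes and monitors the active nodes",
    "slurmdbd": "records user data, policies and allowed execution times",
    "slurmd": "controls Slurm sub-processes and communicates with the system",
}


def status_command(unit):
    """Return the argv list that asks systemd whether a unit is active."""
    # Assumption: the daemon runs as a systemd unit with the same name.
    return ["systemctl", "is-active", "--quiet", unit]


def check_daemons(units=SERVER_DAEMONS):
    """Return {unit: True/False}, or None for each unit if systemctl is missing."""
    if shutil.which("systemctl") is None:
        return {unit: None for unit in units}
    states = {}
    for unit in units:
        # is-active exits with status 0 only when the unit is active.
        completed = subprocess.run(status_command(unit), check=False)
        states[unit] = completed.returncode == 0
    return states
```

Calling `check_daemons()` on the server returns a dictionary keyed by daemon name, which makes it easy to see which process still needs to be started.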