An HPC cluster consists of hundreds or thousands of compute servers networked together. InfiniBand is widely used in high-performance computing (HPC) to remove data-exchange bottlenecks, delivering very high throughput and very low latency. As HPC becomes more mainstream and is adopted by enterprise users, there is a growing need for assurance that performance is optimized.
Users schedule their jobs to run on an HPC cluster by submitting them through Spectrum LSF. IBM® Spectrum LSF (formerly IBM® Platform™ LSF®) is a complete workload management solution for demanding HPC environments.
One important task before running an application is preparing the job so that it gets the best performance on the given HPC cluster. Select the most appropriate queue for each job and provide accurate wall-clock times in your job script; this helps the scheduler fit your job into the earliest possible run opportunity. Note the system's usable memory and configure your job script to maximize performance. Next, prepare or tune the nodes (servers) to the desired values. This can be done with pre-execution processing: for example, set the CPU statically to its highest frequency (or apply any other requirement). After the LSF job finishes, you can restore the previous values with post-execution processing.
Configuration to enable pre- and post-execution processing:
The pre- and post-execution processing feature is enabled by defining at least one of the parameters in the list below at the application or queue level, or by using the -E option of the bsub command to specify a pre-execution command. In some situations, specifying a queue-level or application-level pre-execution command can have advantages over requiring users to use bsub -E. For example, license checking can be set up at the queue or application level so that users do not have to enter a pre-execution command every time they submit a job.
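As a minimal sketch of what a pre-execution command might look like, the script below succeeds only when a directory has enough free space and signals failure to LSF with a non-zero exit status otherwise. The function name check_scratch_space, the directory, and the threshold are hypothetical and not part of the setup described in this post.

```shell
#!/bin/bash
# Hypothetical pre-execution check: succeed only if a directory has at least
# the requested amount of free space (in KB).

check_scratch_space() {
    local dir="$1" min_kb="$2" avail_kb
    # df -P prints one POSIX-format line per filesystem; column 4 is available KB.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
    if [ -z "$avail_kb" ]; then
        echo "pre-exec: cannot read free space for $dir" >&2
        return 2
    fi
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "pre-exec: only ${avail_kb} KB free in $dir, need ${min_kb} KB" >&2
        return 1
    fi
    return 0
}

# Example call, e.g. as the last line of a script passed to bsub -E:
# check_scratch_space /tmp 1048576
```

A script built around this function could be attached at submission time with something like bsub -E "/path/to/check_scratch.sh" ./myjob, where the script path and job name are placeholders.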
The following example illustrates how job-based pre- and post-execution processing works at the queue or application level for setting the environment prior to job execution and for transferring resulting files after the job runs.
Host-based pre- and post-execution processing differs from job-based pre- and post-execution processing in that it is intended for parallel jobs (although you can also use it for sequential jobs) and runs on all execution hosts, as opposed to only the first execution host. Its purpose is to set up the execution hosts (or servers) before any job-based pre-execution and other pre-processing that depend on host-based preparation, and to clean up the execution hosts after job-based post-execution and other post-processing.
Let's take the example of a queue-level configuration with HOST_PRE_EXEC and HOST_POST_EXEC.
There are two ways to enable host-based pre- and post-execution processing for a job:
- Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.queues.
- Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.applications.
In this example, the LSF queue (QUEUE_NAME = Queue_pre_post) is configured with HOST_PRE_EXEC and HOST_POST_EXEC.
HOST_PRE_EXEC points to "pre_setup_perf.sh", which tunes the server before the job starts. Similarly, HOST_POST_EXEC points to "post_setup_perf.sh", which sets the previous values back on the server after performance testing.
Modify the configuration file "lsb.queues". For example:
------------------------------------------------------------------------------
Begin Queue
QUEUE_NAME = Queue_pre_post
PRIORITY = 40
INTERACTIVE = NO
FAIRSHARE = USER_SHARES[[default,1]]
HOSTS = server1 server2 # hosts on which jobs in this queue can run
EXCLUSIVE = Y
HOST_PRE_EXEC = /home/sachinpb/sachin/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC = /home/sachinpb/sachin/post_setup_perf.sh >> /tmp/post.out
DESCRIPTION = For P8 performance jobs, running only if hosts are lightly loaded.
End Queue
----------------------------------------------------------------------------------
After modifying the file, run the following command to apply the changes:
badmin reconfigure
-------------------------------------------------------------------------------
Check the queue status:
[sachinpb@server1 ~]$ bqueues -l Queue_pre_post
QUEUE: Queue_pre_post
-- For P8 performance jobs, running only if hosts are lightly loaded.
PARAMETERS/STATISTICS
PRIO NICE STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SSUSP USUSP RSV PJOBS
40 0 Open:Active - - - - 0 0 0 0 0 0 0
Interval for a host to accept two jobs is 0 seconds
SCHEDULING PARAMETERS
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
SCHEDULING POLICIES: FAIRSHARE EXCLUSIVE NO_INTERACTIVE
USER_SHARES: [default, 1]
SHARE_INFO_FOR: Queue_pre_post/
USER/GROUP SHARES PRIORITY STARTED RESERVED CPU_TIME RUN_TIME ADJUST
sachinpb 1 0.326 0 0 350.7 0 0.000
USERS: all
HOSTS: server1 server2
HOST_PRE_EXEC: /home/sachinpb/sachin/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC: /home/sachinpb/sachin/post_setup_perf.sh >> /tmp/post.out
[sachinpb@server1 ~]$
---------------------------------------------------------------------------------
HOST_PRE_EXEC=command (in lsb.queues):
- Enables host-based pre-execution processing at the queue level.
- The pre-execution command runs on all execution hosts before the job starts.
- If the HOST_PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.
- The HOST_PRE_EXEC command uses the same environment variable values as the job.
- The HOST_PRE_EXEC command can only be used for host-based pre- and post-execution processing.
HOST_POST_EXEC=command (in lsb.queues):
- Enables host-based post-execution processing at the queue level.
- The HOST_POST_EXEC command uses the same environment variable values as the job.
- The post-execution command for the queue remains associated with the job. The original post-execution command runs even if the job is requeued or if the post-execution command for the queue is changed after job submission.
- Before the post-execution command runs, LSB_JOBEXIT_STAT is set to the exit status of the job. The success or failure of the post-execution command has no effect on LSB_JOBEXIT_STAT.
- The post-execution command runs after the job finishes, even if the job fails.
- Specify the environment variable $USER_POSTEXEC to allow UNIX users to define their own post-execution commands.
- The HOST_POST_EXEC command can only be used for host-based pre- and post-execution processing.
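To illustrate how a post-execution script might use LSB_JOBEXIT_STAT, here is a small sketch that turns the value into a readable message. It assumes the variable holds a wait(2)-style status word (exit code in the high byte, terminating signal in the low 7 bits); check this against the documentation for your LSF version. The function name describe_job_status is hypothetical.

```shell
#!/bin/bash
# Decode a wait(2)-style status word into a readable form.
# Assumption: LSB_JOBEXIT_STAT uses this layout (exit code in the high byte,
# terminating signal in the low 7 bits); verify against your LSF version.
describe_job_status() {
    local status="$1"
    local signal=$(( status & 127 ))
    local code=$(( (status >> 8) & 255 ))
    if [ "$signal" -ne 0 ]; then
        echo "killed by signal $signal"
    elif [ "$code" -eq 0 ]; then
        echo "succeeded"
    else
        echo "failed with exit code $code"
    fi
}

# In a post-execution script, LSF sets LSB_JOBEXIT_STAT before the script runs.
if [ -n "${LSB_JOBEXIT_STAT:-}" ]; then
    echo "Job ${LSB_JOBID:-unknown} on $(hostname): $(describe_job_status "$LSB_JOBEXIT_STAT")"
fi
```

Because the post-execution command runs even when the job fails, a message like this can help decide whether cleanup should keep or discard the job's partial results.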
-------------------------------------------------------
Now submit the LSF job as shown:
[sachinpb@server1 sachin]$ bsub -q Queue_pre_post -n 8 -R "span[ptile=4]" < myjob.script
bsub> Job <19940> is submitted to queue <Queue_pre_post>.
[sachinpb@server1 ]$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
19940 sachinpb RUN Queue_pre_post server1 server2 myjob.script Jun 28 05:16
server2
server2
server2
server1
server1
server1
server1
[sachinpb@server1 ]$
---------------------------------------------------------------------------
Logs from each server are written to "$SACHIN_HOME/logs" for both pre- and post-execution processing, as shown below. Check for the log files after LSF job 19940 completes.
There should be 4 log files:
- 2 pre-check logs, one written on server1 and one on server2
- 2 post-check logs, one written on server1 and one on server2
Example:
[sachinpb@server1 logs]$ ls -alsrt
8 -rw-rw-r-- 1 sachinpb sachinpb 4657 Jun 28 05:16 preScript_28-Jun-05_16_server2.out
8 -rw-r--r-- 1 sachinpb sachinpb 4657 Jun 28 05:16 preScript_28-Jun-05_16_server1.out
8 -rw-r--r-- 1 sachinpb sachinpb 4653 Jun 28 05:17 postScript_28-Jun-05_17_server1.out
8 -rw-rw-r-- 1 sachinpb sachinpb 4653 Jun 28 05:17 postScript_28-Jun-05_17_server2.out
[sachinpb@server1 logs]$
---------------------------------------------------------------------------
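The check for the expected log files can also be scripted. The following is a rough sketch, assuming the preScript_<date>_<host>.out and postScript_<date>_<host>.out naming seen in the listing; the function name check_host_logs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: confirm that each host left at least one pre- and one
# post-execution log in the log directory. Prints a line for every missing
# log and returns the number of missing logs (0 means all were found).
check_host_logs() {
    local log_dir="$1"; shift
    local host phase missing=0
    for host in "$@"; do
        for phase in preScript postScript; do
            # ls fails when the glob matches no file.
            if ! ls "${log_dir}/${phase}_"*"_${host}.out" > /dev/null 2>&1; then
                echo "missing ${phase} log for ${host}"
                missing=$(( missing + 1 ))
            fi
        done
    done
    return "$missing"
}
```

For the job above it could be called as check_host_logs "$SACHIN_HOME/logs" server1 server2.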
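Pre-exec script:
[sachinpb@server1]$ cat pre_setup_perf.sh
#!/bin/bash
# Host-based pre-execution script: runs on every execution host before the job starts.
HOST=$(hostname)
DATE=$(date +%d-%b-%H_%M)
echo "Start Pre-execution script on ${HOST}"
# Tune the server and keep a per-host copy of the output under $SACHIN_HOME/logs.
sudo /home/sachinpb/sachin/tune_this_server.sh pre_check | tee "$SACHIN_HOME/logs/preScript_${DATE}_${HOST}.out"
echo "End of Pre-execution script on ${HOST}"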
[sachinpb@server1 ]$
----------------------------------------------------------------------------
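Post-exec script:
[sachinpb@server1 ]$ cat post_setup_perf.sh
#!/bin/bash
# Host-based post-execution script: runs on every execution host after the job finishes.
HOST=$(hostname)
DATE=$(date +%d-%b-%H_%M)
echo "Start Post-execution script on ${HOST}"
# Restore the previous settings and keep a per-host copy of the output under $SACHIN_HOME/logs.
sudo /home/sachinpb/sachin/tune_this_server.sh post_check | tee "$SACHIN_HOME/logs/postScript_${DATE}_${HOST}.out"
echo "End of Post-execution script on ${HOST}"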
[sachinpb@server1 ]$
-----------------------------------------------------------------------------
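Both scripts delegate the actual work to tune_this_server.sh, which is not shown in this post. Purely as an illustration, a stand-in might save each CPU's cpufreq governor and switch to "performance" on pre_check, then restore the saved values on post_check. Everything below (the CPU_ROOT and STATE_FILE variables, the state file location, and the choice of the governor as the tuned setting) is an assumption, not the original script.

```shell
#!/bin/bash
# Hypothetical stand-in for tune_this_server.sh (the original is not shown).
# pre_check:  remember each CPU's current cpufreq governor, then set "performance".
# post_check: restore the remembered governors.
# CPU_ROOT and STATE_FILE are assumptions and can be overridden for testing.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"
STATE_FILE="${STATE_FILE:-/tmp/governor_state.$(uname -n)}"

pre_check() {
    : > "$STATE_FILE"
    local gov_file
    for gov_file in "$CPU_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -f "$gov_file" ] || continue
        # Record "path previous_governor" so post_check can undo the change.
        echo "$gov_file $(cat "$gov_file")" >> "$STATE_FILE"
        echo performance > "$gov_file"
    done
}

post_check() {
    [ -f "$STATE_FILE" ] || return 0
    local gov_file governor
    while read -r gov_file governor; do
        if [ -f "$gov_file" ]; then
            echo "$governor" > "$gov_file"
        fi
    done < "$STATE_FILE"
    rm -f "$STATE_FILE"
}

case "${1:-}" in
    pre_check)  pre_check ;;
    post_check) post_check ;;
esac
```

Writing to the real sysfs files requires root, which is consistent with the sudo call in the scripts above; whether the "performance" governor is available depends on the platform's cpufreq driver.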
NOTE: Similarly, you can configure PRE_EXEC/POST_EXEC=command (in lsb.applications or lsb.queues) and HOST_PRE_EXEC/HOST_POST_EXEC=command (in lsb.applications), depending on the application requirements.
I hope this blog helped you understand how to configure the pre- and post-execution processing feature of Spectrum LSF.