In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. Job status, pending reason, and resource usage are returned to the submission cluster. When the job is done, the exit code returns to the submission cluster.
By default, clusters do not share resources, even if MultiCluster has been installed. To enable job forwarding, enable MultiCluster queues in both the submission and execution clusters.
How it works:
With this model, scheduling of MultiCluster jobs is a process with two scheduling phases:
- the submission cluster selects a suitable remote receive-jobs queue and forwards the job to it
- the execution cluster selects a suitable host and dispatches the job to it. If a suitable host is not found immediately, the job remains pending in the execution cluster and is evaluated again in the next scheduling cycle.
This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.
- Send-jobs queue: forwards jobs to a specified remote queue. By default, LSF attempts to run jobs in the local cluster first, and only attempts to place a job remotely if it cannot be placed locally.
- Receive-jobs queue: accepts jobs from queues in a specified remote cluster. Although send-jobs queues forward jobs only to specific queues in the remote cluster, receive-jobs queues can accept work from any queue in the remote cluster.
- Multiple queue pairs: you can configure multiple send-jobs and receive-jobs queues in one cluster. A queue can forward jobs to as many queues in as many clusters as you want, and can also receive jobs from as many other clusters as you want (see the sketch after this list). A receive-jobs queue can also borrow resources using the resource leasing method, but a send-jobs queue using the job forwarding method cannot also share resources using resource leasing.
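For illustration, here is a minimal sketch of one send-jobs queue in lsb.queues that forwards to receive-jobs queues in two different clusters. The queue name multi_send_q and the cluster arm_cluster3 are hypothetical, invented for this example:

Begin Queue
QUEUE_NAME  = multi_send_q
SNDJOBS_TO  = receive_queue@x86-64_cluster2 receive_queue@arm_cluster3
HOSTS       = none
PRIORITY    = 30
DESCRIPTION = Sketch only - forwards jobs to two remote clusters
End Queue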
In LSF multicluster capability job forwarding mode, the bjobs -fwd option filters output to display information about forwarded jobs, including the forward time and the name of the cluster to which the job was forwarded. -fwd can be combined with other options to further filter the results. For example, bjobs -fwd -r displays only forwarded running jobs.

In job forwarding mode, you can also use the local job ID and cluster name to retrieve the job details from the remote cluster. The query syntax is:
bjobs submission_job_id@submission_cluster_name
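For example, to look up the job that cluster ppc_cluster1 submitted as job 25378 (the job used later in this post):

bjobs 25378@ppc_cluster1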
Additional output fields for bjobs:
Field name       Width  Aliases      Unit        Category
---------------  -----  -----------  ----------  ------------
forward_cluster  15     fwd_cluster              MultiCluster
forward_time     15     fwd_time     time stamp  MultiCluster
srcjobid         8                               MultiCluster
dstjobid         8                               MultiCluster
source_cluster   15     srcluster                MultiCluster
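Assuming a bjobs version that supports the -o custom output option, these field names can be selected directly, for example:

bjobs -fwd -o "jobid user stat forward_cluster forward_time"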
Cluster Configurations:
List the clusters with basic information:
[sachinpb@powerNode06 ~]$ lsclusters
CLUSTER_NAME STATUS MASTER_HOST ADMIN HOSTS SERVERS
ppc_cluster1 ok powerNode06 lsfadmin 5 5
x86-64_cluster ok RemoteClusterHost07 lsfadmin 8 8
[sachinpb@powerNode06 ~]$
List hosts on each cluster:
[sachinpb@powerNode06 ~]$ bhosts -w
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
powerNode01 ok - 80 0 0 0 0 0
powerNode02 ok - 80 0 0 0 0 0
powerNode03 ok - 80 0 0 0 0 0
powerNode04 ok - 80 0 0 0 0 0
powerNode05 ok - 80 0 0 0 0 0
[sachinpb@powerNode06 ~]$
-------------------------- Other cluster ----------------------
[sachinpb@RemoteClusterHost7 ~]$ bhosts -w
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
RemoteClusterHost01 ok - 40 0 0 0 0 0
RemoteClusterHost02 ok - 40 0 0 0 0 0
RemoteClusterHost03 ok - 40 0 0 0 0 0
RemoteClusterHost04 ok - 40 0 0 0 0 0
RemoteClusterHost05 ok - 40 0 0 0 0 0
RemoteClusterHost06 ok - 40 0 0 0 0 0
RemoteClusterHost07 ok - 40 0 0 0 0 0
RemoteClusterHost08 ok - 40 0 0 0 0 0
[sachinpb@RemoteClusterHost7 ~]$
Display information about the IBM Spectrum LSF multicluster capability:
[sachinpb@powerNode06 ~]$ bclusters -w
LOCAL_QUEUE JOB_FLOW REMOTE CLUSTER STATUS
send_queue send receive_queue x86-64_cluster2 ok
x86_perf_q send x86_perf_q x86-64_cluster2 ok
x86_ibmgpu_q send x86_ibmgpu_q x86-64_cluster2 ok
[Resource Lease Information ]
No resources have been exported or borrowed
[sachinpb@powerNode06 ~]$
Configuration files:
To make a queue that only runs jobs in remote clusters, take the following steps:
Procedure
Update $LSF_HOME/conf/lsbatch/ppc_cluster1/configdir/lsb.queues
--------------------------------------------------
Begin Queue
QUEUE_NAME = send_queue
SNDJOBS_TO = receive_queue@x86-64_cluster2
HOSTS = none
PRIORITY = 30
NICE = 20
End Queue
--- on Other cluster ---
Begin Queue
QUEUE_NAME = receive_queue
RCVJOBS_FROM = send_queue@ppc_cluster1
HOSTS = RemoteClusterHost01 RemoteClusterHost02 RemoteClusterHost03 RemoteClusterHost04 RemoteClusterHost05 RemoteClusterHost06 RemoteClusterHost07 RemoteClusterHost08
PRIORITY = 55
NICE = 10
EXCLUSIVE = Y
DESCRIPTION = Multicluster Queue
End Queue
---------------------------------------------------------------------------------
Begin Queue
QUEUE_NAME = x86_ibmgpu_q
SNDJOBS_TO = x86_ibmgpu_q@x86-64_cluster2
PRIORITY = 90
INTERACTIVE = NO
FAIRSHARE = USER_SHARES[[default,1]]
HOSTS = none
EXCLUSIVE = Y
MAX_RSCHED_TIME = infinit
DESCRIPTION = For x86 jobs, Multicluster Queue - Job forward Mode
End Queue
----on Other cluster--
Begin Queue
QUEUE_NAME = x86_ibmgpu_q
RCVJOBS_FROM = x86_ibmgpu_q@ppc_cluster1
HOSTS = RemoteClusterHost01 RemoteClusterHost02 RemoteClusterHost03 RemoteClusterHost04 RemoteClusterHost05 RemoteClusterHost06 RemoteClusterHost07 RemoteClusterHost08
PRIORITY = 55
NICE = 10
EXCLUSIVE = Y
DESCRIPTION = Multicluster Queue - Job forward Mode
End Queue
---------------------------------------------------------------------------------------
Begin Queue
QUEUE_NAME = x86_perf_q
SNDJOBS_TO = x86_perf_q@x86-64_cluster2
PRIORITY = 40
INTERACTIVE = NO
FAIRSHARE = USER_SHARES[[default,1]]
HOSTS = none
EXCLUSIVE = Y
MAX_RSCHED_TIME = infinit
DESCRIPTION = For P8 performance jobs, running only if hosts are lightly loaded.
End Queue
---on Other cluster---
Begin Queue
QUEUE_NAME = x86_perf_q
RCVJOBS_FROM = x86_perf_q@ppc_cluster1
HOSTS = RemoteClusterHost01 RemoteClusterHost02 RemoteClusterHost03 RemoteClusterHost04 RemoteClusterHost05 RemoteClusterHost06 RemoteClusterHost07 RemoteClusterHost08
PRIORITY = 55
NICE = 10
DESCRIPTION = Multicluster Queue - Job forward Mode
End Queue
--------------------------------------------------------------
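After editing lsb.queues on each cluster, apply the changes on that cluster's management host. A typical sequence, run as the LSF administrator, is:

badmin ckconfig    # check the edited batch configuration for errors
badmin reconfig    # reconfigure mbatchd so the queue changes take effect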
NOTE:
In LSF multicluster job forwarding mode, a job is recalled to the submission cluster when it has stayed in pending state in the execution cluster for longer than MAX_RSCHED_TIME. Set MAX_RSCHED_TIME=infinit to maintain the FCFS job order of MultiCluster jobs in the execution queue. Otherwise, jobs that time out are rescheduled to the same execution queue, but they lose priority and position because they are treated as a new job submission.
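Conversely, if you do want pending forwarded jobs to be recalled, set a finite MAX_RSCHED_TIME. A minimal sketch (the queue name is hypothetical, and this assumes the timeout is counted in multiples of MBD_SLEEP_TIME from lsb.params):

Begin Queue
QUEUE_NAME      = send_queue_with_recall
SNDJOBS_TO      = receive_queue@x86-64_cluster2
HOSTS           = none
PRIORITY        = 30
MAX_RSCHED_TIME = 360
DESCRIPTION     = Sketch only - recall jobs that pend too long remotely
End Queue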
Submit job1 - job1.script
[sachinpb@powerNode06 ~]$ bsub -n 8 -R "span[ptile=4]" -q x86_ibmgpu_q -R "select[type==X86_64]" job1.script
Job <25378> is submitted to queue <x86_ibmgpu_q>.
[sachinpb@powerNode06 ~]$
Submit job2 - job2.script
[sachinpb@powerNode06 ~]$ bsub -n 4 -q x86_ibmgpu_q -R "select[type==X86_64]" job2.script
Job <25383> is submitted to queue <x86_ibmgpu_q>.
[sachinpb@powerNode06 ~]$
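The contents of job1.script and job2.script are not shown in this post; a minimal hypothetical stand-in that reports where the forwarded job landed could be:

#!/bin/bash
# Hypothetical job script: print the allocation granted by the
# execution cluster, then simulate some work.
echo "Job ${LSB_JOBID} running on hosts: ${LSB_HOSTS}"
sleep 100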
List all forwarded jobs with the bjobs -fwd option:
[sachinpb@powerNode06 ~]$ bjobs -fwd
JOBID USER STAT QUEUE EXEC_HOST JOB_NAME CLUSTER FORWARD_TIME
25378 sachinpb RUN x86_ibmgpu_q RemoteClusterHost02@x86-64_cluster2 job1.script x86-64_cluster2 Aug 6 10:25
RemoteClusterHost02@x86-64_cluster2
RemoteClusterHost02@x86-64_cluster2
RemoteClusterHost02@x86-64_cluster2
RemoteClusterHost08@x86-64_cluster2
RemoteClusterHost08@x86-64_cluster2
RemoteClusterHost08@x86-64_cluster2
RemoteClusterHost08@x86-64_cluster2
25383 sachinpb RUN x86_ibmgpu_q RemoteClusterHost04@x86-64_cluster2 job2.script x86-64_cluster2 Aug 6 10:39
RemoteClusterHost04@x86-64_cluster2
RemoteClusterHost04@x86-64_cluster2
RemoteClusterHost04@x86-64_cluster2
RemoteClusterHost04@x86-64_cluster2
[sachinpb@powerNode06 ~]$
--------------------
Observe the job description and details for the forwarded job:
[sachinpb@powerNode06 ~]$ bjobs -l 25378
Job <25378>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <default>, Status <RUN>, Queue <x86_ibmgpu_q>
Tue Aug 6 10:25:51: Submitted from host <powerNode06>, Exclusive Execution, 8 Task(s), Requested Resources <select[type == X86_64] span[ptile=4]>;
Tue Aug 6 10:25:51: Job <25378> forwarded to cluster <x86-64_cluster2> as Job <24347>;
Tue Aug 6 10:25:51: Started 8 Task(s) on Host(s) <4*RemoteClusterHost02@x86-64_cluster2> <4*RemoteClusterHost08@x86-64_cluster2>, Allocated 32 Slot(s) on Host(s) <16*RemoteClusterHost02@x86-64_cluster2> <16*RemoteClusterHost08@x86-64_cluster2>, Execution Home </home1/sachinpb/>, Execution CWD </tmp>;
Tue Aug 6 11:41:25: Resource usage collected.
The CPU time used is 28681 seconds.
MEM: 82 Mbytes; SWAP: 0 Mbytes; NTHREAD: 22
HOST: RemoteClusterHost02
MEM: 82 Mbytes; SWAP: 0 Mbytes; CPU_TIME: 15662 seconds
PGIDs: 7309 29438 29439 29440 29441 29635 29636 29637 29638
PIDs: 7309 7322 7324 7377 29261 29418 29438 29439 29440 29441 29616 29635 29636 29637 29638
HOST: RemoteClusterHost08
MEM: 0 Mbytes; SWAP: 0 Mbytes; CPU_TIME: 13019 seconds
PGIDs: -
PIDs: -
RUNLIMIT
480.0 min
MEMORY USAGE:
MAX MEM: 135.4 Gbytes; AVG MEM: 11 Gbytes
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
Effective: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
[sachinpb@powerNode06 ~]$
--------------------
Job description and details on the remote cluster for the same forwarded job, with a different job ID:
-bash-4.2$ bjobs -l 24347
Job <24347>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <default>, Status <RUN>, Queue <x86_ibmgpu_q>, Command <sh /nfs_smpi_ci/ibm-tests/smpi-ci/bin/smpi_test.sh 1127 pr x86_64 ibm-smpi " ">
Tue Aug 6 10:30:55: Submitted from host <powerNode06@ppc_cluster1:25378>, Exclusive Execution, 8 Task(s), Requested Resources <select[type == X86_64] span[ptile=4]>;
Tue Aug 6 10:30:55: Job <25378> of cluster <ppc_cluster1> accepted as Job <24347>;
Tue Aug 6 10:30:55: Started 8 Task(s) on Host(s) <4*RemoteClusterHost02> <4*RemoteClusterHost08>, Allocated 32 Slot(s) on Host(s) <16*RemoteClusterHost02> <16*RemoteClusterHost08>, Execution Home </home1/sachinpb/>, Execution CWD </tmp>;
Tue Aug 6 12:13:45: Resource usage collected.
The CPU time used is 41628 seconds.
MEM: 88 Mbytes; SWAP: 0 Mbytes; NTHREAD: 13
HOST: RemoteClusterHost02
MEM: 88 Mbytes; SWAP: 0 Mbytes; CPU_TIME: 22646 seconds
PGIDs: 7309 11162 11163 11164 11165 11719 11720 11721
PIDs: 7309 7322 7324 7377 11144 11162 11163 11164 11165 11379 11541 11700 11719 11720 11721
HOST: RemoteClusterHost08
MEM: 0 Mbytes; SWAP: 0 Mbytes; CPU_TIME: 18982 seconds
PGIDs: -
PIDs: -
RUNLIMIT
480.0 min
MEMORY USAGE:
MAX MEM: 135.4 Gbytes; AVG MEM: 9.2 Gbytes
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
Effective: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
-bash-4.2$
---------------------------------------------------------------------------------
You can also check the details of completed jobs using the "bhist" command.
@Submission cluster:
[sachinpb@powerNode06 configdir]$ bhist -l 25378
Job <25378>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <default>, Command <sh /nfs_smpi_ci/ibm-tests/smpi-ci/bin/smpi_test.sh 1127 pr x86_64 ibm-smpi " ">
Tue Aug 6 10:25:51: Submitted from host <powerNode06>, to Queue <x86_ibmgpu_q>, Exclusive Execution, 8 Task(s), Requested Resources <select[type == X86_64] span[ptile=4]>;
Tue Aug 6 10:25:51: Forwarded job to cluster x86-64_cluster2;
Tue Aug 6 10:25:51: Job 25378 forwarded to cluster x86-64_cluster2 as remote job 24347;
Tue Aug 6 10:25:51: Dispatched 8 Task(s) on Host(s) <4*RemoteClusterHost02@x86-64_cluster2> <4*RemoteClusterHost08@x86-64_cluster2>, Allocated 32 Slot(s) on Host(s) <16*RemoteClusterHost02@x86-64_cluster2> <16*RemoteClusterHost08@x86-64_cluster2>, Effective RES_REQ <select[type == any ] order[r15s:pg] span[ptile=4] >;
Tue Aug 6 10:25:51: Starting (Pid 7309);
Tue Aug 6 10:25:51: Running with execution home </home1/sachinpb/>, Execution CW
D </tmp>, Execution Pid <7309>;
Tue Aug 6 12:17:50: Done successfully. The CPU time used is 46995.0 seconds;
HOST: RemoteClusterHost02; CPU_TIME: 23908 seconds
HOST: RemoteClusterHost08; CPU_TIME: 23087 seconds
RUNLIMIT
480.0 min of powerNode
MEMORY USAGE:
MAX MEM: 135.4 Gbytes; AVG MEM: 8.5 Gbytes
Summary of time in seconds spent in various states by Tue Aug 6 12:17:50
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
0 0 6719 0 0 0 6719
-----------------------
@Execution cluster:
-bash-4.2$ bhist -l 24347
Job <24347>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <default>, Command <sh /nfs_smpi_ci/ibm-tests/smpi-ci/bin/smpi_test.sh 1127 pr x86_64 ibm-smpi " ">
Tue Aug 6 10:30:55: Submitted from host <powerNode06>, to Queue <x86_ibmgpu_q>, Output File <...r-ibm-smpi-1127/logs/smpi_test_lsf_out_25378>, Exclusive Execution, 8 Task(s), Requested Resources <select[type == X86_64] span[ptile=4]>;
Tue Aug 6 10:30:55: Job 25378 of cluster ppc_cluster1 accepted as job 24347;
Tue Aug 6 10:30:55: Dispatched 8 Task(s) on Host(s) <4*RemoteClusterHost02> <4*RemoteClusterHost08>, Allocated 32 Slot(s) on Host(s) <16*RemoteClusterHost02> <16*RemoteClusterHost08>, Effective RES_REQ <select[type == any ] order[r15s:pg] span[ptile=4] >;
Tue Aug 6 10:30:55: Starting (Pid 7309);
Tue Aug 6 10:30:56: Running with execution home </home1/sachinpb/>, Execution CW
D </tmp>, Execution Pid <7309>;
Tue Aug 6 12:22:54: Done successfully. The CPU time used is 46995.0 seconds;
HOST: RemoteClusterHost02; CPU_TIME: 23908 seconds
HOST: RemoteClusterHost08; CPU_TIME: 23087 seconds
Tue Aug 6 12:23:07: Post job process done successfully;
RUNLIMIT
480.0 min of POWER8
MEMORY USAGE:
MAX MEM: 135.4 Gbytes; AVG MEM: 8.5 Gbytes
Summary of time in seconds spent in various states by Tue Aug 6 12:23:07
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
0 0 6719 0 0 0 6719
-bash-4.2$
-------------------------------------
Example 2:
[]$ bsub -n 8 -q x86_ibmgpu_q -R "select[type==X86_64] span[ptile=1]"
bsub> sleep 100
bsub> Job <71403> is submitted to queue <x86_ibmgpu_q>.
[]$
[]$ bjobs 71403
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
71403 smpici RUN x86_ibmgpu MYhost_cluster1 Host01@x86-64_cluster2 sleep 100 Jun 20 22:54
Host02@x86-64_cluster2
Host03@x86-64_cluster2
Host04@x86-64_cluster2
Host05@x86-64_cluster2
Host06@x86-64_cluster2
Host07@x86-64_cluster2
Host08@x86-64_cluster2
[]$
---------------------------------------------------------------------
I hope this blog helped you understand how to set up Spectrum LSF's job forwarding mode across multiple clusters, from queue configuration through job submission and monitoring.
References:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_multicluster/job_scheduling_job_forward_mc_lsf.html
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_multicluster/queue_configure_remote_mc_lsf.html
https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.2/lsf_multicluster/remote_timeout_limit_mc_lsf.html