Thursday, August 15, 2019

Spectrum LSF GPU enhancements & Enabling GPU features

The IBM Spectrum LSF Suites portfolio redefines cluster virtualization and workload management by providing a tightly integrated solution for demanding, mission-critical HPC environments that can increase both user productivity and hardware utilization while decreasing system management costs. The heterogeneous, highly scalable and available architecture provides support for traditional high-performance computing and high throughput workloads, as well as for big data, cognitive, GPU machine learning, and containerized workloads. Clients worldwide are using technical computing environments supported by LSF to run hundreds of genomic workloads, including Burrows-Wheeler Aligner (BWA), SAMtools, Picard, GATK, Isaac, CASAVA, and other frequently used pipelines for genomic analysis.

IBM Spectrum LSF provides support for heterogeneous computing environments, including NVIDIA GPUs. With the ability to detect, monitor and schedule GPU enabled workloads to the appropriate resources, IBM Spectrum LSF enables users to easily take advantage of the benefits provided by GPUs.  

Solution highlights include: 
  •     Enforcement of GPU allocations via cgroups
  •     Exclusive allocation and round robin shared mode allocation
  •     CPU-GPU affinity
  •     Boost control
  •     Power management
  •     Multi-Process Server (MPS) support
  •     NVIDIA Pascal and DCGM support 
The order of conditions that LSF considers when allocating GPUs is as follows:
  •     The largest GPU compute capability (gpu_factor value).
  •     GPUs with direct NVLink connections.
  •     GPUs with the same model, including the GPU total memory size.
  •     The largest available GPU memory.
  •     The number of concurrent jobs on the same GPU.
  •     The current GPU mode.

Configurations:

1) GPU auto-configuration
Enabling GPU detection for LSF is now available with automatic configuration. To enable automatic GPU configuration, set LSF_GPU_AUTOCONFIG=Y in the lsf.conf file. LSF_GPU_AUTOCONFIG controls whether LSF enables use of GPU resources automatically. If set to Y, LSF automatically configures built-in GPU resources and detects GPUs. If set to N, manual configuration of GPU resources is required to use GPU features in LSF. Whether LSF_GPU_AUTOCONFIG is set to Y or N, LSF always collects GPU metrics from hosts.
When enabled, the lsload -gpu, lsload -gpuload, and lshosts -gpu commands show host-based or GPU-based resource metrics for monitoring.

2) The LSB_GPU_NEW_SYNTAX=extend parameter must be defined in the lsf.conf file to enable the -gpu option and GPU_REQ parameter syntax.

3) Other configurations:

  • To configure GPU resource requirements for an application profile, specify the GPU_REQ parameter in the lsb.applications file, for example GPU_REQ="gpu_req".
  • To configure GPU resource requirements for a queue, specify the GPU_REQ parameter in the lsb.queues file, for example GPU_REQ="gpu_req" (a sample queue definition is sketched below).
  • To configure default GPU resource requirements for the cluster, specify the LSB_GPU_REQ parameter in the lsf.conf file, for example LSB_GPU_REQ="gpu_req".
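
As a minimal sketch, a queue-level GPU requirement in lsb.queues could look like the following; the queue name gpu_q and the specific requirement string are hypothetical, so adjust them for your cluster:

Begin Queue
QUEUE_NAME  = gpu_q
# Hypothetical example: each job gets 2 GPUs in shared mode, not shared with other jobs
GPU_REQ     = "num=2:mode=shared:j_exclusive=yes"
PRIORITY    = 30
DESCRIPTION = Example GPU queue
End Queue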
---------------------------------------------------------------------------------------------
Configuration change required on clusters: LSF_HOME/conf/lsf.conf

#To enable "-gpu"
LSF_GPU_AUTOCONFIG=Y
LSB_GPU_NEW_SYNTAX=extend
LSB_GPU_REQ="num=4:mode=shared:j_exclusive=yes"
--------------------------------------------------------------------------------------------
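
For the lsf.conf changes to take effect, the LSF daemons typically need to be reconfigured. A sketch of the usual administrator commands, run on the management host as the LSF administrator:

lsadmin reconfig       # re-read lsf.conf on the LIMs
badmin mbdrestart      # restart mbatchd so the batch system picks up the new parameters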
Specify additional GPU resource requirements
LSF now allows you to specify additional GPU resource requirements to further refine the GPU resources that are allocated to your jobs. The existing bsub -gpu command option, the LSB_GPU_REQ parameter in the lsf.conf file, and the GPU_REQ parameter in the lsb.queues and lsb.applications files now support additional GPU options to make the following requests:
  •     The gmodel option requests GPUs with a specific brand name, model number, or total GPU memory.
  •     The gtile option specifies the number of GPUs to use per socket.
  •     The gmem option reserves the specified amount of memory on each GPU that the job requires.
  •     The nvlink option requests GPUs with NVLink connections.
You can also use these options in the bsub -R command option or RES_REQ parameter in the lsb.queues and lsb.applications files for complex GPU resource requirements, such as for compound or alternative resource requirements. Use the gtile option in the span[] string and the other options (gmodel, gmem, and nvlink) in the rusage[] string as constraints on the ngpus_physical resource.
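
For illustration, here are two hedged sketches of such requests; the GPU model name and the application name my_gpu_app are placeholders, so substitute the model strings reported by lshosts -gpu on your own hosts:

bsub -gpu "num=2:gmodel=TeslaV100_SXM2_16GB:gmem=4G" ./my_gpu_app
bsub -R "rusage[ngpus_physical=2:gmodel=TeslaV100_SXM2_16GB:nvlink=yes] span[gtile=2]" ./my_gpu_app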

Monitor GPU resources with lsload command
Options within the lsload command show the host-based and GPU-based GPU information for a cluster. The lsload -l command does not show GPU metrics. GPU metrics can be viewed using the lsload -gpu, lsload -gpuload, lshosts -gpu, and bhosts -gpu commands.

lsload -gpu

[root@powerNode2 ~]# lsload -gpu
HOST_NAME       status  ngpus  gpu_shared_avg_mut  gpu_shared_avg_ut  ngpus_physical
powerNode1           ok      4                  0%                 0%               4
powerNode2           ok      4                  0%                 0%               4
powerNode3           ok      4                  0%                 0%               4
powerNode4           ok      4                  0%                 0%               4
powerNode5           ok      4                  0%                 0%               4
[root@powerNode2 ~]#


lsload -gpuload
[root@powerNode2 ~]# lsload -gpuload
HOST_NAME    gpuid  gpu_model    gpu_mode  gpu_temp  gpu_ecc  gpu_ut  gpu_mut  gpu_mtotal  gpu_mused  gpu_pstate  gpu_status  gpu_error
powerNode1   0      TeslaV100_S  0.0       33C       0.0      0%      0%       15.7G       0M         0           ok          -
             1      TeslaV100_S  0.0       36C       0.0      0%      0%       15.7G       0M         0           ok          -
             2      TeslaV100_S  0.0       33C       0.0      0%      0%       15.7G       0M         0           ok          -
             3      TeslaV100_S  0.0       36C       0.0      0%      0%       15.7G       0M         0           ok          -
powerNode2   0      TeslaP100_S  0.0       37C       0.0      0%      0%       15.8G       0M         0           ok          -
             1      TeslaP100_S  0.0       32C       0.0      0%      0%       15.8G       0M         0           ok          -
             2      TeslaP100_S  0.0       36C       0.0      0%      0%       15.8G       0M         0           ok          -
             3      TeslaP100_S  0.0       31C       0.0      0%      0%       15.8G       0M         0           ok          -
powerNode3   0      TeslaP100_S  0.0       33C       0.0      0%      0%       15.8G       0M         0           ok          -
             1      TeslaP100_S  0.0       32C       0.0      0%      0%       15.8G       0M         0           ok          -
             2      TeslaP100_S  0.0       35C       0.0      0%      0%       15.8G       0M         0           ok          -
             3      TeslaP100_S  0.0       37C       0.0      0%      0%       15.8G       0M         0           ok          -
powerNode4   0      TeslaV100_S  0.0       35C       0.0      0%      0%       15.7G       0M         0           ok          -
             1      TeslaV100_S  0.0       35C       0.0      0%      0%       15.7G       0M         0           ok          -
             2      TeslaV100_S  0.0       32C       0.0      0%      0%       15.7G       0M         0           ok          -
             3      TeslaV100_S  0.0       36C       0.0      0%      0%       15.7G       0M         0           ok          -
powerNode5   0      TeslaP100_S  0.0       31C       0.0      0%      0%       15.8G       0M         0           ok          -
             1      TeslaP100_S  0.0       32C       0.0      0%      0%       15.8G       0M         0           ok          -
             2      TeslaP100_S  0.0       34C       0.0      0%      0%       15.8G       0M         0           ok          -
             3      TeslaP100_S  0.0       36C       0.0      0%      0%       15.8G       0M         0           ok          -
[root@powerNode2 ~]#


bhosts -gpu

The -gpu option for bhosts shows the GPU job usage for each host, including the GPU memory used and reserved and the job counts on each GPU.

[root@powerNode2 ~]# bhosts -gpu
HOST_NAME              ID           MODEL     MUSED      MRSV  NJOBS    RUN   SUSP    RSV
powerNode1               0 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        1 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        2 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        3 TeslaP100_SXM2_        0M        0M      0      0      0      0
powerNode2              0 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        1 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        2 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        3 TeslaP100_SXM2_        0M        0M      0      0      0      0
powerNode3              0 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        1 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        2 TeslaP100_SXM2_        0M        0M      0      0      0      0
                        3 TeslaP100_SXM2_        0M        0M      0      0      0      0
powerNode4              0 TeslaV100_SXM2_        0M        0M      0      0      0      0
                        1 TeslaV100_SXM2_        0M        0M      0      0      0      0
                        2 TeslaV100_SXM2_        0M        0M      0      0      0      0
                        3 TeslaV100_SXM2_        0M        0M      0      0      0      0
powerNode5              0 TeslaV100_SXM2_        0M        0M      0      0      0      0
                        1 TeslaV100_SXM2_        0M        0M      0      0      0      0
                        2 TeslaV100_SXM2_        0M        0M      0      0      0      0
                        3 TeslaV100_SXM2_        0M        0M      0      0      0      0
[root@powerNode2 ~]# 

 The -gpu option for lshosts shows the GPU topology information for a cluster.

[root@powerNode2 ~]# lshosts -gpu
HOST_NAME   gpu_id       gpu_model   gpu_driver   gpu_factor      numa_id
powerNode1       0 TeslaP100_SXM2_       418.67          6.0            0
                 1 TeslaP100_SXM2_       418.67          6.0            0
                 2 TeslaP100_SXM2_       418.67          6.0            1
                 3 TeslaP100_SXM2_       418.67          6.0            1
powerNode2       0 TeslaP100_SXM2_       418.67          6.0            0
                 1 TeslaP100_SXM2_       418.67          6.0            0
                 2 TeslaP100_SXM2_       418.67          6.0            1
                 3 TeslaP100_SXM2_       418.67          6.0            1
powerNode3       0 TeslaP100_SXM2_       418.67          6.0            0
                 1 TeslaP100_SXM2_       418.67          6.0            0
                 2 TeslaP100_SXM2_       418.67          6.0            1
                 3 TeslaP100_SXM2_       418.67          6.0            1
powerNode4       0 TeslaV100_SXM2_       418.67          7.0            0
                 1 TeslaV100_SXM2_       418.67          7.0            0
                 2 TeslaV100_SXM2_       418.67          7.0            8
                 3 TeslaV100_SXM2_       418.67          7.0            8
powerNode5       0 TeslaV100_SXM2_       418.67          7.0            0
                 1 TeslaV100_SXM2_       418.67          7.0            0
                 2 TeslaV100_SXM2_       418.67          7.0            8
                 3 TeslaV100_SXM2_       418.67          7.0            8
[root@powerNode2 ~]# 
Job Submission:
1) Submit a normal job
[sachinpb@powerNode2 ~]$  bsub -q ibm_q -R "select[type==ppc]" sleep 200
Job <24807> is submitted to queue <ibm_q>.
[sachinpb@powerNode2 ~]$

2) Submit a job with GPU requirements:
[sachinpb@powerNode2 ~]$  bsub -q ibm_q -gpu "num=1" -R "select[type==ppc]" sleep 200
Job <24808> is submitted to queue <ibm_q>.
[sachinpb@powerNode2 ~]$

3) List jobs
[sachinpb@powerNode2 ~]$ bjobs
JOBID   USER      STAT  QUEUE   FROM_HOST    EXEC_HOST    JOB_NAME    SUBMIT_TIME
24807   sachinpb  RUN   ibm_q   powerNode2   powerNode6   sleep 200   Aug  1 05:34
24808   sachinpb  RUN   ibm_q   powerNode2   powerNode2   sleep 200   Aug  1 05:34
[sachinpb@powerNode2 ~]$

We can see that job <24807> was submitted without the "-gpu" option, so it was dispatched to a non-GPU node [powerNode6]. The other job <24808> ran on powerNode2, which has 4 GPUs, as listed in the lshosts -gpu output shown above.

4) Submit a job with GPU requirements to another cluster (x86-64_cluster2), where the clusters are configured in job forwarding mode:

[sachinpb@powerNode2 ~]$ lsclusters
CLUSTER_NAME      STATUS   MASTER_HOST      ADMIN     HOSTS  SERVERS
power_cluster1    ok       powerNode2       lsfadmin      5        5
x86-64_cluster2   ok       x86-masterNode   lsfadmin      8        8
[sachinpb@powerNode2 ~]$


[sachinpb@powerNode2 ~]$ bsub -q x86_q -gpu "num=1" -R "select[type==X86_64]" sleep 200
Job <46447> is submitted to queue <x86_q>.
[sachinpb@powerNode2 ~]$
[sachinpb@powerNode2 ~]$ bjobs
JOBID   USER    STAT  QUEUE        FROM_HOST   EXEC_HOST                    JOB_NAME   SUBMIT_TIME
46447   sachinpb  RUN   x86_q           powerNode2   x86_intelbox@x86-cluster2    sleep 200      Feb  9 00:55 


I hope this blog helped in understanding how to enable GPU support in Spectrum LSF, followed by job submission.
NOTE: GPU-enabled workloads are supported from IBM Spectrum LSF Version 10.1 Fix Pack 6 onwards. LSF hosts must run RHEL version 7 or higher to support LSF_GPU_AUTOCONFIG.

References:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_gpu/chap_submit_monitor_gpu_jobs.html

Sunday, August 4, 2019

Spectrum LSF MultiCluster Job Forwarding Model - Configurations

IBM® Spectrum LSF (formerly IBM® Platform™ LSF®) is a complete workload management solution for demanding HPC environments. Featuring intelligent, policy-driven scheduling and easy-to-use interfaces for job and workflow management, it helps organizations improve competitiveness by accelerating research and design while controlling costs through superior resource utilization. There are two Spectrum LSF MultiCluster models: job forwarding mode and resource leasing mode. Refer to a separate blog for configuring your cluster in lease mode. Let's learn job forwarding mode for a Spectrum LSF cluster in the sections below.

Job forwarding model overview
In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. Job status, pending reason, and resource usage are returned to the submission cluster. When the job is done, the exit code returns to the submission cluster.

By default, clusters do not share resources, even if MultiCluster has been installed. To enable job forwarding, enable MultiCluster queues in both the submission and execution clusters.

How it works:
With this model, scheduling of MultiCluster jobs is a process with two scheduling phases:
- The submission cluster selects a suitable remote receive-jobs queue and forwards the job to it.
- The execution cluster selects a suitable host and dispatches the job to it. If a suitable host is not found immediately, the job remains pending in the execution cluster and is evaluated again in the next scheduling cycle. This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.
 
Send-jobs queue
A send-jobs queue can forward jobs to a specified remote queue. By default, LSF attempts to run jobs in the local cluster first. LSF only attempts to place a job remotely if it cannot place the job locally.
Receive-jobs queue
A receive-jobs queue accepts jobs from queues in a specified remote cluster. Although send-jobs queues only forward jobs to specific queues in the remote cluster, receive-jobs queues can accept work from any and all queues in the remote cluster.
Multiple queue pairs
  • You can configure multiple send-jobs and receive-jobs queues in one cluster.
  • A queue can forward jobs to as many queues in as many clusters as you want, and can also receive jobs from as many other clusters as you want.
  • A receive-jobs queue can also borrow resources using the resource leasing method, but a send-jobs queue using the job forwarding method cannot also share resources using the resource leasing method.

In LSF multicluster capability job forwarding mode, the -fwd option filters output to display information on forwarded jobs, including the forwarded time and the name of the cluster to which the job was forwarded. -fwd can be used with other options to further filter the results.
For example, bjobs -fwd -r displays only forwarded running jobs. In job forwarding mode, you can also use the local job ID and cluster name to retrieve the job details from the remote cluster. The query syntax is:

bjobs submission_job_id@submission_cluster_name 
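
For example, using the submission job ID and submission cluster name from the examples later in this post, the query would be:

bjobs 25378@ppc_cluster1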

Additional Output fields for bjobs
  
       +------------------+-------+-------------+------------+--------------+
       | Field name       | Width | Aliases     | Unit       | Category     |
       +------------------+-------+-------------+------------+--------------+
       | forward_cluster  | 15    | fwd_cluster |            | MultiCluster |
       | forward_time     | 15    | fwd_time    | time stamp | MultiCluster |
       | srcjobid         | 8     |             |            | MultiCluster |
       | dstjobid         | 8     |             |            | MultiCluster |
       | source_cluster   | 15    | srcluster   |            | MultiCluster |
       +------------------+-------+-------------+------------+--------------+
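
These fields can also be selected directly with the bjobs custom output format option; for instance (a sketch, assuming the -o option available in LSF 10.1):

bjobs -fwd -o "jobid user stat queue forward_cluster forward_time"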

Cluster Configurations:

List the clusters with basic information:
[sachinpb@powerNode06 ~]$ lsclusters
CLUSTER_NAME      STATUS   MASTER_HOST           ADMIN     HOSTS  SERVERS
ppc_cluster1      ok       powerNode06           lsfadmin      5        5
x86-64_cluster    ok       RemoteClusterHost07   lsfadmin      8        8
[sachinpb@powerNode06 ~]$
List hosts on each cluster
[sachinpb@powerNode06 ~]$ bhosts -w
HOST_NAME     STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
powerNode01   ok      -      80      0    0      0      0    0
powerNode02   ok      -      80      0    0      0      0    0
powerNode03   ok      -      80      0    0      0      0    0
powerNode04   ok      -      80      0    0      0      0    0
powerNode05   ok      -      80      0    0      0      0    0
[sachinpb@powerNode06 ~]$
 --------------------------  Other cluster ----------------------
[sachinpb@RemoteClusterHost7 ~]$ bhosts -w
HOST_NAME             STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
RemoteClusterHost01   ok      -      40      0    0      0      0    0
RemoteClusterHost02   ok      -      40      0    0      0      0    0
RemoteClusterHost03   ok      -      40      0    0      0      0    0
RemoteClusterHost04   ok      -      40      0    0      0      0    0
RemoteClusterHost05   ok      -      40      0    0      0      0    0
RemoteClusterHost06   ok      -      40      0    0      0      0    0
RemoteClusterHost07   ok      -      40      0    0      0      0    0
RemoteClusterHost08   ok      -      40      0    0      0      0    0
[sachinpb@RemoteClusterHost7 ~]$

Displays information about IBM Spectrum LSF multicluster capability
[sachinpb@powerNode06 ~]$ bclusters -w
LOCAL_QUEUE     JOB_FLOW   REMOTE          CLUSTER           STATUS
send_queue      send       receive_queue   x86-64_cluster2   ok
x86_perf_q      send       x86_perf_q      x86-64_cluster2   ok
x86_ibmgpu_q    send       x86_ibmgpu_q    x86-64_cluster2   ok

[Resource Lease Information ]
No resources have been exported or borrowed
 
[sachinpb@powerNode06 ~]$

Configuration file:
 

To make a queue that only runs jobs in remote clusters, take the following steps:

Procedure

  1. Edit the lsb.queues queue definition for the send-jobs queue:
    a. Define SNDJOBS_TO. This specifies that the queue can forward jobs to the specified remote execution queues.
    b. Set HOSTS to none. This specifies that the queue uses no local hosts.
    c. Set MAX_RSCHED_TIME=infinit to maintain FCFS job order.
  2. Edit the lsb.queues queue definition for each receive-jobs queue:
    a. Define RCVJOBS_FROM. This specifies that the receive-jobs queue accepts jobs from the specified submission cluster.
    b. Set HOSTS to the list of execution hosts.

Update $LSF_HOME/conf/lsbatch/ppc_cluster1/configdir/lsb.queues
--------------------------------------------------
Begin Queue
QUEUE_NAME     = send_queue
SNDJOBS_TO     = receive_queue@x86-64_cluster2
HOSTS          = none
PRIORITY       = 30
NICE           = 20
End Queue

--- On the other cluster ---
Begin Queue
QUEUE_NAME      = receive_queue
RCVJOBS_FROM    = send_queue@ppc_cluster1
HOSTS           = RemoteClusterHost01 RemoteClusterHost02 RemoteClusterHost03 RemoteClusterHost04 RemoteClusterHost05 RemoteClusterHost06 RemoteClusterHost07 RemoteClusterHost08
PRIORITY        = 55
NICE            = 10
EXCLUSIVE       = Y
DESCRIPTION     = Multicluster Queue
End Queue
---------------------------------------------------------------------------------

Begin Queue
QUEUE_NAME   = x86_gpu_q
SNDJOBS_TO   = x86_gpu_q@x86-64_cluster2
PRIORITY     = 90
INTERACTIVE  = NO
FAIRSHARE    = USER_SHARES[[default,1]]
HOSTS        = none
EXCLUSIVE    = Y
MAX_RSCHED_TIME = infinit
DESCRIPTION  = For x86jobs, Multicluster Queue - Job forward Mode
End Queue

--- On the other cluster ---
Begin Queue
QUEUE_NAME      = x86_gpu_q
RCVJOBS_FROM    = x86_gpu_q@ppc_cluster1
HOSTS           = RemoteClusterHost01 RemoteClusterHost02 RemoteClusterHost03 RemoteClusterHost04 RemoteClusterHost05 RemoteClusterHost06 RemoteClusterHost07 RemoteClusterHost08
PRIORITY        = 55
NICE            = 10
EXCLUSIVE       = Y
DESCRIPTION     = Multicluster Queue - Job forward Mode
End Queue
---------------------------------------------------------------------------------------

Begin Queue
QUEUE_NAME   = x86_perf_q
SNDJOBS_TO   = x86_perf_q@x86-64_cluster2
PRIORITY     = 40
INTERACTIVE  = NO
FAIRSHARE    = USER_SHARES[[default,1]]
HOSTS        = none
EXCLUSIVE    = Y
MAX_RSCHED_TIME = infinit
DESCRIPTION  = For P8 performance jobs, running only if hosts are lightly loaded.
End Queue


--- On the other cluster ---
Begin Queue
QUEUE_NAME      = x86_perf_q
RCVJOBS_FROM    = x86_perf_q@ppc_cluster1
HOSTS           = RemoteClusterHost01 RemoteClusterHost02 RemoteClusterHost03 RemoteClusterHost04 RemoteClusterHost05 RemoteClusterHost06 RemoteClusterHost07 RemoteClusterHost08
PRIORITY        = 55
NICE            = 10
DESCRIPTION     = Multicluster Queue - Job forward Mode
End Queue
--------------------------------------------------------------
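
After the lsb.queues files are updated on both clusters, the batch configuration typically needs to be reloaded on each side. A sketch of the usual commands, run as the LSF administrator on each cluster:

badmin reconfig        # re-read the lsb.* configuration files
bclusters -w           # verify that the send/receive queue pairs show STATUS ok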


NOTE: 

For LSF multicluster job forwarding mode, a job is recalled to the submission cluster when it stays in pending state in the execution cluster for longer than MAX_RSCHED_TIME. Set MAX_RSCHED_TIME=infinit to maintain FCFS job order of MultiCluster jobs in the execution queue. Otherwise, jobs that time out are rescheduled to the same execution queue, but they lose priority and position because they are treated as a new job submission.
How to submit jobs - an example showing job forwarding mode from "ppc_cluster1" to "x86-64_cluster2"

Submit job1 - job1.script
[sachinpb@powerNode06 ~]$  bsub -n 8 -R "span[ptile=4]" -q x86_ibmgpu_q -R "select[type==X86_64]" job1.script
Job <25378> is submitted to queue <x86_ibmgpu_q>.
[sachinpb@powerNode06 ~]$

Submit job2 - job2.script
[sachinpb@powerNode06 ~]$  bsub -n 4 -q x86_ibmgpu_q -R "select[type==X86_64]" job2.script
Job <25383> is submitted to queue <x86_ibmgpu_q>.
[sachinpb@powerNode06 ~]$

List all forwarded jobs with the bjobs -fwd option

[sachinpb@powerNode06 ~]$ bjobs -fwd
JOBID   USER      STAT  QUEUE         EXEC_HOST                             JOB_NAME     CLUSTER          FORWARD_TIME
25378   sachinpb  RUN   x86_ibmgpu_q  RemoteClusterHost02@x86-64_cluster2   job1.script  x86-64_cluster2  Aug  6 10:25
                                      RemoteClusterHost02@x86-64_cluster2
                                      RemoteClusterHost02@x86-64_cluster2
                                      RemoteClusterHost02@x86-64_cluster2
                                      RemoteClusterHost08@x86-64_cluster2
                                      RemoteClusterHost08@x86-64_cluster2
                                      RemoteClusterHost08@x86-64_cluster2
                                      RemoteClusterHost08@x86-64_cluster2
25383   sachinpb  RUN   x86_ibmgpu_q  RemoteClusterHost04@x86-64_cluster2   job2.script  x86-64_cluster2  Aug  6 10:39
                                      RemoteClusterHost04@x86-64_cluster2
                                      RemoteClusterHost04@x86-64_cluster2
                                      RemoteClusterHost04@x86-64_cluster2
                                      RemoteClusterHost04@x86-64_cluster2
[sachinpb@powerNode06 ~]$

--------------------
Observe the Job description and details for forwarded job:
[sachinpb@powerNode06 ~]$ bjobs -l 25378
Job <25378>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <defa
                     ult>, Status <RUN>, Queue <x86_ibmgpu_q>
Tue Aug  6 10:25:51: Submitted from host <powerNode06>,  Exclusive Execution, 8 Task(s), Requ
                     ested Resources < select[type == X86_64] span[ptile=4]>;
Tue Aug  6 10:25:51: Job <25378> forwarded to cluster <x86-64_cluster2> as Job<24347>;
Tue Aug  6 10:25:51: Started 8 Task(s) on Host(s) <RemoteClusterHost2@x86-64_cluster2> <i
                     bmgpu02@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <ibmgp
                     u02@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@
                     x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-
                     64_cluster2>, Allocated 32 Slot(s) on Host(s) <RemoteClusterHost2@x8
                     6-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64
                     _cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_clu
                     ster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster
                     2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <
                     RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <ibmg
                     pu02@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2
                     @x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86
                     -64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_
                     cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_clus
                     ter2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2
                     > <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <i
                     bmgpu08@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <ibmgp
                     u08@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@
                     x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-
                     64_cluster2> <RemoteClusterHost8@x86-64_cluster2>, Execution Home </
                     home1/sachinpb/>, Execution CWD </tmp>;
Tue Aug  6 11:41:25: Resource usage collected.
                     The CPU time used is 28681 seconds.
                     MEM: 82 Mbytes;  SWAP: 0 Mbytes;  NTHREAD: 22
                     HOST: RemoteClusterHost2
                     MEM: 82 Mbytes;  SWAP: 0 Mbytes; CPU_TIME: 15662 seconds
                     PGIDs:  7309 29438 29439 29440 29441 29635 29636 29637 296
                     38
                     PIDs:  7309 7322 7324 7377 29261 29418 29438 29439 29440 2
                     9441 29616 29635 29636 29637 29638
                     HOST: RemoteClusterHost8
                     MEM: 0 Mbytes;  SWAP: 0 Mbytes; CPU_TIME: 13019 seconds
                     PGIDs: -
                     PIDs: -
 RUNLIMIT
 480.0 min
 MEMORY USAGE:
 MAX MEM: 135.4 Gbytes;  AVG MEM: 11 Gbytes
 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -
 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
 Effective: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
[sachinpb@powerNode06 ~]$

--------------------
Job description and details on the remote cluster for the same forwarded job (with a different job ID):

-bash-4.2$ bjobs -l 24347
Job <24347>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <defa
                     ult>, Status <RUN>, Queue <x86_ibmgpu_q>, Command <sh /nfs
                     _smpi_ci/ibm-tests/smpi-ci/bin/smpi_test.sh 1127 pr x86_64
                      ibm-smpi "  ">
Tue Aug  6 10:30:55: Submitted from host <c712f6n06@ppc_cluster1:25378>,Executi
                     on, 8 Task(s), Requested Resources < select[type == X86_64] s
                     pan[ptile=4]>;
Tue Aug  6 10:30:55: Job <25378> of cluster <ppc_cluster1> accepted as Job <24347>;
Tue Aug  6 10:30:55: Started 8 Task(s) on Host(s) <RemoteClusterHost2> <RemoteClusterHost2> <ibmgpu
                     02> <RemoteClusterHost2> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8>
                     , Allocated 32 Slot(s) on Host(s) <RemoteClusterHost2> <RemoteClusterHost2> <i
                     bmgpu02> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <ibmg
                     pu02> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost
                     2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost8> <RemoteClusterHost8>
                     <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <ib
                     mgpu08> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <ibmgp
                     u08> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8>, Execution Home </ho
                     me1/sachinpb/>, Execution CWD </tmp>;
Tue Aug  6 12:13:45: Resource usage collected.
                     The CPU time used is 41628 seconds.
                     MEM: 88 Mbytes;  SWAP: 0 Mbytes;  NTHREAD: 13
                     HOST: RemoteClusterHost2
                     MEM: 88 Mbytes;  SWAP: 0 Mbytes; CPU_TIME: 22646 seconds
                     PGIDs:  7309 11162 11163 11164 11165 11719 11720 11721
                     PIDs:  7309 7322 7324 7377 11144 11162 11163 11164 11165 1
                     1379 11541 11700 11719 11720 11721

                     HOST: RemoteClusterHost8
                     MEM: 0 Mbytes;  SWAP: 0 Mbytes; CPU_TIME: 18982 seconds
                     PGIDs: -
                     PIDs: -
 RUNLIMIT
 480.0 min
 MEMORY USAGE:
 MAX MEM: 135.4 Gbytes;  AVG MEM: 9.2 Gbytes
 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -
 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
 Effective: select[type == X86_64 ] order[r15s:pg] span[ptile=4]
-bash-4.2$

---------------------------------------------------------------------------------
You can also check details of executed jobs using the bhist command:

@Submission cluster:
[sachinpb@powerNode06 configdir]$ bhist -l 25378
Job <25378>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <defa
                     ult>, Command <sh /nfs_smpi_ci/ibm-tests/smpi-ci/bin/smpi_
                     test.sh 1127 pr x86_64 ibm-smpi "  ">
Tue Aug  6 10:25:51: Submitted from host <powerNode>, to Queue <x86_ibmgpu_q>,
                     ution, 8 Task(s), Requested Resources < select[type == x86-64
                     ] span[ptile=4]>;
Tue Aug  6 10:25:51: Forwarded job to cluster x86-64_cluster2;
Tue Aug  6 10:25:51: Job 25378 forwarded to cluster x86-64_cluster2 as remote j
                     ob 24347;
Tue Aug  6 10:25:51: Dispatched 8 Task(s) on Host(s) <RemoteClusterHost2@x86-64_cluster2>
                     <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <ibm
                     gpu02@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost
                     8@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x8
                     6-64_cluster2>, Allocated 32 Slot(s) on Host(s) <RemoteClusterHost2@
                     x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-
                     64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_c
                     luster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_clust
                     er2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2>
                     <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <ibm
                     gpu02@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost
                     2@x86-64_cluster2> <RemoteClusterHost2@x86-64_cluster2> <RemoteClusterHost2@x8
                     6-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64
                     _cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_clu
                     ster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster
                     2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <
                     RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <ibmg
                     pu08@x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8
                     @x86-64_cluster2> <RemoteClusterHost8@x86-64_cluster2> <RemoteClusterHost8@x86
                     -64_cluster2> <RemoteClusterHost8@x86-64_cluster2>, Effective RES_RE
                     Q <select[type == any ] order[r15s:pg] span[ptile=4] >;
Tue Aug  6 10:25:51: Starting (Pid 7309);
Tue Aug  6 10:25:51: Running with execution home </home1/sachinpb/>, Execution CW
                     D </tmp>, Execution Pid <7309>;
Tue Aug  6 12:17:50: Done successfully. The CPU time used is 46995.0 seconds;
                     HOST: RemoteClusterHost2; CPU_TIME: 23908 seconds
                     HOST: RemoteClusterHost8; CPU_TIME: 23087 seconds;
 RUNLIMIT
 480.0 min of powerNode
MEMORY USAGE:
MAX MEM: 135.4 Gbytes;  AVG MEM: 8.5 Gbytes
Summary of time in seconds spent in various states by  Tue Aug  6 12:17:50
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  0        0        6719     0        0        0        6719

-----------------------
@Execution cluster:

-bash-4.2$ bhist -l 24347

Job <24347>, Job Name <sachinpb-TEST_ibm-smpi_1127>, User <sachinpb>, Project <defa
                     ult>, Command <sh /nfs_smpi_ci/ibm-tests/smpi-ci/bin/smpi_
                     test.sh 1127 pr x86_64 ibm-smpi "  ">
Tue Aug  6 10:30:55: Submitted from host <powerNode>, to Queue <x86_ibmgpu_q>,
                     r-ibm-smpi-1127/logs/smpi_test_lsf_out_25378>, Exclusive E
                     xecution, 8 Task(s), Requested Resources < select[type ==
                     x86-64] span[ptile=4]>;
Tue Aug  6 10:30:55: Job 25378 of cluster ppc_cluster1 accepted as job 24347;
Tue Aug  6 10:30:55: Dispatched 8 Task(s) on Host(s) <RemoteClusterHost2> <RemoteClusterHost2> <ibm
                     gpu02> <RemoteClusterHost2> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <ibmgpu
                     08>, Allocated 32 Slot(s) on Host(s) <RemoteClusterHost2> <RemoteClusterHost2>
                     <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <ib
                     mgpu02> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <ibmgp
                     u02> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost2> <RemoteClusterHost8> <RemoteClusterHost8
                     > <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <
                     RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8> <ibm
                     gpu08> <RemoteClusterHost8> <RemoteClusterHost8> <RemoteClusterHost8>, Effective RES_REQ
                      <select[type == any ] order[r15s:pg] span[ptile=4] >;
Tue Aug  6 10:30:55: Starting (Pid 7309);
Tue Aug  6 10:30:56: Running with execution home </home1/sachinpb/>, Execution CW
                     D </tmp>, Execution Pid <7309>;
Tue Aug  6 12:22:54: Done successfully. The CPU time used is 46995.0 seconds;
                     HOST: RemoteClusterHost2; CPU_TIME: 23908 seconds
                     HOST: RemoteClusterHost8; CPU_TIME: 23087 seconds
Tue Aug  6 12:23:07: Post job process done successfully;
 RUNLIMIT
 480.0 min of POWER8
MEMORY USAGE:
MAX MEM: 135.4 Gbytes;  AVG MEM: 8.5 Gbytes
Summary of time in seconds spent in various states by  Tue Aug  6 12:23:07
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  0        0        6719     0        0        0        6719
-bash-4.2$

-------------------------------------
Example 2:
[]$ bsub -n 8 -q x86_ibmgpu_q -R "select[type==X86_64] span[ptile=1]"
bsub> sleep 100
bsub> Job <71403> is submitted to queue <x86_ibmgpu_q>.
[]$

[]$ bjobs 71403
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
71403   smpici  RUN    x86_ibmgpu  MYhost_cluster1  
                                             Host01@x8 sleep 100  Jun 20 22:54
                                             Host02@x86-64_cluster2
                                             Host03@x86-64_cluster2
                                             Host04@x86-64_cluster2
                                             Host05@x86-64_cluster2
                                             Host06@x86-64_cluster2
                                             Host07@x86-64_cluster2
                                             Host08@x86-64_cluster2
[]$
------------------------------------------------------------------------------

I hope this blog helped in understanding how to set up Spectrum LSF's job forwarding mode in a multi-cluster environment, followed by job submission.

Reference:
https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_multicluster/job_scheduling_job_forward_mc_lsf.html 
https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_multicluster/queue_configure_remote_mc_lsf.html
https://www.ibm.com/support/knowledgecenter/en/SSETD4_9.1.2/lsf_multicluster/remote_timeout_limit_mc_lsf.html