Author: Zuguang Gu ( firstname.lastname@example.org )
The bsub package is only broadly tested on DKFZ ODCF cluster and the default configurations work fine there. It is also possible for other clusters that use LSF system to use bsub package. The following global options should be properly configured.
If you have difficulties to configure bsub package for your cluster, please feel free to contact me. I am glad to help.
You need to configure how to call the
Rscript command, especially when you have different versions of R installed. Following is how we call
Rscript on our cluster (we use
module to manage the bash environment).
Then the different version of
Rscript can be switched by setting different
bsub_opt$R_version option (The value of
bsub_opt$R_version will be passed to the
Similarlly, if you use conda for managing different versions of software, you can also choose R with different versions by setting a proper
bsub_opt$call_Rscript. Let assume you have conda environments for different R versions with the naming schema
R_$version (e.g. R_3.6.0), then you can set
The names of the nodes where the jobs are submitted can be set by
bsub_opt$submission_node option. The value can be a vector of node names. bsub will randomly select one to connect.
If you want to submit jobs, the login node should be the same as the submission node. By default, the value for
login_node is automatically copied from
submission_node. But there are cases when the login node is different from the submission node. For example, If you are working on your own laptop, to connect to your institute’s cluster, you need to first connect to server A to enter your institute’s network, then on server A, you continue to connect to server B which is the submission node. In this case, server A is the login node. When you explicitly set
login_node, you should also set
NULL because in this circumstance, you cannot submit jobs:
You can still query job status. Please refer to two_ssh.html to see the instructions of how to configure bsub packages when you need to establish two connections to the submission node.
Your username on the submission node can be set by
bsub_opt$user. The default value is
Sys.info()['user']. If the username on the submission node is different, you need to explicitly set it.
bsub_opt$ssh_envir should be properly set so that LSF binaries such as
bjobs can be found. There are some environment variables initialized when logging in the bash terminal while they are not initialized with the ssh connection. Thus, some environment variables should be manually set.
An example for
bsub_opt$ssh_envir is as follows. The
LSF_SERVERDIR should be defined and exported.
The values of these two variables can be obtained by entering following commands in your bash terminal (on the submission node):
echo $LSF_ENVDIR echo $LSF_SERVERDIR
You need to define the template for calling the
bsub command by
bsub_opt$bsub_template option. The self-defined function should accepts following arguments:
memorymemory, in GB.
corenumber of cores to use.
outputpath of output file.
...should be added as the last argument of the function.
The temporary bash script to submit is automatically appended as the last argument of
E.g., the template on our cluster is defined as:
You can use
GetoptLong::qq() to construct a complex string with interpolating multiple variables.
The time strings by LSF
bjobs command might be different for different installations. The bsub package needs to convert the time strings to
POSIXlt objects for calculating the time difference in the
bjobs() function. Thus, if the default time string parsing fails, users need to provide a user-defined function and set with
bsub_opt$parse_time option in
bsub_opt. The function accepts a vector of time strings and returns a
POSIXlt object. For example, if the time string returned from
bjobs command is in a form of
Dec 1 18:00:00 2019, the parsing function can be defined as:
bsub package automatically recognizes several formats of time strings.