# Workflow Grammar

We have defined a complete set of genome sequencing workflow grammars. It keeps the user’s traditional usage habit as much as possible. It requires a very low learning cost to learn how to write and use the workflow. If you have other type workflow grammar files, you need to convert them to the gene container syntax first.

## Template structure

field required type description
version Yes string The version of workflow
inputs No Map The variable of the workflow. You can define multiple, set the actual values of these variables at execution time.
workflow Yes Map Define the steps in the workflow and the dependencies between the steps.
volumes No Map Define the used claim and mount path of the shared storage for the workflow steps.

## workflow example

version: genecontainer_0_1
inputs: # workflow variant
sample: # variant name
default: sample1
type: string
description: variant description
workflow:
test-job-a:
tool: busybox:latest
resources:
memory: 4G
cpu: 4C
commands_iter:
command: sleep 10; touch /result/test-job/${sample}.${item}.${1}.txt vars_iter: - [0,1] test-job-b: tool: busybox:latest resources: memory: 4G cpu: 4C commands: - sleep 10; touch /result/test-job/${sample}.job-b.txt
depends:
- target: test-job-a
volumes:
genobs:
mount_path: /result
mount_from:
pvc: test-pvc # claimName


## version

The version of gene workflow, the current support is genecontainer_0_1.

## inputs

Optional. Used to define the variable of the genome sequencing process. Inputs consists of multiple variables, up to 60 variables can be defined, and each variable name is unique. If the variable name is repeated, the latter definition will override the previously defined one. Variables can be referenced in other parts of the workflow file, using the shell script format, ie: ${var}, where var is the variable name. ### inputs format inputs: <var name>: type: <type> default: <default value> value: <value> description：<description>  ### Field Field required type description var name Yes String The var name consists of letters, numbers, and underscores “-” or underscores “_” with a length of [1, 20]. type Yes String Parameter type, allowed type is as follows: • string • number • bool • array The default type is “string” if not specified. default No interface{} The default value of the parameter, fill in the corresponding value according to type. Note：the type of default and value must be consistent with the type field. value No interface{} Value is the literal value to use for the parameter. The value can be overwrite by an external input.Precedence: external input value > default. Note: one of external input, value and default must be specified. description No String Parameter description information, must be no more than 255 characters. ### Inputs example inputs: sample: type: string default: /home/root split: default: 3 type: number description: parallel number chromlist: default: - 'chr1:10000-103863906' - 'chr1:103913906-205922707' - 'chr1:206072707-249240621' - 'chr2:10000-87668206' - 'chr2:87718206-149690582' - 'chr2:149790582-243189373' - 'chr3:10000-90504854' type: array flag: value: true type: bool  ## Workflow Required. Define the tasks involved in the process and the dependencies between the tasks. A workflow consists of multiple steps, each of which can perform a specific task. ### workflow format workflow: <task name>: description: <task info> tool: <toolName:version> resources：<resources required by the workflow> commands：<commands> commands_iter: <commands with variable> depends：< dependent task> condition: <The task will be executed only when this condition is satisfied>  ### Field #### workflow field  field required type description task name Yes String The name of task. It must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character. And must be no more than 40 characters. description No String Information about this task, must be no more than 255 characters. tool Yes String The tool and version of the current task required, format "toolName:toolVersion". For example, the bwa of version 0.7.12 is set to "tool: bwa:0.7.12". resources No struct Compute Resources required by the task. commands No array[string] The command executed in the container. The length of the array indicates the number of concurrent. Each member represents a command executed in a container.In the following example, if there are four lines in the command, the number of concurrent containers is 4, and each container executes a different command. commands: - sh /obs/shell/run-xxx/run.sh 1 a - sh /obs/shell/run-xxx/run.sh 2 a - sh /obs/shell/run-xxx/run.sh 1 b - sh /obs/shell/run-xxx/run.sh 2 b  Note 1: One of field commands and commands_iter must be set. Use commands_iter if the commands need to pass variables, otherwise use commands. commands_iter No Struct The command executed in the container, and the difference between command and commandIter is that the commandIter supports shell scripts with variables. depends No array[Struct] Specify other tasks that the current task depends on. condition No interface{} The task will be executed only when this condition is satisfied. #### resource field description  field required type description memory No String The amount of memory resources required, in G. Format: "Number + Unit". The number can be decimals. The unit is G or g. For example, if the memory size required is 4G, you can fill in "4G" or "4g" here. cpu No String The amount of memory resources required, in C. Format: "Number + Unit". The number can be decimals. The unit is C or c. #### commands_iter field description  field required type description command Yes String A shell script with variables, for example: echo${1} ${2}${item}  There are two ways to define variables: Variable parameters defined in commands_iter, including vars and vars_iter parameters, you can set one of them. The format is "${n}", n is a positive integer starting at 1. And it will be replaced by the nth element of vars. The vars needs to list all combinations of parameter, and the vars_iter is an automatic traversal combination of parameter. Built-in variable "${item}". Represent the order number of all the possible parameter combinations. For example:  commands_iter: command: echo ${1}${item} vars: - a - b - c  Then the final command will be:  - echo a 0 - echo b 1 - echo c 2  vars No array[array] A two-dimensional array, will be used to replace the command variable, represent all the possible parameter combinations. In the two-dimensional array, the members of each row represent the variables ${1}, ${2}, ${3} in the command. ${1} represents the first member of each line. ${2} represents the second member of each line. And ${3} represents the third member of each line. The length of the two-dimensional array indicates that how many times the command will be executed with different parameters. Each line of the array is used to instantiate the command.The number of rows in the array is the number of k8s job that will run. For example, the vars has four lines. command: echo ${1}${2} ${item} vars: - [0, 0] # 0 ->${1}; 0 -> ${2}; 0 ->${item} - [0, 1] # 0 -> ${1}; 1 ->${2}; 1 -> ${item} - [1, 0] # 1 ->${1}; 0 -> ${2}; 2 ->${item} - [1, 1] # 1 -> ${1}; 1 ->${2}; 3 -> ${item}  4 k8s jobs will run to execute the commands. For the 1st job, the command is: echo 0 0 0  For the 2nd job, the command is: echo 0 1 1  For the 3rd job, the command is: echo 1 0 2  For the 4th job, the command is: echo 1 1 3  vars_iter No array[array] A two-dimensional array. vars_iter list all the possible parameters for every position in the command line. And we will use algorithm Of Full Permutation to generate all the permutation and combinations for these parameter that will be used to replace the ${n} variable. The first row member of the array replace the variable ${1} in the command, the second row member replace the variable ${2} in the command, and so on. For example, commands_iter: command: sh /tmp/step1.splitfq.sh ${1}${2} ${3} vars_iter: - ["sample1", "sample2"] - [0, 1] - [25]  then the final command will be: sh /tmp/scripts/step1.splitfq.sh sample1 0 25 sh /tmp/scripts/step1.splitfq.sh sample2 0 25 sh /tmp/scripts/step1.splitfq.sh sample1 1 25 sh /tmp/scripts/step1.splitfq.sh sample2 1 25  If there are many array members per line, you can use the range function. The format of range function: range(start, end, step) Start and end are all integer. And step can only be positive integer. If you do not specify step, the default is 1. Range(1, 4) represents array [1,2,3] Range(1, 10, 2) represents array [1, 3, 5, 7, 9] vars_iter: - range(0, 4) the same as: vars_iter: - [0, 1, 2, 3]  In case if we need to pass the inputs dynamically based on the output of other task, we can use the get_result function. The format of get_result function: get_result(Job-Target, sep) Job - Target-Specifies the target job name for the result. sep - The separator used to split the string.  get_result(job-a, " ") represents array [1,2,3] if the output of the job-a is "1 2 3" get_result:More-Info #### Depends field description  field required type description target Yes String The name of the task relying on, make sure that the specified task name must exist in the workflow. type No String Dependency type, the value can be: whole, overall dependencies, this is the default. iterate, iterative dependency. For example, if both task A and task B are executed concurrently 100. Setting “whole” indicates that task B can start execution after all 100 steps of task A finished. Setting “iterate” means that the 1st step of task A is completed, then the 1st step of the task B can start execution. Iterative execution can improve the overall concurrency efficiency. #### command field description command:More-Info ### Workflow example workflow: job-a: tool: nginx:latest resources: memory: 2G cpu: 1c commands: - sleep expr 3 \*${wait-base}; echo ${output-prefix}job-a | tee -a${obs}/${output}/${result};
job-b:
tool: nginx:latest
commands_iter:
command: sleep expr ${1} \*${wait-base}; echo ${output-prefix}job-b-${item} | tee -a ${obs}/${output}/${result}; vars_iter: - range(0, 3) depends: - target: job-a type: whole job-c: tool: nginx:latest type: GCS.Job resources: memory: 8G cpu: 2c commands_iter: command: sleep expr${1} \* ${wait-base}; echo${output-prefix}job-c-${item} | tee -a${obs}/${output}/${result};
vars_iter:
- [3, 20]
depends:
- target: job-a
type: iterate
- target: job-b


## volumes

Optional, information about volume that required genome sequencing process required, such as mount path in container, pvc name and so on. You can specify multiple volumes.

### Volumes format

volumes:
<volume name>:
mount_path: <mount_path>
mount_from: <pvc info>


### field

 field required type description mount_path Yes string Path within the container at which the volume should be mounted. Must not contain ':'. mount_from Yes struct Detail info about volume

mount_from field description

 field required type description pvc Yes string The pvc name. Note: the specified pvc must exist in the cluster.

### volumes example

volumes:
genref:
mount_path: ${volume-path-ref} mount_from: pvc:${my_k8s_pvc}
genobs:
mount_path: /volume-path-obs
mount_from:
pvc: sample-data-pvc