Wait
Task coordination mechanisms rely on waiting for some tasks to finish before starting new ones.
Tasks are asynchronous
A key concept, is that tasks are asynchronous, which means that task execution order is not guaranteed.
For example, executing the following program (file test_13.bds)
for( int i=0 ; i < 10 ; i++ ) task echo BEFORE $i
for( int i=0 ; i < 10 ; i++ ) task echo AFTER $i
you may see:
$ ./test_13.bds
BEFORE 0
BEFORE 4 <-- Notice, tasks are out of order
BEFORE 3
BEFORE 2
BEFORE 1
BEFORE 5
BEFORE 7
BEFORE 6
BEFORE 8
AFTER 1
AFTER 0
BEFORE 9 <-- Notice this 'BEFORE' task is executed after 'AFTER' task started
AFTER 6
AFTER 5
AFTER 4
AFTER 3
AFTER 2
AFTER 7
AFTER 8
AFTER 9
This is because the task
statement only schedules a task to be executed, but it's up to scheduler to decide when to execute the task.
Schedulers can, and often do, re-order the tasks to be executed.
Waiting for tasks
If we need some kind of "barrier" to wait for tasks, we use the wait
statement
If a task must be executed after another task finishes, we can introduce a wait
statement.
File test_13.bds
for( int i=0 ; i < 10 ; i++ ) task echo BEFORE $i
wait # Wait until ALL scheduled tasks finish
print("We are done waiting, continue...\n")
for( int i=0 ; i < 10 ; i++ ) task echo AFTER $i
Now, we are sure that all tasks 'AFTER' really run after 'BEFORE'
$ ./test_14.bds
BEFORE 0
BEFORE 2
BEFORE 1
BEFORE 4
BEFORE 7
BEFORE 3
BEFORE 6
BEFORE 5
BEFORE 8
BEFORE 9
We are done waiting, continue...
AFTER 0
AFTER 2
AFTER 4
AFTER 1
AFTER 3
AFTER 5
AFTER 6
AFTER 8
AFTER 7
AFTER 9
Waiting for one task to finish
We can also wait for a specific task to finish by providing a task ID wait taskId
, e.g.:
string tid = task echo Hi
wait tid # Wait only for one task
Waiting for a list of tasks to finish
You can wait
for a list of tasks by providing a list of taskIds
.
For instance, in this program, we create a list of two task IDs and wait
on the list:
string[] tids
for( int i=0 ; i < 10 ; i++ ) {
# Tasks that wait a random amount of time
int sleepTime = randInt( 5 )
string tid = task echo BEFORE $i ; sleep $sleepTime ; echo DONE $i
# We only want to wait for the first two tasks
if( i < 2 ) tids.add(tid)
}
# Wait for all tasks in the lists (only the first two tasks)
wait tids
print("End of wait\n")
When we run it, we get:
$ bds z.bds
BEFORE 2
BEFORE 0
BEFORE 7
BEFORE 5
BEFORE 6
BEFORE 4
BEFORE 3
BEFORE 1
DONE 0 <- First task finished
DONE 3
DONE 4
DONE 5
DONE 6
DONE 7
BEFORE 8
BEFORE 9
DONE 1 <- Second task finished
End of wait <- Wait finished here: we were waiting for the first two tasks
DONE 2
DONE 8
DONE 9
Note: There is an implicit wait
statement at the end of the program.
So a program does not exit until all tasks have finished running.
Waiting for ALL tasks to finish
A single wait
statement without any arguments, will wait
for ALL tasks to finish.
For example:
for( int i=0 ; i < 10 ; i++ ) task echo BEFORE $i
wait # This will wait for ALL tasks
println "All tasks finished!"
Implicit wait statement
What happens if there are tasks runnign, but there is not wait
statement?
For example, in the following program there is a long-running task, but there is no wait
statement
task echo "TASK START"; sleep 5; echo "TASK END"
println "All tasks scheduled, program end!"
So the program execution will finish before the task
finishes running.
What happens with the task
that are still executing when bds
finishes executing the program?
bds
introduces an implicit wait
statement at the end of every program.
If you execute the program above from the command line, you'll see:
$ bds test/z.bds
All tasks scheduled, program end!
TASK START
TASK END
$ <- Note that the command line prompt is AFTER the task finished
In the above example, the implicit wait
allows bds
to let the tasks finish.
Task failure
What happens when a task
fails?
When a task fails, a WaitException
error is raised.
We need to remember that the task
command does NOT actually execute a task, it only schedules the task
.
So, where is the exception thrown?
When a task fails, the WaitException
is thrown in any wait
statement that waiting for tasks to finish.
For example, the following code shows where the WaitException
is thrown:
# This task will fail because "my_unknown_command" does not exist
task my_unknown_command input.txt > output.txt
wait # A WaitException will be thrown here because the task fails
println "This will never be executed!"
If you execute the code, the output is (output edited for readability):
$ bds task_28.bds
... line 7: my_unknown_command: command not found
...
Fatal error: task_28.bds. WaitException thrown: Error in wait statement, file task_28.bds, line 5
Task failure on implicit wait
What happens if there is a task error, but there is no wait
statements (i.e. the program has finished executing)?
In that case an error
will be produced by the implicit wait statement that bds
introduces after every program.
There is no point in throwing an exception during an "implicit wait", because this is out of the program's code, so there is no way to catch exceptions from "implicit wait statements".
Example:
# This task will fail because "my_unknown_command" does not exist
# Note: It will fail at the "implicit wait" added by bds at the end of the program.
# Since the implicit 'wait' is after the program's end, there is no exception
# thrown, only an error.
task my_unknown_command
executing the above code, you get ():
$ bds task_29.bds
task.task_29.line_4.id_1.35e61e0f12d941dd.sh: line 7: my_unknown_command: command not found
00:00:00.632 ERROR: Task failed:
Program & line : 'task_29.bds', line 4
Task Name : 'null'
Task ID : 'task_29.bds.20221213_071909_868/task.task_29.line_4.id_1.35e61e0f12d941dd'
Task PID : '84742'
Task hint : 'my_unknown_command'
Task resources : 'cpus: 1 mem: -1.0 B timeout: 86400 wall-timeout: 86400'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1 / 0'
Retries available : '1'
Input files : '[]'
Output files : '[]'
Script file : 'task_29.bds.20221213_071909_868/task.task_29.line_4.id_1.35e61e0f12d941dd.sh'
Exit status : '1'
StdErr (10 lines) :
task_29.bds.20221213_071909_868/task.task_29.line_4.id_1.35e61e0f12d941dd.sh: line 7: my_unknown_command: command not found
As shown above, an error is produces, instead of an exception, i.e. there is no " WaitException thrown" message.
Catching wait exceptions
Since wait
statements throws an exception when the task
fails, it is possible to handle task failures using try
/ catch
/ finally
blocks.
For example:
captured := false
try {
# This task will fail because "my_unknown_command" does not exist
task my_unknown_command
# Wait exception has to be inside the 'try' clause because
# this is where the exception is thrown in case of errors
wait
println "This will NOT be executed" # Code after exception is thrown
} catch( WaitException e) {
captured = true
println "Wait exception captured"
}
println "This WILL be executed: captured = $captured"
exit 0 # Force exit code, otherwise bds returns non-zero on task failure
Executing the code, will output (edited for readibility):
$ bds task_31.bds
task.task_31.line_6.id_1.sh: line 7: my_unknown_command: command not found
00:00:00.691 ERROR: Task failed:
Program & line : 'task_31.bds', line 6
Task Name : 'null'
Task ID : 'task.task_31.line_6.id_1.3e45862489d7bb51'
Task PID : '90485'
Task hint : 'my_unknown_command'
Task resources : 'cpus: 1 mem: -1.0 B timeout: 86400 wall-timeout: 86400'
...
Wait exception captured <- This message is from the 'catch' code
This will be executed: captured = true <- This code would not be executed if we didn't cature the WaitException
...
IMPORTANT: bds
will produce a non-zero exitCode
when a task fails.
If you want your program to return a zero exitCode
to the command line when you've handled the WaitException
, you need to explicitly add an exit
statement.
If we look at the previous program's exit code:
# In bash, the '$?' variable shows the exit code of the
# previouly executed command (which was the bds program)
$ echo "Exit code: $?"
Exit code: 0
So, we were able to catch
the WaitException produce as an error in the task
, and handle the exception and program's exitCode
.