Difference between revisions of "Infrastructure/trickle"
Revision as of 19:23, 2 March 2020
i am no big fan of the SLURM arrayrun(1) facility. what i usually do is create a large file with as many command lines as i want to run jobs, each with whatever parameters that job requires. a silly example of such a master job file could be something like
for i in 0 1 2 3 4 5 6 7 8 9; do for j in 0 1 2 3 4 5 6 7 8 9; do echo "sbatch ${HOME}/echo.slurm ${i} ${j}"; done; done > ~/echo.jobs
assuming such a file, i have a script that ‘trickles’ through the sequence of jobs, keeping up to some maximum limit of queue entries at any point in time, and filling the queue up to the limit again as jobs terminate. my idiom for setting this process in motion goes as follows:
/cluster/shared/nlpl/operation/tools/trickle --start --limit 20 ~/echo.jobs
while true; do /cluster/shared/nlpl/operation/tools/trickle --limit 20 ~/echo.jobs ; sleep 30; done
[19-02-16 15:00:37] trickle[20]: 20 jobs; 3 running; 0 new.
[19-02-16 15:01:07] trickle[20]: 17 jobs; 0 running; 3 new.
[19-02-16 15:01:38] trickle[23]: 17 jobs; 0 running; 3 new.
[19-02-16 15:02:10] trickle[26]: 20 jobs; 3 running; 0 new.
...
the integer in square brackets is the pointer into the job sequence: 20 initially, then advancing at each step by the number of new jobs submitted in that call.
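to make the pointer arithmetic concrete, here is a minimal sketch of the refill step. it is not the actual trickle script, whose internals are not shown on this page; the function name next_batch and its four arguments are hypothetical, and where the real script keeps its pointer between calls is not known here. given the job file, the current pointer, the limit, and the number of jobs currently in the queue, it prints the command lines that would be submitted next.

```shell
#!/usr/bin/env bash
# hypothetical sketch of one refill step; NOT the real trickle script.

# print the job lines to submit next.
#   $1: job file, one command line per line
#   $2: pointer, i.e. the number of lines already submitted
#   $3: limit on simultaneous queue entries
#   $4: number of jobs currently in the queue (pending or running)
next_batch() {
  local file=$1 pointer=$2 limit=$3 active=$4
  local free=$(( limit - active ))
  # queue is already full: nothing to submit
  if (( free <= 0 )); then
    return 0
  fi
  # skip the first $pointer lines, then take at most $free of the rest
  tail -n "+$(( pointer + 1 ))" "$file" | head -n "$free"
}

# possible use (assumption: the caller stores the pointer itself):
#   active=$(squeue -h -u "$USER" | wc -l)
#   next_batch ~/echo.jobs "$pointer" 20 "$active" | while read -r line; do
#     eval "$line"
#   done
```

with a pointer of 20, a limit of 20 and 17 jobs in the queue, this yields lines 21 to 23 of the job file, and the caller would then move the pointer to 23, which corresponds to the second and third log lines above.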
just in case you might find this useful ... for all i know, this script provides similar functionality to arrayrun(1), but i find it more convenient to pass each job its full command line directly, without having to indirect through the job indices that arrayrun(1) controls.