The UNIX way.

Around the clock, across the globe. By Vladimir Legeza

Archive for April 2011

Parallelizing tasks in a shell script.


Assume we have 1000 files and need to process each of them in some way. Processing one file takes at least 15 minutes, so processing them all sequentially takes approximately 10 days. On a computer with more than one core, it is logical to run several processes in parallel. But if we start them all at the same time, the computer will grind to a halt.
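The estimate above can be checked with a few lines of shell arithmetic. This is only a rough sketch: the file count and the 15 minutes per file come from the text, while WORKERS is an assumed value, and the result ignores any contention between the processes.

```shell
#!/bin/sh
# Back-of-the-envelope estimate. FILES and MINUTES_PER_FILE come from the
# text; WORKERS is an assumed value chosen for illustration.
FILES=1000
MINUTES_PER_FILE=15
WORKERS=3

# Sequential run: every file is processed one after another.
serial_minutes=$((FILES * MINUTES_PER_FILE))

# Parallel run: files go through in batches of WORKERS; round the number
# of batches up because the last batch may be only partly full.
batches=$(( (FILES + WORKERS - 1) / WORKERS ))
parallel_minutes=$((batches * MINUTES_PER_FILE))

echo "Sequential: $((serial_minutes / 60)) hours"
echo "With $WORKERS workers: $((parallel_minutes / 60)) hours"
```

Since 15 minutes is a lower bound per file, both figures are lower bounds as well.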

The following shell code defines the number of processes that should run concurrently and maintains that number of running processes until the last file has been processed.

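PROC_IN_PARALLEL=3          # maximum number of concurrent processes
FILES_LIST="1 2 3 4 5 6 7"  # quoted, so the whole list is assigned

# Stand-in for the real work on one file
processing(){
    echo "Start processing file $1"
    sleep 1
    echo "File $1 is processed."
}

pid_list=

# Main loop
for i in $FILES_LIST; do
    processing "$i" &
    pid_list="$pid_list $!"

    # Control of the concurrent processes: while the limit is reached,
    # drop finished PIDs from the list and check again
    while [ $(echo $pid_list | wc -w) -ge "$PROC_IN_PARALLEL" ] ; do
        new_pid_list=
        for pid in $pid_list ; do
            # /proc/<pid> exists only while the process is alive (Linux procfs)
            if [ -d "/proc/$pid" ] ; then
                new_pid_list="$new_pid_list $pid"
            fi
        done
        pid_list=$new_pid_list
        sleep 1
    done

done

# Let the last processes finish before the script exits
wait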
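For comparison, here is a minimal sketch of the same limit enforced by xargs instead of a hand-written PID list. This is a different technique from the loop above, not a rewrite of it. It assumes an xargs that supports the -P option (GNU and BSD xargs do; POSIX does not require it), and the WORKER string is a hypothetical stand-in for the processing function.

```shell
#!/bin/sh
# Sketch: xargs keeps at most PROC_IN_PARALLEL workers running at once.
# -P is an extension of GNU and BSD xargs, not part of POSIX.
PROC_IN_PARALLEL=3
FILES_LIST="1 2 3 4 5 6 7"

# Hypothetical stand-in for the real per-file work; it runs under `sh -c`,
# where $1 is the file name that xargs appends.
WORKER='echo "Start processing file $1"; sleep 1; echo "File $1 is processed."'

# Feed one name per line; -n 1 starts one worker per name.
# xargs splits its input on whitespace, so names must not contain spaces.
# The output is captured here only so it can be inspected afterwards;
# drop the $( ) to see the messages as they happen.
output=$(printf '%s\n' $FILES_LIST |
    xargs -P "$PROC_IN_PARALLEL" -n 1 sh -c "$WORKER" worker)

echo "$output"
```

The order of the messages depends on scheduling, so it may differ from run to run. xargs returns only after every worker has exited, so no separate wait step is needed.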


Written by Vladimir Legeza

April 1, 2011 at 9:54 am