The UNIX way.

Around the clock, across the globe. By Vladimir Legeza

Parallelizing tasks in a shell script.

Assume we have 1000 files and each of them needs to be processed in some way. Processing one file takes at least 15 minutes, so handling them all sequentially takes approximately 10 days. On a computer with more than one core it is logical to run several processes in parallel. But if we start them all at the same time, the computer will grind to a halt.
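
The estimate above is simple arithmetic, and it is worth seeing roughly how much a few parallel workers would save. A small sketch, assuming every file takes exactly 15 minutes and the workers do not slow each other down (the variable names and the worker count of 3 are chosen only for illustration):

```shell
#!/bin/sh
FILES=1000           # number of files to process
MINUTES_PER_FILE=15  # assumed processing time per file
WORKERS=3            # hypothetical number of parallel processes

total_minutes=$((FILES * MINUTES_PER_FILE))
sequential_hours=$((total_minutes / 60))

# Round up: the last batch may be only partly full
batches=$(( (FILES + WORKERS - 1) / WORKERS ))
parallel_hours=$((batches * MINUTES_PER_FILE / 60))

echo "Sequential: $sequential_hours hours (about $((sequential_hours / 24)) days)"
echo "With $WORKERS workers: about $parallel_hours hours"
```

Shell arithmetic is integer only, so the results are rounded down to whole hours and days.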

The following shell code sets the number of processes that should run concurrently and keeps that many processes running until the last file has been processed.

PROC_IN_PARALLEL=3
FILES_LIST="1 2 3 4 5 6 7"

# Stand-in for the real work: pretend each file takes one second
processing(){
    echo "Start processing file $1"
    sleep 1
    echo "File $1 is processed."
}

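# Main loop
for i in $FILES_LIST; do
    processing "$i" &
    pid_list="$pid_list $!"

    # Control of the concurrent processes: while the limit is reached,
    # drop finished PIDs from the list and poll again in a second
    while [ $(echo $pid_list | wc -w) -ge "$PROC_IN_PARALLEL" ]; do
        new_pid_list=
        for pid in $pid_list; do
            # kill -0 sends no signal; it only tests that the process exists
            if kill -0 "$pid" 2>/dev/null; then
                new_pid_list="$new_pid_list $pid"
            fi
        done
        pid_list=$new_pid_list
        sleep 1
    done

done

# Wait for the last processes to finish before exiting
wait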
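
The pruning step of the control loop can be tried in isolation. The sketch below tests each PID with `kill -0`, which sends no signal and only reports whether the process exists, so it also works on systems without /proc. The helper name alive_pids and the two demo jobs are invented for this illustration.

```shell
#!/bin/sh
# alive_pids: print only those PIDs from the arguments that still exist.
alive_pids() {
    alive=
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            alive="$alive $pid"
        fi
    done
    echo $alive   # unquoted on purpose, to drop the leading space
}

sleep 30 &          # a job that is still running when we check
long_pid=$!
true &              # a job that exits immediately
short_pid=$!
wait "$short_pid"   # make sure the short job has finished and been reaped

remaining=$(alive_pids "$long_pid" "$short_pid")
echo "Still running: $remaining"

# Clean up the demo job
kill "$long_pid"
wait "$long_pid" 2>/dev/null || true
```

Only the PID of the long job should be printed, which is exactly the list the main loop carries over to its next iteration.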


The output of the script should look like this:

$ ./task_parallelizatin.sh
Start processing file 1
Start processing file 2
Start processing file 3
File 1 is processed.
File 2 is processed.
File 3 is processed.
Start processing file 4
Start processing file 5
Start processing file 6
File 4 is processed.
File 5 is processed.
File 6 is processed.
Start processing file 7
File 7 is processed.

If we rewrite the processing() function to sleep for a random time

processing(){
    echo "Start processing file $1"
    # Random delay of 0-9 seconds ($RANDOM is a bash feature)
    sleep $((RANDOM % 10))
    echo "File $1 is processed."
}

then the output will look a bit different:

$ ./task_parallelizatin.sh
Start processing file 1
Start processing file 2
Start processing file 3
File 2 is processed.
File 3 is processed.
Start processing file 4
Start processing file 5
File 1 is processed.
Start processing file 6
File 4 is processed.
Start processing file 7
File 7 is processed.
File 5 is processed.
File 6 is processed.

As we can see, the fourth process starts as soon as one of the earlier processes completes (no matter which of them finishes first), within the one-second polling interval of the control loop.
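
For comparison, the same "keep N slots busy" behaviour can be obtained from xargs. This is a different technique from the PID list above, not a rewrite of it, and it relies on the -P option, which is not part of POSIX, although both GNU and BSD xargs provide it. A minimal sketch using the same seven items:

```shell
#!/bin/sh
# -P 3 : keep at most three processes running at once
# -n 1 : pass one item to each invocation
# The trailing "_" fills $0 of the inner shell, so the item arrives as $1.
result=$(printf '%s\n' 1 2 3 4 5 6 7 |
    xargs -P 3 -n 1 sh -c 'echo "Start processing file $1"; sleep 1; echo "File $1 is processed."' _)

echo "$result"
```

The order of the lines varies between runs, just as in the output above, but every item is started and finished exactly once.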

Written by Vladimir Legeza

April 1, 2011 at 9:54 am
