Stop using too many processes for simple tasks!

If you have ever tried to pull a piece of information out of a command's output or out of a file, you have probably used one of the following commands.

grep – Print lines matching a pattern(1)
sed – A stream editor for filtering and transforming text(1)
awk – Pattern scanning and processing language(1)

All these commands are powerful in their own ways. Grep is a powerful tool when you just want to get information quickly and easily. Sed is a powerful tool when it comes to editing text with regular expressions. Awk is powerful for getting information out of complex patterns.
But these commands also overlap a great deal. You can use awk for the same purpose as grep, you can use awk for basically the same purpose as sed, and so on.

A common practice in the world of Linux is that people use these commands in faulty, or rather lazy, ways. You may have heard or read about the classic faulty use of the command cat.
Say for example you want to get the line containing information about the root user from the /etc/passwd file. How would you do this? Many people would execute the following command, cat /etc/passwd|grep "root". As much as this may work, it's an unnecessary way of doing it. You are starting two different processes for one simple task, cat and grep. The pipe itself is not a process, but it is one more thing the shell has to set up, and every byte of the file has to travel through it.
You can just use the grep command for the same result, grep "root" /etc/passwd. Here grep opens the file itself, so we only had to use one process.
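
Here is a small sketch of the two forms side by side. To keep it runnable anywhere it works on a throwaway sample file instead of the real /etc/passwd; the file contents and the variable names (sample, with_cat, without_cat) are made up for illustration.

```shell
# Build a small passwd-like sample file (contents are made up for illustration).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$sample"

# Two processes: cat reads the file, grep filters what comes through the pipe.
with_cat=$(cat "$sample" | grep "root")

# One process: grep opens the file itself.
without_cat=$(grep "root" "$sample")

# Both forms select the same line.
echo "$without_cat"

rm -f "$sample"
```

Both variables end up holding the same line, so the cat adds nothing but an extra process.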

Now, I was writing a script earlier today to get certain information from the df command. I wanted the current use in percent of the /dev/sda1 partition, and I wanted the output to be only the integer, for example “10”. If we first run the df command and inspect the output, we can see the layout and start working out a way to get at that information.

# df
Filesystem                  1K-blocks     Used  Available Use% Mounted on
udev                            10240        0      10240   0% /dev
tmpfs                          819540     2400     817140   1% /run
/dev/sda1                   110664528  6484064   98559012   7% /
tmpfs                            5120        0       5120   0% /run/lock
tmpfs                         2596920      152    2596768   1% /run/shm

Here we can see the disk usage of every mounted filesystem. We can see that “/dev/sda1” is mounted on “/” (root) and that its use is at “7%”. Now how would we get that “7”, without the percent symbol or any of the other information surrounding it?

We can start by trying to grep our partition.

# df |grep "/dev/sda1"
/dev/sda1                   110664528  6484064   98559012   7% /

We now only have the line for the “/dev/sda1” partition. How would we pick out only the “Use%” column, the “7%” part? Let's try using awk.

# df |grep "/dev/sda1"|awk '{print $5}'
7%

Now we are really getting somewhere. But there is a percent symbol in our output, and we only want the integer. How would we remove the percent symbol? Let's try using sed for that.

# df |grep "/dev/sda1"|awk '{print $5}'|sed 's/%//g'
7

It works! We now have only the desired integer in our output. Well done.
Let's look at our syntax and break it down.

First we used the command df to get the disk usage of all our mounted partitions. Second we used grep to pick one line out of the df output, the line for the “/dev/sda1” partition. Third we used awk to pick one field out of that line: we told awk to print the fifth whitespace-separated field (print $5) and got “7%”. Lastly we wanted to remove the “%” symbol from our output. We used sed for this and told it to substitute “%” with nothing (“”). Finally we got our desired output, “7”.
Let's count how many processes we used for this task: 4, connected by 3 pipes. That is a lot of processes for one simple task. What if I told you this could be done with only 2?

Let's try doing this with only the df and awk commands.

# df|awk '/\/dev\/sda1/ {print $5+0}'
7

Wow, that was quite simple and looks more pleasing to the eye.

Let's break it down. First we used the df command again to get the disk usage of all our mounted partitions. Then we used awk both to select the line and to clean up the value. Awk can act like grep when you put a pattern between “/” symbols: for example, awk '/root/' /etc/passwd prints the same lines as grep "root" /etc/passwd. That means we can eliminate grep and let awk do the matching. Next we wanted one specific field of that line, which we already did with awk earlier and can of course do again (print $5). Lastly we wanted to remove the “%” symbol. Thankfully, adding “+0” forces awk to treat “$5” as a number. Remember that “$5” holds “7%”. When awk converts that string to a number it reads the leading digits and ignores the trailing “%”, so “$5+0” evaluates to 7.
In the end we have our desired output while using only 2 processes instead of 4.
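
One caveat worth a sketch: a regex such as /\/dev\/sda1/ matches any line that contains that text, so a partition named /dev/sda10 would match as well. Comparing the first field exactly avoids that. The example below runs against canned input rather than real df output; the df_sample function and the /dev/sda10 row are invented for illustration.

```shell
# Canned df-style output so this runs without a real /dev/sda1.
# The /dev/sda10 row is invented to show the difference between the two matches.
df_sample() {
  printf '%s\n' \
    'Filesystem      1K-blocks     Used  Available Use% Mounted on' \
    '/dev/sda1       110664528  6484064   98559012   7% /' \
    '/dev/sda10        5120000  2560000    2560000  50% /data'
}

# Regex match: "/dev/sda10" also contains "/dev/sda1", so two values are printed.
loose=$(df_sample | awk '/\/dev\/sda1/ {print $5+0}')

# Exact comparison on the first field: only the /dev/sda1 row is selected.
# $5+0 converts "7%" to the number 7 by reading the leading digits.
strict=$(df_sample | awk '$1 == "/dev/sda1" {print $5+0}')

echo "$strict"
```

With real data you would replace df_sample with df. It is still one df and one awk, so the process count stays at 2.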

Let's try this again using only the df and sed commands.

# df --output=pcent /dev/sda1 |sed -e 's/%//g' -e '1d'
7

Let's break this one down as well. First we used the df command again, but this time we made full use of some of its own options.
We used the --output option to select one column of the df output, in this case the “Use%” column (pcent). Then we told df to report only on “/dev/sda1”. Lastly we used sed to tidy up what df gave us. We removed the “%” symbol with 's/%//g', which turns the header line “Use%” into “Use”, and then deleted that header line with '1d'.
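
A sketch of the sed step on its own, using canned input in place of df --output=pcent /dev/sda1. The pcent_sample function is made up, and the leading spaces in front of the value are an assumption about how df pads the column; if your df pads like this, the command as written keeps those spaces, and the variant below strips them too.

```shell
# Stand-in for `df --output=pcent /dev/sda1`; the padding before "7%" is an assumption.
pcent_sample() {
  printf '%s\n' 'Use%' '  7%'
}

# The command from above: strip "%" everywhere, then delete line 1 (the header).
# Any padding in front of the value is left in place.
as_written=$(pcent_sample | sed -e 's/%//g' -e '1d')

# Variant: print only line 2, with both spaces and "%" removed.
trimmed=$(pcent_sample | sed -n '2s/[ %]//gp')

echo "$trimmed"
```

For a value that is read back as a number the padding rarely matters, but it can matter if the script compares the result as a string.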

Some people might argue that starting 2 extra processes does not have much of an impact on the performance of a system. And that might be true in 99% of the cases. But one day you may stumble upon a system so constrained that every extra process counts. Or let's say you are running a script that uses this method every minute. Wouldn't it be more efficient to start 2 processes instead of 4 every minute? Yes, of course it would. So it is unnecessary, and rather dumb, to use double the number of processes when half would do.
If I told you that you could make $200,000 a year or $400,000 a year under the same circumstances, would you pick the $200,000 over the $400,000? Of course you wouldn't. It is the same principle as running more processes than you need in a script.

To sum this little “rant” up: stop using more processes than you need. Stop being lazy and start being efficient.
