9/16/12

Count the number of lines in a File in UNIX

The usual way to count lines in UNIX, wc -l, does not give us the number of lines which have records or data in them. What it gives us is the number of newline characters.

The command wc -l <file_name> gives us the number of newline characters, not the number of lines which have data. For example, in UNIX create a file, open it and press Enter once (don't write anything). Now run the above command on that file: it returns 1, which is misleading, because the file has no data in it and the count we want is 0.
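The behaviour described above can be checked with a short sketch. The temporary directory and the two file names are made up for illustration. The second file is an extra case assumed here: a record with no newline after it, which wc -l reports as 0 even though the file does hold data.

```shell
#!/bin/sh
# Hypothetical scratch files, created in a temporary directory
tmp_dir=$(mktemp -d)

# A file holding nothing but one newline (like pressing Enter once)
printf '\n' > "$tmp_dir/only_newline.txt"

# A file holding one record with no newline after it
printf 'record' > "$tmp_dir/no_trailing_newline.txt"

# Reading from standard input keeps the file name out of the output;
# tr strips the padding that some wc implementations add
newline_only_count=$(wc -l < "$tmp_dir/only_newline.txt" | tr -d ' ')
no_trailing_count=$(wc -l < "$tmp_dir/no_trailing_newline.txt" | tr -d ' ')

echo "only_newline.txt: $newline_only_count"
echo "no_trailing_newline.txt: $no_trailing_count"

rm -rf "$tmp_dir"
```

The first count is 1 and the second is 0, so in both cases wc -l disagrees with the number of lines that actually hold data.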

Therefore, to overcome this shortcoming of wc -l, I thought of writing a shell script which will give us the exact count of the number of lines which have data in them.


Here we will be using commands which might be new to some readers, as they were not covered in our past tutorials on UNIX/Shell Scripting. We will be using:

* wc 
* nl 
* awk

That's it, only these three commands and we will get our desired output. So before we move ahead, let me tell you about the above commands:

* wc: The word count command in UNIX. It can be used as follows:


    • wc -l <filename> print the line count (the number of newline characters)
    • wc -c <filename> print the byte count
    • wc -m <filename> print the character count
    • wc -L <filename> print the length of the longest line (GNU wc; not part of POSIX)
    • wc -w <filename> print the word count
* nl: The nl command numbers the lines of a file. We will be using the following version:

    • nl -ba <filename> number every line in the file, including blank lines
* awk: A pattern scanning and text processing command. It helps us find a particular pattern inside a file or a line and print selected fields from it.
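To see the three commands side by side, here is a minimal sketch run against a small sample file. The file contents and the variable names are assumptions made for this illustration only.

```shell
#!/bin/sh
# Hypothetical sample file: two records with a blank line between them
sample_file=$(mktemp)
printf 'first record\n\nsecond record\n' > "$sample_file"

# wc -l: number of newline characters in the file
newline_count=$(wc -l < "$sample_file" | tr -d ' ')

# nl -ba: number every line, blank ones included; keep the last number
last_line_number=$(nl -ba "$sample_file" | tail -n 1 | awk '{print $1}')

# awk: print the first whitespace-separated field of each non-blank line
first_fields=$(awk 'NF { print $1 }' "$sample_file")

echo "wc -l: $newline_count"
echo "nl -ba last number: $last_line_number"
echo "first fields:"
echo "$first_fields"

rm -f "$sample_file"
```

Both wc -l and nl -ba report 3 here, because the blank line in the middle is counted by each of them, while awk prints only "first" and "second".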

So these were the basic definitions of the three commands. Let's start with the script.

Count_lines.sh


#!/bin/ksh

#Name of the file to check; replace <file_name> with your file
file_name="<file_name>"

#Checking if the file is zero byte (or missing),
#then the count should be 0
if [ ! -s "$file_name" ]; then
    echo "No. of lines in the file is: 0"
    exit 0
fi

#Getting the no. of new line characters
wc_count=`wc -l < "$file_name" | awk '{print $1}'`

#Getting the no. of lines in the file
#(nl -ba numbers every line, so the last number is the line count)
nl_count=`nl -ba "$file_name" | tail -n 1 | awk '{print $1}'`

#Logic to get the exact count
#The two values differ only if there is no new line after the
#last entry in the file: wc -l misses that line, nl does not
if [ "$wc_count" -eq "$nl_count" ]; then
    echo "No. of lines in the file is: $wc_count"
else
    echo "No. of lines in the file is: $nl_count"
fi
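Note that comparing the wc and nl counts still treats a blank line as a line, so a file holding only a newline is reported as 1. If "lines which have data" is taken to mean lines with at least one non-whitespace character, one possible sketch uses awk alone. The helper name count_data_lines and the demo file are hypothetical and not part of Count_lines.sh.

```shell
#!/bin/sh
# count_data_lines is a hypothetical helper, not part of Count_lines.sh.
# It prints how many lines of the given file contain at least one
# non-whitespace character. awk sets NF to 0 for blank lines, and it
# still reads a last line that has no newline after it.
count_data_lines() {
    awk 'NF { count++ } END { print count + 0 }' "$1"
}

# Demo file: a record, an empty line, a line of spaces, and a final
# record with no newline after it
demo_file=$(mktemp)
printf 'a\n\n   \nb' > "$demo_file"
count_data_lines "$demo_file"
rm -f "$demo_file"
```

For the demo file this prints 2, and for a file that holds only a newline it prints 0. Whether blank lines should be skipped depends on what counts as data in your files, so treat this as an alternative under that assumption rather than a replacement for the script.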

NOTE: The script requires <file_name> to be replaced with the name of your file. If the file and the script are in different directories, you will have to give the absolute path of the file.
