Tuesday, March 23, 2010

CUT

cut is a Unix command line utility which is used to extract sections from each line of input, usually from a file. Extraction of line segments can typically be done by bytes (-b), characters (-c), or fields (-f) separated by a delimiter (-d; the tab character by default). A range must be provided in each case which consists of one of N, N-M, N- (N to the end of the line), or -M (beginning of the line to M), where N and M are counted from 1 (there is no zeroth value). Since version 6, an error is thrown if you include a zeroth value. Prior to this the value was ignored and assumed to be 1.

    The following options are supported:
    list
    A comma-separated or blank-character-separated list of integer field numbers (in increasing order), with optional - to indicate ranges (for instance, 1,4,7; 1-3,8; -5,10 (short for 1-5,10); or 3- (short for third through last field)).
    -b list
    The list following -b specifies byte positions (for instance, -b1-72 would pass the first 72 bytes of each line). When -b and -n are used together, list is adjusted so that no multi-byte character is split.
    -c list
    The list following -c specifies character positions (for instance, -c1-72 would select the first 72 characters of each line).
    -d delim
    The character following -d is the field delimiter (-f option only). Default is tab. Space or other characters with special meaning to the shell must be quoted. delim can be a multi-byte character.
    -f list
    The list following -f is a list of fields assumed to be separated in the file by a delimiter character (see -d ); for instance, -f1,7 copies the first and seventh field only. Lines with no field delimiters will be passed through intact (useful for table subheadings), unless -s is specified.
    -n
    Do not split characters. When -b list and -n are used together, list is adjusted so that no multi-byte character is split.
    -s
    Suppresses lines with no delimiter characters when the -f option is used. Unless specified, lines with no delimiters will be passed through untouched.

The -c option tells cut to output only the listed character positions.

To display the 10th character (character N) of each line


$ cut -c 10 passwdfile
x
5
x
$

To display the 5th and 10th characters (characters N,M) of each line


$ cut -c 5,10 passwdfile
ux
b5
px
$

To display the 5th through 10th characters (characters N-M) of each line


$ cut -c 5-10 passwdfile
user:x
ba:x:5
pter:x
$

To display everything up to the 10th character (characters -M) of each line



$ cut -c -10 passwdfile
unixuser:x
oradba:x:5
scripter:x
$

To display everything from the 10th character (characters N-) to the end of each line


$ cut -c 10- passwdfile
x:501:506:Unix User:/home/unixuser:/bin/bash
502:506:DBA User:/home/oradba:/bin/bash
x:1658:506:Scripter World:/home/scripter:/bin/bash
$

Unix cut -d option and Unix cut -f option
-d The character following -d is the field delimiter. Default is tab. Space or other characters with special meaning to the shell must be quoted. delim can be a multi-byte character.

-f specifies the fields to display from the list of fields formed by splitting the input data at the delimiter (-d). Fields are counted from one; separate multiple values with a comma (,).

To get the user description (5th field) from our example file passwdfile, use the cut -d and -f options together.

Use : as the delimiter and display the 5th field


$ cut -d ":" -f5 passwdfile
Unix User
DBA User
Scripter World
$

To display the user description (5th field) and the user home directory (6th field), separate the field numbers with a comma, with ":" as the delimiter


$ cut -d ":" -f5,6 passwdfile
Unix User:/home/unixuser
DBA User:/home/oradba
Scripter World:/home/scripter
$

To display the fields up to and including the user description in our passwd file, put - before the number of the user description (5th) field


$ cut -d ":" -f-5 passwdfile
unixuser:x:501:506:Unix User
oradba:x:502:506:DBA User
scripter:x:1658:506:Scripter World
$

To display the fields from the user description onward, put - after the number of the user description (5th) field


$ cut -d ":" -f5- passwdfile
Unix User:/home/unixuser:/bin/bash
DBA User:/home/oradba:/bin/bash
Scripter World:/home/scripter:/bin/bash
$

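Unix cut last character
cut cannot count from the end of a line, so the last character of a string is displayed here with awk instead (gawk treats an empty FS as "every character is a field"; POSIX leaves that case unspecified)

STRING="unixuser:x:501:506:Unix User:/home/unixuser:/bin/bash"
echo "${STRING}" | awk '$0=$NF' FS=


$ STRING="unixuser:x:501:506:Unix User:/home/unixuser:/bin/bash"
$ echo "${STRING}" | awk '$0=$NF' FS=
h
$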


Unix cut last field or column
To display the last field of a string with a delimiter (here the delimiter is ":", a colon), again using awk

STRING="unixuser:x:501:506:Unix User:/home/unixuser:/bin/bash"
echo "${STRING}" | awk '$0=$NF' FS=":"


$ STRING="unixuser:x:501:506:Unix User:/home/unixuser:/bin/bash"
$ echo "${STRING}" | awk '$0=$NF' FS=":"
/bin/bash
$
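
The awk commands above do the job without cut. If cut itself is wanted, one possible workaround is to reverse the line, cut from the front, and reverse the result back. This is only a sketch and it assumes the rev utility is installed (it is common on Linux but POSIX does not require it).

```shell
STRING="unixuser:x:501:506:Unix User:/home/unixuser:/bin/bash"

# Last character: reverse the line, then take character 1.
last_char=$(printf '%s\n' "$STRING" | rev | cut -c1)

# Last field: reverse, take field 1, reverse again to restore the text.
last_field=$(printf '%s\n' "$STRING" | rev | cut -d: -f1 | rev)

printf '%s\n' "$last_char" "$last_field"
```

For the sample string this prints h and then /bin/bash, the same results as the awk versions.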


The external cut command displays selected columns or fields from each line of a file. It is a UNIX equivalent to the relational algebra projection operation (it picks columns, not rows). If the capabilities of cut are not enough, then the alternatives are AWK and Perl.


The cut command does not use the shell variable IFS (Internal Field Separator) to determine where to split fields; IFS only controls how the shell itself splits words. cut splits on the single character given with -d, or on TAB when -d is omitted. You can still check IFS with set | grep IFS, and set it (in bash), for example
 IFS=$' \t\n'
The most typical usage is cutting one of several columns from a file (often a log file) to create a new file.  For example:
cut -d ' ' -f 2-7
retrieves the second to seventh fields, assuming that each field is separated by a single (note: single) blank. Option -d specifies a single-character delimiter (in the example above it is a blank) which serves as the field separator. Option -f specifies the range of fields included in the output (here fields two to seven). Note that option -d presupposes usage of option -f.
Cut can work in two modes:
  • column delimited selection (each column starts at a certain fixed offset defined as a from-to range)
  • separator-delimited selection (with the column separator being a single character like blank, comma, colon, etc).  In this mode cut uses the delimiter defined by the -d option (as in the example above). By default the delimiter is TAB; the shell variable IFS is not consulted.
Cut is essentially a simple text parsing tool, and unless the task at hand is also simple you will be better off using other, more flexible, text parsing tools instead. On modern computers the difference between an invocation of cut and an invocation of awk is negligible.  You can also use Perl in command line mode for the same task. If option -a (autosplit mode) is specified, then each line in Perl is split into the array @F.  So a Perl emulation of cut consists of writing a simple print statement that outputs the necessary fields.  The advantage of Perl is that the columns can be counted from the last (using negative indexes).

Examples

 

 

Assuming a file named file containing the lines:
foo:bar:baz:qux:quux
one:two:three:four:five:six:seven
alpha:beta:gamma:delta:epsilon:zeta:eta:teta:iota:kappa:lambda:mu
To output the fourth through tenth characters of each line:
% cut -c 4-10 file
This gives the output:
:bar:ba
:two:th
ha:beta
To output the fifth field through the end of the line of each line using the colon character as the field delimiter:
% cut -d : -f 5- file
This gives the output:
quux
five:six:seven
epsilon:zeta:eta:teta:iota:kappa:lambda:mu

Column selection mode

A column is one character position. In this mode cut acts as a generalized substr function for files. Classic Unix cut cannot count characters from the back of the line as the Perl substr function can (but rcut can). This type of selection is specified with the -c option.  List entries can be open (from the beginning, like -5, or to the end, like 6-), or closed (like 6-9).
cut -c 4,5,20 foo # cuts foo at columns 4, 5, and 20.
cut -c 1-5 a.dat | more  # print the first 5 characters of every line in the file a.dat
cut -c -5 a.dat | more  #  same as above but using open range

Field selection mode

In this mode cut selects not characters but fields delimited by the specific one-character delimiter given with option -d.  The list of fields is specified with the -f option ( -f [list] )
cut -d ":" -f1,7 /etc/passwd  # cuts fields 1 and 7 from /etc/passwd
cut -d ":" -f 1,6- /etc/passwd # cuts fields 1,  6 to the end from /etc/passwd
The default delimiter is TAB. If space is used as a delimiter, be sure to put it in quotes (-d " ").
Note: Another way to specify a blank (or other shell-sensitive character) is to escape it with \  -- the following example prints the second blank-delimited field of every line in the file /etc/passwd
% cut -f2 -d\  /etc/passwd | more

Line suppression option

In field selection mode cut can suppress lines that contain none of the delimiters defined by option -d (-s option). Unless this option is specified, lines with no delimiters are included in the output untouched.
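
A small sketch of the difference, using made-up input with one heading line that contains no colon:

```shell
# Three lines: one heading without any colon and two colon-delimited records.
sample=$(printf '%s\n' 'Users' 'root:x:0' 'daemon:x:1')

# Without -s the heading passes through untouched.
without_s=$(printf '%s\n' "$sample" | cut -d: -f1)

# With -s the delimiter-free line is dropped.
with_s=$(printf '%s\n' "$sample" | cut -d: -s -f1)

printf '%s\n' "without -s:" "$without_s" "with -s:" "$with_s"
```

Without -s the output contains Users, root and daemon; with -s only root and daemon remain.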

Complement selection (GNU cut only)

This is a GNU cut option only. Option --complement converts the set of selected bytes, characters or fields to its complement.  It applies to whichever of -b, -c or -f is given.  In this case you specify not the list of fields or character columns to be retained, but those that need to be excluded. In some cases that simplifies the writing of the selection range.
For example, instead of the example listed above:
    cut -d ":" -f 1,6- /etc/passwd # cuts fields 1 and 6 to the end on the line from /etc/passwd
you can specify:
    cut -d ":" -f 2-5 --complement  /etc/passwd # cuts fields 1 and 6 to the end on the line from /etc/passwd
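
The equivalence of the two forms can be checked on a short made-up record. This sketch assumes GNU cut, since --complement is not available in other implementations.

```shell
record="a:b:c:d:e:f:g"

# Name the fields to keep ...
keep=$(printf '%s\n' "$record" | cut -d: -f1,6-)

# ... or name the fields to drop (GNU cut only).
drop=$(printf '%s\n' "$record" | cut -d: --complement -f2-5)

printf '%s\n' "$keep" "$drop"
```

Both commands print a:f:g.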
By using pipes and output shell redirection operators you can create new files with a subset of columns or fields contained in the first file.

Usage in Shell

Sometimes cut is used in shell programming to select certain substrings from a variable, for example:
echo Argument 1 = [$1]
c=`echo $1 | cut -c6-8`
echo Characters 6 to 8 = [$c]
Output:
Argument 1 = [1234567890]
Characters 6 to 8 = [678]
This is one of many ways to perform such a selection. In all but the simplest cases AWK or Perl are better tools for the job.  If you are selecting fields of a shell variable, you should probably use the set command and echo the desired positional parameter into a pipe.
For complex cases Perl is definitely the preferable tool. Moreover, several Perl re-implementations of cut exist: see for example Perl cut.
BTW Perl implementations are more flexible and less capricious than the C-written original Unix cut command.
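
The suggestion above of splitting a shell variable with the set command might look roughly like this. nth_field is a hypothetical helper name invented for this illustration; the sketch relies on the shell splitting an unquoted expansion on the characters in IFS.

```shell
# nth_field N DELIM STRING: print the Nth DELIM-separated field of STRING.
# (nth_field is a hypothetical helper, not a standard command.)
nth_field() {
    n=$1
    old_ifs=$IFS
    IFS=$2
    set -f               # disable globbing while the string is split
    set -- $3            # unquoted on purpose: the shell splits on IFS
    set +f
    IFS=$old_ifs
    shift "$(($n - 1))"  # discard the fields before the Nth
    printf '%s\n' "$1"
}

nth_field 5 : "unixuser:x:501:506:Unix User:/home/unixuser:/bin/bash"
```

This prints Unix User without starting an external process. The function does no error checking, so asking for a field beyond the last one makes shift fail.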

Notes on syntax

As I mentioned before there are two variants of cut: the first is character column cut and the second is delimiter-based (parsing) cut. In both cases the option can be separated from its value by a space, for example
-d ' '
In other words, POSIX and GNU implementations of cut use "almost" standard lexical parsing of arguments, although most examples in books use the "old style" with arguments "glued" to options.   The "glued" style of specifying arguments is generally an anachronism.  Still, quoting the delimiter does not always do what you expect even in modern versions: for example, a tab delimiter cannot be given as the quoted two-character string '\t', because cut wants a single character; it has to be passed as a literal tab. You generally need to experiment with your particular implementation.
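
For the tab case, a minimal sketch of two ways that should work in any POSIX shell (the record is made up for the illustration):

```shell
# A tab-separated record (printf expands \t into a real tab).
record=$(printf 'alpha\tbeta\tgamma')

# Tab is the default delimiter, so -d can simply be omitted.
default_delim=$(printf '%s\n' "$record" | cut -f2)

# To name the tab explicitly, pass a literal tab character;
# "$(printf '\t')" produces one without typing it.
tab=$(printf '\t')
explicit_delim=$(printf '%s\n' "$record" | cut -d "$tab" -f2)

printf '%s\n' "$default_delim" "$explicit_delim"
```

Both commands print beta.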
1. Character column cut
cut -c list [ file_list ]
Option:
-c list Display (cut) columns, specified in list, from the input data. Columns are counted from one, not from zero, so the first column is column 1. List can be separated from the option by space(s) but no spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact columns to display. For example, the -c 1,4,7 notation cuts columns 1, 4, and 7 of the input. The -c -10,50- notation would select columns 1 through 10 and 50 through end-of-line (please remember that columns are counted from one).
2. Delimiter-based  (parsing) cut
cut -f list [ -d char ] [ -s ] [ file_list ]
Options:
-d char The character char is used as the field delimiter. It is usually quoted but can be escaped.  The default delimiter is a tab character. To use a character that has special meaning to the shell, you must quote the character so the shell does not interpret it. For example, to use a single space as a delimiter, type -d' '.
-f list  Selects (cuts) fields, specified in list, from the input data. Fields are counted from one, not from zero. No spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact fields to display. The most practically important ranges are "open" ranges, where either the starting field or the last field is not specified explicitly (omitted).  For example:
  •  Selection from the beginning of the line to a certain field is specified as -N, where N is the number of the field. For example
    -f -5
  •  Selection from a certain field to the end of the line (all fields starting from N) is specified as N-. For example -f 5-
Specification can be complex and include both selected fields and ranges. For example, -f 1,4,7 would select fields 1, 4, and 7. The -f2,4-6,8 notation would select field 2, fields 4 to 6 (range) and field 8.
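
A few of these list forms can be tried on a made-up record whose fields are simply numbered:

```shell
record="f1:f2:f3:f4:f5:f6:f7:f8:f9"

# A single field, a closed range, and another single field in one list.
mixed=$(printf '%s\n' "$record" | cut -d: -f2,4-6,8)

# Open ranges on both sides: fields 1 to 2 and 8 to the end.
open=$(printf '%s\n' "$record" | cut -d: -f-2,8-)

# The order of the list does not matter: fields come out in input order.
reordered=$(printf '%s\n' "$record" | cut -d: -f3,1)

printf '%s\n' "$mixed" "$open" "$reordered"
```

The three results are f2:f4:f5:f6:f8, f1:f2:f8:f9 and f1:f3. The last one shows that cut cannot reorder fields; that takes awk or a similar tool.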

Limitations

Please remember that cut is good only for simple cases. In complex cases AWK and Perl actually save you time. Limitations are many. Among them:
  • Delimiters are single characters; they are not regular expressions. This leads to huge disappointment when you try to parse a blank-delimited file with cut: multiple blanks are counted as multiple field separators.
  • Syntax is irregular and sometimes tricky. For example, one-character delimiters can be quoted, but a backslash-escaped delimiter (such as -d\ ) cannot also be quoted, since inside quotes the backslash counts as a character of its own.
  • Semantics are the most basic. Cut is essentially a text parser and as such is suitable mainly for parsing colon-delimited and similar files. Its functionality does not even match the level of the Fortran IV format statement.
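
The first limitation is easy to reproduce. In this sketch the sample line has two spaces between its first two words:

```shell
# Two spaces separate "a" from "b", so cut sees an empty field between them.
line="a  b c"

naive=$(printf '%s\n' "$line" | cut -d' ' -f2)

# Squeezing runs of spaces with tr -s first gives the expected field.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

# awk splits on runs of blanks by default, so it needs no preprocessing.
with_awk=$(printf '%s\n' "$line" | awk '{ print $2 }')

printf '[%s]\n' "$naive" "$squeezed" "$with_awk"
```

The naive call yields an empty second field, while the tr -s and awk variants both yield b.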
  1. [From AIX cut man page] To display several fields of each line of a file, enter:
    cut  -f 1,5 -d : /etc/passwd
    This displays the login name and full user name fields of the system password file. These are the first and fifth fields (-f 1,5) separated by colons (-d :). For example, if the /etc/passwd file looks like this:

    su:*:0:0:User with special privileges:/:/usr/bin/sh
    daemon:*:1:1::/etc:
    bin:*:2:2::/usr/bin:
    sys:*:3:3::/usr/src:
    adm:*:4:4:System Administrator:/var/adm:/usr/bin/sh
    pierre:*:200:200:Pierre Harper:/home/pierre:/usr/bin/sh
    joan:*:202:200:Joan Brown:/home/joan:/usr/bin/sh
    The cut command produces:

    su:User with special privileges
    daemon:
    bin:
    sys:
    adm:System Administrator
    pierre:Pierre Harper
    joan:Joan Brown
  2. [From AIX cut man page] To display fields using a blank separated list, enter:
    cut -f "1 2 3" -d : /etc/passwd
    The cut command produces:

    su:*:0
    daemon:*:1
    bin:*:2
    sys:*:3
    adm:*:4
    pierre:*:200
    joan:*:202
  3. [From the section "The cut command" of the ebook Shell Scripting by Hamish Whittal] Since we're only interested in fields 2, 3 and 4 of our memory, we can extract these using:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4 >> mem.stats   


Another example of a more or less complex pipeline using cut
Below is a command that uses netstat and cut to find the number of connections to each port in use.
netstat -nap | grep 'tcp\|udp' | awk '{print $4}' | cut -d: -f2 | sort | uniq -c | sort -n
Below is a description of each command. netstat shows all incoming and outgoing connections on a Linux server. grep keeps only the lines matching the pattern you define. awk is a very important command, generally used for scanning for patterns and processing the matches; it is a powerful tool for shell scripting, and here it prints the fourth column (the local address). cut then takes the part after the colon, which is the port. sort sorts its input, and sort -n sorts in numeric order. uniq -c collapses adjacent duplicate lines into one and prefixes each with its count.
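
To see what the cut, sort and uniq stages do without a live netstat, the tail of the pipeline can be run on a few made-up addresses. The addresses below are hypothetical stand-ins for the local-address column.

```shell
# Hypothetical local addresses, as they might appear in column 4 of netstat output.
addresses='0.0.0.0:22
127.0.0.1:631
192.168.1.5:22
0.0.0.0:80
10.0.0.1:22'

# Keep the port (field 2), group identical ports, count them, sort by count.
counts=$(printf '%s\n' "$addresses" | cut -d: -f2 | sort | uniq -c | sort -n)

printf '%s\n' "$counts"

# The busiest port ends up on the last line; awk only strips the padding.
busiest=$(printf '%s\n' "$counts" | tail -n 1 | awk '{ print $1, $2 }')
printf '%s\n' "$busiest"
```

For this input the busiest entry is 3 22. IPv6 addresses contain colons themselves, so for them field 2 would not be the port.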

The cut command has the ability to cut out characters or fields. To cut fields it uses delimiters.
The cut command uses a delimiter to determine where to split fields, so the first thing we need to understand about cut is how it determines what its delimiter is. It is tempting to assume that cut takes its delimiters from the shell variable called IFS (Internal Field Separator), but it does not: IFS only tells the shell how to split words. cut uses the single character given with the -d switch and, when there is none, a tab.
Typing:
set | grep IFS
will show you what the shell's separator characters currently are; normally IFS contains a space, a tab and a new line.
Looking at the output of our free command, we successfully separated every field by a space (remember the tr command!)
Similarly, if our delimiter between fields was a comma, we could set the delimiter within cut to be a comma using the -d switch:
cut -d ","
The cut command lets one cut on the number of characters or on the number of fields. Since we're only interested in fields 2, 3 and 4 of our memory, we can extract these using:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4
Why do you need to set -d " " even when IFS already contains a space?
Because cut never looks at IFS. Setting the IFS variable will not change what cut does; only -d will.
Detour:
Setting shell variables is easy. If you use bash, then:
IFS=$' \t\n'
The Bourne shell (sh) and ksh use the same NAME=value form. In the csh, environment variables are set with setenv and no equals sign:
setenv NAME value
That ends this short detour.
At this point, it would be nice to save the output to a file. So let's append this to a file called mem.stats:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4 >> mem.stats
Every time you run this particular command it should append the output to the mem.stats file.
The -f switch allows us to cut based upon fields. If we wanted to cut based upon characters (e.g. cut characters 6-13, 15 and 17) we would use the -c switch.
To apply this to the above example:
free | tr -s ' ' | sed '/^Mem/!d' | cut -c6-13,15,17 >> mem.stats
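
The field version of the pipeline can be checked against a fixed line instead of live free output. The Mem line below is made up; only its layout matters. The second command is a quick way to convince yourself that IFS plays no part in what cut does.

```shell
# A made-up line in the layout that free prints; the numbers are illustrative only.
mem_line='Mem:        8000000     3000000     5000000'

# Squeeze the padding, then take fields 2 to 4.
stats=$(printf '%s\n' "$mem_line" | tr -s ' ' | cut -d' ' -f2-4)
printf '%s\n' "$stats"

# Changing IFS has no effect on cut: without -d it still splits on tabs only,
# so this colon-separated line has no delimiter and passes through intact.
unchanged=$(IFS=:; printf '%s\n' 'a:b:c' | cut -f1)
printf '%s\n' "$unchanged"
```

The first command prints the three numbers separated by single spaces; the second prints a:b:c unchanged.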
 

Unix Utilities
The output of the above programs is ideal for use with the standard unix utilities such as egrep, cut, join and nawk. These may also be used to query the data files directly although this is not very efficient. For example, the following stores all HIP identifiers of entries in hip_main with a DSS chart in file hip.DSS:

cut -f2,70 -d '|' hip_main.dat | egrep D | cut -f1 -d " " >hip.DSS


First Example in stages:
1. For the next example I'd like you to make sure that you've logged on as a user (potentially root) on one of your virtual terminals.
How do you get to a virtual terminal? Ctrl-Alt plus F1 or F2 or F3 etcetera.
It should prompt you for a username and a password. Log in as root, or as yourself or as a different user and once you've logged in, switch back to your X terminal with Alt-F7. If you weren't working on X at the beginning of this session, then the Ctrl + Alt + F1 is not necessary. A simple Alt + F2 would open a new terminal, to return to the first terminal press Alt+F1.
2. Run the who command:
who
This will tell us who is logged on to the system. We could also run the w command:
w
This will not only tell us who is logged on to our system, but what they're doing. Let's use the w command, since we want to save information about what users are doing on our system. We may also want to save information about how long they've been idle and what time they logged on.
3. Find out who is logged on to your system. Pipe the output of the w command into the input of cut. This time however we're not going to use a delimiter to delimit fields but we're going to cut on characters. We could say:
w | cut -c1-8
This tells the cut command to keep the first eight characters of each line. Doing this you will see that the first line is cut off after the first digit of the seconds. So in my case the time is now
09:57:24 
and it cuts off to
09:57:2
It also cuts off the rest of each user line. So if you look at this, you're left with USER and all the users currently logged onto your system. And that's cutting exactly 8 characters.
4. To cut characters 4 to 8?
w | cut -c4-8
This will produce slightly bizarre-looking output.
So cut can not only cut fields, it can cut exact characters and ranges of characters. We can cut any number of characters in a line.
Second Example in stages:
Often cutting characters in a line is less than optimal, since you never know how long your usernames might be. Really long usernames would be truncated, which clearly would not be acceptable. Cutting on characters is rarely a long-term solution. It may work because your name is Sam, but not if your name is Jabberwocky!
1. Let's do a final example using cut. Using our password file:
cat /etc/passwd
I'd like to know all usernames on the system, and what shell each is using.
The password file has 7 fields separated by a ':'. The first field is the login username, the second is the password which is an x (because it is kept in the shadow password file), the third field is the userid, the fourth is the group id, the fifth field is the comment, the sixth field is the user's home directory and the seventh field indicates the shell that the user is using. I'm interested in fields 1 and 7.
2. How would we extract the particular fields? Simple:
cat /etc/passwd | cut -d: -f1,7
If we do this, we should end up with just the usernames and their shells. Isn't that a nifty trick?
3. Let's pipe that output to the sort command, to sort the usernames alphabetically:
cat /etc/passwd | cut -d: -f1,7 | sort
                
Third example in stages:
So this is a fairly simple way to extract information out of files. The cut command doesn't only work with files, it also works with streams. We could do a listing that would produce a number of fields. If you recall, we used the tr command earlier to squeeze spaces.
ls -al 
If you look at this output, you will see lines of fields. Below is a quick summary of these fields and what they refer to.
field number   indication of
1              permissions of the file
2              number of links to the file
3              user id
4              group id
5              size of the file
6              month the file was modified
7              day the file was modified
8              time the file was modified
9              name of the file
I'm particularly interested in the size and the name of each file.
1. Let's try and use our cut command in the same way that we used it for the password file:
ls -al  |  cut -d' ' -f5,9
The output is not as expected, because cut treats every single space as a field separator and the columns of the listing are padded with more than one. Perhaps the output contains tabs? This presents us with a bit of a problem.
2. We could try using a \t (tab) for the delimiter instead of a space, however cut only accepts a single character (\t is two characters). An alternative way of inserting a special character like tab is to type Ctrl-v then hit the tab key.
Ctrl-v + Tab
That would insert a literal tab character.
ls -al  |  cut -d"<Tab>" -f5,9

That makes the delimiter a tab (<Tab> stands for the literal tab typed as above). But, we still don't get what we want, so let's try squeezing multiple spaces into a single space in this particular output. Thus:
ls -la |  tr -s ' ' | cut -d' ' -f5,9
3. And hopefully that should now produce the output we're after. If it produces the output we're after on your system, then we're ready for lift-off. If it doesn't, then try the command again.
Now what happens if we want to swap the name with the size? I'll leave that as an exercise for you.
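
The squeeze-then-cut step can be checked on a single made-up listing line, taking the size and the name to be fields 5 and 9 as in the table above. The line below is illustrative and not real ls output.

```shell
# A made-up line in the format of "ls -al" output.
listing='-rw-r--r--   1 sam   users     4096 Mar 23 09:57 notes.txt'

# Squeeze runs of spaces, then pick the size (field 5) and the name (field 9).
size_and_name=$(printf '%s\n' "$listing" | tr -s ' ' | cut -d' ' -f5,9)

printf '%s\n' "$size_and_name"
```

This prints 4096 notes.txt. A file name that itself contains spaces would spill into field 10 and beyond, which is one more reason to treat this approach as a quick trick only.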



Example pipes
 line_count=`wc -l $filename | cut -c1-8`
 process_id=`ps -ef \
     | grep $process \
     | grep -v grep \
     | cut -f1 -d\ `
 upper_case=`echo $lower_case | tr '[a-z]' '[A-Z]'`
In all cases the pipeline has been used to set a variable to the value returned by the last command in the pipe. In the first example, the wc -l command counts the number of lines in the filename contained in the variable $filename. This text string is then piped to the cut command which snips off the first 8 characters and passes them on to stdout, hence setting the variable line_count.
In the second example, the pipeline has been folded using the backslash and we are searching for the process_id or PID of an existing command running somewhere on the system. The ps -ef command lists the whole process table from the machine. Piping this through to the grep command will filter out everything except any line containing our wanted process string. This will not return one line however, as the grep command itself also has the process string on its command line. So by passing the data through a second grep -v grep command, any lines containing the word grep are also filtered out. We now have just the one line we need and the last thing is to get the PID from the line. As luck would have it, the PID is the first thing on the line, so piping through a version of cut using the field option, we finally get the PID we are looking for. Note the field option delimiter character is an escaped tab character here. Always test the blank characters that UNIX commands return, they are not always what you would think they are.

Finding Files in Unix Filesystems

Finally, to show some of the flexibility of find, let's look at one example that is a bit more advanced. Suppose we were looking for all data files in the HP user home directory filesystems (which are named /u and /u2) that are over one million bytes long and were modified in the past 30 days. The command below, where the output of find is piped into a few other Unix commands for postprocessing, results in a mail message being sent to the issuer of the command, containing the desired information in a neat tabular form.
The full command is:

   find /u /u2 -type f -size +1000000c -mtime -30 -print | \
         xargs file | grep data$ | cut -d: -f1 | \
         xargs ls -aoq | cut -c16- | sort | mailx $LOGNAME

UNIX Is It For You Part II For The Programmer

Once you have mastered various UNIX programs, the power doesn't stop there. The UNIX shell lets you build sophisticated "pipelines" that send data from one program into another. As an example, let's find out the most common first name of all the users on a UNIX machine. In a single command, you can get a list of all user names from the file /etc/passwd, extract the first names, sort them, count adjacent identical names, sort the resulting numbers, and then find the largest:
Command: cut -d: -f5 /etc/passwd  \
    | cut -d' ' -f1  \
    | sort   \
    | uniq -c  \
    | sort -nr  \
    | head -1

 Response: 12 John
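
The same pipeline can be tried on a few made-up passwd entries instead of the real /etc/passwd. The three accounts below are invented for the illustration.

```shell
# Made-up passwd entries; field 5 holds the full name.
passwd_sample='jsmith:x:1001:100:John Smith:/home/jsmith:/bin/sh
jdoe:x:1002:100:John Doe:/home/jdoe:/bin/sh
mjones:x:1003:100:Mary Jones:/home/mjones:/bin/sh'

# Full name, then first name, then count and rank the names.
# The final awk only strips the padding that uniq -c adds.
top_name=$(printf '%s\n' "$passwd_sample" \
    | cut -d: -f5 \
    | cut -d' ' -f1 \
    | sort \
    | uniq -c \
    | sort -nr \
    | head -n 1 \
    | awk '{ print $1, $2 }')

printf '%s\n' "$top_name"
```

For this sample the answer is 2 John.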
