Monday, March 22, 2010

Quoting with the Bourne Shell


The first problem shell programmers experience is quotation marks. The standard keyboard has three quotation marks. Each one has a different purpose, and only two are used in quoting strings. Why quote at all, and what do I mean by quoting? Well, the shell understands many special characters, called meta-characters. These each have a purpose, and there are so many, beginners often suffer from meta-itis. Example: The dollar sign is a meta-character, and tells the shell the next word is a variable. If you wanted to use the dollar sign as a regular character, how can you tell the shell the dollar sign does not indicate a variable? Answer: the dollar sign must be quoted. Why? Quoted characters do not have a special meaning. Let me repeat this with emphasis.
Quoted characters do not have a special meaning
A surprising number of characters have special meanings. The lowly space, often forgotten in many books, is an extremely important meta-character. Consider the following:
rm -i file1 file2
The shell breaks this line up into four words. The first word is the command, or program to execute. The next three words are passed to the program as three arguments. The word "-i" is an argument, just like "file1." The shell treats arguments and options the same, and does not know the difference between them. In other words, it is the program, not the shell, that treats arguments starting with a hyphen as special. The shell doesn't much care, except that it follows the convention. In this case, rm looks at the first argument, realizes it is an option because it starts with a hyphen, and treats the next two arguments as filenames. The program then starts to delete the files specified, but first asks the user for permission because of the "-i" option. The use of the hyphen to indicate an option is a convention. There is no reason you can't write a program to use another character. You could use a forward slash, like DOS does, to indicate an option, but then your program would not be able to distinguish between an option and a path to a filename whose first character is a slash.
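One way to watch the shell split a line into words is to count the arguments a command receives. The following is only a sketch: count_args is a hypothetical helper function I made up for this purpose, not a standard command.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many
# arguments the shell passed to it.
count_args() {
    echo "$#"
}

# The shell splits on spaces, so this passes three arguments.
# The shell cannot tell that "-i" is an option; that is up to the program.
count_args -i file1 file2

# Extra spaces between words do not create extra arguments.
count_args   -i    file1     file2
```

Both calls should report three arguments, because the number of spaces between words makes no difference to the shell.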
Can a file have a space in the name? Absolutely. This is UNIX. There are few limitations in filenames. As far as the operating system is concerned, you can't have a filename contain a slash or a null. The shell is a different story, and one I don't plan to discuss.
Normally, a space delineates arguments. To include a space in a filename, you must quote it. Another verb used in the UNIX documentation is "escape"; this typically refers to a single character. You "quote" a string, but "escape" a meta-character. In both cases, all special characters are treated as regular characters.
Assume, for a moment, you had a file named "file1 file2." This is one file, with a space between the "1" and the "f." If this file is to be deleted, one way to quote the space is
rm 'file1 file2'
There are other ways to do the same. Most people consider the quotation mark as something you place at the beginning and end of the string. A more accurate description of the quoting process is a switch, or toggle. The following variations are all equivalent:
rm 'file1 file2'
rm file1' 'file2
rm f'ile1 file'2
In other words, when reading a shell script, you must remember the current "quoting state." The quote toggles the state. Therefore if you see a quotation mark in the middle of a line, it may be turning the toggle on or off. You must start from the beginning, and group the quotation marks in pairs.
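The toggle behavior is easy to check by printing each argument a command receives. This is a minimal sketch; show_args is a hypothetical helper function, used here in place of rm so nothing gets deleted.

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints each argument
# it receives on its own line, wrapped in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Each of these passes the single argument "file1 file2".
show_args 'file1 file2'
show_args file1' 'file2
show_args f'ile1 file'2

# Without quotes, the space separates two arguments.
show_args file1 file2
```

The first three calls should each print one bracketed line, and the last call should print two.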
There are two other forms of quoting. The second uses a backslash, "\", which only acts to "escape" the next character. The double quotation mark is similar to the single quotes used above, but weaker. I'll explain strong and weak quotation later on. Here is the earlier example, this time using the other forms of quoting:
rm "file1 file2"
rm file1\ file2
rm file1" "file2

Nested quotations

A very confusing problem is placing quotation marks within quotation marks. It can be done, but it is not always consistent or logical. Quoting a double quote is perhaps the simplest, and does what you expect:
echo '"'
echo "\""
echo \"
The backslash is different. Look at the three variations:
echo '\'
echo "\\"
echo \\

As you can see, single quotes and double quotes behave differently. A double quote is weaker, and does not quote a backslash. Single quotes are different again. You can escape them with a backslash, or quote them with double quotes:
echo \'
echo "'"
The following does not work:
echo '\''

It is identical to
echo '
Both examples start a quoting operation, but do not end the action. In other words, the quoting function will stay toggled, and will continue until another single quote is found. If none is found, the shell will read the rest of the script, until an end of file is found.
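The backslash and single quote cases in this section can be checked by storing each form in a variable and comparing the results. This is a sketch only; the variable names are arbitrary, and I use printf instead of echo because some versions of echo interpret backslashes themselves.

```shell
#!/bin/sh
# Three ways to produce one backslash.
a='\'      # strong quotes: the backslash is a plain character
b="\\"     # weak quotes: the first backslash escapes the second
c=\\       # no quotes: the first backslash escapes the second
printf '%s\n' "$a" "$b" "$c"

# Two ways to produce one single quote.
d=\'       # escaped with a backslash
e="'"      # quoted with double quotes
printf '%s\n' "$d" "$e"
```

All three backslash forms hold the same one-character value, and so do both single quote forms.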

Strong versus weak quoting

Earlier I described single quotes as strong quoting, and double quotes as weak quoting. What is the difference? Strong quoting prevents characters from having special meanings, so if you put a character inside single quotes, what you see is what you get. Therefore, if you are not sure if a character is a special character or not, use strong quotation marks.
Weak quotation marks treat most characters as plain characters, but allow certain characters (or rather meta-characters) to have a special meaning. As the earlier example illustrates, the backslash within double quotation marks is a special meta-character. It indicates that the next character is not special, so it can be used before a backslash or a double quotation mark to escape its special meaning. There are two other meta-characters that are allowed inside double quotation marks: the dollar sign, and the back quote.
Dollar signs indicate a variable. One important variable is "HOME," which specifies your home, or starting directory. The following examples illustrate the difference:
$ echo '$HOME'
$HOME
$ echo '\$HOME'
\$HOME
$ echo "$HOME"
/home/barnett
$ echo "\$HOME"
$HOME
The back quote does command substitution. The string between back quotes is executed, and the result replaces the back-quoted string:
$ echo 'The current directory is `pwd`'
The current directory is `pwd`
$ echo 'The current directory is \`pwd\`'
The current directory is \`pwd\`
$ echo "The current directory is `pwd`"
The current directory is /home/barnett
$ echo "The current directory is \`pwd\`"
The current directory is `pwd`
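The same comparison can be made without depending on your home directory. In this sketch, GREETING is a hypothetical variable and `echo world` stands in for pwd, so the results are the same on any machine.

```shell
#!/bin/sh
# GREETING is a hypothetical variable used only for this example.
GREETING=hello

strong='$GREETING and `echo world`'      # strong quotes: nothing is expanded
weak="$GREETING and `echo world`"        # weak quotes: both are expanded
escaped="\$GREETING and \`echo world\`"  # backslashes keep both literal

printf '%s\n' "$strong" "$weak" "$escaped"
```

Only the weak-quoted line has its variable and command replaced. The escaped form inside weak quotes ends up identical to the strong-quoted one.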

Quoting over several lines

There is a large difference between the C shell and the Bourne shell when a quote spans more than one line. The C shell is best suited for interactive sessions. Because of this, it assumes a quote ends with the end of a line, if a second quote character is not found. The Bourne shell makes no assumptions, and only stops quoting when you specify a second quotation mark. If you are using this shell interactively, and type a quotation mark, the normal prompt changes, indicating you are inside a quote. This confused me the first time it happened. The following Bourne shell example illustrates this:
$ echo 'Don't do this'
> ls
> pwd
> '
Dont do this
ls
pwd
$
This is a minor inconvenience if you use the shell interactively, but a large benefit when writing shell scripts that contain multiple lines of quoted text. I used the C shell for my first scripts, but I soon realized how awkward the C shell was when I included a multi-line awk script inside the C shell script. The Bourne shell's handling of awk scripts was much easier:

#!/bin/sh
# Print a warning if any disk is more
# than 95% full.
/usr/ucb/df | tr -d '%' | awk '
# only look at lines where the first field contains a "/"
$1 ~ /\// { if ($5 > 95) {
  printf("Warning, disk %s is %4.2f%% full\n",$6,$5);
 }
}'


Mixing quotation marks

Having two types of quotation marks simplifies many problems, as long as you remember how meta-characters behave. You will find that the easiest way to escape a quotation mark is to use the other form of quotation marks.
echo "Don't forget!"
echo 'Warning! Missing keyword: "end"'

Quotes within quotes - take two

Earlier I showed how to include a quote within quotes of the same kind. As you recall, you cannot place a single quote within a string terminated by single quotes. The easiest solution is to use the other type of quotation marks. But there are times when this is not possible. There is a way to do this, but it is not obvious to many people, especially those with a lot of experience in computer languages. Most languages, you see, use special characters at the beginning and end of the string, and have an escape to insert special characters in the middle of the string. The quotation marks in the Bourne shell are not used to define a string. They are used to disable or enable interpretation of meta-characters. You should understand that the following are equivalent:
echo abcd
echo 'abcd'
echo ab'c'd
echo a"b"cd
echo 'a'"b"'c'"d"
The last example protects each of the four letters from special interpretation, and switches between strong and weak quotation marks for each letter. Letters do not need to be quoted, but I wanted a simple example. If I wanted to include a single quote in the middle of a string delineated by single quote marks, I'd switch to the other form of quotes when that particular character is encountered. That is, I'd use the form
'string1'"string2"'string3'
where string2 is a single quote character. Here is the real example:
$ echo 'Strong quotes use '"'"' and weak quotes use "'
Strong quotes use ' and weak quotes use "
It is confusing, but if you start at the beginning, and follow through, you will see how it works.
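To follow it through, it may help to see a shorter string built the same way. This is a minimal sketch; the variable names are arbitrary, and the second form uses the backslash escape from earlier instead of weak quotes.

```shell
#!/bin/sh
# Three adjacent pieces, with no space between them:
#   'It'      strong quotes
#   "'"       weak quotes around one single quote
#   's here'  strong quotes again
msg='It'"'"'s here'

# The same idea, but the lone single quote is escaped with a
# backslash while the quoting is toggled off.
alt='It'\''s here'

# The same value written with weak quotes only, for comparison.
plain="It's here"

printf '%s\n' "$msg" "$alt" "$plain"
```

All three variables should hold the same text, which shows that the quotation marks only switch interpretation on and off and are not part of the string.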

Placing variables within strings

Changing the quoting mid-stream is also very useful when you are inserting a variable in the middle of a string. You could use weak quotes:
echo "My home directory is $HOME, and my account is $USER"
You will find that this form is also useful:

echo 'My home directory is '$HOME', and my account is '$USER
When you write your first multi-line awk or sed script, and discover you want to pass the value of a variable to the middle of the script, the second form solves this problem easily.
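Here is a sketch of that second form applied to awk. LIMIT and the over_limit function are hypothetical names, and the input lines are made-up sample data, not real df output.

```shell
#!/bin/sh
# LIMIT is a hypothetical shell variable holding the threshold.
LIMIT=95

# over_limit is a hypothetical helper that reads "name percent"
# lines on standard input. The awk script is in strong quotes
# except for $LIMIT, where the quoting is switched off so the
# shell can substitute the value. $1 and $2 stay inside the
# quotes, so awk sees them, not the shell.
over_limit() {
    awk '$2 > '$LIMIT' { print $1 " is over the limit" }'
}

# Made-up sample input.
printf '%s\n' '/usr 97' '/home 40' '/var 99' | over_limit
```

With this sample input, only the /usr and /var lines exceed the limit. If the value of the variable might contain spaces, put it in weak quotes at the point where the strong quotes are switched off, as in '...'"$LIMIT"'...'.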
