Whether reading input from a file (a script) or from the keyboard, the processing steps are the same. A few extra steps are performed by most modern shells when run interactively: displaying prompts, history expansion, and line editing; these features don't apply to scripts. Also, the shell that is run first when a user logs in is called a login shell. A login shell runs a login script (other shells don't) before printing a shell prompt. Finally, some shells have a restricted mode, which disables a number of features. The idea is to provide only a restricted shell for guest users and for system accounts that need to run shell scripts.
Most shells accept command line arguments that change their behavior to a strictly POSIX compliant mode (or to restricted, login, or interactive mode).
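For example, here is one way to request these modes; the flag spellings below are Bash's, and other shells may differ:

bash --posix   # stricter, POSIX-compliant behavior
bash -r        # restricted mode
bash -l        # act as a login shell (read the login scripts)
bash -i        # force an interactive shell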
POSIX shells process input by following the POSIX Shell Command Language Introduction steps. Modern shells such as Bash, Korn shell, Z shell, etc., add several extra steps to support their extra features. The shell does a number of distinct chores, in this order:

First the shell reads its input: from a file (a script), from the string given with the -c command option, or from stdin (standard input). Note that if the first line of a file of shell commands starts with the characters #!, the results are unspecified (because some other utility is then responsible for reading the file).
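A quick illustration of the three input sources (the script name here is made up):

sh myscript.sh           # commands read from a file
sh -c 'echo hello'       # commands read from the -c string
echo 'echo hello' | sh   # commands read from stdin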
Next the shell breaks the input into tokens: words and operators. The following characters are special to the shell (meta-characters):

| & ; < > ( ) $ ` \ " ' <space> <tab> <newline>

The following characters need to be quoted under certain circumstances:

* ? [ # ~ = %
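For instance, an unquoted meta-character changes the meaning of a command entirely:

echo Hello & World      # '&' ends the command: runs "echo Hello" in the background,
                        # then tries to run a command named "World"
echo 'Hello & World'    # quoted: prints Hello & World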
Ignoring quoting for the moment, the tokenizing can be explained this way:
The shell starts reading input one character at a time until a token is recognized.
The special characters listed above are tokens by themselves.
Words are just word tokens; however, at the start of processing a command some words are recognized as keywords, such as if, while, fi, and so on. These same words are not keywords if seen elsewhere on the command line, or if quoted. This explains why you need a semicolon or newline in front of then and fi in this example:

if test -r foo; then ...; fi
Word tokens are separated by white space. (The white-space separators are not themselves tokens.) Finally there is the maximum munch rule, used to resolve the ambiguous case when some sequence of characters may be interpreted as either a single token or as two tokens. Consider these examples:
& &                  2 tokens
&&                   1 token
date&&cal            3 tokens
echo hello 2> foo    4 tokens: echo, hello, 2>, foo
echo hello 2>foo     the same 4 tokens
echo hello2>foo      4 tokens: echo, hello2, >, foo
Quoted characters are always word tokens, never operators or keywords. The various quoting mechanisms are the escape character, single-quotes, and double-quotes: 'x', "x" (with '$', '\', and '`' still special inside), and \x (when x is a meta-character); the last is sometimes called escaping.
From the SUS:
The backslash shall retain its special meaning as an escape character
[inside double-quotes] only when followed by one of the following
[5] characters [...]:
$ ` " \ newline
(Note '!' acts weirdly (history expansion) in Bash inside of double quotes at the interactive command line, but not in a shell script, since history is turned off there.)
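A small sketch of the rule quoted above (printf is used instead of echo to avoid echo's varying treatment of backslashes):

printf '%s\n' "a\$b"    # prints a$b  -- the backslash escaped the $ and was removed
printf '%s\n' "a\xb"    # prints a\xb -- 'x' is not one of the five, so the \ is kept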
Another part of tokenizing is line joining. If a line ends with an (unquoted) backslash, the \newline is skipped and the next line is joined to the current one. The \newline doesn't separate tokens! (Line joining applies inside double-quotes too.)
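For example, these two input lines form a single command:

echo hel\
lo    # prints: hello -- the \newline vanishes, so "hel" and "lo" become one token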
The end of a command is marked by a control operator token, one of:
& && ( ) ; ;; newline | || (and EOF)
These are assigned a precedence, to resolve the ambiguity of (say):
false && echo one ; echo two # What's the output?
(You can use command grouping to make this do what you want.)
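In the example above, ';' binds less tightly than '&&', so 'echo two' always runs. A short sketch of grouping to change that:

false && echo one ; echo two         # prints: two
false && { echo one ; echo two; }    # prints nothing (the whole group is skipped)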
Next the shell parses the tokens into commands: simple commands (a sequence of words terminated by a ';', a newline, etc.), pipelines ('|'), AND-OR lists ('&&' or '||'), lists (';' or '&'), compound commands (if statement, case statement, etc.), and function definitions. (Technically speaking, there are simple commands, function definitions, and everything else is a compound command.)
Note that only simple commands can be preceded by variable assignments. These are put into the environment of that command only. So if FOO=one then:

FOO=two echo $FOO                        # prints "one"!
FOO=two sh -c 'echo $FOO'                # prints "two"!
FOO=two eval echo \$FOO                  # prints "two"!
FOO=two for i in 1 2; do echo $i; done   # error!
FOO=two (echo $foo)                      # error!
(FOO=two env) | grep FOO                 # prints "FOO=two"!
(FOO=two w | env) | grep FOO             # prints "FOO=one"!
Some simple commands are shell built-in commands, but these are in no way different from other, non-built-in utilities (the term isn't even used in the standard). Any utility may be built-in (test, echo, and printf are common examples) or not. But some other simple commands are called special built-in commands, which must be built in. There are two things that make special built-ins different from other utilities: the shell exits when a special built-in encounters certain (syntax) errors, and variable assignments preceding a special built-in persist after the built-in completes. (Using the command command with a special built-in suppresses both of those.)
The special built-ins are: break, colon (:), continue, dot (.), eval, exec, exit, export, readonly, return, set, shift, times, trap, and unset.
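A short sketch of the two special properties described above; the persistence rule is easiest to see in a strictly POSIX-conformant shell (e.g. dash, or bash --posix), since Bash relaxes it by default:

FOO=bar :            # ':' is a special built-in, so the assignment persists
echo "$FOO"          # prints: bar
FOO=baz command :    # 'command' suppresses the special treatment
echo "$FOO"          # still prints: bar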
At this point the shell will process each simple command separately, in order. If the shell hasn't read in enough input to find the end of the command, it goes back to the preceding (tokenizing) step and reads in more input. For each simple command the following is done:
Note that field splitting is never done on
variable assignments!
As long as the name=value
is recognized as a single
word in step 2, any expansions done on the value will result in a single word.
Consider:
foo=*          # no quotes needed
foo='x date'
bar=$foo       # no quotes needed
The order of steps c and d may be reversed when processing special built-ins.
What is the output of the following?
x=y echo $x
x=y x=z sh -c 'echo $x'
x=y : | x=z echo $x
x=y : | x=z sh -c 'echo $x'
env x=y echo $x
(The one exception is the special parameter @, which behaves differently within double-quotes.)
The expansions are done in this order (and are discussed in detail later):
Tilde expansion changes ~username to the absolute pathname of the home directory for username. (If the word is a bare tilde (~) it expands to the absolute pathname of the current user's home directory.)

Parameter expansion replaces parameters (words beginning with $) with the results of a lookup in the environment. (Note if no such parameter is found this expands to nothing. Also note that words of the form $(stuff) are treated specially.)

Command substitution handles $(embedded command line) and `embedded command line` by recursively processing and running the embedded command, and replacing the embedded command line with the standard output of the command.

Arithmetic expansion evaluates and replaces words of the form $((expression)).
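A quick one-liner touching all four expansions; the exact output depends on your account and system (and assumes USER is set):

echo ~ "$USER" "$(uname)" "$((6 * 7))"
# e.g.: /home/alice alice Linux 42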
FILES="file1 file2" ls $FILES # results in: ls file1 file2
But since tokenizing was already done,
will be one word and
cause an error!
So the expanded tokens (which are called fields) need to be
split into separate words.
(Demo: file1 file2
IFS=; FILES='foo bar'; ls $FILES
)
Running IFS=: echo $PATH doesn't work as you might expect; PATH is expanded before the new IFS setting takes effect (in echo's environment). You can use eval for this:

IFS=: eval echo \$PATH
Field splitting is controlled by the parameter IFS. If it is set to null (i.e. IFS="") no field splitting is done. Otherwise the shell treats each character of IFS as a white space character (or delimiter). The results of unquoted expansions are split into fields (words) separated by runs of such white space. Any leading or trailing white space is skipped as well. If IFS is unset then the default delimiters are <space>, <tab>, and <newline>. For example, the input:

<newline><space><tab>foo<tab><tab>bar<space>

yields two fields, foo and bar.
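A small demo of IFS-controlled splitting in a POSIX shell (run it in a throwaway shell, since it changes IFS for the whole session):

IFS=:
set -- $PATH     # the unquoted expansion is split on ':'
echo "$#"        # prints the number of directories in PATH
echo "$1"        # prints the first one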
Brace Expansions:
While not part of POSIX, most modern shells also do brace expansion.
Brace expansion generates arbitrary strings, not filenames.
There are two forms.
A word with the form pre{list}post will expand into one word for each item in the list. For example:

vi file{1,2,3,}.txt

will expand to the line:

vi file1.txt file2.txt file3.txt file.txt

Note pre may not be $. The list must include at least one unquoted comma.

The second form uses a range instead of a list between the braces:

vi file{1..10}.txt

A range can be integers or characters (e.g. ls /dev/sd{a..g}).
Depending on the shell, you may or may not be able to use shell special characters in the list without quoting them; it depends on when the shell does the brace expansion. Bash does brace expansion between steps 4a and 4b; Zsh does it after step 4e (and field splitting isn't done!); and ksh does it between steps 5 and 6.
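One way to observe the ordering (a sketch; behavior varies by shell and version):

i=3
echo {1..$i}    # Bash prints the literal {1..3}: braces were expanded before $i was known
                # a shell that brace-expands after parameter expansion prints: 1 2 3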
Ksh supports more elaborate forms as well. Also, Zsh only permits a range of numbers.
Pathname expansion (filename globbing) is done next (unless set -f is in effect). Note that unlike the other expansions, this can produce multiple words. Also, this is the only expansion done after field splitting. So if "*.txt" expands out to several filenames, each name is one word even if it contains spaces.
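A sketch showing that a glob match stays one word even with embedded spaces (run in an empty scratch directory; the filename is made up):

touch 'a b.txt'
set -- *.txt
echo "$#"        # prints 1: "a b.txt" matched as a single word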
Finally the shell runs the command. Any arguments are available as the positional parameters numbered 1 to n, and the name of the command (or the name of the script) as the positional parameter numbered 0. The environment for the new command is initialized from the shell's current environment, modified by any I/O redirections and variable assignments made. When the command finishes, the shell collects its exit status and sets "$?" to that value.
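A compact sketch of both points (the name myname is arbitrary):

sh -c 'echo "$0 $1 $#"' myname one    # prints: myname one 1
false; echo "$?"                      # prints: 1 (the exit status of false)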
Note the history mechanism works at some point (via the readline library on Linux), but is not part of POSIX. Since some characters have special meaning to readline (such as '^' and '!'), these may appear to be meta-characters sometimes and not meta-characters at other times. It depends on the shell in use and its history configuration, whether readline is used, and your ~/.inputrc file (which is used to configure readline). Apparently readline knows about single-quote and backslash quoting, but doesn't recognize double-quotes.