Bash Command Line Processing Outline

Bash follows the processing order outlined in the introduction to the POSIX Shell Command Language, but adds several steps to support its own extra features.  The shell performs a number of distinct chores, in this order:

  1. (Input)  The shell reads its input from a file, from the string given to the -c command option, or from stdin (standard input).  Note that if the first line of a file of shell commands starts with the characters "#!", POSIX leaves the results unspecified (because some other utility may then be responsible for reading the file).
  2. (Tokenizing)  The shell breaks the input into tokens: words and operators, separated by spaces (or by certain meta-characters).  This is a complex step!  One important part is quoting, which removes the special meaning of meta-characters or words to the shell.  Quoting can be used to preserve the literal meaning of the following special characters (and to prevent reserved words from being recognized as such):
    
      | & ; <  > (  ) $ ` \ " ' <space> <tab> <newline>
    

    The following characters need to be quoted under certain circumstances:

    
      *  ?  [  #  ~  =  %
    

    The quoting mechanisms are: the escape character (backslash), which quotes the single character that follows it (this is sometimes called escaping); single-quotes (e.g., 'x'), which quote all enclosed characters; and double-quotes (e.g., "x"), which quote all enclosed characters except '$', '`', and '\' (and the backslash keeps its special meaning only when it is followed by '$', '`', '"', '\', or a newline).  Note that in an interactive Bash session '!' (history expansion) is still acted on inside double-quotes, but not in a shell script (where the history mechanism is not enabled by default).

    Another part of tokenizing is line joining.  If the current line ends with an unquoted "\newline" (a backslash followed by a newline), these two characters are skipped and the next line is joined to the current one.

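    Comment removal is the last part of tokenizing the input: a word beginning with an unquoted '#', and everything after it up to the end of the line, is discarded.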

  3. (Parsing / reserved word recognition)  After tokenizing, the words are examined to see which (if any) are unquoted reserved words.  The shell uses this information when parsing the input into simple commands, pipelines, grouped command lists, and compound commands (such as if statements).
  4. (Expansions)  The shell performs several types of expansions on different parts of each command, resulting in a list of fields (or words), some to be treated as a command and the rest as arguments and/or pathnames (the command's parameter list).  Most expansions that occur within a single word expand to a single field.  (Only field splitting and pathname expansion can create multiple fields from a single word.)  The single exception to this rule is the expansion of the special parameter '@' within double-quotes.

    The expansions are done in this order:

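    1. Alias expansion is done if the command name word of a simple command is an unquoted, valid alias name.  (Note that if an alias is expanded, processing starts again on the substituted text.  Strictly speaking, the shell substitutes aliases while it is tokenizing the input, before any of the other expansions listed here.)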
    2. Tilde expansion.  (Expands words of the form "~username" to the absolute pathname of the home directory of username.  A bare tilde ("~") expands to the value of HOME, normally the absolute pathname of the current user's home directory.)
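    3. Parameter expansion.  (Expands words of the form "$name" or "${name}" to the value of the named parameter, which may be a shell variable, an environment variable, or a positional or special parameter.  Note that if no such parameter is set, the word expands to nothing, unless the set option -u is in effect, in which case it is an error.  Also note that words of the form "$(stuff)" are not parameter expansions; they are command substitutions, described next.)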
    4. Command substitution.  (Expands "$(embedded command line)" and "`embedded command line`" by recursively processing and running the embedded command in a subshell, and replacing the substitution with the standard output of that command, minus any trailing newlines.)
    5. Arithmetic expansion.  (Expands words of the form "$((expression))" to the result of evaluating the expression using integer arithmetic.)
  5. Field splitting is performed on the fields generated by the previous step.  The shell treats each character of IFS as a delimiter and splits the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double-quotes into fields (words).  If the value of IFS is <space><tab><newline> (collectively called whitespace), or if IFS is unset, then any whitespace at the beginning or end of the input is skipped and any sequence of whitespace characters within the input delimits a field.  For example, the input:
       <newline><space><tab>foo<tab><tab>bar<space>
    

    yields two fields, foo and bar.

    If the value of IFS is null then no field splitting is performed.

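  6. Pathname (or wildcard) expansion is done next (unless the set option -f is in effect).  Each word containing an unquoted '*', '?', or '[' is treated as a pattern and replaced with the sorted list of matching pathnames; by default, a pattern that matches nothing is left unchanged.
  7. Quote removal is performed next: every unquoted backslash, single-quote, and double-quote character that was part of the original word (rather than the result of an expansion) is removed.  (If the complete expansion for a word results in an empty field, that empty field is deleted from the expanded command unless the original word contained single-quote or double-quote characters.)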
  8. I/O redirection is performed.  Then any redirection operators and their operands are removed from the parameter list.
  9. The shell is (finally!) ready to execute the command (which may be a function, built-in, executable file, or script).  It sets up the execution environment first, assigning the arguments to the positional parameters numbered 1 to n, and the name of the command (or the name of the script) to the positional parameter numbered 0.  The environment for the new command is initialized from the shell's current environment, modified by any I/O redirections and variable assignments made for that command.
  10. After starting the command, the shell optionally waits for it to complete (it does not wait if the command was started in the background) and collects the exit status, setting the special parameter "?" (referenced as "$?") to that value.
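
The expansion, field splitting, pathname expansion, and quote removal steps (4 through 7) can be observed in a short script.  This is only a minimal sketch: count_args is a hypothetical helper introduced here that prints how many fields it receives, and the temporary directory created by mktemp -d is assumed to have a pathname free of whitespace and wildcard characters.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of fields it receives.
count_args() { echo "$#"; }

var='  foo   bar  '
list='a:b:c'
empty=''

# Step 5: an unquoted expansion is split on IFS whitespace into two fields.
unquoted=$(count_args $var)

# Double quotes suppress field splitting, so a single field results.
quoted=$(count_args "$var")

# With IFS set to a colon, the colon is the delimiter.
# (The assignment happens inside the command substitution's subshell.)
colon=$(IFS=:; count_args $list)

# A null IFS disables field splitting entirely.
nosplit=$(IFS=; count_args $var)

# Step 7: an unquoted empty expansion is deleted; a quoted one survives
# as an empty field.
dropped=$(count_args $empty)
kept=$(count_args "$empty")

# Step 6: pathname expansion runs after parameter expansion,
# unless set -f is in effect.
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"
pattern="$tmp/*.txt"
globbed=$(count_args $pattern)
noglob=$(set -f; count_args $pattern)
rm -rf "$tmp"

echo "$unquoted $quoted $colon $nosplit $dropped $kept $globbed $noglob"
```

Under the assumptions above, the script should print "2 1 3 1 0 1 2 1".  Wrapping each experiment in a command substitution keeps changes to IFS and to the -f option from leaking into the rest of the script.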

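The last three steps (redirection, setting up the environment, and collecting the exit status) can be sketched the same way.  The names GREETING and myname are hypothetical, chosen only for illustration, and each child command is run with sh -c so that it gets its own positional parameters and environment.

```shell
#!/bin/bash
# Step 9: an assignment placed before a command name modifies only that
# command's environment, not the current shell's.
unset GREETING
child=$(GREETING=hello sh -c 'echo "$GREETING"')
parent=${GREETING-unset}

# Step 9: with sh -c, the first argument after the command string becomes $0
# and the remaining arguments become $1, $2, and so on.
params=$(sh -c 'echo "$0 $1 $2 $#"' myname one two)

# Step 8: the redirection is performed and then removed from the argument
# list, so the child never sees "2>/dev/null" as an argument.
argc=$(sh -c 'echo "$#"' myname one 2>/dev/null two)

# Step 10: the exit status of a foreground command is available in $?.
status=0
sh -c 'exit 3' || status=$?

# A background command is not waited for until "wait" is called;
# wait then returns that command's exit status.
bg_status=0
sh -c 'exit 5' &
wait "$!" || bg_status=$?

echo "$child $parent | $params | $argc | $status $bg_status"
```

This should print "hello unset | myname one two 2 | 2 | 3 5".  The "|| status=$?" form is used instead of a bare "$?" on the next line only so that the sketch keeps running if the -e option happens to be set.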
 

Note that the history mechanism is not part of POSIX.  In Bash, history expansion is performed immediately after a complete line is read, before the shell breaks it into words, and interactive shells usually read their lines through the readline library.  Since some characters have special meaning to the history mechanism (such as '^' and '!'), these may appear to be meta-characters sometimes and not at other times.  It depends on the shell in use and its history configuration, on whether readline is used, and on the ~/.inputrc file that configures readline.  Apparently history expansion honors single-quote and backslash quoting but does not recognize double-quotes.
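
The difference between a script and an interactive session can be checked with a small sketch.  It assumes it is run as a script (a non-interactive shell), where history expansion is off by default; the variable names are hypothetical.  It also contrasts the three quoting mechanisms from the tokenizing step.

```shell
#!/bin/bash
name=world

# Single quotes and backslash keep "$" literal; double quotes do not.
single='$name'
double="$name"
escaped=\$name

# In a non-interactive shell an unescaped "!" inside double quotes is an
# ordinary character, because history expansion is not enabled.
bang="Hello, $name!"

# "$-" lists the current option flags; "H" is present when history
# expansion (set -H) is turned on.
case $- in
  *H*) histexpand=on ;;
  *)   histexpand=off ;;
esac

echo "$single | $double | $escaped | $bang | histexpand=$histexpand"
```

Run as a script, this should print "$name | world | $name | Hello, world! | histexpand=off".  Typing the bang assignment at an interactive Bash prompt with history expansion enabled may behave differently, which is the inconsistency the note above describes.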