Parsing the command line with getopts

Standard *nix command-line utilities take zero or more one-letter options (given separately or combined after a single dash), some of which take a single required argument, followed by a list of arguments (often filenames).  For example:

cmd -ab -c c_arg file1 file2 file3

To allow the list of arguments to start with a dash, a special end-of-options marker, “--”, is commonly used.

Scripts often accept options and arguments too, so a script commonly needs to parse its command line to find out which options and arguments were supplied.  In the past you had to write your own parsing code; a standard utility now makes this easier for the most common cases.

For unusual cases, such as options mixed with arguments or options that take more than one argument, you still need to write your own code.  The common technique is to use one loop to find and process all the options, setting a variable (often called a flag) for each.  This is followed by another loop that processes the remaining arguments one at a time.  (Sometimes a series of if statements sits between the two loops, to make sure the options given make sense together and don’t represent an error.)

This sort of script is almost always similar to this:

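opt_A=false; opt_B=false; opt_C=false; opt_C_arg=
while test $# -gt 0; do
  case "$1" in
    -a|--longa) opt_A=true ;;
    -b|--longb) opt_B=true ;;
    -c|--longc) test $# -ge 2 || { echo "$0: $1 requires an argument" >&2; exit 1; }
                opt_C=true
                opt_C_arg="$2"; shift ;;   # consume the option-argument too
    --longc=*)  opt_C=true
                opt_C_arg="${1#--longc=}" ;;
    --) shift; break ;;                    # explicit end of options
    -*) echo "$0: unknown option: $1" >&2; exit 1 ;;
    *)  break ;;                           # first non-option: end of options
  esac
  shift
done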

for arg  # for remaining non-option arguments
do
  printf 'processing %s\n' "$arg"   # body of script to process $arg goes here
done
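The “series of if statements” that sometimes sits between the two loops can be sketched as a small function.  This is only an illustration: the function name validate and the rule that -a and -b conflict are hypothetical, and the flags are assumed to hold true or false, as set by the option loop above.

```shell
# Sketch: consistency checks run after the option loop and before the
# argument loop. validate and the "-a conflicts with -b" rule are hypothetical.
opt_A=${opt_A:-false}   # flags normally set by the option loop
opt_B=${opt_B:-false}

validate() {
  # "$@" holds the remaining (non-option) arguments
  if $opt_A && $opt_B; then
    echo "$0: -a and -b cannot be used together" >&2
    return 1
  fi
  if test $# -eq 0; then
    echo "$0: no file arguments given" >&2
    return 1
  fi
}
```

A script would call it between the two loops as validate "$@" || exit 1.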

The problem with this hand-written loop is that the user may combine options, so “$1” may be “-ab”.  Also, the space between an option and its argument may be missing (e.g., “-c10” instead of “-c 10”).  You might not care about this for a one-off script that you don’t plan to reuse or share, but otherwise you need more complex code to handle these cases.
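As a rough sketch of that extra code, the loop below splits combined flags such as -ab into -a -b, and an attached argument such as -c10 into -c 10, by rewriting "$@" and going around the loop again.  It assumes the same hypothetical options as above (-a and -b are flags, -c takes an argument) and is wrapped in a hypothetical function, parse_opts, so it can be tried on any argument list; that is also why it uses return instead of exit.

```shell
# Sketch: a hand-written option loop that also accepts combined flags (-ab)
# and an attached option-argument (-c10). Options -a, -b, -c are hypothetical.
parse_opts() {
  opt_A=false; opt_B=false; opt_C_arg=
  while test $# -gt 0; do
    case "$1" in
      -c?*)                        # attached argument: -c10 -> -c 10
        arg=${1#-c}
        shift
        set -- -c "$arg" "$@"
        continue ;;
      -[ab]?*)                     # combined flags: -ab -> -a -b
        rest=${1#-?}               # letters after the first one
        first=${1%"$rest"}         # the dash plus the first letter
        shift
        set -- "$first" "-$rest" "$@"
        continue ;;
      -a) opt_A=true ;;
      -b) opt_B=true ;;
      -c) test $# -ge 2 || { echo "-c requires an argument" >&2; return 1; }
          opt_C_arg=$2; shift ;;
      --) shift; break ;;
      -*) echo "unknown option: $1" >&2; return 1 ;;
      *)  break ;;
    esac
    shift
  done
  remaining="$*"                   # what the argument loop would process
}
```

For example, parse_opts -abc10 file1 file2 sets opt_A and opt_B to true and opt_C_arg to 10, and leaves “file1 file2” for the argument loop.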

Writing your own parsing code may also be necessary for unusual syntax, such as options that take more than one argument, or options that can appear mixed with non-option arguments on the command line.  The standard tools don’t handle these cases.

POSIX requires a built-in shell utility called getopts.  It must be built into the shell because it sets the shell variables OPTARG and OPTIND (and the variable named by its name operand); a separate program could not change the calling shell’s variables.  POSIX getopts parses only short (one-letter) options.

Its synopsis is:

getopts optstring name [arg...]

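The set of option letters your script accepts is given by optstring.  Each time getopts is invoked, it stores the next option letter (without the leading dash) in the shell variable named by the name operand, and the index of the next argument to be processed in the shell variable OPTIND.  (The shell initializes OPTIND to 1.)  By default getopts parses the positional parameters ("$@"); if arg operands are given, it parses those instead.  Because each call returns just one option, getopts is normally used as the condition of a while loop.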
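To watch name and OPTIND change, the sketch below passes getopts an explicit argument list (the optional arg operands) instead of the script’s own "$@".  The optstring ab and the variable names opt and trace are only for illustration.

```shell
# Sketch: record the option letter and OPTIND after each getopts call.
OPTIND=1                        # (re)start parsing at the first argument
trace=
while getopts ab opt -a -b file; do
  trace="$trace $opt:$OPTIND"   # option letter, then index of the next argument
done
printf '%s\n' "$trace"          # prints " a:2 b:3"
```

After the loop OPTIND is 3, the index of file, the first argument that is not an option.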

When an option requires an option-argument, follow that option letter with a colon (“:”) in optstring.  getopts then stores the option-argument in OPTARG, whether it was given as a separate word (“-c 10”) or attached to the option (“-c10”).  If no option was found, or if the option that was found does not have an option-argument, OPTARG is unset.
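A minimal sketch of both spellings, using a hypothetical function get_c_arg and a hypothetical option -c that takes an argument.  OPTIND is reset to 1 at the start of each call so parsing always begins at the first argument.

```shell
# Sketch: getopts accepts an option-argument either as the next word
# (-c 10) or attached to the option letter (-c10). get_c_arg is hypothetical.
get_c_arg() {
  OPTIND=1                    # reset so every call starts afresh
  c_arg=
  while getopts c: opt "$@"; do
    case $opt in
      c) c_arg=$OPTARG ;;     # the option-argument for -c
    esac
  done
}

get_c_arg -c 10
printf '%s\n' "$c_arg"        # prints 10
get_c_arg -c10
printf '%s\n' "$c_arg"        # prints 10
```

Note that for get_c_arg -c -x, getopts takes “-x” as the argument of -c, because -c requires one, even though it begins with a dash.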

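Any of the following marks the end of options: a “--” argument, an argument that does not begin with a ‘-’ (and is not an option-argument), or an error.  At the end of options getopts returns a non-zero (false) exit status and sets OPTIND to the index of the first remaining operand, or to $# + 1 if there are none.  A “--” counts as an option here, so it is skipped; shifting by OPTIND - 1 therefore leaves only the operands.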
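A short sketch of that final shift, using a hypothetical function show_operands and a hypothetical flag -v.  The second call shows that a “--” is consumed with the options, so an operand that looks like an option survives.

```shell
# Sketch: after the loop, OPTIND indexes the first operand, so
# shift $((OPTIND - 1)) leaves only the operands. show_operands is hypothetical.
show_operands() {
  OPTIND=1
  while getopts v opt "$@"; do
    :                           # options are ignored in this sketch
  done
  shift $((OPTIND - 1))         # discard the options (and a "--", if any)
  operands="$*"
}

show_operands -v file1 file2    # operands: "file1 file2"
show_operands -v -- -v file3    # operands: "-v file3"; "--" ended the options
```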

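If the first character of optstring is a colon, getopts does not print diagnostics for missing option-arguments or unknown option letters.  Instead it reports them through the variables: for an unknown option, name is set to “?” and OPTARG to the offending letter; for a missing option-argument, name is set to “:” and OPTARG to the option letter.  Without the leading colon, getopts prints a message itself, sets name to “?”, and unsets OPTARG.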
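A sketch of this “silent” mode, using a hypothetical function check_opts that records the problem in a variable msg rather than printing it.  The options -c (takes an argument) and -v are hypothetical; note that “?” must be escaped in the case pattern so that it matches only a literal question mark.

```shell
# Sketch: with a leading ":" in optstring, getopts stays silent and reports
# problems through the name variable and OPTARG. check_opts is hypothetical.
check_opts() {
  OPTIND=1
  msg=
  while getopts :c:v opt "$@"; do
    case $opt in
      c|v) ;;                                  # valid options: nothing to do here
      \?) msg="unknown option -$OPTARG" ;;     # OPTARG holds the bad letter
      :)  msg="option -$OPTARG needs an argument" ;;
    esac
  done
}

check_opts -x    # msg="unknown option -x"
check_opts -c    # msg="option -c needs an argument"
```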

The following example (adapted from the Single UNIX Specification getopts page) parses the command line, setting shell variables for each option (and option-argument) found, then prints the remaining arguments.  (Normally those are processed in another loop!)

mflag=
mval=
hflag=
iflag=

while getopts m:hi name   # with no arg operands, parses "$@"
do
  case $name in
    h) hflag=1 ;;
    i) iflag=1 ;;
    m) mflag=1
       mval="$OPTARG" ;;
    *) printf 'Usage: %s: [-hi] [-m value] args\n' "$0" >&2
       exit 2 ;;
  esac
done

if [ "$hflag" ]; then printf 'Option -h specified\n'; fi
if [ "$iflag" ]; then printf 'Option -i specified\n'; fi
if [ "$mflag" ]; then
  printf 'Option -m "%s" specified\n' "$mval"
fi

shift $((OPTIND - 1))   # discard the options getopts has processed
printf 'Remaining arguments are: %s\n' "$*"

GNU getopt

The GNU utility getopt supports both long and short option forms, optional option-arguments, and other extensions, but it is not part of POSIX.  Still, it is available on most Linux systems and is worth learning.  getopt rewrites the command-line options and arguments into a standard form, which can then be parsed with a simple loop.  You use getopt with eval and set to replace the positional parameters with the rewritten list.  The use is as follows:

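opts=$(getopt -n foo -o +m:ih -- "$@") || exit 1   # getopt has already reported the error
eval set -- "$opts"     # replace "$@" with the rewritten, quoted list

while test "$1" != "--"
do
  case "$1" in
    -h) H_OPT=$((H_OPT + 1)) ;;   # count how many times each option was given
    -i) I_OPT=$((I_OPT + 1)) ;;
    -m) M_OPT=$((M_OPT + 1)); M_ARG="$2"; shift ;;
    *)  echo "***Illegal option: $1" >&2; exit 1 ;;
  esac
  shift   # delete "$1"
done
shift     # delete the "--"

echo "remaining args are: $*"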

-n name tells getopt to use name when reporting illegal options.  Use “-o +shortOpts” to specify the one-letter options.  (The leading “+” makes getopt stop at the first non-option argument, as POSIX requires, instead of rearranging options and arguments.)  Options that have a required argument are followed by a colon; those with an optional argument are followed by two colons.  Note that an optional argument must be attached to its option (“-cfoo”, or “--color=foo” for a long option); if a space separates them, getopt treats the next word as a non-option argument rather than as the option’s argument.  To make getopt recognize long options, use “-l longOptList” (e.g., “-l foo,bar”), with one or two colons after each name as for shortOpts.  These are followed by “--” and the actual arguments to be parsed (usually “-- "$@"”).
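A sketch combining long options with a required and an optional argument.  It assumes the enhanced getopt found on Linux systems (running getopt -T exits with status 4 for that version); the options -h/--help, -m/--mode and -c/--color and the function name parse are hypothetical.  When an optional argument is omitted, getopt passes an empty word in its place, which the loop replaces with a default.

```shell
# Sketch assuming the enhanced (Linux) getopt. The options -h/--help,
# -m/--mode (required argument) and -c/--color (optional argument) are
# hypothetical, as is the function name parse.
parse() {
  opts=$(getopt -n foo -o +hm:c:: -l help,mode:,color:: -- "$@") || return 1
  eval set -- "$opts"
  HELP=; MODE=; COLOR=
  while test "$1" != "--"; do
    case "$1" in
      -h|--help)  HELP=1 ;;
      -m|--mode)  MODE=$2; shift ;;
      -c|--color) COLOR=${2:-auto}; shift ;;  # '' means the argument was omitted
    esac
    shift
  done
  shift                                       # drop the "--"
  REST="$*"
}
```

For example, parse --help --mode=fast --color file1 sets HELP to 1, MODE to fast, COLOR to auto (no argument was attached), and REST to file1, while parse -cnever x sets COLOR to never.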

Solaris also has an older, non-GNU getopt command.  If you have a Solaris shell script that uses it, you can use /usr/lib/getoptcvt to read the script and convert it to use the POSIX standard shell built-in getopts instead.

Passing only some arguments to another program

Sometimes your shell script must parse its arguments and pass only some of them on to a shell function, script, or another utility.  The problem is that if you collect the ones you want into a single string and later expand or eval it, the shell parses them a second time: spaces, quotes, dollar signs, pound signs, and so on may mess up the resulting command line.

You can use a trick in this case:

for arg   # the word list is the original "$@", fixed when the loop starts
do
  if test "$arg" ...; then   # should $arg be passed on? (your test goes here)
    set -- "$@" "$arg"       # yes: append it to the end
  fi
  shift                      # done with the original copy at the front
done
do-something-interesting "$@"
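Here is the same trick with a concrete selection test filled in: every -v is dropped and the other arguments are passed on.  The names filter_and_pass and show_args are hypothetical; show_args just prints each argument it receives in brackets, so you can see that spaces, dollar signs, and pound signs arrive untouched.

```shell
# Sketch: drop every "-v" and pass the remaining arguments on unchanged.
# filter_and_pass and show_args are hypothetical names.
show_args() {
  printf '[%s]\n' "$@"       # one bracketed line per argument received
}

filter_and_pass() {
  for arg                     # iterates over the original arguments
  do
    if test "$arg" != "-v"; then
      set -- "$@" "$arg"      # keep it: append to the end
    fi
    shift                     # remove the original copy from the front
  done
  show_args "$@"
}

filter_and_pass -v 'two words' '$HOME' -v '#not a comment'
```

Running it prints [two words], [$HOME], and [#not a comment] on three separate lines.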

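Many commands have trouble with unusual file names, such as “ARGC=0” (awk treats an operand of the form name=value as a variable assignment), “-” (usually taken to mean standard input), any name starting with “-” (taken as an option), or sometimes “foo:bar” (taken as host:path by commands such as scp).  In a production-quality (robust) script, you can pre-process your script’s command-line arguments to turn every filename into a pathname that starts with “/” or “./”.  This prevents most such problems: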

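  for i
  do
    case $i in
      (/*) ;;            # already an absolute pathname
      (*) i=./$i ;;      # make relative names start with ./
    esac
    set -- "$@" "$i"     # append the (possibly rewritten) name
    shift                # drop the original from the front
  done
  awk '...' "$@"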