Email Infrastructure

 

Written by Wayne Pollock, Tampa Florida USA, ©2006–2016

Table Of Contents

  1. Overview
  2. Headers, Body, and Envelope
  3. Email Address Format
  4. Components of a Mail System
  5. How Mail Works on the Internet
  6. Protocols
  7. Mailing Lists
  8. Configuring an MDA

Overview:

Before email (short for electronic mail, or e-mail) when dinosaurs walked the Earth, it was difficult to avoid “phone tag”.  To leave someone a message, you needed to physically travel to their office and leave a slip of paper.  (There were no cell phones in those days, and even answering machines were rare.)  At some point it was realized that (at some organizations anyway) people had computer terminals in their offices, and that it was possible to leave a message in a file at a given location.  When a user returned to their office they could check if that file existed and contained any new messages.

To make this easier, a simple program was used.  To send email to someone, the original mail program could be used this way:  “mail user”.  Any text you typed thereafter would be appended to that user's mailbox file, sometimes called the inbox.  (The standard location back then was /usr/spool/mail/username.)  When done entering the message, you would indicate EOF by hitting Control+D.  (You could also pipe into this command.)

To read your mail (if any), a user would just type “mail”.  This would dump the contents of that user's mailbox file to the screen.  Over time this command became more sophisticated: it automatically added a header with the sender's username and the time the message was sent.  This scheme allowed the mailbox to hold multiple messages, since the header separated one from the next.  That file is called a mail folder (or mailbox), but it is a single text file containing one message after another.

The newer mail command also displayed one message at a time and allowed options for the user to save the message, delete the message, and even to reply to it (that is, send a message back to the person who sent you a message).  The reply message usually has the same “subject” as the original, with “Re:” prepended.  Sometimes the original message is quoted in the new message body as well.

Email continues to be very popular.  The marketing research firm Radicati reports over 215 billion email messages are sent every day (in 2016).  That comes to over 2 million email messages sent or received every second!  (Sadly, other estimates indicate that over 90% of that is spam or other forms of malware, such as viruses.)

The invention of email is sometimes credited to V.A. Shiva Ayyadurai in 1978, while he was a high school student in New Jersey.  However, this isn't true at all; email was in use in the early 1970s.  Apparently, he wrote a program he called “EMAIL”, and the press got confused.

Headers, Body, and Envelope:

An email message contains three parts: the envelope which identifies the sender and recipients, the headers, and the message body.  The headers and body together are referred to as the message.

It is important to understand that only the recipients listed on the envelope will receive an email message; mail servers never look at any headers to determine this!  Also, the envelope is not part of the message that gets saved in a user's mailbox.  So once some email message is delivered to you, you can't tell who else was listed on the envelope.

The special header mentioned above that identifies the start of an email message in a mailbox looks like this:

From sender date

(Note the space and no colon after “From ”.  This special “From ” header is part of the mailbox format common to most systems, known as MBOX or Berkeley mailbox format, and is not a standard email header at all.  See RFC-4155 and the mbox(5) man page for details.)  This “From ” header is generated automatically from the envelope's sender address.  Even if a “From ” header was provided, it is overwritten by most mail servers with the real sender.  This header is always the first one of an email message.  When a mail program is reading a mailbox, an email message begins with this header and continues until the next “From ” header, or until the end of the file.

What happens if the message body contains a line starting with “From ”?  The mail program will typically insert a “>” character in front of that line (giving “>From ”), a technique often called From-quoting or From-munging.  Some mailbox variants remove the extra character again when displaying messages; others leave it in place.
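The way a mail program finds message boundaries in an MBOX file can be sketched in a few lines of Python.  This is only an illustration, not how any particular mail program is written: the function name split_mbox and the sample messages are made up, and it assumes any body line that begins with “From ” has already been escaped as described above.

```python
def split_mbox(text):
    """Split the text of an MBOX file into individual messages.

    A new message starts at every line beginning with "From "
    (with the space, and no colon).
    """
    messages = []
    current = []
    for line in text.splitlines():
        if line.startswith("From ") and current:
            messages.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("\n".join(current))
    return messages


# Hypothetical mailbox holding two short messages.
SAMPLE_MBOX = (
    "From alice Mon Jan  4 10:00:00 2016\n"
    "Subject: hello\n"
    "\n"
    "Hi there.\n"
    "\n"
    "From bob Mon Jan  4 11:30:00 2016\n"
    "Subject: lunch\n"
    "\n"
    "Noon?\n"
)

messages = split_mbox(SAMPLE_MBOX)
print(len(messages), "messages found")
```

Note that the “From: ” header (with a colon) never matches, since the test looks for “From” followed by a space.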

Eventually other headers were allowed as well such as “Subject:”.  A problem is, how to tell the difference between the headers added by the mail program and the message entered by the sender?  The answer is to have all the headers at the beginning (hence the name), followed by one blank line, and that followed by the message entered by the user.  That part is known as the message body.
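The “headers, one blank line, then body” rule is easy to show in code.  The following Python sketch uses a made-up function name (split_message) and a made-up message; it deliberately ignores details such as headers that continue over several lines or that appear more than once.  Python's standard email package handles those cases properly.

```python
def split_message(raw):
    """Return (headers, body) for a simple email message.

    Everything before the first blank line is headers; everything
    after it is the message body.
    """
    head, _, body = raw.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers, body


# Hypothetical message: two headers, a blank line, then the body.
raw = (
    "Subject: Meeting\n"
    "To: bob@example.com\n"
    "\n"
    "See you at 10:00.\n"
    "Bring notes.\n"
)

headers, body = split_message(raw)
print(headers["Subject"])
print(body)
```

Because only the first blank line matters, the body itself may contain blank lines and even lines that look like headers.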

Recall the “To:” header does not determine to whom the email gets delivered.  The addresses passed to the mail server (the envelope addresses) and not the ones listed in any mail headers determine who receives the email.  Since the various headers in the message do not determine who receives the message, they may be faked easily.

There are many standard headers that can be used, such as:  Subject:, To:, From:, Cc: (carbon copy), Date:, etc.  The carbon copy list works the same way as the “To:” list; it is just more recipients to add to the envelope.  The only difference is that being listed on the To: list somehow confers more status than being listed on the Cc: list.  Note that the other recipients never see a Bcc: header: a “Blind carbon copy” recipient is listed on the envelope but not in any headers they receive.  (If the MUA uses a Bcc: header while composing, it is normally removed before the message is delivered to the To: and Cc: recipients.)  See RFC-2076 and RFC-4021 for a description of standard email headers.
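One way to picture the relationship between headers and envelope is to look at what an MUA might do when you click “Send”.  The Python sketch below is an assumption about how a client could work, not a description of any real one; the function name envelope_recipients and all addresses are invented.  It collects the To: and Cc: addresses, then adds the blind-copy addresses to the envelope only.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def envelope_recipients(msg, bcc=()):
    """Build the envelope recipient list for a message.

    To: and Cc: addresses are copied from the headers; blind-copy
    addresses are added to the envelope but never to the message.
    """
    visible = msg.get_all("To", []) + msg.get_all("Cc", [])
    pairs = getaddresses([str(value) for value in visible])
    return [addr for _name, addr in pairs if addr] + list(bcc)


# Hypothetical message with one To:, one Cc:, and one blind copy.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "Bob <bob@example.com>"
msg["Cc"] = "carol@example.com"
msg["Subject"] = "Status report"
msg.set_content("All is well.\n")

envelope = envelope_recipients(msg, bcc=["dave@example.com"])
print(envelope)
```

The mail server is handed the envelope list and the message separately, which is why the blind-copy address appears nowhere in the text the recipients see.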

Email Address Format:

Modern email addresses look like this:

username@hostname

The hostname is a host or computer on a network.  The “@hostname” part is optional; if it is missing, localhost (that is, the current system) is usually assumed.  The hostname should be a valid DNS host name, or an IP address enclosed in square brackets (e.g., “hymie@[192.0.2.3]”).  It can also be another name defined in DNS by an MX record.  A common example is to use only an organization's domain name and not the name of any particular host; for example “user@example.com” and not “user@mailserver.example.com”.
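Splitting an address into its two parts can be sketched as follows.  The function name parse_address is made up for this example, and the sketch does no validation at all; real address syntax has many more rules.

```python
def parse_address(address, default_host="localhost"):
    """Split an email address into (username, hostname).

    If there is no "@hostname" part, the local system is assumed.
    """
    username, at_sign, hostname = address.rpartition("@")
    if not at_sign:
        return address, default_host
    return username, hostname


print(parse_address("user@example.com"))
print(parse_address("hymie@[192.0.2.3]"))
print(parse_address("webmaster"))
```

The split is made at the last “@” in the address, so the hostname part is always whatever follows it.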

The username should be a valid account on that system (or a defined alias such as “webmaster”). 

The credit for the invention of this form of email address goes to Raymond Samuel Tomlinson, while working on an extension to the localhost-only email program, SNDMSG.  The result was the first email program that could send messages to a user on another host.  Tomlinson wrote this for the early ARPANET.

Components of a Mail System:

A number of programs play a role in composing, reading, and delivering email:

MTA (Mail Transfer Agent, also called a Mail Transport Agent) —  Examples include Sendmail, Postfix, Exim, and Exchange.  The MTA is the software that accepts email from an MUA (see below) and then routes and forwards the email (several hops if necessary) to the destination MTA.  The destination MTA must also handle security issues such as rejecting email or sending a redirection message back, as well as alias expansion, forwarding, relaying, etc.  An MTA that accepts mail destined for other MTAs is relaying email.  An MTA that does this without requiring sender authentication is called an open mail relay.  Spammers love these!

MDA (Mail Delivery Agent) —  An example is procmail.  The MDA handles final delivery issues such as virus scanning, spam filtering, return-receipt handling, automatic mail processing (by piping into some program), forwarding email to users and groups, sorting email into different mailboxes, etc.  The most common action is to simply append to the user's inbox.  (Note some software such as Exchange or Sendmail includes both an MTA and MDA, and possibly additional software.)

The MDA must be configured with the location of a user's inbox.  The standard location is /var/mail/username, but this can be changed (how depends on which MDA you use).  Of course the MUA must also know that pathname, if it accesses the mailbox directly.  (If not, the MAA must be configured with that pathname, and the MUA must be configured to use the MAA.)  The MAIL environment variable is often used to tell an MUA this location.
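As an illustration of how an MUA that reads the mailbox directly might locate it, here is a small Python sketch.  The function name inbox_path and the sample pathnames are invented, and real MUAs have their own configuration settings as well; the sketch only shows the idea of preferring the MAIL environment variable and otherwise falling back to the standard location.

```python
import getpass
import os


def inbox_path(environ=None, username=None):
    """Guess the pathname of a user's inbox.

    Use the MAIL environment variable if it is set; otherwise
    assume the standard location, /var/mail/username.
    """
    environ = os.environ if environ is None else environ
    if environ.get("MAIL"):
        return environ["MAIL"]
    return "/var/mail/" + (username or getpass.getuser())


# Made-up settings, passed in explicitly so the result is predictable.
print(inbox_path({"MAIL": "/home/alice/Mailbox"}, "alice"))
print(inbox_path({}, "alice"))
```

Calling inbox_path() with no arguments uses the real environment and the current user's name instead.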

MAA (Mail Access Agent) —  Examples include Courier and Cyrus.  When email is sent to a user, the MDA stores it on the server's hard disks.  Users rarely have login access to that mail server!  (YborStudent is an exception.)  Consider AOL mail, Yahoo mail, Gmail, or Hotmail.  Your mailboxes are stored on those remote servers.  Somehow you need to access those remote mailboxes with your local mail reading software (your MUA).  An MAA server is used in addition to the MTA to provide this remote access.  The user's MUA authenticates the user to the MAA which then downloads a copy of the user's mailbox (or selected messages only) to the MUA, where the user can then read, save, copy, print, or delete messages.  (Remember on some systems, notably Exchange server, the MTA, MDA, and even MAA are a part of a single program.)

MUA (Mail User Agent) —  Examples include alpine (was pine), mutt, Eudora, Outlook, Thunderbird, HotMail.com, mail, and mailx (or nail).  The MUA (also called an email client) is the software that allows you to compose, send, and read your email.  An MUA must be configured with the MTA to use to send mail and the MAA to use to fetch the mail.  (Older MUAs such as mailx can't use an MAA; they just read the mailbox file directly.)  In addition some MTAs and MAAs require usernames and passwords for authentication and may also require additional security configuration.

Some MUAs such as mail.yahoo.com, hotmail.com, and mail.google.com (or Gmail) are not installed on your local computer, but on a web server that users access with a web browser.  Even though they are installed on a server they are still just MUAs or email clients.

MSA (Mail Submission Agent) —  In the past, MUAs sent email directly to the outgoing MTA.  However, MUAs don’t always create complete, legal emails.  In between the MUA and the outgoing MTA can sit an MSA (RFC-6409).  The MSA (usually just another MTA, configured a bit differently) cleans up the email, adding/fixing any missing/broken headers, rewriting hostnames with the organization’s DNS name, adding automatic signatures (legal notices), scanning for viruses or any email that violates the organization’s policies, and finally, sends it to the outgoing MTA.

MFA (Mail Filtering Agent) —  This is not an official acronym; your instructor made it up.  (And it won’t be on a test.)  With about 90% of email currently rejected as spam or as containing viruses, a commonly used mail architecture today is to use an MFA to screen out and drop such email before the MTA and MDA must process it.  In this case the MDA won't also need to scan for viruses or spam (although it can, perhaps using a more sophisticated filter to catch spam that makes it through the bulk filter used in the MFA).  Typically, such a filtering agent is really a stripped-down MTA that does all sorts of checks, and possibly handles return-receipts (delivery notices).  A typical MFA would be Amavisd-new.

HCC email filters screen for (and reject) viruses, obvious spam, and email with “zip” attachments.  Gmail filters screen for attached Windows executables.  (HCC used to use an MFA product from Sophos; now they use a service provided by Microsoft.)

How Mail Works on the Internet:

First you (let's call you “User A”) start up your MUA, compose a message, add some headers (such as Subject:), and state to whom the email should be sent (let's say “User B”).  When you are finished (and click on “Send”), the MUA (usually) adds some additional standard headers and then sends the mail message to your MSA, which scans and possibly modifies the email before sending it to the outgoing MTA.  The mail gets routed from MTA to MTA (nowadays very few hops are needed), with the MTAs in the middle relaying the mail to the next MTA along the path from the source to the destination.  (Each MTA that receives an email will add a “Received:” header to it.  By looking at these headers you can see the path a message took.)
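Reading the “Received:” headers to reconstruct a message's path can be sketched with Python's standard email package.  The function name delivery_path, the host names, and the sample message are all invented for illustration.  Since each MTA adds its header at the top of the message, the sketch reverses the list to show the hops in the order they happened.

```python
from email import message_from_string


def delivery_path(raw):
    """Return the Received: headers of a message, oldest hop first."""
    msg = message_from_string(raw)
    hops = msg.get_all("Received", [])
    # Collapse any line folding or extra spaces inside each header.
    return [" ".join(hop.split()) for hop in reversed(hops)]


# Hypothetical message that passed through two MTAs.
raw = (
    "Received: from mx.example.org by mail.example.net;"
    " Mon, 4 Jan 2016 10:00:05 -0500\n"
    "Received: from msa.example.org by mx.example.org;"
    " Mon, 4 Jan 2016 10:00:02 -0500\n"
    "From: a@example.org\n"
    "To: b@example.net\n"
    "Subject: test\n"
    "\n"
    "Hi\n"
)

for hop in delivery_path(raw):
    print(hop)
```

Keep in mind that these are ordinary headers, so any that were added before the message reached a server you trust could have been faked.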

The mail arrives and is accepted by the MTA at the destination (possibly filtered with an MFA first).  That MTA gives the mail to the MDA, which may filter the mail, forward it, sort it to different mail folders, ..., and finally deliver the mail to the user's inbox.

Note: when sending email to many users on the same system, each user is listed in a separate RCPT TO: command (these are envelope addresses, not headers), but the actual email is sent only once to that (remote) system.

The recipient is somehow notified of the arrived email, and eventually reads it.  The recipient uses their MUA to talk with their MAA.  The process looks like this:

[Diagram: flow of email from sender to recipient]

(Note the diagram shows the optional MSA and MFA.)  The MTAs work in a store-and-forward manner.  An MTA receives an email message and stores it on a (local) disk.  Then the message is relayed to another MTA or handed to the local MDA for delivery.  Various problems can prevent a message from being delivered.  A message returned to the sender for this reason is known as a bounced message.

A mail store is where email is stored on disk, in one or more mailboxes (or mail folders).  The MDA stores email there, and the MAA (or sometimes the recipient’s MUA) fetches email from there.  The original (and still common) mailbox format is known as MBOX, which has one file for each mailbox or mail folder.  Newer formats include Maildir and Maildir++.  These (and other) formats use one folder (directory) per mailbox, with each message in a separate file.  (Microsoft Outlook uses proprietary file formats such as “PST”.)  Email may also be stored in databases.
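The difference between the two styles of mail store can be seen with Python's standard mailbox module, which can write both MBOX and Maildir.  The sketch below is only a demonstration under made-up names (store_both_ways, inbox.mbox, the sample messages); it stores the same two messages both ways in a temporary directory and then looks at what ended up on disk.

```python
import mailbox
import os
import tempfile


def store_both_ways(messages, base_dir):
    """Store the same messages in an MBOX file and in a Maildir."""
    mbox_path = os.path.join(base_dir, "inbox.mbox")
    maildir_path = os.path.join(base_dir, "Maildir")

    box = mailbox.mbox(mbox_path, create=True)
    for message in messages:
        box.add(message)
    box.close()  # writes any pending changes to the single file

    maildir = mailbox.Maildir(maildir_path, create=True)
    for message in messages:
        maildir.add(message)  # each message becomes its own file
    maildir.close()
    return mbox_path, maildir_path


# Two hypothetical messages.
SAMPLE = [
    "From: a@example.org\nSubject: one\n\nFirst message\n",
    "From: b@example.org\nSubject: two\n\nSecond message\n",
]

with tempfile.TemporaryDirectory() as tmp:
    mbox_path, maildir_path = store_both_ways(SAMPLE, tmp)
    mbox_is_file = os.path.isfile(mbox_path)
    with open(mbox_path) as f:
        separators = sum(1 for line in f if line.startswith("From "))
    maildir_files = len(os.listdir(os.path.join(maildir_path, "new")))

print(mbox_is_file, separators, maildir_files)
```

The MBOX version is one file containing both messages, each introduced by its own “From ” line, while the Maildir version is a directory in which newly delivered messages appear as separate files (in a subdirectory named new).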

Whenever your email is stored on a server, it is subject to some email policy (such as quota limits).  You can view HCC's email policy at www.hccfl.edu/oit/email-storage-policy.aspx.

Mailer-daemon is the usual name an MTA or MDA uses when it generates error messages to return to the sender.  Common causes include: a bad username (the destination MTA will bounce the email, which means return it to the sender with an explanation of what happened); a bad hostname (the sender's MTA will bounce it); and a destination MTA that is down (the sender's MTA, or more precisely the MTA immediately upstream of the destination, will retry for a while and then send a warning; if the destination server never responds, that MTA will eventually give up and bounce the email).

biff is a mail notification program named after a BSD developer's dog.  (It's not true that Biff used to bark at the mailman, that's just a myth.)  Use the arguments “y” or “n” to enable or disable these notifications.  biff is annoying but if you have a GUI then xbiff is useful.  For TUI (or CLI) use the mail notification feature of the shell (most shells have such a feature), which only notifies you between running commands and not in the middle of some task the way biff does.

Protocols:

The different components in the mail system must communicate with each other, passing the mail messages and other information.  The rules of communication and the definition of message formats are called protocols.  There are a number of standard protocols defined so that different vendors' software can interoperate easily (as well as the different components of an email system).  The most important protocols are:

SMTP (Simple Mail Transfer Protocol) —  ESMTP is the modern enhanced version.  (A good mnemonic for SMTP is “Send Mail To People”.)  MUAs use this protocol to talk with MTAs or MSAs; MTAs use it to talk to each other.  Interestingly, this protocol is text-based, so it can be used interactively by humans!  ESMTP is defined by RFC-5321, and the format of the messages it carries by RFC-5322.  (These were originally defined by RFC-821 and RFC-822, which were replaced by RFC-2821 and RFC-2822.  It is still common to talk about “822” email.)
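To give a feel for the protocol, the Python sketch below lists the lines a client would send during a minimal SMTP session.  It is only a sketch: the function name smtp_client_lines and all host names and addresses are invented, the server's replies are left out, and on the wire each line would end with a carriage return and line feed.  (Python's standard smtplib module carries out the real exchange.)

```python
def smtp_client_lines(helo_host, sender, recipients, message):
    """List the lines a client sends in a minimal SMTP session."""
    lines = ["HELO " + helo_host, "MAIL FROM:<" + sender + ">"]
    # The envelope: one RCPT TO command per recipient.
    lines += ["RCPT TO:<" + rcpt + ">" for rcpt in recipients]
    # The message (headers, blank line, body) is sent only once.
    lines.append("DATA")
    for line in message.splitlines():
        # A line starting with "." gets an extra "." so it cannot
        # be mistaken for the end-of-message marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")  # a lone "." ends the message
    lines.append("QUIT")
    return lines


# Hypothetical message for two users on the same remote system.
session = smtp_client_lines(
    "client.example.org",
    "alice@example.org",
    ["bob@example.net", "carol@example.net"],
    "Subject: test\n\nHello to you both.\n",
)
print("\n".join(session))
```

Notice that the recipients appear only in the RCPT TO commands; nothing in the session requires them to match any To: or Cc: header inside the message.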

POP (Post Office Protocol) —  Sometimes called POP3 (since POP is such a popular acronym, adding the version number makes the name stand out), this protocol is used between an MUA and an MAA.  POP is popular with ISPs because it is simple and cheap to implement.  It allows you to send a username and password, then your entire mailbox is downloaded to your MUA.  The only option is whether to delete the mailbox contents from the server after downloading, or not.  POP is defined by RFC-1939.

POP3 assigns each mail message a unique ID, reported by the UIDL command, so the MUA can tell which messages have been downloaded already.  In addition some vendors have implemented an extension to POP3 called “XTND XMIT”, that allows clients to transmit outbound mail.  (Normally SMTP is used for that.)
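How an MUA might use these unique IDs can be sketched in Python.  The function name new_message_numbers and the sample IDs are made up; the sketch assumes each line of the server's UIDL listing has the form “message-number unique-id”, and that the MUA has saved the set of IDs it downloaded earlier.

```python
def new_message_numbers(uidl_lines, seen_ids):
    """Return the numbers of messages not yet downloaded.

    uidl_lines: lines of the form "message-number unique-id".
    seen_ids:   unique IDs remembered from earlier sessions.
    """
    fresh = []
    for line in uidl_lines:
        number, unique_id = line.split()
        if unique_id not in seen_ids:
            fresh.append(int(number))
    return fresh


# Hypothetical listing: the first message was fetched last time.
listing = ["1 uid-aaa111", "2 uid-bbb222", "3 uid-ccc333"]
already_seen = {"uid-aaa111"}

print(new_message_numbers(listing, already_seen))
```

Message numbers can change from one session to the next, which is why the MUA remembers the unique IDs rather than the numbers.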

IMAP (Internet Message Access Protocol) —  Like POP, IMAP is a protocol used between an MUA and an MAA.  IMAP is more powerful and flexible than POP but takes more resources, so ISPs rarely offer it.  IMAP allows for selective message downloading and deleting, downloading of headers only, multiple mailboxes, and more.  IMAP is defined by RFC-3501.

Some vendors use proprietary protocols to talk between their proprietary mail servers and MUAs.  Examples include Novell's “GroupWise”, IBM's “Lotus Notes”, and Microsoft's “Exchange”.  Some companies have reverse-engineered these protocols and claim to have compatible products, but it is not a good idea to rely on those.  An “email gateway” is an MTA that translates between standard email protocols and some proprietary system's protocols.

One point to note about all these protocols is that the usernames and passwords are sent in plain text.  Variations of all three (ESMTP, POPS and IMAPS) allow for encryption to protect usernames and passwords (and the contents of the email messages).  However few ISPs support these protocols since they require more resources (and hence are more expensive to implement).

Mailing Lists:

A list of e-mail addresses identified by a single name such as students@hccfl.edu is called a mailing list.  When an e-mail message is sent to the mailing list name it gets sent to all the addresses in the list.  Each list also has a special email address for configuring your use of the list (for example, add or remove yourself from the list) and another address for the (human) owner of the list.  A good resource for learning about mailing lists is “Understanding Mailing Lists” by Harley Hahn (the author of our textbook).
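The core idea of a mailing list, one name standing for many addresses, can be sketched in Python as follows.  The function name expand_recipients, the list contents, and the member addresses are all invented for illustration, and the sketch ignores everything else a real mailing list manager does (subscriptions, moderation, and so on).

```python
def expand_recipients(recipients, lists):
    """Replace each mailing list name with the addresses on that list.

    Addresses that are not list names are kept as they are, and no
    address is included more than once.
    """
    expanded = []
    for address in recipients:
        for member in lists.get(address.lower(), [address]):
            if member not in expanded:
                expanded.append(member)
    return expanded


# Hypothetical list with two members.
LISTS = {
    "students@hccfl.edu": ["ann@example.com", "raj@example.com"],
}

print(expand_recipients(["students@hccfl.edu", "ann@example.com"], LISTS))
```

The expanded addresses become the envelope recipients, so each list member receives a copy even though only the list name appears in the To: header.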

Configuring an MDA:

Users can configure the procmail MDA to manage mail logs, check for spam, filter email, automatically reply to some mail, etc.  See the man pages for procmailrc and procmailex.  The file ~wpollock/.procmailrc contains examples of MDA return-receipts and of spam filtering using regular expressions and SpamAssassin (www.SpamAssassin.org).
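The kind of decision an MDA makes when it sorts mail can be illustrated with a short Python sketch.  This is not procmail syntax and not how procmail is implemented; the function name choose_folder, the rules, and the folder names are all made up.  Each rule pairs a header and a regular expression with a folder, and the first rule that matches wins.

```python
import re
from email import message_from_string

# Hypothetical rules: (header name, regular expression, folder).
RULES = [
    ("Subject", r"\[SPAM\]", "spam"),
    ("To", r"students@hccfl\.edu", "lists/students"),
]


def choose_folder(raw, rules=RULES, default="inbox"):
    """Pick the folder for a message: the first matching rule wins."""
    msg = message_from_string(raw)
    for header, pattern, folder in rules:
        value = msg.get(header, "")
        if re.search(pattern, value, re.IGNORECASE):
            return folder
    return default


# Made-up messages to try the rules on.
print(choose_folder("Subject: [SPAM] Win big\n\nClick here\n"))
print(choose_folder("To: students@hccfl.edu\nSubject: Quiz\n\nFriday\n"))
print(choose_folder("Subject: Lunch?\n\nNoon\n"))
```

A message that matches no rule falls through to the default, which corresponds to the most common MDA action of simply appending to the user's inbox.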