Email Tutorial

 

Written by Wayne Pollock, Tampa Florida USA, ©2006–2016

Table Of Contents

  1. Overview
  2. Headers, Body, and Envelope
  3. Email Address Format
  4. How Mail Works on the Internet
  5. MIME
  6. Using the alpine command line MUA
  7. Email Issues
  8. Mailing Lists
  9. Configuring an MDA
  10. Public Key Encryption And Digital Signatures
  11. Summary

Overview:

Before email (short for electronic mail, or e-mail) when dinosaurs walked the Earth, it was difficult to avoid “phone tag”.  To leave someone a message, you needed to physically travel to their office and leave a slip of paper.  (There were no cell phones in those days, and even answering machines were rare.)  At some point it was realized that (at some organizations anyway) people had computer terminals in their offices, and that it was possible to leave a message in a file at a given location.  When a user returned to their office they could check if that file existed and contained any new messages.

To make this easier, a simple program was used.  To send email to someone, the original mail program could be used this way:  “mail user”.  Any text you typed thereafter would be appended to that user's mailbox file, sometimes called the inbox.  (The standard location back then was /usr/spool/mail/username.)  When done entering the message, you would indicate EOF by hitting Control+D.  (You could also pipe into this command.)

To read your mail (if any), you would just type “mail”.  This would dump the contents of your mailbox file to the screen.  Over time this command became more sophisticated: it automatically added a header with the sender's username and the time the message was sent.  This scheme allowed the mailbox to hold multiple messages, since the header separated one from the next.  That file is called a mail folder (or mailbox), but it is a single text file containing one message after another.

The newer mail command also displayed one message at a time and allowed options for the user to save the message, delete the message, and even to reply to it (that is, send a message back to the person who sent you a message).  The reply message usually has the same “subject” as the original, with “Re:” prepended.  Sometimes the original message is quoted in the new message body as well.

Email continues to be very popular.  The marketing research firm Radicati reports over 182 billion email messages were sent every day in 2013, over 215 billion in 2016, and over 290 billion in 2019.  That comes to over 3.3 million email messages sent or received every second!  (Sadly, other estimates indicate that over 90% of that is spam or carries malware such as viruses.)

The invention of email is sometimes credited to V.A. Shiva Ayyadurai in 1978, while he was a high school student in New Jersey.  However, this isn't true at all; email was in use in the early 1970s.  Apparently, he wrote a program he called “EMAIL”, and the press got confused.

Headers, Body, and Envelope:

An email message contains three parts: the envelope which identifies the sender and recipients, the headers, and the message body.  The headers and body together are referred to as the message.

It is important to understand that only the recipients listed on the envelope will receive an email message; mail servers never look at any headers to determine this!  Also, the envelope is not part of the message that gets saved in a user's mailbox.  So once some email message is delivered to you, you can't tell who else was listed on the envelope.

The special header mentioned above that identifies the start of an email message in a mailbox looks like this:

From sender date

(Note the space and no colon after “From ”.  This special “From ” header is part of the mailbox format common to most systems, known as MBOX or Berkeley mailbox format, and is not a standard email header at all.  See RFC-4155 and the mbox(5) man page for details.)  This “From ” header is generated automatically from the envelope from address.  Even if a “From ” header was provided, it is over-written by most mail servers with the real sender.  This header is always the first one of an email message.  When a mail program is reading a mailbox, an email message begins with this header and continues until the next “From ” header, or until the end of the file.

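What happens if the message body contains a line starting with “From ”?  A mail program writing to an MBOX file will typically insert a “>” in front of that line (so it reads “>From ”), a technique often called From-quoting.  MUAs that follow RFC-3676 insert an ASCII space instead, a technique called space-stuffing.  When displaying messages, a mail program that understands the convention removes the extra character.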
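The MBOX layout described above can be explored with Python's standard mailbox module.  The following is only a sketch: the addresses, subjects, and temporary file name are invented for illustration.  It stores two messages in one mailbox file and reads them back.  Note that this particular library marks a body line beginning with “From ” by putting a “>” in front of it.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(sender, recipient, subject, body):
    """Build a small message (all values here are made up for the demo)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def demo_mbox():
    """Write two messages to one mbox file; return its raw text and the subjects read back."""
    path = os.path.join(tempfile.mkdtemp(), "inbox")  # hypothetical mailbox file
    box = mailbox.mbox(path)  # the file is created if it does not exist
    box.add(make_message("alice@example.com", "bob@example.com",
                         "First", "Hello Bob.\n"))
    box.add(make_message("carol@example.com", "bob@example.com",
                         "Second", "From now on, meet at noon.\n"))
    box.close()  # flushes the changes to disk

    with open(path, encoding="ascii") as f:
        raw = f.read()
    reader = mailbox.mbox(path)
    subjects = [message["Subject"] for message in reader]
    reader.close()
    return raw, subjects


if __name__ == "__main__":
    raw, subjects = demo_mbox()
    print(raw)       # each message begins with a "From " separator line
    print(subjects)
```

Printing the raw file shows one “From ” separator line ahead of each message, with everything stored in a single text file.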

Eventually other headers were allowed as well such as “Subject:”.  A problem is, how to tell the difference between the headers added by the mail program and the message entered by the sender?  The answer is to have all the headers at the beginning (hence the name), followed by one blank line, and that followed by the message entered by the user.  That part is known as the message body.
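To make the “headers, blank line, body” rule concrete, here is a minimal Python sketch.  The sample message is invented, and the hand-written splitter is deliberately naive (it ignores header lines that continue onto a following line); Python's own email parser, also shown, handles the real rules.

```python
from email import message_from_string

# A made-up message: three headers, one blank line, then the body.
RAW = (
    "Subject: Lunch?\n"
    "To: bob@example.com\n"
    "From: alice@example.com\n"
    "\n"
    "Are you free at noon?\n"
    "Subject: this line is body text, not a header\n"
)


def split_message(raw):
    """Split at the first blank line; return (headers dict, body text)."""
    head, _separator, body = raw.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, _colon, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers, body


if __name__ == "__main__":
    headers, body = split_message(RAW)
    print(headers)
    print(body)
    # The standard library parser applies the same rule.
    parsed = message_from_string(RAW)
    print(parsed["Subject"])
```

Only the first blank line matters; a line in the body that happens to look like a header is just text.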

Recall the “To:” header does not determine to whom the email gets delivered.  The addresses passed to the mail server (the envelope addresses) and not the ones listed in any mail headers determine who receives the email.  Since the various headers in the message do not determine who receives the message, they may be faked easily.

There are many standard headers that can be used, such as:  Subject:, To:, From:, Cc: (carbon copy), Date:, etc.  The carbon copy list is the same as the “To:” list, just more recipients to add to the envelope.  The only difference is that somehow being listed on the To: list confers more status than being listed on the Cc: list.  Note that recipients normally never see a Bcc: header; a “Blind carbon copy” recipient is listed on the envelope but not in any headers of the delivered message.  (The standard does define a Bcc: field, but it is normally removed before the message is sent.)  See RFC-2076 and RFC-4021 for a description of standard email headers.
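The split between envelope and headers can be sketched in Python.  This is an illustration only, with invented addresses: it shows how a sending program might collect the envelope recipients from the To:, Cc:, and any blind-copy list the user filled in, and then drop the blind-copy information from the message itself.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def addresses_in(msg, header_name):
    """Return the bare addresses listed in one header field (may be empty)."""
    values = [str(v) for v in msg.get_all(header_name, [])]
    return [addr for _name, addr in getaddresses(values)]


def build_envelope(msg):
    """Return (sender, recipients) for the envelope and drop Bcc from the message."""
    recipients = []
    for header_name in ("To", "Cc", "Bcc"):
        recipients.extend(addresses_in(msg, header_name))
    senders = addresses_in(msg, "From")
    del msg["Bcc"]  # blind-copy recipients stay on the envelope only
    return senders[0], recipients


def example_message():
    """A made-up message with one recipient of each kind."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Cc"] = "carol@example.com"
    msg["Bcc"] = "dave@example.com"
    msg["Subject"] = "Meeting"
    msg.set_content("See you at noon.\n")
    return msg


if __name__ == "__main__":
    msg = example_message()
    sender, recipients = build_envelope(msg)
    print(sender, recipients)
    print(msg.as_string())  # no mention of dave@example.com
```

The two returned values are what a program hands to the mail server separately from the message text; for instance, they correspond to the first two arguments of smtplib.SMTP.sendmail() in Python.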

Email Signatures

You often need to add your name, title, contact information, and sometimes a legal notice, to some or to every email message you send.  This information is known as a signature block, signature line, sig block, or just a signature.  (This should not be confused with digital signatures, discussed below.)  Typing it in for each message can get tedious!  Many mail programs (MUAs) and mail servers (MTAs) have a feature where you can set a signature to be automatically appended to the body of all email messages.

There are rules of “netiquette” (network etiquette) for email signatures.  They should always begin with a line only containing two dashes and a space.  This signature separator is called sig dashes, signature cut line, or sig-marker.  (It is recognized automatically by most email programs, which can treat signatures specially when replying to a message and quoting the message body.)  The other rules are that the signature should be plain text, with no more than 4 lines; each line should be at most 80 columns long.
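Here is a small Python sketch of how a mail program might recognize the sig dashes.  The function name and the choice to look for the last separator line in the body are assumptions made for this example, not a rule taken from any particular MUA.

```python
SIG_SEPARATOR = "-- "  # two dashes followed by exactly one space


def split_signature(body):
    """Return (text, signature); signature is "" if no separator line is found."""
    lines = body.split("\n")
    # Search from the bottom so that only the last separator line counts.
    for index in range(len(lines) - 1, -1, -1):
        if lines[index] == SIG_SEPARATOR:
            text = "\n".join(lines[:index])
            signature = "\n".join(lines[index + 1:])
            return text, signature
    return body, ""


if __name__ == "__main__":
    body = "Hello,\nsee you at noon.\n-- \nAlice\nalice@example.com"
    text, signature = split_signature(body)
    print(text)
    print(signature)
```

Note that a line containing only two dashes, without the trailing space, is not treated as a separator here.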

There are many rules of “netiquette” for the body of email.  When using traditional, plain text email there are rules for formatting signatures, quoting material, line length, line wrapping, and so on.  These are defined in RFC-3676 (Text/Plain Format).  This also defines when to use “space stuffing” (adding a space to lines that start with “From ”, a space, or a “>”).

One useful convention you should follow is to quote URLs with angle-brackets (“<” and “>”).  This allows an MUA to recognize a URL even when wrapped over multiple lines.

Email Address Format:

Modern email addresses look like this:

username@hostname

The hostname is a host or computer on a network.  The “@hostname” part is optional; if missing, usually localhost (that is, the current system) is assumed.  The hostname should be a valid DNS host name, or an IP address enclosed in square-brackets (e.g., “hymie@[192.0.2.3]”).  It can also be another name defined in DNS, in an MX record.  A common example is to use an organization's domain name only and not the name of any particular host; for example “user@example.com” and not “user@mailserver.example.com”.

The username should be a valid account on that system (or a defined alias such as “webmaster”). 
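The address format can be illustrated with a few lines of Python.  This is a simplified sketch (real address syntax has more cases, such as quoted usernames), and the function names are made up for the example.

```python
def split_address(address, default_host="localhost"):
    """Split 'username@hostname'; the '@hostname' part is optional."""
    username, at_sign, hostname = address.rpartition("@")
    if not at_sign:  # no "@" present: assume the local system
        return address, default_host
    return username, hostname


def is_ip_literal(hostname):
    """True if the host part is an IP address in square brackets."""
    return hostname.startswith("[") and hostname.endswith("]")


if __name__ == "__main__":
    for address in ("user@example.com", "hymie@[192.0.2.3]", "webmaster"):
        username, hostname = split_address(address)
        print(username, hostname, is_ip_literal(hostname))
```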

The credit for the invention of this form of email address goes to Raymond Samuel Tomlinson, while working on an extension to the localhost only email program, SNDMSG.  The result was the first email program that could send messages to a user on another host.  Tomlinson wrote this for the early ARPANET.

How Mail Works on the Internet:

First you (let's call you “User A”) start up your MUA (email client software), compose a message, add some headers (such as Subject:), and state to whom the email should be sent (let's say “User B”).  When you are finished (and click on “Send”), the program (usually) adds some additional standard headers and then sends the mail message to your outgoing email server, either an MSA (Mail Submission Agent) or an MTA (Mail Transport Agent), which may scan or modify the email before sending it out.  The mail gets routed from one mail server to another (nowadays very few hops are needed), with the servers in the middle relaying the mail to the next server (or “hop”) along the path from the source to the destination.  (Each mail server that receives an email will add a “Received:” header to it.  By looking at these headers you can see the path a message took.)
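The “Received:” trail can be read with a short Python sketch.  The sample headers, host names, and dates below are invented, and real “Received:” headers vary a lot in layout, so the simple “word after by” rule used here is an assumption that only fits this sample.

```python
from email import message_from_string

# Made-up message: each server added its header at the top, so the newest is first.
RAW = (
    "Received: from mx.example.net by mail.example.org; Tue, 2 Feb 2016 10:00:05 -0500\n"
    "Received: from smtp.example.com by mx.example.net; Tue, 2 Feb 2016 10:00:03 -0500\n"
    "Received: from [192.0.2.10] by smtp.example.com; Tue, 2 Feb 2016 10:00:01 -0500\n"
    "From: usera@example.com\n"
    "To: userb@example.org\n"
    "Subject: Hello\n"
    "\n"
    "Hi there.\n"
)


def trace_hops(raw):
    """Return the servers that handled the message, in the order it travelled."""
    msg = message_from_string(raw)
    hops = []
    for value in reversed(msg.get_all("Received", [])):  # oldest hop is at the bottom
        words = value.split()
        if "by" in words and words.index("by") + 1 < len(words):
            hops.append(words[words.index("by") + 1].rstrip(";"))
    return hops


if __name__ == "__main__":
    print(" -> ".join(trace_hops(RAW)))
```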

The mail arrives and is accepted by the recipient's mail server.  The mail may be filtered, forwarded, or sorted into different mail folders, or simply delivered to the user's inbox.

Note: when sending email to many users on the same system, all the users are listed on the envelope but the actual email is sent only once to that (remote) system.

The recipient is somehow notified of the arrived email, and eventually reads it.  The recipient uses their email client software to talk with their mail server.  The process looks like this:

[Diagram: flow of email from sender to recipient]

(Note the diagram shows the optional MSA and MFA.)  The MTAs (mail servers) work in a store and forward manner.  An MTA receives an email message and stores it on a (local) disk.  Then the message is relayed to another MTA or handed to the local MDA (Mail Delivery Agent, the software that actually delivers email once received) for delivery.  Various problems can cause messages to not get delivered.  A message that is returned to the sender for such a reason is usually known as a bounced message.

Email is submitted, transported, accepted, delivered, and read using a number of server components.  These are collectively known as mail servers.  All of the pieces, the MUAs, MTAs, MSAs, MDAs, and MAAs, communicate with each other using various protocols.  These protocols can be proprietary or public; for example the common public protocol that MTAs understand is known as SMTP (Simple Mail Transfer Protocol), or the modern enhanced version (ESMTP).

Mailer-daemon is the usual name an MTA or MDA uses when it generates error email messages to return to the sender.  Common causes of such errors include:  a bad username (the destination MTA will bounce the email, which means return it to the sender with an explanation of what happened); a bad hostname (the sender's MTA will bounce it); and the destination MTA being down (the sender's MTA, or more precisely the MTA immediately upstream of the destination, will try for a while and then send a warning; if the destination server never responds then that MTA will eventually give up and bounce the email).

A mail store is where email is stored on disk, in one or more mailboxes (or mail folders).  The MDA stores email there, and the MAA (or sometimes the recipient’s MUA) fetches email from there.  The default, standard location for a user's incoming email is a file named /var/mail/name, but this can be configured to a different location (as long as all the software used, MUA, MDA, and MAA, uses the same location and format).  The original (and still common) mailbox format is known as MBOX, which has one file for each mailbox or mail folder.  Newer formats include Maildir and Maildir++.  These (and other) formats use one folder (directory) per mailbox, with each message in a separate file.  (Microsoft Outlook uses proprietary file formats such as “PST”.)  Email may also be stored in databases.
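The “one directory per mailbox, one file per message” idea behind Maildir can be seen with Python's standard mailbox module.  This is just a sketch; the temporary directory name, addresses, and subjects are invented.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def demo_maildir(count=3):
    """Deliver a few made-up messages; return the subdirectory names and file count."""
    root = os.path.join(tempfile.mkdtemp(), "Maildir")  # hypothetical mailbox directory
    box = mailbox.Maildir(root, create=True)
    for number in range(1, count + 1):
        msg = EmailMessage()
        msg["From"] = "alice@example.com"
        msg["To"] = "bob@example.com"
        msg["Subject"] = f"Message {number}"
        msg.set_content("Hello.\n")
        box.add(msg)  # each message becomes its own file
    box.close()
    subdirs = sorted(os.listdir(root))
    new_files = os.listdir(os.path.join(root, "new"))
    return subdirs, len(new_files)


if __name__ == "__main__":
    subdirs, file_count = demo_maildir()
    print(subdirs)     # the subdirectories that make up the mailbox
    print(file_count)  # one file per delivered message
```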

Whenever your email is stored on a server, it is subject to some email policy (such as quota limits).  You can view HCC's email policy at www.hccfl.edu/oit/email-storage-policy.aspx.

The various mail components communicate between each other using protocols.  Examples of some used for email are SMTP, POP, and IMAP.  (Demo SMTP protocol by sending to wpollock@hccfl.edu.  Show email “Received” headers to show the hops taken, by logging into Outlook, select the email, Actions→View original message.  In the pop up window, click on the Message Details button.)

MIME:

In the old days email was plain ASCII text, which takes only seven bits of each byte.  Much of the early Internet dropped the 8th bit of every byte to gain a 12.5% speedup.  Naturally this won't work with binary files such as GIFs, binary data files, or programs.  So these needed to be encoded (in essence adding a zero bit after every seventh bit) when sent, and decoded by the recipient.  Here's how this was done:

uuencode filename filename > file.uu
mail recipient < file.uu

The encoded file is copied into the body of the outgoing email.  Once delivered the recipient would have to save the body, edit it to remove all but the encoded file, and decode the file:

mail
... save received email body in “file.uu” ...
uudecode file.uu

What a pain!  Besides this problem, plain text email is... plain-looking.  Early business adopters of email wanted better looking email with features such as bold, italics, underline, justified text, and color.

MIME (Multipurpose Internet Mail Extensions) is a protocol (actually, an encoding, or type of formatting, of the email body) that MUAs use to provide the styled text demanded by business users of early email.  Today that isn't important (as we now use HTML for email with styles, graphics, and fancy formatting).  But most importantly MIME supports multi-part email messages.  This is when the body of the message is split into several parts, separated by a MIME separator string, where each part contains its own headers and is automatically encoded and decoded.  Each of these parts is known as an attachment.  Note that MIME is invisible to MTAs, MDAs, and MAAs, which only see a single message body with some weird stuff in it.  (Virus scanners do know about attachments, of course.)
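A multi-part message can be built and inspected with Python's standard email package.  The sketch below uses invented addresses, an invented file name, and a few arbitrary bytes standing in for a binary file.

```python
from email.message import EmailMessage

PAYLOAD = b"\x00\x01\x02\xff pretend this is a binary file"


def build_multipart():
    """Build a made-up message with a text body and one binary attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Report attached"
    msg.set_content("The report is attached.\n")
    # Adding an attachment turns the message into a multi-part message.
    msg.add_attachment(PAYLOAD, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg


if __name__ == "__main__":
    msg = build_multipart()
    print(msg.as_string())  # shows the separator string and each part's own headers
    for part in msg.iter_parts():
        print(part.get_content_type())
```

In the printed text, each part starts with the separator string and has its own small set of headers; the binary part is encoded automatically.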

Today much of the Internet uses all eight bits of a byte, but not all of it, so encoding is still used.  MIME uses a technique known as Base-64 encoding (RFC-4648).  MIME is defined by RFC-2045.
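Base-64 encoding is easy to try with Python's standard base64 module.  In this sketch (the function names are made up), arbitrary binary data becomes lines of plain ASCII text that survive a seven-bit path, and decoding gives back the original bytes.

```python
import base64


def encode_for_mail(data: bytes) -> str:
    """Return data as Base-64 text, broken into short lines."""
    return base64.encodebytes(data).decode("ascii")


def decode_from_mail(text: str) -> bytes:
    """Reverse encode_for_mail()."""
    return base64.decodebytes(text.encode("ascii"))


if __name__ == "__main__":
    data = bytes(range(256))  # every possible byte value, including 8-bit ones
    text = encode_for_mail(data)
    print(text)
    print(decode_from_mail(text) == data)
```

Every three bytes of input become four ASCII characters of output, so the encoded form is about one third larger than the original.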

View a sample email message that uses MIME attachments.

Even though HTML or PDF attachments make for attractive email, the information they contain should always be available as plain text in the body.  Aside from security concerns about opening attachments (even simple graphics), the bodies of email messages can be searched; attachments generally cannot.

Using the alpine command line MUA:

Many old command line MUAs exist; the most popular of those is mailx.  However old mailx doesn't know about MAAs or MIME.  An updated, compatible MUA named nail is available (nail = new mail?).  nail is often installed under the name mailx on Linux and some Unix systems.

These older MUAs are still valuable because some of them (mailx but not alpine) can be used non-interactively to send mail from a shell script, and because system administrators often use command line access to Unix/Linux servers via SSH and may need to read or send mail directly from that server.

The use of a MUA such as alpine should be easy to learn since it is menu-driven.  alpine is the current version of the pine MUA.  (The developers wanted to change the license before continuing development, and they couldn't with the old name.)

In alpine, at any point you can examine the menus to see what you can do.  These are context-sensitive menus.  For instance hitting ^J (control+J) when a header field is highlighted means to add an attachment; if the message body is highlighted this means to justify the text.  In the message body, use ^R to read a file and paste its contents into the mail body.

There are three ways to get stuff into the body of an email message you're composing, aside from typing it in:

Email Issues:

Mailing Lists:

A list of e-mail addresses identified by a single name such as students@hccfl.edu is called a mailing list.  When an e-mail message is sent to the mailing list name it gets sent to all the addresses in the list.  Each list also has a special email address for configuring your use of the list (for example, add or remove yourself from the list) and another address for the (human) owner of the list.  A good resource for learning about mailing lists is “Understanding Mailing Lists” by Harley Hahn (the author of our textbook).

Configuring an MDA:

Users can configure the procmail MDA to manage mail logs, to spam check, to filter email, to automatically reply to some mail, etc.  See the man pages for procmailrc and procmailex.  The file ~wpollock/.procmailrc contains examples of MDA return-receipts and of spam filtering using regular expressions and SpamAssassin (www.SpamAssassin.org).

Public Key Encryption and Digital Signatures:

While you have no guarantee of privacy with your email, you are allowed (in the U.S. anyway) to protect your email by encrypting the message.  Such a message can't be read or tampered with by unauthorized parties.

The older technology for encryption is called symmetric (or shared) key: you and I share a key (password).  Qu: how do we do that securely?  Qu: what about doing business say with Amazon.com using this?

The old method is efficient but the problems are too difficult to make this technology useful on a wide scale.  A newer technology is called public key encryption.  With this method a pair of keys is made for each party.  One is kept secret (the private key) and one is published (in email messages, in flyers, on web sites, on key servers, etc.) called the public key.  To send a message to you I encrypt the message with your public key.  Only you can decrypt it since this requires your private key and only you have it.

You reply to me by encrypting your reply with my public key (which only I can decrypt, using my private key).  As you can see, four keys are used altogether.  Public key encryption is the technology behind secure web sites that we all rely on (the web sites using the HTTPS protocol).
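The idea can be tried with a toy version of the RSA public key method in Python.  This is only a sketch under loud assumptions: the key pair below is built from two tiny primes (61 and 53), each byte is encrypted on its own, and none of this is remotely secure.  Real software uses enormous keys and extra safeguards.

```python
# Toy key pair for the recipient, built from the tiny primes 61 and 53.
MODULUS = 61 * 53          # 3233; part of both keys
PUBLIC_EXPONENT = 17       # published to everyone
PRIVATE_EXPONENT = 2753    # kept secret; (17 * 2753) % (60 * 52) == 1

RECIPIENT_PUBLIC_KEY = (PUBLIC_EXPONENT, MODULUS)
RECIPIENT_PRIVATE_KEY = (PRIVATE_EXPONENT, MODULUS)


def encrypt_for(recipient_public_key, message: bytes):
    """Anyone can do this: it needs only the recipient's public key."""
    exponent, modulus = recipient_public_key
    return [pow(byte, exponent, modulus) for byte in message]


def decrypt_with(own_private_key, ciphertext):
    """Only the key's owner can do this: it needs the private key."""
    exponent, modulus = own_private_key
    return bytes(pow(number, exponent, modulus) for number in ciphertext)


if __name__ == "__main__":
    ciphertext = encrypt_for(RECIPIENT_PUBLIC_KEY, b"Hi")
    print(ciphertext)
    print(decrypt_with(RECIPIENT_PRIVATE_KEY, ciphertext))
```

A reply in the other direction would use a second key pair belonging to the other party, which is where the count of four keys comes from.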

Public key encryption is much, much slower than symmetric key encryption.  To make this technology practical, rather than encrypt a lengthy email message (body) with the public key, the sender generates a very large random number.  This number is used as a symmetric key and the message is encrypted with it.  Only this session key gets encrypted using the public key method.
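The session key scheme can be sketched in Python with toy parts.  Everything here is an assumption made for illustration: the public key pair is a tiny textbook RSA example (so the “very large” random number is actually small), and the symmetric cipher is a home-made XOR with a hash-derived stream.  It shows the flow only and must not be used for real secrets.

```python
import hashlib
import secrets

# Toy RSA key pair of the recipient (primes 61 and 53); not secure.
MODULUS, PUBLIC_EXPONENT, PRIVATE_EXPONENT = 3233, 17, 2753


def keystream(session_key: int, length: int) -> bytes:
    """Stretch the session key into `length` bytes using SHA-256 (toy construction)."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(f"{session_key}:{counter}".encode()).digest()
        counter += 1
    return stream[:length]


def symmetric_crypt(data: bytes, session_key: int) -> bytes:
    """XOR with the key stream; the same call both encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(session_key, len(data))))


def send(message: bytes):
    """Sender: pick a random session key, wrap it with the recipient's public key."""
    session_key = secrets.randbelow(MODULUS - 2) + 2
    wrapped_key = pow(session_key, PUBLIC_EXPONENT, MODULUS)  # slow public key step
    return wrapped_key, symmetric_crypt(message, session_key)  # fast symmetric step


def receive(wrapped_key: int, ciphertext: bytes) -> bytes:
    """Recipient: unwrap the session key with the private key, then decrypt."""
    session_key = pow(wrapped_key, PRIVATE_EXPONENT, MODULUS)
    return symmetric_crypt(ciphertext, session_key)


if __name__ == "__main__":
    wrapped_key, ciphertext = send(b"A lengthy email message body...")
    print(wrapped_key, ciphertext)
    print(receive(wrapped_key, ciphertext))
```

Only one small number goes through the slow public key step, no matter how long the message is.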

A digital signature is created for a message by encrypting it with the sender's private key.  This doesn't protect the confidentiality of the message since anyone with the sender's public key can decode the message.  (If privacy is also desired, the sender encrypts this encrypted message with the recipient's public key.)  If the sender's public key decodes the message correctly, it is strong proof that their private key was used to encrypt it in the first place.  Since only the sender has their private key, only the sender could have sent the message.  Note this encryption with the sender's private key is a digital signature; a GIF graphic of a hand-written signature is not!  (View sample digitally signed email.)

The U.S. federal government now treats digital signatures as just as binding as a hand-written (or holographic) signature.  See the Electronic Signatures in Global and National Commerce (ESIGN) Act, passed in June of 2000, for details.  Also many state governments have passed laws treating digitally signed emails as equivalent to documents signed holographically (by a person).

In practice encrypting an entire message with a private key takes too long (even on modern computers), so a checksum (or digest or hash) of the message is encrypted with the private key instead, and this is appended to the (unencrypted) message.  The recipient also computes a message digest of the email, then decrypts the message digest sent with the message body (using the sender's public key) and compares the two.  If the two digests differ the message was altered or forged.
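Signing a digest can be sketched in Python as well.  As before this is a toy under stated assumptions: the key pair is a tiny textbook RSA example (primes 61 and 53), and the SHA-256 digest is squeezed down to fit such a small key, which throws away nearly all of its strength.

```python
import hashlib

# Toy RSA key pair of the sender (primes 61 and 53); not secure.
MODULUS, PUBLIC_EXPONENT, PRIVATE_EXPONENT = 3233, 17, 2753


def digest(message: bytes) -> int:
    """Message digest, reduced to fit the toy key size."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % MODULUS


def sign(message: bytes) -> int:
    """Sender: encrypt the digest with the private key."""
    return pow(digest(message), PRIVATE_EXPONENT, MODULUS)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: decrypt the signature with the sender's public key and compare digests."""
    return pow(signature, PUBLIC_EXPONENT, MODULUS) == digest(message)


if __name__ == "__main__":
    message = b"Please pay Bob ten dollars."
    signature = sign(message)
    print(signature)
    print(verify(message, signature))
    print(verify(message, (signature + 1) % MODULUS))  # a forged signature
```

The message itself travels unencrypted; only the short digest is processed with the private key.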

Because the private keys used to encrypt email must often be made available to organizations (to comply with laws), separate sets of keys are often used for digital signatures and for email encryption.

Issue of Trust:  A public key can be digitally signed by a trusted third party (such as VeriSign).  This third party has a well-known public key.  Most web browsers and email clients come with a built-in list of such well-known public keys.

People can use PGP or GPG (Pretty Good Privacy and GNU Privacy Guard) to encrypt, decrypt, compute message digests, and to digitally sign messages.  PGP was written first, by Phil Zimmermann; GPG is a separate, free implementation written later by Werner Koch for the GNU project.  GPG is sometimes called GnuPG.  The standard used by any of these programs is called OpenPGP; most people use all these terms interchangeably.  (See the man page for gpg for details.)

GPG provides easy email integration with modern MUAs.  But to use this technology people must generate a pair of keys and publish their public key.  (You also need a third party to sign your key so others will trust it.)  These complexities have hampered the widespread adoption of encryption and digital signatures.  You can find easy to follow, step by step directions for using GPG with Thunderbird MUA at EmailSelfDefense.FSF.org.

The secure web sites with URLs such as HTTPS://www.example.com/ work by exchanging public keys between server and browser, which then verifies these by using the trusted third party's public key to validate the key.  Next the browser and web server exchange a session key (a big random number encrypted with the public keys) to use for symmetric key encryption for the rest of that session.  (This is an over-simplification of what really happens but should provide some idea of the process.)

Summary and Study Guide:

  1. The original (command line) Unix command to read email is mail.  The modern version of this is called mailx.  These tools are still useful to allow mail to be sent from a shell script (although some modern tools allow that as well).
  2. A mail store is where email is stored on disk.  The original (and still common) mailbox format is known as MBOX.  Newer formats include Maildir and Maildir++.  (Microsoft Outlook uses a proprietary file format, “PST”.)  Email may also be stored in databases.  Whenever your email is stored on a server, it is subject to some policy (such as quota limits).
  3. An MBOX mailbox or mail folder is a single text file containing one mail message after another.  Other types of mail stores use one folder per mailbox, or just store email in databases.
  4. The standard location for a user's inbox is /var/mail/username.  This can be changed by configuring the MDA and MAA (both of which need to know where the email is kept).  The MUA must also know this location; the MAIL environment variable is often used for this.
  5. An email message contains three parts: The envelope, headers, and the body.  The headers plus the body constitute the actual message (that is, the envelope is not considered part of the mail message).
  6. The envelope, not the headers, contains the list of recipients (and the sender) of an email message.  An MTA will ignore the headers when deciding where to send a message.  However the envelope is not stored (or even delivered).  Only the headers and body (the mail message) are saved.
  7. The body is the content of the mail message, and includes any attachments.
  8. The body may also include a signature block (or signature), containing the sender's contact information and possibly a legal notice.  Many MUAs (email clients) have a feature to define and automatically include a signature block on each message.
  9. When an email message is sent to several recipients on the same host, only one copy of the email message is sent to that host, which then makes and delivers copies to each recipient.
  10. For mailboxes in MBOX format, each message starts with a special header “From user date”.
  11. The headers and the body are separated by a blank line.  There are many standard headers used such as From:, To:, Cc:, Subject:, etc.  (However there is normally no Bcc: header in a delivered message.)
  12. An email address has two parts separated with the at-sign: “username@hostname”.  The “@hostname” part is optional; localhost is assumed.
  13. The hostname in an email address should be a valid hostname, IP address, or some name defined in a DNS MX record (often a domain name).
  14. An MTA is commonly called a mail server.  It accepts email from an MUA or another MTA and either rejects the email, hands it to an MDA for local delivery, or relays the email to another MTA.
  15. An MDA (the delivery agent, often included with the mail server software) accepts email from the local MTA (or in rare cases, from a local MUA) for final delivery processing.  The most common action is to simply append the email to the user's inbox, but other processing is possible.
  16. A MAA provides remote users access to their mailboxes.  The user's MUA must be configured to fetch email from the MAA or from a local mailbox.
  17. Each MTA that relays a message will add a “Received:” header to the front of the email.
  18. Email servers use a store and forward design.  If some message can't be delivered at once an MTA will keep trying for a time, before eventually giving up.  In such cases you will receive a bounce notice (an email message) from some MTA.  Often the sender of such messages is identified as mailer-daemon.
  19. Standard protocols are used to allow communication between different vendors' systems, and between different parts of the email system.  The public protocol used by MTAs is ESMTP.
  20. Email was designed to send ASCII text only, and any non-text messages must be encoded or they may (but won't necessarily) become corrupted.
  21. The uuencode and uudecode utilities were once common; now MIME's Base-64 encoding is used.
  22. MIME defines formatting codes that a MIME-compliant MUA will use to display email with various styles.  These codes aren't much used; however, MIME also defines a multi-part message that can contain attachments.  These attachments can include HTML or other data, which will be encoded and decoded automatically by your MUA.
  23. A common command line MUA called alpine has an easy to use menu-driven interface.  In addition to supporting attachments, it provides several ways to copy files (or the output of commands) into the body of an email message.
  24. Spam is unwanted and unsolicited email, sometimes called UBE or UCE.  Today about 80%–90% of all email received by an MTA is considered spam.  Good (non-spam) email is sometimes called ham.  One type of spam is email that seems legitimate, and tricks you into clicking a link (in the email body).  This is called phishing.
  25. There is no expectation of privacy for email sent across the Internet (or even within a single company).  In the U.S. and some other countries it is legal to use encryption technology to ensure privacy.
  26. Legal and ethical issues of email vary from country to country and culture to culture.
  27. A mail-bomb may be a denial of service in which an attacker fills a victim's mailbox, preventing any other emails from being delivered. 
  28. Another meaning of mail-bomb is an email with malicious software in it, that runs when you read the email.
  29. A sender can request a return-receipt from the recipient's MTA or MDA (a delivery receipt) or from the recipient's MUA once the email is viewed (a read receipt).  The request is just some extra headers in the email message.  The recipient can respond to or ignore these requests, or even send receipts for every email (even when not requested).
  30. A mailing list is a single name that represents many email addresses.  Sending email to this mailing list name will send the message to all addresses in the list.
  31. Today privacy of emails is ensured with encryption.  One technology where the sender and recipient share a single password, called a key, to encrypt and decrypt a message is known as symmetric or shared key encryption.
  32. A modern technique is to have each party (sender and recipients) generate a pair of keys each.  One of these keys is kept secret and safe (the private key); the other is freely shared (the public key).  This technology is called public key encryption.  To send a private message from user A to user B, user A will encrypt the email (body) with user B's public key.  User B uses their own private key to decrypt the message.
  33. In practice public key encryption and decryption is very slow, so these methods are used to exchange a shared key between the sender and recipient, and to encrypt the message with this shared key.  This shared key is just a huge random number, called a session key, and is used once only.
  34. A digital signature is an encryption of a message with the sender's private key (before encrypting with the recipient's public key).  Keys are normally digitally signed by a trusted third party (e.g., VeriSign) so the recipients can have confidence senders are who they claim to be.
  35. To speed up digital signatures, a hash or message digest (similar to a checksum) of the message is encrypted with the private key, not the whole message.
  36. Popular encryption software was called PGP; a later, free, compatible implementation is the GNU Privacy Guard, also called GnuPG or just gpg.