Troubleshooting Web Page Problems

Some common problems with FTP, character encoding, fonts, URLs, and images

FTP Issues

FTP has two transfer modes: text (also called ASCII) and binary.  In text mode some bytes may be changed in transit, because line endings are converted to match each platform's convention.  (This is considered a feature and not a bug, since text files use different end-of-line conventions on different platforms.)  For most files, and especially for graphics, applets, and other media, you need binary mode.  For plain text files the mode usually doesn't matter.  So make sure you upload your files in binary mode.
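To see why binary mode matters, here is a small Python simulation.  It assumes a text-mode transfer from a Windows machine to a Unix server that rewrites every CR LF pair as a bare LF; real FTP clients and servers differ in the details.  PNG images begin with a fixed 8-byte signature that deliberately contains a CR LF pair, so this kind of damage is easy to see.

```python
# Every PNG file starts with this fixed 8-byte signature (PNG specification).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def text_mode_transfer(data: bytes) -> bytes:
    """Simplified model of a text-mode transfer from Windows to Unix:
    each CR LF line ending is rewritten as a bare LF."""
    return data.replace(b"\r\n", b"\n")


def binary_mode_transfer(data: bytes) -> bytes:
    """Binary mode copies the bytes unchanged."""
    return data


original = PNG_SIGNATURE + b"...rest of the image data..."

print(text_mode_transfer(original).startswith(PNG_SIGNATURE))    # False: corrupted
print(binary_mode_transfer(original).startswith(PNG_SIGNATURE))  # True: intact
```

One lost byte in the signature is enough for an image viewer or web browser to reject the whole file.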

Another common problem is trouble connecting.  This can happen when your computer or your ISP has security barriers in place, such as virus scanners and firewalls.  (It is also common to make a typo in the hostname, username, or password, so always check that first!)  FTP is a strange protocol, invented before security was a big concern.  It requires two separate connections between the two computers: a control connection for commands and a data connection for the files themselves.  The traditional mode of operation is called active mode.  In active mode the data connection is made from the server back to your computer.  Most firewalls will block this!

To work around this issue, use FTP in passive mode.  In passive mode your computer opens both connections to the server, which most firewalls allow.  So make sure passive mode (sometimes labeled passive transfers) is selected before trying to connect.
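If you script your uploads instead of using a graphical client, Python's standard ftplib module can do both things recommended above: passive mode and binary transfers.  This is a minimal sketch; the host name, user name, password, and file names in main() are placeholders, not real values.  ftplib already defaults to passive mode, so set_pasv(True) only makes the choice explicit, and storbinary() sends the file's bytes unchanged.

```python
import ftplib
import os


def upload_files(ftp, paths):
    """Upload each local file in `paths` to the server's current directory
    using passive mode and binary transfers."""
    ftp.set_pasv(True)  # passive: our computer opens the data connection too
    for path in paths:
        # Open in binary ("rb") so Python does not alter any bytes either.
        with open(path, "rb") as f:
            ftp.storbinary(f"STOR {os.path.basename(path)}", f)


def main():
    # Placeholder server and account details; substitute your own.
    with ftplib.FTP("ftp.example.com") as ftp:
        ftp.login("username", "password")
        upload_files(ftp, ["index.html", "foo.gif"])
```

Call main() after filling in your own details.  Passing the connection into upload_files() keeps the transfer logic separate from the login step.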

Character Encoding Issues

Another issue is the character encoding used in your web page.  On every computer, text is stored as a sequence of numbers.  The rule for which number represents which character is called the encoding, or character encoding.

When you use a text editor such as Notepad (or TextEdit), text files are saved using some platform default encoding.  On some modern systems this default is UTF-16 (which Notepad labels simply "Unicode").  Sadly, this doesn't match the default encoding used by most web browsers!  As a result the numbers are interpreted incorrectly and you see all sorts of junk on the screen.

There are hundreds of different encoding schemes in use in the world today.  Some common ones are UTF-8 and 8859-1 (also called ISO-8859-1 or ISO Latin 1).  Windows systems have always used Microsoft encodings that Microsoft and IBM call code pages.  By default, US and Western European versions of Windows XP use an encoding called CP-1252 or Windows-1252, and old DOS systems used CP-437.
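To make the idea of an encoding concrete, this Python sketch encodes the same short word in several of the encodings named above and prints the resulting numbers in hex.  The last line shows what happens when a program reads bytes using the wrong encoding.  (utf-16-le is UTF-16 in little-endian byte order without the byte-order mark that Notepad adds.)

```python
def encode_all(text, encodings=("utf-8", "iso-8859-1", "cp1252", "cp437", "utf-16-le")):
    """Return {encoding name: bytes} for the same text in each encoding."""
    return {enc: text.encode(enc) for enc in encodings}


for enc, data in encode_all("café").items():
    print(f"{enc}: {data.hex(' ')}")
# utf-8: 63 61 66 c3 a9
# iso-8859-1: 63 61 66 e9
# cp1252: 63 61 66 e9
# cp437: 63 61 66 82
# utf-16-le: 63 00 61 00 66 00 e9 00

# Reading UTF-8 bytes as if they were ISO-8859-1 produces junk:
print("café".encode("utf-8").decode("iso-8859-1"))  # cafÃ©
```

The plain English letters c, a, and f get the same numbers in every single-byte encoding here; only the accented letter differs, which is why a page in the wrong encoding often looks mostly right.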

Microsoft likes to call the default Windows encoding ANSI, perhaps to pretend it is some sort of national standard encoding.  (I guess it is sort of standard considering the number of Windows systems in use in the world today.)  I found this information at scripts.sil.org/IWS-Chapter03:

When Windows was being developed, the American National Standards Institute (ANSI) was in the process of drafting a standard that eventually became ISO 8859-1 Latin 1.  Microsoft created their codepage 1252 for Western European languages based on an early draft of the ANSI proposal, and began to refer to this as the ANSI codepage.  Codepage 1252 was finalised before ISO 8859-1 was finalised, however, and the two are not the same: codepage 1252 is a superset of ISO 8859-1.

Later, apparently around the time of Windows 95 development, Microsoft began to use the term ANSI in a different sense to mean any of the Windows codepages, as opposed to Unicode.  Therefore, currently in the context of Windows, the terms ANSI text or ANSI codepage should be understood to mean text that is encoded with any of the legacy 8-bit Windows codepages rather than Unicode.  It really should not be used to mean the specific codepage associated with the US version of Windows, which is codepage 1252.
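The difference between Windows-1252 and ISO-8859-1 lies in the byte range 0x80 to 0x9F: ISO-8859-1 reserves it for invisible control codes, while Windows-1252 puts printable characters there, such as the euro sign and curly quotes.  A short Python check of that claim:

```python
def decode_both(raw):
    """Decode the same bytes as Windows-1252 and as ISO-8859-1."""
    return raw.decode("cp1252"), raw.decode("iso-8859-1")


win, iso = decode_both(bytes([0x80, 0x93, 0x94]))
print(win)                                # €“”
print([f"U+{ord(c):04X}" for c in iso])   # ['U+0080', 'U+0093', 'U+0094'] (control codes)

# Outside 0x80-0x9F the two agree, e.g. for every byte from 0xA0 to 0xFF:
high = bytes(range(0xA0, 0x100))
print(decode_both(high)[0] == decode_both(high)[1])  # True

# The euro sign has no code at all in ISO-8859-1:
try:
    "€".encode("iso-8859-1")
except UnicodeEncodeError:
    print("€ cannot be encoded in ISO-8859-1")
```

This is why curly quotes typed in a Windows editor can turn into junk when a browser reads the page as ISO-8859-1.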

I don't currently have a Mac or Windows Vista, but I am seeing a large number of student web pages encoded as UTF-16 ("Unicode"), and I suspect that is the new default on at least one of these platforms.  Using a different encoding than the web browser expects will likely make your page look bad (or completely unreadable).

The fix is very simple:  choose Save As... in Notepad and select UTF-8 from the Encoding list (the "ANSI" choice, which is Windows-1252 on Western systems, also works for plain English text).  Then re-upload your web pages, making sure to use binary transfer mode.
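If you have many pages to fix, a few lines of Python can do the same conversion as Save As.  This sketch assumes the files were saved as UTF-16 with a byte-order mark (what Notepad calls "Unicode"); the function name convert_to_utf8 is made up for this example.  It works on raw bytes so the line endings are left exactly as they were.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="utf-16"):
    """Re-save the file at `path` as UTF-8.

    Assumption: the file is UTF-16 with a byte-order mark; Python's
    "utf-16" codec reads that mark to choose the byte order and drops it."""
    p = Path(path)
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))  # bytes in, bytes out: newlines untouched


# Example (hypothetical file names):
# for name in ["index.html", "images.html"]:
#     convert_to_utf8(name)
```

After converting, upload the files again in binary mode as described above.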

It is possible to add an HTML tag to a web page to indicate the encoding used.  However, a web server can also announce an encoding in the HTTP Content-Type header it sends with each page, and browsers let that header override the tag, so setting it in the web page won't always help.  To indicate the encoding used on a web page, add the following tag in the HEAD section of the page:

    <meta http-equiv="Content-Type" content="text/html; charset=encoding">

Replace encoding with utf-8, iso-8859-1, windows-1252, utf-16, or whatever encoding you used to create that web page.  The official list of encoding names can be found at www.iana.org/assignments/character-sets.
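The priority between the server's header and the meta tag can be sketched in Python.  This is a simplified model under stated assumptions: it ignores byte-order marks (which modern browsers check first), uses a rough regular expression rather than a real HTML parser, and falls back to ISO-8859-1 (the old HTTP/1.1 default for text) when neither source names an encoding.  The helper names are hypothetical.

```python
import re
from email.message import Message


def header_charset(content_type):
    """Return the charset from an HTTP Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def meta_charset(html):
    """Rough regex for the charset named in a <meta> tag, or None."""
    match = re.search(r"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(content_type, html, default="iso-8859-1"):
    """The server's header wins; then the meta tag; then `default`."""
    return header_charset(content_type) or meta_charset(html) or default


page = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
print(effective_charset("text/html; charset=UTF-8", page))  # utf-8 (header wins)
print(effective_charset("text/html", page))                 # windows-1252 (meta tag)
```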

To view a page that has an unusual encoding, you can tell the browser which encoding to use: look under the View menu of your web browser for the encoding setting.  When I see a page that doesn't look right I try ISO-8859-1 or UTF-8, and usually one of those works fine.  UTF-16 uses two bytes for each common character, where ISO-8859-1, Windows-1252, and UTF-8 use one byte for each English letter.  So when every other character is a weird character (on my system, a black diamond with a question mark in it), the page was likely encoded as UTF-16 while your web browser is set to ISO-8859-1, UTF-8, or Windows-1252.
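The "every other character" symptom is easy to reproduce in Python: English text saved as UTF-16 has a zero byte after every letter, and a one-byte-per-character encoding shows each zero byte as a separate junk character.  The looks_like_utf16() function is a rough heuristic made up for this example (check for a byte-order mark, else count zero bytes), not a standard test.

```python
def looks_like_utf16(data: bytes) -> bool:
    """Rough heuristic: UTF-16 files usually start with a byte-order mark,
    and English text in UTF-16 has a zero byte in every other position."""
    if data[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return True
    sample = data[:200]
    return len(sample) >= 2 and sample.count(0) >= len(sample) // 3


page = "<p>Hello</p>".encode("utf-16-le")
decoded = page.decode("iso-8859-1")   # what a browser set to ISO-8859-1 sees
print(set(decoded[1::2]))             # {'\x00'}: every other character is junk
print(looks_like_utf16(page))         # True
```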

(If your web pages look normal on your system it is because the web browser uses the system default encoding when viewing local files.  Once you upload your web pages the default encoding is set by the web server instead, usually UTF-8 or ISO-8859-1.)

Font Issues

A font (for the purpose of this discussion) is a collection of tiny graphics, each associated with a number in some encoding.  For example, most fonts associate the number 65 with an upper-case letter A.  Since there are potentially over a million characters, a given font has graphics for only a subset of them (often just a few hundred).  If you see a box or a weird question mark symbol, it is sometimes because you used a character that the current font doesn't have a graphic for.

This can be a problem since not all users have the same fonts installed.  When a font is missing, the web browser substitutes one that is installed.  So if you use a fancy font in a web page and it looks fine on your screen, it may look awful on another user's system if they don't have that font installed!

Fonts generally are not free, so Microsoft, Red Hat, Apple, and other computer vendors pay a license fee for the fonts they bundle with their systems.  As a result, different systems almost always have different sets of fonts installed.

The best advice is to use fewer fonts, ones that you believe will be available to your audience.  Provide an alternative font and make sure your web pages look okay in that alternative.

This isn't intended as a full discussion of fonts, but you should know there are generic font families (such as serif, sans-serif, and monospace) that group fonts with similar characteristics.  You can name a family to use if a specific font is not installed, and the system will pick an appropriate one.  Here's an example of specifying styles for paragraphs.  The style says to use the Georgia font; if that isn't available, try Times New Roman instead; and if that isn't available either, pick some default font from the serif family.

<style>
p    { font-family: Georgia, "Times New Roman", serif; }
</style>

(The above goes into the HEAD section of a web page.)
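The fallback rule can be sketched in Python.  This is a simplified model, not how any particular browser is implemented: the installed-font set and the generic-family defaults are hypothetical inputs, font names are compared case-sensitively, and real browsers also fall back character by character when a font lacks a graphic for some character.

```python
# Generic CSS families; the browser always has some font for each of these.
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def parse_font_family(value):
    """Split a CSS font-family value such as 'Georgia, "Times New Roman", serif'
    into a list of names with the quotes removed."""
    return [part.strip().strip("\"'") for part in value.split(",")]


def pick_font(families, installed, generic_defaults, browser_default):
    """Return the first usable font from `families`.

    `installed` is the set of fonts on a (hypothetical) computer, and
    `generic_defaults` maps a generic family to the font used for it."""
    for name in families:
        if name.lower() in GENERIC_FAMILIES:
            return generic_defaults.get(name.lower(), browser_default)
        if name in installed:
            return name
    return browser_default


fams = parse_font_family('Georgia, "Times New Roman", serif')
print(pick_font(fams, {"Times New Roman", "Arial"}, {"serif": "DejaVu Serif"}, "Arial"))
# Times New Roman (Georgia is missing, so the second choice is used)
```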

See Fonts.htm for some more details on fonts.

URL Encoding Issues

Many characters that are legal in a filename are not legal in a URL or web link.  The most common problem is spaces in filenames.  It is easiest to use only letters and digits (plus the extension) when naming files; then you don't need to worry.  (Many web browsers are forgiving about such errors and will try to guess what you meant, but not all browsers are so nice.)

If you do include any unusual characters in your filenames, they should be encoded using what is called URL encoding, sometimes called percent encoding.  You replace each special character with a percent sign followed by two hexadecimal digits giving the character's code number.  A space is character 32, which is 20 in hex, so it becomes %20.  For example, if you saved an image file with the name New York.gif, the space must be encoded and the IMG tag would look something like this:

<IMG src="New%20York.gif">

You can view the URL encoding reference at w3schools.com for a list of characters and their encoded equivalents.  (Note that even normal letters and digits can be encoded, but there is no point in doing that.)
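Percent encoding is simple enough to write by hand.  This Python sketch encodes the text as UTF-8 first, which is how a non-ASCII character such as é becomes two %XX codes (one per byte), and compares the result with urllib.parse.quote() from Python's standard library.  The safe set used here (letters, digits, and - . _ ~) is the "unreserved" set from the URL standard, RFC 3986; quote() additionally leaves "/" alone by default, which this sketch does not.

```python
from urllib.parse import quote, unquote

# RFC 3986 "unreserved" characters never need encoding.
UNRESERVED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


def percent_encode(name):
    """Replace every byte of name's UTF-8 form that is not unreserved
    with '%' followed by the byte's value as two hex digits."""
    out = []
    for byte in name.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("New York.gif"))   # New%20York.gif
print(percent_encode("café.gif"))       # caf%C3%A9.gif
print(quote("New York.gif"))            # New%20York.gif (standard library)
print(unquote("New%20York.gif"))        # New York.gif
```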

Image Tag Issues

A common problem is having images not show up when you view your web page.  Here are several common reasons images might not show up:

  1. Using the wrong filename in an IMG tag.  If the image file is named foo.gif then you must have an IMG tag like this:
           <IMG SRC="foo.gif">

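    If the file is really named Foo.gif, foo.jpg, foo.gif.gif, or anything else, the web browser won't be able to find it.  The file name used in the IMG tag must exactly match the file's actual name, including upper- and lower-case letters, since many web servers treat Foo.gif and foo.gif as different files.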

  2. The image isn't in the same folder as your HTML file.  If the images are on your desktop while your images.html file is in (say) My Documents, the images won't be found.
  3. Using a full (complete, or absolute) pathname or URL instead of just the filename.  When all the files are in the same folder, you should use only the filename of the image.  Some students use C:\Documents and Settings\user\Desktop\image project\foo.gif, which is an absolute pathname.  This is a big mistake!  The web page won't work once it is uploaded to Blackboard.com and then downloaded to your instructor's computer, where that path doesn't exist.  Just use the name of the file itself.
  4. Something else might be wrong with your HTML file.  In that case, a web browser may give up before loading any images.  For example, missing a closing quote mark will fool most web browsers.
  5. Using weird characters in the names of the downloaded image files.  Many websites use web page and image files with bizarre names.  But you need not keep the original name; you can rename the image files when you download them (just don't change the extension, usually one of .gif, .jpg, or .png).

    Use a simple, short name that contains nothing except letters and digits (and the extension).  Note that Windows is often set to hide extensions (this is the default), so a file can look like it has the name foo.gif when it really has the name foo.gif.gif.  You can turn off this Windows feature to see the entire names of files: in Windows XP's Explorer, choose Tools, Folder Options, View, and clear the "Hide extensions for known file types" box.
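Several of these mistakes can be caught automatically.  The following Python sketch reads an HTML file, collects each IMG tag's SRC value, and compares it with the files actually in the same folder, reporting reasons 1, 2 (as a missing file), 3, and 5 above, including the hidden foo.gif.gif case.  It assumes, as this list does, that every image should sit beside the HTML file and be named by its bare file name, so any SRC containing a slash, backslash, or colon is flagged; check_images() and its messages are made up for this example.  It cannot detect reason 4.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote


class ImgSrcCollector(HTMLParser):
    """Collect the SRC value of every IMG tag.  HTMLParser lower-cases tag
    and attribute names, so <IMG SRC="..."> and <img src="..."> both match."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def check_images(html_path):
    """Return a list of problem descriptions for the IMG tags in html_path."""
    folder = os.path.dirname(os.path.abspath(html_path))
    names = os.listdir(folder)
    collector = ImgSrcCollector()
    with open(html_path, encoding="utf-8") as f:
        collector.feed(f.read())

    problems = []
    for src in collector.sources:
        if ":" in src or "/" in src or "\\" in src:
            # Reason 3: C:\..., http://..., or a path into another folder.
            problems.append(f"{src}: use just the file name, not a path or URL")
            continue
        name = unquote(src)  # the browser decodes %20 back to a space
        if name in names:
            if not all(c.isascii() and (c.isalnum() or c in "._-") for c in name):
                # Reason 5: the file exists, but its name invites URL trouble.
                problems.append(f"{src}: rename the file using only letters and digits")
            continue
        wrong_case = [n for n in names if n.lower() == name.lower()]
        extra_ext = [n for n in names if n.startswith(name + ".")]
        if wrong_case:  # Reason 1: Foo.gif vs foo.gif
            problems.append(f"{src}: wrong case, the file is named {wrong_case[0]}")
        elif extra_ext:  # Reason 1/5: hidden extension, e.g. foo.gif.gif
            problems.append(f"{src}: the file is really named {extra_ext[0]}")
        else:  # Reasons 1 and 2: misspelled, or in another folder
            problems.append(f"{src}: no such file in {folder}")
    return problems
```

For example, print("\n".join(check_images("images.html"))) lists one line per suspect IMG tag and prints nothing if all images were found.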