The term font
is vague and different people use it in
different ways.
For now assume a font is an array of tiny
graphics created by an artist to
share a particular look, and to map
to letters, digits,
and other characters.
(For the real story search the Internet and visit
Unicode.org and see the
javadoc API comments for java.awt.Font
.)
These graphics are called glyphs
.
Glyphs are to characters what numerals are to numbers:
a visual representation of an abstract concept (e.g. the
letter A
).
Many different glyphs can represent the same character;
they just look different.
Type designers knew some things about how humans read text, and
devised serif
fonts which are letter shapes
composed of lines (or strokes
) of varying
thickness and small extra lines on the ends of the strokes.
Text set in such fonts is generally considered easier to read, and pretty much all books
and magazines use serif fonts for body text.
(As should you!)
Fonts without the extra lines, often drawn with lines of constant
thickness, are called sans-serif (sans is French for without) and
are used to grab attention, such as in headings and captions.
In the early computer era usually a single screen font was built into
terminals.
Printers were based on daisy-wheel or line-printer
technology, which again supported only a single font.
These early computer screens and printers were limited to drawing
each character in the same sized rectangular block.
Such fonts are called mono-spaced
since all
characters take up the same amount of horizontal space.
This leads to an uneven appearance as fat letters such as
'm' take the same space as skinny letters such as 'i'.
As the technology grew more sophisticated computers and printers
became capable of displaying traditional fonts called
proportional
.
In these fonts the space between the characters is the same, giving
the text an even appearance.
(Are you reading this in a mono-spaced or proportional font?
Look at this to decide: MMMMMMllllll
.)
So a font can be either proportional or mono-spaced.
It can have serifs or be sans-serif.
That's four possibilities, but fonts can have other attributes
such as heaviness of the strokes (e.g., bold
) or whether the
letters are straight (roman
) or slanted (italics
).
Early Unix systems used X Window
fonts that were named for
each of 14 possible font attributes.
You could find a bold, 12 point (one point
is
roughly 1/72 of an inch) font by doing a directory listing for:
-*-*-bold-*-*-*-12-*-*-*-*-*-*-*
which might find the font file:
-adobe-utopia-bold-r-normal--12-120-75-75-p-70-iso8859-1
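The 14 attributes in such a name can be pulled apart mechanically. The sketch below splits the font name shown above on its dashes. The class name XlfdName and the field labels are illustrative choices (the labels loosely follow the X Logical Font Description convention) and are not defined anywhere else in this text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class XlfdName {
    // Labels for the 14 fields, in order (illustrative names).
    static final String[] FIELDS = {
        "foundry", "family", "weight", "slant", "setwidth", "addStyle",
        "pixelSize", "pointSize", "resolutionX", "resolutionY",
        "spacing", "averageWidth", "charsetRegistry", "charsetEncoding"
    };

    // Splits a full X font name into its 14 labeled fields.
    static Map<String, String> parse(String name) {
        if (!name.startsWith("-"))
            throw new IllegalArgumentException("name must start with '-'");
        // A limit of -1 keeps empty fields such as the blank "addStyle".
        String[] parts = name.substring(1).split("-", -1);
        if (parts.length != FIELDS.length)
            throw new IllegalArgumentException(
                "expected 14 fields, got " + parts.length);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++)
            result.put(FIELDS[i], parts[i]);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fields =
            parse("-adobe-utopia-bold-r-normal--12-120-75-75-p-70-iso8859-1");
        for (Map.Entry<String, String> e : fields.entrySet())
            System.out.println(e.getKey() + " = " + e.getValue());
    }
}
```

Note that the 12 in the sample name lands in the pixel-size field, while the point-size field holds 120 because X font names record point sizes in tenths of a point.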
Nowadays fonts have names such as Helvetica
or
Bookman DemiBold
which is much less helpful.
When web designers or Java programmers set a font to use, they
typically don't know what fonts are available on the user's
system.
So if you guess and ask for a font such as Arial
it may or may not
be available.
However all home computers ship with a set of fonts standard for
that platform.
To solve the problem for Java programmers, for every platform Sun
supports they picked 3 available fonts (these are the actual fonts
installed and are called physical
) and gave them
names programmers can use in their programs.
The names are called logical
font names since the
names reflect the type of the font:
Monospaced
(usually sans-serif too, but not necessarily), SansSerif
(a proportional font), and Serif
(also proportional).
So in a program you can choose the system-specific mono-spaced
font for code listings, sans-serif for headings and captions,
and serif for body text.
And you don't need to know the real or physical
name for
that font.
Most home computer systems also have a distinct look and feel to
the pop-up dialog boxes and other system elements (e.g., window
titles).
You can use these as well, since Sun has kindly defined
the system standard fonts for dialog boxes as the Dialog and
DialogInput logical font names,
so you can make your dialogs appear native.
Remember each font is a collection,
an array or vector
of graphics known as glyphs
.
(It's more than that really.)
Each glyph is identified by a number.
For example a capital A
glyph in any font that has such
a glyph is identified by the number 65.
Of course anyone can make their own collection of glyphs as a font,
and identify any glyph with any number.
For Java and most software today the mapping of numbers to glyphs
is the one defined by the Unicode standard.
The numbers are called code points
.
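A minimal sketch of how Java exposes these numbers follows. The class and method names (CodePointDemo, firstCodePoint, fromCodePoint) are made up for illustration; the calls they wrap, String.codePointAt and Character.toChars, are standard library methods.

```java
public class CodePointDemo {
    // Returns the Unicode code point of the first character in s.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    // Builds a one-character string from a code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(firstCodePoint("A"));   // prints 65
        System.out.println(fromCodePoint(65));     // prints A
        // The same number identifies capital A no matter which font draws it.
        System.out.printf("U+%04X%n", firstCodePoint("A"));   // prints U+0041
    }
}
```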
Unicode has defined numbers for many thousands of glyphs and it is unlikely you have a single font file that has every glyph defined by Unicode. This can be a problem since you don't always know if the font you're using has a glyph for all the symbols, arrows, smiley faces, Greek letters, math and engineering symbols, etc., that you might want to use. If that is a problem you can find a font that can display the characters you need. In Java use code such as this:

    // fontList is assumed to be a modifiable List<Font> of candidate fonts.
    // Drop every font that cannot display U+25B6 (a right-pointing triangle).
    for ( Iterator<Font> i = fontList.iterator(); i.hasNext(); ) {
        Font f = i.next();
        if ( ! f.canDisplay( '\u25B6' ) )
            i.remove();
    }

(See UnicodeSymbols.java for a sample applet with source that does this.) Fonts will claim they can display a character if it falls in the covered range of that font, even if there is no glyph for it!
There are a couple of other issues you should know about.
One is that font sizes are usually defined in
points
, which should be about 1/72 of an inch.
This unit worked well with early font technology since dot-matrix
printers and computer monitors had 72 pixels to the inch.
(Horizontally anyway; monitors often use rectangular pixels that
are taller than they are wide.)
A shortcut
was taken for fonts where the font designers
assumed 1 point = 1 pixel.
Today's monitors can use much smaller pixels (often 1/96 of an inch)
that are spaced closer together.
The number of pixels per inch is called the monitor's DPI
(dots per inch).
This is why when you increase a monitor's resolution, most fonts
come out looking tiny.
Some fonts are smart enough to correct for that, so when you specify
a font size that's what you'll see.
For other fonts you (the programmer or web designer) must manually
adjust for the different DPI of screens and even of
printers.
(The Java AWT Toolkit class provides a method, getScreenResolution(),
to find the DPI value, allowing a Java programmer to adjust manually
if needed.)
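The manual adjustment is simple arithmetic: since a point is 1/72 of an inch, a size in points times the DPI, divided by 72, gives a size in pixels. The sketch below shows one way to do it. The class name PointsToPixels, the helper toPixels, and the 96 DPI fallback used when no display is attached are all assumptions made for this example.

```java
import java.awt.GraphicsEnvironment;
import java.awt.Toolkit;

public class PointsToPixels {
    // One point is 1/72 inch, so pixels = points * dpi / 72.
    static int toPixels(double points, int dpi) {
        return (int) Math.round(points * dpi / 72.0);
    }

    public static void main(String[] args) {
        // Assumption: fall back to 96 DPI when there is no screen to ask.
        int dpi = GraphicsEnvironment.isHeadless()
                ? 96
                : Toolkit.getDefaultToolkit().getScreenResolution();
        System.out.println("Screen DPI: " + dpi);
        System.out.println("12 point text needs about "
                + toPixels(12, dpi) + " pixels");
    }
}
```

At 72 DPI the conversion changes nothing (12 points is 12 pixels), which is why the one point equals one pixel shortcut once worked; at 96 DPI the same 12 points needs 16 pixels.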
Another point about using font names:
You may not be able to pick a single physical font that
contains all the Unicode characters, or even a significant number
of them, on some (older) systems.
This is because many physical fonts only define 256 glyphs.
In this case the logical font name may actually refer to several
physical fonts that are stitched together
to define lots
more glyphs than any one standard font available on your system.
For example, with Java 6 on Windows XP, the
file fontconfig.properties defines the logical font
serif
to use the physical font
Times New Roman
for the standard 256 alphabetic glyphs, and
the font MS Mincho
for any Japanese glyphs.
By using the logical font name Serif
you can
specify any Latin, Chinese, Hebrew, Japanese, or Korean
glyph and it will display correctly, even though there isn't
a single physical font that contains all those glyphs in
the Windows XP standard set of fonts.
If you used the physical font Times New Roman
and your
text contained the Unicode number for some Japanese glyph,
it wouldn't display correctly.
(Usually the system displays a square or question-mark in these
cases.)
Still another issue is that different font files store the glyph
data in different formats.
Your software must be able to read the format or it can't use
the font.
Currently Java can read TrueType
and OpenType
font formats, but probably not other formats
such as PostScript Type 1
fonts.
On the other hand older Unix/Linux systems seem to only
recognize PostScript
fonts and not TrueType
or
OpenType
.
(Today most systems can use OpenType.
See www.prepressure.com/fonts/basics/history for a quick
but thorough history of computer font formats.)
In addition, not all font files use Unicode to label the
glyphs; most software needs the Unicode code points
to identify the glyphs.
For this reason you may have installed some font and find
that some software can't use it while other software can.
A final issue is one of encoding text.
Text is composed of a series of numbers that identify glyphs.
This text doesn't change if you change the font you use to render
(i.e., draw) the glyphs.
The problem is that the numbers can take up to 4
bytes each in Unicode.
So how should the text string ABC
be stored?
It might be 4 bytes per number, or two, or
even one, with special rules to handle large numbers.
This is called the text encoding
.
Internally Java uses two byte numbers, with a special
convention to represent characters defined in Unicode with numbers
larger than 65,535 (the biggest value you can store in two
bytes).
This is why the Java data type char
is a two
byte value.
However most operating systems historically used one byte numbers.
This is because before Unicode, most Western fonts contained fewer
than 256 glyphs, more than enough for the Latin alphabet commonly used
in the US and the UK.
When reading or writing text a program must pick the
proper encoding.
Unfortunately there is no easy way to tell what encoding to use, or what encoding is used. If you use a web browser the web page (which is just text) contains a header stating what encoding is used. Try changing that and see the results, especially for curly quotes and bullets.
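What goes wrong with a bad guess can be reproduced in a few lines. This sketch writes a short string containing curly quotes as UTF-8 bytes and then reads the bytes back with two different encodings. The class name EncodingMismatch and the helper roundTrip are invented for the example; the charset constants come from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encodes text with one charset, then decodes the bytes with another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String quoted = "\u201CHi\u201D";   // Hi wrapped in curly quotes

        String good = roundTrip(quoted,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String bad = roundTrip(quoted,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(good.equals(quoted));   // true: same encoding both ways
        System.out.println(bad.equals(quoted));    // false: wrong decoder
        // Each curly quote was stored as 3 bytes, and the one-byte decoder
        // turned every byte into its own character.
        System.out.println(quoted.length() + " characters became " + bad.length());
    }
}
```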
A character
is just an abstract minimal unit
of text.
It doesn't have a fixed shape (that would be a glyph
of some font), and it doesn't have an intrinsic value.
The letter A
is a character and so is €
(the symbol
for the common currency of Germany, France, and numerous other
European countries).
A glyph is an element of writing. Two or more glyphs may represent the same symbol, called a grapheme or character. Glyphs may also be ligatures (compound characters) or diacritics (accent or other marks).
A character set
is a collection of characters.
For example, the Han
character set is the set of
characters originally invented by the Chinese, which have been used
to write Chinese, Japanese, Korean, and Vietnamese.
Other character sets you might have heard of include ASCII
,
ISO Latin 1
(also called by its number,
ISO-8859-1
),
Unicode, or Microsoft's cp-1252
.
A character set is often defined by the symbols used in some
writing system (or script), such as English.
A code point
is a number used in a character
set to identify each character.
A character set with such numbers is called a coded character
set
.
Code point numbers are usually referred to simply as code
points
.
Note that a coded character set defines a range of code points
but may not assign characters to every code point in that range.
(So if you used a for loop to generate all Unicode code points and
displayed them, some would be undefined no matter what font you use.)
This assignment of numbers to characters is sometimes called an
encoding, but that term has other uses.
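The for loop mentioned above might look like the following sketch, which asks the JRE whether each code point in a range is defined rather than trying to display it. The class name DefinedCodePoints and the helper countDefined are illustrative. Character.isDefined is a standard method, but its answers depend on the Unicode version a given JRE supports, so the totals printed by main will vary between Java releases.

```java
public class DefinedCodePoints {
    // Counts the code points in [first, last] that this JRE treats as defined.
    static int countDefined(int first, int last) {
        int count = 0;
        for (int cp = first; cp <= last; cp++)
            if (Character.isDefined(cp))
                count++;
        return count;
    }

    public static void main(String[] args) {
        int total = Character.MAX_CODE_POINT + 1;
        int defined = countDefined(Character.MIN_CODE_POINT,
                                   Character.MAX_CODE_POINT);
        // The gap between the two numbers is the set of unassigned code points.
        System.out.println(defined + " of " + total + " code points are defined");
    }
}
```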
A typeface is a design for a set of glyphs for one or
more character sets in one or more sizes.
All the glyphs in a typeface are designed with stylistic unity.
Put another way, each typeface comprises a coordinated set of
glyphs.
(For example Arial
is a typeface.)
A font is traditionally defined as a set of glyphs in a single size and style of a particular typeface, for some character set.
A font style usually refers to a characteristic of a font
such as italics or bold.
In fact a typeface is a collection of fonts all with
the same design; fonts are sold or distributed in sets named for
the typeface; for example Lucida
is a set of eight fonts.
Because of this a typeface today is usually called a
font family instead.
After the introduction of computer fonts a broader definition of
font
evolved.
Today a font
is no longer size-specific but still refers to a
single style.
Today (and in the computer industry) the term font means the same
thing as typeface.
For example you could today refer to Arial
as a font or as
a typeface, but few outside of the print industry use the term
typeface anymore.
(The term font family is still commonly used.)
Early Unicode versions defined fewer than 65,000 characters and
code points, so using an unsigned two byte number for char
worked well.
Unicode version 4 defined 96,382 characters
and code points, more than fit into 2 bytes each.
(Characters beyond the two-byte range first appeared in version 3.1.)
The set of code points that still fit into a two-byte word is called
the Basic Multilingual Plane
(or BMP).
The others are called supplementary characters
.
(Remember this only applies to Unicode.)
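Java can report which group a code point belongs to. The sketch below uses the standard methods Character.isBmpCodePoint and Character.charCount; the class name PlaneCheck, the helper describe, and the three sample code points are choices made for this illustration.

```java
public class PlaneCheck {
    // Describes where a code point lives and how many Java chars it needs.
    static String describe(int codePoint) {
        String plane = Character.isBmpCodePoint(codePoint)
                ? "BMP" : "supplementary";
        return String.format("U+%04X", codePoint) + ": " + plane + ", "
                + Character.charCount(codePoint) + " char(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe(0x41));      // the letter A
        System.out.println(describe(0x20AC));    // the euro sign
        System.out.println(describe(0x1D11E));   // a musical G clef symbol
    }
}
```

The first two samples fit in a single char; the third lies outside the BMP and needs two.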
Unicode is an evolving standard. The current version (5.1 as of 2009) has defined 240,295 code points, of which 100,713 are assigned characters and the rest are private use characters and non-characters (code points with specific meanings but which don't correspond to a character).
In a string of characters you need to record the Unicode code points.
However few (or no) operating systems can deal with
Unicode's 4 bytes per character code points.
Even Java uses only two byte numbers.
A character encoding scheme
(often just called an
encoding) is used to translate a series of code points (that
is a string of text from the Unicode character set) into a series
of code units
that are one to four bytes for each
code point.
For example the UTF-32 character encoding scheme
uses one 4 byte code unit per code point; basically this stores the
raw code points.
The UTF-16 encoding scheme uses 2 byte code units.
It stores code points from the BMP as is
.
The supplementary code points are translated into pairs of
code units, known as surrogate pairs.
The first code unit of the pair is in the range 55296
(0xD800) to 56319 (0xDBFF), and the second is in the range
56320 (0xDC00) to 57343 (0xDFFF).
(So when reading a file in UTF-16 you can look
at any code unit and see if it is a character from the BMP or
part of a 2 code unit (4 byte) value.)
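The translation into a pair of code units is plain bit arithmetic: subtract 0x10000 from the code point, add the top 10 bits of the result to 0xD800 to get the first unit, and add the bottom 10 bits to 0xDC00 to get the second. The sketch below does this by hand and compares the result with the standard Character.toChars method. The class name SurrogateMath and its two helper methods are illustrative only.

```java
import java.util.Arrays;

public class SurrogateMath {
    // Splits a supplementary code point into its two UTF-16 code units.
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint))
            throw new IllegalArgumentException("not a supplementary code point");
        int offset = codePoint - 0x10000;                 // leaves a 20-bit value
        char high = (char) (0xD800 + (offset >> 10));     // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    // Reverses the split: combines a pair of code units into one code point.
    static int fromSurrogates(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int clef = 0x1D11E;   // a musical G clef symbol, outside the BMP
        char[] pair = toSurrogates(clef);
        System.out.printf("U+%X -> 0x%04X 0x%04X%n",
                clef, (int) pair[0], (int) pair[1]);
        // The library method should agree with the hand calculation.
        System.out.println(Arrays.equals(pair, Character.toChars(clef)));
    }
}
```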
Most common is the UTF-8
character
encoding scheme, which uses 1 byte code units and represents
each character with 1 to 4 code units.
Other character encoding schemes are common as well, such as ISO-8859-1. (Note that encoding schemes often use the same name as a standard character set; this is not a coincidence!)
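To compare these schemes, the following sketch measures how many bytes a single code point occupies in UTF-8 and in UTF-16. The class name EncodedSizes, its helper methods, and the four sample code points are assumptions for illustration. The big-endian UTF-16 charset is used so that no byte-order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Number of bytes needed to store one code point in UTF-8.
    static int utf8Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to store one code point in UTF-16
    // (big-endian, without a byte-order mark).
    static int utf16Bytes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Samples: A, e with acute accent, euro sign, musical G clef.
        int[] samples = { 0x41, 0xE9, 0x20AC, 0x1D11E };
        for (int cp : samples)
            System.out.println(String.format("U+%04X", cp)
                    + ": UTF-8 " + utf8Bytes(cp) + " byte(s)"
                    + ", UTF-16 " + utf16Bytes(cp) + " byte(s)");
    }
}
```

For these four samples UTF-8 needs 1, 2, 3, and 4 bytes respectively, while UTF-16 needs 2 bytes for each of the first three and 4 for the last.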
Microsoft created a set of fonts that it hoped would be widely
distributed with all operating systems.
Known as the
core web fonts
these are included with Windows and
Mac OS X, and they are freely downloadable for Linux.
The collection includes 10 typefaces:
the popular Verdana and Georgia, reworked versions of Times and Courier,
Trebuchet MS, Andale Mono (has distinctive glyphs for commonly
confused letters such as oh and zero), Impact, the Helvetica-esque
Arial, the Webdings dingbat font, and the seldom-used Comic Sans.
Besides these the JRE includes the Lucida family of fonts.
These typefaces were specifically designed for screen use and have since become the most commonly used typefaces on the Web. While quite serviceable, such a small set of fonts is limiting to designers. Newer web browsers support downloadable fonts using CSS or JavaScript, such as those from openfontlibrary.fontly.org.
There is a whole lot more to the story including ligatures, kerning, leading, and other fascinating (to me anyway) facts and history. (Did you know that originally printers (human ones) traveled with cases containing little wooden or lead font blocks? The capital letters were used much less often than the others and were stored in the top or upper part of the case while the rest were kept in the more convenient lower part of the case, and that's how we got the terms lowercase and uppercase letters. Is that interesting or what?)
Modern AWT does include classes and methods to list all available fonts installed so you can look for specific (physical) fonts. However in AWT you are limited to only using the fonts chosen for the logical font names! With Swing you can use any font available:
    import java.awt.*;

    public class ShowFonts {
        public static void main ( String [] args ) {
            Font[] fonts = GraphicsEnvironment.getLocalGraphicsEnvironment()
                                              .getAllFonts();
            for ( int i = 0; i < fonts.length; ++i ) {
                System.out.print( fonts[i].getFontName() + " : " );
                System.out.print( fonts[i].getFamily() + " : " );
                System.out.println( fonts[i].getName() );
            }

            System.out.println( "\n\n\tAvailable Fonts:\n" );
            String[] names = GraphicsEnvironment.getLocalGraphicsEnvironment()
                                                .getAvailableFontFamilyNames();
            for ( int i = 0; i < names.length; ++i )
                System.out.println( names[i] );
        }  // end of main
    }
Still it may be a problem to not know the actual font used by some
logical font name, as different proportional fonts can do line breaks
in different places and mess up the carefully planned appearance of
your application or applet.
For this reason the JRE ships with a set of related
fonts called Lucida
.
These physical fonts are available in all Sun JREs
and include mono-spaced, Sans-Serif, and Serif versions.
(Look in .../jre/lib/fonts
on your system.)
In short Sun has identified three of the fonts on each platform and given them logical names. You can use one of these three or the platform-specific dialog fonts (making five logical font names in all), or pick some actual font name (a physical font) and hope it is available. It will be if you pick Lucida and you have a Sun JRE.
Actually using fonts in Java is easy:
Font titleFont = new Font( name, style, size );
where name
is a logical font name or the name of a
real font installed on your system, style
is one of:
Font.PLAIN
Font.BOLD
Font.ITALIC
Font.BOLD+Font.ITALIC
(or equivalently, Font.BOLD|Font.ITALIC
)
and size
is the size specified in points, which is
probably really pixels on most monitors.
Note the size is the average height of the alphabetic glyphs.
The styles are limiting: you can't specify a demi
weight or
slanted
instead of italic in Java.
However most physical font files are named for the actual
style: Regular, DemiBold, or Bold.
So you can pick a physical font name including the whole style
part of the name.
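Putting the pieces together, here is a minimal sketch that creates one font for each of the three main logical names and reports its style. The class name FontStyles, the helper styleLabel, and the sizes are arbitrary choices for the example; Font.SERIF, Font.SANS_SERIF, and Font.MONOSPACED are the standard constants (available since Java 6) for the logical font names.

```java
import java.awt.Font;

public class FontStyles {
    // Turns a style value into a readable label.
    static String styleLabel(int style) {
        boolean bold = (style & Font.BOLD) != 0;
        boolean italic = (style & Font.ITALIC) != 0;
        if (bold && italic) return "bold italic";
        if (bold) return "bold";
        if (italic) return "italic";
        return "plain";
    }

    public static void main(String[] args) {
        // Logical names should resolve on every platform.
        Font body = new Font(Font.SERIF, Font.PLAIN, 12);
        Font heading = new Font(Font.SANS_SERIF, Font.BOLD | Font.ITALIC, 18);
        Font code = new Font(Font.MONOSPACED, Font.PLAIN, 12);

        for (Font f : new Font[] { body, heading, code })
            System.out.println(f.getName() + ", " + styleLabel(f.getStyle())
                    + ", size " + f.getSize());
    }
}
```

Because the style is a set of bit flags, adding the constants and combining them with the | operator give the same value, and the label helper only needs to test each bit.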
OpenType fonts and CSS font properties
use a system known as
PANOSE
to specify font characteristics.
For example, the weight of a font can be specified as one of
these 9 values (from lightest to heaviest): 100, 200, 300, 400,
500, 600, 700, 800, 900.
400
usually corresponds to a font's normal
weight,
but there is no standard mapping of terms such as bold
or
demi
to these numbers.