A Detailed Look at Java's Polymorphism Implementation

A normal (non-polymorphic) method's address is determined at compile time, and the bytecode instruction to invoke it can call the method directly.  This is sometimes called early binding (or confusingly, static binding) because a method name is bound to a memory address at compile time.  This is efficient but it isn't always convenient!  Sometimes it isn't clear what the type of some variable should be until we run the program, because it might depend on user input, random numbers, or other external data such as from a file.  (Alright, in Java we should technically say “the type of the object some reference variable refers to”, but that's too long.)
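To make the early/late distinction concrete, here is a minimal sketch using two made-up classes, Parent and Child. In Java, static methods are one example of early binding. Which one runs is fixed at compile time by the declared type of the expression and never depends on the object. Instance methods, by contrast, are late bound.

```java
// A minimal sketch contrasting early and late binding.
// Parent and Child are made-up classes for illustration only.
public class BindingDemo {
    static class Parent {
        // Static methods are not polymorphic: the call is chosen from
        // the declared type of the expression (early binding).
        static String describe() { return "static Parent.describe"; }

        // Instance methods are polymorphic: the implementation is chosen
        // from the class of the actual object at run time (late binding).
        String name() { return "Parent.name"; }
    }

    static class Child extends Parent {
        // This hides Parent.describe(); it does not override it.
        static String describe() { return "static Child.describe"; }

        @Override
        String name() { return "Child.name"; }
    }

    public static void main(String[] args) {
        Parent p = new Child();  // declared type Parent, actual class Child

        // Calling a static method through a reference compiles (javac may warn),
        // but only the declared type Parent matters here.
        System.out.println(p.describe());  // static Parent.describe
        System.out.println(p.name());      // Child.name
    }
}
```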

Consider this code:

// Written 12/2007 by Wayne Pollock, Tampa Florida USA,
// From an idea posted in comp.lang.java.programmer on 12/23/07
// by Michael Jung ("Re: Polymorphism in Java SE?")

import java.text.NumberFormat;

public class PolymorphismDemo
{
    public static void main ( String [] args ) throws Exception
    {
        if ( args.length == 0 )
        {   System.out.println(
               "Usage: java PolymorphismDemo <some-number>" );
            return;
        }
        Number num = NumberFormat.getInstance().parse( args[0] );

        System.out.println( "The number " + num.toString() +
           " has class " + num.getClass() );
    }
}

The type (or class) of the Number object returned by parse depends on whether the argument is an integer (in which case it's a Long object) or a floating point number (in which case it's a Double object).  At compile time there is no way to know which toString method to use!  It depends on which type of object the variable num refers to, either Long.toString or Double.toString, and in this program there's no way to know that until run time:

C:\Temp>javac PolymorphismDemo.java

C:\Temp>java PolymorphismDemo 123
The number 123 has class class java.lang.Long

C:\Temp>java PolymorphismDemo 123.456
The number 123.456 has class class java.lang.Double

So what can the compiler do?  When compiling the main method it can't bind the method name toString (used in the last statement of main) to an address.  Instead the compiler defers the binding until run time, by emitting bytecode that will look up the address of the correct toString method.  This is why polymorphism is also known as late binding, delayed binding, or dynamic binding.  (In some languages, such as C++, polymorphic methods are known as virtual methods or virtual functions.)

Without polymorphism this program would be more complicated.  You'd have to write some custom method to convert the args[0] String to a number that somehow also returns a type name.  You would then need a switch statement or an if chain to test the type name and call the right method yourself (i.e., either Long.toString(x) or Double.toString(x)).  This sort of code is ugly, easily broken, and hard to maintain or enhance, since every new numeric type needs yet another case.  That is the main reason why Java supports polymorphic methods.
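For comparison, here is a sketch of the hand-written dispatch described above. It uses instanceof tests in place of a separate type-name result and assumes Locale.US so that "." is the decimal separator; the class and method names are made up for illustration.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// A sketch of the "no polymorphism" alternative: test the type by hand
// and call the right static conversion method ourselves.
public class ManualDispatch {
    // Hypothetical helper: picks a conversion method with an if chain.
    static String describeByHand(Number num) {
        if (num instanceof Long) {
            return Long.toString(num.longValue());
        } else if (num instanceof Double) {
            return Double.toString(num.doubleValue());
        } else {
            // Every new Number subclass needs another branch here.
            throw new IllegalArgumentException("unsupported type: " + num.getClass());
        }
    }

    // The polymorphic version: the JVM picks Long.toString or Double.toString.
    static String describePolymorphically(Number num) {
        return num.toString();
    }

    public static void main(String[] args) throws ParseException {
        NumberFormat nf = NumberFormat.getInstance(Locale.US);  // assumed locale
        for (String s : new String[] { "123", "123.456" }) {
            Number num = nf.parse(s);  // a Long for "123", a Double for "123.456"
            // Both approaches produce the same text for these two types.
            System.out.println(describeByHand(num) + " " + describePolymorphically(num));
        }
    }
}
```

The polymorphic version needs no changes when a new Number subclass appears, while the if chain rejects anything it was not written for.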

How It Works

The addresses of an object's polymorphic methods are stored in a method table.  A method table contains the names and addresses of an object's dynamically bound (polymorphic) methods, and when a polymorphic method is invoked at run time, the method name is looked up in this table to get the address.  The method table is the same for all objects belonging to the same class, so rather than being copied into every object it is stored once per class, conceptually in the Class object for the object's type (here Long or Double), and each object just refers to its class.  (In other languages, such as C++, method tables are called vtables.)
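The idea of a per-class table that is looked up by name can be modeled in a few lines of Java. This is a toy sketch, not how any real JVM is written: ClassInfo, ToyObject, and invoke are hypothetical names, and a Java Map stands in for the JVM's internal table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A toy model of per-class method tables. ClassInfo, ToyObject, and invoke
// are hypothetical names; a real JVM uses its own internal structures.
public class MethodTableModel {
    // One ClassInfo per class; its method table is shared by all instances.
    static final class ClassInfo {
        // Method name -> code implementing that method for this class.
        final Map<String, Function<ToyObject, String>> methodTable = new HashMap<>();
    }

    // An object stores only a reference to its class, not its own table.
    static final class ToyObject {
        final ClassInfo cls;
        final Object value;

        ToyObject(ClassInfo cls, Object value) {
            this.cls = cls;
            this.value = value;
        }
    }

    // Late binding: find the implementation in the receiver's class at run time.
    static String invoke(ToyObject receiver, String methodName) {
        Function<ToyObject, String> impl = receiver.cls.methodTable.get(methodName);
        if (impl == null) {
            throw new IllegalStateException("no such method: " + methodName);
        }
        return impl.apply(receiver);
    }

    public static void main(String[] args) {
        ClassInfo longClass = new ClassInfo();
        longClass.methodTable.put("toString", self -> "Long " + self.value);
        ClassInfo doubleClass = new ClassInfo();
        doubleClass.methodTable.put("toString", self -> "Double " + self.value);

        // The same call site works for either class of object.
        for (ToyObject obj : new ToyObject[] {
                new ToyObject(longClass, 123L), new ToyObject(doubleClass, 123.456) }) {
            System.out.println(invoke(obj, "toString"));
        }
    }
}
```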

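Method tables are not required by the Java language or the JVM specification, though JVM implementations commonly use them.  (Different JVM vendors are free to implement polymorphism any way they please, as long as the end result is the same.)  What you can see with the command “javap -verbose foo” is a class's “constant pool”: a table of constants and symbolic references to classes, fields, and methods (a method reference holds a class name, a method name, and a type descriptor).  The JVM resolves these symbolic references, and builds any method tables in its own internal data structures, when it loads and links the class.  (Since all objects of the same class have the same method table, the JVM needs only one copy per class.)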

It is illustrative to see how the system constructs a method table for some class such as Integer.  Initially the method table is empty.  Then it is filled with the polymorphic methods of the most distant ancestor class, which is always the Object class:

Integer Method Table part 1
Method Name        Address  Comment
Object.toString    111      Object.toString method address
...                ...      10 additional methods

This list is added to (and existing entries modified) by the polymorphic methods of the next most distant ancestor class, here the Number class.  If you look at the API documentation (Javadoc), you'll find that Number doesn't override any methods but does add six new ones to the table (byteValue, doubleValue, floatValue, intValue, longValue, and shortValue).  So the toString entry in the method table is left unmodified:

Integer Method Table part 2
Method Name        Address  Comment
Object.toString    111      Object.toString method address (unchanged)
Number.intValue    222      Number.intValue method address
...                ...      15 additional methods
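Whether a class overrides toString can be checked with the standard reflection API. Class.getMethod returns a public method the class has (declared or inherited), and Method.getDeclaringClass reports which class declared it. A small sketch (WhoDeclaresToString and toStringOwner are made-up names):

```java
import java.lang.reflect.Method;

// Uses the standard reflection API to report which class declares the
// public no-argument toString() that a class has.
public class WhoDeclaresToString {
    static String toStringOwner(Class<?> c) {
        try {
            return c.getMethod("toString").getDeclaringClass().getSimpleName();
        } catch (NoSuchMethodException e) {
            // Not expected for classes: all inherit a public toString() from Object.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Number gets toString from " + toStringOwner(Number.class));
        System.out.println("Integer gets toString from " + toStringOwner(Integer.class));

        // The methods Number itself declares (order is unspecified).
        for (Method m : Number.class.getDeclaredMethods()) {
            System.out.println("Number declares " + m.getName());
        }
    }
}
```

For Number this reports Object, matching the unmodified toString entry in the table above, while Integer reports Integer.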

This continues down the inheritance chain until the method tables of all ancestor classes have been merged (for Integer, there are no others after Number).  Finally, the method table is updated once more with the Integer class's own polymorphic methods.  Now toString is overridden:

Integer Method Table part 3
Method Name        Address  Comment
Object.toString    333      Integer.toString method address
Number.intValue    444      Integer.intValue method address
Number.longValue   555      Integer.longValue method address
...                ...      Additional methods

The "internal" method names in the table include the name of the class that first declared the method.  Overriding a method merely changes the address in its method table slot; it doesn't change the slot's name.  This is why the javap output shows the polymorphic method name as Object.toString and not merely toString, Long.toString, or Double.toString.
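The table-building steps can be mimicked with an ordinary LinkedHashMap. This toy sketch models only a handful of methods, matches slots by name alone (a real JVM also compares parameter and return types), and uses the implementing method's name as a stand-in for its address; BuildMethodTable and its helpers are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A toy sketch of building Integer's method table, most distant ancestor first.
// Slots are matched by name only, and the "address" is just the name of
// the implementing method.
public class BuildMethodTable {
    // Merge one class's polymorphic methods into the table.
    static void applyClass(Map<String, String> table, String className, String... methods) {
        for (String method : methods) {
            String slot = findSlot(table, method);
            if (slot == null) {
                // A new method gets a new slot named after the declaring class.
                slot = className + "." + method;
            }
            // Overriding only replaces the address; the slot name stays the same.
            table.put(slot, className + "." + method);
        }
    }

    // Find an existing slot for a method name, or null if there is none.
    static String findSlot(Map<String, String> table, String method) {
        for (String slot : table.keySet()) {
            if (slot.endsWith("." + method)) {
                return slot;
            }
        }
        return null;
    }

    static Map<String, String> buildIntegerTable() {
        Map<String, String> table = new LinkedHashMap<>();  // keeps slot order
        applyClass(table, "Object", "toString", "clone");      // most distant ancestor
        applyClass(table, "Number", "intValue", "longValue");  // then Number
        applyClass(table, "Integer", "toString", "intValue", "longValue", "compareTo");
        return table;
    }

    public static void main(String[] args) {
        buildIntegerTable().forEach((slot, address) ->
                System.out.println(slot + " -> " + address));
    }
}
```

Running it prints five slots in order: Object.toString -> Integer.toString, Object.clone -> Object.clone, Number.intValue -> Integer.intValue, Number.longValue -> Integer.longValue, and Integer.compareTo -> Integer.compareTo. Only compareTo, which Integer adds rather than overrides, gets a slot named after Integer.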

Take a look at the bytecode for this method call (use “javap -verbose PolymorphismDemo”), which reads:

   40:  invokevirtual   #11; //Method java/lang/Object.toString:()Ljava/lang/String;

The “#11” is an index into the class's “constant pool” (as mentioned above); entry #11 names the method Object.toString, which takes no arguments and returns a String (that is what the descriptor “()Ljava/lang/String;” means).  The invokevirtual bytecode instruction causes the JVM to treat the entry at #11 not as a fixed address (as it would be for early binding), but as the name of a method to look up in the method table of the object the method is invoked on, here whatever object num refers to.  And that's how it works!
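The same dispatch rule can be observed from ordinary Java code through the java.lang.invoke API. A method handle obtained from MethodHandles.Lookup.findVirtual for Object.toString dispatches on the class of its receiver, much like an invokevirtual of the same symbolic reference. This is a sketch; VirtualLookupDemo and callToString are made-up names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// A method handle for the symbolic reference Object.toString:()String.
// Like invokevirtual, it picks the implementation from the receiver's class.
public class VirtualLookupDemo {
    static final MethodHandle OBJECT_TO_STRING;

    static {
        try {
            OBJECT_TO_STRING = MethodHandles.lookup().findVirtual(
                    Object.class, "toString", MethodType.methodType(String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static String callToString(Object receiver) {
        try {
            return (String) OBJECT_TO_STRING.invoke(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callToString(123L));     // 123 (Long.toString runs)
        System.out.println(callToString(123.456));  // 123.456 (Double.toString runs)
    }
}
```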