Serialization

Using a DataOutputStream, you could write an application that saves the data content of your objects one at a time as simple types. However, Java provides a more powerful mechanism called object serialization that does almost all of the work for you. In its simplest form, object serialization is an automatic way to save and load the state of an object. It also has greater depths than we can plumb within the scope of this book, including complete control over the serialization process and interesting twists such as class versioning.

Basically, an instance of any class that implements the Serializable interface can be saved to and restored from a stream. The stream classes ObjectOutputStream and ObjectInputStream are used to write (serialize) and read (deserialize) primitive types and objects. Subclasses of Serializable classes are also serializable. The default serialization mechanism saves the value of all of the object’s fields (public and private), except those that are static and those marked transient.

One of the most important (and tricky) things about serialization is that when an object is serialized, any objects it references are also serialized. Serialization can capture entire “graphs” of interconnected objects and put them back together on the receiving end (we’ll demonstrate this in an upcoming example). The implication is that any object we serialize must contain only references to other Serializable objects; otherwise, writeObject() throws a NotSerializableException. We can prune the graph and limit the extent of what is serialized by marking nonserializable variables as transient or by overriding the default serialization mechanisms. The transient modifier can be applied to any instance variable to indicate that its contents are not useful outside of the current context and should not be saved.
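As a rough sketch of these two points, the following self-contained program serializes a small graph to an in-memory buffer and reads it back. The GraphDemo, Author, and Book classes and the roundTrip() helper are hypothetical names invented for this illustration; they do not appear elsewhere in the book.

```java
import java.io.*;

public class GraphDemo {
    // Hypothetical classes, used only for this illustration.
    static class Author implements Serializable {
        String name;
        Author(String name) { this.name = name; }
    }

    static class Book implements Serializable {
        String title;
        Author author;                 // followed and serialized with the Book
        transient String scratchNote;  // skipped by default serialization
        Book(String title, Author author) {
            this.title = title;
            this.author = author;
        }
    }

    // Serializes obj to an in-memory buffer and reads a copy back.
    static Object roundTrip(Object obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Author author = new Author("Gabriel Garcia Marquez");
        Book first = new Book("First", author);
        first.scratchNote = "not saved";
        Book second = new Book("Second", author);

        Book[] copy = (Book[]) roundTrip(new Book[] { first, second });

        // The referenced Author was serialized along with the Book.
        System.out.println(copy[0].author.name);
        // The transient field comes back as null.
        System.out.println(copy[0].scratchNote);
        // Both copies still share one Author, so this prints true.
        System.out.println(copy[0].author == copy[1].author);
    }
}
```

Because both Book objects are written in a single writeObject() call, the stream records the shared Author only once, and the copies end up pointing at the same restored instance.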

In the following example, we create a Hashtable and write it to a disk file called hash.ser. The Hashtable object is already serializable because it implements the Serializable interface.

    import java.io.*;
    import java.util.*;

    public class Save {
      public static void main(String[] args) {
        Hashtable<String, Object> hash = new Hashtable<>();
        hash.put("string", "Gabriel Garcia Marquez");
        hash.put("int", Integer.valueOf(26));
        hash.put("double", Double.valueOf(Math.PI));

        // try-with-resources closes both streams, even if writeObject() fails
        try (FileOutputStream fileOut = new FileOutputStream( "hash.ser" );
             ObjectOutputStream out = new ObjectOutputStream( fileOut )) {
          out.writeObject( hash );
        }
        catch (Exception e) {
          System.out.println(e);
        }
      }
    }

First, we construct a Hashtable with a few elements in it. Then, inside the try statement, we write the Hashtable to a file called hash.ser, using the writeObject() method of ObjectOutputStream. The ObjectOutputStream class is a lot like the DataOutputStream class, except that it includes the powerful writeObject() method.

The Hashtable that we created has internal references to the items it contains. Thus, these components are automatically serialized along with the Hashtable. We’ll see this in the next example when we deserialize the Hashtable.

    import java.io.*;
    import java.util.*;

    public class Load {
      public static void main(String[] args) {
        // try-with-resources closes both streams when we are done reading
        try (FileInputStream fileIn = new FileInputStream("hash.ser");
             ObjectInputStream in = new ObjectInputStream(fileIn)) {
          Hashtable<?, ?> hash = (Hashtable<?, ?>)in.readObject();
          System.out.println( hash.toString() );
        }
        catch (Exception e) {
          System.out.println(e);
        }
      }
    }

In this example, we read the Hashtable from the hash.ser file, using the readObject() method of ObjectInputStream. The ObjectInputStream class is a lot like DataInputStream, except that it includes the readObject() method. The return type of readObject() is Object, so we need to cast it to a Hashtable. Finally, we print the contents of the Hashtable using its toString() method.

Initialization with readObject()

Often, simple deserialization alone is not enough to reconstruct the full state of an object. For example, the object may have had transient fields representing state that could not be serialized, such as network connections, event registration, or decoded image data. Objects have an opportunity to do their own setup after deserialization by implementing a special method named readObject().

This method is not to be confused with the readObject() method of ObjectInputStream; it is implemented by the serializable object itself. To be recognized and used, the readObject() method must have a specific signature, and it must be private. The following snippet is taken from an animated JavaBean that we’ll talk about in Chapter 22:

    private void readObject(ObjectInputStream s)
        throws IOException, ClassNotFoundException
    {
        s.defaultReadObject();
        initialize();
        if ( isRunning )
            start();
    }

When the readObject() method with this signature exists in an object, it is called during the deserialization process. The argument to the method is the ObjectInputStream doing the object construction. We delegate to its defaultReadObject() method to do the normal deserialization from the stream and then do our custom setup. In this case, we call one of our methods named initialize() and, depending on our state, a method called start().
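To see the same pattern in a form you can run on its own, here is a minimal sketch. The ReadObjectDemo and Reading classes are hypothetical stand-ins for the JavaBean above; the transient label field plays the role of state that cannot be serialized and must be rebuilt.

```java
import java.io.*;

public class ReadObjectDemo {
    // Hypothetical class for illustration only.
    static class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        private double celsius;
        private transient String label;   // derived state, never written

        Reading(double celsius) {
            this.celsius = celsius;
            initialize();
        }

        private void initialize() {
            label = celsius + " C";
        }

        String getLabel() { return label; }

        // Called by ObjectInputStream during deserialization.
        private void readObject(ObjectInputStream s)
                throws IOException, ClassNotFoundException {
            s.defaultReadObject();   // restores celsius
            initialize();            // rebuilds the transient label
        }
    }

    // Serializes a Reading to memory and reads a copy back.
    static Reading roundTrip(Reading r)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(r);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Reading) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Reading copy = roundTrip(new Reading(21.5));
        System.out.println(copy.getLabel());   // prints "21.5 C"
    }
}
```

Without the private readObject() method, the copy’s label would be null, because constructors of a Serializable class are not run during deserialization.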

Using a custom implementation of readObject() and a corresponding writeObject() method, we could take complete control of the serialized form of the object by reading and writing to the stream using lower-level write operations (bytes, strings, etc.) instead of delegating to the default implementation as we did before.
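One way this might look is sketched below. The CustomFormDemo and Point classes are hypothetical. The fields are marked transient so that the default mechanism ignores them, and the paired writeObject() and readObject() methods write and read the values by hand. The sketch still calls defaultWriteObject() and defaultReadObject() first, which is a conventional precaution rather than something this chapter requires.

```java
import java.io.*;

public class CustomFormDemo {
    // Hypothetical class for illustration only.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        transient int x;   // transient: we write these ourselves
        transient int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        private void writeObject(ObjectOutputStream s) throws IOException {
            s.defaultWriteObject();   // writes no fields here; all are transient
            s.writeInt(x);            // our own serialized form: two ints
            s.writeInt(y);
        }

        private void readObject(ObjectInputStream s)
                throws IOException, ClassNotFoundException {
            s.defaultReadObject();
            x = s.readInt();          // must mirror the order used when writing
            y = s.readInt();
        }
    }

    // Serializes a Point to memory and reads a copy back.
    static Point roundTrip(Point p)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Point) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);   // prints "3,4"
    }
}
```

The two methods must agree exactly on what is written and in what order; the stream itself does not check that the reads match the writes.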

We’ll talk a little more about serialization in Chapter 22 when we discuss JavaBeans.

SerialVersionUID

Java object serialization was designed to accommodate certain kinds of compatible class changes or evolution in the structure of classes. For example, changing the methods of a class does not necessarily mean that its serialized representation must change because only the data of variables is stored. Nor would simply adding a new field to a class necessarily prohibit us from loading an old serialized version of the class. We could simply allow the new variable to take its default value. By default, however, Java is very picky and errs on the side of caution. If you make any kind of change to the structure of your class, by default you’ll get an InvalidClassException when trying to read previously serialized forms of the class.

Java detects these changes by computing a hash of the structure of the class and storing the resulting 64-bit value, called the Serial Version UID (SUID), along with the serialized data. When the data is read back, Java compares the stored value with the one for the class it has loaded.

Java allows us to take control of this process by looking for a special, magic field in our classes that looks like the following:

    static final long serialVersionUID = -6849794470754667710L;

(The value is, of course, different for every class.) If it finds this static serialVersionUID long field in the class, it uses its value instead of performing the hash on the class. This value will be written out with serialized versions of the class and used for comparison when they are deserialized. This means that we are now in control of which versions of the class are compatible with which serialized representations. For example, we can create our serializable class from the beginning with our own SUID and then increment it only if we make a truly incompatible change and want to prevent older forms of the class from being loaded:

    class MyDataObject implements Serializable {
        static final long serialVersionUID = 1; // Version 1
        ...
    }

A utility called serialver that comes with the JDK allows you to calculate the hash that Java would otherwise use for the class. This is necessary if you did not plan ahead and already have serialized objects stored and need to modify the class afterward. Running the serialver command on the class displays the SUID that is necessary to match the value already stored:

    % serialver SomeObject
     
    static final long serialVersionUID = -6849794470754667710L;

By placing this value into your class, you can “freeze” the SUID at the specified value, allowing the class to change without affecting versioning.
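The same value can also be inspected from within a program using the java.io.ObjectStreamClass class. The following is a small sketch; the ShowSuid, Versioned, and Unversioned classes are hypothetical names used only for this illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class ShowSuid {
    // Declares its own SUID, so no hash is computed.
    static class Versioned implements Serializable {
        static final long serialVersionUID = 1L;
        String name;
    }

    // No declared SUID, so Java computes one from the class structure.
    static class Unversioned implements Serializable {
        String name;
    }

    // Returns the SUID Java would use for the given serializable class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Versioned.class));     // prints 1
        System.out.println(suidOf(Unversioned.class));   // the computed hash
        // lookup() returns null for classes that are not serializable.
        System.out.println(ObjectStreamClass.lookup(Object.class));
    }
}
```

The value printed for the class without a declared field is the computed hash, which changes when the structure of the class changes.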