Chapter 19 Persistence

What Is Persistence?
Forms of Persistence (in Java)
Implementing a Simple File-Based Persistent Store
The PersistentJava (PJava) Project
Summary

Persistence in an object-oriented programming language deals with the ability of objects to exist beyond the lifetime of the program in which they were created. This chapter addresses the topic of persistence from a number of perspectives.

First, it looks at what persistence is and what it means for Java objects to be persistent. An overview of several forms of persistence is presented.

Then the chapter delves into implementing file-based persistence, a strategy in which the programmer does most of the work to store objects persistently in a file. A Persistent framework is also introduced to give developers a structure in which to implement persistence in their own classes.

Finally, the chapter covers the subject of Persistent Java (PJava), a research project at the University of Glasgow. This project's stated goals include building a prototype persistent storage interface for implementing orthogonal persistence in Java. An overview of persistent stores is presented prior to the discussion of PJava.

What Is Persistence?

Persistence describes something that exists beyond its expected lifetime. As applied to an object-oriented programming language, persistence describes objects that exist for an extended period of time, often beyond the lifetime of the original program that created the objects.

Object Lifetime

New Java programmers learn that objects have a lifetime. An object begins its life when created by the new operator (for example, new String("hi")). After it is created, the object exists until destroyed by the Java Virtual Machine's garbage collector. (An object can be garbage collected only when the Java program no longer holds a reference to the object.) Objects can also be destroyed implicitly, when the Java program ends. This code snippet demonstrates the essential concepts of Java object lifetimes:

{
  Date d = new Date();               // Date object starts its life
  System.out.println(d.toString());
}                                    // Date object is no longer reachable, and may be destroyed

In this example, a new Date is created within a program block ({}) and stored in a variable (d) local to that block. Upon reaching the ending curly brace (}), the local variable d exists no longer. From that moment, the Date object that was created is no longer reachable and may be garbage collected.

Persistence as Extending an Object's Lifetime

Persistence is a way to extend the lifetime of an object beyond the lifetime of the program that created it. To understand why it is useful to have persistent objects, consider an AddressBook class that contains names, addresses, and telephone numbers:

public class AddressBook {
  public String[] names = null;
  public String[] addresses = null;
  public String[] phonenums = null;
}

A person writes information in an address book so that it is available at a later date, when the information is needed. Most people are unlikely to remember addresses and telephone numbers, so they write that information into a book. If you try to use the AddressBook class to represent a real address book, you will find that it does not support the "save it now, use it later" paradigm. All instances of the AddressBook class are destroyed when the Java program ends.

To be useful, an AddressBook object must exist for an extended period of time. It must be persistent (probably for years). Every time the user looks up, adds, or modifies address information, the AddressBook object is needed. Because the program that uses the AddressBook isn't always running, the AddressBook must be preserved during the time the program is not running.

Persistence is usually implemented by preserving the state (attributes) of an object between executions of the program. To preserve state, the object is converted to a sequence of bytes and stored on a form of long term media (usually, a disk). When the object is needed again, it is restored from the long term media; the restoration process creates a new Java object that is identical to the original. Although the restored object is not "the same object," its state and behavior are identical. (Object identity in a persistent system is an important issue, and is discussed in greater detail later in this chapter.) The following example outlines an API for a helper class that might be used to provide save and restore capabilities for AddressBook objects:

class AddressBookHelper {
  public static void store(AddressBook book, File file) {...}
  public static AddressBook restore(File file) {...}
}

To save an AddressBook to a file, you must explicitly write a few lines of code to store the object. The code might look like the following:

File output = new File("address.book");     // persistent media
AddressBookHelper.store(addrBook, output);

Restoring an AddressBook from a file would look similar:

File input = new File("address.book");      // persistent media
AddressBook addrBook = AddressBookHelper.restore(input);
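The helper API above leaves the method bodies open. As a minimal sketch of one way they could be filled in, the following example stores the three arrays with a DataOutputStream and restores them with a DataInputStream. The file layout (an entry count followed by name, address, and phone number for each entry), the assumption that the three arrays have the same length, the added throws IOException clauses, and the wrapper class AddressBookSketch are all choices made for this illustration, not part of the original API outline.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class AddressBookSketch {
    static class AddressBook {
        public String[] names = null;
        public String[] addresses = null;
        public String[] phonenums = null;
    }

    static class AddressBookHelper {
        // Assumed layout: entry count, then name, address, and phone number per entry.
        public static void store(AddressBook book, File file) throws IOException {
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
                out.writeInt(book.names.length);
                for (int i = 0; i < book.names.length; i++) {
                    out.writeUTF(book.names[i]);
                    out.writeUTF(book.addresses[i]);
                    out.writeUTF(book.phonenums[i]);
                }
            }
        }

        // Builds a new AddressBook whose state matches what store() wrote.
        public static AddressBook restore(File file) throws IOException {
            try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
                int count = in.readInt();
                AddressBook book = new AddressBook();
                book.names = new String[count];
                book.addresses = new String[count];
                book.phonenums = new String[count];
                for (int i = 0; i < count; i++) {
                    book.names[i] = in.readUTF();
                    book.addresses[i] = in.readUTF();
                    book.phonenums[i] = in.readUTF();
                }
                return book;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        AddressBook book = new AddressBook();
        book.names = new String[] {"Ada"};
        book.addresses = new String[] {"1 Main St"};
        book.phonenums = new String[] {"555-0100"};

        File file = File.createTempFile("address", ".book");
        file.deleteOnExit();
        AddressBookHelper.store(book, file);
        AddressBook copy = AddressBookHelper.restore(file);

        System.out.println(copy.names[0] + ", " + copy.phonenums[0]); // prints "Ada, 555-0100"
        System.out.println(copy == book);                             // prints "false"
    }
}
```

The last line of main() illustrates the point made earlier: the restored object is a different object from the original, even though its state is the same.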

Forms of Persistence (in Java)

There are several forms of persistence available to Java programmers. The forms discussed in this chapter include file-based persistence, relational databases, and object databases. These forms of persistence differ in several categories, including: logical organization of an object's state, the amount of work required of the application programmer to support persistence, concurrent access to the persistent object (from different processes), and support for transactional commit and rollback semantics.

Files

Files are often used to store information between invocations of a program. Data stored in a file may be simple (a text file), or it may be complex (a circuit diagram). In daily use of a computer, you often interact with objects that are stored in files (word processing documents, spreadsheets, network diagrams, and so on).

Files can be used as the basis for a persistence scheme in Java. Although Java 1.0 does not support a built-in mechanism to store objects in files, Java 1.0 does provide a portable streaming library (DataInput and DataOutput). This library makes it easier for the programmer to save and restore objects.

A file-based persistence mechanism requires the programmer to put a bit of work into achieving persistence. The programmer must choose an external representation of the object, and write the code that saves and restores the objects.

Usually, concurrency control and transactional semantics do not apply to file-based persistence. Storing objects in files is usually appropriate for single-user applications that follow the File/Open… and File/Save model.

Note

Just before this book went to press, JavaSoft introduced a new API that simplifies the process of storing objects in files (and streaming objects across the network). Information about the Object Serialization API can be found at http://chatsubo.javasoft.com/current/. These Web pages claim that Object Serialization will be part of Java 1.1.
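For readers working with a later Java release, Object Serialization is available in java.io as the Serializable interface together with ObjectOutputStream and ObjectInputStream. The following is a minimal sketch of a round trip through that API. The Contact class and the SerializationSketch wrapper are hypothetical names used only for this illustration, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSketch {
    // Hypothetical class; implementing Serializable marks its instances as storable.
    static class Contact implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String phone;

        Contact(String name, String phone) {
            this.name = name;
            this.phone = phone;
        }
    }

    // Converts a Contact to a sequence of bytes.
    static byte[] toBytes(Contact contact) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(contact);
        }
        return buffer.toByteArray();
    }

    // Rebuilds a Contact from bytes produced by toBytes().
    static Contact fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Contact) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Contact restored = fromBytes(toBytes(new Contact("Ada", "555-0100")));
        System.out.println(restored.name + " " + restored.phone); // prints "Ada 555-0100"
    }
}
```

Replacing the byte array streams with a FileOutputStream and FileInputStream would store the object in a file instead.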

RDBMS

Relational database management systems (RDBMS) can also store persistent objects, but the characteristics of a relational database are different from file-based persistence. A relational database is organized into tables, rows, and columns, rather than the unstructured sequence of bytes represented by a file. An effort is under way to standardize the use of relational databases in Java (the JDBC API).

There are two major ways to store objects in a relational database. The first option is to interact with the database on its terms. The JDBC API provides interfaces that directly represent relational database structures. These structures can be used and manipulated as is. The other option is to write your own Java classes and "map" between the relational data structures and your classes. This type of mapping is a well-understood problem for which many commercial solutions are available (Java implementations will no doubt be available soon).

When using a relational database, unless you are using a tool to perform database-to-class mapping, you must write a large volume of code to interact with the database. Managing objects in the database requires you to write SQL statements (inserts, updates, deletes, and so on), which are forwarded to the database through the JDBC API.

Although using a relational database is more work, there are a few benefits. Relational databases usually support concurrency control and transactional properties. Multiple users can access the database without stepping on each other's changes, because the database uses locks to safeguard access. Additionally, almost all relational databases support ACID properties (atomicity, consistency, isolation, durability). These properties protect the integrity of the data by assuring that blocks of work (referred to as transactions) either complete successfully or are rolled back without affecting other users.

Note

As this book went to press, the JDBC API had just been officially standardized. Although few vendors are shipping products that support the API, almost all relational database vendors have publicly committed to providing implementations of the JDBC API.

ODBMS

Object database management systems (ODBMS) support persistence in a different manner than file-based persistence and relational databases. The philosophy behind object databases is to make the programmer's job simpler. Object databases (as the name implies) store objects; the programmer does not have to write SQL statements or methods to package and unpackage objects, because the object database interface usually takes care of those details.

Object databases usually support concurrency control and ACID properties, like relational databases. They provide for concurrent access to the database, and they also provide commit and rollback transactional control. (Object databases are covered in greater depth later in this chapter, in the Persistent Java section.)

Note

As this book went to press, there were no commercial object databases available for Java. Three vendors (Versant, O2, and Object Design) had publicly stated their intent to release Java object database products, but none was available. On the academic front, the Persistent Java project was nearing completion of its first implementation (see the Persistent Java section, later in this chapter).

Implementing a Simple File-Based Persistent Store

This section presents an example of how to implement a simple file-based persistent store (that you can use to add persistability to your classes). First, the section looks at how to read and write primitive data using standard classes and interfaces provided by Java. Then it looks at how to read and write whole objects, not just primitive data types. Finally, it discusses how to apply these new interfaces to make your classes persistent.

IO Helpers: `DataInput` and `DataOutput`

Before discussing how to store whole objects in files, it is important to learn how to store primitive Java data values in files (int, float, String, and so on). The java.io package provides two interfaces (DataInput and DataOutput) that contain a standard API for reading and writing primitive Java types. Table 19.1 provides a summary of the methods in DataInput and DataOutput.

Table 19.1. The DataInput and DataOutput APIs.

| Data Type | DataInput | DataOutput |
|---|---|---|
| boolean | readBoolean() | writeBoolean() |
| byte | readByte() | writeByte() |
| char | readChar() | writeChar() |
| short | readShort() | writeShort() |
| int | readInt() | writeInt() |
| long | readLong() | writeLong() |
| float | readFloat() | writeFloat() |
| double | readDouble() | writeDouble() |
| String | readUTF() | writeUTF() |

Note

Even though String is not strictly an elemental data type (it is a class), DataInput and DataOutput define an API for reading and writing Strings. The primary reason is that the String data type is a major part of the language; DataInput and DataOutput without String support would be a less-than-functional solution. The String data type is also handled differently; Strings are encoded in a way that compacts the representation, when possible.
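The compaction mentioned in this note can be observed by counting the bytes that writeUTF() produces and comparing them with writeChars(), which writes every character as two bytes. The sketch below does this with an in-memory stream; the class name StringEncodingSketch and its helper methods are made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class StringEncodingSketch {
    // Number of bytes written by writeUTF(): a 2-byte length prefix plus the encoded characters.
    static int utfSize(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(s);
        out.flush();
        return buffer.size();
    }

    // Number of bytes written by writeChars(): two bytes per character, no length prefix.
    static int charsSize(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeChars(s);
        out.flush();
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(utfSize("hello"));   // prints 7  (2-byte length + 5 one-byte characters)
        System.out.println(charsSize("hello")); // prints 10 (5 two-byte characters)
        System.out.println(utfSize("\u00e9"));  // prints 4  (2-byte length + 1 two-byte character)
    }
}
```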

The DataInput and DataOutput interfaces are simple to use. The following example demonstrates a few of the DataInput and DataOutput methods:

class Person {
  String name = null;
  int age = 0;
  ...
  void write(DataOutput out) throws IOException {
    out.writeUTF(name);    // write the name string
    out.writeInt(age);     // write the age
  }
  ...
  void read(DataInput in) throws IOException {
    name = in.readUTF();   // read the name string
    age = in.readInt();    // read the age
  }
}

DataInput and DataOutput provide a platform independent solution for the data representation problem. Data written to a file (or socket) on one platform can be read by Java programs on different platforms, as the representation of the data types is standardized. An int or String written to a file on a Windows NT machine can be read from that file on a Solaris machine, Macintosh, and so on. If Java did not provide a standard interface for data formatting, every programmer would solve this problem independently. The result would be a Tower of Babel, which would make communicating between Java programs problematic (especially because Java is targeted for the network computing industry).

Sun has solved the data representation problem before. Years ago, Sun created the eXternal Data Representation (XDR) format, and an accompanying C library. XDR was created to provide a standard format for data interchange over networks, and to serve as the data format for Remote Procedure Calls (RPC). Today, XDR is still widely used.

Although similar to XDR, the format required by DataInput and DataOutput is not identical to XDR. Java's solution is less complicated, and more compact. The DataInput/DataOutput format requires that:

Data is represented in binary form (not ASCII), for compactness.
Data is represented in network byte-order (big-endian).
For elemental data types, data is stored in exactly the same number of bytes as guaranteed by the JVM; that is, a byte is stored as one byte; a char, as two bytes; an int, as four bytes, and so on.
No padding or byte-alignment is required.
Strings are encoded using a special format that reduces the number of bytes written (especially if you are using the Latin character set).

Primitive data types can be written to or read from files, sockets, or any type of stream using the DataInput and DataOutput interfaces.
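The format rules listed above can be checked by writing a few values to an in-memory stream and printing the resulting bytes. This is a minimal sketch; the class name DataFormatSketch and the sample values are chosen only for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataFormatSketch {
    // Writes a byte, an int, and a char back to back and returns the raw bytes.
    static byte[] encode() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeByte(0x7F);        // one byte
        out.writeInt(0x01020304);   // four bytes, most significant byte first
        out.writeChar('A');         // two bytes: 0x00 0x41
        out.flush();
        return buffer.toByteArray();
    }

    // Formats bytes as space-separated two-digit hexadecimal values.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(encode())); // prints "7f 01 02 03 04 00 41"
    }
}
```

The output is seven bytes long: the int follows the single byte immediately, with no padding, and its bytes appear in big-endian order.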

When reading and writing files, there are two implementations of the DataInput and DataOutput interfaces to choose from (in the java.io package). The RandomAccessFile class implements both DataInput and DataOutput. The more frequently used classes are DataInputStream (which implements DataInput) and the DataOutputStream (which implements DataOutput). To write data to a file, you should use a DataOutputStream as a filter over a FileOutputStream (see Chapter 5 "Building Special-Purpose I/O Classes," for more information on filters). Here's an example:

void write(File file, String s, int i, float f) throws IOException {
  // first open the FileOutputStream
  FileOutputStream fileout = new FileOutputStream(file);
  // then open the DataOutputStream "on top of" the
  // FileOutputStream that's already open
  DataOutputStream dataout = new DataOutputStream(fileout);
  // then write to the DataOutputStream, which will be
  // streamed "into" the FileOutputStream
  dataout.writeUTF(s);
  dataout.writeInt(i);
  dataout.writeFloat(f);
  dataout.close();
}

Reading from a file is as simple as the last example. You open a DataInputStream over a FileInputStream and make calls to the DataInput reading methods.
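A reading counterpart might look like the following sketch. The values must be read in the same order in which they were written. The class name DataFileRoundTrip, the readBack() method, and the choice to return the values joined into one string are assumptions made to keep the example short and self-contained.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class DataFileRoundTrip {
    // Writes a String, an int, and a float, in that order.
    static void write(File file, String s, int i, float f) throws IOException {
        try (DataOutputStream dataout = new DataOutputStream(new FileOutputStream(file))) {
            dataout.writeUTF(s);
            dataout.writeInt(i);
            dataout.writeFloat(f);
        }
    }

    // Reads the three values back in the order they were written.
    static String readBack(File file) throws IOException {
        try (DataInputStream datain = new DataInputStream(new FileInputStream(file))) {
            String s = datain.readUTF();
            int i = datain.readInt();
            float f = datain.readFloat();
            return s + " " + i + " " + f;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("values", ".dat");
        file.deleteOnExit();
        write(file, "text", 42, 1.5f);
        System.out.println(readBack(file)); // prints "text 42 1.5"
    }
}
```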

The `Persistent` Framework

The java.io package supplies the necessary classes to read and write primitive data. But what about reading and writing entire objects? Although DataInput/DataOutput is a powerful concept (the portable data format), these interfaces do not contain methods to read or write entire objects. Objects seem to be "left as an exercise for the reader." This author decided to take up the challenge and implement a simple framework for reading and writing objects. The interfaces and classes in this framework are present on the accompanying CD-ROM. Feel free to use the provided framework in your code.

Note

Just before this book went to press, JavaSoft announced the (alpha) availability of the Object Serialization API. The API, which is scheduled to be part of Java 1.1, is a framework for reading and writing Java objects. Object Serialization is very similar to the Persistent framework presented in this chapter. By learning the Persistent framework, you will also be learning about Object Serialization.

You have already encountered the concepts that go into reading and writing primitive data. DataInput and DataOutput can handle the streaming of primitive types, but they do not handle class types. In order to stream class types, we need a new concept: the concept of "a class whose instances can stream themselves." This can be generalized in an interface, called Persistent:

import java.io.IOException;

/**
 * Persistent interface. Provides a class with the ability to write
 * itself to a stream, and to read itself from a stream.<p>
 *
 * @see PersistentInput
 * @see PersistentOutput
 * @author Eric R Williams
 */
public interface Persistent {

  /**
   * Writes self to the specified output stream.<p>
   *
   * @param out the persistent output interface to write self to.
   * @exception IOException if an I/O problem occurs.
   */
  public void write(PersistentOutput out) throws IOException;

  /**
   * Reads self from the specified input stream.<p>
   *
   * @param in the persistent input interface to read self from.
   * @exception IOException if an I/O problem occurs.
   */
  public void read(PersistentInput in) throws IOException;
}

Note

Note the use of javadoc-style comments in the preceding example. Documenting your code using the javadoc standard format is always a good idea. This format helps you produce on-line documents describing your code, and it is generally expected by other developers. For the remainder of this chapter, however, the javadoc-style comments have been removed to cut down on the size of the code listings.

The Persistent interface provides a standard way to add persistence (and streamability) to classes. To add persistence to a class, implement the Persistent interface in that class. There are only two methods to implement: one to write the object to an output stream (write(PersistentOutput)) and one to read the object from an input stream (read(PersistentInput)).

If you examine the Persistent interface, you encounter two additional classes: PersistentOutput and PersistentInput. They are actually not classes, but interfaces. These interfaces extend the DataInput and DataOutput interface models to support reading and writing Persistent objects, as follows:

import java.io.DataOutput;
import java.io.IOException;

public interface PersistentOutput extends DataOutput {
  void writePersistent(Persistent obj) throws IOException;
}

PersistentOutput defines an API that extends the DataOutput interface and adds a new method (to write Persistent objects). The new method, writePersistent(Persistent), is declared in a style consistent with the other methods declared in the DataOutput interface.

A similar interface, PersistentInput, is defined to extend DataInput:

import java.io.DataInput;
import java.io.IOException;

public interface PersistentInput extends DataInput {
  Persistent readPersistent() throws IOException;
}

These three interfaces (Persistent, PersistentInput, and PersistentOutput) form a framework that makes it easy to add persistence to your classes. There are two additional classes in the Persistent framework, PersistentInputStream and PersistentOutputStream; these classes are discussed in detail in a later section.

Using the Simple Persistent Store

Now that you have been introduced to the Persistent framework, let's examine how to apply that framework to make objects persistent. This process involves modifying a class that you have already written, to add the Persistent interface to that class. We will use a simple class created to demonstrate the Persistent framework, the Shape class. The original code for Shape (without persistence) is listed below:

import java.io.*;
import java.awt.Point;

public class Shape {
  private Point[] vertices;
  private String  name;

  public Shape(Point[] vertices, String name) {
    this.name = name;
    this.vertices = vertices;
  }

  public Shape(int size, String name) {
    this.name = name;
    vertices = new Point[size];
    for (int i=0; i<size; i++) {
      vertices[i] = new Point(0, 0);
    }
  }

  public Point getPoint(int pos) {
    return vertices[pos];
  }

  public String getName() {
    return name;
  }
}

Shape is a simple class; it has only two attributes, a name and an array of points (the boundaries of the shape). The Shape class depends on java.awt.Point to represent Point objects.

To add persistence to the Shape class, we need to make a few changes to the class source code:

Add "implements Persistent" to the class declaration line
Add a no-parameter constructor (the reason for this will be discussed later)
Code the write(PersistentOutput) method, which is required by the Persistent interface
Code the read(PersistentInput) method, which is also required by the Persistent interface

The first two items on this list are trivial. They involve minor changes to the class. The latter two items are more involved tasks.

Before we start coding the read() and write() methods, we need to choose an external format for the Shape class. The external format is a specification of the order and structure of the object's attributes. One convenient notation used to express this format is similar to C struct declarations. (This notation is used in the Java Virtual Machine Specification to describe the layout for Java .class files.) We can represent the Shape class using the following structure:

int vertex_count;
struct {
  int x;
  int y;
} vertices [vertex_count];
String name;

This notation specifies that the first element in the format is labeled vertex_count and is an int. The second element is labeled vertices; it is an array of length vertex_count (which was already specified). The array is composed of a compound structure containing two ints, x and y, respectively. The last element is a String, labeled name. In this notation, the labels exist for human consumption only; they are not included in the stored objects. Labels help readers of the format understand what data is being represented.

Once you choose an external format for the Shape class, you can begin to construct the routines to read and write a Shape. Here is an implementation of the write(PersistentOutput) method:

public void write(PersistentOutput out) throws IOException {
  out.writeInt(vertices.length);          // write # of points
  for(int i=0; i<vertices.length; i++) {  // write each point
    out.writeInt(vertices[i].x);
    out.writeInt(vertices[i].y);
  }
  out.writeUTF(name);                     // write shape name
}

Only two of the DataOutput interface methods are used in this example: writeInt() and writeUTF(). As you can see, this method logically carries out the agreed-upon format: array length, followed by the array of points, and then followed by a string. The process of writing an object to a file is not difficult; it is expressed in about five lines of code.

The following is an implementation of the read(PersistentInput) method:

public void read(PersistentInput in) throws IOException {
  vertices = new Point[in.readInt()];        // read # of points
  for(int i=0; i<vertices.length; i++) {     // read each point
    vertices[i] = new Point(in.readInt(), in.readInt());
  }
  name = in.readUTF();                       // read shape name
}

The read() method implements the agreed-upon format. Again, the method is short and simple to understand, using just two methods from the DataInput interface: readInt() and readUTF(). First, it reads the vertices' array size, followed by each vertex (a Point consisting of two ints, x and y), and finally reads a String, the name of the shape.

Now that we have seen the pieces, let's put it all together. The following code listing includes the Shape class (renamed to PShape), plus the additions that have been made to support persistence (the implements clause, the no-parameter constructor, and the write() and read() methods):

import java.io.*;
import java.awt.Point;

public class PShape implements Persistent {
  private Point[] vertices;
  private String  name;

  public PShape() {    // need a no-parameter constructor
    vertices = null;
    name = null;
  }

  public PShape(Point[] vertices, String name) {
    this.name = name;
    this.vertices = vertices;
  }

  public PShape(int size, String name) {
    this.name = name;
    vertices = new Point[size];
    for (int i=0; i<size; i++) {
      vertices[i] = new Point(0, 0);
    }
  }

  public Point getPoint(int pos) {
    return vertices[pos];
  }

  public String getName() {
    return name;
  }

  public void write(PersistentOutput out) throws IOException {
    out.writeInt(vertices.length);          // write # of points
    for(int i=0; i<vertices.length; i++) {  // write each point
      out.writeInt(vertices[i].x);
      out.writeInt(vertices[i].y);
    }
    out.writeUTF(name);                     // write shape name
  }

  public void read(PersistentInput in) throws IOException {
    vertices = new Point[in.readInt()];        // read # of points
    for(int i=0; i<vertices.length; i++) {     // read each point
      vertices[i] = new Point(in.readInt(), in.readInt());
    }
    name = in.readUTF();                       // read shape name
  }

  public String toString() {
    StringBuffer b = new StringBuffer(name);
    for (int i=0; i<vertices.length; i++) {
      b.append(" (" + vertices[i].x + "," + vertices[i].y + ")");
    }
    return b.toString();
  }
}

To validate the persistence of the above class, we need to have a test class that:

Creates a Shape object.
Writes it to a file, using a PersistentOutputStream.
Reads it back from the file, using a PersistentInputStream.
Compares the two objects.

The following class, PShapeTest, validates the persistence of PShape. (All of these classes are on the accompanying CD-ROM, so feel free to run this test.)

package COM.MCP.Samsnet.tjg;

import COM.MCP.Samsnet.tjg.PShape;
import COM.MCP.Samsnet.tjg.PersistentOutputStream;
import COM.MCP.Samsnet.tjg.PersistentInputStream;
import java.io.*;

public class PShapeTest {
  public static void main(String[] args) {
    try {
      PShape square = new PShape(4, "SquareOne");
      square.getPoint(0).move(0, 0);
      square.getPoint(1).move(1, 0);
      square.getPoint(2).move(1, 1);
      square.getPoint(3).move(0, 1);

      PersistentOutputStream out =        // create a PersistentOutputStream
        new PersistentOutputStream(       // on top of a FileOutputStream
          new FileOutputStream("pshape.sav"));
      out.writePersistent(square);        // *** write the Shape ***
      out.close();

      PersistentInputStream in =          // create a PersistentInputStream
        new PersistentInputStream(        // on top of a FileInputStream
          new FileInputStream("pshape.sav"));
      PShape shape2 = (PShape) in.readPersistent();  // *** read the Shape ***
      in.close();

      // PShape does not override equals(), so compare the string forms
      if (square.toString().equals(shape2.toString())) {
        System.out.println("everything is ok!");
      }
    } catch (Exception ee) {
      System.err.println(ee.toString());
      ee.printStackTrace();
    }
  } // main
} // class
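One caveat applies when comparing a restored object with the original: unless a class overrides equals(), the method inherited from Object compares references, and the PShape listing does not define equals(). A restored object is always a different reference, so a meaningful comparison has to look at state. The following sketch shows one way value equality could be written for a shape-like class. ShapeValue is a hypothetical stand-in for PShape, not part of the Persistent framework.

```java
import java.awt.Point;
import java.util.Arrays;

class ShapeValue {
    private final Point[] vertices;
    private final String name;

    ShapeValue(Point[] vertices, String name) {
        this.vertices = vertices;
        this.name = name;
    }

    // Two shapes are equal when their names match and their vertices match pairwise.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ShapeValue)) {
            return false;
        }
        ShapeValue that = (ShapeValue) other;
        return name.equals(that.name) && Arrays.equals(vertices, that.vertices);
    }

    // Kept consistent with equals(): equal shapes produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * name.hashCode() + Arrays.hashCode(vertices);
    }

    public static void main(String[] args) {
        ShapeValue a = new ShapeValue(new Point[] {new Point(0, 0), new Point(1, 0)}, "Edge");
        ShapeValue b = new ShapeValue(new Point[] {new Point(0, 0), new Point(1, 0)}, "Edge");
        System.out.println(a == b);      // prints "false": two distinct objects
        System.out.println(a.equals(b)); // prints "true": same state
    }
}
```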

The Implementation of `PersistentInputStream` and `PersistentOutputStream`

The only missing pieces now are the classes that provide implementations for the PersistentOutput and PersistentInput interfaces. As interfaces, they are API specifications only; implementations are required if you are going to use the interfaces.

Let's start with PersistentOutput. The PersistentOutput interface is very complicated; it contains all the methods of DataOutput (approximately 14 methods), plus writePersistent(). That's a lot of methods to implement! Fortunately, reuse by inheritance comes in handy; a class that nearly matches the needs already exists. By subclassing DataOutputStream, all of the DataOutput methods defined in DataOutputStream are inherited (and do not need to be reimplemented). You only have to implement a constructor and the writePersistent() method. Here's a listing of the PersistentOutputStream class:

import java.io.*;

public class PersistentOutputStream
    extends DataOutputStream
    implements PersistentOutput {

  public PersistentOutputStream(OutputStream out) {
    super(out);
  }

  public final void writePersistent(Persistent obj) throws IOException {
    if (obj == null) {                      // treat null in a special way
      writeUTF("null");                     // write "null" as the class name
    } else {
      writeUTF(obj.getClass().getName());   // write the object's class name
      obj.write(this);                      // then write the object itself
    }
  }
}

The writePersistent() method writes the string "null" if the specified Persistent object is null. Otherwise, the method writes the class name of the object (a String), followed by the object writing itself to the stream (using the write(PersistentOutput) method of the Persistent interface). The PersistentOutputStream does not have to understand the format a Persistent object uses when it writes itself to the stream. Moving the writing logic to the classes that implement Persistent is what the Persistent interface is all about.

The PersistentInputStream is slightly more complicated, but it still inherits most of its behavior from DataInputStream, as shown here:

import java.io.*;

public class PersistentInputStream
    extends DataInputStream
    implements PersistentInput {

  public PersistentInputStream(InputStream in) {
    super(in);
  }

  public final Persistent readPersistent() throws IOException {
    Persistent obj = null;
    String classname = readUTF();           // read the class name
    if ("null".equals(classname)) {
      obj = null;                           // if "null", return null
    } else {
      try {
        // retrieve the Class object for the specified class name
        Class clazz = Class.forName(classname);
        // build a new instance of the Class (throws an exception if
        // the class is abstract or does not have a no-param constructor)
        obj = (Persistent) clazz.newInstance();
        // let the object read itself from the stream
        obj.read(this);
      } catch (ClassNotFoundException ee) {     // catch all kinds of
        throw new IOException(ee.toString());   // exceptions and rethrow
      } catch (InstantiationException ee) {
        throw new IOException(ee.toString());
      } catch (IllegalAccessException ee) {
        throw new IOException(ee.toString());
      }
    }
    return obj;
  }
}

The readPersistent() method reads the name of the object's class from the stream. If that name is equal to "null," the null value is returned. Otherwise, the method locates the Java Class object corresponding to the class name and uses the Class to create a new instance of the Persistent object. The new Persistent object then reads itself from the stream in the read(PersistentInput) method.

You might wonder about the exception handling in the readPersistent() method. Why does it have so many catch statements? They were used to keep the readPersistent() method consistent with the methods of DataInput, all of which throw only IOException. If you do not catch the listed exceptions and rethrow them as IOExceptions, the exception class names must be declared in the throws clause of the readPersistent() method-which would be inconsistent with DataInput.

Note

The object creation step in the PersistentInputStream class requires the use of the Class method newInstance(), which is Java's generic interface for creating an object, given the Class instance. To allocate a new object of a class using newInstance(), the class must have a public constructor that takes no parameters (this is the constructor method that will be invoked by newInstance()). A public no-parameter constructor was added to the PShape class to support the use of newInstance().
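The requirement for a no-parameter constructor can be demonstrated with a small reflection experiment. The sketch below is an illustration only: the class names are hypothetical, and because Class.newInstance() is deprecated in newer Java releases, it uses getDeclaredConstructor().newInstance(), which imposes the same requirement.

```java
class NewInstanceSketch {
    // Hypothetical class that can be created reflectively: it has a no-parameter constructor.
    static class WithNoArg {
        public WithNoArg() {
        }
    }

    // Hypothetical class that cannot: its only constructor takes a parameter.
    static class WithoutNoArg {
        public WithoutNoArg(int size) {
        }
    }

    // Returns true if an instance of the named class can be created without arguments.
    static boolean canCreate(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Object obj = clazz.getDeclaredConstructor().newInstance();
            return obj != null;
        } catch (ReflectiveOperationException e) {
            // class not found, no matching constructor, or instantiation failed
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canCreate(WithNoArg.class.getName()));    // prints "true"
        System.out.println(canCreate(WithoutNoArg.class.getName())); // prints "false"
    }
}
```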

The PersistentInputStream and PersistentOutputStream implementation of reading and writing Persistent objects has several limitations:

If you attempt to read a persistent object whose Java class cannot be located by Class.forName() (for example, because the class file is not available to the reading program), an exception is thrown.
Object identity is not considered. Two references to a single object are written as two objects on a PersistentOutputStream.
Cyclical data structures cause the PersistentOutputStream to enter a recursive loop, eventually exhausting stack space and throwing an exception. (An example of a cyclical structure is one in which two objects contain references to each other.)
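The identity limitation can be seen in a small, self-contained sketch. It does not use the chapter's classes: Storable, Point, writeObject(), and readObject() are hypothetical stand-ins that follow the same idea (write the class name, then let the object write itself). It also uses getDeclaredConstructor().newInstance(), the replacement for Class.newInstance() in current Java releases, where the older method is deprecated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class IdentityLossDemo {
    // Hypothetical, simplified stand-in for the chapter's Persistent interface.
    interface Storable {
        void write(DataOutput out) throws IOException;
        void read(DataInput in) throws IOException;
    }

    // Hypothetical persistent class; note the public no-parameter constructor.
    public static class Point implements Storable {
        int x, y;
        public Point() { }
        public Point(int x, int y) { this.x = x; this.y = y; }
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }
        public void read(DataInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    // Writes the class name, then lets the object write its own data.
    static void writeObject(DataOutput out, Storable obj) throws IOException {
        out.writeUTF(obj.getClass().getName());
        obj.write(out);
    }

    // Reads the class name, creates an instance reflectively,
    // then lets the new object read its own data.
    static Storable readObject(DataInput in) throws IOException {
        String className = in.readUTF();
        try {
            Storable obj = (Storable) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            obj.read(in);
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IOException(e.toString());
        }
    }

    // Writes the SAME object twice and reads both copies back.
    static Point[] roundTripTwice(Point shared) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeObject(out, shared);
            writeObject(out, shared);
            out.flush();
            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            return new Point[] { (Point) readObject(in), (Point) readObject(in) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Point[] copies = roundTripTwice(new Point(3, 4));
        // One object went in, two distinct objects come out.
        System.out.println(copies[0] == copies[1]); // prints false
    }
}
```

Preserving identity would require the writer to remember which objects it has already written (for example, in an identity-based table) and to emit a back-reference instead of a second copy; the same table would also break the recursion on cyclical structures.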

The Persistent framework classes are simple and straightforward. In short order, you can add "persistence" to your classes; you can store objects in files or send them across a network to another computer. These interfaces and classes are not a general solution to the problem of persistence, but they are a good solution when you have to store or send simple objects. Additionally, the Persistent framework is a useful tool for teaching some of the concepts of persistence.

The PersistentJava (PJava) Project

In October 1995 (the early days of Java, before the language skyrocketed in popularity), Sun funded a year-long research project at the University of Glasgow to investigate adding "persistence" to the Java programming language. The Glasgow researchers have proposed a design specification for adding "orthogonal" persistence to Java. They have also begun building a persistent storage interface to link Java to a persistent store.

Persistent Store Concepts

Few programmers are familiar with persistent stores or object databases. This brief section introduces the basic concepts involved in a persistent store.

Note

The phrases "persistent store" and "object database" are often used interchangeably. Because the authors of the PJava design refer to PJava as an "interface to a persistent store," this chapter refers to PJava as a "persistent storage" interface.

Persistent Stores Versus Relational Databases

Foremost, a persistent store is a kind of database. You are probably familiar with the term "database" (a storage pool for information). Most commercially available databases support long term data storage on disk, structural organization of the data, methods to retrieve data from the database, methods to update data already stored in the database, row or page locking to prevent concurrent access problems, isolation of uncompleted transactions from other transactions, and so on. Most persistent stores meet these criteria.

By far the most common type of client-server database system is the relational database (for example, Oracle, Informix, Sybase, DB2, and so on). Contrasting a persistent store with a relational database is a useful exercise to understand what a persistent store is and what it is not.

Relational databases are organized in tabular data structures: tables, columns, and rows. Data from different tables can be joined to create new ways of looking at the data. SQL is used to send commands to the database, such as commands to create new rows of data, to update rows, and so on. SQL commands can also be used from other programming languages, because they are sent to the database server for processing.

Relational databases, with their tabular data structures, do not mesh well with object-oriented (OO) programming languages. There are three major problems encountered using relational databases from an OO language. First, relational data structures do not provide for class encapsulation. OO programmers are encouraged to model their domain using classes, providing an API to class users, and "hiding" all data within the class. Relational structures expose all data and do not allow encapsulation by an API. Second, OO classes support a rich set of data types that are difficult or impossible to model efficiently in a relational structure. Examples include multidimensional arrays, dictionaries, and object references. Last, it is difficult to represent class inheritance in a relational database. Although it is possible, deep class inheritance trees can result in n-way joins on the database server that have poor performance.

Tools that attempt to solve the object and relational mismatch are available. These tools map relational data structures into OO classes using relatively simple rules (for example, map tables to classes, columns to attributes, and foreign key attributes to object relationships). Although some of these products have been successful, this approach has had problems. These products suffer from performance issues, particularly when complex navigation is performed through the mapped data structures. Additionally, these products limit the type-expressiveness of the language, because not all the data types expressible in the object-oriented language are easily expressible in a relational database.

Persistent stores are different from relational databases. Persistent stores do the following:

Eliminate the use of relational data structures (instead, whole objects are stored directly in the database)
Enable the programmer to write classes in a normal, object-oriented fashion to represent data that will be made persistent
Enable the programmer to take advantage of more data types than is possible when using a relational database
Provide a simpler interface than a relational database interface

Creating and Using Persistent Objects

Different persistent storage interfaces have different methods for creating persistent objects (or making existing objects persistent). Some interfaces require the programmer to specify whether an object is to be persistent at the time the object is created. Other persistent stores implement a concept referred to as persistent roots. Persistent root objects are explicitly identified as persistent; any object that a persistent root refers to is also considered persistent, as is any object those objects refer to in turn. In other words, all objects that are reachable from the persistent root are persistent and are saved in the persistent store. This concept is called persistence via reachability.
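Persistence via reachability amounts to a graph traversal that starts at the root. The following is a minimal sketch of that idea only; Node and reachableFrom() are hypothetical names, and a real persistent store would walk the fields of arbitrary objects rather than an explicit list of references.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    // Hypothetical object that holds references to other objects.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Collects every object reachable from the root. Objects are compared
    // by identity, so shared references and cycles are visited only once.
    static Set<Node> reachableFrom(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (seen.add(current)) {          // true only on the first visit
                for (Node next : current.refs) {
                    pending.push(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node order = new Node("order");
        Node customer = new Node("customer");
        Node stray = new Node("stray");       // never linked to the root
        order.refs.add(customer);
        customer.refs.add(order);             // a cycle back to the root

        Set<Node> persistent = reachableFrom(order);
        System.out.println(persistent.size());          // prints 2
        System.out.println(persistent.contains(stray)); // prints false
    }
}
```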

Retrieving objects from a persistent store is significantly different from retrieving data through SQL. When using SQL, the programmer must explicitly request data (using SELECT statements); but with persistent stores, programmers seldom make explicit queries for objects. Persistent stores usually provide a mechanism to request only "top-level" objects, either through direct query or through a request for a particular persistent root.

Persistent storage interfaces almost universally employ a process known as swizzling (or object faulting) to retrieve objects from the database. Objects are retrieved on the fly, as they are needed. After obtaining a reference to a top-level object, programmers normally use that object to access related objects. When a program attempts to access an object that has not yet been retrieved from the database, the object is swizzled in: the attempt is trapped by the database interface, which retrieves the object's storage block from the database, restores the object, and then allows the access to continue.
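The fetch-on-first-access behavior can be approximated in plain Java with an explicit lazy reference. This is only a sketch under stated assumptions: Store and LazyRef are hypothetical, the "database" is an in-memory map, and the explicit get() call stands in for the transparent trap that a real persistent storage interface performs.

```java
import java.util.HashMap;
import java.util.Map;

class FaultingDemo {
    // Simulated persistent store: maps object IDs to stored values.
    static class Store {
        private final Map<Long, String> blocks = new HashMap<>();
        int fetches = 0;                  // number of simulated trips to the store

        void put(long id, String value) { blocks.put(id, value); }

        String fetch(long id) {
            fetches++;
            return blocks.get(id);
        }
    }

    // Holds only an object ID until the target is first accessed.
    static class LazyRef {
        private final Store store;
        private final long id;
        private String value;
        private boolean loaded = false;

        LazyRef(Store store, long id) {
            this.store = store;
            this.id = id;
        }

        String get() {
            if (!loaded) {                // first access: fault the object in
                value = store.fetch(id);
                loaded = true;
            }
            return value;                 // later accesses use the loaded copy
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.put(42L, "order #42");
        LazyRef ref = new LazyRef(store, 42L);

        System.out.println(store.fetches); // prints 0: nothing loaded yet
        System.out.println(ref.get());     // prints order #42
        ref.get();                         // second access, no new fetch
        System.out.println(store.fetches); // prints 1
    }
}
```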

Finally, persistent stores usually have a mechanism to identify objects uniquely: the object ID. Every object in a persistent store is assigned its own unique object ID, which can be used to differentiate objects of the same class whose values are equal.

PJava Design

The first Persistent Java design, known as PJava0, was published in January 1996. An additional paper (Atkinson, et al. '96) was published in February and describes the design issues of PJava0. (Both of these papers are available from http://www.dcs.gla.ac.uk/~susan/pjava.) The PJava0 design goals, principles, and architecture are outlined in the following sections.

Project Goals

The stated goal of the PersistentJava project is to provide orthogonal persistence in Java. The PJava researchers are creating a persistent storage mechanism that can store objects of any type in the persistent store. This is the operating meaning of "orthogonal": the independence of persistence from data type. Any object, without respect to type, can be made persistent.

Many persistent stores and object databases do not support orthogonal persistence. Orthogonal persistence is extremely hard to implement in most programming languages. It means that the programmer can write code without considering that they might be dealing with persistent objects. This forces the persistent storage interface to be extremely flexible in how it deals with data types. Additionally, this makes implementing a programming-language-independent database server difficult because a very tight binding is made to one language's type system.

The Glasgow team has set out with a goal of orthogonal persistence; doing so has implications they must handle. Any object, be it of a user-defined or system-defined class, can be persistent. Persistent objects can include Object, Panel, SecurityManager, Button, Class, Hashtable, and so on.

An additional goal of the research project is the building of a prototype application that uses the prototype persistent storage interface. The application is referred to as Forest, a distributed software configuration management and build system ([Atkinson, et al. 96] Atkinson, Daynès & Spence. Draft PJava Design 1.2. Department of Computer Science, University of Glasgow. January 1996).

Design Principles

The PJava team used several principles to guide their design:

Data type independence from persistence (orthogonal)
Persistence through reachability from persistent roots
No changes to the Java language
Support for different styles of transactions
Persistence without modification to existing Java code
Flexibility, to allow for integration with multiple persistent stores

The PJava team intentionally left out one potential design goal: "No changes to the Java virtual machine." In fact, the team has actively pursued the modification of the JVM; it is a central part of the architecture (and probably the only feasible way to implement orthogonal persistence). Unfortunately, JavaSoft has stated that they will not incorporate the PJava changes into the commercial JVM, effectively relegating PJava to the academic community for the time being.

The foremost point to remember about the design of Persistent Java is that it does not require the programmer to change any existing classes. It does not require the programmer to use a "special" version of the system classes. It does, however, require the programmer to use a customized virtual machine.

Storing and Retrieving Objects

One of the first things you want to know as the user of a persistent store is how to make objects persistent. How do you store objects in the database? Persistent Java incorporates the concept of a persistent root. The Draft PJava Design 1.2 document states that an early revision of the design included a PersistentRoot class; objects of type PersistentRoot (or a subclass thereof) had the property of "being a persistent root." However, the design was changed: any object may be registered as a persistent root, thus making the "root" property independent of data type.

Here is an example of how to make an object a persistent root in PJava0:

// make obj a persistent root (pstore is a PJavaStore)
pstore.registerPRoot("root-1", obj);

To retrieve a persistent root from the database, follow this example:

// get the handle for all Open Orders
Orders[] orders = (Orders[]) pstore.getPRoot("OpenOrders");

Note

The previous code example is the only PJava code sample included with this book. As this book goes to press, the PJava0 implementation has yet to be completed. It is scheduled to be completed during the Summer of 1996.

Recall from the earlier discussion of persistent roots (in the section "Persistent Store Concepts") that roots are only the starting point for the identification of persistent objects. By adding a single persistent root to the database, you may be adding thousands of objects to the persistent Java store.

Now you can store root objects in the persistent store and retrieve them. But how do you access other objects? Does a similar "ask the database for the object" interface exist? The answer is both yes and no. When you use a root object to access related objects, you call methods on and retrieve the attributes of those objects. When you attempt to access a related object that has not yet been brought from the database, the modified virtual machine intercepts this action, bringing the object from the database for you. You are not required to do anything special. Use objects as you normally would; the object retrieval mechanism is transparent.

The PJava virtual machine (a modified JVM) performs work that is not visible to the programmer: the VM monitors access to objects. When an attempt is made to access a persistent object that has not yet been accessed, PJava goes into action. Part of the PJava system is called upon to retrieve the object. It determines whether the storage block containing the object has already been loaded; if not, it makes a trip to the persistent store. When the object's storage is loaded, PJava converts the byte-oriented storage into a Java object. The PJava VM then allows your code to continue accessing the object. This mechanism of transparent object retrieval is often called swizzling, or object faulting (a legacy of certain object databases that perform this operation using OS page faulting mechanisms).

Transactions

The next thing you might want to know about PJava is how to begin and end a transaction. The designers of PJava wanted to allow multiple transaction styles, so they created a transaction root class, TransactionShell. This class has two provided subclasses: NestedTransaction and OLTPTransaction, but the programmer can subclass TransactionShell to create new transactional styles.

Transactions in PJava can either be launched synchronously (that is, in the same thread) or asynchronously (in a different thread) by invoking the start() method of the transaction object. The TransactionShell class executes the user's transaction logic through a Runnable object, whose run() method is invoked as the "main" method of the transaction. To obtain the result of the transaction (whether it succeeds or fails), call the claim() method. If you want to stop an asynchronously running transaction, you can invoke the kill() method on that transaction.

In PJava, you can run one transaction nested within another transaction using the NestedTransaction class. Nested transactions enable you to perform updates in a child transaction without affecting the state of the parent transaction. A child transaction that completes successfully passes all its updates (the modified objects) to its parent transaction. If the child transaction aborts, none of its updates are ever reflected in the parent transaction. You also can spawn parallel, independent NestedTransactions. In this case, the sibling transactions are isolated from one another, and each can commit or abort independently.
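The commit and abort rules for nested transactions can be illustrated without any persistent store at all. The sketch below is not PJava code and does not use the TransactionShell or NestedTransaction classes; Tx and its methods are hypothetical, and pending updates are modeled as simple key/value pairs.

```java
import java.util.HashMap;
import java.util.Map;

class NestedTxDemo {
    // Hypothetical transaction that records its pending updates.
    static class Tx {
        private final Tx parent;          // null for a top-level transaction
        private final Map<String, Integer> updates = new HashMap<>();

        Tx(Tx parent) { this.parent = parent; }

        void put(String key, int value) { updates.put(key, value); }

        // Looks in this transaction first, then in the enclosing ones.
        Integer get(String key) {
            if (updates.containsKey(key)) {
                return updates.get(key);
            }
            return parent == null ? null : parent.get(key);
        }

        // A successful child hands all of its updates to its parent.
        void commit() {
            if (parent != null) {
                parent.updates.putAll(updates);
                updates.clear();
            }
            // A top-level commit would write to the store; omitted here.
        }

        // An aborted child drops its updates; the parent never sees them.
        void abort() { updates.clear(); }
    }

    public static void main(String[] args) {
        Tx parent = new Tx(null);
        parent.put("balance", 100);

        Tx first = new Tx(parent);
        first.put("balance", 50);
        first.abort();
        System.out.println(parent.get("balance")); // prints 100

        Tx second = new Tx(parent);
        second.put("balance", 75);
        second.commit();
        System.out.println(parent.get("balance")); // prints 75
    }
}
```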

An additional transaction class, the OLTPTransaction, also is available. An OLTPTransaction is a traditional transaction style that cannot be executed asynchronously and cannot be nested.

State of the PJava Project

I would like to thank Susan Spence, a researcher on the PJava project, for providing me with much of the following information.

The PJava project began in October 1995, when it was funded for one year by Sun. The first phase of the project is expected to be complete by October 1996, when the Glasgow team expects to begin an additional two years of work on the project. As of the time of this writing (May 1996), partial funding for the additional two years has already been obtained.

By the time this book is published, an implementation of PJava0 may be available. The implementation will contain basic support for persistence via reachability and a default transaction model. Platform availability will be limited to Solaris, and distribution may be restricted due to a lack of support funding.

During the summer of 1996, the PJava team will begin designing PJava1, which will likely contain more advanced transactional models, support for distributed databases, and database garbage collection. Up-to-date information about the PJava project can be found at http://www.dcs.gla.ac.uk/~susan/pjava.

Summary

Persistence involves extending the lifetime of an object beyond the lifetime of the program in which it was created. In this chapter, you have seen several possible ways to implement persistence:

Saving the representation of an object directly to a file using the DataOutput and DataInput interfaces
Using the Persistent framework that has been provided with this chapter
Using some form of database library (for example, JDBC)
Using a persistent store, like the one being created by researchers at the University of Glasgow

Data Type	DataInput	DataOutput
`boolean`	`readBoolean()`	`writeBoolean()`
`byte`	`readByte()`	`writeByte()`
`char`	`readChar()`	`writeChar()`
`short`	`readShort()`	`writeShort()`
`int`	`readInt()`	`writeInt()`
`long`	`readLong()`	`writeLong()`
`float`	`readFloat()`	`writeFloat()`
`double`	`readDouble()`	`writeDouble()`
`String`	`readUTF()`	`writeUTF()`
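As a brief illustration of the paired methods in the table, the following sketch writes three values with DataOutputStream and reads them back with DataInputStream, in the same order. The class name DataRoundTrip and the sample values are illustrative only; an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataRoundTrip {
    // Writes an int, a double, and a String, then reads them back
    // in exactly the order in which they were written.
    static String roundTrip(int id, double price, String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(id);
                out.writeDouble(price);
                out.writeUTF(name);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                int readId = in.readInt();
                double readPrice = in.readDouble();
                String readName = in.readUTF();
                return readId + " " + readPrice + " " + readName;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(7, 2.5, "widget")); // prints 7 2.5 widget
    }
}
```

The stream carries no type information, so the reader must call the read methods in the same sequence the writer used; reading in a different order yields garbage values or an exception.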
