November 12, 2003
Encapsulation or Representation
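Several engineers are building a complicated software system together, and naturally, they are trying to figure out how to divide the work up. There are two different strategies that they can take: divide the system by encapsulation, agreeing on the interfaces that objects support, or divide it by representation, agreeing on the format of the data that flows between the parts.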
In the abstract, you can approach any architectural division either way, although in practice usually one way fits better than the other. How can you tell which one is right? Let's take a look at an example.
Order Management by Encapsulation
Suppose you are building an order management system for a car-repair shop. The idea is that various "orders" go through the system; each order has information about the customer, the request, the price, and so on. Similarly, there are several different things to do with an order, such as displaying it, submitting it for execution, and closing it out when it is done.
One way to divide up the work is to encapsulate the behavior of an "order" object as a formal interface, and then leave it up to the different engineers to implement the order objects using whatever data representation they choose. In this classical, often-advocated object-oriented approach, the process begins with the engineers deciding, for example, that an "order" object is just a bit of running code that supports a formal interface:
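For illustration, such an interface might look like the following Java sketch. The method names (display, submit, close) are assumptions chosen to mirror the operations mentioned above, and CarRepairOrder is a hypothetical implementation, not a listing from the original system.

```java
import java.util.Locale;

// Hypothetical interface: the only thing the engineers agree on.
interface Order {
    /** Returns a human-readable summary of the order. */
    String display();

    /** Submits the order for execution. */
    void submit();

    /** Closes the order out once the work is done. */
    void close();
}

// One possible implementation. Its fields are private, so callers
// cannot depend on how the order is stored.
final class CarRepairOrder implements Order {
    private final String customer;
    private final String request;
    private final long priceCents;
    private String status = "NEW";

    CarRepairOrder(String customer, String request, long priceCents) {
        this.customer = customer;
        this.request = request;
        this.priceCents = priceCents;
    }

    @Override
    public String display() {
        return String.format(Locale.ROOT, "%s: %s ($%d.%02d) [%s]",
                customer, request, priceCents / 100, priceCents % 100, status);
    }

    @Override
    public void submit() {
        status = "SUBMITTED";
    }

    @Override
    public void close() {
        status = "CLOSED";
    }
}
```

Code that uses an order sees only the three methods; whether the price is stored in cents, as a string, or in a database row stays the implementor's business.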
Once they have agreed upon how to encapsulate an order in the abstract like this, the engineers can then go and work independently. All the details of "what an order is" would be up to the implementor of any particular order object, so long as the order object supported the encapsulating interface.
The nice thing about this approach is that it allows a system to be enhanced with a "new kind of order" without much difficulty. For example, if we were an auto repair shop getting into the restaurant business (perhaps inspired by Dixies BBQ and Auto Repair), we could probably have some software engineers convert our order management system from being able to handle only "car" orders into one that could also handle "food" orders. Although many details of the orders may be different, as long as they can implement the same "Order" interface, the system should work. Similarly, "order users" can also be enhanced fairly easily, so long as they only use the services that an order has been defined to provide.
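A minimal Java sketch of that kind of extension follows. The Order interface is repeated so the example compiles on its own; FoodOrder and OrderScreen are hypothetical names invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Same hypothetical interface as before, repeated for self-containment.
interface Order {
    String display();
    void submit();
    void close();
}

// A "new kind of order" with completely different internal details.
final class FoodOrder implements Order {
    private final int tableNumber;
    private final List<String> dishes;
    private boolean served;

    FoodOrder(int tableNumber, String... dishes) {
        this.tableNumber = tableNumber;
        this.dishes = Arrays.asList(dishes.clone());
    }

    @Override
    public String display() {
        return "Table " + tableNumber + ": " + String.join(", ", dishes)
                + (served ? " (served)" : "");
    }

    @Override
    public void submit() {
        // A real system would send the ticket to the kitchen here.
    }

    @Override
    public void close() {
        served = true;
    }
}

// An "order user" that relies only on the interface, so it works
// unchanged for car orders, food orders, or anything added later.
final class OrderScreen {
    static String render(List<? extends Order> orders) {
        StringBuilder out = new StringBuilder();
        for (Order order : orders) {
            out.append(order.display()).append('\n');
        }
        return out.toString();
    }
}
```

OrderScreen never learns what a table number or a dish is, which is exactly why adding FoodOrder requires no change to it.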
Order Management by Representation
The other way to divide up the work is to represent an "order" as a formal data format, and then leave it up to the different engineers to build systems that exchange that data for whatever purpose they choose. They would just need to ensure, for example, that an "order" is a document that matches a formally defined data format:
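As a rough illustration, an order document and a program that reads it might look like the Java sketch below. The element names (order, customer, request, estimatedPrice, actualPrice) are assumptions made up for this example, not the original schema; the parsing uses the standard javax.xml.parsers and org.w3c.dom APIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class OrderDocuments {
    // A sample order in a hypothetical agreed-upon format.
    static final String SAMPLE =
            "<order>"
          + "<customer>Alice</customer>"
          + "<request>Replace brake pads</request>"
          + "<estimatedPrice>180.00</estimatedPrice>"
          + "<actualPrice>189.50</actualPrice>"
          + "</order>";

    /** Parses an order document and returns its root element. */
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed order document", e);
        }
    }

    /** Reads the text of the first child element with the given name. */
    static String field(Element order, String name) {
        return order.getElementsByTagName(name).item(0).getTextContent();
    }
}
```

Here the contract between engineers is the document itself: any program that can read these elements can take part, and nothing says what it must do with them.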
Once the representation of an order has been agreed upon, the engineers can then go and work independently. Exactly "what should be done with an order" would be left to the implementor of any particular subsystem, so long as the orders flowing between parts of the system follow the schema correctly.
The nice thing about this design is that it allows a system to be enhanced to add "a new kind of order application" very easily. So for example, if we wanted to add an application that, say, did statistical analysis of an archive of all previous customer orders to help us forecast customer purchases (perhaps inspired by Wal-Mart's data warehousing), we could do that.
Notice that with the "representation" approach, it wouldn't matter that our engineers never anticipated that an order should be able to answer questions that might be interesting to a new application, like "What was the difference between the estimated price and the actual price?" Since the full representation of the data is exposed, it gives us full flexibility to define new behavior in the future - new queries, new patterns of update.
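A sketch of such a later-added application, in Java, might look like this. It assumes the order documents have already been loaded into a plain data holder; OrderData, its fields, and PriceAnalysis are all hypothetical names used only for illustration.

```java
import java.math.BigDecimal;
import java.util.List;

// Plain, fully exposed order data: no behavior, just fields.
final class OrderData {
    final String customer;
    final BigDecimal estimatedPrice;
    final BigDecimal actualPrice;

    OrderData(String customer, String estimatedPrice, String actualPrice) {
        this.customer = customer;
        this.estimatedPrice = new BigDecimal(estimatedPrice);
        this.actualPrice = new BigDecimal(actualPrice);
    }
}

// A new application written long after the format was agreed upon.
// It asks a question the original designers never anticipated.
final class PriceAnalysis {
    /** Actual price minus estimated price for a single order. */
    static BigDecimal overrun(OrderData order) {
        return order.actualPrice.subtract(order.estimatedPrice);
    }

    /** Sum of the overruns across an archive of orders. */
    static BigDecimal totalOverrun(List<OrderData> archive) {
        BigDecimal total = BigDecimal.ZERO;
        for (OrderData order : archive) {
            total = total.add(overrun(order));
        }
        return total;
    }
}
```

With the encapsulated design, the same question would require adding a method to the Order interface and updating every implementation of it.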
Choosing Between Encapsulation and Representation
Which model should be chosen for our order-entry system? It is a business decision. Each approach has strengths and weaknesses. Here is the choice facing the owners of our car-repair shop:
Most modern businesses are inspired more by Wal-Mart than by Dixies. Wal-Mart is certainly the more intuitive model - it makes sense to be prepared to enhance an auto-repair computer system with more advanced computer functionality while keeping the same auto-repair-order data. It's a little more unusual to want to keep an auto-repair computer system, while changing the kind of underlying data (not to mention the associated staff, clients, and facilities) in order to turn it into a restaurant system.
In other words, people tend to hold the data that they have as sacred and something they want to keep forever, and they tend to assign less permanence to the particular programs that work with the data. This preference comes from practical experience. Experience has shown that hot new computer systems come and go, but a successful business keeps its boring old data forever.
Maybe the underlying causes for this experience can be attributed to Moore's law, or maybe to the amazing economies of magnetic media. Or maybe the success of representation is a characteristic that is inherent to all kinds of large-scale integration. (Compare the power of written language to that of spoken language.) At any rate, over the long term, representation tends to win over encapsulation as a large-scale architectural strategy.
When to Use Encapsulation?
Encapsulation is still useful in many situations, of course, but it tends to be most useful within transient programs when the particular data being encapsulated is temporary. Our guidelines are:
Clearly, when defining a data structure that is tailored to implement a specific algorithm - the kind of data structures we learn about in Freshman Computer Science 101 - our data is ephemeral. In that case, our program outlasts our data, and it makes a lot of sense to subdivide the program by encapsulating any data that it works with using object-oriented interfaces.
Encapsulation, therefore, is still the main technique to use when implementing a single tightly-coupled computer system. However, as soon as we move to larger, more loosely-coupled networks of systems where individual programs can come and go while the network of data exchange remains, data representation becomes a much more powerful technique.
Encapsulation is for organizing programming in the small, and representation is for organizing programming in the large.
Why We Care
Why is this all relevant? Because it underlies the whole architecture of XML versus Java. The world is organized into many individual systems written in object-oriented languages like Java, exchanging data-oriented messages in formats like XML.
Java is an excellent tool that can be used to define robust encapsulation for programs, and is useful for building systems in the small.
XML is a standard idiom for describing a human-readable representation of data, and is useful for connecting systems in the large.
In future articles, we will delve into more of the differences between "thinking XML" and "thinking Java." In many of these cases, the differences come down to the differences between representation and encapsulation.
Posted by David at November 12, 2003 06:27 AM
Copyright 2003 © David Bau. All Rights Reserved.