davidbau.com Encapsulation or Representation

Encapsulation or Representation

Several engineers are building a complicated software system together, and naturally, they are trying to figure out how to divide the work up. There are two different strategies that they can take.

Encapsulation. The engineers can agree upon the "behavior" they must build and the "behavior" other components of the system must support.
Representation. The engineers can agree upon the "data" they are allowed to produce and the "data" they need to consume.

In the abstract, you can approach any architectural division either way, although in practice usually one way fits better than the other. How can you tell which one is right? Let's take a look at an example.

Order Management by Encapsulation

Suppose you are building an order management system for a car-repair shop. Various "orders" go through the system; each order has information about the customer, the request, the price, and so on. Similarly, there are several different things to do with an order, such as displaying it, submitting it for execution, and closing it out when it is done.

One way to divide up the work is to encapsulate the behavior of an "order" object as a formal interface, and then leave it up to the different engineers to implement the order objects using whatever data representation they choose. In this classical, often-advocated object-oriented approach, the process begins with the engineers deciding, for example, that an "order" object is just a bit of running code that supports a formal interface:

interface Order {
    void show();    // present the order to the user for editing
    void submit();  // send the order in for execution
    void close();   // close out an executed order
}

Once they have agreed upon how to encapsulate an order in the abstract like this, the engineers can then go and work independently. All the details of "what an order is" would be up to the implementor of any particular order object, so long as the order object supported the encapsulating interface.

The nice thing about this approach is that it allows a system to be enhanced with a "new kind of order" without much difficulty. For example, if we were an auto repair shop getting into the restaurant business (perhaps inspired by Dixies BBQ and Auto Repair), we could probably have some software engineers convert our order management system from being able to handle only "car" orders into one that could also handle "food" orders. Although many details of a food order may be different, as long as it implements the same "Order" interface, the system should work. Similarly, "order users" can also be enhanced fairly easily, so long as they only use the services that an order has been defined to provide.
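As a rough illustration, the encapsulation approach might look like the following Java sketch. Only the three-method Order interface comes from the text above; the `void` return types, the `CarOrder` and `FoodOrder` classes, their fields, and the `OrderDesk` helper are all hypothetical names and choices made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// The encapsulating interface: callers see behavior only, never data.
interface Order {
    void show();    // present the order to the user for editing
    void submit();  // send the order in for execution
    void close();   // close out an executed order
}

// Hypothetical car-repair order; its fields are private implementation details.
class CarOrder implements Order {
    private final String request;
    private String state = "new";

    CarOrder(String request) { this.request = request; }

    public void show()   { System.out.println("Car order: " + request + " [" + state + "]"); }
    public void submit() { state = "in the shop"; }
    public void close()  { state = "closed"; }
}

// Hypothetical food order added later, with a completely different representation.
class FoodOrder implements Order {
    private final List<String> dishes = new ArrayList<>();
    private boolean served = false;
    private boolean paid = false;

    FoodOrder(String... items) {
        for (String item : items) dishes.add(item);
    }

    public void show() {
        String state = paid ? "paid" : served ? "served" : "new";
        System.out.println("Food order: " + dishes + " [" + state + "]");
    }
    public void submit() { served = true; }
    public void close()  { paid = true; }
}

// An "order user" that depends only on the interface, so it handles both kinds.
class OrderDesk {
    // Runs every order through its life cycle and returns how many were handled.
    static int processAll(List<Order> orders) {
        int handled = 0;
        for (Order order : orders) {
            order.show();
            order.submit();
            order.close();
            handled++;
        }
        return handled;
    }
}
```

Note that `OrderDesk` did not have to change when `FoodOrder` appeared. The flip side is that nothing outside `CarOrder` can ask a question the interface did not anticipate, since the fields are hidden.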

Order Management by Representation

The other way to divide up the work is to represent an "order" as a formal data format, and then leave it up to the different engineers to build systems that exchange that data for whatever purpose they choose. They would just need to ensure, for example, that an "order" is a document that matches a formally defined data format:

<order>
  <customer>12345</customer>
  <date>2003-11-26</date>
  <request>Change oil, check tires</request>
  <estimate>45.00</estimate>
  <assigned>Jim</assigned>
  <charge>450.00</charge>
</order>

Once the representation of an order has been agreed upon, the engineers can then go and work independently. Exactly "what should be done with an order" would be left to the implementor of any particular subsystem, so long as the orders flowing between parts of the system follow the schema correctly.

The nice thing about this design is that it allows a system to be enhanced to add "a new kind of order application" very easily. So for example, if we wanted to add an application that, say, did statistical analysis of an archive of all previous customer orders to help us forecast customer purchases (perhaps inspired by Wal-Mart's data warehousing), we could do that.

Notice that with the "representation" approach, it wouldn't matter that our engineers never anticipated that an order should be able to answer questions that might be interesting to a new application, like "What was the difference between the estimated price and the actual price?" Since the full representation of the data is exposed, it gives us full flexibility to define new behavior in the future - new queries, new patterns of update.
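To make this concrete, here is a minimal Java sketch of such a later-added application, one that answers the estimate-versus-actual question using nothing but the agreed order format. The `EstimateAudit` class and its `overrun` method are hypothetical names invented for this example, and treating the `charge` element as the actual price is an assumption about the sample format; the XML parsing uses the standard JDK `javax.xml.parsers` API.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical new application: it shares no code with the systems that
// produced the order, only the data format.
class EstimateAudit {
    // Returns charge minus estimate for a single <order> document.
    static BigDecimal overrun(String orderXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(orderXml)));
            BigDecimal estimate = new BigDecimal(text(doc, "estimate"));
            BigDecimal charge = new BigDecimal(text(doc, "charge"));
            return charge.subtract(estimate);
        } catch (Exception e) {
            // Malformed XML, a missing element, or a non-numeric amount.
            throw new IllegalArgumentException("not a valid order document", e);
        }
    }

    // Reads the trimmed text of the first element with the given tag name.
    private static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

For the sample order shown earlier, `EstimateAudit.overrun` returns 405.00. No one who wrote the original order-handling code had to plan for this query; the exposed representation was enough.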

Choosing Between Encapsulation and Representation

Which model should be chosen for our order-entry system? It is a business decision. Each approach has strengths and weaknesses. Here is the choice facing the owners of our car-repair shop:

If it is important for you to keep your current computer applications, but you want to free yourself to let your applications be capable of working with new kinds of data - as Dixies has done by adapting their existing business process to a new kind of business - then what you want is program encapsulation.
On the other hand, if it is important for you to keep your current data, but you want to free yourself to use your data in new applications in the future - as Wal-Mart has done by finding new business techniques to utilize their cash register receipts - then what you want is data representation.

Most modern businesses are inspired more by Wal-Mart than by Dixies. Wal-Mart is certainly the more intuitive model - it makes sense to be prepared to enhance an auto-repair computer system with more advanced computer functionality while keeping the same auto-repair-order data. It's a little more unusual to want to keep an auto-repair computer system, while changing the kind of underlying data (not to mention the associated staff, clients, and facilities) in order to turn it into a restaurant system.

In other words, people tend to hold the data that they have as sacred and something they want to keep forever, and they tend to assign less permanence to the particular programs that work with the data. This preference comes from practical experience. Experience has shown that hot new computer systems come and go, but a successful business keeps its boring old data forever.

Maybe the underlying causes for this experience can be attributed to Moore's law, or maybe to the amazing economies of magnetic media. Or maybe the success of representation is a characteristic that is inherent to all kinds of large-scale integration. (Compare the power of written language to that of spoken language.) At any rate, over the long term, representation tends to win over encapsulation as a large-scale architectural strategy.

When to Use Encapsulation?

Encapsulation is still useful in many situations, of course, but it tends to be most useful within transient programs when the particular data being encapsulated is temporary. Our guidelines are:

Use encapsulation when your program outlasts your data.
Use representation when your data outlasts your program.

Clearly, when defining a data structure that is tailored to implement a specific algorithm - the kind of data structures we learn about in Freshman Computer Science 101 - our data is ephemeral. In that case, our program outlasts our data, and it makes a lot of sense to subdivide the program by encapsulating any data that it works with using object-oriented interfaces.
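A small Java sketch of this situation follows. The `CharStack` interface, the `ArrayCharStack` implementation, and the `Brackets` checker are hypothetical examples chosen for illustration, not anything from a particular library.

```java
import java.util.Arrays;

// Hypothetical interface: the algorithm sees only stack behavior.
interface CharStack {
    void push(char c);
    char pop();
    boolean isEmpty();
}

// One possible implementation; the array layout is hidden and free to change.
class ArrayCharStack implements CharStack {
    private char[] items = new char[8];
    private int size = 0;

    public void push(char c) {
        // Double the backing array when it fills up.
        if (size == items.length) items = Arrays.copyOf(items, size * 2);
        items[size++] = c;
    }

    public char pop() {
        if (size == 0) throw new IllegalStateException("stack is empty");
        return items[--size];
    }

    public boolean isEmpty() { return size == 0; }
}

class Brackets {
    // The stack lives only for the duration of one call:
    // the program outlasts the data.
    static boolean balanced(String text) {
        CharStack open = new ArrayCharStack();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == '[') {
                open.push(c);
            } else if (c == ')' || c == ']') {
                if (open.isEmpty()) return false;
                char expected = (c == ')') ? '(' : '[';
                if (open.pop() != expected) return false;
            }
        }
        return open.isEmpty();
    }
}
```

Nobody will ever want to archive the contents of that stack or run a new application over them, so hiding its representation costs nothing and keeps the algorithm independent of the storage details.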

Encapsulation, therefore, is still the main technique to use when implementing a single tightly-coupled computer system. However, as soon as we move to larger, more loosely-coupled networks of systems where individual programs can come and go while the network of data exchange remains, data representation becomes a much more powerful technique.

Use encapsulation to subdivide tightly-coupled components of a system.
Use representation to connect loosely-coupled systems together.

Encapsulation is for organizing programming in the small, and representation is for organizing programming in the large.

Why We Care

Why is this all relevant? Because it underlies the whole architecture of XML versus Java. The world is organized into many individual systems written in object-oriented languages like Java, exchanging data-oriented messages in formats like XML.

Java is an excellent tool that can be used to define robust encapsulation for programs, and is useful for building systems in the small.

XML is a standard idiom for describing a human-readable representation of data, and is useful for connecting systems in the large.

In future articles, we will delve into more of the differences between "thinking XML" and "thinking Java." In many of these cases, the differences come down to the differences between representation and encapsulation.

Posted by David at November 12, 2003 06:27 AM

Hi,
It is an interesting concept. But I have a thought. If a team of 10 engineers were to design and develop a complex system using "Representation", focusing on produced and consumed data, how much more work (system analysis and design) would these engineers have to do in order to create a clear picture of the data being produced and consumed? If you were to start working with data directly, it would seem that you would be taking the approach of building a system from bottom-up instead of top-down. In my opinion, designing a system from top-down would produce a better system in terms of sticking to proposed requirements and scope.

These are just my thoughts coming from a Jr. Developer. Please feel free to comment and point out any misunderstanding.

Posted by: Kamleshkumar Patel at November 26, 2003 11:51 PM

Should developers agree on the format of the data interchanged (representation), or an abstraction of the protocol for exchanging that data (encapsulation)?

The answer is: yes.

Posted by: Steve Holiday at November 27, 2003 12:22 AM