Value Objects in Newspeak

This is a quick dump of a rough design sketch for Value objects in Newspeak, which builds upon section 3.1.1 of the current version of the Newspeak language specification.

  1. Value classes allow explicit intent. The class declaration is automatically annotated with metadata that expresses the intent for instances to be value objects.
  2. Value classes use special syntax that introduces the said metadata annotation (e.g. valueclass X instead of class X).
  3. Value classes can only be mixed in with other Value classes.
  4. Value classes can only have immutable slots.
  5. The root of the value classes is Value, which extends from Object. The Value class overrides the ==  method and delegates it to =. The Value class overrides = to compare all the slots recursively using =. The Value class overrides the asString method to give a neat stringified representation of the Value object in a JSON-like format. Value class computations for =, asString bottom out on built-in Value classes, like Number, Character, String etc. (overriding = and asString is explicitly inspired by the behavior of case classes in Scala).
  6. The Value class overrides the identityHash method to delegate to the hash method, and overrides the hash methods with some simple, yet-to-be-determined, recursive hashing algorithm (e.g. XOR-ing the hashes of all the slots).
  7. Value objects can only point to other Value objects.
  8. Value class declarations can only be nested inside other Value class declarations.
    Update 2/10/2012: Another option that seems very attractive right now would be to allow value class declarations to be lexically nested inside non-value class declarations but cut off from the non-value part of their lexical scope (the enclosing object chain stops at the outermost value class, excluding all enclosing non-value classes).
  9. This implies the enclosing object of a Value class is always a value object.
  10. Simply annotating a class declaration as “<Value>” is not enough. Syntax is required for valueclass declarations in order to ensure that Value classes always extend other Value classes. This allows a Value class with no explicit superclass clause to implicitly extend the built-in Value class, instead of Object, which is the default superclass for regular classes.
  11. The constraints on Value objects and Value classes are verified at mixin application time (the superclass is a Value class), and object construction time (all slots contain other Value objects).
  12. The enclosing object does not need to be verified at mixin application time, because the enclosing scope of a Value class declaration can be verified at compile time.
  13. Value classes are also Value objects.
  14. nil is a Value object.
  15. Value class declarations can contain nested non-value (regular) class declarations. More generally speaking, Value objects can produce (act as factories for) non-value objects.
    Update 2/10/2012: An important corollary of the above is that non-value classes enclosed in a value object are value objects themselves.
  16. Value objects are awesome! They are containers for data and the unit of data transfer between Actors in Newspeak, and also the building block for immutable data structures.
  17. Update 11/24/2011:
  1. Every class whose enclosing object is a value object is also a value object (but not necessarily a value class!).
    Update 11/27/2011:
    Justification for the above is: if multiple equivalent instances of a value class are indistinguishable, then all of the instances’ constituent parts, nested classes included, must be indistinguishable as well. Think (a == b), but (a NestedClass == b NestedClass) not – this is unacceptable!
  2. We must determine rules for when closure and activation objects are value objects, so we can safely deal with simultaneous slots in value classes (at construction time, the closure object that captures each simultaneous slot initializer must be a value object, then at lazy evaluation time, the result must be a value object, otherwise an exception is thrown and the simultaneous slot is not resolved).
    Update  2/10/2012: One alternative that comes to mind but does not seem very attractive would be to have special syntax for closures that are value objects, say {{ … }} denotes a closure that is always a value object and has no access to enclosing mutable state.
    A more attractive alternative would be to extend the syntax for object literals to support value object literals. All of a sudden object literals appear much more important than before. For example, value-object closures and/or object literals make it possible to build  a Scala-like parallel collections library on top of actors.
    Actually the above is not quite correct: a Scala-like parallel collections library in newspeak would benefit more from value class literals that can be nested inside non-value classes

References and Actors

In E, references are distinct from the objects they designate. This might seem apparent, but it is not necessarily so. In traditional languages like Java, first-class references are almost indistinguishable from the objects they designate. They are internally represented as 4- to 8-byte pointers and while there is a distinction between reference equality (two references pointing to the same object) and object equality (two distinct objects with identical/indistinguishable contents and behavior), there is not much else to worry about.

In E, however, the rabbit hole goes deeper. There are multiple *types* of references. The word type might be a misnomer but I find it as one good way to think about references. In the E thesis, there is no discussion of types of references, but rather states that a reference can be in:

  • Local Promise
  • Remote Promise
  • Near Reference
  • Far Reference
  • Broken Reference

In this discussion reference type is a synonym for reference state. The reason the term state seems more appropriate is that a single reference goes through several transitions between states in the course of its lifetime. In other words, a reference can switch types.

The problem this poses for an implementation is that references in different states hold different information. A near reference is the simplest case, the familiar reference from Java – it holds the address of an object within the current VM’s heap. A promise however, holds an unbounded list of pending messages and whenResolved listeners. A far reference holds whatever information is necessary to transmit messages to its target object, including potentially a distinct queue of messages pending delivery. A broken reference holds exception information regarding the reason for the reference breakage.

Classes are a natural way to think about implementing each different state of a promise – the information and behavior for each state of a reference is represented by a distinct class. The problem arises when the reference needs to switch states, therefore the need arises for an object to change its class dynamically, which is not traditionally available functionality in object-oriented programming languages.

Another problem is the possibility that references might chain. In other words, the possibility that a reference might point to another reference, instead of directly pointing to an object. A promise might get resolved to another promise. Or, even more disturbingly, a far reference might point to a promise reference. This possibility of chaining is actually excluded in the reference states model presented by Miller. Instead of a promise resolving to another promise, the promise reference simply makes a state transition, or in other words, the reference becomes the other promise, instead of pointing to it. In a similar fashion, a promise will get deserialized as a promise for the same result as the original promise, instead of being deserialized as a far reference to a promise (which would introduce chaining of references).

This, in essence, leads to an important conclusion. The serialization/deserialization implementation must include special handling logic for references in different states. For instance:

  • Near reference might have to be deserialized as a far reference
  • Near promise is deserialized as a remote promise linked to the same resolver as the original near promise

etc.

Furthermore, the resolution logic must handle the distinct cases as well, in order to handle the different state transitions possible that originate from the promise states:

  • Become another promise (for a new result)
  • Resolve to and become a far or near reference

As already discussed, the most natural way to implement the different reference states is as classes. At this point the notion of reference states as types comes into the picture. References are objects in our runtime VM but they are a distinct type of proxy objects that provide special services and require distinct treatment from regular application objects. As explained above, for deserialization and resolution purposes, we need to be able to distinguish between a near reference to an application object and a near reference to a reference object (since fundamentally the runtime VM only provides primitive support for near references, and the other reference states are reified as regular objects). Furthermore, it is clear from the examples above that we also need to be able to distinguish between the different *types* of reference states (Local Promise vs Remote Promise vs Far Reference etc).

Since in Newspeak, there is no global namespace for classes, and at runtime all classes are simply a dynamic aggregation of mixin applications, we cannot test the class names of objects. Conceptually this would be equivalent to having some sort of type system anyway. But this is exactly what we need – to be able to distinguish between different types of objects (one per reference state), albeit a very restricted set.

Since the Past and Actors libraries are a core part of the language, I propose to meet the need for type checking using the following idiom, which is slightly different from the is* message idiom for arbitrary objects, already implemented in the NewseakObject doesNotUnderstand: protocol. The idea is that since the Past and Actors libraries are singleton modules and the sole managers of instances of objects that represent reference states, and not likely candidates for extension by applications, we can simply test for class equality like this:

(obj class = Promise)

where obj is an object whose type is being tested and Promise is a reference to the class instance local to the current singleton module instance. Naturally,

Promise new

is exclusively used to construct Promises from within the Past module, for example.