In our service-oriented world, users need the same experience on any device, whether mobile phone, office PC, or Internet café.

Moreover, they want the same experience any time they access applications, offline or online. For developers, this means tackling multi-tier, distributed, and concurrent programming. LINQ 1.0 radically simplified multi-tier programming with unified query and deep XML support. TESLA is a broad engineering program by the authors to extend the success of LINQ with external relationships, reshaping combinators, assisted tier-splitting, and join patterns.

LINQ 1.0: Standard Query Operators and Deep XML Support

It seems that developers spend the majority of development hours and computer cycles just plumbing: reshaping, reformatting, wrapping, connecting, serializing, and marshaling from one data model to another just to interface with supplied technologies. Most developers would rather spend their time and energy on presentation, business logic, data analysis, and data mining: the parts of applications that generate value.

To mitigate this situation, some academics and industry experts recommend picking one of the data models-XML, JSON, objects, entities, or relations-as the universal one and mapping all the other data models into it. Other experts prefer to invent grandiose new data models that purport to encompass all the other data models.

One Query Language, Many Data Models

We (the authors of this paper) do not believe in either of these “universalist” approaches. The allure of a unified data model is naïve in practice. The proposals, especially with their mapping solutions, are at least as complicated as the jumbles they contend to replace and their learning curves just add more developer cost.

It is preferable to unify at the level of operations, that is, to define a small, universal set of Standard Query Operators (SQOs)that work the same way on all data models. These operators are limitlessly composable-while the number of basic operators is small, the number of combinations is unlimited. That way, one small set of primitives covers all scenarios. Furthermore, SQOs can just as easily transform data from one data model to another, say from XML to objects, or entities to JSON, or relations to XML and back, etc., in a regularized way.

It is preferable to unify at the level of operations, that is, to define a small, universal set of Standard Query Operators (SQOs) that work the same way on all data models.

This is classic engineering by separation of concerns. Separate the data models from the query languages, look for common semantics amongst the query languages, and capture them in a higher-level definition that each data model can implement in its own natural way.

In current practice, every data model comes with its own idiosyncratic, tightly coupled query language. XML comes with XQuery; objects require hand-written query logic; and the data tier has SQL. Consider the following three ways to write the same example query. In XQuery, on the client, one writes:

from $c in customers/customer
where $c/city == "Seattle"
return <result>{$c/Name}{$c/Age}</result>

On the middle tier, one would write in C#:

class Result { string Name; int Age; }
...
var r = new List<Result>();
foreach(var c in Customers)
if(c.City == "Seattle")
    r.Add(new Result{c.Name, c.Age});

And, on the data tier, one would write SQL:

Select Name, Age
From Customers
Where City = "Seattle"

In each case, the meaning of the query is the same:

“Give me the collection of name, age pairs from customer records such that the customer’s city is Seattle.”

It is intolerable to impose the burden of managing such superficial and gratuitous differences on the developer. It doesn’t matter whether, under the covers, Customers are objects, rows, XML elements, or any other kind of collection, and developers should insist on writing the same query the same way. This is exactly what LINQ 1.0 query comprehension syntaxdoes. All three queries look the same in C# 3.0:

from c in Customers
where c.City == "Seattle"
select new { c.Name, c.Age }

Visual Basic 9 has a very similar syntax. The compiler simply translates comprehension syntax into nested calls of the SQOs, and the SQOs, in turn, call methods in the interface IEnumerable. Only the implementations of the various IEnumerables for the various types of Customers know whether to scan elements, iterate over objects, or join over tables.

Mathematicians recognized this “design pattern” decades ago. Each data model defines a monad and the query is a monad comprehension. Don’t let the terminology intimidate; monads are very simple structures and monad comprehensions are just like the “set” comprehensions you learned in school, for example, “the set of all integers x such that x is divisible by 5.” Just as sets work the same whether they contain integers, apples, or even other sets, so monads work the same whether they contain objects, elements, or rows. IEnumerable captures monadic “collection-ness” and hides the differences. Any class that implements IEnumerable automatically supports the SQOs.

Operators and Syntax

The following six SQOs are the core of LINQ 1.0. These are precisely the monadic primitives for filtering, transforming, joining, grouping, and aggregating over arbitrary collections of arbitrary types (abbreviating IEnumerable as IEn):

static class Sequence
{ static IEn<T> Where<T>
    (this IEn<T>, Func<T, bool>) { }
  static IEn<S> Select<T, S>
    (this IEn<T>, Func<T, S>) { }
  static IEn<S> SelectMany<T, S>
    (this IEn<T>, Func<T, IEn<S>>) { }
  static IEn<V> Join<T, U, K, V>
    (this IEn<T>, IEn <U>,
      Func<T, K>, Func<U, K>, Func<T, U, V>) { }
  static IOrderedSequence<T> OrderBy<T, K>
    (this IEn<T>, Func<T, K>) { }
  static IEn<IGrouping<K, T>> GroupBy<T, K>
    (this IEn<T>, Func<T, K>) { } }

Consider the first one, Where<T>, with T representing a type such as Int or String or Customer. Where takes an IEnumerable<T> and a Func<T, bool>. This Func is a predicate function that takes a T and returns a Boolean. Where returns a new IEnumerable<T> that contains only the elements of its input IEnumerable<T> that satisfy the predicate function.

Read the rest of the operator definitions in a similar fashion and look up details and examples in any number of places on the Web such as http://www.codeplex.com/LINQSQO and http://www.hookedonlinq.com. There are a few more convenience operators in the SQO specification, but these six are the kernel set.

All the SQOs are static methods that take any IEnumerable<T> as their first argument and return an IEnumerable of another potentially different type S. Because they are static methods with no implicit this parameter, you can write SQOs for a class without changing or even recompiling the class. This critical design choice ensures modularity (see sidebar, “Modularity Saves Money”). Furthermore, since every SQO just consumes and produces IEnumerables, you may compose and nest SQOs in an unlimited variety of ways. This is another critical design choice (see sidebar, Composability Saves Money). TESLA carries forward these design principles through external relationships and reshaping combinators. I’ll discuss more about them below.

Deep XML in Visual Basic 9

Sometimes, you are working with a single data model, and would like detail-control leverage from the programming language. Therefore, in LINQ 1.0, Microsoft also built deep XML integration into Visual Basic 9. If you are working directly with XML documents, you can write the query as follows:

From C in Customers.<customer>
Where C.@City = "Seattle"
Select <result>
          <%= C.<Name> %>
          <%= C.<Phone> %>
       </result>

The compiler just translates such queries into expressions over the SQOs, which, in turn, are over the LINQ-to-XML provider.

LINQ 1.0 Summary

LINQ 1.0 solved the impedance mismatch problem by reframing it from the intractable one of data models to the tractable one of query languages. LINQ 1.0 also dramatically simplified the work of XML users.

LINQ 1.0 solved the impedance mismatch problem by reframing it from the intractable one of data models to the tractable one of query languages.

TESLA: Reaching Across the Cloud in Space and Time

Much work remains to achieve the larger goal of democratizing the Cloud: making distributed, data-intensive programming easy enough for everyone. The next big issues are object-relational mapping, distributed programming, and concurrent programming. This article’s solutions for these issues are external relationships, reshaping combinators, assisted tier-splitting, and join patterns.

Co-Dependency and Non-Composability

The “last mile” of data programming, mapping one data model to another, remains painful. In particular, the mapping between relational data and objects is very thorny. Typically, one finds code generators that read relational schemata and write class definitions or one finds external mapping-specification files, perhaps in XML. At first glance, these would seem to satisfy goodness criteria like eliminating hand-written code or separating the concerns of mapping and data representation. However, the cost with both approaches is complexity. If you can configure the code generators at all, they entail a configuration language more complex than either the relational model or the object model. Similarly, the external mapping languages can be much more complex than either side. Some will argue that their complexity is unavoidable, because mapping is dealing with the difficult problem of impedance mismatch. To be rich enough for real-world scenarios, they must support the Cartesian product of features from both sides. This article argues, however, that since their complexity admittedly arises from the beginning with the union of the data models, everything from both sides, you can eliminate it by beginning with the intersection, the parts the two sides have in common. To build up richness, you can resort to the proven technique of composability: define a small number of things that fit together in an unlimited variety of ways.

External Relationships Cure Co-Dependency

Because you want ordinary object-programming dot notation to traverse primary-key and foreign-key relationships between classes in both directions, typical mapping solutions interpolate explicit properties into class definitions. See Listing 1 for a typical example. These properties introduce tight, bi-directional, internal coupling between the Customer and Order classes. Changing one may require changing the other, particularly if the primary or foreign key specifications change. Even worse, introducing a new relationship, say between Customer and PostalAddress, requires changing the Customer class, with changes possibly cascading to other classes.

In the database, coupling is usually only one way. Relational databases encode one-to-many relationships upstream from the target table (Orders) to the source table (Customers) by foreign key. However, once you map to objects, you get bi-directional coupling. Even if you only want one navigation direction, the most natural one is the other way, downstream from the source table (Customers) to the target table (Orders), ignoring foreign keys. In other words, the natural object solution defines a property in Customer referencing the collection of Orders, but this property has no overt reflection in the store.

Recall that the SQOs, as static methods, avoid gratuitous tight coupling. Perhaps you can do something similar here and-just as in LINQ 1.0-introduce convenient syntactic sugar. This is exactly what the authors have done, working with our colleague, Kasper Osterbye of IT University at Copenhagen.

Developers should have notation for declaring relationships as properties externally to the class definitions, hence to have ordinary dot notation for accessing properties bidirectionally, but without modifying or even recompiling the original class definitions. Customers and Orders do not need to know exactly how they are related or even if they are related. All that matters is that application code can get from a customer to its order and from an order to its customer.

Dot notation naturally chains, so the properties remain composable. Internally, the relationship machinery must honor the primary-key and foreign-key constraints from the relational model, but will yield much more stable and decoupled class definitions at the object level. While it is too early to propose the precise syntax, it will resemble Listing 2.

Reshaping Combinators Cure Non-Composability

Since existing mapping solutions go all the way between tables and classes, and never tables-to-tables or classes-to-classes, it is impossible to build up mappings incrementally by composing smaller, more understandable mappings in sequence. You must make all mapping decisions at one time, dramatically increasing the cost (see sidebars). Contrast that with the composability of traditional database views, which you can build up block-by-block because combining one view with another yields a new view.

To cure mapping non-composability, we define a small set of reshaping combinators. Like the SQOs, these are static methods that take a reshaping and return a reshaping, and which can combine in nearly arbitrary ways for an unlimited variety of solutions. The authors worked with our colleague, Mark Shields, to determine a small set that suffices for all practical scenarios. While it is much too early to propose specifics, a rough explanation means that you work in a simplified monad of tuples and pose an effective collection of operations like the following:

  • Adding, dropping, and renaming fields
  • Nesting and un-nesting tuples
  • Generalizing and specializing field types
  • Scalar transforms
  • Arithmetic on field data

Mapping Solved

With external relationships and reshaping combinators, you only need a very thin layer of non-programmable default mapping at the edges between the relational world and the object world. Developers will express everything else in their favorite LINQ-enabled programming languages.

Just as LINQ 1.0 solved impedance mismatch by turning it into a solved problem (monad comprehensions), TESLA will solve the mapping problem by turning it into something solvable: external relationships and reshaping combinators.

Just as LINQ 1.0 solved impedance mismatch by turning it into a solved problem (monad comprehensions), TESLA will solve the mapping problem by turning it into something solvable: external relationships and reshaping combinators.

The Cloud Reaches Across Space and Time

The other open problems we will attack in TESLA are distribution and concurrency, the new application design patterns for the Cloud.

In an increasingly distributed and service-oriented world, users demand the same experience everywhere they may be, using any device, and at any time, connected or not. For developers, this means tackling distributed programming (multiple places) and concurrent programming (multiple times). How do you get there from the familiar world of one-machine, step-by-step applications?

Assisted Tier-Splitting

For the space dimension, how can you get, with minimal agony, from a program that runs on one machine to a program that runs on two machines, and from there, to any number of machines? You want to avoid maintaining different application builds for each client platform. You would even like to delay assigning functionality to client and server until the very last responsible moment, when you know the capabilities of the client device.

We want to start with an ordinary, single-tier application, and then successively refactor the code into a distributed application. To be clear, automatic repartitioning is bad. However, there is plenty of opportunity just for good tooling for programmer-specified repartitioning.

Tier-splitting refactoring honors the converse statement of the expansion theorem from process algebra. In its forward direction, the theorem describes the parallel composition of two programs as an arbitrary interleaving of a single sequential program. We go the other way, describing a sequential program as a parallel composition of two programs, ensuring that the parallel execution of the new programs maintains invariants of the original sequential program. The transformation inserts synchronization, marshaling, and security logic using join patterns, described below.

By the converse of the expansion theorem, parallel execution of the new programs maintains invariants of the original sequential program.

To see how this works in practice, imagine a tiny “dictionary-suggest" application that proposes completions when a user types part of a word. The Suggest method in class Dictionary returns the topmost n descriptions that match a given prefix p. As a single-tier, sequential program, this is a straightforward LINQ query over a collection of dictionary entries:

IEnumerable<Entry> Suggest(string p, int n)
{
    return (from e in this.entries
    where p.Length > 0 && p.Matches(e)
    select e).Take(n);
}

The client program, at each keystroke, takes the value of the textbox i, looks up the completions of the word in the dictionary, and populates listbox o with the first 10 results, properly formatted:

var d = new Dictionary();
var i = new TextBox();
var o = new ListBox();
//... set up UI ...
i.KeyUp += delegate {
    o.Items = Format(d.Suggest(i.Text, 10));

You want to test and tweak this application purely on the client tier until satisfied with its functional behavior. To deploy the Dictionary class as a service on the middle-tier, just add a custom attribute [RunAt(Server="...")] to the class declaration:

[RunAt(Server="http://foo.baz.qux/...")]
class Dictionary {...}

Guided by this metadata, the compiler rewrites the MSIL of the client-side Dictionary class with a proxy that calls into a server-side stub to the original implementation. Client and server communicate via a stateful RPC-style protocol using JSON, XML, or S-expressions as the serialization format. The run-time infrastructure picks the format based on additional metadata in the custom attribute.

As with all distributed applications, you must worry about network latency and availability. All remote calls should be asynchronous. Let the compiler write a new overload for the Suggest method with an [Async] custom attribute:

[Async]
extern void Suggest(string p, int n,
        Action<IEnumerable<Entry>> cont);

This new overload takes an additional parameter, a continuation delegate of type Action<...>. This continuation takes as input the IEnumerable<Entry> that was the output of the old, synchronous Suggest. This is a completely automatable rewrite into continuation-passing style:

var o = new ListBox();
...
i.KeyUp += delegate {
    d.Suggest(i.Text, 10,
        entries => {o.Items = Format(entries); });

Don’t Stop at Just Two Tiers

With our colleague Dragos Manolescu, we are pursuing much more broadly distributed real-world applications. This means continuing tier-splitting recursively beyond client and server until all computation happens close to its data and resources. Let’s move the dictionary query into the data tier and, perhaps, split the client tier to run UI in Silverlight, but leave the Dictionary proxy class in the HTML DOM.

On all tiers, we do not assume there is a native .NET runtime. Instead, we target any available runtime, adopting the point of view of a pluggable compiler back end. In the near term, the minimal client will be a browser with JavaScript. Therefore, we translate IL into JavaScript with very high fidelity, by deep embedding. Similarly, in the near term, the typical server will be a cluster of commodity-hardware blades. Therefore, we have a version of LINQ where GroupBy aggressively exploits the parallelism inherent in horizontally partitioned data.

Join Patterns

Finally, it has to be easier for developers to specify synchronization and replication of program states. This is essential for occasionally connected and collaborative scenarios. In the past, Microsoft offered little more than thread pools, message queues, monitors, semaphores, and so on; that is, low-level communication primitives. Overt process notations like the Pi-calculus also seem to be off-the-mark for mainstream developers working with object-oriented programming.

We deem C-omega style join patterns to be the most attractive alternative for TESLA and we are pursuing them for .NET languages with our colleague Claudio Russo.

Join patterns are a generalization of declarative events that developers already know well. With declarative events, the developer declares a disjunction (OR-chain) of events to which a set of event-handler methods must respond. In the following example, the Visual-Basic Handles clause below specifies that the method __Click_Or_Focus must run whenever either the Button.Click or the Textbox.Focus event is received:

Sub __Click_Or_Focus _
  (ByVal S As Object, ByVal E As EventArgs)
      Handles Button.Click, TextBox.Focus
  ...
End Sub

Declarative join patterns substitute a conjunction (AND-chain) of messages as opposed to a disjunction of events. The new When clause is the parallel construction to the existing Handles clause. The When clause marks a method as one that must fire when all the specified messages have arrived. In caller code, messages just look like method calls, with asynchronous ones returning immediately and synchronous ones blocking.

For just a taste of this style of programming, consider Listing 3. __Put_And_Empty will fire when at least one Put message and at least one __Empty message arrive, and __Take_And_Contains will fire when at least one Take message and at least one __Contains message arrive.

Conclusion: Repeating the Success Formula

LINQ 1.0 broke ground for radically simplifying distributed data-intensive applications in .NET. TESLA will stretch the programming model to cover the Cloud.