For Visual FoxPro developers, data access represents the daily bread and butter. Data access is a core feature of Visual FoxPro (which after all is an xBase descendent) and nobody gives much thought to the ability to run a select-statement right within a VFP program.

Most languages however (including .NET languages such as C# or Visual Basic .NET) are not as data centric and thus do not support data manipulation as a core language concept. At least until now. But all this is about to change with the introduction of LINQ.

LINQ stands for “Language Integrated Query” and refers to the ability to execute data query commands right within .NET languages, in particular C# and Visual Basic .NET. For FoxPro developers who are also interested in .NET development, this is great news since it means that many features we normally associate only with Visual FoxPro will become available in .NET as well. For instance, consider the following Visual FoxPro command:

SELECT CompanyName, ContactName FROM Customers

This opens the customer table (DBF) and retrieves a result set that contains the first and last name fields. Similarly, .NET now supports the ability to select data. The following example also queries the customer table and retrieves the first name and last name fields for the result set:

From customer In customers
    Select CompanyName, ContactName

This was the Visual Basic .NET syntax. The same feature is also available in C#, where the exact command looks like this:

from customer in customers select new
     {customer.CompanyName, customer.ContactName};

There are a few aspects that stand out here. First, C# and VB.NET are pretty similar. In fact, both are very similar to the Visual FoxPro version, although at the same time, the syntax seems to be backward and slightly more complicated than in VFP (we will talk about that). There are also a few aspects that are not quite as obvious. For one, LINQ provides query capabilities, but it does not provide a data engine. In other words: LINQ does not support DBF files or anything of that nature. Instead, it still requires a different data source of some kind, be it a flavor of SQL Server (full server, SQL Express, or SQL Everywhere), XML, objects, or any other data source. This means that on one hand, the situation is disappointing for VFP developers because the concept of local data needs to be provided by a different mechanism. On the other hand, it is also very exciting, because there are many options for data access that are not available to VFP developers by means of a select statement (such as XML or objects).

It should also be mentioned here that the need for a local data engine will soon be satisfied with SQL Server Anywhere, a new SQL Server-like data engine that behaves much more like a VFP database than an SQL Server database. (SQL Server Everywhere shall not be confused with SQL Server Express).

Oh, and before you abandon VFP and jump right into .NET to build your applications based on this new technology, I should probably make sure everyone understands this LINQ has only been released as a CTP (Community Technology Preview) so far, and will ship as part of the next version of Visual Studio, currently code-named “Orcas”. So while this is very exciting technology, it is not available quite yet for production applications. (You can download the current CTP from www.msdn.com).

A Familiar Data Source: SQL Server

LINQ supports a wide variety of data sources, but let’s start our journey through the LINQ feature landscape at a familiar point: With a database back end. In particular, SQL Server. When it comes to accessing SQL Server from Visual FoxPro, one has a number of different options, such as Remote Views, or SQL Pass Through. All these technologies follow a similar approach: First, a connection to SQL Server is established, which is then used to fire a command against SQL Server. LINQ of course has to do the same thing, but the developer is rather removed from that aspect. LINQ simply provides a way to define a context that tells the LINQ engine how to get to SQL Server. This is the first step in connecting to an SQL Server database through LINQ, but there is another step that may seem unusual to VFP developers, but it is necessary in a strongly typed environment such as .NET: We have to tell LINQ what types of data sources we intend to talk to.

The easiest way of setting these things up is through Visual Studio. Whenever you create a Visual Studio application that includes the LINQ components (this will be the case if you install the LINQ CTP and create a new Visual Studio project based on a LINQ template), you can add a new LINQ database item to the project by picking “DlinqObjects” from the add-item dialog (Figure 1).

Figure 1: Adding a LINQ database object to a .NET project.

A LINQ Database object that has been added to a .NET project is represented by a visual design surface (see Figure 2). This design surface allows for quick and simple definition of SQL Server database objects that are available for LINQ queries. This may remind Visual FoxPro developers of the remote view designer in VFP.

Figure 2: Defining SQL Server database objects available for LINQ queries by drag&drop from the Server Explorer.

Once database objects are defined in a project, one can query from them without further need to create a database connection or anything of that kind. Here is the Visual Basic .NET code that queries from the customer table as defined by the setup show in Figure 2.

Dim db As New NorthwindDataContext()
Dim customers As Table(Of Customer)
customers = db.GetTable(Of Customer)()
Dim query = From c In customers _
    Select CompanyName, ContactName

And here is the C# version:

NorthwindDataContext db = new NorthwindDataContext();
Table<Customer>customers = db.GetTable<Customer>();
var result = from c in customers
    select new {c.CompanyName, c.ContactName};

So let’s see what these commands actually do. The first step is to tell LINQ where our data is supposed to be queried from. This is done by providing a data context. Such as context is created automatically by the LINQ database object designer. Since we dropped an object from the Northwind sample database, the name of the database context object defaults to “NorthwindDataContext”. This is the object we use here. Instantiating a data context object does not open a connection to the database right away, but it allows LINQ to do so when a connection needed. So we can think of this as a conceptual connection, even though the actual mechanism is different and more optimized than a regular connection.

The second step then is to define the table we are interested in. We can do so by instantiating a LINQ “Table” object. However, we don’t just want any table, we want a very specific table which has been defined when we dropped the customer table onto the designer. The name of the table object we want in this example is “Customer”. Internally, “Customer” is actually a class that was automatically created for the purpose of providing a map from the relational world of SQL Server, to the object-oriented world of .NET. We use .NET’s “generics” feature to instantiate a table object that is specific to the Customer class. (A discussion of generics is beyond the scope of this article. However, I have written about this subject in different articles. An in-depth discussion of generics can be found on my web site at www.MarkusEgger.com).

Finally, we are read to query data from that table. We do so using a command that is relatively similar to a VFP or T-SQL select command, except it seems to be backwards. While this seems odd at first, there is a good reason for the “from” part to come before the “select” part. This mainly has to do with ease of editing. Whenever the developer types “from customer”, the Visual Studio editor knows the source of the data. This means that by the time the developer types the “select” part of the statement, IntelliSense can kick in and provide a list of fields that are available to be selected (see Figure 3). If it were the other way around, this would not be possible, as the editor could not know what can be selected without knowing the source of the query. This becomes especially problematic in complex queries that include constructs like joins and the like.

Figure 3: Due to the “backward” nature of the select statement, IntelliSense can provide useful information.

A Peek Behind the Curtain

There are other interesting aspects of the select statement worth explaining. One of the fundamental questions of any select command has to be “what is the result set?”. In Visual FoxPro, result sets of select commands are cursors (in-memory tables). .NET does not have a native concept of a table or a cursor. At least not outside of ADO.NET, and LINQ aims to be more generic and not necessarily tied to ADO.NET. For this reason, the result set has to be something that is native to .NET, and there is only one option: An object. Every result produced by LINQ is an object of some sort. The object can be a DataSet or a DataTable, but in many cases it will just be an object that contains a list of other objects (a collection or an “enumerable” object).

In the above example, the result set in fact is a collection of objects. Each of these objects has two properties: CompanyName, and ContactName. The observant reader faces a head-scratcher at this point: It makes sense that the result is a list of objects with two properties, but where do those objects come from? In other words: Objects are instances of classes, but where is the class used for these objects? Well, classes aren’t delivered by the stork. In object-oriented development there is a one commonly used way of creating classes: They are hand-coded in source code. However, it would be extremely inconvenient to hand-code a new class just because a query needs a certain object with certain properties to create a result set. For this reason, the compilers in the next release of Visual Studio introduce a new feature called “anonymous types”.

The idea behind anonymous types is rather simple: Whenever the developer uses a construct in code that implies the use of a class that only has properties (no methods), then the developer does not have to hand-code the class. Instead, the compiler will take on that task and create the class automatically behind the scenes. The name of the class is not known (hence the term “anonymous type”), but that is OK.

Anonymous types can be used outside of select statements as well. Consider the following C# code:

new { Property1 = "Hello", Property2 = "World" };

This line uses a number of new C# features. The “new” operator instantiates a new object. This is very similar to calling the CreateObject() function in VFP. For instance, one can create an instance of a Form class using the following C # command:

new Form();

In our example however, we are using anonymous types, which means that we do not specify a class name. We only say “new” and nothing else.

Another new feature our example uses is “type initializing”. This means that within the curly brackets, we can immediately specify values for properties. In this case, we initialize the properties “Property1” and “Property2” with string values. In a similar example, we could instantiate a Form and immediately set the form’s caption (the Text property):

new Form() { Text = "My Form"};

The same concept does not exist in VFP. We would have to first issue a CreateObject() call, and then set the Caption property in the next line of code.

To get back to the example above: We instantiate an object of unknown type, and we set two properties to string values. The C# compiler can therefore imply that we want to create an instance of a class that happens to have two properties of the specified names and they must be of type “string”. Therefore, the compiler creates such as class behind the scenes for us, and we never have to worry about those details. Other than being auto-created and of unknown name, the anonymous class is a fairly ordinary class.

There is one piece missing to this puzzle: Normally when an object gets instantiated, we do not just create it, but we also assign it to a variable. Everything else makes no sense, since the object would go out of scope right away, and it would never be usable. In C# for instance, we can create a variable of type “Form”, and then instantiate a Form object in a single line of code like so:

Form f = new Form();

In order to create a variable, we have to specify its type/class (Form in this example). However, whenever we create an anonymous type, the name of the class is unknown. So how can we define the type of the variable? Well, we can’t. But the compiler can, since it knows the internal name of the anonymous class. We can instruct the compiler to insert that name using a “var” keyword instead of the actual type name:

var p = new { Property1 = "Hello", Property2 = "World" };

Note that “var” does not stand for “variant”. Instead, it simply is an instruction to the compiler to auto-determine the appropriate type for us.

Note that the values assigned to the properties can also be retrieved from other objects. Consider this example:

var p = new { CompanyName = customer.CompanyName,
     ContactName = customer.ContactName };

In this example, the values of the new properties get set to the values of the properties on the customer object. Note that there is a short-hand version in case the property names are the same on the new object as they are on the source object. In that case, the property name on the anonymous object does not have to be specified. In other words: The part to the left of the “=” sign (including the “=” sign) can be omitted:

var p = new { customer.CompanyName, customer.ContactName };

These are all rather cool productivity features that are useful for a number of scenarios. Most important for us however, is the use of this syntax in select statements. Armed with this information, we now understand the select-part of our example statement:

var result = from c in customers
    select new {c.CompanyName, c.ContactName};

This means that for every row in the queried source, we want to select/create a new object in the result set with two properties that are set to the values from the source.

Visual Basic supports all these concepts as well. However, Visual Basic tries to hide even more of the behind-the-scenes mechanics. However “select CompanyName, ContactName” in Visual Basic goes through the same exact motion C# does with its “select new {c.CompanyName, c.ContactName};” statement.

I think these are rather interesting details. Especially for those readers interested in object-oriented techniques, this is neat stuff. However, if you do not want to worry about all this stuff, or are not familiar enough with .NET to follow the examples, do not worry! The bottom line is that due to all this “magic”, queries in LINQ just work.

The Power of DLINQ

If you have followed along with the SQL Server data access examples, you now have a result set that contains data from our SQL Server query. The result is a collection of objects that can easily be enumerated:

NorthwindDataContext db = new NorthwindDataContext();
Table<Customer>customers = db.GetTable<Customer>();
var result = from c in customers select new
    {c.CompanyName, c.ContactName};
foreach (var cust in result)
{
    Console.WriteLine(cust.CompanyName);
}

There is one further detail that I have actually kept secret from you so far: The example above is actually a special case of “LINQ” called “DLINQ”, which is short for “Database LINQ”. DLINQ is a special version of LINQ that can query data from SQL Server (or any other database once third party database providers start supporting LINQ). We can actually spin this example a bit further and run a slightly more sophisticated query:

var result =
    from c in customers
    where c.LastName.StartsWith("A")
    select new {c.CompanyName, c.ContactName};

In this example, we limit the result set using a where-clause. The important part here is that the where-clause uses standard .NET (C# or VB) syntax. In this case, the LastName property/field is a string, and all .NET strings have a StartsWith() method that can be used to determine whether the string starts with a certain character. So in short, this query has to filter out all the records that start with an A.

This gets really interesting when you consider that this query is actually executed against SQL Server, which does not understand .NET syntax. Therefore, when this query is executed on SQL Server, it has to be translated to the following T-SQL statement:

SELECT CompanyName, ContactName
    FROM Customers WHERE LastName LIKE 'A%'

And this is quite the feat when you consider how flexible select statements can be. After all, LINQ supports all the typical query constructs such as joins and groups and so forth. However, DLINQ gets the job done. It is capable of translating any LINQ statement into something that is understood by the database on the back end (well, at least those statements that make sense for a database).

For those of you who are interested, DLINQ does this through Lambda Expressions and Expression Trees. A discussion of these techniques can be found on my web site.

The Power of Raw LINQ

So far, all the LINQ examples I used were special DLINQ examples. Those compare very well to queries in Visual FoxPro since they use a familiar data source. However, DLINQ is just a special case. Raw LINQ can query data sources and produce result sets in a much more flexible way. Fundamentally, LINQ is an engine that can query object data sources. This is why we had to map SQL Server objects to .NET classes. What makes LINQ really powerful (arguably much more so than VFP’s data manipulation language) is that anything in .NET is an object. This means that pretty much anything can be queried or produced by LINQ queries.

A simple example of an object data source is a string-array:

string[] names = {"John", "Joe", "Billy", "Markus"};

We can use a LINQ query to retrieve all the names that start with “J” and then order them by the length of the name:

var result =
     from s in names
     where s.StartsWith("J")
     orderby s.Length
     select s;

This also works with more complex lists of objects. Consider this Visual Basic .NET example:

Dim result = _
    From file In My.Computer.FileSystem.GetFiles("c:\") _
    Where file.IndexOf(".") > -1 _
    Select file _
    Order By file

This example looks at a list of all files in the C:\ folder and then queries all the files with a name that contains a “.” and returns the list sorted by file name.

This example also illustrates another detail of the select syntax that I left unexplained so far: While in VFP and in T-SQL the from-clause only uses the name of the source (such as “ from customers”), LINQ uses an “item in source” syntax (such as “from customer in customers”). Here’s why: Database scenarios are relatively static in terms of what can be inside a data source. Tables always contain rows. The customer table always contains a rows of individual customers. In an object-oriented environment however, things are not as simple. What is contained in the result set of the GetFiles() method above? It could be any type of object or list of objects. The above example contains a list of files represented by individual strings. As we want to reference those individual items and perform operations such as “file.IndexOf(…)”, we need to specify what each item is so we have an easy way to reference it. Compare the following two examples: In T-SQL, if we query from customers, we can use the following statement:

SELECT CompanyName FROM Customers
    WHERE Customers.CompanyName LIKE 'A%'

This is straightforward since the Customers table only has rows and we can check the value of each CompanyName in each row through Customers.CompanyName.

Now consider a hypothetical example from the object oriented world without the “in” clause:

Dim result = _
    From My.Computer.FileSystem.GetFiles("c:\") _
    Where My.Computer.FileSystem.GetFiles("c:\").IndexOf(".") > -1 _
    Select file _
    Order By file

This is complete mayhem and just plain would not work, since the where clause would repeatedly call the GetFiles() method, which overall is not at all what we want. Therefore, the “in” clause is sorely needed in an object-oriented query.

The Power of Objects

We have now seen a few examples of the power of running queries over objects. However, the power does not nearly stop there. Due to the nature of using objects as sources and result sets., LINQ queries can do amazing things. The above file-list example is unusual because it uses the result of a method call as the source for the query. However, the example is still relatively simple, since the list returned by the GetFiles() method is a simple list of strings. The query simply filters that list and returns the file names sorted alphabetically. But what if we wanted more than just the file name? Perhaps we want the file size or the creation time. That can be accomplished with LINQ as well, simply by returning a different select result:

Dim result = _
    From file In My.Computer.FileSystem.GetFiles("c:\") _
    Where file.IndexOf(".") > -1 _
    Select New System.IO.FileInfo(file)
 
For Each f In result
    Console.WriteLine(f.FullName + _
        " (" + f.CreationTime.ToShortDateString() + _
        ") - Size: " + f.Length.ToString())
Next

In this example, the result is not simply something that was part of the source (the file name). Instead, information from the data source is used to create a brand new result set (a list of FileInfo objects that are instantiated based on the file name).

Think about this for a moment! This is fundamentally different from any other query result I am aware of. Queries in languages such as VFP or T-SQL can only return information that is contained in the queried source. LINQ queries on the other hand can return practically anything. The following example for instance queries from a list of customers and launches new customer edit forms based on the queries information:

Dim result = _
    From customer In customers _
    Where customer.LastName = "Egger" _
    Select New CustomerEditForm(customer.Key)

And there you have it: A new customer edit form pops up for every customer in the data source with a last name that equals “Egger”. (This of course assumes that we have previously created an appropriate form class).

Another aspect of LINQ being based on objects is that you are completely free to add whatever functionality you want to these objects, and then use it as part of your queries. For instance, if you have added a method that checks for open invoices a customer may have, then you can use it in a query like so:

Dim result = _
    From customer In customers _
    Where customer.HasUnpaidInvoices(60) _
    Select New PrintCustomerStatement(customer.Key)

The idea behind this example is that the HasUnpaidInvoices() method accepts a number of days we are looking for. The PrintCustomerStatement() class we instantiate for the result would be used to print statements (it in essence is a report).

Note that you really have complete freedom in what the method does. For instance, the method could call a web service:

Dim result = _
    From customer In customers _
    Where Not customer.VerifyAddressWithPostalService() _
    Select customer.Key, customer.Name

In this example, the VerifyAddressWithPostalService() could call a web service offered by the post office (or a third party) to verify the customer address. The result of this query would be a list of all customer primary keys and names, for each customer who doesn’t have a valid postal address. Note that this could potentially be a very slow query, since the web service call would have to be placed for each customer. However, considering the usefulness of such a query, speed is probably not a main concern.

The Power of XLINQ

There is more to queries than data and objects. A new type of data source is here to stay: XML. LINQ acknowledges that fact by implementing another special flavor of LINQ specific to XML data sources. Let’s say instead of querying customers out of the Northwind database, you had the following XML document as a data source:

<?xml version="1.0" encoding="utf-8"?>
<Customers>
  <Customer ID="ALFKI">
    <Company>Alfreds Futterkiste</Company>
    <Contact>Maria Anders</Contact>
    <City>Berlin</City>
    <Country>Germany</Country>
  </Customer>
  <Customer ID="ANATR">
    <Company>Ana Trujillo Emparedados y helados</Company>
    <Contact>Ana Trujillo</Contact>
    <City>México D.F.</City>
    <Country>Mexico</Country>
  </Customer>
  <!-- More customers here... -->
</Customers>

XLINQ enables queries across XML by first mapping XML to objects. This is done through the new XElement object provided by XLINQ. Once an XML string is loaded into an XElement, it can be used for queries like any other .NET object. However, one still has to consider the nature of XML. For instance, each “Customer” node is a descendant of the entire “Customers” XML document. The following C# example queries a list of objects with customer and company names:

XElement nameXml = XElement.Parse(xml);
var names =
    from n in nameXml.Descendants("Customer")
    select new {CompanyName = n.Element("Company").Value,
         ContactName = n.Element("Contact").Value};

The result of this query is identical to one of the first examples that ran a query against a database table.

XML is somewhat unusual, in that XML is interesting as a data source, but it is also interesting as a result set. XLINQ can also be used to produce XML. For instance, we can go back to our file system example and create the result set in XML:

Dim result = _
  New XElement("Files", _
    From file In My.Computer.FileSystem.GetFiles("c:\") _
    Where file.IndexOf(".") > -1 _
    Select New XElement("File", file) _
  )
Dim xml As String = result.ToString()

The basic idea of this query is that first, a new XML node called “Files” is created. Then, as the second parameter of the object’s constructor, we pass the result of the query, which is another list of XML elements named “File” with the content set to the actual file name. The result is the following XML:

<Files>
    <File>c:\AUTOEXEC.BAT</File>
    <File>c:\boot.ini</File>
    <File>c:\CONFIG.SYS</File>
    <!-- more nodes here... -->
</Files>

This approach works in both C# and Visual Basic .NET. However, Visual Basic also supports a second approach in creating XML, by allowing XML as a native part of VB’s syntax:

Dim result = _
  <Files><%=From file In My.Computer.FileSystem.GetFiles("c:\") _
             Where file.IndexOf(".") > -1 _
             Select <File><%= file %></File> %>
  </Files>
Dim xml As String = result.ToString()

Using XML natively in Visual Basic is called “XML literals”. It is a rather convenient feature. For instance, it is possible to declare a simple XML document in the following fashion:

Dim xml = _
  <Files>
    <File>Autoexec.bat</File>
    <File>Config.sys</File>
  </Files>

However, it also tends to be a bit of a mess at times. Some people have thus been critical of the feature, calling it “the ultimate spaghetti code”. Personally, I think it is a cool feature that fits well in the VB.NET language and philosophy. Those that do not like it always have the option to either not use it or to use C# instead.

Tearing the Curtain Down

When we look at the LINQ syntax, especially in C#, some people might wonder how all this fits into the C# language. After all, C# is a very light-weight language with few commands that almost completely relies on objects. Some might consider it surprising that the C# team would bloat the language so substantially. And in fact, it turns out that the core C# language remains practically unchanged. All the LINQ language features are really converted to pure object-oriented syntax before they truly get compiled. For instance, consider this C# LINQ statement:

string[] names = {"John", "Joe", "Billy", "Markus"};
var result = from s in names where s.StartsWith("J") select s;

This really gets converted to the following “pure” C# statement before compilation:

var result = names.Where(s => s.StartsWith("J")).Select(s => s);

As you can see, C# really only uses method calls and “fancy” parameters. These “fancy” parameters are another new C# feature called “Lambda Expressions”, which are an evolution of C# 2.0’s anonymous methods. (A detailed discussion of all these techniques can be found on my web site). This leaves the question “why does the names array have a Where() method?”. The answer lies in yet another feature added to C# 3.0 called “extension methods”. This feature (which is also discussed in detail on my web site) allows the developer to add methods globally to all objects that do not already have such a method.

You do not have to know about these mechanics to use LINQ. However, if you do know about these details, you can take advantage of a number of beneficial side effects. Basically, all the LINQ features are really implemented through such extension methods, which one-by-one, show up on all objects whenever the LINQ assemblies are included in a project.

The interesting side effect resulting from this architecture is that all these features are usable individually. For instance, if you find yourself with a string array and would like to sort it by length of the string, you can simply use the LINQ Order-By feature all by itself:

string[] names = { "Markus", "John", "Joe", "Billy" };
var names2 = names.OrderBy(n => n.Length);

This works with all individual LINQ language features. So if you ever feel like joining two different object lists based on some criteria, you now have the means without the need to invoke a complete select-command.

Another side effect is that the LINQ extension methods only get added to objects that do not already have such methods. If the names object above wasn’t a string array, but instead some other object with an OrderBy() method, then that method would be used instead, even if the select-syntax was used. Therefore, whenever a select statement like the following is issued, the behavior of the order by clause could be overridden:

from c in customers orderby c select c;

In this example, the “customers” object could have an OrderBy() method, which the orderby-clause would automatically map to. So if you ever felt a certain aspect of query syntax didn’t work the way you expected it to, you now have the means to completely customize it.

Conclusion

LINQ is fascinating and powerful. This article has already grown fairly lengthy, yet I am still barely able to scratch the surface. LINQ brings a lot of the power of Visual FoxPro to .NET (it is fairly obvious that former VFP developers were involved in the creation of LINQ). However, LINQ is so much more than a simple DML (data manipulation language). It is hard to give a good expression of LINQ’s power in a single article, since many of LINQ’s unique features are only obvious in very large scenarios. It is the queries that retrieve data from SQL Server, joined with data from an XML document, joined with controls on a form, returning some information that is on-the-fly verified by a web service and ultimately served up as XML, that get me really excited.

I have spent a lot of time with LINQ and presented LINQ sessions at a number of conferences. I have also written a number of LINQ articles. Most of this content is available at my web site (www.MarkusEgger.com). In the not to distant future, we also plan to launch a web site dedicated to the LINQ community (www.LINQExperts.com). Feel free to stop by online, or send me an email if you have any questions.

Markus Egger