C-Sharpest

C# 2.0 just shipped with a number of interesting new features: anonymous methods, nullable types, iterators, partial classes, generics, and others. But the innovation does not stop there! Microsoft (and Anders Hejlsberg in particular) has already allowed us a sneak peek at some of the new features that will be available in C# 3.0.

Many of these new features are either evolutionary changes to features that have been available before, or they are features needed by LINQ (Language Integrated Query), the query facility that will be part of C# 3.0 as well as the next version of Visual Basic. LINQ itself can be seen as a new language feature for C# 3.0, although in some ways, it is the sum of many other innovations. On the other hand, LINQ is so fundamentally powerful and far-reaching that it deserves a series of articles on its own. For this reason, I will largely ignore LINQ in this article, but I am already working on an in-depth look at it, which I’ll publish both as an eColumn and as a printed article.

So what’s new in C# 3.0?

Type Inference

Type declaration and instantiation are seemingly simple in C#. You simply declare a variable of a certain type and then instantiate it. For example, to create an instance of a Windows form, you could use code like this:

Form frm = new Form();

Of course, in this particular example, the statement seems somewhat redundant, since the type that is instantiated is identical to the type that is declared. This is not always the case. You could instantiate a more specialized form yet keep the variable type more general like so:

Form frm = new CustomerEditForm();

This is a pretty specialized example and in many cases, the type declaration is identical to the instantiated type.

So far, the redundancy in declaring and instantiating objects has not been a major problem. However, I find it annoying with lengthy type names. And you’ll find it particularly problematic when you have multiple types involved, which you may find with generics. Consider this example:

Dictionary<Color,List<KeyValuePair<string,int>>> myList =
  new Dictionary<Color,List<KeyValuePair<string,int>>>();

This isn’t fun. Many developers may think, “How many more times do I have to state that I want an object of type x?” If you are one of those developers, Anders Hejlsberg feels your pain and has thought up a feature known as type inference. This sounds fancy but is pretty straightforward. It simply means that whenever a type can be identified unambiguously, the developer does not have to declare it explicitly. Using type inference, you could define a string in this fashion:

var name = "Markus";

This declares a variable of type “string” called “name” with a value of “Markus”. The “var” keyword is a bit deceiving. It does not stand for variant. It stands for variable. A variable declared in this fashion is strongly typed and, in this example, is in every other way identical to a variable declared as a string. The only difference is that the compiler figures out the type rather than the developer. If someone tried to assign anything but a string, or use the variable in any non-string way, it would result in a compiler error, just as it would if the variable were explicitly typed as a string.

This works with all types. For example, you could write the form example from above in a functionally identical fashion like so:

var frm = new Form();

Note that you could not create the second form example this way: with the var keyword, the variable would be inferred as CustomerEditForm rather than Form. Whenever the declared type needs to differ from the instantiated type, you must use the same syntax that you already use in C# 2.0. Of course, this syntax will still work in C# 3.0, and it is probably better style to declare variables explicitly unless type inference provides a serious advantage, as it does in the scenarios I will discuss later.
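
To make the point about strong typing concrete, here is a minimal sketch. The Form and CustomerEditForm classes are empty stand-ins for the article's forms (not the real Windows Forms types), and StaticTypeOf() is a hypothetical helper that merely reports the compile-time type the compiler chose.

```csharp
using System;

// Stand-ins for the article's form classes (not the real Windows Forms types).
public class Form { }
public class CustomerEditForm : Form { }

public static class TypeInferenceDemo
{
    // Hypothetical helper: T is bound to the compile-time type of the
    // argument, so it reveals what the compiler inferred for a "var" local.
    private static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static Type TypeOfName()
    {
        var name = "Markus";   // inferred as string
        // name = 42;          // would not compile: int is not convertible to string
        return StaticTypeOf(name);
    }

    public static Type TypeOfVarForm()
    {
        var frm = new CustomerEditForm();   // inferred as CustomerEditForm
        return StaticTypeOf(frm);
    }

    public static Type TypeOfExplicitForm()
    {
        Form frm = new CustomerEditForm();  // declared type stays Form
        return StaticTypeOf(frm);
    }
}
```

The three methods report string, CustomerEditForm, and Form respectively, which shows that var always picks the type of the initializing expression and nothing more general.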

Using the type inference feature, you could declare the generic dictionary from above in a much simpler fashion:

var myList =
  new Dictionary<Color,List<KeyValuePair<string,int>>>();

In these examples, type inference was mainly a convenience feature. Some of the examples I will introduce later rely entirely on type inference.

Object Initializers

Object initialization can be a hassle. It can also be very verbose. Consider this example which creates a simple Customer object and initializes a number of properties.

var cust = new Customer();
cust.LastName = "Egger";
cust.FirstName = "Markus";
cust.Company = "EPS Software Corp.";

There is nothing truly wrong with this example, but C# 3.0 will provide a more convenient way:

var cust = new Customer() {
    LastName="Egger", FirstName="Markus",
    Company="EPS Software Corp"};

Once again, this feature is not only convenient, but it also has functional importance wherever an object must be created and initialized in a single expression. Consider this example:

return new Customer() {
    LastName="Egger", FirstName="Markus",
    Company="EPS Software Corp"};

Note that the list of initialized properties is completely flexible. You can name and initialize any accessible field or property in this fashion, and you only have to name the members that you want to initialize. In C# 2.0 you had to use constructor parameters to achieve a similar effect. That list of parameters was fixed, and you could only vary it through overloads.
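
The following sketch illustrates that flexibility. It assumes a Customer class with three settable properties (the article never shows the class itself, so this definition is an assumption) and compares an initializer that names only two members with the equivalent step-by-step code.

```csharp
// Assumed shape of the article's Customer class.
public class Customer
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Company { get; set; }
}

public static class CustomerFactory
{
    // Object initializer: only the members named here are assigned.
    public static Customer CreateWithInitializer()
    {
        return new Customer { LastName = "Egger", FirstName = "Markus" };
    }

    // Roughly what the initializer expands to: construct the object,
    // assign each named member, then hand back the finished instance.
    public static Customer CreateStepByStep()
    {
        Customer temp = new Customer();
        temp.LastName = "Egger";
        temp.FirstName = "Markus";
        return temp;
    }
}
```

Both methods return a customer whose Company property was never touched and therefore still holds its default value of null.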

Anonymous Types

C# 3.0 will introduce a rather radical feature called anonymous types. They allow you to create a new type (class) without ever explicitly declaring a class or even a class name. You can simply state, “I need a class with these properties” and the compiler takes care of the rest. Consider a scenario where you need a class with LastName, FirstName, and Company properties (similar to the Customer example above) which you will use to temporarily store that kind of information. Using anonymous types, you would write code like this:

new {LastName="Egger",
    FirstName="Markus", Company="EPS Software Corp"};

This statement is practically identical to the “new Customer()…” statement from the object initializers example, except for the omission of a class name. This statement tells the compiler that you need an instance of a class with the three specified properties. You don’t care about the name of that class or any other details, and your code never creates such a class anywhere (unlike the Customer class in the previous example, which you had to explicitly create). The compiler can cater to that need by creating a class with these properties as well as internal fields that can store the values of those properties. That class will be assigned a name internally but that name is never visible to you (hence the name “anonymous types”).

Note that in this example, the types of the individual properties are inferred based on the property values. You also need to use type inference in order to assign this new instance to a variable so you can use it later on. Since the name of the type is unknown, you could not create a meaningful type declaration (other than the dreaded “object” which really is a meaningless type). However, type inference can figure out the internal type of this object instance and thus preserve type-safety.

var cust = new {LastName="Egger",
    FirstName="Markus", Company="EPS Software Corp"};
Console.WriteLine( cust.LastName );
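
As a small sketch of how the compiler treats these generated classes: within one assembly, anonymous objects that list the same property names and types in the same order share a single generated type, and each property's type comes from the value assigned to it. The class and method names below are made up for illustration.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Same property names, types, and order: one compiler-generated class.
    public static bool SameShapeSharesOneType()
    {
        var first = new { LastName = "Egger", FirstName = "Markus" };
        var second = new { LastName = "Doe", FirstName = "Jane" };
        return first.GetType() == second.GetType();
    }

    // Swapping the property order produces a different generated class.
    public static bool DifferentOrderIsDifferentType()
    {
        var first = new { LastName = "Egger", FirstName = "Markus" };
        var swapped = new { FirstName = "Markus", LastName = "Egger" };
        return first.GetType() != swapped.GetType();
    }

    // Property types are inferred from the initial values: Quantity is an int.
    public static Type QuantityType()
    {
        var item = new { Name = "Widget", Quantity = 3 };
        return item.Quantity.GetType();
    }
}
```

Because the type is real and known to the compiler, item.Quantity can be used as an int without any cast, which is exactly the type-safety that plain “object” could not give you.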

Lambda Expressions

Lambda expressions are an evolution of C# 2.0’s anonymous methods, which in turn are an evolution of delegates.

You can use delegates for a lot of things. For instance, all events in .NET are implemented through delegates. Delegates simply allow the developer to define code (methods) and then point other objects at that code for execution. Developers can create event handler code and then link it up to a button’s Click event. A delegate handles the process of “linking up the code.”

Another example scenario for delegate use is the ability to create very generic code that can be extended through other algorithms. Consider a scenario where you have a list of customers and you want to select all customers that meet certain criteria. For example, you could create a method that returns customers whose company name starts with a certain string.

public List<Customer> GetCustomers(string companyName)
{
    List<Customer> allCustomers = GetCustomerListSomewhere();
    List<Customer> wantedCustomers = new List<Customer>();
    
    foreach (Customer customer in allCustomers)
    {
        if (customer.CompanyName.StartsWith(companyName))
        {
            wantedCustomers.Add(customer);
        }
    }
    return wantedCustomers;
}

In addition, a second method could return customers from a certain country. A third method could return customers by name and country, and so forth. You might even have other versions of the same methods that do not use a StartsWith(), but match the properties exactly. The problem with this approach is that it is very inflexible. The list of methods grows very rapidly to account for as many possible combinations as possible, yet it still is not possible to account for all variations.

You can solve this problem by letting the developer who calls the method declare the algorithm or expression used to filter the customers. You just need to allow a delegate as a parameter. This delegate simply accepts a customer object as the parameter and returns true or false, depending on whether the result should include that customer. The equivalent method that accepts such a delegate is simple to implement:

public List<Customer> GetCustomers(GetCustomersDelegate criteria)
{
    List<Customer> allCustomers = GetCustomerListSomewhere();
    List<Customer> wantedCustomers = new List<Customer>();
    
    foreach (Customer customer in allCustomers)
    {
        if ( criteria(customer) )
        {
            wantedCustomers.Add(customer);
        }
    }
    return wantedCustomers;
}

Of course, you also have to create the delegate that you’ll use as the parameter. This is a simple step and very similar to the creation of a method without a body:

public delegate bool GetCustomersDelegate (Customer cust);

You can now use this delegate to point at any method that accepts a customer object as its only parameter and returns true or false. For example:

private bool IsUSCustomer(Customer cust)
{
    // The comparison already yields the bool the delegate must return.
    return cust.Country == "USA";
}

Now, all the pieces are in place and they only need to be put to work. To do so, you must first instantiate the delegate and point it at the IsUSCustomer() method. You can then execute the GetCustomers() method and pass along that delegate. The GetCustomers() method then executes the delegate inside the foreach loop and thus checks each customer for inclusion. The actual decision is completely up to the method the delegate points to, which in this case includes only customers from the US. Here is the implementation of this example.

GetCustomersDelegate func = new GetCustomersDelegate(this.IsUSCustomer);
List<Customer> customers = this.GetCustomers(func);

You can also use the same GetCustomers() method to retrieve a different set of customers simply by writing another filter method:

private bool IsUSAndLCustomer(Customer cust)
{
    return (cust.Country == "USA" &&
        cust.CompanyName.StartsWith("L"));
}

If you point the delegate at this method, you’ll get all the customers from the US whose company name starts with an “L”.
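
The pieces above can be assembled into one compilable unit, sketched below under a few assumptions: the customer list is passed in as a parameter instead of coming from GetCustomerListSomewhere(), the methods are static, and the sample data is invented. The second call also uses C# 2.0's method group conversion, which lets you pass the method name directly instead of writing “new GetCustomersDelegate(...)”.

```csharp
using System.Collections.Generic;

public class Customer
{
    public string CompanyName;
    public string Country;
}

public delegate bool GetCustomersDelegate(Customer cust);

public static class DelegateDemo
{
    // Generic filter: the delegate decides which customers are included.
    public static List<Customer> GetCustomers(
        List<Customer> allCustomers, GetCustomersDelegate criteria)
    {
        List<Customer> wantedCustomers = new List<Customer>();
        foreach (Customer customer in allCustomers)
        {
            if (criteria(customer))
            {
                wantedCustomers.Add(customer);
            }
        }
        return wantedCustomers;
    }

    public static bool IsUSCustomer(Customer cust)
    {
        return cust.Country == "USA";
    }

    public static bool IsUSAndLCustomer(Customer cust)
    {
        return cust.Country == "USA" && cust.CompanyName.StartsWith("L");
    }

    // Invented sample data, purely for illustration.
    public static List<Customer> SampleCustomers()
    {
        List<Customer> list = new List<Customer>();
        list.Add(Make("Lakeside Foods", "USA"));
        list.Add(Make("Acme", "USA"));
        list.Add(Make("Linz Stahl", "Austria"));
        return list;
    }

    private static Customer Make(string companyName, string country)
    {
        Customer cust = new Customer();
        cust.CompanyName = companyName;
        cust.Country = country;
        return cust;
    }

    // Explicit delegate instantiation, as in the article.
    public static int CountUSCustomers()
    {
        GetCustomersDelegate func = new GetCustomersDelegate(IsUSCustomer);
        return GetCustomers(SampleCustomers(), func).Count;
    }

    // Method group conversion: the compiler creates the delegate for you.
    public static int CountUSAndLCustomers()
    {
        return GetCustomers(SampleCustomers(), IsUSAndLCustomer).Count;
    }
}
```

With the sample data above, the first filter keeps two customers and the second keeps one.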

What you’ve created so far is very powerful since you have complete freedom in determining which customers to include in the list. The problem with this approach is that it is somewhat cumbersome because you always have to create new methods that check each customer and then instantiate a delegate and point it at that method. (Not to mention that everyone using the method needs to understand delegates.) It would be much more convenient to simply pass the desired code as the parameter. This is where C# 2.0’s anonymous methods come into play. This snippet shows how you can call the GetCustomers() method with an anonymous method:

string country = "USA";
List<Customer> customers = this.GetCustomers(
    delegate(Customer cust)
    {
        return (cust.Country == country);
    } );

In this scenario, the actual method that performs the check is simply declared inline, right as the parameter of the method call. Note that it is also possible to access the local “country” variable in this example, which would have been difficult with the delegate version above.

Anonymous methods are very cool in a geeky sort of way since they make it possible to pass code rather than values as the parameters of a method call. However, as the above example illustrates, they are also a bit cumbersome and not completely natural and intuitive. They are better than delegates in terms of maintenance and even functionality, but they are only marginally better in terms of readability and developer knowledge.

Lambda expressions take the concept of anonymous methods to the next level. They simply say “often, the only meaningful aspects of anonymous methods are parameters and return values, so why not omit everything else?” The result of this line of thought is a simplification of the above example.

string country = "USA";
List<Customer> customers = this.GetCustomers(cust => cust.Country == country);

Input parameters are specified first, separated from the return expression (or a complete method body for that matter) by the “=>” operator. Note that type inference figures out the type of the parameter(s) as well as the return type. (In some scenarios with multiple overloads the situation can be ambiguous, and thus require an explicit type statement.)
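
The variations just described can be put side by side. This is a sketch with an assumed Customer class and a GetCustomers() method that takes the list as a parameter; all four calls express the same filter, moving from the anonymous method to the shortest lambda form and then to the explicitly typed and full-body variants.

```csharp
using System.Collections.Generic;

// Assumed shape of the article's Customer class.
public class Customer
{
    public string CompanyName { get; set; }
    public string Country { get; set; }
}

public delegate bool GetCustomersDelegate(Customer cust);

public static class LambdaDemo
{
    public static List<Customer> GetCustomers(
        List<Customer> allCustomers, GetCustomersDelegate criteria)
    {
        List<Customer> wanted = new List<Customer>();
        foreach (Customer customer in allCustomers)
        {
            if (criteria(customer))
            {
                wanted.Add(customer);
            }
        }
        return wanted;
    }

    // Returns the match count produced by each of the four equivalent forms.
    public static int[] CountMatches(List<Customer> all, string country)
    {
        // 1. C# 2.0 anonymous method.
        int viaAnonymousMethod = GetCustomers(all,
            delegate(Customer cust) { return cust.Country == country; }).Count;

        // 2. Lambda with inferred parameter type and an expression body.
        int viaLambda = GetCustomers(all,
            cust => cust.Country == country).Count;

        // 3. Lambda with an explicitly typed parameter.
        int viaTypedLambda = GetCustomers(all,
            (Customer cust) => cust.Country == country).Count;

        // 4. Lambda with a complete method body.
        int viaStatementLambda = GetCustomers(all,
            cust => { return cust.Country == country; }).Count;

        return new int[] {
            viaAnonymousMethod, viaLambda, viaTypedLambda, viaStatementLambda };
    }
}
```

All four forms capture the local “country” variable in the same way, so they always return the same count for the same input.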

So is this done often and does it work well in the “real world”? Consider an example with slightly different object and method names:

Customers.Where(cust => cust.Country == country);

With a little bit of imagination (ignore the “cust” input parameter), this starts to look like the where-clause in a T-SQL select-statement. In fact, this technique is a fundamental building block of query languages.

Lambda expressions are not really a C# innovation. They have been available in a number of other languages. This does not take away from their usefulness.

Expression Trees

Since I talked about lambda expressions, I also have to talk about expression trees. Lambda expressions come in two flavors: Those that have a simple return expression (like the example above), and those that have complete method bodies. (Note that lambda expressions are a superset of anonymous methods and support everything anonymous methods do, including the ability to define method bodies.) Lambda expressions that do not have method bodies can be compiled as intermediate language (IL), but they can also be compiled into a pure data representation of themselves. Whenever lambda expressions are compiled to data, they are called “expression trees.”

In the lambda expression example above, the lambda expression was compiled to regular IL code because it was used as code. However, in other scenarios, the compiler could have turned it into data that represents the expression in a more abstract fashion. The individual pieces of data in the expression tree would then be the two sides of the expression (“Country” and “country” respectively) as well as the comparison operator (“equals”). More complex lambda expressions result in significantly more complex expression trees.

The beauty of storing lambda expressions as expression trees is that they become language independent. For instance, a data access layer can translate an expression tree into valid T-SQL and send it to SQL Server for execution.

In short: Expression trees are a nifty way to write C# code, yet execute it in environments that do not understand C#, such as SQL Server.
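
To show what “lambda as data” looks like in practice, here is a sketch that uses the System.Linq.Expressions API as it eventually shipped with .NET 3.5 (the preview bits available at the time of writing may differ). The Customer class is an assumption, and the filter compares against a literal rather than a local variable to keep the tree small.

```csharp
using System;
using System.Linq.Expressions;

// Assumed shape of the article's Customer class.
public class Customer
{
    public string Country { get; set; }
}

public static class ExpressionTreeDemo
{
    // Returning the lambda as Expression<TDelegate> instead of a delegate
    // makes the compiler emit a data structure rather than IL.
    public static Expression<Func<Customer, bool>> BuildFilter()
    {
        return cust => cust.Country == "USA";
    }

    // Walks the tree: left side, comparison operator, right side.
    public static string Describe(Expression<Func<Customer, bool>> filter)
    {
        BinaryExpression comparison = (BinaryExpression)filter.Body;
        MemberExpression left = (MemberExpression)comparison.Left;
        ConstantExpression right = (ConstantExpression)comparison.Right;
        return left.Member.Name + " " + comparison.NodeType + " " + right.Value;
    }

    // The same tree can still be turned back into executable code.
    public static bool Matches(
        Expression<Func<Customer, bool>> filter, Customer cust)
    {
        Func<Customer, bool> compiled = filter.Compile();
        return compiled(cust);
    }
}
```

Describe(BuildFilter()) yields “Country Equal USA”, which is the kind of information a translator needs in order to emit a where-clause in another language.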

Extension Methods

Inheritance is an awesome mechanism. It allows developers to create a new class that is just like an existing class, and then add something such as a method. However, inheritance does not work in all cases. What if you want to add a method to all string objects used in your application? You cannot accomplish this with inheritance (the string class is sealed, and existing code would keep creating plain strings anyway), but extension methods solve this problem.

Extension methods are static methods defined on a static class. Whenever that class is in scope, its methods appear on the extended type as if those static methods were, in fact, declared on the type itself. Every extension method takes an instance of the extended type as its first parameter, marked with the “this” keyword. That parameter is not passed explicitly when the method is actually used.

Consider a simple example where you would like to add a ToXml() method to all objects. You can do so with the following extension method:

public static class EM
{
    public static string ToXml(this object obj)
    {
        return "<value>" + obj.ToString() + "</value>";
    }
}

Whenever this class is in scope (by way of a “using” directive for its namespace), you can use the ToXml() method on any type as if it were a member of the type:

string hello = "Hello World!";
hello.ToXml();

Of course, if a type already has a method of the same name and signature as an extension method, the class’s own method is not replaced: it always takes precedence. Extension methods only come into play when there is no conflict.
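
The conflict rule can be sketched as follows. The Invoice class is hypothetical and exists only to provide an instance method with the same name and signature as the ToXml() extension method.

```csharp
public static class EM
{
    public static string ToXml(this object obj)
    {
        return "<value>" + obj.ToString() + "</value>";
    }
}

// Hypothetical class that already declares its own ToXml() method.
public class Invoice
{
    public string ToXml()
    {
        return "<invoice />";
    }
}

public static class ExtensionDemo
{
    // string has no ToXml() of its own, so the extension method is used.
    public static string StringToXml()
    {
        return "Hello World!".ToXml();
    }

    // Invoice declares ToXml() itself, so the instance method wins.
    public static string InvoiceToXml()
    {
        return new Invoice().ToXml();
    }

    // The extension method is still reachable through the static class.
    public static string InvoiceViaStaticCall()
    {
        return EM.ToXml(new Invoice());
    }
}
```

StringToXml() produces “<value>Hello World!</value>”, while InvoiceToXml() produces “<invoice />” because the compiler only considers extension methods after instance methods fail to match.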

Is this still proper object-oriented development? On the surface, object rules certainly seem to be bent a little bit. Under the hood however, object rules are preserved perfectly, since the compiler really turns the two lines of code above into a construct identical to this:

string hello = "Hello World!";
EM.ToXml(hello);

LINQ

LINQ stands for “Language Integrated Query” and adds a query language somewhat similar to T-SQL to C#. The main difference is that LINQ operates on objects and not on data. This seemingly small difference has major implications in practice and turns LINQ into a much more powerful language than T-SQL.

LINQ is too large a feature to deal with in just a few paragraphs. Nevertheless, here is a quick LINQ example.

var result =
    from cust in Customers
    where cust.Country == "USA"
    select new {LastName = cust.Name, cust.Country};

This returns a list of new objects with two properties each. This list of objects is populated with name and country information from an original list of customers whose country is set to “USA.”

This single statement includes many of the new C# features described above. In many ways, LINQ is a combination of objects and methods, as well as a text-processing compiler that turns the above statement into standard object syntax like this:

var result = Customers
    .Where(cust => cust.Country == "USA")
    .Select(cust => new {LastName = cust.Name, cust.Country});

This is functionally equivalent to the previous statement. At this point, the mystery is gone. The statement creates a variable named “result” whose type you do not know, but the compiler does. The Where() and Select() methods are extension methods that become available on any sequence of objects whenever the LINQ namespace is in scope. Both methods take lambda expressions as their parameters. The Select() method uses anonymous types and object initializers to create a new object for every object in the result provided by the Where() method. A sequence of these new objects is the ultimate result set, the type of which is inferred using type inference. Note that “result” is always a sequence of some type, since multiple objects will (or at least can) be returned from this statement. Type inference will thus always arrive at a generic sequence type whose type argument is the anonymous type created by the “new” expression.
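
Here is a compilable sketch of both forms, written against System.Linq as it shipped with .NET 3.5. The Customer class and its Name property are assumptions based on the query above, and the method names are invented for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the Customer class used by the query.
public class Customer
{
    public string Name { get; set; }
    public string Country { get; set; }
}

public static class LinqDemo
{
    // Runs the query in both syntaxes and checks that the results agree.
    public static bool BothFormsAgree(IEnumerable<Customer> customers)
    {
        var viaQuery =
            from cust in customers
            where cust.Country == "USA"
            select new { LastName = cust.Name, cust.Country };

        var viaMethods = customers
            .Where(cust => cust.Country == "USA")
            .Select(cust => new { LastName = cust.Name, cust.Country });

        // Both sequences use the same anonymous type, and anonymous types
        // compare by property values, so SequenceEqual() works here.
        return viaQuery.SequenceEqual(viaMethods);
    }

    // The anonymous objects are strongly typed: LastName is a string.
    public static List<string> UsLastNames(IEnumerable<Customer> customers)
    {
        var result =
            from cust in customers
            where cust.Country == "USA"
            select new { LastName = cust.Name, cust.Country };

        return result.Select(item => item.LastName).ToList();
    }
}
```

Note that the query is not executed when “result” is assigned. It runs when the sequence is enumerated, which here happens inside SequenceEqual() and ToList().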

Of course, this simple example doesn’t even scratch the surface of what’s possible in LINQ. Stay tuned for my next eColumn as well as other articles.

Conclusion

I believe C# 3.0 will be slick. In a way, LINQ is the ultimate proof of how slick C# 3.0 really is. Many new features that have been brewing for months and years now come together to form what feels like a completely new language, yet the core language didn’t have to change. Some features were streamlined (delegates and object initialization) and a few features have been added without interfering with existing features (extension methods, anonymous types, and type inference). And none of these features interfere with C#’s primary goal of being a high-quality, strongly typed language that you can use at extremely high productivity levels.
