Do you ever feel like you’re beating your head against a wall?
I know I do; quite often, in fact. It seems like developers spend half of their time bending technology to their purposes when the technology doesn’t quite fit. Well, I’m actually thinking of one problem in particular right now, namely that of validation. Can you think of a more boring topic? There are a few, but I think you’ll agree that it is an extremely important one in business software.
And yet it plagues us and has plagued us for years and years. Why? Because we’ve been thinking about it all wrong, or at least wrong enough to be a pain in the behind. The pain all started with the grand idea that validation should be done in the database schema, and that a given rule about the validity of data holds at all times.
I can think of two problems with the traditional approach to data validation. The first is that it resides in the wrong layer(s) of the application (that is, in the database or, worse, the UI). The second is the aforementioned error of assuming that there is one valid schema for a set of data.
Putting Validation in its Proper Place
I’ve said it before, and I’ll say it again: business applications are about automating processes. Too often, we think in terms of the data that we’re dealing with, but data is only important inasmuch as it supports the process that we’re trying to automate. So what does that tell us about the proper place of data validation?
You guessed it! Data validation should occur as part of the process in which it participates. “But what does that mean? After all, I’ve got some data validation occurring on my form, some in my business layer, and some final validation (via constraints) in my database. If my application is automating a process, and I’ve got validation sprinkled throughout that process, isn’t my validation occurring in my process?” you say.
Okay, now you’re being clever, too clever for your own good. Yes, I’ll be the first to say that even something as simple as directly updating a record is a process (or a workflow, if you prefer). Process is everywhere in a business, and our applications participate in it even when we don’t think about it per se. But my point is that we need to be thinking about everything our applications do as part of a process.
Once we do that, we’ll start thinking about the various pieces of our application in a way that makes good business sense. I’m talking about reducing the impedance mismatch (to steal a term from the object-relational mapping world) between how we think about designing our applications and how businesses think about what they do.
Good object-oriented design will take us far. Domain-driven design goes a bit further, and service orientation can help us in the long run. But we need to realize that the underlying domain in business software is process-centric, and we need to start aligning our design with the business domain.
But what does this mean in terms of validation? It means that a set of data is valid only with respect to a particular point in a process. Therefore, our validation schemas should not only specify constraints and rules but also be linked to a point in the data’s lifetime. It means that we can’t define our constraints and rules in the database, nor should we stuff them into the UI tier, at least not directly. In fact, we probably shouldn’t even put them in our business tier or domain model per se.
Rather, data validation should be a separate layer, a service, that we can ask at any given point in a process: “Is this data valid right now?” Again, this implies that our constraints and rules tie themselves explicitly to data lifetime, so we need frameworks and tools that support this.
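To make the idea a little more concrete, here is a minimal sketch in Python, chosen only for brevity; it is not WF and not any real framework’s API. It assumes that rules are plain functions registered against a named process state, so the question “Is this data valid right now?” becomes a call that takes both the data and the current state. The class `ValidationService`, the helper `require`, and the states `draft` and `approved` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A rule inspects a record (a plain dict) and returns an error message, or None if satisfied.
Rule = Callable[[dict], Optional[str]]


class ValidationService:
    """Hypothetical validation service whose rules are registered per process state."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}

    def add_rule(self, state: str, rule: Rule) -> None:
        self._rules.setdefault(state, []).append(rule)

    def validate(self, data: dict, state: str) -> List[str]:
        """Answer "Is this data valid right now?" for the given state.

        Returns a list of violations; an empty list means the data is valid there.
        """
        errors = []
        for rule in self._rules.get(state, []):
            message = rule(data)
            if message is not None:
                errors.append(message)
        return errors


def require(field_name: str) -> Rule:
    """Build a rule that fails when a field is missing or empty."""
    def rule(data: dict) -> Optional[str]:
        if data.get(field_name) in (None, ""):
            return f"{field_name} is required"
        return None
    return rule


service = ValidationService()
# While a customer record is a draft, a name is enough.
service.add_rule("draft", require("name"))
# By the time it is approved, billing details must be present too.
service.add_rule("approved", require("name"))
service.add_rule("approved", require("billing_address"))

customer = {"name": "Contoso"}
print(service.validate(customer, "draft"))     # []
print(service.validate(customer, "approved"))  # ['billing_address is required']
```

The same customer record is valid while it is a draft and invalid once the process reaches approval. This is the point-in-process sensitivity that a single database schema cannot express.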
As it stands today, the best toolset and framework I’ve seen that could support an approach like this (apart from writing a custom one, of course) is Windows Workflow Foundation (WF). It has a built-in rules engine, and it has explicit support for state-based workflows (a.k.a., state machine workflows). It is very extensible, and it is being advocated strongly by the biggest software vendor in the world as part of their foundational technologies for the future.
This article really isn’t about WF, though. Perhaps in a future article, I or someone similarly inspired will create a working solution in WF that does what I’m suggesting: joining business rules (including constraints, which I think of as part of business rules) and state machine workflow in such a way that a client can query the workflow to determine whether the related data is valid in its current state. The state marks the stretch of the data’s lifetime to which a given set of validation rules applies.
Hopefully, you can imagine how you can still have a domain model, with all its behavior-based objects, that participates on a more finely grained level in the overall process. For instance, one of the domain object’s methods might move the object (and related objects) from one state in a workflow to another and, in the process (no pun intended), ask the workflow whether it is valid in the new state.
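The following Python sketch illustrates that interaction under simple assumptions. A hypothetical `OrderWorkflow` holds the allowed state transitions and the rules attached to each state. A hypothetical `Order` domain object asks it whether its data is valid in the target state before committing the move. A real WF-based solution would look quite different; this only shows the shape of the idea.

```python
from typing import Callable, Dict, List, Optional, Set

Rule = Callable[[dict], Optional[str]]


class OrderWorkflow:
    """Hypothetical state machine: allowed transitions plus rules attached to each state."""

    def __init__(self) -> None:
        self.transitions: Dict[str, Set[str]] = {
            "draft": {"submitted"},
            "submitted": {"draft", "shipped"},
            "shipped": set(),
        }
        self.rules: Dict[str, List[Rule]] = {
            "draft": [],  # anything goes while the order is being put together
            "submitted": [
                lambda d: None if d.get("lines") else "an order needs at least one line",
            ],
            "shipped": [
                lambda d: None if d.get("tracking_number") else "tracking number required",
            ],
        }

    def can_move(self, current: str, target: str) -> bool:
        return target in self.transitions.get(current, set())

    def errors_in(self, state: str, data: dict) -> List[str]:
        """Ask the workflow whether the data is valid in the given state."""
        errors = []
        for rule in self.rules.get(state, []):
            message = rule(data)
            if message is not None:
                errors.append(message)
        return errors


class Order:
    """Domain object that consults the workflow before changing state."""

    def __init__(self, workflow: OrderWorkflow) -> None:
        self.workflow = workflow
        self.state = "draft"
        self.data: dict = {"lines": []}

    def move_to(self, target: str) -> List[str]:
        """Try to move to `target`; returns the problems found (empty means it moved)."""
        if not self.workflow.can_move(self.state, target):
            return [f"cannot move from {self.state} to {target}"]
        errors = self.workflow.errors_in(target, self.data)
        if not errors:
            self.state = target  # commit only when the data is valid in the new state
        return errors


order = Order(OrderWorkflow())
print(order.move_to("submitted"))  # ['an order needs at least one line']
order.data["lines"].append({"sku": "A-100", "qty": 2})
print(order.move_to("submitted"))  # []
print(order.state)                 # submitted
```

The transition happens only when the data satisfies the target state’s rules. As a result, an incomplete order can sit in `draft` without anyone calling it invalid.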
With such an approach, you more closely mirror the truth of business processes, namely that data validity is point-in-process-sensitive.
How Does the (Relational) Database Fit In?
I honestly am starting to feel that relational databases are overrated when it comes to applications. For some time, I’ve been in the camp that thinks databases are just a necessary detail in application design: we need to persist the data somewhere, and hey, relational databases are a good way to do it because they’ve got so many tools to get at the data for other purposes.
The problem is that this way of thinking leads us to assume two things. First, that our relational database is the right place for other applications to get at our data. Second, that the database should be the center of the application’s world. In my experience, this latter misconception is quite rampant among developers today. As I alluded to in my “I Object” article (CoDe Magazine, Jan/Feb 2006), it was fostered by the recordset mentality that was popularized during the VB and ASP era.
Once we allow for the possibility that the database is not, and indeed should not be thought of as, the center of our application, we can think more in terms of how the business thinks, chiefly how to get things done. Put another way, we start thinking in terms of the business process rather than the business data.
If we allow ourselves to think this way, in a process-centric way, we see how the database, like the data, should serve the application, and not the other way around. This is a truly revolutionary way to think about applications if you come from a data-oriented background. Just stop for a minute and let it sink in and ponder the implications.
One implication is that relational databases may not be the best solution for data persistence in your application. Maybe an object database would be better, or maybe just POXML (plain ol’ XML) would work. For instance, I can tell you that my blog software, dasBlog, does just fine using XML as its backing store, and folks far more prolific and popular than I use it as well without a problem.
Even SQL Server 2005 has made XML a first-class citizen, which speaks to the validity of storing data as XML. In fact, it handles XML so well that it is tempting not to bother splitting the data out into separate columns at all. And the .NET Framework itself treats XML very, very well, which suggests that XML may well be an acceptable medium for persistent storage.
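As a rough illustration of how little ceremony XML persistence can involve, this Python sketch round-trips a hypothetical `Customer` object through the standard library’s `xml.etree.ElementTree`. The element names are made up for the example. This is not how dasBlog or SQL Server stores anything.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    id: str
    name: str
    emails: List[str] = field(default_factory=list)


def to_xml(customer: Customer) -> str:
    """Serialize a customer to an XML string suitable for a file or an XML column."""
    root = ET.Element("customer", id=customer.id)
    ET.SubElement(root, "name").text = customer.name
    emails = ET.SubElement(root, "emails")
    for address in customer.emails:
        ET.SubElement(emails, "email").text = address
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Customer:
    """Rebuild a customer from its stored XML."""
    root = ET.fromstring(text)
    return Customer(
        id=root.get("id", ""),
        name=root.findtext("name", default=""),
        emails=[e.text or "" for e in root.iter("email")],
    )


original = Customer("c-42", "Contoso", ["ap@contoso.example"])
stored = to_xml(original)
print(stored)
assert from_xml(stored) == original  # the round trip loses nothing
```

Nothing in this store rejects a customer with an empty name or no e-mail addresses. Whether that is acceptable is left to the validation layer at the relevant point in the process.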
The point here is not to talk about the virtues of XML, though they do abound. It is simply to suggest that there are other viable alternatives to an RDBMS for application data persistence. In fact, one might suggest that the alternatives play nicer in an application world, particularly in a distributed and service-oriented world.
I’m not about to herald the demise of the RDBMS, though. Even if we all agreed that this technology is over the hill, it will still be with us for an extremely long time. At the same time, maybe it is time to seriously consider alternatives in your application design.
If you do want to use a relational database with a process-oriented application, consider taking advantage of XML even if it is within the context of the RDBMS. And if you still want to break out your objects’ individual properties in the database, I suggest the following:
Everything that can possibly be nullable should be. This goes back to putting the validation where it properly goes: in the validation layer/service.
The only constraints you should have are relational ones, i.e., those proper to a relational database. Even these shouldn’t be enforced or checked; they should be there purely for informational purposes. Otherwise, you’re putting validation where it shouldn’t be.
Fashion your data model as closely to your object (domain) model as possible. As Ted Neward summed up in a controversial blog post that likened object-relational mapping to the Vietnam War, ORM is just not clean, architecturally speaking. If you can’t avoid it altogether (as I have suggested above), minimize the impedance mismatch.
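Here is one way to see the first two guidelines in practice, sketched with Python’s built-in `sqlite3` module and an in-memory SQLite database. SQLite is my choice for a self-contained example, not something the guidelines prescribe. Every non-key column is nullable. The foreign key is declared but not enforced, because SQLite only enforces foreign keys when `PRAGMA foreign_keys` is turned on. The `customer` and `orders` tables are hypothetical. Note that SQLite still enforces the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is ON; leaving it OFF keeps the
# relationship purely informational.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT,   -- nullable: the validation layer decides when a name is required
        email TEXT    -- nullable for the same reason
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),  -- documents the relationship only
        total       NUMERIC                           -- nullable until the process says otherwise
    );
    """
)

# A half-finished order, saved mid-process before a customer has been chosen.
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (1, NULL, NULL)")
# A reference to a customer that does not exist (yet) is accepted as well.
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (2, 99, 10)")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2
# The declared relationship is still discoverable by tools.
print(conn.execute("PRAGMA foreign_key_list(orders)").fetchall())
```

Tools can still see the relationship through `PRAGMA foreign_key_list`. It therefore documents the model without second-guessing the process.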
Some good reasons to countermand these guidelines:
You need to report in real time on many, many records (millions or more) that your application actively modifies, which calls for the most efficient data retrieval possible. As far as I know, a relationally designed database is still the best choice for a situation like this.
You have a DBA who makes you do things in a specific way or controls the design personally. If the DBA can’t come up with a good reason other than that it is an unfamiliar approach or against policy, consider pressing your point if you can. If the reasons are valid and make sense to you, go with the flow.
You have an existing database you have to use. Here, I’d at least consider having a transient data store to use during application processes and only moving the data into the final reference store when the process is complete, if it does complete. Otherwise, go with the flow.
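The transient-store idea might look something like this minimal Python sketch. It assumes an existing reference database with strict `NOT NULL` constraints, modeled here with an in-memory SQLite table. A plain dictionary stands in for the transient store. `save_progress`, `complete`, and the table layout are all hypothetical.

```python
import sqlite3
from typing import Dict

# Stand-in for an existing reference database whose strict constraints we cannot change.
reference = sqlite3.connect(":memory:")
reference.execute(
    "CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)

# Stand-in for a transient store: in-progress documents keyed by process id, no schema.
staging: Dict[str, dict] = {}


def save_progress(process_id: str, data: dict) -> None:
    """Persist whatever the process has gathered so far; nothing is enforced here."""
    staging[process_id] = dict(data)


def complete(process_id: str) -> None:
    """Copy the finished document into the reference store, then drop the transient copy."""
    data = staging[process_id]
    with reference:  # commits on success, rolls back if the insert fails
        reference.execute(
            "INSERT INTO customer (id, name, email) VALUES (?, ?, ?)",
            (data["id"], data["name"], data["email"]),
        )
    del staging[process_id]


save_progress("p1", {"id": "c-1", "name": "Contoso"})  # e-mail not known yet
save_progress("p1", {"id": "c-1", "name": "Contoso", "email": "ap@contoso.example"})
complete("p1")
print(reference.execute("SELECT id, name, email FROM customer").fetchall())
```

If a process is abandoned, its entry never leaves the transient store, so the reference database never sees incomplete data.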
Note that these guidelines only apply to application databases, meaning those designed expressly as a persistent store for data used in an application. I think we too often try to multi-purpose databases when having multiple databases would make more sense.
In particular, I’m thinking of reporting and business intelligence. If you’re not familiar with BI, I strongly suggest you gain at least a passing familiarity with it. BI is an extremely important aspect of business automation, and it isn’t best served directly off a transactional business application. Instead, consider creating a real data warehouse for your application and doing reporting and analytics against it, not against your transactional store. The transactional store should basically be a persistence medium for a process-driven application (as I’ve been talking about up to this point).
If we design our applications to persist data in the manner that I am advocating, we won’t be pushing validation checks off to what is arguably a necessary evil, that is, persistent storage. Rather, we will be enabling our application to proceed in a way that makes sense to the process and validate data only when it needs to be validated and how it needs to be validated within that process.
How Does the UI Fit In?
It is axiomatic that we want validity checks to occur as close to the user as possible, and this is why we often duplicate validity checks in the UI layer itself using validation controls and the like. We don’t want users to have to wait long to know that what they’ve given the application is valid (or invalid); we certainly don’t want them to walk away thinking that they’ve given valid data when they haven’t.
In other words, we definitely want as much validation as possible to occur on the “client side.” So how can we make this mesh with a separate validation layer? This is why I also call the validation layer a service. We need to be able to call this layer, either directly or through the domain model, to ask whether the data as it stands now is valid. Whether it is a WCF service endpoint, a standard Web service, or a direct method call, we need to be able to ask it from the UI what is valid and what is not.
If you were using WF as I suggested earlier, this might imply a child workflow, specific to a UI flow, with its own set of rules attached to the various points in that UI flow. If you don’t need that extra granularity, you can just ask the main workflow whether the document is valid. The nice thing is that WF is flexible enough to make such an approach work.
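Here is a Python sketch of that fallback, under stated assumptions:

- Rules for the main process are keyed by workflow state.
- Finer-grained rules for a UI flow are keyed by the pair (workflow state, UI step).
- A request that names no UI step, or a step without its own rules, is answered by the main workflow’s rules.

`MAIN_RULES`, `UI_STEP_RULES`, and `validate` are illustrative names, not WF constructs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rule = Callable[[dict], Optional[str]]

# Rules for the main process, keyed by workflow state (hypothetical).
MAIN_RULES: Dict[str, List[Rule]] = {
    "application": [
        lambda d: None if d.get("applicant") else "applicant is required",
    ],
}

# Finer-grained rules for a UI flow (the "child workflow"), keyed by (state, UI step).
UI_STEP_RULES: Dict[Tuple[str, str], List[Rule]] = {
    ("application", "contact_page"): [
        lambda d: None if d.get("phone") or d.get("email") else "enter a phone number or e-mail",
    ],
}


def validate(data: dict, state: str, ui_step: Optional[str] = None) -> List[str]:
    """Use the UI step's rules when it has its own; otherwise use the main workflow's rules."""
    rules = UI_STEP_RULES.get((state, ui_step)) if ui_step is not None else None
    if rules is None:
        rules = MAIN_RULES.get(state, [])
    errors = []
    for rule in rules:
        message = rule(data)
        if message is not None:
            errors.append(message)
    return errors


print(validate({}, "application", "contact_page"))  # ['enter a phone number or e-mail']
print(validate({}, "application"))                  # ['applicant is required']
```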
If that isn’t responsive enough, you could add a client-side validation generator to the framework to attach appropriate validation controls to the UI based on the current point in the process. But I tend to think that’s over the top. Even with Web applications, you can use asynchronous server requests to your validation service while maintaining a usable interface for your users without going to the extra trouble of duplicating validation in the UI layer itself.
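To illustrate the asynchronous approach without committing to any particular Web stack, here is a Python `asyncio` sketch. The coroutine `validation_service` is a hypothetical stand-in for a remote endpoint. It sleeps to mimic network latency, and its e-mail check is deliberately naive. The keystroke loop stands in for a UI that keeps accepting input while the request is in flight.

```python
import asyncio
from typing import Callable, List


async def validation_service(data: dict, state: str) -> List[str]:
    """Hypothetical remote validation endpoint; the sleep mimics network latency."""
    await asyncio.sleep(0.05)
    # Deliberately naive stand-in for whatever rules apply in `state`.
    return [] if data.get("email", "").count("@") == 1 else ["email looks invalid"]


async def on_field_changed(data: dict, state: str, show_errors: Callable[[List[str]], None]) -> None:
    """UI event handler: ask the service, then update the form when the answer arrives."""
    errors = await validation_service(data, state)
    show_errors(errors)


async def main() -> List[str]:
    log: List[str] = []
    # Start the validation request without waiting for it.
    task = asyncio.create_task(
        on_field_changed(
            {"email": "ap.contoso.example"}, "draft",
            lambda errs: log.append(f"errors: {errs}"),
        )
    )
    # Meanwhile the UI keeps handling input.
    for key in "abc":
        log.append(f"keystroke {key}")
        await asyncio.sleep(0)
    await task
    return log


result = asyncio.run(main())
print(result)
```

All three keystrokes are handled before the validation result comes back. That is the responsiveness we want, and the rules themselves never have to be duplicated in the UI.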
How Does SOA Fit In?
So you have this application that has a domain-driven-designed, behavior-based domain model. You have a persistent data store that plays well with your domain/process-driven design, and you’ve written your UI such that it takes advantage of your validation layer without duplicating logic. So what about this SOA thing?
It should be clear by now that the validation layer could (and probably should) be exposed as a primary service in your application. But contrary to some approaches to designing services, the messages should be loosely schema’d. This means each message needs three things: a basic XML payload containing the data to be validated, with virtually no schema attached to it; some kind of identifier for the data; and some kind of identifier for the instance of the process (e.g., a WF workflow ID). This should be sufficient for the validation service to find the appropriate workflow, ascertain its state, and request validation of the XML payload it was given.
You might also expose particular services that are specific to certain objects during certain points in their lifetime. These can adhere to more strongly schema’d contracts because they are both object-specific and point-in-process-specific. These are also more likely to be the kinds of services that you expose to other applications.
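Here is a minimal Python sketch of such a loosely schema’d request. It assumes a made-up envelope format: a `validationRequest` element with `dataId` and `processId` attributes wrapping a free-form `payload`. It also assumes a hypothetical in-memory map from process IDs to current workflow states. In a WF-based design, that lookup would go to the workflow runtime instead. Nothing here is a real WCF or WF contract.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

Rule = Callable[[ET.Element], Optional[str]]

# Hypothetical lookup from process-instance id (e.g., a workflow id) to its current state.
WORKFLOW_STATES: Dict[str, str] = {"wf-3f2a": "awaiting_payment"}

# Rules per state; each one queries the schema-less payload only for what it needs.
RULES: Dict[str, List[Rule]] = {
    "awaiting_payment": [
        lambda p: None if p.findtext(".//total") else "total is required",
        lambda p: None if p.find(".//paymentMethod") is not None else "payment method is required",
    ],
}


def handle_validation_request(message: str) -> dict:
    """Find the workflow named in the envelope, get its state, and validate the payload."""
    envelope = ET.fromstring(message)
    data_id = envelope.get("dataId")
    process_id = envelope.get("processId", "")
    payload = envelope.find("payload")
    if payload is None:
        return {"dataId": data_id, "valid": False, "errors": ["missing payload"]}
    state = WORKFLOW_STATES.get(process_id)
    if state is None:
        return {"dataId": data_id, "valid": False, "errors": [f"unknown process {process_id}"]}
    errors = [msg for msg in (rule(payload) for rule in RULES.get(state, [])) if msg is not None]
    return {"dataId": data_id, "state": state, "valid": not errors, "errors": errors}


request = """
<validationRequest dataId="order-17" processId="wf-3f2a">
  <payload><order><total>125.00</total></order></payload>
</validationRequest>
""".strip()
print(handle_validation_request(request))
```

The rules pull only the fields they care about out of the payload. As a result, the message needs no schema beyond the envelope.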
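By contrast, a service tied to one object at one point in the process can publish a strict contract. This Python sketch assumes a hypothetical “approve invoice” step, modeled as a frozen dataclass that rejects incomplete input at the boundary. The names `ApproveInvoiceRequest` and `parse_request` and the field names are illustrative only.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ApproveInvoiceRequest:
    """Hypothetical contract for an invoice at its 'awaiting approval' point."""
    invoice_id: str
    approver: str
    amount: Decimal

    def __post_init__(self) -> None:
        # Tied to one object at one point in the process, the contract can afford to be strict.
        if not self.invoice_id or not self.approver:
            raise ValueError("invoice_id and approver are required")
        if self.amount <= 0:
            raise ValueError("amount must be positive")


def parse_request(message: dict) -> ApproveInvoiceRequest:
    """Map an incoming message onto the strict contract; missing keys raise KeyError."""
    return ApproveInvoiceRequest(
        invoice_id=str(message["invoiceId"]),
        approver=str(message["approver"]),
        amount=Decimal(str(message["amount"])),
    )


request = parse_request({"invoiceId": "inv-9", "approver": "kim", "amount": "120.50"})
print(request)
```

Unlike a loosely schema’d validation request, this message is rejected outright if anything is missing. That strictness is affordable because the service only ever deals with one kind of object at one point in its lifetime.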
Where to Next?
Almost everything I’ve discussed thus far is reasonably achievable with Microsoft technologies as they stand today. But I do think that we could do better. Perhaps we’re limited by the generality of the general-purpose languages that are far and away the most popular in the business market (Visual Basic, Java, and C#). Changing those languages to be more process-centric may simply not be an option.
As it is now, even for a standard approach to workflow, things are far too complicated. While it is good that WF is an F (foundation), it would be better if process awareness were built into the language and even the runtime itself. Then we wouldn’t have to learn a bunch of abstractions on top of GPLs and general-purpose runtimes. What I’d suggest is that workflows and point-in-process-specific schemas become easy to define using language constructs, and easy to interact with and query.
If changing the GPLs and CLR is not an option, how about a language for business applications that has the features I’m hinting at? Even if we have to stick with frameworks and foundations like WF, how about a language that integrates their use? I think something like this would be ideal and would be a big step forward in how we think about, design, and implement business applications.
Of course, this will never go anywhere if I’m the only one who thinks this is the right way to go. For all I know, I may not be the only one thinking this way. I’m certainly not intentionally stealing anyone’s thunder, though I’m sure all of this comes out of a jumbled blend of my own experiences, critical thinking, and what I’ve heard and read others thinking over the years. I’m sure the more erudite readers will be happy to point out similarities in what I’m suggesting with what others have suggested, but hopefully there is something useful and original here, if not in the content then in the packaging.
In any case, I believe that any change in our industry in such a direction would have to be a grassroots effort that convinces a vendor with enough influence to produce tools supporting this way of thinking about application design. Microsoft has proven itself to be responsive to input from developer customers, and they’ve taken important steps in this direction with WCF and WF. If you think something like this sounds good, talk it up and spread the word. Make sure other people know about it so that we can get the rest of the industry on board. Process-driven design is, I think, the next evolution we need to adequately address business needs.
Many of the foundations are in place; this approach does not negate what we have now in terms of architectural approaches, methodologies, or tools. In fact, it fits very nicely with agile development practices that have been rightly gaining popularity. It could easily fit in with test-driven design as well. I see all of these fairly recent advances in our industry (again, in terms of architecture, the software development life cycle, and tools and frameworks) as important and necessary foundations on which a process-driven design paradigm could be erected.
I’d love to hear what you think, so please either comment on this article online or send me an e-mail, and if you think it is worthwhile, blog about it, tell your friends, and just generally spread the word.