In this article I will share some of my thoughts about validating the data against regular expressions in SQL Server 2005. I will back this article up with sample CLR function that performs regular expression validation on given data and given pattern and returns True of False depending if validation succeded or failed.
Introduction:
It is normal thing to validate user input on the client side as well on the server side (In the SQL Server database) before the data enters in the database. It is fairly simple task to do so, that is, it is simple task in most of the cases. If you simply have to validate the data such as if given data is numeric or date or if given data falls within specified range or such, there is little or no challenge to validate this data quick and simple. You can simply check the data using ISNUMERIC, ISDATE or whatever SQL Server allows you to do with T-SQL. But what if you have to check if given data is valid e-mail address or valid telephone number or web address ? In this case things gets little complicated. At least things were complicated up until SQL Server 2005.
The Old Way:
In SQL Server 2000 if you wanted to validate some data against regular expressions was to:
a) Write extended stored procedure
b) Write COM based library and use it in stored procedure
c) Write your own stored procedure to do the task
So, given the options above I came to following conclusion:
To write extended stored procedure is not a simple task even for experienced developer. You have to write it in C++, you have to follow strict rules and if you're not carefull enough you can end up with SQL Server crashed or some other catastrophic event due to memory leaks etc.
If you write COM based library and use it in stored procedure you can get into troubles with memory leaks if you forget to release object references used inside the library or in your stored procedure. You'll have to install and register the component and you could simply get into infamous dll hell problems. And still you could crash SQL Server as effectively as with extended stored procedures. And COM based components in the days of .NET is not the best way to go anyway.
Writting your own stored procedure/UDF to do the task will end up with performance issues due to extensive string parsing you'll have to do in order to validate e-mail address. This should not be the problem if you have to validate only one e-mail address at the time, but if you have to validate large resultset then it could get you into the trouble.
The good news are that all of this isses mentioned above are gone by introducing simple yet fast and elegant solution: CLR inside SQL Server 2005.
The New Way
In order to create the function that will validate any given data against given pattern you'll have to write a scalar CLR based User Defined Function that uses functions offered inside System.Text.RegularExpressions namespace.
To create the function first we have to open Visual Studio 2005 and create new SQL Server Project. After creating new project we'll right click the project name in solution explorer and select Add/User-Defined Function from the pop-up menu.
The code that Visual Studio generates for us will look something like this:

Now we'll change function type to Boolean, add two parameters to the function: pattern and text and add the code that will perform validation. Pattern parameter will be used as regular expression pattern that we want to be used to validate the data. Text parameter is the data we want to validate. Now the code will look something like this:

There are two ways how you can test this function. First way is to right click on project name and select "Deploy" option. After deploy finishes you can use SQL Server Management studio to execute this statement:
SELECT
dbo.fn_ValidateRegExp('^(([a-zA-Z]([a-zA-Z0-9-_]*?)(\.[a-zA-Z]([a-zA-Z0-9-_]*))*?)@([a-zA-Z]([a-zA-Z0-9-_]*?)(\.[a-zA-Z]([a-zA-Z0-9-_]*))+?))\s*$','mymail@yahoo.com')
As you can see the first parameter is Regular Expression pattern that validates e-mail address and second one is e-mail address you want to validate. This statement should return 1 which means that e-mail address is ok.
But this one should return 0 because given e-mail address is not valid:
SELECT
dbo.fn_ValidateRegExp('^(([a-zA-Z]([a-zA-Z0-9-_]*?)(\.[a-zA-Z]([a-zA-Z0-9-_]*))*?)@([a-zA-Z]([a-zA-Z0-9-_]*?)(\.[a-zA-Z]([a-zA-Z0-9-_]*))+?))\s*$','mymailyahoo.com')
The second way is to use test.sql file which is located in Test Scripts foder of the project. You can enter select statement in it and run the project. Now you should be able even to debug your CLR function. But debugging of CLR procedures is not in the scope of this article so i'll not write about it right now.
Conclusion
Using CLR inside the SQL Server 2005 have many advantages over the old way of doing it. The most important ones are:
- You cannot crash SQL Server by using CLR code
- You cannot get into memory leak problems because garbage collector will do the cleaning for you
- You can potentially speed up execution of queries in case of extensive string based or heavy Math operations because of compiled code (instead of interpreted like t-sql)
- You can use your current VB or C# skills to write the functions you need.
- Some other that I cannot think of right now but I'm sure they exists :-)