Basic Knowledge in Microsoft Technology and Data Science with Python: December 2009

While the title of this article may seem misleading, you actually can achieve incredible performance gains by updating your web application to use LINQ. This article will dispell some common myths about LINQ and demonstrate how LINQ can practically increase performance. The examples in this article will be using LINQ to SQL, but the principles apply to LINQ to Entities as well.

First of all, it's important to realize what LINQ is and what it isn't. LINQ is a language feature you'll see in Visual Studio 2008 (Visual Studio Orcas) and beyond. It's not dependant on any particular .NET language as you can use it in C# or VB.NET. LINQ stands for "Language Integrated Query", but its usage goes beyond language capabilities alone.

It is true that the language speeds up development by allowing you to use a simple query-syntax in C# or VB.NET to query data from a HashTable, XML file or SQL Database. And it's also true that you can have Visual Studio 2008 write tons of code for you to build classes for your data tables (or even to build SQL tables from your classes). However, this article will demonstrate runtime performance gains you'll see by using some built in features that come with the language as well.

The benefits we'll discuss are:

Intelligent query translations (such as faster data pagination).
Redundant query checking to prevent needless data hits.
Redundant update checking to ensure minimal data transfers.
Intelligent Query Translations
One of the built-in benefits of using LINQ is that methods that are deemed "best practice" are used in the underlying technologies. For instance, if you were to use a standard ASP.NET 2.0 SqlDataSource, and you bound that data to a GridView with pagination turned on, you might not realize that you're transferring your entire data table just to use a few rows at a time in your paginated GridView. This results in some heavy queries, heavy data transfers and often times slow page loads.

The best place to do pagination would be on the SQL server itself. That way you are only transferring the few rows that are desired, saving you much network bandwidth and time. Ultimately, LINQ isn't doing anything that you couldn't do yourself, but most developers do not know how to do Pagination In SQL Server 2005, or they may not have time to change all of their queries to do so.

Again, if you take that simple "SqlDataSource" example above, but this time bind two ASP.NET controls to it (let's say a GridView and a DropDownList), then you'll be hitting your SQL server twice, and download all that data twice. This brings us to the next performance gain in LINQ.

Redundant Query Checking
Some of the biggest performance gains that I've benefited from by switching to LINQ is it's ability to look at the "context" of your request to determine if it needs to hit the database again. If you've already started to use LINQ yourself, but you don't know what I'm talking about, then you'll probably have a new appreciation for LINQ in a few minutes.

LINQ uses a "DataContext" object to determine what connection string to use, what objects to map to and other internal things. But you should be careful not to think of the LINQ data context class as you would think of a traditional "SqlConnection". If you are transitioning from using SqlConnections and SqlCommands you'll probably code something like this:

// Example of doing your own straight SQL query:
public static IEnumerable GetMyData()
{
using (SqlConnection myConnection = new SqlConnection(myConnectionString))
{
using (SqlCommand myCommand = myConnection.CreateCommand())
{
myCommand.CommandText = "SELECT * FROM dbo.myTable";

using (SqlDataAdapter myAdapter = new SqlDataAdapter(myCommand))
{
DataTable resultTable = new DataTable();

myAdapter.Fill(resultTable);

return resultTable.Rows;
}
}
}
}
If you translated the above method into LINQ, then your method might look something like this:

// Example of using LINQ to query the "Clients" table:
public static IEnumerable GetMyData()
{
using (MyDataContext myContext = new MyDataContext())
{
return from myRow in myContext.Clients select myRow;
}
}
This at first seems like an acceptable conversion from using a SqlConnection to using LINQ, but you're missing out on so much (which you'll see in a minute).

This is pretty much a 1 to 1 switch from writing your own SQL query to writing a query in LINQ. It is noteworthy that it takes a lot less code to query a database. However, these two code blocks will do the same thing, and you won't gain any performance from calling the LINQ method instead of calling the straight SQL method. Even if you call both of these functions twice in a row, you won't gain any performance because we are not utilizing the LINQ context correctly.

If we extend the range of our context to last for the entire web request (not just for the life of the function call) then we will begin to see performance increases. Before we examine how to do that, let's look at this next example that should help explain how the context works. We'll change our above method, "GetMyData", so that we can pass in our own context. Here's the code:

// Change the GetMyData method to take in an existing context.
public static IEnumerable GetMyData(MyDataContext myContext)
{
return from myRow in myContext.Clients select myRow;
}

// This will demonstrate how we are going to gain performance with LINQ:
protected void Page_Load(object sender, EventArgs e)
{
// First, we create our data context just like we did before.
MyDataContext myContext = new MyDataContext();

// Calling this function the first time will hit SQL and query the data.
GetMyData(myContext);

// LINQ is smart enough to realize that we've already used this query
// against this data context, so it won't hit SQL again. Instead it will
// simply return the same result set as last time.
GetMyData(myContext);

// This is a bit ridiculous, but you get the idea.
for (int i = 0; i < 100; i++)
{
GetMyData(myContext);
}
}
The way to achieve the performance benefits of using the same data context during the entire page request is to store your data context instance in the HttpContext Items collection (which is alive for the duration of the thread). We'll make a static property that will give us a data context instance, but because we are using the current HttpContext, we won't have cross thread issues. You can read a little more about the HttpContext in this article: Connect Your Site With HttpContext Items. Let's look at the code to see how to achieve this:

public static class DataAccessLayer
{
// This object is a "key" to the HttpContext Items dictionary.
private static object _eelsContextKey = new object();

// This will always return an instance of the MyDataContext
// (which was created by a LINQ wizard in Visual Studio 2008). It
// will return the SAME instance if you call it twice during the
// same HTTP request.
private static MyDataContext Context
{
get
{
if (HttpContext.Current.Items.Contains(DataAccessLayer._eelsContextKey) == false)
{
MyDataContext context = new MyDataContext("Connection String Here...");

HttpContext.Current.Items.Add(DataAccessLayer._eelsContextKey, context);

return context;
}
else
{
return (MyDataContext)HttpContext.Current.Items[DataAccessLayer._eelsContextKey];
}
}
}

// Now we can make very simple DataAccessLayer functions like these:

public static IEnumerable GetClients()
{
return from myRow in DataAccessLayer.Context.Clients select myRow;
}

public static IEnumerable GetProjects()
{
return from myRow in DataAccessLayer.Context.Projects select myRow;
}

public static IEnumerable GetThePoint()
{
return from myRow in DataAccessLayer.Context.You_Get_The_Point select myRow;
}
}
Let's now look at the last area of performance gain from using LINQ that we will discuss.

Redundant Update Checking
The last place where you can see performance gains is with writing back to your data source. The LINQ data context not only remembers if you have already queried some data before, but it also keeps track of any changes that are made. This means that if you call the "save" function, but no changes were made first, then LINQ will ignore the request to save. Likewise, if you only change one property (such as "FirstName"), then LINQ will build an UPDATE statement that only contains that one column (again saving time and bandwidth).

Basic Knowledge in Microsoft Technology and Data Science with Python

Pages

Monday, December 7, 2009

The benefits Of LINQ

Tuesday, December 1, 2009

Hiding SQL code