In this post, I want to discuss some aspects of data retrieval with Linq-To-Entities and Entity Framework from a SQL database. Entity Framework is the most commonly used object relational mapper for .Net applications and it is great because in enables you to create complex data queries without having to write the SQL statements by hand. However, in this helpful automation also lies a danger of Linq generating very inefficient SQL that can potentially cripple the performance of your application. In my own experience, poorly structured queries are often a bottleneck for the performance of your web-application. Therefore, they are an especially relevant thing for developers to take good care of.
Consider, for example, the following (somewhat contrived) piece of code.
What a mess! The idea here is to retrieve some information concerning a company from the database, including some information of the users and laboratory locations attached to that company. The information retrieved here is arbitrary, it just servers as an example. The AllUsers(), AllLocations() and AllCompanies() methods just retrieve a set of those entities from the database using Entity Framework.
I use this as a starting point for this article, because it demonstrates practically everything that a programmer could do wrong when using Linq-To-Entities. I will be spending the rest of this write-up to point out and subsequently fix the problems with it. By doing so, I hope to make some valuable points on correctly using Linq-To-Entities and the Entity Framework.
Point 1: Only put data in memory that you actually need.
Focus on this section:
The call to ToArray() here queries the database for all known companies, selects all the data in all the columns of the Company table and then puts all that data into an array object. Wow. Not only does this not make use of the delayed loading capabilities of entity framework, it also retrieves much more information than we actually need. We are only interested in the company name, it’s laboratories and its users and not in all the other information that the company table might hold.
It is sometimes tempting to use a method like ToArray() prematurely when you want to use methods on your object-to-be that Linq to Entities does not support but your actual object does. Try to prevent this because it is almost never necessary.
Note that calling is fine, because of how Entity Framework works: The underlying query will not execute until you actually do something with the users, like putting them in an array.
Point 2: Avoid querying the database inside loop structures.
Focus on this section:
In the way the code is set up currently, both of these lines will result in database queries and these queries are individually executed for each company in the database, because they are inside the companies.Select(c =>.. ) statement! You definitively never want that. This kind of structure to retrieve data gives horrible performance even on relatively small data-sets. It is a surprisingly easy mistake to make and definitely worth looking out for.
Point 3: Avoid querying multiple times when one query would suffice.
This one is sort of related to the second point. Because this is not well represented by my initial example, I’ll refer to a stack overflow post by someone nicked Moby Disk instead:
Consider the following example:
Note that the query will be repeated here for both foreach statements. A better approach would be to use a single select to retrieve both the Name and OwnerName. (Much) worse on the other hand, would be to do a ToList() beforehand because of reasons explained above.
Let’s refactor our own code to something a little more sensible now:
This is a piece of code I might realistically write if the company data in the example would be what I need to retrieve for, let’s say, a controller endpoint. A couple of things go right this time:
-We retrieve only data that is really needed into memory (when calling SingleAsync()).
-There are no new queries inside loops.
-The code is fairly easy to read and understand because of its tree-like structure with the company table as ‘entry point’.
Point 4: Don’t query too deep and don’t forget to use joins when looking for optimal performance.
The second version of the code will work ‘o.k.’ for most situations but is still not optimal in terms of performance. It is easy to read and work with but if working with large data-sets you probably want to be using something like this instead:
This is a little more verbose and returns a list of locations (laboratories in this case), including their company and user information. By (a) explicitly telling where to join the tables via the Join() methods, by (b) not querying more than one layer deep (user.Roles and user.Companies is as ‘deep’ as we go) and by (c) selecting no more data than we actually need, Linq-To-Entities will usually be able to generate the most accurate SQL and thus work as fast as possible!
Hopefully this post will help you next time you find the performance of your Entity Framework powered data retrieval lacking, or help you spot common pitfalls when working with Linq-To-Entities!