Saturday, April 13, 2013

Entity Framework: IQueryable vs. IEnumerable

Many of us sometimes get confused of different aspects of using IEnumerable<T> and IQueryable<T> in Entity Framework. One of considered opinions is that Entity Framework and DbContext and deffered execution just do all the magic and there is no difference. But even though LINQ query result remains basically the same, it may significantly impact the query performance.

Code sample

I'm going to create a simplified database consisting of one table with million records with Entity Framework Code First.

Model

There is a POCO class and the context:

public class Foo
{
    [Key]
    public int Id { get; set; }

    [Required]
    public int Random { get; set; }
}

public class FooContext : DbContext
{
    public DbSet<Foo> Foos { get; set; }
}

Populate database

There is a script, which populates data to our "database". There's also an index created on Random field after the data is inserted:

declare @i int = 1

while (@i < 1000000)
begin
    insert into Foos(Random)
    values(RAND()*1000)

    set @i = @i + 1
end

create nonclustered index IX_FOO on Foos (Random asc)

Database initializer

The script is put into the project as an embedded resource and is run in database initializer. As populating of million records takes time, I also increased command timeout to 15 minuntes.

public class FooInitializer : DropCreateDatabaseIfModelChanges<FooContext>
{
    protected override void Seed(FooContext context)
    {
        ((IObjectContextAdapter)context).ObjectContext.CommandTimeout = 900;

        var resourceStream = Assembly.GetExecutingAssembly()
            .GetManifestResourceStream("EntityFramework.Querying.DataModel.PopulateDatabase.sql");

        using (var reader = new StreamReader(resourceStream))
        {
            context.Database.ExecuteSqlCommand(reader.ReadToEnd());
        }
    }
}

And initializing our database explicitly:

private static void InitializeDatabase()
{
    Database.SetInitializer(new FooInitializer());
    using (var context = new FooContext())
    {
        context.Database.Initialize(false);
    }
}

Queries to compare

Now, let's create two methods with the same query, but one receives IEnumerable<T>, another one IQueryable<T>

private static IEnumerable<Foo> WhereQueryable(IQueryable<Foo> foos)
{
    return foos.Where(foo => foo.Random < 1);
}

private static IEnumerable<Foo> WhereEnumerable(IEnumerable<Foo> foos)
{
    return foos.Where(foo => foo.Random < 1);
}

Performance measure

Now I'm going to measure performance of both methods with this helper method:

private static void ExecuteWithMeasure(Action<FooContext> action, string actionName)
{
    Console.WriteLine(actionName);
    var watch = new Stopwatch();
    watch.Start();

    using (var context = new FooContext())
    {
        action(context);
    }

    watch.Stop();
    Console.WriteLine("Completed in {0} seconds.", watch.Elapsed.TotalSeconds);
}

calling:

ExecuteWithMeasure(context =>
    {
        var foos = WhereQueryable(context.Foos).ToList();
        Console.WriteLine("{0} foos retrieved", foos.Count);
    }, "Querying Queryable...");

ExecuteWithMeasure(context =>
    {
        var foos = WhereEnumerable(context.Foos).ToList();
        Console.WriteLine("{0} foos retrieved", foos.Count);
    }, "Querying Enumerable...");

We can use performance profiler as well, but I would like to focus on the time the methods are executed only.

Performance results

What result will we get?

Querying Queryable... 96 foos retrieved Completed in 0.0420732 seconds. Querying Enumerable... 96 foos retrieved Completed in 5.3643147 seconds.

The difference is impressive: IEnumerable query is about 127 times slower than its IQueryable version. And the working time is really demonstrative: 5 seconds is obviously too much for this simple and trivial query.

SQL Server inside

We can use IntelliTrace or any SQL profiler to see what happens on SQL server side.

In case it's IQueryable the following SQL query is executed:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Random] AS [Random]
FROM [dbo].[Foos] AS [Extent1]
WHERE [Extent1].[Random] < 1

And the following in case it's IEnumerable:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Random] AS [Random]
FROM [dbo].[Foos] AS [Extent1]

Impact

The obvious conclusion is that the IEnumerable query requests all the million records from SQL Server and performs the Where extension method through enumerating all the obtained entities after their materialization.

Reasons

Why it happens? We should expect that in both cases different extension methods are exectuted: Enumerable.Where and Queryable.Where. Let's dig deeper and look into the code of these methods. We can use .NET Reflector or JetBrains dotPeek (which is free) or any other .NET decompiler. The following code is simplified to show the most essential info:

// Type: System.Linq.Enumerable
// Assembly: System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
// Assembly location: C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Core.dll
public static class Enumerable
{
    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source, 
        Func<TSource, bool> predicate)
    {
        return (IEnumerable<TSource>) new Enumerable.WhereEnumerableIterator<TSource>(source, predicate);
    }
}


// Type: System.Linq.Queryable
// Assembly: System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
// Assembly location: C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Core.dll
public static class Queryable
{
    public static IQueryable<TSource> Where<TSource>(
        this IQueryable<TSource> source, 
        Expression<Func<TSource, bool>> predicate)
    {
        return source.Provider.CreateQuery<TSource>(
            Expression.Call(
                null, 
                ((MethodInfo) MethodBase.GetCurrentMethod()).MakeGenericMethod(
                    new Type[] { typeof(TSource) }), 
                    new Expression[] { source.Expression, Expression.Quote(predicate) }));
    }
}

We can see that Enumerable returns WhereEnumerableIterator which knows nothing about query provider and can only perform Where predicate through enumerating all entities. On the other hand Queryable keeps IQueryable.Provider, in Entity Framework it's System.Data.Entity.Internal.Linq.DbQueryProvider, which can generate combined SQL query.

Workaround

But what can we do if our architecture supposes IEnumerable and we cannot change method signature?

private static IEnumerable<Foo> WhereEnumerable(IEnumerable<Foo> foos)
{
    return foos.AsQueryable().Where(foo => foo.Random < 1);
}

The AsQueryable() extension method will cast the parameter to IQueryable if it's possible, otherwise create EnumerableQuery, which will be a query provider itself and generate queries basing on existing enumerable source.

public static IQueryable<TElement> AsQueryable<TElement>(this IEnumerable<TElement> source)
{
    if (source is IQueryable<TElement>)
    {
        return (IQueryable<TElement>) source;
    }

    return new EnumerableQuery<TElement>(source);
}

So it will work the same effectively whether the actual source is IQueryable or IEnumerable.

Conclusion

We should be careful while passing IQueryable and IEnumerable in our cascade queries and always remember the simple difference, that IEnumerable is enumerAble and IQueryable is queryAble ;)

Feel free to download the sample source code.

8 comments:

  1. Great example!

    Thanks for sharing the sample.

    Regards
    Kam

    ReplyDelete
  2. Great Post Olexander Ivanits :)

    Nice way of explaining the difference between IQueryable and IEnumerable

    ReplyDelete
  3. Nice, neat and informative article. Good job, keep it up :)

    ReplyDelete
  4. I love this article :-D

    thanks for your sharing

    ReplyDelete
  5. Nice Explanation. Helps a lot :)

    ReplyDelete
  6. Thanks for the article, nice example!

    ReplyDelete
  7. Good one, nice clarification with performance and good explanation Olexander.

    ReplyDelete