Friday 3 January 2014

Null Sematics in LINQ’s Sum

C# LINQ code is not always executed as C# code. With LINQ-to-SQL or Entity Framework it is translated into SQL instead. This can bring unpleasant surprises when things that can’t happen do happen.
One day when you’re deep in the zone coding a marketing specialist shows up with a haunted look because the CEO wants some numbers right now. You know, the CEO wants it NOW as in IMMEDIATELY, within 60 seconds. Well, you’re already kicked out of the zone so let’s fix those numbers. The CEO wants to know the total number of seats in the company’s blue cars (I don’t understand why that’s relevant – but the CEO is the boss so let’s just do it (besides it makes a good example of what I want to show. All right?))
The explanation already took 15 seconds – so now you’re down to 45 seconds to produce the numbers for the CEO. Being a C#/LINQ hacker the fastest tool is of course LINQPad.
5 seconds later (you’re fast at the keyboard!) you’ve got the query done:
Cars.Where(c => c.Color == "Blue").Sum(c => c.Seats)
5 more seconds to compile and run and you’ve got the magic number – 14. The marketing specialist runs off, happy to make it within the required 60 seconds. You get back to coding. The moment when you’ve got into the zone the marketing specialist shows up again (interesting how they always show up exactly when the solution to that hard problem is within reach).
Now the CEO wants the numbers for Green cars as well to tomorrow’s meeting. As it looks like a minor thing, it’s best to do it right away. You switch back to LINQPad and change the color in the Where(...) call and hits F5. This is when everything blows up in your face with an InvalidOperationException.
The null value cannot be assigned to a member with type System.Int32 which is a non-nullable value type.
At least the deadline is more than 60 seconds this time. You’ve got all night to figure out what happened. To save you from a sleepless night, I’ll let you know both what happened and what to do about it.

Back to Basics – LINQ to Objects

Before going into details on what happened and why, let’s have a look at the basics: LINQ to Objects. The first encounter with LINQ is often for data access, so it may be easy to overlook the basic implementation that works on objects in memory.
List<Car> Cars = new List<Car>() { new Car() };
var q = Cars.Where(c => c.Color == "Green").Sum(c => c.Seats);
The only car in the list is blue, so there will be no matching objects at all. The number of seats in zero cars is zero. q becomes and int with value 0. When running this query against a SQL database using LINQ to SQL or LINQ to Entities things are getting more complicated.

LINQ to SQL and LINQ to Entities

With LINQ to SQL and LINQ to Entities the C# code is no longer executed itself. It is translated into SQL and therein lies the problem. LINQ to SQL and LINQ to Entities are both leaky abstractions. The generated SQL code is basically the same (this is the LINQ to SQL one):
SELECT SUM([t0].[Seats]) AS [VALUE]
FROM [Cars] AS [t0]
WHERE [t0].[Color] = @p0
It’s a really straight forward translation of the C# LINQ expression. The problem is the SQL SUM operator which returns NULL for empty sequences. When the result is returned to LINQ to SQL or Entity Framework it fails miserably when trying to assign the NULL value into a non-nullable int.
InvalidOperationException: The cast to value type ‘Int32′ failed because the materialized value is null. Either the result type’s generic parameter or the query must use a nullable type.
- Entity Framework
InvalidOperationException: The null value cannot be assigned to a member with type System.Int32 which is a non-nullable value type.
- LINQ to SQL

A Fix (EF only)

A simple fix is to call the DefaultIfEmpty() function, which inserts a single element with a default value if the sequence is empty. To be able to call DefaultIfEmpty(0) the selection of the seat property has to be moved outside the call to Sum.
Cars.Where(c => c.Color == "Green").Select(c => c.Seats).DefaultIfEmpty(0).Sum()
The SQL generated contains a CASE WHEN to to handle the NULL case.
SELECT 
[GroupBy1].[A1] AS [C1]
FROM ( SELECT 
 SUM([Join1].[A1]) AS [A1]
 FROM ( SELECT 
  CASE WHEN ([Project1].[C1] IS NULL) THEN 0 ELSE [Project1].[Seats] END AS [A1]
  FROM   ( SELECT 1 AS X ) AS [SingleRowTable1]
  LEFT OUTER JOIN  (SELECT 
   [Extent1].[Seats] AS [Seats], 
   CAST(1 AS tinyint) AS [C1]
   FROM [dbo].[Cars] AS [Extent1]
   WHERE N'Green' = [Extent1].[Color] ) AS [Project1] ON 1 = 1
 )  AS [Join1]
)  AS [GroupBy1]
Unfortunately this doesn’t work with LINQ to SQL, it gives an exception.
NotSupportedException: Unsupported overload used for query operator ‘DefaultIfEmpty’.
The exception message indicates that another overload might be the solution, so let’s try DefaultIfEmpty().
InvalidOperationException: Could not format node ‘OptionalValue’ for execution as SQL.
That’s even more confusing. However, there is another solution to the problem.

Another fix (EF and LINQ to SQL)

Another way to solve the problem is to explicitly tell the query engine that the result might be null and then use the C# coalesce operator (??) to translate null to 0.
Cars.Where(c => c.Color == "Green").Sum(c => (int?)c.Seats) ?? 0
This works both with EF and LINQ to SQL. The SQL generated is identical to the first one showed. The null handling is now done after the data has been returned from the database to the query provider.

Leaky abstractions

Both EF a/nd LINQ to SQL obviously suffers from being leaky abstractions. I think that letting the differentnull semantics of SQL bubble up to the C# code is a bad design decision. Handling the null semantics should really be part of the query provider, not something that every application using it will have to take care of.

No comments:

Post a Comment