Thursday, February 16, 2012

SQL CE performance improvements in 2.6.1

Galleries using SQL CE benefited from significant performance improvements in 2.6.0. Retrieval of data was up to 15 times faster than in 2.5. However, many users noticed that synchronize operations still were slower than both SQL Server and the now obsolete SQLite data provider (replaced by SQL CE in 2.5). This week I focused on analyzing the source of the difference with an aim to improve it. I am pleased to say that I succeeded in achieving about a 50% speed improvement with only a minimal change. I will soon be releasing 2.6.1, which contains this benefit as well as several bug fixes.

Before I dig into the details, I want to mention that the upcoming .NET 4.5 runtime is expected to bring additional performance benefits to SQL CE users. The ADO.NET Team just announced some details that show a real world app running 67% faster under .NET 4.5 than .NET 4.0. This will be in addition to the performance enhancements in 2.6.1, so there is a lot of movement in the SQL CE performance arena.

What is taking so long?

Using the Performance Analyzer in Visual Studio 2010, I ran some tests against Gallery Server Pro 2.6.0 using both the SQL CE and SQL Server data providers. I studied gigabytes of data and was able to condense it all down to these two tables:

Table A - 2.6.0, Sync 648 images

 

SQL CE

SQL Server

Duration

38.3 min.

13 min.

% of time inserting records into gs_MediaObjectMetadata

41%

0.21%

Table B - 2.6.0, Sync existing 105 images, option ‘Re-import metadata’ selected (no thumbnail or optimized images created)

 

SQL CE

SQL Server

Duration

9.3 min.

1.6 min.

% of time inserting records into gs_MediaObjectMetadata

83%

4.4%

Table A shows the performance when the sync operation discovers and finds 648 new objects. This requires creating thumbnail and optimized images.

Table B shows a sync of an album that already has 105 images in the gallery. The option ‘Re-import metadata’ is selected, resulting in a sync where all it is really doing is extracting and inserting metadata, since no thumbnail/optimized images need to be created.

Notice that for the first test, SQL CE takes nearly three times longer than SQL Server, with almost half the time spent inserting records into gs_MediaObjectMetadata. This table stores data such as camera model, shutter speed, file size, etc. Each media object in the gallery usually has at least four metadata records, and there are often twenty or thirty for images from digital cameras. In the tests above each image had about 20 metadata items.

Table B shows an even more dramatic difference. When we remove the time-consuming image generation from a sync, more than 80% of the time is spent inserting metadata records in SQL CE, while it is only 4.4% in SQL Server.

It was pretty clear I needed to focus on the metadata insertion routine. Here is the SQL CE version:

InsertMetadataItems260

It loops through each metadata item, converts it into a simple data object, and passes it to the Entity Framework (EF) data context for persisting to the database. Digging into the data further, I could see that nearly all the time in this function was spent executing ctx.SaveChanges().

Ah ha! The villain hath made himself known! SaveChanges is an Entity Framework/.NET method that does a lot of complicated stuff to ensure data integrity. It has been an incredible productivity booster for developers, but it can be much slower than pure ADO.NET. And here we have a perfect example.

Look at the SQL SaveChanges() generates:

insert [gs_MediaObjectMetadata]
([FKMediaObjectId], [MetadataNameIdentifier], [Description], [Value]) values (@0, @1, @2, @3)

select [MediaObjectMetadataId]
from [gs_MediaObjectMetadata]
where [MediaObjectMetadataId] = @@IDENTITY

The first one I expected. The second one made me cringe. It forces SQL CE to query the table to look for the newly assigned ID, when it could avoid the table search by just doing this:

SELECT @@IDENTITY()

Since the inefficient SQL was executing for each metadata insertion, a media object with 20 metadata properties was causing a SELECT query against the table 20 times. In contrast, the SQL Server routine executes a stored procedure where the newly assigned ID is returned as an output parameter, and no SELECT is used.

A few failed attempts

Using the hypothesis that the unnecessary table search after every insert was the root cause, I spent some time looking for a setting or some way to optimize the SQL. In the end, I couldn’t find one. There seemed to be no way to tell EF that there was a better way to do things.

It is well known that the fastest way to interact with data in the .NET world is to use ADO.NET. Although I used EF in the SQL CE data provider, there is no technical reason why I couldn’t replace the EF approach in one place with blazing fast ADO.NET. So I whipped up a replacement function for inserting metadata:

InsertMetadataItems_ADONET

This change made an incredible difference. Repeating the test from Table A showed that the percent of time spent inserting metadata dropped from 41% to 0.91%. By changing one function, suddenly SQL CE was performing within spitting distance of SQL Server.

But then something horrible happened. Before the test finished, the IIS worker process w3wp.exe crashed. It experienced an AccessViolationException with the message ‘Attempted to read or write protected memory. This is often an indication that other memory is corrupt’.

I did some more tests, and the error happened at some random point in every sync I ran. At one point I was able to capture a stack trace:

at System.Data.SqlServerCe.SqlCeConnection.Dispose(Boolean disposing)
at System.ComponentModel.Component.Dispose()
at System.Data.Entity.Internal.LazyInternalConnection.Dispose()
at System.Data.Entity.Internal.LazyInternalContext.DisposeContext()
at System.Data.Entity.Internal.InternalContext.Dispose()
at System.Data.Entity.DbContext.Dispose()
at GalleryServerPro.Data.SqlCe.SqlCeGalleryServerProProvider.MediaObject_Save(IGalleryObject mediaObject) in C:\Dev\GSP\Dev-Main-2.5\Src\TIS.GSP.Data.SqlCE\SqlCeGalleryServerProProvider.cs:line 828
at GalleryServerPro.Business.MediaObjectSaveBehavior.Save() in C:\Dev\GSP\Dev-Main-2.5\Src\TIS.GSP.Business\MediaObjectSaveBehavior.cs:line 82
at GalleryServerPro.Business.GalleryObject.Save() in C:\Dev\GSP\Dev-Main-2.5\Src\TIS.GSP.Business\GalleryObject.cs:line 959
at GalleryServerPro.Business.SynchronizationManager.UpdateExistingMediaObject(IAlbum album, IGalleryObject mediaObject) in C:\Dev\GSP\Dev-Main-2.5\Src\TIS.GSP.Business\SynchronizationManager.cs:line 707
at GalleryServerPro.Business.SynchronizationManager.SynchronizeMediaObjectFiles(DirectoryInfo directory, IAlbum album) in C:\Dev\GSP\Dev-Main-2.5\Src\TIS.GSP.Business\SynchronizationManager.cs:line 614
at GalleryServerPro.Business.SynchronizationManager.Synchronize(String synchId, IAlbum album, String userName) in C:\Dev\GSP\Dev-Main-2.5\Src\TIS.GSP.Business\SynchronizationManager.cs:line 207
at Gsp.Gallery.Synchronize(Int32 albumId, String synchId, Boolean isRecursive, Boolean rebuildThumbnails, Boolean rebuildOptimized, Boolean regenerateMetadata) in C:\Dev\GSP\Dev-Main-2.5\Src\Website\gs\services\Gallery.asmx.cs:line 549

The error was occurring within EF code, not the new ADO.NET code I just wrote. For some reason, combining EF and ADO.NET together was not working. I googled around but ultimately found nothing. I racked my brain for a couple hours trying to figure out some way to preserve this awesome new performance trick, but in the end I could not solve it.

I had to scrap the ADO.NET code. With a sniff and a tear in my eye I hit the delete key…

The solution that worked

I went back to the original InsertMetadataItems() function and studied it. What if, instead of calling SaveChanges() for each metadata insertion, I batched them up and only ran it once for each media object? I commented out the SaveChanges() line and the one immediately after it (since we no longer would have the newly assigned ID at that point) and ran the test again.

Success! The time it took to run the tests dropped by almost half! Here is the data from table B again, this time with an additional column for the improved version.

Table C - Sync existing 105 images, option ‘Re-import metadata’ selected (no thumbnail or optimized images created)

 

SQL CE
2.6.0

SQL CE
2.6.1

SQL Server

Duration

9.3 min.

4.8 min.

1.6 min.

% of time inserting records into gs_MediaObjectMetadata

83%

72%

4.4%

# of calls to SaveChanges()

2404

108

N/A

Interestingly, the SQL generated by EF still includes the unnecessary table query, which may account for much of the remaining performance difference with SQL Server. But reducing the number of calls to SaveChanges() still had a dramatic effect.

In order to preserve the original behavior of the function, I wasn’t done yet. I still needed to assign the newly created ID to the metadata objects. So I added a little plumbing code to assign all the IDs once they are known. Here is the final version of the function:

InsertMetadataItems261

Wrap up

This change really improves insertion scenarios in SQL CE, which are primarily synchronizations and file uploads to the gallery. Retrieval performance is already pretty good.

Even with this change and the expected improvements in .NET 4.5, SQL Server will remain the fastest data provider for Gallery Server Pro. It is highly optimized to use stored procedures, indexes, and carefully constructed ADO.NET.

Use SQL Server when you need the best performance and don’t mind it’s additional complexity and maintenance requirements. Use SQL CE when simplicity is paramount.

No comments: