Monday, February 11, 2013

Refactoring T-SQL using SSDT - Part 1

One of my favorite tools in my bag of SQL tricks is SQL Server Data Tools (SSDT) for Visual Studio. It's the successor to the old "DB Pro" projects and can be a T-SQL junkie's best friend. I recently presented on SSDT at the February FoxPASS meeting and I've submitted a similar talk to SQL Saturday #206 in Madison, WI on April 6th. Over the next few weeks, I'll be posting about some of my favorite features in SSDT as well as some of my biggest complaints and work-arounds. This is not an introduction to SSDT; for that, I'd refer you to any number of great SSDT resources on the web, including Gert Drapers' great talk at TechEd.

Today I want to talk about refactoring T-SQL code in SSDT. SSDT has a number of refactoring options; I'll go over two of them today and others later in the week.

1. Expand a Wildcard

We all recognize that SELECT * will only end in tears. SSDT, however, provides a quick and easy way to eliminate its use from a project and expand those wildcards into a nice list of columns. You can do this either for the entire project or for just a single instance. To refactor a single instance, simply right-click on the offending line and select Refactor > Expand Wildcards...
You're then given the opportunity to preview the changes that will be made. You can choose to Apply or Cancel.
If you click Apply, you'll see that we now have expanded the wildcard to select all the columns of the table by name.
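As a hedged illustration (the table and column names here are made up, not the actual objects from my sample project), a statement like this:

SELECT *
FROM dbo.Person
WHERE [LastName] = @LastName;

becomes this after the refactor:

SELECT [PersonID], [FirstName], [LastName], [Birthday]
FROM dbo.Person
WHERE [LastName] = @LastName;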
You can also expand all the wildcards in the entire project or solution by right-clicking on the project in the Solution Explorer.

It will search for all wildcards in the entire project or solution and you'll see all of them in the Preview Changes dialog.
It really is that simple! When I inherit a new application or am first given a database to deploy from a developer, one of the first things I do is use these refactoring methods to root out some bad practices. SSDT also gives us the ability to do static code analysis, but that's a topic for another post.

2. Rename a Column or Table

A task that should be fairly straightforward is renaming a table or column. Whether it's to bring a project in line with corporate naming standards or just because you changed your mind, renaming an object can often be an exercise in frustration. Finding all instances of the referenced object can involve querying sys.syscomments, sys.sysdepends and other system views, and even then you may not find ALL of the references. SSDT does a great job of tracking dependencies and can make this job a snap. Simply go to the definition of the object you wish to rename, right-click on it and select Refactor > Rename...
You are then presented with a dialog where you can enter the new name for the object. In this case, I'm renaming the "Birthday" column to "DOB".
When you click OK, much like with the Expand Wildcards refactor, you'll be presented with a Preview Changes dialog where you can see every object SSDT tracked down that will need to change, along with the text of the change it will make. In this case you can see that we're not only changing the table definition; SSDT also found a stored procedure that refers to the column we're changing and has renamed the reference for us. Incidentally, it's the same stored procedure where we expanded the wildcard earlier. The most common objection to expanding wildcards I've encountered is that it makes renaming columns difficult, but the Rename refactor eliminates that excuse. Handy!
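To make that concrete, here's a hedged sketch of the kind of change the preview shows (the procedure and table are stand-ins, not the literal objects in my sample project). Before the refactor the procedure referenced the old column name:

SELECT [PersonID], [FirstName], [LastName], [Birthday]
FROM dbo.Person
WHERE [PersonID] = @PersonID;

and after accepting the change, both the table definition and the procedure reference the new name:

SELECT [PersonID], [FirstName], [LastName], [DOB]
FROM dbo.Person
WHERE [PersonID] = @PersonID;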
After you click Apply, the previewed changes will be made and you're all set. Or are you? Notice that in the Solution Explorer a new file has appeared at the root of the project called "Sample_SSDT_refactorlog". The filename will differ depending on the name of the project (Sample_SSDT in this case), but you will have a new file here. What the heck is it?
Let's have a look. If we inspect the file, we'll see that it is an XML file that is a log of all the refactor commands that change the data structure of the project. Our Expand Wildcard is NOT in here, but the Column Rename IS.
Why is that? What is this file used for? It confused me as well, and since it had "log" in the name, I just removed it as unnecessary cruft in the project. Then I deployed my changed project to my Dev-Integration server and the data that was in the renamed column disappeared. Whoops. It turns out that SSDT and the Data Tier Application Framework (DACFX) use this file to know when a data-containing object has been renamed; otherwise, they would believe that the old object was dropped and an unrelated new object was created. In our case, DACFX will actually copy the data from the old column to the new one if we leave this file in place.

When a rename operation is applied at deploy time, a new system table named __RefactorLog appears in the database. It contains the operation key of the rename that has been applied, corresponding to the same key in the refactor log file in the project. This is how DACFX keeps track of which rename operations have already occurred, so it can apply only the new ones, and apply operations that might conflict in the correct order. This is all pretty flipping cool, if you ask me.
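If you're curious, once a refactored project has been deployed you can peek at the keys recorded on the target server. Here's a minimal example; the column name is what I've observed, so treat it as an assumption rather than gospel:

SELECT [OperationKey]
FROM dbo.__RefactorLog;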

Potential Gotchas

The sad reality, though, is that SSDT is not perfect. It can do a lot, but it can't do everything. One big thing you'll need to be aware of when doing these two refactor operations is that SSDT cannot refactor Dynamic SQL. In our sample project there is a stored procedure that uses dynamic SQL to build a WHERE clause. You'll see that it also uses a wildcard that did not get expanded. 
Dynamic SQL, to SQL Server and to SSDT, is just a string. A string that is later passed to sp_executesql, but a string nonetheless. As such, SSDT can't be expected to refactor it. This is a case where you'll have to be careful: know your application and whether it uses dynamic SQL so you can refactor those bits by hand.
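Here's a hedged sketch of the kind of thing I mean (the names are illustrative, not the literal procedure from my sample project). SSDT sees only a string here, so neither the wildcard nor the column reference inside it will be touched by the refactors above:

CREATE PROCEDURE dbo.SearchPerson
    @FilterColumn SYSNAME,
    @Value NVARCHAR(100)
AS
DECLARE @sql NVARCHAR(MAX);
-- To SSDT this is opaque text; the SELECT * and the column name won't be refactored
SET @sql = N'SELECT * FROM dbo.Person WHERE ' + QUOTENAME(@FilterColumn) + N' = @Value';
EXEC sp_executesql @sql, N'@Value NVARCHAR(100)', @Value = @Value;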

Monday, February 4, 2013

Does a Database Restore 'Prime' the Buffer Pool?

A co-worker asked me one day last week if, when a database is restored, the restored pages are retained in the buffer pool. I immediately answered 'no', but I realized this was something I'd either always assumed or someone had told me once upon a time. I was sure I was right, but I wanted to remind myself of the how and why.

The first thing I did was to clear the buffer pool, restore a database and then check the DMV sys.dm_os_buffer_descriptors to see if there were any pages for that database in the buffer pool.
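For reference, here's roughly what that test looked like (a hedged sketch: the database name and backup path are made up, so adjust for your own environment, and don't clear the buffer pool on a busy production box):

CHECKPOINT;              -- flush dirty pages to disk first
DBCC DROPCLEANBUFFERS;   -- then drop the clean buffers from the buffer pool
RESTORE DATABASE SampleDB
FROM DISK = N'C:\Backups\SampleDB.bak'
WITH REPLACE;

The buffer pool query I used afterwards is below.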

SELECT
    (CASE WHEN [database_id] = 32767 THEN N'Resource Database'
          ELSE DB_NAME([database_id]) END) AS [DatabaseName],
    COUNT(*) * 8 / 1024 AS [Size_MB]
FROM sys.dm_os_buffer_descriptors
GROUP BY [database_id]
ORDER BY DB_NAME([database_id]);


Sure enough, the amount of the buffer pool used by that database did not change and remained zero. In fact, if I perform the opposite test, running some selects against the database to read pages into the buffer pool and then restoring the database, we can see that the buffer pool usage for that database again returns to zero after the restore. So what's going on? Why doesn't a restore 'prime' the buffer pool? In short, because the restored data pages never enter the buffer pool in the first place!

During backup and restore operations, SQL Server uses a separate memory area for buffering the data being handled. This buffer is NOT the buffer pool! In fact, this memory is allocated from the OS, outside of the server memory that you control with the "max server memory" and "min server memory" settings. You can configure the size of this buffer using the BUFFERCOUNT and MAXTRANSFERSIZE options of the BACKUP and RESTORE commands; the total buffer size is BUFFERCOUNT * MAXTRANSFERSIZE. By default, MAXTRANSFERSIZE is 1MB and the BUFFERCOUNT is variable and version dependent, but it can be calculated with the formula "NumberofBackupDevices * 3 + NumberofBackupDevices + (2 * NumberofVolumesInvolved)" for SQL 2005+. Details on this formula can be found here. In the very simple case of a single backup device and a single volume, that works out to (1 * 3) + 1 + (2 * 1) = 6 buffers, for a total of 6MB. Modifying these values can, in some cases, offer a performance benefit, but you'll need to test various scenarios in your environment to determine the optimal settings. In general, the benefits to be gained are minimal, but that doesn't mean it's not worth testing! Note that it is possible, especially with large values on 32-bit systems, to request more buffer space than the OS can deliver, in which case you'll receive the error:

Msg 701, Level 17, State 17, Line 1
There is insufficient system memory to run this query.
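For what it's worth, here's a hedged example of setting these options explicitly (the database name, path and values are arbitrary; MAXTRANSFERSIZE must be a multiple of 64KB, up to 4MB):

BACKUP DATABASE SampleDB
TO DISK = N'C:\Backups\SampleDB.bak'
WITH INIT,
     BUFFERCOUNT = 32,
     MAXTRANSFERSIZE = 4194304;   -- 4MB

With those values the backup buffer would be 32 * 4MB = 128MB, which is exactly how careless values can trigger the memory error above.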


When testing, you can use trace flags 3605 & 3213 to have SQL Server write backup/restore statistics (including the values used for BUFFERCOUNT and MAXTRANSFERSIZE) to the error log. Also, Ben Snaidero wrote a great article talking about performance tuning backups.
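A hedged sketch of what such a test might look like (again, the database name and path are placeholders; remember to turn the flags back off when you're done):

DBCC TRACEON (3605, -1);    -- send trace output to the error log
DBCC TRACEON (3213, -1);    -- log backup/restore configuration details
BACKUP DATABASE SampleDB TO DISK = N'C:\Backups\SampleDB.bak' WITH INIT;
-- now check the SQL Server error log for the BUFFERCOUNT and MAXTRANSFERSIZE that were used
DBCC TRACEOFF (3605, -1);
DBCC TRACEOFF (3213, -1);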

Friday, February 1, 2013

Speaking at FoxPASS on Feb 6

I'll be presenting at the Monthly FoxPASS meeting next Wednesday Feb 6th. I'll be speaking on SQL Server Data Tools (SSDT) for Visual Studio 2012. I've submitted it to SQL Saturday #206 as well. Here is my entirely too wordy abstract:

Summary: Application development has long had processes, controls and tools for code management, including source control, build management, code refactoring, deployment packaging and more. Management of database code has been an afterthought, perhaps keeping scripts in source control or using third-party schema comparison tools, but the times are changing. In 2007, Microsoft introduced Data Dude for Visual Studio 2005 to fill the gap. It was buggy and lacking in many respects, but it mostly worked and was a welcome step in the right direction. Many versions later we have SQL Server Data Tools for Visual Studio 2012 (and 2010), a much more mature product that finally brings database development up to par with traditional application development and has many features data geeks like ourselves will truly love. This talk will provide a quick overview of SSDT's features and show you how to get started using it to save time, implement better code controls and be a better DBA or database developer.

Odd ADSI OpenQuery Performance Issue When Domain Controller was Retired

I recently ran into a SharePoint/SSRS performance issue with slow report load times. That is, the report itself was slow to load before picking parameters and actually executing it. Once it did load, though, the report ran just fine. Some tracing determined that the hangup was in the population of the parameter lists for the report. The lists are dynamic, displaying specific values based on the executing user's Active Directory group memberships. The user's group memberships are determined through a stored procedure that uses an ADSI OPENQUERY to interrogate AD.
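The core pattern looks roughly like this (a hedged sketch: it assumes a linked server named ADSI created with the ADSDSOObject provider, the domain and DN are placeholders, and the real procedure does considerably more):

SELECT sAMAccountName
FROM OPENQUERY(ADSI,
    'SELECT sAMAccountName
     FROM ''LDAP://DC=mydomain,DC=local''
     WHERE objectCategory = ''group''
       AND member = ''CN=Some User,OU=Users,DC=mydomain,DC=local''');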

Because AD groups can be nested or bundled with a role, the procedure processes groups and subgroups recursively. Our environment is pretty complex, so each time this procedure runs it often performs about 100 ADSI queries. Normally this still completes in a fraction of a second and report load times are acceptable. That was true until about a week ago, when report load times suddenly began taking about 2 minutes. Having traced it to this procedure, a statement-level Profiler trace showed that each ADSI query was taking about 1015ms, whereas in all of our test and development environments they took about 15ms. Each ADSI query was taking almost exactly 1 second longer in our production environment. What the heck?

Grasping at straws at this point, I remembered from our change control meetings that an old domain controller had been retired about the same time the problem began. I theorized that SQL Server had cached the domain controller it used for these queries, even though the query was written to interrogate the domain, not a particular controller. Perhaps when this query was first run, it hit the now-retired controller, and SQL Server was now attempting to contact it first each and every time. Perhaps it was then timing out after 1 second and retrying on another DC, which would succeed.

Since this was affecting our reporting system and not anything transactional, an after hours restart of the SQL service was simple to schedule, so we decided to try it. When in doubt, reboot, right? We hoped that when the service account logged in, it would pick up a new domain controller and the problem would go away. Sure enough, after the service restart, query times returned to the 15ms range and report load times returned to normal.

So, the question is, was our theory correct? Because we decided to test the theory by restarting the service instead of attempting to trace AD query traffic, I guess we'll never know, and that is bothering me now. I wish, in hindsight, that we had worked with our infrastructure team to trace that traffic and know for sure.

Our ADSI linked server is using the default timeout values (0). I've done some Googling to see if I could find a reference to an ADSI timeout of 1 second, but the default seems to be 30 seconds. So I'm at a loss to confirm that our theory was correct. Any additional thoughts on the matter are much appreciated!
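For anyone who wants to check their own configuration, the linked server timeouts are visible in sys.servers (0 means the instance-level default is used):

SELECT name, connect_timeout, query_timeout
FROM sys.servers
WHERE is_linked = 1;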