Keeping database size at a reasonable level is one of a typical DBA's biggest challenges, and with good reason. Very large databases are susceptible to corruption and are a nightmare to manage.
Here's a scenario. You inherited a database on a SQL Server 2017 Enterprise edition instance, acting as the backend for a custom application. You have just been informed that the database has grown considerably and, as a result, free storage space is running low.
The question is: where is the space going, and how much of it can be reclaimed safely? This post outlines the approach I used, including the queries, which components to investigate, and why database space can be reclaimed there without sacrificing performance or functionality.
First, we will analyze allocated and unallocated space in each database file. We will achieve this with the following T-SQL query:
SELECT
    DF.name AS [FileName],
    DF.type_desc AS [FType],
    FG.name AS [FGroup],
    DF.physical_name AS [FPath],
    DF.size / 128.0 AS [CurrentSize_inMb],
    DF.size / 128.0 - CONVERT(INT, FILEPROPERTY(DF.name, 'SpaceUsed')) / 128.0 AS [FreeSpace_inMb]
FROM SYS.DATABASE_FILES DF WITH (NOLOCK)
LEFT JOIN SYS.FILEGROUPS FG WITH (NOLOCK)
    ON DF.data_space_id = FG.data_space_id
This query tells us how much space is already free inside the files. From there, we will focus on cutting down unused data and indexes, as this is the safest option for reducing database size, with little or no negative performance impact.
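Before touching any objects, it also helps to see which tables consume the most space. A minimal sketch using sys.dm_db_partition_stats (page counts are in 8 KB pages, so 128 pages equal 1 MB):

```sql
-- Top 10 user tables by reserved space, largest first.
SELECT TOP 10
    OBJECT_NAME(PS.object_id) AS [TableName],
    SUM(PS.reserved_page_count) / 128.0 AS [ReservedMB],
    SUM(PS.used_page_count) / 128.0 AS [UsedMB],
    -- Count rows only once, from the heap or clustered index.
    SUM(CASE WHEN PS.index_id IN (0, 1) THEN PS.row_count ELSE 0 END) AS [Rows]
FROM sys.dm_db_partition_stats PS
INNER JOIN sys.objects O ON PS.object_id = O.object_id
WHERE O.is_ms_shipped = 0
GROUP BY PS.object_id
ORDER BY SUM(PS.reserved_page_count) DESC
```

The biggest tables are usually where the unused-index and data-type work below pays off the most.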
We will also review the schema to ensure that data type choices are not tying up space unnecessarily. For example, to store cell phone numbers for an application that will only be used in the USA, we will use CHAR(10) for the column, since US cell phone numbers have a fixed length of 10 digits.
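As an illustration, if such a column had originally been declared wider than needed, narrowing it is straightforward. The table and column names below (dbo.Customers, PhoneNumber) are hypothetical:

```sql
-- Hypothetical: the column was declared NVARCHAR(50), which costs
-- up to 2 bytes per character plus overhead; CHAR(10) stores a
-- fixed 10 bytes per row. First verify no value exceeds 10 characters:
SELECT COUNT(*) AS [TooLong]
FROM dbo.Customers
WHERE LEN(PhoneNumber) > 10;

-- If the count is zero, narrow the column:
ALTER TABLE dbo.Customers
ALTER COLUMN PhoneNumber CHAR(10) NOT NULL;
```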
Using the following query, we will list each table's columns and their data types, together with the character length for string types and the numeric precision for numeric types. These values will help us spot columns that are wider than the data they hold.
SELECT
    TABLE_NAME,
    COLUMN_NAME,
    DATA_TYPE,
    CHARACTER_MAXIMUM_LENGTH,
    NUMERIC_PRECISION
FROM INFORMATION_SCHEMA.COLUMNS
We will use Dynamic Management Views and Functions (DMVs and DMFs) to check index usage statistics, identifying indexes that have never been used since their creation as well as indexes that were last used a long time ago, say six months. Note that these statistics are reset when the instance restarts, so make sure the server has been up long enough for the numbers to be meaningful. Indexes occupy space on disk, so removing unused ones will definitely free up storage. The following queries will identify them.
For indexes that have never been used since their creation, we will use this query:
SELECT
    OBJECT_NAME(B.object_id) AS [Table],
    B.name AS [Index],
    C.create_date,
    A.last_user_seek,
    A.last_user_scan
FROM sys.dm_db_index_usage_stats A
INNER JOIN sys.indexes B
    ON A.object_id = B.object_id AND A.index_id = B.index_id
INNER JOIN sys.objects C
    ON A.object_id = C.object_id
WHERE A.database_id = DB_ID()
    AND A.user_seeks = 0
    AND A.user_scans = 0
ORDER BY C.create_date DESC
For indexes that have not been used in the last six months, for example, we will use this query:
SELECT
    OBJECT_NAME(B.object_id) AS [Table],
    B.name AS [Index],
    C.create_date,
    A.last_user_seek,
    A.last_user_scan
FROM sys.dm_db_index_usage_stats A
INNER JOIN sys.indexes B
    ON A.object_id = B.object_id AND A.index_id = B.index_id
INNER JOIN sys.objects C
    ON A.object_id = C.object_id
WHERE A.database_id = DB_ID()
    AND A.last_user_seek <= DATEADD(MONTH, -6, GETDATE())
ORDER BY C.create_date DESC
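Once an index is confirmed to be unused, and after checking with the application owners, dropping it is a single statement. The table and index names below are hypothetical:

```sql
-- Hypothetical index flagged as unused by the queries above.
-- Script out the index definition first so it can be recreated
-- if dropping it turns out to hurt an infrequent query.
DROP INDEX IX_Orders_CustomerRegion ON dbo.Orders;
```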
Another target is overlapping non-clustered indexes: indexes on the same table that share the same leftmost column. For example, suppose we find a table with two index definitions as follows:
Table: test_main Index: IX_test_main_b Included_Columns: b,c,e
Table: test_main Index: IX_test_main_bd Included_Columns: b,d
Then, since both indexes lead with the same column b, it will save space to merge the two into a single index on b that covers the remaining columns as included columns.
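One possible consolidation, sketched under the assumption that the remaining columns (c, d, e) are only needed as covering columns and not for additional seek predicates:

```sql
-- Drop the two overlapping indexes from the example above.
DROP INDEX IX_test_main_b ON dbo.test_main;
DROP INDEX IX_test_main_bd ON dbo.test_main;

-- Recreate a single index keyed on the shared leftmost column,
-- with the other columns as included (leaf-level only) columns.
CREATE NONCLUSTERED INDEX IX_test_main_b_merged
ON dbo.test_main (b)
INCLUDE (c, d, e);
```

Queries that previously seeked on b alone can use the merged index, and the table now carries one copy of b at the non-leaf levels instead of two.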
I wrote the query below to identify tables carrying these overlapping non-clustered indexes:
;WITH CTE1 AS (
    SELECT TableName, IndexName, Included_Columns, Dup_Column, DUP
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Dup_Column, TableName ORDER BY IndexName) AS DUP
        FROM (
            -- Appending ',' before CHARINDEX handles single-column indexes,
            -- which would otherwise make SUBSTRING fail with a negative length.
            SELECT *, SUBSTRING(Included_Columns, 1, CHARINDEX(',', Included_Columns + ',') - 1) AS Dup_Column
            FROM (
                SELECT TableName, IndexName,
                    (SELECT STUFF((SELECT ',' + C.NAME
                        FROM SYS.INDEXES A
                        INNER JOIN SYS.INDEX_COLUMNS B
                            ON A.OBJECT_ID = B.OBJECT_ID AND A.INDEX_ID = B.INDEX_ID
                        INNER JOIN SYS.COLUMNS C
                            ON B.OBJECT_ID = C.OBJECT_ID AND B.COLUMN_ID = C.COLUMN_ID
                        INNER JOIN SYS.TABLES D
                            ON A.OBJECT_ID = D.OBJECT_ID
                        WHERE A.[TYPE] = 2 AND D.IS_MS_SHIPPED = 0 AND A.NAME = G.IndexName
                        FOR XML PATH('')), 1, 1, '')) AS Included_Columns
                FROM (
                    SELECT D.NAME AS TableName, A.NAME AS IndexName
                    FROM SYS.INDEXES A
                    INNER JOIN SYS.TABLES D ON A.OBJECT_ID = D.OBJECT_ID
                    WHERE A.[TYPE] = 2 AND D.IS_MS_SHIPPED = 0
                ) G
            ) H
        ) R
    ) F
),
CTE2 AS (
    SELECT Dup_Column, TableName FROM CTE1 WHERE DUP > 1
)
SELECT TableName, IndexName, Included_Columns, Dup_Column
FROM CTE1
WHERE TableName IN (SELECT TableName FROM CTE2)
Page splits occur when there is not enough room on a page for incoming data. Because splits usually lead to index fragmentation, we will minimize them by using index fill factors. With the fill factor, we can control the amount of free space SQL Server reserves on pages during index creation and index rebuilds.
For example, a fill factor of 80 tells SQL Server to fill pages to 80% and reserve 20% as free space when writing index pages. First, I have to identify fragmented indexes that would benefit from a rebuild with an appropriate fill factor, using the following T-SQL query:
SELECT DISTINCT
    B.TABLE_NAME,
    A.index_id,
    partition_number,
    alloc_unit_type_desc,
    index_level,
    page_count,
    avg_page_space_used_in_percent,
    avg_fragmentation_in_percent
FROM SYS.INDEXES A
INNER JOIN INFORMATION_SCHEMA.COLUMNS B
    ON A.OBJECT_ID = OBJECT_ID(B.TABLE_NAME)
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(B.TABLE_NAME), A.index_id, NULL, 'DETAILED')
Per Microsoft's recommendation, if we see avg_fragmentation_in_percent values greater than 5% but less than 30%, we reorganize the corresponding indexes. For values greater than 30%, we rebuild them.
We will accomplish this with statements like the following:
ALTER INDEX IX_NAME ON [TABLE] REORGANIZE
GO
ALTER INDEX IX_NAME ON [TABLE] REBUILD WITH (FILLFACTOR = 80, ONLINE = ON)
GO
The fill factor value can be adjusted until optimal performance is achieved. There are limits to how much space index reorganization and rebuilding can recover, and the operations themselves have costs; for example, performance can degrade on indexes with over 1,000 partitions.
LOB stands for large object. In this database, these are files stored using the VARBINARY data type. Storing binary data inside the database takes up a huge amount of space. I said I would write a post detailing how we can efficiently move existing LOB records from the database to file system storage using Java, and I have written that post. Here it is: move database BLOBs to file system storage.
We could also implement data compression, which recovers space lost to the inefficiency of fixed-length data types such as INT. For example, an INT column uses 4 bytes even when the current value is small or NULL; row compression stores such values in a variable-length format instead. Note that data compression used to require Enterprise edition, but since SQL Server 2016 SP1 it is available in all editions.
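As a sketch, SQL Server ships a stored procedure to estimate the savings before committing to compression. The table name dbo.Orders below is hypothetical:

```sql
-- Estimate how much space ROW compression would save on a table.
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'Orders',
    @index_id = NULL,          -- all indexes
    @partition_number = NULL,  -- all partitions
    @data_compression = 'ROW';

-- If the estimate looks worthwhile, apply it:
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);
```

ROW compression is a safe first step; PAGE compression compresses further but costs more CPU, so estimate both before choosing.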
More from @sqldibia