0

How to Reduce Database Size and Reclaim Space without Sacrificing Performance

Emeka OkekeDec 20, 2019 | edited Dec 27, 2019 - by @sqldibia

Keeping database size to reasonable levels is one of a typical DBA's biggest challenge and with good reasons. Huge size databases are susceptible to corruption and are a nightmare to manage.

Here's a scenario. You inherited a database on an Enterprise edition instance acting as a backend database for a custom application. You had just been informed that the database size has grown so much recently and free storage space is decreasing as a result. This is SQL Server 2017.

The question is?

How will you reduce the database size and reclaim space?

This post will outline the approach used including queries. We will also look at where to investigate and how and why some space can be reclaimed. It will also outline the components to investigate and why database space can be reclaimed there without sacrificing on performance or functionality.

First we will analyze allocated and unallocated space in the storage. This we will achieve with the below TSQL query:



SELECT DF.name AS [FileName], DF.type_desc AS [FType], FG.name AS [FGroup],
	DF.physical_name AS [FPath], DF.size / 128.0 AS [CurrentSize_inMb],
	DF.size / 128.0 - CONVERT(INT,FILEPROPERTY(DF.name,'SpaceUsed')) / 128.0 AS [FreeSpace_inMb]
	FROM SYS.DATABASE_FILES DF WITH (NOLOCK) LEFT JOIN 
     SYS.FILEGROUPS FG WITH (NOLOCK) ON DF.data_space_id = FG.data_space_id

With this query we will know how much space can be saved. We will focus on cutting down unused data and indexes as this is the safest options for database size reduction with less or zero negative performance footprints.

Using appropriate data types

We will analyze TSQL queries to ensure that datatype usage is not tying up spaces unnecessarily. For example to store cell phone numbers for an application that will only be used in the USA we will use CHAR(10) for the column since US cell phone numbers have fixed length of 10 characters.

Using the following query, we will look at column definitions per table and their corresponding data types and character length if string type or numeric precision if numeric type. These values will help us understand whether the database is tying up spaces unnecessarily.



SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH,
NUMERIC_PRECISION FROM INFORMATION_SCHEMA.COLUMNS

Removing unused indexes

We will use DMVs and DMFs (Dynamic Management Views and Functions) to check index usage statistics to identify indexes that have never been used since their creation and indexes that have last been used six months ago for example.

The following queries will identify these. Indexes no doubt occupy spaces on the hard disk, therefore removing unused ones will definitely free up space in the system storage:

For indexes that have never been used since their last creation, we will use this query:



SELECT OBJECT_NAME(B.OBJECT_ID) AS [Table],B.NAME AS [INDEX],C.create_date,A.last_user_seek,A.last_user_scan FROM sys.dm_db_index_usage_stats A
INNER JOIN SYS.INDEXES B ON A.index_id=B.index_id
INNER JOIN SYS.OBJECTS C ON A.object_id=C.object_id
WHERE A.user_seeks=0 AND A.user_scans=0
ORDER BY C.create_date DESC

For indexes that have never been used since last six months for example, we will use this query:



SELECT OBJECT_NAME(B.OBJECT_ID) AS [Table],B.NAME AS [INDEX],C.create_date,A.last_user_seek,A.last_user_scan FROM sys.dm_db_index_usage_stats A
INNER JOIN SYS.INDEXES B ON A.index_id=B.index_id
INNER JOIN SYS.OBJECTS C ON A.object_id=C.object_id
WHERE A.last_user_seek <= DATEADD(MONTH,-6,GETDATE())
ORDER BY C.create_date DESC

Removing unnecessary indexes

These are non-clustered indexes with the same leftmost column of the same table. For example if we find a table with two index definition as follows:

First index:

Table: test_main
Index: IX_test_main_b
Included_Columns: b,c,e

Second index:

Table: test_main
Index: IX_test_main_bd
Included_Columns: b,d

Then it will save space to merge the two indexes into one with included columns b,c,d,e.

I wrote the query below to identify and analyze these columns with additional unnecessary non-clustered indexes:



;WITH CTE1 AS(
SELECT TableName,IndexName,Included_Columns,Dup_Column,DUP FROM(
SELECT *,ROW_NUMBER() OVER(PARTITION BY Dup_Column,TableName ORDER BY IndexName) AS DUP FROM(
SELECT *,SUBSTRING(Included_Columns,1,CHARINDEX(',',Included_Columns)-1) AS Dup_Column FROM(
SELECT TableName,IndexName,
(SELECT STUFF((SELECT 
     ',' + C.NAME FROM
SYS.INDEXES A
INNER JOIN SYS.INDEX_COLUMNS B ON  A.OBJECT_ID = B.OBJECT_ID AND A.INDEX_ID = B.INDEX_ID
INNER JOIN SYS.COLUMNS C ON B.OBJECT_ID = C.OBJECT_ID AND B.COLUMN_ID = C.COLUMN_ID
INNER JOIN SYS.TABLES D ON A.OBJECT_ID = D.OBJECT_ID 
WHERE A.[TYPE]=2 AND D.IS_MS_SHIPPED=0 AND A.NAME=G.IndexName
FOR XML PATH('')),1,1,'')) AS Included_Columns
FROM(
SELECT 
     D.NAME AS TableName,
     A.NAME AS IndexName 
FROM 
SYS.INDEXES A
INNER JOIN SYS.TABLES D ON A.OBJECT_ID = D.OBJECT_ID 
WHERE A.[TYPE]=2 AND D.IS_MS_SHIPPED=0
) G) H) R) F), CTE2 AS(SELECT Dup_Column,TableName FROM CTE1 WHERE DUP>1)

SELECT TableName,IndexName,Included_Columns,Dup_Column FROM CTE1
WHERE TableName IN (SELECT TableName FROM CTE2)

Minimizing index fragmentation

Page splits occur when there is not enough space on a page to write incoming data. Because this usually leads to index fragmentation, we will minimize it by using index fill factors.With fill factor, we can control the amount of free space SQL Server reserves on pages during index creation and index rebuild.

For example we could set fill factor of 80, this tells SQL Server to use 80% of total space and reserve 20% free space during page writing. First I will have to identify indexes that could use fill factors to free up space using the following TSQL query:



	SELECT DISTINCT B.TABLE_NAME,A.index_id, partition_number, alloc_unit_type_desc
    ,index_level, page_count, avg_page_space_used_in_percent,avg_fragmentation_in_percent FROM SYS.INDEXES A
	INNER JOIN INFORMATION_SCHEMA.COLUMNS B
	ON A.OBJECT_ID=OBJECT_ID(B.TABLE_NAME)
	CROSS APPLY sys.dm_db_index_physical_stats
    (
        DB_ID(),OBJECT_ID(B.TABLE_NAME),A.index_id,NULL,'DETAILED'
    )

Per Microsoft recommendation, if we see avg_fragmentation_in_percent values greater than 5% but less than 30%, then we would alter the corresponding indexes. For values greater than 30%, we would rebuild the associated indexes.

We will accomplish these with the following query for example:



ALTER INDEX IX_NAME ON [TABLE] REORGANIZE GO 
ALTER INDEX IX_NAME ON [TABLE] REBUILD WITH(FILLFACTOR = 80,ONLINE = ON) GO 

The fill factor value can be adjusted until an optimal performance is achieved. There are some limits at which we can recover space using index reorganization and rebuilding. For example performance could be degraded on indexes with over 1000 partitions.

Storing binary LOB data as files outside the database

LOB stands for large binary object. These are usually files that are stored in the database using TSQL datatype VARBINARY. Storing binary data in database takes up huge amount of space as the data is stored in huge binary text. I will write a post detailing how we can efficiently move existing LOB records in the database to the file system storage using Java. Ok so I have written that post. Here it is move database BLOBs to file system storage.

Data compression in database

We could also implement data compression that occurs as a result of inefficiency of fixed-data types like the INT type. For example the INT type uses 4 byte even when the current value is NULL. However data compression is only possible with SQL Server enterprise edition.

Emeka Okeke@sqldibia+ Follow
0
Emeka Okeke+ Follow
locationPennsburg, PennsylvaniajoinedDec 11, 2019

More from @sqldibia