0

Why You Should Move Database BLOBs to File System Storage

Emeka OkekeDec 27, 2019 | edited Dec 27, 2019 - by @sqldibia

I wrote a post last week on reclaiming database space without any negative impact on performance and functionality. I also promised to share my solution to moving files stored as large binary objects out of the database to file system storage. So here I am to fulfil my promise.

One of the ways of cleaning up the database and reducing its size that I listed was moving BLOB or large binary objects out of database storage and to file system storage.

I also stated that BLOBs take up huge space in the database. If you want to see just what I mean, try copying the binary textual representation of your file from the database table where it is stored and paste it in a notepad. I did, see below image:

Blob text

As you can imagine, it is huge. This is just one file. Now imagine having over 1000 or a million files stored like that in the database.

You should expect your database maintenance cost to increase with each BLOB entry in your database.

Unless of course you have unlimited funds to afford such cost. Even if you can afford it, every DBA and database architect that is well experienced knows the importance of keeping database size at the barest minimum.

They all know that huge size databases are more likely to get corrupted during backups, restoration, and database media file migration.

Storing BLOBs or large binary objects in the database not only increases the size of it but also significantly degrade its performance. Take the results of running select query against a table with BLOB files compared with that of a table without BLOB files below:

Query that returns BLOB files:

blob_table_query.PNG

Query that returns no BLOB files:

no_blob_table_query.PNG

As you can see the table with BLOB files' select query ran much slower than the select query of the table with no BLOB files. Even though the table with no BLOB files has more records, selecting from it still ran faster than selecting from the table with the BLOB files. This is just for a few number of records. I'll leave you to imagine what the performance differences would look like for large number of records.

If you cannot afford increasing database maintenance cost or performance degradation, then you should consider moving your files stored as BLOB out of the database and to the file system storage.

However, moving such huge binary objects out of the database poses serious technical challenge, especially if you have huge number of such files already stored in your database. Don't worry I have already solved this problem using a simple Java class below that I developed for this purpose last year.

The Solution

Assuming we have a table as defined below:


CREATE TABLE [AccountsInfo].[Docs](DocID INT IDENTITY(1,1) PRIMARY KEY, Doc VARBINARY(MAX),DocName VARCHAR(100))

Then we will alter the table as follows to add the filepath column that will be used to indicate the location of the BLOB files in file system storage.


ALTER TABLE [AccountsInfo].[Docs] ADD FilePath VARCHAR(100)

Here's the Java class I developed for this purpose:



package dentApp.Util.Functions;

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;


public class BLOBrefactor {

	
	public static void main(String[] args) {
		
		
		String userName = "DentUSA";
		String password = "09!@#0oop2AS";
		String url = "jdbc:sqlserver://localhost;instance=USER-PC/SQLEXPRESS;databaseName=DentAp";
		int t = 10;
		
		try {
			Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
			
			 

	         Connection connection = DriverManager.getConnection(url, userName, password);
	         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
	         DocumentBuilder builder = factory.newDocumentBuilder();	    		
       	 		        	 

	        	 
	       	        for(int i = 0; i<t; i++){		        	 

                        String gstring = "SELECT '<row>' + (SELECT DOCID FROM (SELECT DocID,NTILE(" + String.valueOf(t) + ") OVER (ORDER BY DOCID) AS NUM FROM [AccountsInfo].[Docs] WITH (NOLOCK) WHERE FilePath IS NOT NULL) A WHERE A.NUM=" + String.valueOf(i+1) + " FOR XML PATH("")) + '</row>' AS G";
		        	 
		        	 Statement statement1 = connection.createStatement();
		        	 ResultSet res = statement1.executeQuery(gstring);	

		        	 InputSource ins = null;
		        	 while(res.next())
                         ins = new InputSource(new StringReader(res.getString(1)));
		        	 
			         Document doc = builder.parse(ins);        	 
			         NodeList nodes = doc.getFirstChild().getChildNodes();		   
			         
			         new Thread(() -> {	          
			 			try {
			 				

			 				
			 				Statement statement2 = connection.createStatement();
			              
			 		         for (int d = 0; d < nodes.getLength(); d++) {      	 	        	 
			 		        	 ResultSet rs = statement2.executeQuery("SELECT DocID,DocName,Doc FROM [AccountsInfo].[Docs] WITH (NOLOCK) WHERE DOCID=" + 
                                                  nodes.item(d).getFirstChild().getNodeValue());
			 		        	 while (rs.next()) {

			 		        		String filename = "C:/DB/Documents/" + rs.getString(2);
					        	    String fileid = rs.getString(1);

						            FileOutputStream fop = new FileOutputStream(filename);
						            InputStream in = rs.getBinaryStream(3);
						            int b = 0;
						            while ((b = in.read()) != -1)
						            {
						            	fop.write(b); 
						                fop.flush();
						                
						            }
		 		        	  
						            String update = "UPDATE [AccountsInfo].[Docs] SET FILEPATH='" + filename + "' WHERE DOCID=" + fileid;
			 			            Statement statement3 = connection.createStatement();	
			 			            statement3.executeUpdate(update);	
			 			            fop.close();
			 			            
			 		        	 }

			 			         }		         
			 		         
			 		         
			 			} catch (SQLException e) {
			 				
			 				e.printStackTrace();
			 			} catch (FileNotFoundException e) {
							// TODO Auto-generated catch block
							e.printStackTrace();
						} catch (IOException e) {
							// TODO Auto-generated catch block
							e.printStackTrace();
						} 

			 	        }).start();
		        	 
		        	 
		         }		         
		         
		         
	        
		} catch (ClassNotFoundException | SQLException | ParserConfigurationException | SAXException | IOException e) {
			
			e.printStackTrace();

		}

	}

}


Java Code Explanation

We will use the Java class to read the BLOB files and store them in a designated folder and then update the database with the folder path. To avoid putting too much stress on the system memory, I read the BLOB files as binary stream rs.getBinaryStream(3).

I could have read the files into memory using rs.getString(3), but that would have quickly chewed up huge memory space, so the first option is the ideal option. I also used ten worker threads to take advantage of parallel processing to reduce processing time.

The parallel processing with multiple threads was made possible by the native XML support with classes available in Java packages javax.xml and org.w3c, and TSQL's FOR XML PATH. I also used TSQL's row function NTILE(10) to divide the number of records to process into ten groups. Then each of those ten groups is assigned to each worker thread in the Java class.

However, depending on the size of BLOB records you want to move to the file system storage and computer memory available to you, you could increase the number of the worker threads by changing the value of the local variable int t = 10; which is currently set to ten. Once the Java class is done, we can then drop the BLOB column Doc as follows:



ALTER TABLE [AccountsInfo].[Docs] DROP COLUMN Doc

You should replace the folder path C:/DB/Documents/ on the local variable String filename = "C:/DB/Documents/" + rs.getString(2); with your own folder path where you want to store the BLOB files in your file system storage.

Do not forget to change the database connection credentials that are held in variables String userName = "DentUSA";, String password = "09!@#0oop2AS";, and String url = "jdbc:sqlserver://localhost;instance=USER-PC/SQLEXPRESS;databaseName=DentAp"; with your own credentials.

To use the Java class above, you will need a Java-capable IDE. I'd recommend Eclipse IDE. You can download it from here Download Eclipse.

There might be concerns with this method of database file storage. Such concern is files going out of sync with database path storage. We can work around this with SQL Server FileStream data type for ensuring synchronization and relational mappings.

So if you are concerned with files going out of sync with filepaths stored in database table, then you should consider using using FileStream.

Instead of storing the BLOBs in a database table, FileStream stores the BLOB in file system storage using the VARBINARY(MAX) data type.

For more on FileStream visit Microsoft Documentation of FILESTREAM (SQL Server)

Emeka Okeke@sqldibia+ Follow
0
Emeka Okeke+ Follow
locationPennsburg, PennsylvaniajoinedDec 11, 2019

More from @sqldibia