0

How to Integrate SSIS with Java

Emeka OkekeDec 31, 2019 | edited Jan 01, 2020 - by @sqldibia

Integrating SSIS with Java or even using Java as a primary ETL tool is not something that is too common or even talked about in the SQL Server business intelligence world.

Rather what you will often find is C# with SSIS. After all C# is Microsoft's flagship object-oriented programming language, and since SSIS is a Microsoft ETL tool, C# is largly used within the tool itself. You can either use it in Data Flow task's script component or in control flow's script task.

However, if you are very proficient with Java like me, and you are also an SSIS ETL developer like me, the thought of using Java in SSIS must have crossed your mind at least once before.

Well it did cross my mind severally. Although I am quite confident with using C# in SSIS. As a matter of fact I have been using C# in SSIS for things like creating excel documents and populating them with transformed data among other things.

However, as a full stack developer who have been developing applications using Java, I am way more proficient with it than I am with C#. This means that I could build even more powerful ETL solutions using SSIS and Java.

So recently I got a task to add extended properties that will briefly describe all columns of all tables in one of our databases. I was also required to group all the columns' extended property definition in one file per table, then add all the table files in one folder.

Surely you don't expect me to do all that manually, especially when I am already a fullstack developer, we are talking about over 200 tables here. So again I thought of doing this with Java and SSIS.

Integrating SSIS with Java

First we define SSIS variables that will be used to accomplish this task as follows:

ssis_variables.PNG

The query variable will be used to get the extended properties definition query for each of the columns of the current table in the foreach loop container.

We will use the str variable to hold the extended property definition query execution result that is returned by the GetEachTblQuery executeSQL task.

The tables variable with the Object datatype is used to hold the full result set that was returned by the GetTables executeSQL task.

Lastly the table variable is used to hold the value of each table name that is returned by the LoopTables foreach loop container.

The Java Integration

In SSIS, we use the execute process control flow task to execute any executable programs with extensions like .exe and .jar.

Here, we will be using the .jar extension which is the smallest unit of Java executable. JAR stands for Java Archive.

The execute process task configuration window has the Executable property under the Process tab. See image below:

ssis_exec_process_task.PNG

The Java JRE/JDK installation comes with a JAR executor program javaw.exe that can execute your JAR files. So we will provide the path

C:\Program Files\Java\jre1.8.0_121\lib\javaw.exe

to the javaw.exe executor program as the value of the Executable property.

Then we will provide the command line arguments that will be used to execute the program below

"-jar \"C:\\createsqlfiles\\CreateSqlFiles.jar\" \"" + @[User::table]+"\" \""+ @[User::str] + "\""

Because the argument includes variables @[User::table] and @[User::str] with values that are returned on each iteration of the SSIS foreach loop, we will use expressions to build it.

So under Expressions, click on the elipse button [...] to open the expressions configuration window. Under properties select Arguments, and under Expressions, click on the elipse button [...] to write the command line arguments above. See image below:

expression_arguments_ssis.PNG

SSIS command line arguments explanation

The -jar argument is used to tell the program that we are executing a JAR file, the next argument is the path to the JAR file which in this case is CreateSqlFiles.jar. The last two arguments will be passed to the main method of the Java code contained in the JAR file.

Java code explanation

The CreateSqlFiles.jar executable file contains just this below Java class:



package files;

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;


public class CreateFiles {

	public static void main(String[] args) {
	
		
		String file = args[0];
		String s = args[1];
		String folder = "C:/Extended_Properties/";
				
		try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(folder + file + ".sql"))) {
		   

		        	 try {
						writer.write(s);
					} catch (IOException e) {
						// TODO Auto-generated catch block
						e.printStackTrace();
					}
		        	
		  
		} catch (IOException e) {
            BufferedWriter writer = null;
			try {
				writer = Files.newBufferedWriter(Paths.get("C:\\filelog.txt"));
				writer.write(e.getMessage());
				writer.close();
			} catch (IOException e1) {
				// TODO Auto-generated catch block
				e1.printStackTrace();
			}

			

            
		}
		

	}

}


Don't you love automation? I do. This few lines of Java code saved me several hours of work manually creating SQL files.

Every Java program must have an execution entry point or starting place. This is implemented in Java using the main method. The main method takes an array of String as input parameter. This array is used to pass command line input parameters and initialize the Java program.

However, here we will use it to pass each table name and the extended property definition query for that table's columns to the Java class. So the arguments:

+ @[User::table]+"\" \""+ @[User::str] + "\""

is retrieved as follows in the Java code:

String file = args[0]; --> this @[User::table]
String s = args[1]; --> this is @[User::str]

As you can see, you can pass as many arguments as your computer memory can allow. With the introduction of Java Stream interface which allow both sequential and parallel stream along with its filter and map reduce features, we could do even a whole lot more with SSIS and Java in providing cutting edge ETL and ELT solutions.

We could also return an output from the Java code to SSIS for more processing or further data transformation. We could convert our Java object to JSON string in Java using Google's Gson library or some other method, and then print it to the output stream using System.out.print(JSONSTRING) or System.out.println((JSONSTRING).

We could then consume it in SSIS using any method including TSQL's OPENJSON(@JSON).

Then We can retrieve this value using the StandardOutputVariable property of the SSIS execute process task.

SSIS supports the three default streams of standard error, standard input, and standard output. Instead of passing values to the Java code using the main method's parameter, we could also do so by using the property StandardInputVariable.

Then for exception/error handling, we could return exceptions that are caught in the Java code to SSIS using the StandardErrorVariable. Notice how these three default streams have the word Variable in them? That's because you will need to assign them to variables by selecting from the drop down list of variables you created for this purpose. See image below:

ssis_stdstream.PNG

Also see a sample Java code I wrote below to further demonstrate how we can use the standard input, output, and error streams in our Java code:



package SSIS;


public class testStdstreams {

	public static void main(String[] args) {
	

		    StringBuilder str = new StringBuilder();
		    try {
		        int c = 0;
		        while ((c = System.in.read()) != -1) {
		        	str.append((char) c);
		        }
		    
		    } catch(Exception e){
		    	
		    	System.err.append(e.getMessage());
		    }
		    
		    System.out.println("input is: " + str.toString());
		    
	}

}

See image below to see the values of SSIS standard output and input variables when I executed the SSIS package with Java code above:

ssis_standardstream.PNG

As you can see,there is a whole lot of awesome things that can be accomplished with Java and SSIS as per ETL or ELT.

Emeka Okeke@sqldibia+ Follow
0
Emeka Okeke+ Follow
locationPennsburg, PennsylvaniajoinedDec 11, 2019

More from @sqldibia