1. CodersCay »
  2. SSIS »
  3. Top 20 SSIS Interview Questions with detailed answers

Published On: 6/29/2022

CodersCay Logo

Top 20 SSIS Interview Questions with detailed answers

SSIS Interview Questions and Answers

1. What is delay validation in SSIS Package?

SSIS package validates the tasks, connections and container properties value in design mode and run time. The default property value is false which means to validate all the object properties while opening and developing the SSIS package.

If we change the property value as true then the package will not force to validate the package objects at opening/design mode and this will be validated at run time. This can be configurable in package and object/task/container/connections level.

2. What is expression and what is the use in SSIS package?

Expression is a code which can be a combination of variables or parameters or functions and it returns values. The main use of this, dynamically update the properties value at run time.

For instance, we have defined the OLEDB connection in connection managers and we need to change the server name, user name, password based on the environments (DEV, QA, PROD) without changing the package. We can assign the user name, server name property values with parameters in connection manager properties and which can be passed from environment variables.

3. What is the difference between Breakpoint and checkpoint in SSIS Package?

Breakpoint - Breakpoint is used to debug the SSIS package while executing the package in visual studio.

Checkpoint - Checkpoint is used to restart the SSIS package from the point of failure using checkpoint file.

4. Can we use temporary table across the tasks in SSIS package?

Yes. We can use the temporary table across the tasks and we need to change the Retain same connection property value as true in connection manager properties.

Let say example, If you have created a temporary table in one Execute SQL task and you are using the same connection in other tasks. The temporary table will be available on across the tasks and this will be considered as same transaction.

5. What is checkpoint in SSIS Package?

Checkpoint is used to restart the SSIS package from the point of failure using checkpoint file.

For instance, we have five data flow tasks in SSIS package and the package execution is failed while running third data flow task due to some errors. We need to start the package from third data flow task instead of running from start position. We can choose checkpoint option in SSIS package to perform this scenario which is available in control flow properties.
Checkpoints in SSIS package

CheckpointFileName - First of all, we have to define the checkpoint file name and path to capture the package execution logs and completed tasks.
CheckpointUsage - There are three different values in checkpoint usage property and we have to choose it appropriately which determines the checkpoint file use.
  • Nerver - This option will not consider the checkpoint file even if exist.
  • If exists - This option will consider the checkpoint file if exist in the specified path.
  • Always - This option always consider the checkpoint file.
SaveCheckpoints - There are two values in the drop down like True or False which is determining the checkpoint implementation in the package.
Note: We need to change the FailPackageOnFailure property as true to use checkpoint behavior in SSIS package.

6. What is breakpoint and where can be usable in SSIS package?

Breakpoint is typically used to debug the SSIS package in run time and this can be used on control flow and event handling tasks.

As shown in picture above, we have ten events condition in breakpoints and we can choose multiple events at the same time. Also, we have four hit count type and we have to assign the hit count value appropriately to perform the breakpoint condition.

We can choose this option to debug the C#/VB.NET code inside the script task.


7. Difference between for loop and for each loop container?

For Loop Container repeats the process up to the iteration level.
For each loop container is same like as for loop container and the loop implemented by using For each Enumerator. There are seven enumerators are available in for each loop container collection.

Foreach File Enumerator

Foreach File enumerator to enumerate files in a folder. The enumerator can traverse sub folders.
For example, you can read all the files that have the *.log file name extension in the Windows folder and its sub folders.

Foreach Item Enumerator

Foreach Item enumerator to enumerate items that are collections.
Let say example, you can enumerate the names of executable and working directories that an Execute Process task uses.

Foreach ADO Enumerator

Foreach ADO enumerator to enumerate rows in recordset. We can get the rows in an ADO recordset.
The Recordset destination saves data in memory in a recordset that is stored in a package variable of Object data type. You typically use a Foreach Loop container with the Foreach ADO enumerator to process one row of the recordset at a time. The variable specified for the Foreach ADO enumerator must be of Object data type.

Foreach ADO.NET Schema Rowset Enumerator

Foreach ADO.NET Schema Rowset enumerator to enumerate the schema information from the data source.
For example, you can enumerate and get a list of the tables from the database.

Foreach From Variable Enumerator

Foreach From Variable enumerator to enumerate the enumerable object that a specified variable contains. The enumerable object can be an array, an ADO.NET data table, an Integration Services enumerator, and so on.
For example, you can enumerate the values of an array that contains the name of servers.

Foreach Nodelist Enumerator

Foreach Nodelist enumerator to enumerate the result set of an XML Path Language (XPath) expression.
For example, this expression enumerates and gets a list of all the authors in the classical period: /authors/author[@period='classical']

Foreach SMO Enumerator

Foreach SMO enumerator to enumerate SQL Server Management Objects (SMO) objects.
For example, you can enumerate and get a list of the views in a SQL Server database.

8. What are the ways to deploy the SSIS package into SQL server?

There are mainly two ways to deploy the SSIS package on SQL server.

1. Deploy using Visual Studio

Once we have completed the SSIS development in Visual studio, right click on the project name and choose the deploy option which will be deployed on SSIS DB under the Integration Services Catalogs folder.

Deploying SSIS package using Visual Studio

Before going to do this step, we need to create the catalog on Integration Services Catalogs folder which will create a SSISDB database.

Creating Catalog in Integration Services Catalogs

2. Deploy using SQL Server Management Studio

An another way, once we have created the SSIS catalog under the Integration Services Catalogs then we can deploy the SSIS package as shown picture below.

Deploying the SSIS package using SSMS

This option will leads to Integration deployment wizard window, there we can choose Project deployment file and Integration Services catalog options.

If we choose Project deployment file option, we need to browse the ISPAC file from windows drive.

If we choose Integration Services Catalog option, we can choose the Integration server and package path to deploy it.

9. What are the options to execute the SSIS package?

There are many ways to execute the SSIS package in SQL Server.

1. Execute using Visual Studio
Once we have completed the SSIS package development in Visual Studio, we can execute the package by using start button or pressing F5 button in keyboard.

2. Execute using Command Prompt (DTEXEC)
DTEXEC is a command prompt utility used to execute the SSIS package and we can pass the values for connections, properties and variables/parameters.

SYNTAX: DTEXEC /FILE "PackagePath" /CONFIGFILE "ConfigurationFilePath" /SET "\Package.Variables[VariableName].Properties[Value];VariableValue"

3. Execute using Execute Package Utility (DTEXECUI)
The Execute Package Utility is a UI (User Interface) utility which is similar behavior of DTEXEC utility. This can be opened from SSMS or by typing dtexecui at command prompt.

4. Execute using SQL Server Agent Job
We can create a job in SQL Server Agent there we can assign the SSIS package source as shown in picture below. We can execute the SSIS package by manual or schedule basis.
Execute SSIS package using SQL Server Agent Job

5. Execute using Integration Service Catalog
An another way to execute the SSIS package using Integration Service Catalog. Once we have deployed the SSIS package on SSISDB there we have an option to execute the package as shown in picture below.
Execute SSIS package using Integration Service Catalog or SSISDB

10. How to trace the execution logs in SSIS package?

SSIS exclusively gives logging method to trace the events logs in SSIS package. There are five ways to capture the logs while executing the package by event basis.

SQL Server

The SQL Server log provider, which writes log entries to the sysssislog table in a SQL Server database.

Windows Event Log

The Windows Event log provider, which writes entries to the Application log in the Windows Event log on the local computer.

Text Files

The Text File log provider, which writes log entries to ASCII text files in a comma-separated value (CSV) format. The default file name extension for this provider is .log.

SQL Server profiler

The SQL Server Profiler log provider, which writes traces that you can view using SQL Server Profiler. The default file name extension for this provider is .trc.
You cannot use the SQL Server Profiler log provider in a package that is running in 64-bit mode.

XML Files

The XML File log provider, which writes log files to an XML file. The default file name extension for this provider is .xml.


11. What are the scenarios do you prefer for script task?

SSIS provides many transformations to perform the data integration between source and destination. But in some cases we need customization to handle the data integration between source and destination. The script task supports .NET (C# and VB.NET) code to handle these kind of scenario. We can choose either one language in scrip task to develop our custom code for the data integration.

12. What are the ways to extract the SSIS package from deployed server?

We can extract the SSIS package from SQL Server after deployed using SQL Server Management Studio (SSMS).


13. What are the steps to improve the performance in SSIS package?

There are many ways to improve the SSIS package performance.


14. How to check the SSIS package execution status after deployed?

There are two ways to check the SSIS package execution status using SSMS after deployed.

1. Generate Reports
As shown in picture as below, We just right click on the deployed SSIS package and we can generate the reports of package execution.
SSIS package execution reports

2. Using SQL Query in SSIS DB

USE SSISDB
GO

SELECT
s.Execution_id
,s.Start_time
,s.End_time
,ds.Sum_rows_sent
,DATEDIFF(ms,s.start_time,s.end_time) AS 'Total(ms)'
,t.Execution_duration
,s.Folder_name
,s.Project_name
,s.Package_name
,p.description AS Package_desc
,r.Environment_name
,CASE s.status
WHEN 1 THEN 'Created'
WHEN 2 THEN 'Running'
WHEN 3 THEN 'canceled'
WHEN 4 THEN 'Failed'
WHEN 5 THEN 'Pending'
WHEN 6 THEN 'Ended unexpectedly'
WHEN 7 THEN 'Succeeded'
WHEN 8 THEN 'Stoping'
WHEN 9 THEN 'Completed'
END AS Status
,CASE
WHEN o.operation_type IN (2,3) THEN 'SQL Agent job'
WHEN o.operation_type IN (200) THEN 'Execute'
WHEN o.operation_type IN (301) THEN 'Validate'
ELSE 'Other'
END AS OperationMothed
,s.total_physical_memory_kb/1024 AS 'TotalMemory(MB)'
,s.available_physical_memory_kb/1024 AS 'AvailableMemory(MB)'
,s.total_page_file_kb/1024 AS 'TotalPage(MB)'
,s.available_page_file_kb/1024 AS 'AvailablePage(MB)'
FROM catalog.executions s
INNER JOIN catalog.operations o
ON s.execution_id = o.operation_id
INNER JOIN catalog.packages p
ON p.name = s.package_name
INNER JOIN catalog.projects j
ON s.project_name = j.name
LEFT JOIN catalog.environment_references r
ON j.project_id = r.project_id
LEFT JOIN
(
SELECT execution_id,
SUM(execution_duration) AS execution_duration
FROM catalog.executable_statistics
GROUP BY execution_id
) t
ON s.execution_id = t.execution_id
LEFT JOIN
(
SELECT execution_id,SUM(rows_sent) AS sum_rows_sent
FROM catalog.execution_data_statistics
GROUP BY execution_id
) ds
ON ds.execution_id = t.execution_id

ORDER BY s.execution_id


15. What is the difference between Merge, Merge Join and Union All?

Merge:

We can combine two datasets into single dataset and both should be sorted before the merge.
The rows will be merged from both dataset based on the key columns. Merge is similar to UNION ALL transformation, but this should accept only two inputs.

Merge Join:

Merge Join transformation similar to JOINs in T-SQL statement. We can choose different join types such as INNER, LEFT and FULL outer Join.

Union All:

We can combine multiple dataset into single dataset without sorting.

Differences:

Merge and Merge Join transformations will accept two dataset as input whereas Union All accept more than two dataset as input.
Merge and Merge Join transformations requires both dataset should be sorted, but Union All accept without sorting.

Similarity:

Merge, Merge Join and Union All transformations are considered as Semi-Blocking.


16. What is SCD (Slowly Changing Dimension) and How many types are available?

Slowly Changing Dimension (SCD) is mainly used in data warehouse to maintaining the data changes from OLTP to data warehouse database.

The data warehouse database consist of two objects such as Dimension and Fact to build the cube for analyzing the statistics. These two objects are actually coming from OLTP (OnLine Transaction Processing) to OLAP (OnLine Analytical Processing), Dimension holds textual data and Fact will have measures. There are some types to trace the data changes in Dimension from OLTP to OLAP which is called as Slowly Changing Dimension (SCD).

SCD Type 0: Retaining Original

The Dimension attributes never change and holds static data such as Date of Birth, SSN number. This type mostly preferred for Date dimension attributes.

SCD Type 1: Overwrite on existing

This type overwrites the old data with new data and this will not track the historical data in the dimension.

For Instance, if we take customer address, the address will be changed based on the customer relocation.

Old Data
Cust_IDCustomer NameAddress LineCityZip CodeState
1001BaskaranR122, East West StreetChennai600001TamilNadu

New Data
Cust_IDCustomer NameAddress LineCityZip CodeState
1001BaskaranR521, North South StreetChennai600002TamilNadu

In the above example, the customer relocates from one place to another place and the record will be overwritten.

SCD Type 2: Adding new Records

This type will track the historical data by creating new entry on the dimension and identifying the current data using flag.

Cust_IDCustomer NameAddress LineCityZip CodeStateActive
1001BaskaranR122, East West StreetChennai600001TamilNaduN
1001BaskaranR521, North South StreetChennai600002TamilNaduY

In the above example, the new data will be inserted as new record and the active flag can be identified as current or active data.

SCD Type 3: Adding new Fields

This type will be tracked the changes in new attribute and this is preserving limited history.

Cust_IDCustomer NamePermanent AddressTemporary AddressCityZip CodeState
1001BaskaranR122, East West Street521, North South StreetChennai600001TamilNadu

Based on the above scenario, the modified data will be tracked in separate column which would track only the most recent historical changes.

SCD Type 4: Adding new Table (History Table)

This type of SCD will track the changes on separate table and usually called as history table.

Customer Table

Cust_IDCustomer NameAddress LineCityZip CodeState
1001BaskaranR122, East West StreetChennai600001TamilNadu

Customer History Table

Cust_IDCustomer NameAddress LineCityZip CodeStateCreated Date
1001BaskaranR122, East West StreetChennai600001TamilNadu26-Jun-2018
1001BaskaranR521, North South StreetChennai600001TamilNadu21-Jun-2018

Based on the above example, the historical data will be tracked on separate table which holds recent and previous data identified by date field.


17. What are the ways to perform the delta load in SSIS Package?

There are three different ways to perform the delta load in SSIS package.
  • Lookup Transformation
  • Merge Join
  • Slowly Changing Dimension

Lookup Transformation is a non blocking transformation whereas Merge Join is a semi blocking transformation.

18. Difference between Integration Security Info and Persist Security Info in Connection Manager?

Integrated Security

The Integrated security also called as Trusted Connection.

The Integrated security set to true, the current Windows account credentials are used for authentication.
The Integrated security set to false, we need to specify the User ID and password in the connection string.

Persist Security Info

This is a precautionary security measure in connection string and this can be changeable as true or false.

The Persist Security Info set to true, windows will remember the password specified in the connection string.
The Persist Security Info set to false, the security-sensitive information such as password is not returned as part of the connection if the connection is open or has ever been in an open state.


Scenario Based Interview Questions

19. We have five data flow tasks in a package to loading the flat files data to SQL tables and each data flow task is loading each table. The package execution got interrupted to loading in third data flow task and the task had moved some data from flat file to SQL table. The package was configured as checkpoint file and now, if we are restarting the package where will it start from?


Answer: Of Course, it will start from the beginning of third data flow task. Because the checkpoint file keeps the information about what are the task/content id completed and it will not trace the data transfer information. So the execution will compare the list of task/content id between package and checkpoint file and it will from the sequence of task/content id which is not available in the checkpoint file.

20. We have configured checkpoint option in a package and deployed the same on SQL server. We got execution error when we are executing the SSIS package from deployed server. Now, we have to start the package from starting itself without changing the SSIS package. Is this possible and How can we do it?


Answer: Yes. This is possible in simple step that if we delete the checkpoint file from the drive. The package will try to validate by checkpoint file while executing and if the file is not available in the drive, the package will start from beginning step.
Note: This is possible only if the package checkpoint usage property value is IfExists option.

21. You have developed a SSIS package to migrate the data from flat file to SQL server table and you have load the data from source to destination. And now you have to ensure the package whether it was populated all the data from flat file to SQL server table. What are the ways to compare the data between source and destination?


Answer: There are two ways we can compare the data between source and destination.
  • We just import the data from flat file to SQL server temporary table by using Import and Export wizard or BCP or Bulk INSERT method. Then we can compare the data between temporary and actual table by using LEFT JOIN or TABLEDIFF utility.
  • The second option is we can export the data from SQL table after executing the SSIS package. Then we can do the text comparing between source and exported file by using tool.



1 comment: