Implementing a logging framework for SSIS in project deployment model

With SQL server integration services 2012, Microsoft introduced a new way of deploying SSIS packages to the production or the UAT environment directly. This method is called as “Project deployment model” and it has been heavily used in industry, mainly due to the advantages it brings to development teams, QA, and support team. One of the main advantage of project deployment mode is SSIS package logging is enabled by default and valuable information related to package executions is saved inside “Integration Service Catalog”.

If you cannot see the SSIS catalog inside your SQL server database collection, you must create a SSIS Catalog using SSMS. To create SSIS catalog, connect to your SQL server database instance and right click on “Integration Service Catalogs” node. Unless you have not already created a SSIS catalog, you will see “Create Catalog…” menu option is available for you. In my case, since I have already created a SSIS Catalog, that menu option is grayed out.

Untitled1

Once you click on this menu option, it will give you an interface to provide a name to the catalog and once you provide a name as you want,  SQL server will create the “SSIS Catalog” along with a multitude of tables, views, and stored procedures inside it. In my case, I named it as “SSISDB” and you can use any name you feel suitable. Once you created the database, you can go and check what are the available objects inside it.

If you expand the “Tables” node, this is how it should looks like.

SSIS CatelogC

Using MSDN and other blog posts you will be able to get an idea about what these tables and what kind of information is saved insides these tables and what each view return and what each stored procedure do. In this blog post, I will not go through each database object, rather I will create views using available tables to obtain required datasets. Apart from that, I will create two tables, which will be used to store very valuable information,and these information can be used to perform troubleshooting and performance monitoring in SSIS packages.

As the first step of the logging framework, I want to get information about all the projects which are currently deployed inside SSIS Integration Services. Therefore, I will create a view calls “Logging.ProjectInformation” by using “internal.object_versions” and “internal.projects” tables inside SSIS catalog to obtain that information.

 Logging.ProjectInformation

CREATE VIEW [Logging].[ProjectInformation]

AS
       SELECT P.project_id AS ProjectID,
              P.[name] AS ProjectName,
              OV.object_version_lsn AS ProjectVersionId,
              OV.created_by AS DeployedBy,
              OV.created_time AS DeployedTime,
              P.deployed_by_name AS ProjectCreatedBy,
              P.created_time AS ProjectCreatedDate,
              P.last_deployed_time AS LastDeployedTime,
              P.[object_version_lsn] AS LatestVersionId,
              IIF(ov.object_version_lsn = P.[object_version_lsn], 1, 0) AS IsLatestVersion
       FROM internal.object_versions OV
            INNER JOIN internal.projects P ON OV.[object_id] = P.[project_id]
                                              AND OV.object_type = 20;

GO

This view will return

  • the project id
  • the project name,
  • version number of the project
  • the deployed person’s name
  • deployed time
  • the name of the person who created the project
  • the project created time
  • last deployed name
  • what is the latest version available for this project
  • whether this version is the latest version or not.

Remember every time you deploy your SSIS project to SQL server, it will create a new version of your project and that is why you will get multiple rows for one project. Once I got the project information, I need to get all the packages inside this project and for that, I will create another view called “Logging.PackageInformation”.

Logging.PackageInformation

CREATE VIEW [Logging].[PackageInformation]

AS
       SELECT PK.[package_id] AS PackageId,
              PK.[package_guid] AS PackageGuid,
              PK.[name] AS PackageName,
              PK.project_id AS ProjectId,
              p.[name] AS ProjectName,
              pk.project_version_lsn AS ProjectVersionId,
              (CAST(pk.version_major AS VARCHAR(10))+’.’+CAST(pk.version_minor AS VARCHAR(10))+’.’+CAST(pk.version_build AS VARCHAR(10))) AS PackageVersion,
              pk.[description] AS PackageDescription,
              pk.version_comments AS VersionComment,
              IIF(PK.project_version_lsn = P.object_version_lsn, 1, 0) AS IsLatestVersion
       FROM internal.packages PK
            INNER JOIN internal.projects P ON PK.project_id = P.project_id;

GO

This view will return

  • the package id (Server side Id)
  • the package guid
  • the name of the package
  • the id of the project which this package is belong to
  • the name of the project which this package is belong to
  • the project version
  • the package version id (combination of version parts as a string)
  • the package description (the textual description which you can put using SQL server data tools)
  • the package version comment (For deployment round, you can add a version comment to each package.This property can be seen as a property of the package)
  • whether this package version is the latest version

The next thing I want is to get information related to package execution, irrespective of whether package is succeeded or not.

Logging.PackageExecutionDetail

CREATE VIEW [Logging].[PackageExecutionDetail]

AS
       SELECT es.execution_id AS PackageExecutionId,
              p.package_id AS PackageId,
              P.package_guid AS PackageGuid,
              P.[name] AS PackageName,
              p.project_id AS ProjectId,
              es.start_time AS ExecutionStartTime,
              es.end_time AS ExecutionEndTime,
              es.execution_duration AS ExecutionDuration,
              e.executed_as_name AS ExecutedAsUser,
              CASE ES.execution_result
                  WHEN 0
                  THEN ‘Succeeded’
                  WHEN 1
                  THEN ‘Failed’
                  WHEN 2
                  THEN ‘Completed’
                  WHEN 3
                  THEN ‘Canceled’
                  ELSE ‘Unknown Status’
              END AS [Status]
       FROM internal.executable_statistics es
            INNER JOIN internal.executions e ON es.execution_id = e.execution_id
            INNER JOIN internal.executables eb ON eb.executable_id = es.executable_id
            INNER JOIN internal.packages p ON p.package_guid = eb.executable_guid
                                              AND eb.project_id = p.project_id
                                              AND eb.project_version_lsn = p.project_version_lsn
       UNION ALL
       SELECT ex.execution_id AS PackageExecutionId,
              P.package_id AS PackageId,
              P.package_guid AS PackageGuid,
              ex.package_name AS PackageName,
              p.project_id AS ProjectId,
              O.start_time AS ExecutionStartTime,
              O.end_time AS ExecutionEndTime,
              0 AS ExecutionDuration,
              EX.executed_as_name AS ExecutedAsUser,
              CASE O.[status]
                  WHEN 1
                  THEN ‘Created’
                  WHEN 2
                  THEN ‘Running’
                  WHEN 3
                  THEN ‘Canceled’
                  WHEN 4
                  THEN ‘Failed’
                  WHEN 5
                  THEN ‘Pending’
                  WHEN 6
                  THEN ‘Ended Unexpectedly’
                  WHEN 7
                  THEN ‘Succeeded’
                  WHEN 8
                  THEN ‘Stopping’
                  WHEN 9
                  THEN ‘Completed’
                  ELSE ‘Unknown Status’
              END AS [Status]
       FROM [internal].[executions] EX
            INNER JOIN internal.operations O ON ex.execution_id = O.operation_id
            INNER JOIN internal.packages P ON EX.package_name = p.[name]
                                              AND EX.project_lsn = P.project_version_lsn
       WHERE EX.execution_id NOT IN
       (
           SELECT execution_id
           FROM internal.executable_statistics
       );

GO

This view will return following fields.

  • package execution id
  • the id of the package (server side id)
  • the package guid
  • the name of the package
  • the id of the project in which this package belongs to
  • package execution start time
  • package execution end time
  • total duration of the execution
  • the user who executed the package
  • the final result of the execution. Whether it is succeeded, failed, etc..

For a detail analysis of my SSIS packages, I need to know, in the event of execution failure, what is the error message thrown by SSIS and which part of the package caused the package to fail. Therefore, I will create another view, which will provide error message and all the related information.

Logging.PackageExecutionErrorLog

CREATE VIEW [Logging].[PackageExecutionErrorLog]

AS
       SELECT DISTINCT
              O.operation_id AS ExecutionId,
              O.[object_id] AS ProjectId,
              O.[object_name] AS ProjectName,
              P.[package_id] AS PackageId,
              P.[name] AS PackageName,
              E.executable_id AS ExecutableId,
              EM.message_source_id AS ExecutableGuid,
              EM.message_source_name AS ExecutableName,
              CASE OM.message_type
                  WHEN 110
                  THEN ‘Warning’
                  WHEN 120
                  THEN ‘Error’
                  ELSE ‘Unknown’
              END AS MessageType,
              OM.[message] AS [Message],
              message_time
       FROM internal.operation_messages OM
            INNER JOIN internal.operations O ON OM.operation_id = O.operation_id
            INNER JOIN internal.executions ES ON ES.execution_id = OM.operation_id
            LEFT OUTER JOIN internal.event_messages EM ON OM.operation_message_id = EM.event_message_id
            LEFT OUTER JOIN internal.packages P ON p.[name] = es.package_name
                                                   AND P.project_version_lsn = ES.project_lsn
            LEFT OUTER JOIN internal.executables E ON EM.message_source_id = E.executable_guid
                                                      AND p.project_id = E.project_id
                                                      AND e.project_version_lsn = P.project_version_lsn
       WHERE OM.message_type IN(110, 120);

GO

This view will only return data if there is an error or a warning is associated with the package execution. The information you can get from this view is

  • the id of the execution which threw the error message
  • the id of the project which this package belong to
  • the name of the project which this package belong to
  • the id of the package
  • the name of the package
  • the unique id of the package part (Executable) which failed or threw this error message
  • the name of the package part (Executable) which failed or threw this error message
  • the error message type, whether this is an error message or warning message
  • the message
  • the message time

Remember in this view, if the package failed even before it starts to execute, due to validation errors, the executable id and executable name would be null since there is no specific executable caused the package to fail, rather package it self got failed.

All thought these 4 views provide some kind of an insights about SSIS package executions, I need to measure performance and view execution behavioral changes in packages for a detail analysis. Unfortunately, SSIS catalog does not have any information related to affected record counts (in Data Flow Tasks) which is very much required to perform package execution behavior analysis. Therefore, I created a separate table called “Logging.AffectedRowCountLog” inside SSIS Catalog, in which I will save affected row count along with other related information. Remember this table will not be automatically populated by SSIS, rather you will need to insert records within you package at the end of the each Data Flow task. Check the table structure bellow.

EffectedC

This table would contain

  • the package execution id
  • the package guid
  • the record inserted time
  • the guid of the package executable ( In my case the id of the Data Flow task)
  • The operation type ( whether this was a insert operation,update operation or a delete operation)
  • The number of affected rows

Using this information, you can get an idea about whether your package is executing as you expected in production/UAT environment. For more detail analysis, I created a view called “Logging.PackageExecutionPerformance” using this view I can get better insights about package executions.

Logging.PackageExecutionPerformance

CREATE VIEW [Logging].[PackageExecutionPerformance]

AS
       SELECT P.project_id AS ProjectId,
              PP.[name] AS ProjectName,
              P.package_id AS PackageId,
              P.[name] AS PackageName,
              P.package_guid AS PackageGuid,
       (
           SELECT AVG(ES.execution_duration)
           FROM internal.executable_statistics ES
                INNER JOIN internal.executables E ON ES.executable_id = E.executable_id
                INNER JOIN internal.operations O ON O.operation_id = es.execution_id
           WHERE o.created_time BETWEEN DATEADD(DD, -30, GETDATE()) AND GETDATE()
                 AND E.executable_guid = P.package_guid
       ) AS AverageExecutionDuration30Days,
       (
           SELECT AVG(ES.execution_duration)
           FROM internal.executable_statistics ES
                INNER JOIN internal.executables E ON ES.executable_id = E.executable_id
                INNER JOIN internal.operations O ON O.operation_id = es.execution_id
           WHERE o.created_time BETWEEN DATEADD(DD, -60, GETDATE()) AND GETDATE()
                 AND E.executable_guid = P.package_guid
       ) AS AverageExecutionDuration60Days,
       (
           SELECT AVG(ES.execution_duration)
           FROM internal.executable_statistics ES
                INNER JOIN internal.executables E ON ES.executable_id = E.executable_id
                INNER JOIN internal.operations O ON O.operation_id = es.execution_id
           WHERE o.created_time BETWEEN DATEADD(DD, -90, GETDATE()) AND GETDATE()
                 AND E.executable_guid = P.package_guid
       ) AS AverageExecutionDuration90Days,
       (
           SELECT AVG(A.AffectedRowCount)
           FROM Logging.AffectedRowCountLog A
           WHERE A.RecordedTime BETWEEN DATEADD(DD, -30, GETDATE()) AND GETDATE()
                 AND A.PackageGuid = P.[package_guid]
       ) AS AvarageAffectedRowCount30Days,
       (
           SELECT AVG(A.AffectedRowCount)
           FROM Logging.AffectedRowCountLog A
           WHERE A.RecordedTime BETWEEN DATEADD(DD, -60, GETDATE()) AND GETDATE()
                 AND A.PackageGuid = P.[package_guid]
       ) AS AvarageAffectedRowCount60Days,
       (
           SELECT AVG(A.AffectedRowCount)
           FROM Logging.AffectedRowCountLog A
           WHERE A.RecordedTime BETWEEN DATEADD(DD, -90, GETDATE()) AND GETDATE()
                 AND A.PackageGuid = P.[package_guid]
       ) AS AvarageAffectedRowCount90Days
       FROM internal.packages P
            INNER JOIN internal.projects PP ON p.project_id = PP.project_id
                                               AND P.project_version_lsn = PP.object_version_lsn;

GO

This view would return

  • the project id
  • the project name
  • the package id
  • the package name
  • the package guid
  • average execution time for last 30 days executions
  • average execution time for last 60 days executions
  • average execution time for last 90 days executions
  • average affected row count for last 30 days executions
  • average affected row count for last 60 days executions
  • average affected row count for last 90 days executions

By looking at this information you can get a some kind of understanding about executions and its behavioral patterns. For example, there can be scenarios where your package successfully runs but due to source system errors, no records has come to the data warehouse. If you were only monitoring success or the failure of the package, that kind of exceptional behaviors would be difficult to identify.

Finally, I need to know which dataset caused my packages to fail so that I would not require to regenerate production and UAT issues. In my case, I only want to monitor Data Flow tasks and hence I created a table called “Logging.SampleDataLog ”inside SSIS Catalog which save sample set of record every time a Data Flow task behavior unexpectedly. Check the table structure for more detail.

sampleC

In this table, I saved following information,

  • Package execution id
  • Package execution guid (Client side id)
  • Guid of the package
  • Data Flow task guid
  • Data Flow task name
  • Id of the failed task (Inside the Data Flow task)
  • Sample record set (JSON format records)

Inserting records to “Logging.SampleDataLog ” and “Logging.AffectedRowCountLog” is part of the SSIS package development and although it will be burden to the development team, the data inside this tables would provide very important insight about package executions.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s