In data warehousing, capturing changes in source data system is one of the fundamental requirements, and something we always come across. In most scenarios, this is accomplished using a flag column or a date-time stamp column, which indicates the last changed time.However, due to some designing problems , some data sources might not contain straight forward ways ( a flag or an indicator) to identify which records and which columns have changed during certain period. Since you can not change existing source system to any new columns , you might end up in scenarios where you have to perform full loading rather than performing incremental loading, which would take more time and more resources. However, if your source system is based on MS SQL Server, and version is not later than 2008, you can use the features called CDC (Change Data Capture) or CT ( Change Tracking) to identify changed data or changed records.
In this post, I will only discuss about CDC, and hopefully, I will discuss about CT in another post in near future. According to Microsoft, SQL Server read transaction log and keep track of Insert, Update and Delete operations performed against a table if CDC is enabled for that table by a user. This is how CDC works as per the MSDN.
How to enable CDC? To enable CDC for a database , you have to execute “sys.sp_cdc_enable_db” stored procedure against that database. For example , if I want to enable CDC for “CDCDemo” database , I would execute following script.
–Enable CDC for database
Next step is to enable CDC for a particular table. In my case, I am going to enable CDC for dbo.SalesOrderDetail table and I am going to name my CDC instance as “SalesOrderDetailCaptureInstance” . Here is the script I ran.
@source_schema = N’dbo’,
@source_name = N’SalesOrderDetail’,
@supports_net_changes = 1,
@role_name = NULL
Once this statement is executed, SQL server will automatically create tables to store all the changes taking place to data inside dbo.SalesOrderDetail table.