Why you shouldn’t use the Auto Resolve Integration Runtime in Azure Data Factory or Synapse

Integration Runtime (IR) is one of the key concepts in Azure Data Factory. Every activity that runs in ADF is associated with an IR, which acts as the compute engine for that activity run. There are three types of integration runtimes: Self-Hosted IR, Azure IR and Azure-SSIS IR. The purpose of this post is not to explain these integration runtimes. If you want to learn about the different IR types, refer to the Microsoft documentation at the link below.

More about the ADF IR concept: https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime

For cloud data sources and cloud-based activities, the default integration runtime type is the Azure Integration Runtime. When an ADF instance is created, it comes with an Azure IR called AutoResolveIntegrationRuntime, which cannot be deleted or disabled. What I have noticed is that this default Auto Resolve IR is used more often than not in ADF projects, sometimes without anyone knowing its impact or how it works.

With the default IR, Azure Data Factory decides, for each activity run, which data center to use and how much compute power is required there. It then performs the data movement and processing operations using the compute available in that IR location. The document below explains how ADF decides which data center to use for the IR.

Azure IR location with the Auto Resolve IR: https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime#azure-ir-location

Personally, I don’t recommend using the Auto Resolve IR in an enterprise ADF project, and this post explains my reasons.

Moving data to different regions

As I mentioned before, when the Auto Resolve IR is used, ADF decides which data center to use based on both the sink location and the ADF instance location. For example, if the data source is in the East US data center and the sink is in Australia East, the Azure IR will be in Australia East. In that case, data is obviously moved from the US region to the Australian region. However, it is not guaranteed that ADF will always use the sink location as the Azure IR location when copying data. If, for some reason, ADF cannot detect the sink location, it uses the ADF instance location for the Azure IR. In that case, data might move to a different region before it reaches the sink location. This unpredictable behaviour can create data security concerns. In particular, if you have a security requirement on the geographical location of the data, the Auto Resolve IR should not be used in your ADF implementation.
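The location rule described above can be sketched as a small function. This is a simplified model, not ADF's actual logic: the names `resolve_auto_ir_region` and `crosses_region` are hypothetical, a sink region of `None` stands for "ADF could not detect the sink location", and the model ignores the documented fallback to the closest region in the same geography when the sink region itself has no IR.

```python
from typing import Optional


def resolve_auto_ir_region(sink_region: Optional[str], factory_region: str) -> str:
    """Simplified model of where the Auto Resolve IR runs a copy activity.

    sink_region is None when ADF cannot detect the sink data store location.
    Illustration only; ADF's real resolution logic is not public code.
    """
    if sink_region is not None:
        # Sink location detected: the IR runs in the sink region.
        return sink_region
    # Sink location unknown: fall back to the data factory's own region.
    return factory_region


def crosses_region(source_region: str, ir_region: str, sink_region: str) -> bool:
    """True if the data passes through a region that is neither source nor sink."""
    return ir_region not in (source_region, sink_region)


# Sink detected: East US -> Australia East runs in Australia East.
print(resolve_auto_ir_region("Australia East", "West Europe"))
# Sink not detected: the IR lands in the factory's region instead.
ir_region = resolve_auto_ir_region(None, "West Europe")
print(ir_region, crosses_region("East US", ir_region, "Australia East"))
```

In the second case the data travels East US → West Europe → Australia East, which is exactly the kind of third-region hop that breaks a data residency requirement.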

This can also lead to performance issues, because moving data from one region to another depends on the network traffic between the two data centers.

Queue Time

When the Auto Resolve IR is used for an activity run such as a copy activity, ADF assigns the activity compute power available in the selected data center. In most cases, this is done by picking resources from an existing pool of resources. However, when you execute multiple activities, in parallel or sequentially, there is no guarantee that all the activities will use the same compute resource. Each activity therefore has to request compute resources from ADF, and the time it waits for them is shown as Queue time in the copy activity run details. If you have 100 activities, each one has to queue until ADF assigns resources, and this adds up to a significant amount of time.

Unable to reuse existing cluster

Time to Live (TTL) is an important concept in the Azure IR. Using the TTL configuration, you can ask ADF to keep the integration runtime cluster up and running for a defined duration. If another data flow or copy activity needs an IR within that window, the same IR resources are reused rather than requesting new resources from Azure, which reduces queue time significantly. This setting cannot be used with the Auto Resolve IR; in other words, the IR nodes used by one activity run cannot be reused by subsequent activity runs. Quick re-use and TTL are available only on a dedicated Azure Integration Runtime.
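To see why per-activity queue time adds up and why TTL helps, here is a toy model. It is an illustration only: the `startup` and `ttl` values are arbitrary inputs, not measured ADF figures, the names are hypothetical, and it assumes activities run one after another on a single cluster. Setting `ttl=0` approximates the Auto Resolve IR, where compute is not kept warm between activities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActivityRun:
    start: float     # minutes after pipeline start when the activity asks for compute
    duration: float  # minutes of processing once compute is available


def simulate_queue_time(runs: List[ActivityRun], startup: float, ttl: float) -> float:
    """Return total minutes spent waiting for compute across sequential runs.

    Assumed model: acquiring a new cluster costs `startup` minutes; after a run
    finishes, the cluster stays warm for `ttl` minutes and a run that begins
    strictly inside that window reuses it with no wait.
    """
    total_wait = 0.0
    clock = 0.0         # time at which the previous run finished
    warm_until = -1.0   # no warm cluster before the first run
    for run in runs:
        begin = max(run.start, clock)  # activities run one after another
        wait = 0.0 if begin < warm_until else startup
        total_wait += wait
        clock = begin + wait + run.duration
        warm_until = clock + ttl
    return total_wait


runs = [ActivityRun(start=0, duration=2) for _ in range(3)]
print(simulate_queue_time(runs, startup=3, ttl=0))   # Auto Resolve style
print(simulate_queue_time(runs, startup=3, ttl=10))  # dedicated IR with TTL
```

With three back-to-back two-minute activities and an assumed three-minute startup, the model gives 9 minutes of waiting with `ttl=0` and 3 minutes with `ttl=10`, because only the first activity pays the startup cost.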

Performance

In ADF data flows, performance depends mainly on the resources available within the IR. When you create a dedicated Azure IR, you can choose the compute type: General Purpose or Memory Optimized. If you are processing large data columns, you can pick the Memory Optimized compute type. You can also select the size of the IR cluster, from 4 cores to 256 cores, based on the complexity of the data processing and the data volume.

With the Auto Resolve IR, however, you do not have the flexibility to define the compute type and the size of the cluster: it is always a 4-core, General Purpose cluster. While this caters for small datasets, it will not be sufficient for large and complex data processing.
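A dedicated Azure IR with an explicit compute type, core count and TTL can be defined in JSON, for example in the `integrationRuntime` folder of a Git-integrated factory. The Python sketch below only builds that JSON body; the helper name, IR name, region and sizes are illustrative assumptions, and the field names (`computeType`, `coreCount`, `timeToLive`) reflect my understanding of the ADF integration runtime JSON, so verify them and the allowed `coreCount` values against the current documentation before deploying.

```python
import json


def data_flow_ir_definition(name: str, location: str, compute_type: str,
                            core_count: int, ttl_minutes: int) -> dict:
    """Build the JSON body of a dedicated (Managed) Azure IR for data flows.

    Hypothetical helper; field names assumed from the ADF IR JSON format.
    """
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": location,  # pin the region instead of auto-resolving it
                    "dataFlowProperties": {
                        "computeType": compute_type,  # e.g. "General" or "MemoryOptimized"
                        "coreCount": core_count,      # cluster size
                        "timeToLive": ttl_minutes,    # minutes to keep the cluster warm
                    },
                },
            },
        },
    }


ir = data_flow_ir_definition("DataFlowIR-AUE", "Australia East", "MemoryOptimized", 16, 10)
print(json.dumps(ir, indent=2))
```

Pinning `location` here also addresses the region concern from earlier, since the IR no longer follows whatever sink location ADF manages to detect.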

Managed Private Endpoint

Managed Private Endpoint (MPE) is an important security feature in ADF. Rather than exposing data sources to the public network, MPE lets you create secure connections between integration runtime resources and data sources using Private Links. Managed VNet can be enabled for the Auto Resolve IR only when the ADF instance is created; once the ADF instance exists, you cannot enable or disable this setting for that IR. Therefore, the best option is to create a dedicated Azure IR with Managed VNet enabled, based on your requirements.
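For comparison, a dedicated Azure IR that runs inside the factory's managed virtual network references that network from its own definition. The sketch assumes the factory already has a managed virtual network named `default` (the name ADF typically uses); the helper name and IR name are hypothetical, and the JSON shape is my understanding of the format, so check it against the documentation.

```python
import json


def managed_vnet_ir_definition(name: str, location: str) -> dict:
    """Azure IR definition that runs inside the factory's managed VNet.

    Hypothetical helper; assumes a managed VNet called "default" already exists.
    """
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {"location": location},
            },
            # Reference to the factory-level managed virtual network.
            "managedVirtualNetwork": {
                "type": "ManagedVirtualNetworkReference",
                "referenceName": "default",
            },
        },
    }


vnet_ir = managed_vnet_ir_definition("VNetIR-AUE", "Australia East")
print(json.dumps(vnet_ir, indent=2))
```

Because this is a separate IR, you can add or retire it at any time, unlike the Managed VNet setting of the Auto Resolve IR, which is fixed when the factory is created.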

Conclusion

Using the Auto Resolve IR might be the easy option in an ADF implementation. However, in my opinion, it should not be used in an enterprise solution, for all the reasons mentioned in this post. Since the Auto Resolve IR cannot be disabled or deleted, it is a bit hard to enforce not using it when multiple developers work in the same ADF instance.
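One way to make the rule enforceable is a check in the build pipeline of a Git-integrated factory. A linked service with no `connectVia` block falls back to the default Auto Resolve IR, so the sketch below flags those as well as explicit references to `AutoResolveIntegrationRuntime`. The helpers are hypothetical, they assume the standard `linkedService/*.json` repository layout, and they only cover linked services (an Execute Data Flow activity can also choose its IR in the pipeline JSON).

```python
import json
from pathlib import Path
from typing import Iterable, List

AUTO_IR = "AutoResolveIntegrationRuntime"


def uses_auto_resolve(linked_service: dict) -> bool:
    """True if a linked service would run on the Auto Resolve IR.

    A missing connectVia means the default (Auto Resolve) IR is used.
    """
    connect_via = linked_service.get("properties", {}).get("connectVia")
    if connect_via is None:
        return True
    return connect_via.get("referenceName") == AUTO_IR


def find_offenders(linked_services: Iterable[dict]) -> List[str]:
    """Names of linked services that should be pointed at a dedicated IR."""
    return [ls.get("name", "<unnamed>") for ls in linked_services if uses_auto_resolve(ls)]


def load_linked_services(repo_root: str) -> List[dict]:
    """Read linkedService/*.json from an ADF Git repository (assumed layout)."""
    folder = Path(repo_root) / "linkedService"
    return [json.loads(p.read_text(encoding="utf-8")) for p in sorted(folder.glob("*.json"))]


sample = [
    {"name": "BlobDefault", "properties": {"type": "AzureBlobStorage"}},
    {"name": "SqlAuto", "properties": {"connectVia": {
        "referenceName": AUTO_IR, "type": "IntegrationRuntimeReference"}}},
    {"name": "SqlDedicated", "properties": {"connectVia": {
        "referenceName": "DataFlowIR-AUE", "type": "IntegrationRuntimeReference"}}},
]
print(find_offenders(sample))
```

In a CI step, `find_offenders(load_linked_services("path/to/repo"))` returns the names to fix, and the step can fail when the list is not empty. Some linked service types may legitimately omit `connectVia`, so an allow-list may be needed in practice.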

Thank you for reading and stay safe!
