Very recently Microsoft Azure Data Factory team announced that they have renamed “Data Wrangling Dataflow” into “Power Query” and this made a quite a bit of excitement among BI professionals. While Data Wrangling Dataflow was there for a while as a preview feature in ADF, this renaming it-self made people to wonder whether they can use Azure Data Factory Power Query to replace Power BI Dataflow, specially to overcome certain limitations with Power BI Dataflow such as refresh rate and computation power. However, these two features have different architectures and hence one must needs to understand how these two features work before making any such decisions. In this post I’m trying to cover main differences between these two and hope this will helps you to have a better idea on where you can use each component within a BI solution.

Processing Architecture

When it comes to architecture, both Power Dataflow and ADF Power Query use Power Query Online Mashup Editor to allow users to perform data transformations. Because of this one can misunderstand that both features work similar way. However, ADF converts M query generated by the editor into Spark code and perform data processing at scale. On the other hand, Power BI Dataflow executes M query within a Mashup container process (I believe .NET Process). Which means even though front end looks similar, backend process entirely different to each other.

Processing Architecture Difference

ETL vs ELT

Power BI Dataflow act as a ETL (Extract, Transform and Load) tool within self-service BI space. It allows non- technical users to read data from multiple sources and perform data transformations and finally store in a shared or dedicated Azure Data Lake Gen2 storage account. Since Power BI Dataflow is coupled with Power BI, Power BI allows to analyze data stored inside the data lake easily.

However, since ADF Power Query does not support any on-premises data sources or even most of the cloud-based data sources, it needs to accommodate ELT (Extract, Load and Transform) approach. In other words, you will have to use ADF copy actives to load data to any of the supporting data sources ( check the image below) and then use Power Query to perform transformation on loaded data. Once data is processed you can persist data into a storage account or a cloud data warehouse.

Supported Source System Types for Power Query in ADF

Parallel vs Distributed Processing

As mentioned in the architecture section, BI dataflow uses Mashup Container Process to transform data. In Power BI service, based on the license, one might get shared capacity or dedicated capacity to run these processes parallel within a single server. If you have dedicated capacity, you must be able to scale up the processing power by increasing capacity and the container memory size.

On the other hand, ADF Power Query uses Spark, the well-known distributed processing engine. In this case data is processed using multiple worker nodes and collected inside the master node. When you use Power Query within ADF, it will allow you to select the size of cluster which is used to perform data transformations. If someone needs to scale out, it is a matter of selecting a cluster with higher number of nodes.

Parallel vs Distributed Processing

Matured vs Preview

Power BI dataflow has been there for a while and it has been used by many people in production. Apart from that it allows to perform lot of data transformations and supports a lot of connectors, both on premises and cloud.

However, ADF Power Query or Wrangling Dataflow has been in preview mode for a while now. I don’t believe that anyone would or should use Preview feature for in production. Additionally, as of now, Power Query in ADF only supports limited data transformation functions. Although you see that all the transformations are available in UI, when you save a Power Query, it will let you know most of the transformations are not supported as of now. I hope this will change when the go to production. Apart from that, again as of now, Power Query only supports limited number of data source systems.

Feature Comparison

FeaturePower BI DataflowADF Power Query
Data SourceOn-Premise and Most Cloud based Data SourcesLimited, Few Azure data sources
DestinationADLS Gen2, Shared or DedicatedCloud Data warehouses, Azure Storage Accounts
Data LoadSmall to Medium Data VolumeMedium to Large Data Volume
M Query Transformations SupportExtensiveLimited
Power BI SupportYesNo ( Power BI still can connect to a sink using a connector)
Maximum data Refresh Rate8 in Shared/ 48 in DedicatedNo limit
Linked EntitySupportNot Support
Common Data Model (CDM)SupportNot Support
CostCheap( Comes free with Power BI Licensing)Expensive (Depending on the cluster size and the execution time)

Conclusion

Azure Data Factory already have Mapping Dataflow to process data using Spark cluster and ADF team has been focusing more to enhancing it. However, they seems to have interest about allowing users to use Power Query for data processing. I believe from that Power Query experience, ADF team is trying to bring business users ( citizen data integrators) to Azure Data Factory platform. These business users who are already using Power BI Dataflow must know that ADF Power Query is not the same as what they have been using. ADF Power Query allows business users to process data at scale without writing any code using same UI they have been using within Power BI. In my personal experience, Power BI dataflow can not be replaced by Power Query in ADF even if these two have some similarities.(as of now)

Thank you for reading and stay safe! Cheers!

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s