Reducing No of Meta Data Queries send from Power BI to Synapse

Last week I faced an interesting yet very challenging problem. One of clients I was working on was facing a serious problem with Power BI and I have been asked to provide some assistance to troubleshoot the problem. The client is using Azure Synapse as Data Warehousing solution and Power BI as reporting and analytical solution. However, due to the data volume and complexity of the processing in Synapse, DWU300 was more than enough for them for a quite a long period. However, all the sudden business users and Data Engineering team started to complain that they can’t connect to Synapse frequently and Synapse rejects incoming connections to the data warehouse as it already exceeds 512 concurrent open connections (sessions) limit. Knowing that they don’t have that much of users and complex reports, client wanted to know what is happening with Synapse.

Luckily for me, Client already knew which report generates so many sessions within Synapse. This dataset only had 42 queries, 40 coming from Synapse and 2 coming from SharePoint. First thing I did was investigate what happens in Synapse when that dataset get refreshed. I used below SQL query to identify no of requests and sessions generated by Power BI.

select r.* ,s.*
 from [sys].[dm_pdw_exec_requests] r
inner join sys.dm_pdw_exec_sessions s on s.session_id=r.session_id
where s.[status] in ('Active', 'Idle') 
and s.[app_name] like '%Mashup Engine%'

I was surprised with the outcome of the query. There were 625 sessions created and 1250 SQL requests were coming from Power BI in just one refresh. Client was already in DWU 1000 to just to get 1024 sessions; In any other source system, this would not have an issue. But in Synapse, this is a huge problem.

While there were so many problems like No Query folding, loading reference tables to the model and complex M query transformations, What surprised me most was number of meta data related queries generated from Power BI. Out of all 1250 queries, 625 queries was “Use [DataWarehouse]” queries. From rest of 625 queries, only 75 queries were actually data queries; rest was meta data related queries generated by Power BI. Even the 75 queries were due to duplicate queries generated by Power BI for reference queries which I knew I could fix by optimising the data model. However, when 70% of sessions were created by metadata queries, I was pulling my hair thinking what to do for those sessions/ queries. In other words, I knew that Power BI should only create 75 sessions for 75 queries it generate. But it creates 625 sessions, just to accommodate these meta data queries 😦

Let’s replicate this problem by taking one table as an example. Following are the queries generated by Power BI when loading one table. In this case it creates 14 queries and 7 sessions. Now if this table is referenced inside another query, these set of queries are re-created again from Power BI. Which means 14 sessions. Just to load one simple table. I was really frustrated with Power BI thinking why can’t they reuse the meta data coming from one table, why do they have to do it for all the tables from same data source. Then again, I’m not here to tell Power BI team to do what they should be doing.

USE [Contoso DW]
select @@version + ‘ EngineEditon: ‘ + cast(SERVERPROPERTY(‘EngineEdition’) as varchar(4))
USE [Contoso DW]
select t.[TABLE_CATALOG], t.[TABLE_SCHEMA], t.[TABLE_NAME], t.[TABLE_TYPE], tv.create_date [CREATED_DATE], tv.modify_date [MODIFIED_DATE], cast(e.value as nvarchar(max)) [DESCRIPTION] from [INFORMATION_SCHEMA].[TABLES] t join sys.schemas s on s.name = t.[TABLE_SCHEMA] join sys.objects tv on tv.name = t.[TABLE_NAME] and tv.schema_id = s.schema_id and tv.parent_object_id = 0 left outer join sys.extended_properties e on tv.object_id = e.major_id and e.minor_id = 0 and e.class = 1 and e.name = ‘MS_Description’
USE [Contoso DW]
select r.[ROUTINE_SCHEMA], r.[ROUTINE_NAME], r.[ROUTINE_TYPE], p.create_date [CREATED_DATE], p.modify_date [MODIFIED_DATE], cast(e.value as nvarchar(max)) [DESCRIPTION] from [INFORMATION_SCHEMA].[ROUTINES] r join sys.schemas s on s.name = r.[ROUTINE_SCHEMA] join sys.objects p on p.name = r.[ROUTINE_NAME] and p.schema_id = s.schema_id and p.parent_object_id = 0 left outer join sys.extended_properties e on p.object_id = e.major_id and e.minor_id = 0 and e.class = 1 and e.name = ‘MS_Description’
USE [Contoso DW]
select s.name [TABLE_SCHEMA], o.name [TABLE_NAME], c.name [COLUMN_NAME], cast(c.column_id as bigint) [ORDINAL_POSITION], c.is_nullable [IS_NULLABLE], case when (t.is_user_defined = 0 and t.name is not null) then t.name when (c.system_type_id = 240 or t.name is null) then ‘udt’ else t_system.name end [DATA_TYPE], case when (c.system_type_id in (59, 62)) then 2 when (c.system_type_id in (48, 52, 56, 60, 104, 106, 108, 122, 127)) then 10 else null end [NUMERIC_PRECISION_RADIX], c.precision [NUMERIC_PRECISION], case when (c.system_type_id in (59, 62)) then null else c.scale end [NUMERIC_SCALE], case when (c.system_type_id in (40, 41, 42, 43, 58, 61)) then c.scale else null end [DATETIME_PRECISION], case when (c.system_type_id in (231, 239)) then floor(c.max_length / 2) when (c.system_type_id in (165, 167, 173, 175)) then c.max_length else null end [CHARACTER_MAXIMUM_LENGTH], cast(e.value as nvarchar(max)) [DESCRIPTION], d.definition [COLUMN_DEFAULT], cc.definition [COLUMN_EXPRESSION], case when c.is_identity = 1 or c.is_computed = 1 or t.system_type_id = 189 then 0 else 1 end [IS_WRITABLE] from sys.objects o join sys.schemas s on s.schema_id = o.schema_id join sys.columns c on o.object_id = c.object_id left join sys.types t on c.user_type_id = t.user_type_id left join sys.types t_system on t.system_type_id = t_system.user_type_id left join sys.default_constraints d on d.object_id = c.default_object_id left join sys.computed_columns cc on c.object_id = cc.object_id and c.column_id = cc.column_id left join sys.extended_properties e on o.object_id = e.major_id and c.column_id = e.minor_id and e.class = 1 and e.name = ‘MS_Description’ where 1=1;
USE [Contoso DW]
select convert(nvarchar, fk.object_id) [FK_NAME], cast(f.constraint_column_id as bigint) [ORDINAL], s1.name [TABLE_SCHEMA_1], o1.name [TABLE_NAME_1], c1.name [PK_COLUMN_NAME_1], s2.name [TABLE_SCHEMA_2], o2.name [TABLE_NAME_2], c2.name [PK_COLUMN_NAME_2], f.constraint_object_id, f.constraint_column_id from sys.foreign_key_columns f join sys.foreign_keys fk on f.constraint_object_id = fk.object_id join sys.objects o1 on o1.object_id = f.parent_object_id join sys.schemas s1 on s1.schema_id = o1.schema_id join sys.objects o2 on o2.object_id = f.referenced_object_id join sys.schemas s2 on s2.schema_id = o2.schema_id join sys.columns c1 on c1.object_id = o1.object_id and c1.column_id = f.parent_column_id join sys.columns c2 on c2.object_id = o2.object_id and c2.column_id = f.referenced_column_id where 1=1 union select convert(nvarchar, fk.object_id) [FK_NAME], cast(f.constraint_column_id as bigint) [ORDINAL], s1.name [TABLE_SCHEMA_1], o1.name [TABLE_NAME_1], c1.name [PK_COLUMN_NAME_1], s2.name [TABLE_SCHEMA_2], o2.name [TABLE_NAME_2], c2.name [PK_COLUMN_NAME_2], f.constraint_object_id, f.constraint_column_id from sys.foreign_key_columns f join sys.foreign_keys fk on f.constraint_object_id = fk.object_id join sys.objects o1 on o1.object_id = f.parent_object_id join sys.schemas s1 on s1.schema_id = o1.schema_id join sys.objects o2 on o2.object_id = f.referenced_object_id join sys.schemas s2 on s2.schema_id = o2.schema_id join sys.columns c1 on c1.object_id = o1.object_id and c1.column_id = f.parent_column_id join sys.columns c2 on c2.object_id = o2.object_id and c2.column_id = f.referenced_column_id where 1=1 order by f.constraint_object_id, f.constraint_column_id;
USE [Contoso DW]
select s.name [TABLE_SCHEMA], o.name [TABLE_NAME], i.name [INDEX_NAME], s.name [TABLE_SCHEMA], o.name [TABLE_NAME], cc.name [COLUMN_NAME], cast(ic.key_ordinal as bigint) [ORDINAL_POSITION], i.is_primary_key [PRIMARY_KEY] from sys.objects o join sys.schemas s on s.schema_id = o.schema_id join sys.indexes as i on i.object_id = o.object_id join sys.index_columns as ic on ic.object_id = i.object_id and ic.index_id = i.index_id join sys.columns as cc on ic.column_id = cc.column_id and ic.object_id = cc.object_id where (i.is_primary_key = 1 or i.is_unique_constraint = 1 or i.is_unique = 1) and o.type in (‘U’, ‘V’) and ic.key_ordinal <> 0 and 1=1 order by i.name, s.name, o.name;
USE [Contoso DW]
select [$Table].[AccountKey] as [AccountKey], [$Table].[ParentAccountKey] as [ParentAccountKey], [$Table].[AccountLabel] as [AccountLabel], [$Table].[AccountName] as [AccountName], [$Table].[AccountDescription] as [AccountDescription], [$Table].[AccountType] as [AccountType], [$Table].[Operator] as [Operator], [$Table].[CustomMembers] as [CustomMembers], [$Table].[ValueType] as [ValueType], [$Table].[CustomMemberOptions] as [CustomMemberOptions], [$Table].[ETLLoadID] as [ETLLoadID], [$Table].[LoadDate] as [LoadDate], [$Table].[UpdateDate] as [UpdateDate] from [dbo].[DimAccount] as [$Table]

As you can see, only the last query returns any data and rest of the queries are to get data meta data about the Synapse data warehouse. Let’s focus on how I have loaded the data within M query. Just like any of us do, first step is to define the source and next step is to navigate to required table, in this case “DimAccount” table.

If we check the M Query for this, this is how it looks like.

I felt that all these meta data queries are generated by these 2 steps as it tells Power BI to go and navigate Synapse to find the DimAccount table. Why should I do that! I can tell Power BI just bring the data from SQL server using a SELECT statement without doing any searching.

Therefore, I changed my M query like below and check what are the queries created by Power BI when refresh data.

Luckily, it only creates 3 sessions instead of 7 sessions and yet it executes same query data query 2 times. One for actual data and one to get meta data required for Power BI. And most importantly, it breaks “Query Folding”!

USE [Contoso DW]
select @@version + ‘ EngineEditon: ‘ + cast(SERVERPROPERTY(‘EngineEdition’) as varchar(4))
USE [Contoso DW]
SELECT * FROM [dbo].[DimAccount]
USE [Contoso DW]
SELECT * FROM [dbo].[DimAccount]

Therefore, I looked for the help from one and only Chris Webb and saw that I can still keep Query folding on if I use Value.NativeQuery function.

This time, I changed the query like below and checked what queries are sent from Power BI to Synapse.

Surprisingly, 2nd query which actually returned data in previous method does not returns any data as it is wrapped with SELECT top 0 * statement and Query Folding is back!

USE [Contoso DW]
select @@version + ‘ EngineEditon: ‘ + cast(SERVERPROPERTY(‘EngineEdition’) as varchar(4))
USE [Contoso DW]
select top 0 * from ( SELECT * FROM [dbo].[DimAccount] ) as [_]
USE [Contoso DW]
SELECT * FROM [dbo].[DimAccount]

As you can see, with this simple method, I reduced 7 sessions into 3 sessions. I wish I could do more and reduce it into 1 session or at least 2. But I think Power BI needs meta data about the table and Synapse DW version. Hence reducing it to 2 or 1 will not be possible; may be. But if the same table is referenced multiple times and multiple tables like that create 100s of sessions in Synapse, I think this is good enough for me.

Update: Note that when I use the query as reference query, Power BI introduce another meta data query to get top 1000 rows required for it’s internal optimisations. In this case I did 2 steps, 1st step to remove a column and 2nd one to group data. At the end of the 2 steps, I can see that Query Folding is still there.

This time it creates 4 sessions, 8 queries as shown below. If I do exactly the same thing as I did in very first way using navigations, it generates 16 queries, 8 sessions. just double the number of sessions! There could be other steps to optimize this further. But for the movement, this gives me some breathing space. 🙂

USE [Contoso DW]
select @@version + ‘ EngineEditon: ‘ + cast(SERVERPROPERTY(‘EngineEdition’) as varchar(4))
USE [Contoso DW]
select top 0 * from ( SELECT * FROM [dbo].[DimAccount] ) as [_]
USE [Contoso DW]
select top 1000 [$Table].[AccountKey] as [AccountKey], [$Table].[ParentAccountKey] as [ParentAccountKey], [$Table].[AccountLabel] as [AccountLabel], [$Table].[AccountName] as [AccountName], [$Table].[AccountDescription] as [AccountDescription], [$Table].[AccountType] as [AccountType], [$Table].[Operator] as [Operator], [$Table].[CustomMembers] as [CustomMembers], [$Table].[ValueType] as [ValueType], [$Table].[CustomMemberOptions] as [CustomMemberOptions], [$Table].[ETLLoadID] as [ETLLoadID], [$Table].[LoadDate] as [LoadDate], [$Table].[UpdateDate] as [UpdateDate] from ( SELECT * FROM [dbo].[DimAccount] ) as [$Table]
USE [Contoso DW]
select [rows].[AccountType] as [AccountType], count(1) as [Count] from ( select [AccountType] from ( SELECT * FROM [dbo].[DimAccount] ) as [$Table] ) as [rows] group by [AccountType]

Thank you very much for reading and stay safe!


Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s