How traditional data warehousing concepts have changed (and should change) with Power BI

Ten years ago, when I started my career as a BI engineer, the first task I was given was to study the Kimball approach to dimensional modeling. At that time, data warehousing was a specialized domain within the IT industry, and knowing Kimball's data warehousing concepts was a must for anyone working in data and analytics. However, with Power BI becoming increasingly popular among both IT and business users in the data and analytics field, I feel that certain DW concepts have changed, or should change, under its influence.

What makes the difference?

Well, back then, almost all reports were created by the IT department, and only IT knew how the data was calculated and populated, and which tables and fields contained what information. With Power BI, however, business users are far more likely to create their own reports and perform their own analysis; the IT department only needs to provide the data to them. I believe this change has had a profound impact on how a data warehouse is designed and implemented in an organization.

Figure 01: How the usage of a DW has changed with Power BI

Apart from that, certain features available in tools like Power BI have invalidated some data warehousing concepts, while other features call for new concepts to be introduced to data warehouse design and implementation. Let's talk about how Power BI has changed DW design patterns.

Star or Snowflake

First and foremost, when designing a DW it is necessary to decide whether to use a star schema or a snowflake schema for dimensional modeling. In traditional data warehousing, this decision is based purely on data volume and which method is easier to implement. However, it is commonly accepted that Power BI works best with a star schema compared to a snowflake, due to how data is stored in the tabular engine and how relationships are maintained. Apart from that, a star schema is simpler than a snowflake, which makes business users' lives much easier when they perform data modeling and analysis in Power BI.
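As a minimal sketch of what moving from snowflake to star means in practice, the fragment below denormalizes a hypothetical Category lookup into the Product dimension, so a model would need only one relationship between the fact table and Product (table and column names are made up for illustration):

```python
# Snowflake shape: the Product dimension references a separate
# Category table (hypothetical data, for illustration only).
categories = {1: "Bikes", 2: "Accessories"}

products_snowflake = [
    {"product_id": 10, "name": "Road Bike", "category_id": 1},
    {"product_id": 11, "name": "Helmet", "category_id": 2},
]

# Star shape: fold the category name into the product dimension,
# removing the extra lookup table and its relationship.
products_star = [
    {**p, "category": categories[p["category_id"]]}
    for p in products_snowflake
]

print(products_star[0]["category"])  # Bikes
```

In a real DW this denormalization would happen in the ETL layer or a view, but the principle is the same: one wide dimension per subject area instead of a chain of lookup tables.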

Surrogate Keys

In data warehousing, there are several ways to implement surrogate keys.

  1. Auto increment integer values
  2. Auto generated unique identifier values
  3. Hash value of natural keys

In methods 2 and 3, surrogate keys are populated with string values, and in Power BI these columns act as high-cardinality fields. As we need to use memory in the most optimal way within Power BI, it is ideal to avoid high-cardinality columns as much as possible. Therefore, I believe that using auto-increment integer (or small integer) values as surrogate keys is the best method if you are using Power BI with the data warehouse.

The fact table surrogate key is another concept that has changed with Power BI. A high-cardinality column with no business value only consumes the precious, limited memory available within Power BI.
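To see why string keys are costlier, here is a small sketch comparing method 1 against method 3 for a single hypothetical row (the natural key value is made up; exact byte counts vary by Python version, but the direction of the comparison holds):

```python
import hashlib
import sys

# A hypothetical natural key for one dimension row.
natural_key = "CUST-000042"

# Method 1: auto-increment integer surrogate key -- small, fixed size.
int_key = 42

# Method 3: SHA-256 hash of the natural key -- a 64-character hex
# string that is unique per row, i.e. the highest possible cardinality.
hash_key = hashlib.sha256(natural_key.encode()).hexdigest()

print(len(hash_key))  # 64
print(sys.getsizeof(hash_key) > sys.getsizeof(int_key))  # True
```

Multiply that per-row overhead by millions of fact rows, and a column store that dictionary-encodes distinct values (as the tabular engine does) gets no compression benefit at all from an all-unique string key.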

Accumulating/Periodic Snapshot Fact Tables

In traditional data warehousing, there are a few fact table types.

  1. Transaction Fact Tables
  2. Periodic Snapshot Fact Tables
  3. Accumulating Snapshot Fact Tables

The idea is to extract the most granular data from source systems, store it in transaction fact tables, and then create periodic and accumulating snapshot facts from that data by grouping it over different periods and stages. Populating these two fact table types requires additional ETL jobs. However, in Power BI data is aggregated by default, and time series calculations are straightforward to implement. Along with features such as Power BI aggregations, this means implementing and maintaining periodic and accumulating snapshot fact tables has become a resource overkill.
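The point that a periodic snapshot is just a grouping of the transaction grain can be sketched as follows (the transaction rows are hypothetical):

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction-grain fact rows: (order_date, amount).
transactions = [
    (date(2021, 1, 5), 100.0),
    (date(2021, 1, 20), 50.0),
    (date(2021, 2, 3), 75.0),
]

# A monthly "periodic snapshot", derived on demand from the
# transaction grain instead of being materialized by a separate ETL job.
monthly = defaultdict(float)
for d, amount in transactions:
    monthly[(d.year, d.month)] += amount

print(dict(monthly))  # {(2021, 1): 150.0, (2021, 2): 75.0}
```

This is exactly the kind of grouping a tabular model performs at query time, which is why a separately maintained snapshot table often adds ETL cost without adding information.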

Precalculated Time Intelligence Columns

Implementing calculated columns for time intelligence values such as YTD/MTD fields is a common practice in traditional data warehousing. Since the time series values are precalculated, they do not cause any performance issues when business users query the DW. However, with Power BI, time calculations are easy to implement, and the VertiPaq engine is capable of calculating these values on the fly in no time. Hence, precalculated time series fields are no longer required in a data warehouse.
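To make the "on the fly" point concrete: a YTD value is nothing more than a running total over rows in calendar order, as this minimal sketch with made-up monthly figures shows:

```python
from itertools import accumulate

# Hypothetical monthly sales figures for one year, in calendar order.
monthly_sales = [100, 120, 90, 110]

# YTD is just a running total, computed at query time instead of being
# stored as a precalculated column in the fact table.
ytd = list(accumulate(monthly_sales))

print(ytd)  # [100, 220, 310, 420]
```

A column engine scanning a few million in-memory rows can produce this kind of cumulative result fast enough that storing it precomputed buys very little.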

Table/Field Naming

In data warehousing, we used to follow certain naming conventions for tables and fields, and those conventions were only used among technical users. We never worried about how readable the names were or how well they aligned with business terms, because we knew business users were never going to touch the DW. However, things have changed with Power BI, as anyone can explore the DW now. Prefixes and suffixes such as Dim/Fact in table names make business users' lives hard. Moreover, camelCase or PascalCase names for tables and columns are not ideal; we should use more readable business names that business users can easily recognize. Apart from that, keeping the same names for primary keys and matching foreign keys helps with modeling in Power BI, as it automatically creates relationships based on name and data type.

Date Dimension

The date dimension is the most important dimension in any data warehouse. In a DW implementation, we maintain a separate ETL process to populate and update the date dimension. However, Power BI has multiple ways to implement a date dimension, and none of them require manual intervention from the user. Therefore, unless there is a business requirement, I believe building and maintaining a date dimension is no longer mandatory if you are only using Power BI to analyze data in the DW.
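For comparison, here is a sketch of what that separate ETL process essentially does: generate one row per day with the usual date attributes (the attribute set shown is a minimal, hypothetical one; real date dimensions carry many more columns):

```python
from datetime import date, timedelta

def date_dimension(start, end):
    """Yield one date-dimension row per day between start and end."""
    d = start
    while d <= end:
        yield {
            "date": d,
            "year": d.year,
            "month": d.month,
            "quarter": (d.month - 1) // 3 + 1,
            "day_name": d.strftime("%A"),
        }
        d += timedelta(days=1)

rows = list(date_dimension(date(2021, 1, 1), date(2021, 1, 3)))
print(len(rows))  # 3
```

Because the logic is purely deterministic, a tool that can generate such a table itself (as Power BI can) removes the need for this ETL job entirely.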

ETL Audit Fields

Having audit information such as a load date/update date in fact tables is very helpful, as Power BI incremental refresh can leverage it. Apart from that, business users gain visibility into how up to date their data is, and when and by whom it was changed. Therefore, they don't need to come back to the IT team whenever there is a doubt about the accuracy of the data.
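The way an incremental refresh leverages an audit column can be sketched like this: only rows loaded since the last refresh watermark need to be fetched again (the rows and the watermark below are made up for illustration):

```python
from datetime import datetime

# Hypothetical fact rows carrying an ETL audit column.
rows = [
    {"order_id": 1, "load_date": datetime(2021, 6, 1)},
    {"order_id": 2, "load_date": datetime(2021, 6, 10)},
    {"order_id": 3, "load_date": datetime(2021, 6, 15)},
]

# An incremental refresh only needs rows loaded after the last
# refresh watermark, instead of reloading the whole table.
last_refresh = datetime(2021, 6, 5)
changed = [r for r in rows if r["load_date"] > last_refresh]

print([r["order_id"] for r in changed])  # [2, 3]
```

Without such a column, the refresh has no cheap way to tell new or changed rows from old ones, which is why this is one traditional DW practice that becomes more valuable with Power BI, not less.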

Simple Calculations

When I was implementing DWs back then, I used to store all calculated fields in fact tables, irrespective of how simple or complex the calculations were. Implementing calculated fields such as Min, Max, and Average in the ETL process and storing them in the DW is no longer required, as Power BI handles those calculations off the shelf.
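As a tiny sketch of the point, these are the kinds of aggregates a reporting layer derives at query time, so there is little reason to precompute and store them (the measure values are hypothetical):

```python
from statistics import mean

# Hypothetical measure values from transaction-grain fact rows.
amounts = [10.0, 20.0, 30.0]

# Min/Max/Average no longer need to be computed in ETL and stored;
# the reporting layer derives them at query time.
print(min(amounts), max(amounts), mean(amounts))  # 10.0 30.0 20.0
```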

Conclusion

The only thing in this world that never changes is change itself. Some DW concepts have changed, and new concepts have come into play with tools such as Power BI. As the Power BI team adds more features to the product, I believe data warehousing concepts will change even more in the near future. Well, we might not need a data warehouse at all.

I hope this post gives you some idea of how Power BI has changed some of the traditional data warehousing concepts. Thanks for reading and stay safe. Cheers!
