Trying to produce an analytical report directly from an operational system is a daunting experience. Operational data models are highly normalised and optimised for data entry, not for data retrieval. Therefore, analytical queries against an operational system can largely affect performance and can put a burden on your IT infrastructure.
In my previous post, I examined the importance of data modelling when building Power BI reports. This week, I’m introducing star schemas, widely accepted as a standard data model for BI systems.
A star schema comprises a central fact table, e.g. Sales; and multiple dimension tables, e.g. Date, Customer, Product Group. The fact table contains data from business transactions or a snapshot summary of that data. It is connected through a many-to-one relationship to each dimension table. The main advantages of star schemas are:
- Simplicity – Star schemas are easy to read, understand and use. Dimensions are used to slice and dice the data, and facts to aggregate numbers. (Note: it is very important to use standard business names when naming tables and columns in your model).
- Performance – Because star schemas have a small number of tables and clear join paths, queries run faster than they do against an operational system.
- Scalability – Star schemas are extensible to accommodate changes, such as adding new dimensions, attributes or measures.
- Support – Star schemas are the most commonly used in BI implementations and therefore supported by a large number of BI tools.
Star schema design is predicated on the assumption that our transactions (facts) are very voluminous, and dimensions, on the other hand, are smaller and relatively trivial compared to the facts. The fundamentals of star schemas are therefore quite simple. Firstly, we make the fact table, which is the largest one, narrow and highly normalised. Secondly, we make dimension tables wide and denormalised by taking out as many joins as possible.
BI professionals share the same opinion that star schemas are nearly always the best way to represent analytical data.
Fact or dimension tables should not have relationships between themselves. Facts can only be related to dimension tables. A snowflake is a variation of a star schema where some of the dimensions are normalised. Snowflake models are generally easier to maintain, and apart from a small performance degradation due to additional joins, there is nothing wrong with them. We can always convert snowflake back to a star schema by denormalising dimensions. Another variation is a star schema with multiple fact tables that are not related to each other. In this case, multiple fact tables share the same dimensions. Common examples are Budget vs. Actuals or Sales vs. Purchases reports.
BI professionals share the same opinion that star schemas are nearly always the best way to represent analytical data. When building data models with Power BI, it is worth spending additional time in Query Editor to transform your data sources into facts and dimensions.
Remember, if you are facing modelling problems, consider rebuilding your model to move toward a star schema. By taking this approach you will find yourself heading in the right direction.