5/13/2024 0 Comments Tabular database columnar database![]() Here is the sample query when the Sales table stores a DateTime:ĪVERAGEX ( Sales, Sales - Sales ), Indeed, if the Sales table contains only the DateKey as an integer, then the calculation requires us to first retrieve the actual date from the Date table, and only later to perform the calculation. For example, if we need to compute the average delivery days by subtracting Order Date from Delivery Date, the calculation is much simpler to author in the scenario where the Sales table stores dates. There is a difference in the two data types when it comes to performing calculations. Our findings align with the theory: we were not expecting to spot a difference from the storage and performance points of view. From the query point of view, there are no differences between using an Integer or a DateTime. Overall, the difference is tiny and likely to be the result of a different choice of sort order for some segments in the table.īecause the two columns end up being the same size, all the performance tests we performed result in identical performance. Quite surprisingly, the Integer column produces a slightly larger column size: 2.6Gb compared with 2.5Gb when using a Datetime column. This is because each value in the dictionary is slightly smaller, but the column size is very close. ![]() The same table, with an Integer key looks like this below.Īs you see, using an Integer produces a smaller dictionary. The following figure shows the size of the Order Date column using a Datetime. This is the reason why the size of the two columns is the same. Regardless of whether we use an Integer or a Datetime, these two factors are identical. The efficiency of hash-encoding depends on only two factors: the number of distinct values of the column and the data distribution. These considerations are important in our scenario, since VertiPaq always chooses hash-encoding for columns that are involved in a relationship – as is our case. Therefore, when VertiPaq chooses hash-encoding for a column, its original data type is rather irrelevant from a technical point of view. In other words, regardless of the original datatype of the column, the column values are stored as integers. Hash-encoding is the most common technique: VertiPaq creates a dictionary of the values in the column, and instead of storing the values it stores their index in the dictionary. When a column is loaded in the VertiPaq engine (the in-memory database engine of Tabular), it is compressed using either hash-encoding or value encoding. ![]() We created two different databases: in the first one the Order Date column is stored as a Datetime, in the second one the same Order Date column is stored as an integer in the format YYYYMMDD.īefore we start the test, a bit of theory is needed. We started from the Contoso database, with 2 billion rows in Sales and dates in a 10-year range. The remaining part of the article aims to prove the previous sentences, and to provide you with the technical details about how we tested the respective performance of the two options. With that said, if your model uses Integers and you do not need to perform calculations on the dates represented in the table, then you can choose the most convenient data type – that is, the one already used in the original data source. In the most common scenarios, a Datetime proves to be better because it provides more possibilities to compute values on dates without having to rely on relationships. Depending on the specific requirements of your model, you might favor one data type against the other. Therefore, the choice is not related to technical aspects, but rather on the convenience of the design. The database size, the query speed, and any other technical detail are absolutely identical. Indeed, in Tabular there are no technical differences between using a Datetime or an Integer to create a relationship. However, Tabular is an in-memory columnar database, and its architecture is quite different from the relational databases we might be used to working with. Historically, using Integers has always been a better choice in database design. A question that is often asked during the design of a Power BI data model is whether it is better to use an Integer or a Datetime column to link a fact table with the Date dimension.
0 Comments
Leave a Reply. |