My go-to progress dashboard calculates a lot of progress % metrics and graphs. However, for larger projects, it's often difficult to zoom into the graph. This is exactly why slicers exist.
Slicers are not my specialty; in fact, PowerBI is not my specialty. So it was with a little frustration that I added a slicer to a page, only to find the slicer altering all my data.
Set The Stage
My go-to progress dashboard calculates a lot of progress % metrics and graphs. However, for larger projects, it's often difficult to zoom into the graph. This is exactly why slicers exist. I am sure there are other nice graph tools that allow for custom date ranges, but again, this is so not my specialty.
What Didn't Work
I simply added a slicer on my "weekending" field. However, in doing so, all my measures started calculating based on the filtered date range. This is likely an issue with my measures, but alas, I wanted something to just adjust the graph axis and not affect anything else.
Below we can see that my measures are calculating a progress set from 0-100%. Thus, when the date ranges were adjusted, the entire dashboard became just plain wrong. My budgets and %'s are also not correct on the cards (which are also based on all the slicers).
The Solution – Create a Duplicate DIM_Date Table
The problem was caused because the slicer was based on the live master dimension table that was linked to my data. Just as I want my graph to adjust based on the WBS dimension tables, if I insert a slicer linked in any way to my FACT table, I am in a world of hurt.
Thus, just create a duplicate DIM_Date table. Here I created a new table: DIM_Date_GraphRangeSlicer
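For reference, here is a minimal sketch of what the duplicate table's query could look like in Power Query. I am assuming the master date table is a query named DIM_Date with a WeekEnding column (names are illustrative); the important part is that the new table gets no relationship to the FACT table, so slicing it cannot filter the measures.

let
    // Reference the master date dimension so both tables stay in sync on refresh
    Source = DIM_Date,
    // Keep only the column(s) the slicer needs (assumed column name: WeekEnding)
    #"Kept Columns" = Table.SelectColumns(Source, {"WeekEnding"})
in
    #"Kept Columns"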
I insert a formula into the chart X-Axis range to select the min and max dates from this new GraphRange table. I then set up a slicer that filters the range for this new table, not the master DIM_Date.
With the axis range now linked to this dummy date table, I can much more precisely refine just the X-Axis display of the graph without impacting any of the measures used to calculate the % progress.
The Result
Putting it all together, we can now customize the X-Axis range without altering the measures or cards that are calculating key metrics off the full (or filtered based on the WBS slicers) data.
One of the biggest critiques/limitations in using P6 data is the lack of time phased resource assignment data. The only effective way to pull time phased resource assignment data into PowerBI (or even Excel) is to copy-paste from P6 into Excel. This is what I have recommended in the past and still what I would recommend for anyone moving forward. However, that does not mean that PowerBI can't produce time phased data using a start date, end date, and profile. What follows is a simple guide on how I have tackled the problem (and the limitations I have run into).
Note: In all likelihood this problem has already been solved by many people in many different ways. I do not want to suggest this is "the way" to solve this. More so, I want to simply raise awareness of my approach and welcome comments and feedback on how to really solve the problem.
Part 1: The Problem
P6 data does not contain time phased distributions. Instead, the backend data (and the XER file) only contains information about the activity, the resource, and the profile applied. So, our problem is to try to extract these data elements and generate a time phased distribution of the resource according to the profile and the activity start and end dates.
Typical Activity Level Data
Here we have our Resource Details (including our "curv_id")
Resource Profile Data
I've taken a few liberties with the data above to try to focus on the key problem:
How do we allocate the resources assigned to an activity according to a resource profile?
I've seen this done in Excel a lot, although I've never been comfortable with the Excel solutions. Typically we count the weeks and distribute the hours equally to all the weeks. I know more complex files exist that allow for spreads using profiles. So the problem isn't necessarily to simply spread per the profile, but to perform the operation inside PowerBI (or, perhaps more to the point, inside Power Query).
At this point, I’d again love to call upon anyone who has a nice solution to include links about how you tackled this as what follows is just my initial stab at this.
Part 2: Getting the data into PowerBI
Firstly, although the data above is from a P6 XER file, I wanted to make this a bit more general, so I have created a toy model approach. Here is what my source data looks like:
Resource Profile Data:
Excel Profiles
We can run this through PowerQuery and convert it to a usable table. Note in the above I have created a PeriodCum field. This will be used to calculate an end date for each of the 10 periods required.
tbl_profiles
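For illustration, the PeriodCum column could be added with a single step like the one below. This assumes the profile table has a Period column numbered 1 through 10 and a prior step named #"Changed Type" (both assumptions on my part); PeriodCum then represents the cumulative fraction of the activity duration at the end of each period, i.e. 0.1 for period 1, 0.2 for period 2, up to 1.0 for period 10.

= Table.AddColumn(#"Changed Type", "PeriodCum", each [Period] / 10, type number)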
Activity and Resource Data:
In this example, I am combining the Activity and Resource data into just 1 table. Obviously, if you were doing this formally, you would need to build a schema to link the Resource Assignment data to the Activity level data.
Excel Data
And running the above through Power Query, we end up with something as seen below. Note I have added a calculation for the duration (in calendar days) and have converted the date formats to numbers. This makes the subsequent steps a little easier.
Note: a critically hard step (for me at least) came during this stage. Because we are breaking the duration into 10 periods and will ultimately be allocating a quantity per day to each period, my method bombed whenever the duration left a fraction of a day per period (for example, a duration of 15 days). This caused an overlap of quantity allocation on the boundary day. As such, I have rounded the duration to the nearest 10 days.
tbl_activities
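As a rough sketch of those two steps (the column names Start, Finish, StartNum, FinishNum, and Duration, and the toy row, are my own, not from the original file):

let
    // Toy activity row standing in for tbl_activities (illustrative data only)
    Source = #table(
        {"ActivityID", "Start", "Finish", "ProfileDesc", "hours_total"},
        {{"A1000", #date(2021, 1, 4), #date(2021, 1, 19), "Bell Curve", 400}}
    ),
    // Convert the dates to serial numbers; this makes the later date arithmetic easier
    AddStartNum = Table.AddColumn(Source, "StartNum", each Number.From([Start]), type number),
    AddFinishNum = Table.AddColumn(AddStartNum, "FinishNum", each Number.From([Finish]), type number),
    // Calendar-day duration rounded to the nearest 10 so each of the 10 periods is a whole number of days
    AddDuration = Table.AddColumn(AddFinishNum, "Duration",
        each Number.Round(([FinishNum] - [StartNum]) / 10, 0) * 10, type number)
in
    AddDuration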
Part 3: Time Phasing (where the magic happens)
The first step of generating the time phasing is to split the activities into the 10 periods per the resource allocation. We do this by first merging tbl_activities with tbl_profiles using the ProfileDesc field. After expanding the result, we will end up with 10 records for each activity (corresponding to each of the 10 periods).
We will now want to calculate a start and finish date unique to each of the 10 periods. The profile table has a PeriodCum field that we can multiply by the duration and add to the start date to get a finish for each period. The resulting table will now look something like this. Note, at this step, it is also convenient to take the profile allocation for each period and multiply it by the hours_total field. This gives us the hours for each period. The last step is to divide that by the period duration, which yields an hours-per-day value for that period. A rough sketch of these two steps (the merge and the period calculations) follows below.
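This sketch assumes the activity and profile queries are named tbl_activities and tbl_profiles, with columns Period, Allocation, PeriodCum, StartNum, Duration, and hours_total; the step and column names are illustrative, and the one-day offset on PeriodStart is my own convention to avoid double-counting the boundary day when the periods are later expanded into individual days.

let
    // One row per activity and profile period: merge on ProfileDesc, then expand the 10 profile rows
    Merged = Table.NestedJoin(tbl_activities, {"ProfileDesc"}, tbl_profiles, {"ProfileDesc"}, "Profile", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Merged, "Profile", {"Period", "Allocation", "PeriodCum"}),
    // Finish of each period = activity start + cumulative fraction of the duration
    AddPeriodFinish = Table.AddColumn(Expanded, "PeriodFinish", each [StartNum] + [PeriodCum] * [Duration], type number),
    // Start of each period = the day after the previous period's finish (period 1 starts on the activity start)
    AddPeriodStart = Table.AddColumn(AddPeriodFinish, "PeriodStart",
        each if [Period] = 1 then [StartNum] else [PeriodFinish] - [Duration] / 10 + 1, type number),
    // Hours in the period = allocation % of the total hours; hours per day = period hours / period length
    AddPeriodHours = Table.AddColumn(AddPeriodStart, "PeriodHours", each [Allocation] * [hours_total], type number),
    AddHoursPerDay = Table.AddColumn(AddPeriodHours, "HoursPerDay", each [PeriodHours] / ([Duration] / 10), type number)
in
    AddHoursPerDay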
Lastly, we now want to perform 1 additional expansion to get the DAYS for each period. Here is the code I have used. This is a nice little bit of code that can generate a sequential list from a start to an end (we are using days, but it works for any beginning-to-end sequence).
= Table.AddColumn(#"Changed Type2", "Custom", each { Number.From([PeriodStart])..Number.From([PeriodFinish]) })
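The new Custom column holds a list of day numbers; it then just needs to be expanded into rows and, optionally, converted back to dates. A couple of follow-up steps might look like this (assuming the step above was named "Added Custom" and the expand step "Expanded Custom"; both names are assumptions):

= Table.ExpandListColumn(#"Added Custom", "Custom")
= Table.TransformColumns(#"Expanded Custom", {{"Custom", Date.From, type date}})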
resulting resource allocation table with profiles applied
In our resulting table (see above), we can clearly see that the hours per day adjusts for each period based on the profile.
Part 4: Putting It All Together
We have our activity data, we have our time phased resource data; the last step is to generate a DIM_Date table that can be used to bin the days into weeks or months (or quarters, years, or any custom grouping defined inside the DIM_Date table).
I don't want to go too deep into the DIM_Date table; effectively, all we need is each day assigned to a Month-Yr for the purpose of generating the nice little graph below, which is ultimately what we are after.
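For completeness, a bare-bones DIM_Date table with just such a Month-Yr column could be generated in Power Query along these lines (the date range and column names are placeholders, not from the original model):

let
    // Assumed date range; widen it to cover the full project
    StartDate = #date(2021, 1, 1),
    EndDate = #date(2022, 12, 31),
    DayCount = Duration.Days(EndDate - StartDate) + 1,
    Dates = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    ToTable = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed = Table.TransformColumnTypes(ToTable, {{"Date", type date}}),
    // Month-Yr bin used to group the daily hours in the final graph
    AddMonthYr = Table.AddColumn(Typed, "Month-Yr", each Date.ToText([Date], "MMM-yyyy"), type text)
in
    AddMonthYr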
Conclusion
Again, I do not recommend using this approach; I am more interested in how others have decided to approach this. Personally, as I noted at the beginning, my recommendation is to use a copy-paste from the P6 Resource Assignment tab, although this time phasing approach can be used for other (non-P6) applications. But alas, I believe there are much smarter ways to achieve the spread using the DIM_Date table and perhaps custom functions. In my research for my method, I ideally wanted a "working day" spread as opposed to the calendar day spread. There are some awesome custom functions that can provide an integer for the number of working days between 2 dates. However, even when taking that approach, I ran into additional complications in getting everything to work.
TL;DR: you can incrementally refresh PowerBI using Parquet files stored in Azure Storage, without any database in the middle. You can download the sample pbix here.
I am using this blog post by Gilbert Quevauvilliers, which is based on a technique from Rafael Mendonça. Please read it first.
Maybe read this as well; it uses Synapse Serverless, but it has a section on partitioning your data to Parquet using Python.
1-Add a new Table, Parquet
Make sure it is not loaded. Here is the M code:
let
Source = AzureStorage.DataLake("https://xxxxxx.core.windows.net/parquet"),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Folder Path"}),
#"Inserted Text Between Delimiters" = Table.AddColumn(#"Removed Other Columns", "Text Between Delimiters", each Text.BetweenDelimiters([Folder Path], "D", "/", 1, 0), type text),
#"Renamed Columns" = Table.RenameColumns(#"Inserted Text Between Delimiters",{{"Text Between Delimiters", "Date"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type datetime}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Folder Path"})
in
#"Removed Columns"
here is the result
3-Merge using Inner Join
To read the parquet file content, we use this function. Notice we used an inner join in the previous step to avoid reading null Content, which generates errors when you refresh in the service.
Parquet.Document([Content])
and here is the final table
We configure incremental refresh to refresh the last 2 days.
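For orientation, the loaded table could be assembled along the lines of the sketch below. RangeStart and RangeEnd are the standard incremental refresh parameters; the generated day table, the step names, and the dynamic column expansion are my own assumptions about the glue, not necessarily how the sample pbix is built.

let
    // Days inside the refresh window defined by the RangeStart/RangeEnd parameters
    Days = List.Dates(Date.From(RangeStart), Duration.Days(Date.From(RangeEnd) - Date.From(RangeStart)) + 1, #duration(1, 0, 0, 0)),
    DayTable = Table.TransformColumnTypes(Table.FromList(Days, Splitter.SplitByNothing(), {"Date"}), {{"Date", type datetime}}),
    // Inner join against the (not loaded) Parquet table so days with no file are simply dropped
    Joined = Table.NestedJoin(DayTable, {"Date"}, Parquet, {"Date"}, "Files", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Joined, "Files", {"Content"}),
    // Read each parquet file into a table, then expand it (assumes at least one file in the window)
    WithData = Table.AddColumn(Expanded, "Data", each Parquet.Document([Content])),
    Result = Table.ExpandTableColumn(WithData, "Data", Table.ColumnNames(WithData{0}[Data]))
in
    Result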
4- Testing in PowerBI Service
As you can see, the second refresh is way faster than the first one.
Here is the partition table.
Now let's check the transaction history from Azure Storage; I refreshed again just to be sure.
The second refresh read substantially less data, as only two files were read.
I think with PowerBI Desktop supporting Parquet, we will see more exciting scenarios. I can't wait for Dataflow to support export to Parquet!
If you are still reading, I would appreciate a vote on this idea: having an option in Dataflow to export to a dynamic file name.
Recently I came across a new use case where I thought Azure Synapse Serverless might make sense. If you have never heard of it before, here is a very good introduction.
TL;DR: an interesting new tool! I will definitely have another serious look when they support caching for repeated queries.
Basically, a new file arrives daily in Azure Storage and needs to be processed and later consumed in PowerBI.
The setup is rather easy. Here is an example of the user interface; this is not a step-by-step tutorial, just my first impressions.
I will use AEMO (Australian Energy Market Operator) data as an example; the raw data is located here.
Load Raw Data
First I load the CSV file as it is, defining the columns to be loaded from 1 to 44. Make sure you load only 1 file to experiment; then, when you are ready, change the BULK line below to point at all the files.
Then it will load all the files. Notice that when you use filename(), it adds a column with the file name, which is very handy.
USE [test];
GO
DROP VIEW IF EXISTS aemo;
GO
CREATE VIEW aemo AS
SELECT
result.filename() AS [filename],
*
FROM
OPENROWSET(
BULK 'https://xxxxxxxx.dfs.core.windows.net/tempdata/PUBLIC_DAILY_201804010000_20180402040501.CSV',
FORMAT = 'CSV',
PARSER_VERSION='2.0'
)
with (
c1 varchar(255),
c2 varchar(255),
c3 varchar(255),
c4 varchar(255),
c5 varchar(255),
c6 varchar(255),
c7 varchar(255),
c8 varchar(255),
c9 varchar(255),
c10 varchar(255),
c11 varchar(255),
c12 varchar(255),
c13 varchar(255),
c14 varchar(255),
c15 varchar(255),
c16 varchar(255),
c17 varchar(255),
c18 varchar(255),
c19 varchar(255),
c20 varchar(255),
c21 varchar(255),
c22 varchar(255),
c23 varchar(255),
c24 varchar(255),
c25 varchar(255),
c26 varchar(255),
c27 varchar(255),
c28 varchar(255),
c29 varchar(255),
c30 varchar(255),
c31 varchar(255),
c32 varchar(255),
c33 varchar(255),
c34 varchar(255),
c35 varchar(255),
c36 varchar(255),
c37 varchar(255),
c38 varchar(255),
c39 varchar(255),
c40 varchar(255),
c41 varchar(255),
c42 varchar(255),
c43 varchar(255),
c44 varchar(255)
)
AS result
The previous query creates a view that reads the raw data.
Create a View for the Clean Data
As you can imagine, raw data by itself is not very useful. We will create another view that references the raw data view and extracts a nice table (in this case, the power generation every 30 minutes).
USE [test];
GO
DROP VIEW IF EXISTS TUNIT;
GO
CREATE VIEW TUNIT AS
select [_].[filename] as [filename],
convert(Datetime,[_].[c5],120) as [SETTLEMENTDATE],
[_].[c7] as [DUID],
cast( [_].[c8] as DECIMAL(18, 4)) as [INITIALMW]
from [dbo].[aemo] as [_]
where (([_].[c2] = 'TUNIT' and [_].[c2] is not null) and ([_].[c4] = '1' and [_].[c4] is not null)) and ([_].[c1] = 'D' and [_].[c1] is not null)
Connecting PowerBI
Connecting to Azure Synapse is extremely easy; PowerBI just sees it as a normal SQL Server.
Here is the M script:
let
Source = Sql.Databases("xxxxxxxxxxx-ondemand.sql.azuresynapse.net"),
test = Source{[Name="test"]}[Data],
dbo_GL_Clean = test{[Schema="dbo",Item="TUNIT"]}[Data]
in
dbo_GL_Clean
And here is the SQL query generated by Power Query (which folds):
select [$Table].[filename] as [filename],
[$Table].[SETTLEMENTDATE] as [SETTLEMENTDATE],
[$Table].[DUID] as [DUID],
[$Table].[INITIALMW] as [INITIALMW]
from [dbo].[TUNIT] as [$Table]
Click refresh and, perfect, here are 31 files loaded.
Everything went rather smoothly; there was nothing to set up, and I now have an enterprise-grade data warehouse in Azure. How cool is that!
How Much Does It Cost?
The Azure Synapse Serverless pricing model is based on how much data is processed.
First, let's try with only 1 file, running the query from the Synapse workspace. The file is 85 MB; so far so good, the data processed is 90 MB (the file size plus some metadata).
Now let's see the queries generated by PowerBI. In theory, my files total 300 MB, so I should be paying only for 300 MB. Let's have a look at the metrics.
My first reaction was that there must be a bug: 2.4 GB! I refreshed again and it is the same number.
A look at the Power Query diagnostics and a clear picture emerges. PowerBI SQL connectors are famous for being "chatty": you would expect Power Query to send only 1 query, but in reality it sends multiple queries, at least 1 of them checking the top 1000 rows to define the field types.
Keep in mind Azure Synapse Serverless has no cache (they are working on it), so if you run the same query multiple times, even with the same data, it will "scan" the files multiple times; and as there are no data statistics, a select of 1000 rows will read all the files, even without an ORDER BY.
Obviously, I was using import mode; as you can imagine, using DirectQuery will generate substantially more queries.
Just to be sure, I tried a refresh in the service.
The same: it is still 2.4 GB. I think it is fair to say there is no way to control how many times Power Query sends a SQL query to Synapse.
Edit 17 October 2020:
I got feedback that my PowerBI Desktop was probably still open when I ran the test in the service. It turns out that was true; I tried again with the desktop closed and it worked as expected: one refresh generates 1 query.
Notice that even if the CSV files were compressed, it would not make a difference; Azure Synapse bills on uncompressed data.
Parquet files would make a difference, as only the columns used would be charged, but I did not want to use another tool in this example.
Take Away
It is an interesting technology: the integration with Azure cloud storage is straightforward, the setup is easy, you can do transformations using only SQL, you pay only for what you use, and Microsoft is investing a lot of resources in it.
But the lack of a cache is a showstopper!
I will definitely check it again when they add caching and cost control; after all, it is still in preview 🙂