mim – Page 36 – Small Data And self service

Change Dimension Dynamically using Parameter in PowerBI

Edit, 16 May 2022: the hack is no longer required. PowerBI released field parameters, which support this functionality out of the box.

Edit, 20 Feb 2022: SQL Server is finally supported. I tested it with an Azure SQL database and it worked. It is supported when using the Enhanced Engine in Dataflow; make sure you use the Power Platform connector.

At last, PowerBI added support for parameters that can be changed by the end user. From a business perspective, I guess it is mostly useful when you deal with Big Data loads and want to control exactly the query generated at the data source level. In this short blog, I will show how some use cases that were hard or clunky in DAX become extremely easy with parameters.

pbix file here: note that it connects to my DB instance, so it will not work, but you can see the Data Model.

I think it is wise to read the documentation here first

Chris Webb has a great use case using Azure Data explorer here

Update : I added a new use case here, changing weekend Date Dynamically

We want to change a dimension based on a user selection from a slicer. Currently only DirectQuery is supported and, to be honest, the documentation does not say which data sources work. We know SQL Server is not one of them (thanks to Alex for the clarification). Luckily BigQuery works, which was a very nice surprise.

I am using the Covid19 dataset as an example (it is free and doesn't incur any charge until Sept 2021). We want to switch dynamically between countries and continents.

1- Load the main table in import mode

2- Create a parameter “Level_Details”

3- Import the dimension table with the values countries and continent in DirectQuery mode:

I created a view in BigQuery because PowerQuery stopped folding when I tried to remove duplicates. Although this is a free data source, it is important to use DirectQuery only with dimension tables, to reduce cost and data volume.

4- Include the parameter logic in the dimension table

I created a new column “Grouping_Details” based on the parameter value; it takes either Countries or Continent.

5- Create a new table that contains all the possible values for the parameter

By the way, you can use any table, either imported or generated using DAX. This is a very clever implementation by the PowerBI team compared to other BI tools.

6- Bind the value of the column “Selection” to the Parameter

here is a View of the Data Model

It is very important that “Selection_Details” stays a disconnected table; otherwise it will add new filter selections to the queries, which we don't want. It would still work, but we want to control exactly the query generated by PowerBI.
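To make steps 4 to 6 more concrete, here is a minimal Python sketch of the logic only, not of PowerBI itself. The sample rows, the column names `country` and `continent`, and the helpers `apply_level_details` and `distinct_groups` are all made up for illustration; in the real model the column is computed in PowerQuery and folded down to BigQuery.

```python
# Hypothetical dimension rows; the real table comes from a BigQuery view.
DIMENSION = [
    {"country": "Australia", "continent": "Oceania"},
    {"country": "New Zealand", "continent": "Oceania"},
    {"country": "France", "continent": "Europe"},
]

# Disconnected table listing every value the parameter may take (step 5).
SELECTION_DETAILS = ["Countries", "Continent"]


def apply_level_details(rows, level_details):
    """Return the rows with a Grouping_Details column driven by the parameter."""
    if level_details not in SELECTION_DETAILS:
        raise ValueError(f"unsupported parameter value: {level_details!r}")
    source_column = "country" if level_details == "Countries" else "continent"
    return [{**row, "Grouping_Details": row[source_column]} for row in rows]


def distinct_groups(rows):
    """Distinct Grouping_Details values, in first-seen order."""
    return list(dict.fromkeys(row["Grouping_Details"] for row in rows))


by_country = distinct_groups(apply_level_details(DIMENSION, "Countries"))
by_continent = distinct_groups(apply_level_details(DIMENSION, "Continent"))
print(by_country)
print(by_continent)
```

The slicer never filters the dimension directly; it only changes the parameter value, and the dimension is then recomputed with that value. That is why the selection table can stay disconnected.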

And the Report

The feature is in preview and I am sure they will introduce more data sources and functionality. By adding support for BigQuery, Microsoft sent a clear message: PowerBI is the best data analytics tool and they will support any third-party data warehouse, even that of a direct competitor.

Personally, I am very excited by the thought that we are very close to finally having Parameter Actions in PowerBI. That will introduce a new class of visual analytics interaction that was not possible before. Please, it needs some votes here.

By the way, if you use BigQuery with PowerBI, I would appreciate some votes here: we need support for custom SQL queries with parameters.

Using PowerBI with Azure Synapse Serverless, First Look

Recently I came across a new use case where I thought Azure Synapse serverless might make sense. If you have never heard of it before, here is a very good introduction.

TL;DR: interesting new tool! I will definitely have another serious look when they support caching for repeated queries.

Basically, a new file arrives daily in Azure Storage and needs to be processed and later consumed in PowerBI.

The setup is rather easy; here is an example of the user interface. This is not a step-by-step tutorial, just my first impressions.

I will use AEMO (Australian Energy Market Operator) data as an example; the raw data is located here

Load Raw Data

First I load the CSV file as is, defining the columns to be loaded from 1 to 44. Make sure you load only one file while experimenting; then, when you are ready, change the first line below to the second:

'https://xxxxxxxx.dfs.core.windows.net/tempdata/PUBLIC_DAILY_201804010000_20180402040501.CSV',
'https://xxxxxxxx.dfs.core.windows.net/tempdata/PUBLIC_DAILY_*.CSV',

It will then load all the files. Notice that filename() adds a column with the file name, which is very handy.

USE [test];
GO

DROP VIEW IF EXISTS aemo;
GO

CREATE VIEW aemo AS
SELECT
result.filename() AS [filename],
     *
FROM
    OPENROWSET(
        BULK 'https://xxxxxxxx.dfs.core.windows.net/tempdata/PUBLIC_DAILY_201804010000_20180402040501.CSV',
        FORMAT = 'CSV',
        PARSER_VERSION='2.0'
    )
    with (
c1   varchar(255),
c2   varchar(255),
c3   varchar(255),
c4   varchar(255),
c5   varchar(255),
c6   varchar(255),
c7   varchar(255),
c8   varchar(255),
c9   varchar(255),
c10   varchar(255),
c11   varchar(255),
c12   varchar(255),
c13   varchar(255),
c14   varchar(255),
c15   varchar(255),
c16   varchar(255),
c17   varchar(255),
c18   varchar(255),
c19   varchar(255),
c20   varchar(255),
c21   varchar(255),
c22   varchar(255),
c23   varchar(255),
c24   varchar(255),
c25   varchar(255),
c26   varchar(255),
c27   varchar(255),
c28   varchar(255),
c29   varchar(255),
c30   varchar(255),
c31   varchar(255),
c32   varchar(255),
c33   varchar(255),
c34   varchar(255),
c35   varchar(255),
c36   varchar(255),
c37   varchar(255),
c38   varchar(255),
c39   varchar(255),
c40   varchar(255),
c41   varchar(255),
c42   varchar(255),
c43   varchar(255),
c44   varchar(255)
     )
 AS result

The previous query creates a view that reads the raw data.

Create a View for Clean Data

As you can imagine, raw data by itself is not very useful. We will create another view that references the raw data view and extracts a nice table (in this case, the power generation every 30 minutes).

USE [test];
GO

DROP VIEW IF EXISTS TUNIT;
GO

CREATE VIEW TUNIT AS
select [_].[filename] as [filename],
   convert(Datetime,[_].[c5],120) as [SETTLEMENTDATE],
    [_].[c7] as [DUID],
   cast( [_].[c8] as DECIMAL(18, 4)) as [INITIALMW]
from [dbo].[aemo] as [_]
where (([_].[c2] = 'TUNIT' and [_].[c2] is not null) and ([_].[c4] = '1' and [_].[c4] is not null)) and ([_].[c1] = 'D' and [_].[c1] is not null)
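
For readers who prefer to see the filter outside SQL, the same row selection can be sketched in Python. This is only an illustration of what the TUNIT view does, under assumptions: the sample rows below are made up (they are not real AEMO data), the date format simply mirrors SQL Server style 120, and `tunit_rows` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Made-up rows shaped like the raw file: c1 = record type, c2 = table name,
# c4 = version, c5 = settlement date, c7 = DUID, c8 = initial MW.
SAMPLE = """\
C,HEADER,DAILY,,,
I,TUNIT,,1,SETTLEMENTDATE,RUNNO,DUID,INITIALMW
D,TUNIT,,1,2018-04-01 00:30:00,1,UNIT_A,12.5
D,TUNIT,,1,2018-04-01 00:30:00,1,UNIT_B,0
D,DREGION,,2,2018-04-01 00:30:00,1,NSW1,7000
"""


def tunit_rows(text):
    """Yield (settlement_date, duid, initial_mw) for TUNIT data rows."""
    for row in csv.reader(io.StringIO(text)):
        # Pad short rows so positional access mirrors columns c1..c8.
        c = row + [""] * (8 - len(row))
        # Same predicate as the view: c1 = 'D', c2 = 'TUNIT', c4 = '1'.
        if c[0] == "D" and c[1] == "TUNIT" and c[3] == "1":
            yield (
                datetime.strptime(c[4], "%Y-%m-%d %H:%M:%S"),
                c[6],
                Decimal(c[7]),
            )


rows = list(tunit_rows(SAMPLE))
for settlement_date, duid, initial_mw in rows:
    print(settlement_date, duid, initial_mw)
```

The raw file mixes several logical tables in one CSV, which is why the view filters on the record type and table name before casting the columns.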

Connecting PowerBI

Connecting to Azure Synapse is extremely easy; PowerBI just sees it as a normal SQL Server.

here is the M script

let
Source = Sql.Databases("xxxxxxxxxxx-ondemand.sql.azuresynapse.net"),
test = Source{[Name="test"]}[Data],
dbo_GL_Clean = test{[Schema="dbo",Item="TUNIT"]}[Data]
in
dbo_GL_Clean

And the SQL query generated by PowerQuery (which folds)

select [$Table].[filename] as [filename],
[$Table].[SETTLEMENTDATE] as [SETTLEMENTDATE],
[$Table].[DUID] as [DUID],
[$Table].[INITIALMW] as [INITIALMW]
from [dbo].[TUNIT] as [$Table]

Click refresh and, perfect, here are the 31 files loaded

Everything went rather smoothly. There was nothing to set up, and I now have an enterprise-grade data warehouse in Azure. How cool is that!

How Much Does It Cost?

The Azure Synapse serverless pricing model is based on how much data is processed.

First, let’s try with only one file, running a query from the Synapse workspace. The file is 85 MB and the data processed is 90 MB (file size plus some metadata), so good so far.

Now let’s look at the queries generated by PowerBI. My files total 300 MB, so in theory I should pay for only 300 MB. Let’s have a look at the metrics.

My first reaction was that there must be a bug: 2.4 GB! I refreshed again and got the same number.

A look at the PowerQuery diagnostics and a clear picture emerges. The PowerBI SQL connector is famous for being “chatty”: you would expect PowerQuery to send only one query, but in reality it sends multiple queries, at least one of them to check the top 1000 rows and determine the field types.

Keep in mind that Azure Synapse serverless has no cache (they are working on it), so if you run the same query multiple times, even on the same data, it will “scan” the files each time. And as there are no data statistics, a select of 1000 rows reads all the files, even without an ORDER BY.
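
As a back-of-the-envelope check, the numbers above can be turned into an implied number of full scans. This is a rough sketch under two assumptions that are mine, not measured: every query reads all the files in full, and 1 GB is counted as 1000 MB. The function names are hypothetical.

```python
def data_processed_mb(total_file_size_mb, full_scans):
    """Data billed when every query scans all the files (no cache)."""
    return total_file_size_mb * full_scans


def implied_full_scans(billed_mb, total_file_size_mb):
    """How many full scans would explain the billed volume."""
    return billed_mb / total_file_size_mb


# 2.4 GB billed for roughly 300 MB of files, as reported in the metrics.
scans = implied_full_scans(billed_mb=2400, total_file_size_mb=300)
print(scans)  # 8.0
```

In other words, a single import refresh from Desktop behaved as if the files were read about eight times, which fits the picture of a chatty connector talking to an engine without a cache.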

Obviously, I was using import mode; as you can imagine, DirectQuery would generate substantially more queries.

~~Just to be sure I tried to do refresh on the service~~.

~~The same, it is still 2.4 GB, I think it is fair to say, there is no way to control how many time PowerQuery send a SQL Query to Synapse.~~

Edit 17 October 2020 :

I got feedback that my PowerBI Desktop was probably open when I ran the test in the service, which turned out to be true. I tried again with Desktop closed and it worked as expected: one refresh generates one query.

Note that compressing the CSV file would not make a difference: Azure Synapse bills for uncompressed data.

Parquet files would have made a difference, as only the columns used would be charged, but I did not want to use another tool in this example.

Take Away

It is an interesting technology: the integration with Azure cloud storage is straightforward, the setup is easy, you can do transformations using only SQL, you pay only for what you use, and Microsoft is investing a lot of resources in it.

But the lack of a cache is a showstopper!

I will definitely check it again when they add caching and cost control. After all, it is still in preview 🙂

Three years to finish a Dashboard

In 2017, at my previous job, we were using PowerBI Desktop as our reporting solution, but there was a big limitation: we couldn’t use the service, so reports were shared either as Excel or PDF.

I remember trying different solutions (RStudio, Qlik, SSRS). They were great products, but you need some kind of server to share the reports. At that time, all I wanted was a simple web app where people could click on a slicer and get fancy charts.

Around then Google made its reporting solution free. I was really excited about Data Studio: a free product, extremely easy to share, but unfortunately a bit slow and lacking some basic functionality. I still managed to build something, but it was not really good.

It is all history now. I moved to another job where we have the PowerBI service (and Tableau), but for some reason it still felt like a missed opportunity: what if Data Studio became good enough to be used as a free reporting tool?

If I remember correctly, there was no major progress in 2017 and 2018, but then they released custom viz, which basically means you can port any JavaScript library relatively easily. I managed to build a custom viz; see the example here.

And in Sept 2019, BI Engine showed up!

It was a really big deal. BI Engine is an in-memory analytics database; it is fast and they gave away 1 GB for free. It means you can connect your data from BigQuery and pay nothing (within a fair limit, of course). This made this report possible.

In May 2020, they finally released Google Maps integration, although with a limit of 10K points it was not useful for my use cases (a solar farm needs a lot of points, around 40K to 60K).

That was great and all, but I still couldn’t write complex measures easily (or maybe did not know how). Then something changed in August 2020.

At last we have proper support for parameters, and that changed everything. Now you can write any complex business logic using SQL in BigQuery and visualize the results in Google Data Studio, and you can do a lot of fancy stuff; see those examples.

There was still a major bug: pivot tables in Data Studio showed 0 for null values. Needless to say, it was extremely annoying. Although you could build a workaround, it was a hack and not sustainable.

That was fixed last week

So yes, it took me three years to finish this report: it needed BI Engine, parameters, custom viz and a bug fix in the pivot table to become possible.

I added a workflow explanation in the report, but basically: create a reporting dataset as a large flat fact table and expose the results in BigQuery, with further control through SQL parameters. If the native visuals are not satisfying, you can show pretty much anything using a Vega-Lite custom viz.

One aspect that was impossible without parameters is the dynamic grouping of dates: in the time series, the weekend updates dynamically based on the cut-off selected.
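
The idea can be sketched in a few lines of Python. This is only one reading of the sentence above: it assumes “weekend” means the week-ending date and that the cut-off is a weekday chosen by the user. The function name `week_ending` and the Monday = 0 convention are illustrative; in the report itself the logic lives in a BigQuery SQL query driven by a parameter.

```python
from datetime import date, timedelta


def week_ending(day, cutoff_weekday):
    """Return the first date on or after `day` that falls on `cutoff_weekday`.

    `cutoff_weekday` follows Python's convention: Monday = 0 ... Sunday = 6.
    """
    days_ahead = (cutoff_weekday - day.weekday()) % 7
    return day + timedelta(days=days_ahead)


sample_day = date(2020, 8, 5)  # a Wednesday
print(week_ending(sample_day, 6))  # cut-off on Sunday -> 2020-08-09
print(week_ending(sample_day, 4))  # cut-off on Friday -> 2020-08-07
```

Grouping a time series by the returned date gives weekly buckets that move as soon as the user picks a different cut-off, which is hard to do when the grouping column is fixed in the dataset.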

Please don’t get me wrong, there is still a lot of work to be done, but the foundation of the product is already there. I can clearly see the vision of the product team; hopefully they keep investing, but faster this time (Parameter Actions, support for the BigQuery geography field, analytics functions, tiles for custom viz ...).

Take away:

– You need near real-time reports.

– You want a reporting solution and don’t have a decent budget.

– You used Data Studio in 2017 and dismissed it.

I have good news for you: BigQuery/Data Studio is a viable option now. You get 1 TB free for BigQuery and 1 GB compressed in-memory for BI Engine. That’s a lot of free resources, and there is no catch: you can share securely with anyone, again totally free.

Although I am a PowerBI developer and I love it, I think it is very healthy for the industry to have more choices. 2021 will be exciting!

Row Level Security Using Parameter in Google Data Studio

I saw this on Twitter (thanks Pablo), and I had the idea of using REGEXP_MATCH, as comparing dimensions directly does not currently work.

How RLS Works in Data Studio

Data Studio supports RLS using a Google account (it can work with any email address, not only Gmail). When you log in with your account, Data Studio gets your email address, which is unique to you, and you can then use that email to filter rows.

In some cases you may want to implement security using a username and password. It turns out that with parameters this is trivial to implement. Note that we are using a Google Sheet as the data source; if you need more control, using SQL parameters is even more powerful.

Warning: storing passwords as plain text is dangerous

Now for a very simple scenario: you have data and you want people to have access only to some rows.

1- Create two parameters

usernameParameter and PasswordParameter

Let’s start with usernameParameter: it must accept any value, with a default value such as null or none.

Do the same for PasswordParameter.

2- Create calculated field RLSfilter

Basically, if both values typed in the parameter controls match, return 1, else 0.

3- Create a Filter using the calculated field RLSfilter

4- Apply the filter to Visual

Go to File, Report settings and add the filter, so it automatically applies to all the visuals in your report. Alternatively, apply it only to the visuals where it makes sense (you may have multiple datasets where it does not apply).

You can add an image over the filter control to hide the password while the user types it.
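
The four steps above boil down to a per-row check. Here is a minimal Python sketch of that logic, not of the Data Studio formula itself: the sample rows, the column names and the helpers `rls_filter` and `visible_rows` are made up for illustration, and `re.fullmatch` stands in for the regex match done in the calculated field.

```python
import re

# Hypothetical sheet rows; passwords are in plain text only to mirror the post.
ROWS = [
    {"username": "alice", "password": "a1", "region": "North", "sales": 10},
    {"username": "bob", "password": "b2", "region": "South", "sales": 20},
]


def rls_filter(row, username_parameter, password_parameter):
    """Return 1 when both typed values match the row exactly, else 0."""
    # re.escape makes the stored values literal, so "." or "*" are not wildcards.
    user_ok = re.fullmatch(re.escape(row["username"]), username_parameter or "")
    pass_ok = re.fullmatch(re.escape(row["password"]), password_parameter or "")
    return 1 if user_ok and pass_ok else 0


def visible_rows(rows, username_parameter, password_parameter):
    """Apply the filter RLSfilter = 1, as the report-level filter would."""
    return [
        row for row in rows
        if rls_filter(row, username_parameter, password_parameter) == 1
    ]


print(visible_rows(ROWS, "alice", "a1"))
print(visible_rows(ROWS, "alice", "wrong"))
```

With the default parameter values nothing matches, so the report shows no rows until a valid pair is typed.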

You can see the report here

Limitation

This RLS applies at the report level, which means report editors can see everything.

If you don’t want report editors to have total access, you need to push the filter upstream, either using email or leveraging SQL parameters in BigQuery.
