Build an asymmetrical Pivot table in PowerBI

I have been asked to produce a simple construction report: we need to show the last 4 weeks of actual progress data and 6 weeks of forecast, and, to make things a little more complex, the average installation since the start of the project. Nothing special, just three measures: average to date, install per week, and forecast per week.

Obviously, this is trivial to do in Excel using named sets; if you don't know what named sets and cube formulas are, you are missing the most powerful reporting paradigm in Excel. A good introduction is here, and there are plenty of resources here.

Unfortunately, named sets are not supported yet in PowerBI; you can vote here.

Just for demonstration purposes, if you try to add those three measures to a matrix visual, PowerBI just repeats them for every time period. Obviously that's not good at all: the actual installation makes sense only in the past, and the forecast has to be in the future. There is no option to hide a measure when a column has no value, and even if there were, we need to show the average installation independently of the time period. Anyway, this is the report when you add the three measures

and because I had already learned a new trick on how to dynamically add measures to a matrix visual in PowerBI, I was tempted to try it and see if it works in this scenario.

So, let's see how it can be done using a disconnected table.

  1. Create a disconnected table with two columns, order and status (a DAX sketch follows below)
  • Add a calculated column
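The disconnected table itself can be typed in via Enter Data, or sketched in DAX with DATATABLE; here is one way to do it (the status labels are my own assumption, only the order values matter for the measures below):

pivot =
DATATABLE (
    "order", DOUBLE,
    "status", STRING,
    {
        { -4, "Average" },
        { -3, "Actual" }, { -2, "Actual" }, { -1, "Actual" }, { 0, "Actual" },
        { 0.5, "Spacer" },
        { 1, "Forecast" }, { 2, "Forecast" }, { 3, "Forecast" },
        { 4, "Forecast" }, { 5, "Forecast" }, { 6, "Forecast" }, { 7, "Forecast" }
    }
)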

As the cut-off date changes at least three times a week, the week numbers change accordingly, so we can't simply hard-code the dates. Instead, let's add a new calculated column, which will just look up the week-ending date from a master calendar table based on the order; when the order is -4 it will return "Average to Date". I added a dummy 0.5 order just to create an empty space between actual and forecast (cosmetics are important).

Week_Num =
SWITCH (
    [order],
    -4, "Average to Date",
    0.5, BLANK (),
    "WE "
        & FORMAT (
            LOOKUPVALUE ( MstDates[dynamic Week End], MstDates[week_number], [order] ),
            "dd/mm/yy"
        )
)

  • Add a new measure that shows a specific measure (Average, Install, or Forecast) based on the value of the column

dynamic_Pivot =
SWITCH (
    SELECTEDVALUE ( pivot[order], BLANK () ),
    -4, [Install_qty_average_week],
    -3, CALCULATE ( [Installed_qty], MstDates[week_number] = -3 ),
    -2, CALCULATE ( [Installed_qty], MstDates[week_number] = -2 ),
    -1, CALCULATE ( [Installed_qty], MstDates[week_number] = -1 ),
    0, CALCULATE ( [Installed_qty], MstDates[week_number] = 0 ),
    1, CALCULATE ( [Forecast_Qty], MstDates[week_number] = 1 ),
    2, CALCULATE ( [Forecast_Qty], MstDates[week_number] = 2 ),
    3, CALCULATE ( [Forecast_Qty], MstDates[week_number] = 3 ),
    4, CALCULATE ( [Forecast_Qty], MstDates[week_number] = 4 ),
    5, CALCULATE ( [Forecast_Qty], MstDates[week_number] = 5 ),
    6, CALCULATE ( [Forecast_Qty], MstDates[week_number] = 6 ),
    7, CALCULATE ( [Forecast_Qty], MstDates[week_number] = 7 )
)
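As a side note, the branches differ only in which measure they pick, so the whole thing can be written more compactly; a sketch, assuming the same table and measure names as above:

dynamic_Pivot_compact =
VAR w = SELECTEDVALUE ( pivot[order] )
RETURN
    SWITCH (
        TRUE (),
        ISBLANK ( w ), BLANK (),
        w = -4, [Install_qty_average_week],
        w <= 0, CALCULATE ( [Installed_qty], MstDates[week_number] = w ),
        w >= 1, CALCULATE ( [Forecast_Qty], MstDates[week_number] = w )
    )

The 0.5 spacer matches no branch and returns blank, which is exactly the empty column we want.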

And voila, an asymmetrical matrix visual in all its glory 😊


Connecting PowerBI to Primavera: Implementing Row Level Security

Primavera has a very comprehensive security mechanism, based on OBS, EPS, and user access; it is very mature and rigorous. But once you have a direct connection to the database backend, all this security disappears: the connection gives you access to everything in the database.

This blog post is not a comprehensive security introduction to Primavera reporting, but rather a gentle introduction to RLS in PowerBI and how you can leverage it in the simplest form possible, and a reminder why security is a very important consideration, especially when you deal with a portfolio and multiple divisions in the same company.

If you don't want to read the whole blog, I think all I want to say is:

  • Connecting to the Primavera database is not hard.
  • Only the Primavera admin or IT should have access to the Primavera database connection.
  • There are multiple solutions to implement project-level access, particularly if you are dealing with a multi-division portfolio.

What is row level security?

It is the ability to access a table but read only some of its rows. For example, you want to read the table TASK, which shows all the activities for all the projects; obviously, you need your users to view only the activities that belong to the projects assigned to them.

The Problem

The Primavera administrator wants to create a company-wide reporting system based only on the Primavera database. RLS is implemented in PowerBI, everyone sees only the data that belongs to them, happy days. Then the planning manager from division X likes the reports but wants to use other data sources too, not only Primavera but Ecosys, Cobra, the progress measurement system, etc., and he still needs to see only his projects. Planner Y doesn't care about PowerBI and wants the raw data to do his own stuff using some obscure VBA Excel. PowerBI, as of this writing, cannot connect to multiple datasets from other reports, and you can't mix a live connection with imports from other sources.

Solutions

Whatever you do, you need to have only one connection to the Primavera database. Don't give multiple users access to the production database; that's bad (your DBA will probably not allow it anyway). I know it is still read-only, but it is bad practice: if you write an expensive query against a reporting server, it is annoying, but slowing down the Primavera server will get you angry planners from everywhere (we are very famous for being grumpy).

  1. Reporting Database

The ideal solution: just spin up a cheap SQL instance on Azure and make sure it is in the same region as your PowerBI tenant (no egress fees), then implement RLS there for the people who want to author reports (viewer access will be handled in PowerBI). Any IT department can easily create a small pipeline to copy data from the Primavera database (Oracle or SQL Server, it doesn't matter), and you get your data fresh every morning or on whatever schedule you want. (Bonus point: no PowerBI gateway between PowerBI and SQL Server, as both are in Azure.)

  2. PowerBI Dataflows

If SQL Server is not an option, PowerBI Dataflows seems like a perfect solution here: you connect once to the Primavera database and you can share the results with other users, and PowerBI will be hitting the Dataflows instead of the production server. Perfect? Unfortunately, Dataflows results are just CSV files, no RLS; either you read the whole TASK table or you don't.

  3. Analyse in Excel

Personally, I think it is the most powerful feature of PowerBI. When you have access to a report in PowerBI and you click Analyse in Excel, you get access to all the data behind the model; not only that, you can create a table that fetches the data, and bingo, it does honour RLS, you see only your stuff. So the workflow will be something like this; for how to use Analyse in Excel, please read this excellent blog by Chris Webb.

Let's see how to use RLS in PowerBI

RLS is a very vast subject, with many different implementations and nuances; in real life you need something like hierarchical security, like the Primavera OBS, and you should read this excellent series of blogs from Reza. Let's say I want to grant view access to a couple of projects to two users (Viewer@projectscontrols and test@projectscontrols); those users are not necessarily planners, and they don't have access to Primavera.

I just used a project user-defined field and typed the full address; you need the email address, as that is the format PowerBI understands. The good thing about UDFs is that you can put multiple values separated by commas. I pulled the following tables from the P6 database (UDFTYPE, UDFVALUE), and just using PowerQuery to split and unpivot, I got this little table.

Now I have the Project ID and the username of the users who can access the projects.

Edit the relationship between the access table and the PROJECT table

Manage roles

Basically, the access table gets filtered by PowerBI, then it filters the table PROJECT, which in turn filters the table TASK.
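For reference, the role definition itself is a one-line DAX filter on the access table; something like the following, where the column name is my own illustration, yours will match whatever you built from the UDF tables:

-- filter expression of the Viewer role, applied on the access table
access[user_email] = USERPRINCIPALNAME ()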

Publish to the PowerBI service with the owner account; the user Viewer should only have Read and Build access (you need Build to use the dataset in other reports), as RLS works only for users with the Viewer role.

Now let's see what the user Viewer will see.

Yes, only 7 projects are visible to that user. Let's try analysing in Excel.

Once I select the access table, only the viewer's own row gets returned.

Now let's query only the table TASK (yes, DAX is a query language too).
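The simplest query returns the whole table, already filtered by RLS for the current user; a minimal sketch (task_code and task_name are the standard P6 column names, adjust to your model):

EVALUATE TASK

-- or just a couple of columns
EVALUATE
SELECTCOLUMNS ( TASK, "Activity ID", TASK[task_code], "Activity Name", TASK[task_name] )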

Voila, we have the table TASK filtered, with only 7 projects.

Now you can add more tables, load those tables into another PowerBI dataset, and build your own enhanced reports.

Reproducing Primavera Select Columns in PowerBI; aka Dynamically adding measures to a Matrix

TL;DR: the online report is here.

Primavera P6 calculates a lot of rich metrics for a schedule. We all know the basics, start date and finish date, but there are all kinds of other values (cost, labor hours, nonlabor, float, duration, etc.)

The challenge is that different people want to see different measures, and once you publish your report, the viewer cannot change the visual. Microsoft is working on this feature, but it will take another couple of months to be released.

The trick was suggested by Kasper de Jonge, and in a random chat @DingbatData and @_Ivan_Bond told me they already use it to solve similar issues; actually, it is very easy.

Anyway, the purpose is to reproduce something like this from Primavera

  1. Create a new table that contains all the measures

Instead of copying and pasting all the measures, I just used DAX Studio connected to my PowerBI Desktop to generate a list of all the measures; read this link.

The Category is just there to make it easy to select which measure to show; the index is to keep the same sort order in the slicer, as I don't want to show Actual Finish before Actual Start.

2. Create a master measure that checks whether the value is selected.

Using SWITCH and SELECTEDVALUE gives the result; I only had to add some conditions to format date results so they show as dates, not numbers. You can vote on this idea.

Here is a snapshot of the first 4 measures (currently I have 29 in the model).

Selected_Measure =
SWITCH (
    SELECTEDVALUE ( dummy_meaures[MEASURE_Values], BLANK () ),
    "Task_Count", [Task_Count],
    "Budget Labor Units", [Budget Labor Units],
    "Actual Finish", SWITCH (
        [Actual Finish],
        BLANK (), BLANK (),
        FORMAT ( [Actual Finish], "dd-mm-yy" )
    ),
    "%_Labor_units", SWITCH (
        [%_Labor_units],
        BLANK (), BLANK (),
        FORMAT ( [%_Labor_units], "0.00%" )
    )
)

3. Add the column MEASURE_Values to the Columns area of the matrix and Selected_Measure to the Values area.

4. Voila

5. Bonus

I am using bookmarks to toggle the Select Columns button; I think it is freaking cool. Hopefully, in the short term, PowerBI adds more features so we can build not only dashboards but real application interfaces.

Connecting PowerBI to Primavera Database – Part 2 (WBS report)

In the first blog post of this series, we showed how to connect to the Primavera SQL Server database; in this post we build our first report, using only the three tables TASK, PROJECT, and PROJWBS.

You can download the pbix here; an online report is published here.

I will not show the details of every step; you need a basic understanding of PowerQuery and DAX, but I will highlight some aspects of the Primavera database schema that you should be aware of.

Reproduce Project View

This view shows all the projects grouped by WBS, with the measures start date, finish date, and budget labor units,

and here is the equivalent report in PowerBI

PROJWBS

The table PROJWBS stores the EPS/WBS data for all projects. It is represented in the database as parent/child IDs, which can't be used directly by PowerBI; first we need to flatten the data into multiple level columns so we can show it in a matrix visual. So basically we move from this format

to this format

You have multiple options, using SQL, DAX, or PowerQuery. For PowerQuery, here is an excellent resource by Imke Feldmann; for our example I am using DAX, and the canonical reference is by Marco Russo.
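For the record, the DAX approach boils down to a PATH column plus one lookup column per hierarchy level on PROJWBS; a minimal sketch, assuming parent_wbs_id is blank for the root nodes:

WBS_Path =
PATH ( PROJWBS[wbs_id], PROJWBS[parent_wbs_id] )

Level1 =
LOOKUPVALUE (
    PROJWBS[wbs_name],
    PROJWBS[wbs_id], PATHITEM ( PROJWBS[WBS_Path], 1, INTEGER )
)

-- Level2, Level3, ... repeat with PATHITEM ( ..., 2, INTEGER ), PATHITEM ( ..., 3, INTEGER ), down to the deepest level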

Just make sure, when you import a table from SQL Server, to apply the filter [delete_session_id] = null, because Primavera doesn't directly delete data; instead it uses something called soft delete, i.e., the item is no longer shown in the client but is still in the database and will be deleted later. Anyway, for PROJWBS also remove all the template WBS rows (I think they are used by the EPPM web client).

TASK

TASK is straightforward; it stores all the tasks of the projects. Apply the same filter, [delete_session_id] = null.

PROJECT

We use the PROJECT table only to filter out the baselines. In Primavera, the current project and its baselines are saved in the database exactly the same way (a very powerful paradigm), but for our report we want to show the activities of the current project only. Too easy: [orig_proj_id] = null. And the best part: we don't have to write any queries, PowerQuery simply generates the SQL for the database (that's awesome).

Simple Data Model

The two tables are connected by the field wbs_id. We added another copy of the table TASK as a dimension table for reporting (just activity ID and activity name), and we have this simple data model; I like to save measures in a separate dummy table.

As you can see, building a data model is relatively easy; the complexity starts when you want to add more measures. For example, for total float you need to connect to the table CALENDAR; if you want cost, you need to connect to other tables; and if you want spreads, it becomes a little trickier (hint: they are not saved in the database).

Hopefully by now, instead of asking how to connect to the Primavera database, the interesting question becomes: in which table is the data saved, and how do I join two separate tables to get the report I want?

If you are still reading, I would appreciate it if you could vote on this idea; unfortunately, you can't deactivate table sorting in PowerBI. In this particular report, the sorting is already defined by the WBS; if the user clicks on a header, the order changes. They can still reset the order using the measure sort, but it is still very annoying.

Connecting PowerBI to Primavera Database, Part 1

I think one of the most asked questions when people talk about Primavera and PowerBI is how to connect to the database. OK, the good news is that the connection itself is easy; the bad news is that extracting useful information is a bit of work.

Just to show how it works, I am using a temporary installation on my personal laptop, as obviously I don't have access to my production database.

I am using a Developer edition of SQL Server and an evaluation copy of EPPM; Oracle allows the use of evaluation copies of most of its software for the first 45 days, and you can download a copy from here. You need SSMS too.

For the purpose of this blog, we will query the "normal" Primavera tables, not the extended schema, which is a group of tables and views designed specifically for reporting; those extra tables are empty by default, and you need to configure the publishing service (I will discuss it in a future blog). Please note I already blogged about how to connect when using SQLite in the case of standalone P6 Professional.

Connect to SQL Server using SSMS

When you install Primavera, you get to define 4 user accounts:

  • sa: the database admin account (not the admin for the Primavera application).
  • privuser, pubuser: used to connect the Primavera application to the database.
  • pxrptuser: user account for reporting.

We will use sa to connect to the database.

When you click on connect you get this

The database itself has 320 tables; you can check that by running this SQL script:

USE PMDB
GO

SELECT *
FROM sys.Tables
GO

Create a read-only user

Connecting using the admin account is just very bad practice, and I don't want to mess with the existing accounts, so instead we will create a read-only user account.

  1. Create a new login
  • Create a password
  • Map the user to PMDB
  • Assign a new role

Instead of giving access to all 320 tables, we create a new role (read_only) and assign just the 3 most important tables in the database; you can add more tables later. We granted SELECT only, so no write access.

Connect PowerBI to SQL Server using the read-only user

And voila, our tables are now visible in PowerBI.

So the answer to how to connect to the Primavera database from PowerBI is: you need a username, a password, and the server name. The challenge is how to extract meaningful reports from those tables.

What's next

At this stage, you need to get familiar with the Primavera schema. Yes, it is 320 tables, but the basic ones are three, and usually for my reporting I use around 10; I wrote an introduction to the Primavera schema 6 years ago, and I hope it is still relevant.

Part 2 is published here

For security implications, please read this.

Using WebJobs to scrape a public website and copy data to Azure Blob Storage

When I started the AEMO dashboard, I had a hard time dealing with the PowerBI gateway: it just sits there, and my laptop has to be online whenever I need to schedule a refresh. It just annoyed me, and I could not understand why cloud-based data needs an on-premises gateway anyway. Obviously, I later learned that, strictly speaking, it was not required; there was just an undocumented feature to get rid of it (the trick is in the first blog, thanks @Rad_Reza).

But before I was aware of that, I went down some rabbit holes dealing with new tools that are outside my comfort zone, and I think they are worth sharing.

My first thought was, instead of accessing the data directly from the website, let's copy the data to cloud storage and then read it from there. I already have a Google storage account, which is very generous with a free 5 GB of storage, and my data is not big, around 2 GB of zipped CSV.

First setback: there is no native connector to Google storage, and even if there were, we have something called egress fees. In a nutshell, cloud storage is really low cost and loading data in is free, but getting your data out is not free unless it is to the same provider and the same region; most cloud vendors use the same model. As my data will be processed in PowerBI, the clear choice is Azure Blob Storage.

Azure Blob Storage

The setup is very simple; I used the following options:

  • the same region as my PowerBI region (otherwise you pay egress fees)
  • for replication I used LRS

As PowerBI doesn't support Data Lake Storage Gen2, I used the classic Blob Storage.

Let’s move some data 

Anyway, now that I have my storage, I need a tool to copy the files from http://nemweb.com.au/Reports/Current/Daily_Reports/ to my storage account.

Azure Data Factory

When you want to copy data, the official tool in Azure is Data Factory. I tried to play around with the copy activity; it is straightforward, and my first attempt did work and was fast, actually too fast 😊: no zip was transferred, but rather an HTML page.

Probably the copy activity can handle this case just fine, but when you use your own credit card on a cloud tool and you don't know what you are doing, it is better to step back and take the time to understand how it works. I deleted the newly created resources and went for the second option: Python!

 PYTHON

Normally I go with R, but Blob Storage has no API for R, and I have very limited experience with Python, just using it for the excellent package Altair; let's try something new.

I was very pleasantly surprised: the amount of documentation for Python is just amazing. Once I asked a question on Stack Overflow and got a very succinct answer in less than a minute; no one was judgemental or downvoted my question (and the question was very basic). The only drawback is that sometimes the code works well for Python 2 but I am using Python 3. Anyway, enough talking, let's show some pseudo-code.

Step 1: get a list of file names from the website

from urllib.request import urlopen
import re

# download the page and extract all the zip file names
url = "web address where the files are saved"
result = urlopen(url).read().decode('utf-8')
pattern = re.compile(r'[\w.]*\.zip')
List_website = pattern.findall(result)

Here is a snapshot of the results; the full list is 60 items.

['PUBLIC_DAILY_201904260000_20190427040503.zip', 'PUBLIC_DAILY_201904260000_20190427040503.zip', 'PUBLIC_DAILY_201904270000_20190428040502.zip']

Step 2: get a list of file names from the blob storage

In the first run the list is empty, as we have not loaded anything yet; I loaded a couple of files manually just to test that it was working. The API for Blob Storage is very simple, you only need to provide your storage account name and key, and I love that.

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='xxxxxx', account_key='xxxxxx')

# list the blobs already copied to the container
generator = block_blob_service.list_blobs(container_name, prefix="current/")
list_azure = [blob.name for blob in generator]

And in the same way, you get a list of names.

Step 3: get a list of files that exist on the website and don't exist in the storage

The code in Python is very simple: it is simply the subtraction of two sets, converted back to a list using the function list (I get why people like Python).

files_to_upload = list(set(List_website)-set(list_azure))

Step 4: upload the new files to Azure Blob Storage

Again, the Azure APIs are very simple and clear. I had only one issue: when the script uploads in a loop, it does not wait until a transfer is completed before jumping to the next file. My workaround was just to poll and wait until each copy finishes (a synchronous copy is supported, but not in this scenario where the input is a URL); anyway, I got the answer on Stack Overflow.

for x in files_to_upload:

    # start a server-side copy from the public URL into the container
    block_blob_service.copy_blob(container_name, x, url + x)

    # poll the copy status; wait until it is completed before the next file
    copy_status = block_blob_service.get_blob_properties(container_name, x)
    while str(copy_status.properties.copy.status) != "success":
        copy_status = block_blob_service.get_blob_properties(container_name, x)

Basically, wait until the status of the copy is success before moving to the next item (did I say I love Python syntax?).

The full script is here.

WebJobs: a free job scheduler

OK, so we have a script that works; now we need to run it on a schedule, once per day at 5 AM. Keep in mind the whole purpose of this workflow is to avoid on-premises software; I just need a service that runs a script in the cloud on a schedule, and as I am already on Azure, let's stick with that ecosystem.

And as it is a personal project, I prefer a free solution; my script runs only once every 24 hours, for a couple of minutes. A quick Google search and I found this little treasure; I will not repeat the steps here. WebJobs is a service that does just that.

Note that the package azure-storage-blob is not a base package in Python; you need to install it first in WebJobs. The schedule functionality is very flexible, as it uses CRON expressions (for example, the six-field expression 0 0 5 * * * runs every day at 5:00 AM); I wish we had something like that in PowerBI Dataflows.

End results 

Every day at exactly 5 AM, a new file shows up in the Azure storage. Although I don't need those files any more (I am now using another approach to load the files directly into PowerBI), it is important to build a data lake (yes, I just said that; I am joking, a data lake is a folder in the cloud where you save the raw files, nothing more). Storage is cheap, but most importantly the requirements may change; I may need to report on another dimension, and it is crucial to keep the raw, unprocessed data.

Takeaways

  • Python is awesome.
  • The Azure APIs for Python are straightforward.
  • Azure is awesome.
  • Be careful of egress fees.
  • CRON is awesome; I wish it were supported in PowerBI Dataflows.
  • I wish PowerBI Dataflows could save a raw file; PowerQuery is amazing, but it does not copy raw files.
  • I wish WebJobs added support for R.

Tracking AEMO data using PowerBI

I was looking for the power production of a particular solar farm, and I couldn't find any public dashboard that shows this level of detail; all I could find was high-level aggregated data. (Later, after I built the dashboard, I found this excellent resource, Nemlog.)

The dashboard is published here: https://djouallah.github.io/AEMO-POWERBI/ ; it is refreshed every day at 5 AM.


How it works

The Australian Energy Market Operator (AEMO) publishes all kinds of datasets; one, I believe, is real time (it requires a subscription), but for my particular use case I am interested in this dataset:

http://www.nemweb.com.au/#daily-reports

There are two folders:

  1. Current: the last 60 days of data (current day not included, updated at 4 AM)
  2. Archive: the last 13 months of data (current month not included, updated monthly)

Pulling data from a website and building a dashboard in PowerBI is straightforward; it took me a couple of hours on a weekend. The problem is how to maintain it.

Ideally, you build a dashboard and all the refreshes are done by the service, which was not the case here:

  • Pulling the data directly from the archive is very slow, it takes nearly 3 hours (unzip, filter only the data we are interested in), and it is not sustainable, as the earliest month gets removed from the website; I like to keep the history, and it is really bad practice to download the same data every day.
  • To keep the history, we need to save the archive somewhere else; too easy, just save it on a local laptop.
  • History issue solved, but now we have created a new problem: on-premises data requires a gateway, which means you need to install software on your laptop, and obviously the laptop must be on when you do the refresh.

After playing around with some options, I came up with this workflow:

  • Create a local folder that contains all the archive files.
  • Create a PowerBI data model on the desktop just to process the archive data.
  • Export the two clean tables (price and production) to CSV using DAX Studio!
  • Load the CSVs to Azure Blob Storage (to get rid of the gateway).
  • Load the current zip files from the website; this does not require the gateway, but you need the following considerations: use a relative path in the Web.Contents function (see Chris Webb's blog and @TheBIccountant)

Web.Contents("http://www.nemweb.com.au/REPORTS/CURRENT/",[RelativePath = "Daily_Reports/" ])

Don't use the Web.Page function; parse the page using XML or CSV instead (thanks Reza Rad for the advice).

So you can use something like this:

Table.FromColumns({Lines.FromBinary(Web.Contents("http://www.nemweb.com.au/REPORTS/CURRENT/",[RelativePath = "Daily_Reports/" ]), null, null, 65001)})

  • Append the data from Azure Blob Storage and the current folder from the website; the refresh is now very fast, as PowerBI just reads the CSVs without any transformation.
  • Publish to web.

So far so good: I managed to get rid of the gateway, the dashboard is refreshed automatically in the service, and there is no maintenance for 60 days.

As the current folder contains data for the last 60 days only, you need to update the initial CSV files:

  • Download the pbix from the service, export the CSVs, and upload them to Blob Storage; you need to do that only once every 60 days.

Pros

  • PowerBI Publish to Web is an amazing service, and it is totally free.
  • A powerful solution without writing any code.
  • The PowerBI free license is free 🙂

Cons

  • Publish to Web is not suitable for real time, as it takes nearly 1 hour to propagate updates to the website; that's why I can't publish the current day's data, which is updated every 5 minutes.
  • Publish to Web does not include exporting data from visuals.
  • Pricing for Azure Blob Storage can be tricky: storage itself is very cheap, data upload is free, and download in the same region is free (for example, blob to the PowerBI service), but when you read data from the blob to PowerBI Desktop you incur charges; so just be careful, it is not your OneDrive model, where download is free.

We showed here a simple workflow using a PowerBI free license and Azure Blob Storage (later moved to Dropbox, see the edits below); it is very easy, but with one inconvenience: you need a manual operation once every two months, which is a bit annoying.

Edit 23-June-2019: after I published this blog, I got excellent feedback from Maxim Zelensky: using PowerBI Dataflows (requires a Pro license), we can fully automate the whole process, as with Dataflows we can have a self-referencing query. I am not going to repeat it here; go and read it.

Edit 24-June-2019: as it is a personal project and the data is public, I am not really excited about using a paid service to host the CSV files, so I moved the two CSV files from Blob Storage to Dropbox, which is totally free; now the whole dashboard infrastructure is free. Good work, Microsoft.

Edit 26-June-2019: a proper solution would be to save the raw data in a data lake; see here.