in the first blog post of this series, we showed how to connect to Primavera SQL server, in this blog we build our first report, we use only the three tables TASK, PROJECT, and PROJWBS
you can download the pbix here, an online report is published here
I will not show the details of every steps, you need to have a basic understanding of PowerQuery and DAX, but i will highlights some aspects of Primavera Database schema that you should be aware of.
Reproduce Project View
this view show all the projects grouped by WBS and show measures, start date, finish date and budget labor units
and here is the equivalent report in PowerBI
the table PROJWBS store the EPS/WBS data for all projects, it is represented the database as a parent, child ID, which can’t be used directly by PowerBI, first we need to flatten the data to multiple levels so we can show it in a matrix visual, so basically moved from this format
to this format
you have multiple options either using SQL , DAX or PowerQuery, for Powerquery here is an excellent resource by the Imke Feldmann , for our example I am using DAX , the canonical reference is by Marco Russo
just make sure when you import a table from SQL server to have this filter [delete_session_id] = null , because Primavera don’t directly delete data, but instead have something called soft delete, ie; the items is not shown in the client but it is still in the database and will be deleted later, anyway for PROJWBS remove all the template WBS, (I think they are used by the EPPM web client)
task is straightforward it save all the tasks of the project, same filter [delete_session_id] = null
we use project table only to filter the baseline out, in Primavera current project and baseline are saved in the database exactly the same way ( that very powerful paradigm ), but for our report we want to show the activities only for the current project, too easy [orig_proj_id] = null and the best part, we don’t have to write any queries, Powerquery simply generate the SQL for the database ( that’s awesome)
Simple Data Model
the two tables are connected by the field wbs_id, we added another copy of the table TASK as a dimension table for reporting ( just activity id, and activity name), and we have this simple data model, I like to save measures in a separate dummy tables
as you can see, building a data model is relatively easy, the complexity start when you want to add more measures, for example, total float, you need to connect to the table CALENDAR, if you want cost, you need to connect to other tables, and if you want spread it will become a little trickier ( hint it is not saved in the database)
hopefully by now, instead of asking how to connect to Primavera Database, the interesting question become, in which table the data is saved and how to join two separate tables to get the report you want
if you are still reading, I will appreciate if you can vote on this idea, unfortunately you can’t dis-activate table sorting in PowerBI, in this particular report, the sorting is already defined by the WBS, if the user click on the header, the order will change, they can still reset the order using the measure sort, still very annoying
I think one of the most asked question when some talk about
Primavera and PowerBI, is how to connect to the database, ok, the good news is,
the connection itself is easy, the bad news, extracting useful information is a
bit of work.
Just to show how it work, I am using a temporary
installation in my personal laptop, as obviously I don’t have access to my
I am using a developer edition of SQL Server 2006, and an evaluation copy of EPPM, oracle allow the use the evaluation of most of its software for the first 45 days, you can download a copy from here, you need SSMS too
For the purpose of this blog, we will query the “normal” Primvera tables, for the extended schema, which is a groups of tables and views design specifically for reporting, but those extra tables are empty per default and you need to configure publishing service ( will discuss it in a future blog), please note I already blogged about how to connect when using Sqlite in the case of standalone P6 professional
Connect to SQL server using SSMS
When you install Primavera, you get to
define 4 user account
sa : the database admin account (not the admin
for primavera application).
Privuser, pubuser : used to connect Primavera
app to the database
Pxrptuser : user account for reporting
use sa to connect to the database
When you click on connect you get this
The database itself has 320 tables; you can
check that by running this SQL script
Create a read only user
Connecting using the admin account is just very bad
practise, and I don’t want to mess with the existing account, so instead we
will create a read only user account
Create a New Login
Map the user the PMDB
Assign a new role
Instead of having access to the 320 tables, we create a new role (read_only) and we just assign the 3 most important table in the database, you can add later more tables, we granted select only, so no read access
Connect PowerBI to SQL using read only user
and Voila our Tables are now visible in PowerBI
so the answer to how to connect to Primavera Database from PowerBI is you need a user name, password and the server name, the challenge is how to extract meaningful reports from those tables ?
at this stage, you need to get yourself familiar with Primavera Schema, yes it is 320 tables, but the basic one are three, and usually for my reporting I use around 10, I wrote an introduction to Primavera schema 6 years ago, I hope it is still relevant
When I start that AEMO Dashboard , I had a hard time dealing with PowerBI gateway, it is just setting there, my laptop has to be online whenever I need to schedule a refresh, it just annoyed me, and I could not understand how cloud based data needs on-premises gateway anyway, obviously later I learned that strictly speaking it was not required, there was just undocumented feature to get away of it ( the trick is in the first blog, thanks @Rad_Reza).
but before I was aware of that, I went to some rabbit holes dealing with new tools that are out of my comfort zone, and I think they are worth sharing.
my first thought was instead of accessing the data directly from the website, let’s instead copy the data to a cloud storage then read it from there, I have already a google storage account, it is very generous with a free 5GB storage, my data is not big around 2 GB of zipped csv..
first setback, there is no native connector to Google storage and even if there was ,we have something called egress fees, in a nutshell, cloud storage is really low cost, loading data is free, but getting your data out is not free, unless it is for the same provider and the same region, most of the cloud vendors use the same model, as my data will be processed in PowerBI, the clear choice is azure blob storage
Azure Blob Storage
the setup is very simple I used the following options :
the same region as my PowerBI region ( otherwise your pay the egress fees)
for replication I used LRS
as PowerBI don’t support data lake V2, I used the classical Blob Storage.
When you want to copy data, the official tool in azure is data factory, I tried to play around with copy activities, it is straightforward, my first attempt did work and it was fast , actually too fast 😊, no zip was transferred but rather an HTML
probably copy data just handle this case just fine, but when you use your own credit card on a cloud tool and you don’t know what you are doing, better stay back and take the time to understand how it works, I deleted the new created resources and went to the second option, Python !!!
Normally I go with R but blob storage has no API for R, I have very limited experience with Python , just using it for the excellent package altair , let’s try something new.
I was very pleasantly surprised, the amount of documentation for Python is just amazing, actually once I asked a question on stackoverflow and got a very succinct answer in less than a minute, no one was judgemental or downvoted my question ( the question was very basic). the only drawback is that sometimes the code works well for python 2, but I am using Python 3 anyway enough talking let’s show some pseudo code.
step 1 : get a list of files name from the web site
url = “web address where the files are saved”
result = urlopen(url).read().decode(‘utf-8’)
pattern = re.compile(r'[\w.]*.zip’)
here is a snapshot of the results, the full list is 60 items.
step 2 : get a list of files name from the blob storage
in the first run, the list is empty as we did not load anything yet, I load a couple of files manually just to test if it is working, the API for blob storage are very simple, you only need to provide your storage account name and key and I love that.
step 4 : Upload the new files to Azure Blob Storage
the same here, the Azure API are very simple and clear, I had only when issues, when the script upload in a loop, it does not wait until the transfer is completed before jumping to the next file, my workaround was just to use sleep ( sync is supported but not in this scenario where the input is from an url), anyone i got the answer in stackoverflow
ok, so we do have a script that works, now we need to run it on a schedule, once per day at 5 AM, keep in mind the whole purpose of this workflow is not to use on-premise software, I just need to find the service that runs a script on the cloud on a schedule, as I am already on azure, let’s stick in that ecosystem.
and it is a personal project, I prefer a free solution, my script runs only every 24 hours, for a couple of minutes, a quick google search and i find this little treasure, I will not repeat here the steps, WebJobs is a service that just do that.
note that the package azure-blob-storage is not a base package in Python you need to install first in WebJobs, the schedule functionality is very flexible as it is using CRON, I wish we had something like that in PowerBI Dataflows.
Every day at exactly 5 AM, a new file show up in the azure storage, although I don’t need those files, I am using now another approach to load the files directly in PowerBI, it is important to build a data lake ( yes, I just said that, I am just joke, data lake is folder in the cloud where you save the raw files, nothing more), storage is cheap but most importantly the requirement may change, I may need to report on another dimension and it is crucial to keep the raw unprocessed data.
Python is awesome
Azure API for python are straightforward
Azure is awesome.
Be careful of Egress fees
CRON is awesome wish it was supported in PowerBI dataflows.
Wish PowerBI dataflows could save a raw file, Powerquery is amazing but it does not copy raw files.
I was looking for the Power Production of a particular solar farm, and I couldn’t find any public dashboard that show this level of details, all I could find was high level aggregated data (Later after I built the dashboard I found this excellent resources Nemlog)
Current, last 60 days of data ( current day not included, Updated at 4 AM)
Archive : the last 13 Months of data ( current month not included, Updated Monthly)
Pulling data from a website and building a dashboard in PowerBI is straightforward, it took me a couple of hours on a weekend to do it, the problem is how to maintain it.
Ideally, you build a dashboard and all the refresh is done by the service, which was not the case here
Pulling the data directly from the archive is very slow, it takes nearly 3 hours ( unzip, filters only the data we are interested in), and is not sustainable as the earliest month will be removed from the website, I like to keep the history, and it is really bad practise to download the same data every day
To keep the history, we need to save the archive somewhere else, too easy , just save it on a local laptop
History issues solved, now we created a new problem, on-premise data require a gateway, basically you need to install a software on your laptop, and obviously the laptop must be on when you do the refresh
After playing around of some options, I come up with this workflow
Create a local folder that contains all the archive files.
Create a PowerBI data model on the desktop just to process the archive data
Export to clean tables ( price and Production ) to CSV using DAX studio !!!!!!
Load the CSV to azure blob storage ( to get rid of the gateway)
Load the current zip files from the web site , it does not require the gateway, but you need the following consideration – Use relative path in web.content functions ( see Chris Blog) and @TheBIccountant
Append the data from azure blob storage and the current folder from the web site, the refresh is now very fast, as PowerBI just read the csv without any transformation
Publish to web
Good so far, I manage to get rid of the gateway, the dashboard is refreshed automatically in the service, no maintenance for 60 days.
as the current folders contains data for the current 60 days only, you need to update the initial CSV files.
Download the pbix from the service, export the csv , and upload to blob storage, you need to do that only once every 60 days.
PowerBI Publish to web is an amazing service and it is totally free
Powerful solution without writing any codes
PowerBI free license is free 🙂
Publish to web is not suitable for real time, as it takes nearly 1 hours to propagate the update to the web site, that’s why I can’t publish the current day data, which is updated every 5 minutes.
Publish to web does not include export data from the visual
pricing for azure blob storage can be tricky : storage itself is very cheap, data upload is free, download in the same region is free ( for example blob to PowerBI service), but when you read data from the blob to PowerBI desktop you incurs charges, so just be careful, it is not your Onedrive model, where download is free.
we showed here a simple workflow using PowerBI free license and azure blob storage (Dropbox), it is very easy but with one inconvenient you need manual operation once every two months, that’s a bit annoying.
edit 23-June-2019 :after I published this blog, I got an excellent feedback from Maxim Zelensky, actually using PowerBI dataflows ( require a PRO license), we can fully automated the whole process, as with dataflows we can have a self reference query, I am not going to repeated here, go and read it
edit 24-June-2019: as it is a personal project, and the data is public, I am not really excited about using a paid service to host the CSV files, I moved the two csv files from blob storage to dropbox, it is totally free, so the whole dashboard infrastructure is free, Good work Microsoft
edit 26-June-2019 : a proper solution will be to save the raw data in a data lake, see here
Feel free to provide comments directly to my LinkedIn Post that references this article and contact me directly – Darrin Kinney
Analyzing resources in P6 is a common responsibility of all planners. However, ultimately the data typically stays in the realm of the planner, and never properly given to the specific project managers and project engineers who actually need to execute the work. Thus, there exist a huge opportunity in the digital world to extract data from P6, and present it to the masses. Planners are penultimate professionals in this. We live for this. We take the resource assignment data from P6 and work our magic in excel creating a suite of reports and graphs (s-curves).
However, all the excel work is customized. The ability to quickly drill into the data from the data table or the graph is just too difficult for management. The need to create custom graphs, takes time. Instead, we can create something quite amazing using PowerBI.
Using PowerBI we can create a simply easy to use dashboard that provide nearly unlimited flexibility to display both schedule and resource information to any user on a project. The below is a walk through to create a simple view with a simple schedule. However, I have run this through a schedule with 5,000 activities and multiple resources including both manhour resources and quantity resources. It is situations where you have data overload that PowerBI shines. Get the data out from the planner, and into the hands of the project management group.
Step 1 – Get resource assignment data
Using the resource assignment tab in P6, remove all the grouping and just display everything. You will want to ensure some key fields are available: Start Data, End Data, Resource Data, Resource ID and Type, Activity Name and ID, plus some WBS and grouping data (ex contract).
Step 2 – Copy-Paste Resource Assignment Data into Excel
A simply copy-paste from P6 into excel will suffice. It is really that easy. Sometimes the data from P6 may not contain the right descriptions, so this is an opportunity to use some vlookups (or Index(Match()). The data also needs to be presented in a table. Select All and use CNTL-T to convert to a table.
Step 3 – Unpivot Data using Power Query
This step can be done directly inside PowerBI Desktop, or in perhaps a more flexible excel environment. I prefer to do as much data handling in excel to reduce the complications once inside PowerBI; however, many options exist.
Inside Excel (ensure you have power query add in), select the columns with the dates, and click “UnPinvot Columns”. Close and Exit.
After running the unpivot, you will see each data column turned into a row. This is an easier data format for use in databases and a trick universal not just for powerBI, but anytime you might be dealing with databases.
Step 4 – Import Into PowerBI
PowerBI has very quick and easy import routines. This post is not meant to be a specific click by click guide. There are easy import routines you can find to import excel files. Again, as indicated above, just make sure your data is converted into a table. You can directly create a datasource using something similar to the below
The parameters you will want to use are roughly as noted below
The PowerBI file use AREA CHART. This is an easy to use built in graph feature of PowerBI. You can use potential extension of line graphs to show a % Complete curve. However, from a base usage, the area chart provides the easiest visual.
A nice trick in the overall dashboard is to align the data range of the GANTT with the GRAPH. I have found some limitations to this exist, but in this example, it worked quite nicely.
The real power of this visual is through the use of slicers. Slicers allow for immediate filtering based on a variety of selections. This visuals uses a range of filters for Contract, Facility, SubFacility and ResourceID. Depending on the structure and data you extracted from the P6 resource assignment, your options here are unlimited.
This example just scratches the surface of what is possible. I found the above to be immediately useful to our project team in clearly visualizing the resources required over time. There is some manual effort to keep the source excel updated after changed to P6 – this is not a live linked datasource. There are numerous possible development extension people can use to customize this.
The biggest critique to the dashboard is the lack of visibility into the Baseline dates and plan curves. For this, we can only hope someone builds a more robust custom visual to accommodate this. In the end, happy planning.
The Interactive report is published here, you can download the PBIX here.
In PowerBI or Tableau or any BI solution by defaults when you have multiple slicers, the filter conditions is always, AND, in this particular case, the user want to filter a schedule based on a start date OR the end date OR the activity status, basically how to reproduce the following Primavera Filter using DAX
Using the normal slicers will not work as it will simply show the activities where all the conditions met, instead we will use some DAX techniques (disconnected slicers and filter using the result of a measure) to get a new behavior.
Let’s build a simple data mode.
1-The main table is task: it contains Activity ID, Activity Status, Start, Finish
2- add a date Table, Start_Date, don’t link it to any table, use it in a slicer , the table is generated using Powerquery,
3- add another Date Table, Mstdate, don’t link it either, use it in a slicer
4- add Activity status Table, don’t link it, , use it in a slicer