Using QGIS to prepare Custom Maps in PowerBI

This blog post documents my first experience with mapping in PowerBI. Usually the reports we produce are time series and pivot tables, but in this case there is a geographic dimension to the data, so I thought it was a good opportunity to try the map chart in PowerBI. It turned out to be a bit more difficult than I expected.

The data is thousands of piles spread over a huge area. My first attempt was simply to load the data into PowerBI and view it on a map, but for some reason PowerBI showed an empty map.

PowerBI expects the data to be in latitude and longitude, but my data uses easting and northing, so I had to convert it. There are plenty of online converters, but they are not practical as they don't support batch processing, and converting point by point is not an option.

After some googling, I found the excellent open source software QGIS. It was very straightforward: the software automatically converts the coordinate reference system (the technical term is reprojection). My data is in GDA94 / MGA zone 55 and the result needs to be in WGS 84.

Voilà, the data is ready for PowerBI.

map-1

That's a bit disappointing: PowerBI complains it cannot show all the points (PowerBI has a maximum of 3,500 points per chart). As a comparison, this is how Tableau shows the data:

tableau

Tableau doesn’t have a limitation on the number of points.

 

Alternative approach

 

As PowerBI cannot show all the points, one solution is to create a shapefile that groups the points into smaller areas, and again, this was trivial to do in QGIS.

QGIS will group the points based on a field you provide.

Group

 

sub-array

And voilà:

qgis

 

QGIS saves the layer as an ESRI shapefile, but PowerBI requires TopoJSON, so I used the excellent tool mapshaper.org.

Just make sure you import all the files, not only the .shp.

export

Now the TopoJSON is ready to be loaded in PowerBI

topojson

 

Time to celebrate 🙂 Not really, there is a problem: I want to show a different colour based on the status. If a sub-array is completed, I want it to show yellow; if it is > 75%, I want another colour; and so on. The Shape Map in PowerBI does not offer this option.

Fortunately, Synoptic Panel has more options. It uses SVG as the map format, which I also got from mapshaper.org.

filled map

 

Preparing the custom map is a one-time operation; the colours will change as the construction progresses, you just need to assign which measures you want to show.

measures

What’s Next

As long as your data model is properly built, you can do a lot of interesting things: you can filter by type of work (piling, tracker installation, PV modules), or click on one sub-array and see in detail what has been installed and what is missing.

 

 


Build a Dashboard using Data Studio

Google Data Studio is a new reporting service from Google. It is still in beta, has rudimentary capabilities compared to the established players (PowerBI, Tableau and Qlik) and can be very slow, yet it has two killer features: it is free, and it has amazing sharing options.

Google's business model for this product is very simple: the visualisation layer is free, and while the preferred data source is BigQuery, which is a paid product, other data sources are free to use.

In the last two months, I have been experimenting with the product. I never managed to make it produce any meaningful charts, but that was my mistake, as I was thinking from a semantic-model background, which does not apply here.

Data Studio does not fully support multiple datasets: you can use multiple datasets in the same report, but not in the same chart, although you can filter multiple datasets using a common dimension. One solution is to create a master dataset that combines all the tables, the same concatenated-table approach used in Qlik.

Now, if you have a data warehouse you could simply write an SQL query to combine all the dimensions you want to report on. I don't, so I will use what I have: PowerBI desktop!! You could use RStudio too, or any data management tool you are familiar with.

So the approach is: build a data model in PowerBI desktop, query the model using DAX and export the result to CSV, then load the CSV into Data Studio and share the report. Data Studio is only used as a reporting service.

1- I will use this data model.

Data_Studio_1

2- And use this query to extract the dimensions and measures I want to report on:


EVALUATE
VAR Earned_Table =
    FILTER (
        ADDCOLUMNS (
            SUMMARIZECOLUMNS (
                TBPAYCODE[Commodity_ID],
                TBCommodity[Commodity],
                TBWBS[WBS_L3],
                TBDate[Project Cut Off],
                "Budget", tb_Measures[budget_hours],
                "Period", tb_Measures[Earned_Hours_Period],
                "Cumulative", tb_Measures[Earned_Hours_Cumulative]
            ),
            "TYPE", "Earned"
        ),
        MIN ( TBDate[date] ) <= [Cut off] && MIN ( TBDate[date] ) >= DATE ( 2012, 1, 1 )
    )
VAR Plan_Table =
    FILTER (
        ADDCOLUMNS (
            SUMMARIZECOLUMNS (
                TBPAYCODE[Commodity_ID],
                TBCommodity[Commodity],
                TBWBS[WBS_L3],
                TBDate[Project Cut Off],
                "Budget Hours", tb_Measures[budget_hours],
                "Planned Hours", tb_Measures[planned_Hours],
                "planned Hours Cumulative", tb_Measures[planned_Hours_Cum]
            ),
            "TYPE", "Planned"
        ),
        MIN ( TBDate[date] ) >= DATE ( 2012, 1, 1 )
    )
VAR Spent_Table =
    SELECTCOLUMNS (
        FILTER (
            SUMMARIZECOLUMNS (
                //TBPAYCODE[Commodity_ID],
                //TBCommodity[Commodity],
                TBWBS[WBS_L3],
                TBDate[Project Cut Off],
                "Budget", tb_Measures[budget_hours],
                "Period", tb_Measures[spent_Hours_Period],
                "Cumulative", tb_Measures[spent_Hours_Cum]
            ),
            MIN ( TBDate[date] ) <= [Cut off] && MIN ( TBDate[date] ) >= DATE ( 2012, 1, 1 )
        ),
        "Commodity_ID", BLANK (),
        "Commodity", BLANK (),
        "wbs_l3", [WBS_L3],
        "TBDate", [Project Cut Off],
        "Budget Hours", [budget_hours],
        "Planned Hours", [planned_Hours],
        "planned Hours Cumulative", [planned_Hours_Cum],
        "Table", "Spent"
    )
RETURN
    ADDCOLUMNS (
        UNION ( Earned_Table, plan_table, spent_table ),
        "Data_Date", IF ( [Project Cut Off] = [Cut off], 1, 0 )
    )

Basically, I am using UNION to concatenate the values for Earned Hours, Planned Hours and Spent Hours. I added a column TYPE to distinguish between the different tables, and in the case of Spent, I assign a blank value to Commodity_ID and Commodity as the measures don't have the same granularity.

I also added a column "Data_Date" so I can filter only the dates that are equal to the cut-off.

3-Load the CSV

Data_Studio_2

If a field's aggregation is None, then it is a dimension; otherwise it is a metric (measure). There are some functions to do calculations like sum and average, but it is not DAX, and it is really limited.

As an example, the measure Planned Hours = SUM(CASE WHEN REGEXP_MATCH(TYPE, '^Planned.*') THEN Period ELSE 0 END)

4-build the report

Data_Studio_3

You can view the report here, and the datasets here (Internet Explorer now works, although only Chrome, Firefox and Safari are officially supported).

Personally, the most interesting aspect of Data Studio is the integration with Google Drive: the reports you build show up in your Google Drive and you can organise them in different folders, for example grouping reports by department, shop, etc.

Data_Studio_4

I think it has potential. They got the sharing right; now they need to add some basic features (pivot tables, controlled cache, ...) and more options in the calculated field formulas (hint: returning null, for example, is not supported; there is a workaround, just create Metric*Metric/Metric, and if the metric is 0 it is a division error and GDS returns null in that case), and they will have data visualisation for the masses.

Using Power Query List.Dates to generate a time-phased budget and account for holidays

There are plenty of templates on the internet for spreading a budget using Excel formulas. In this blog post we will use the Power Query function "List.Dates", together with a calendar table to account for holidays and periods with reduced working hours; for example, under extreme weather conditions the working hours per day will be less than the standard working hours.

Obviously planning software handles this scenario very well, but contrary to popular belief, Excel is still the most widely used planning tool among non-planners (I have yet to see a cost controller or QS using planning software, even MS Project).

The function is straightforward:

List.Dates(start as date, count as number, step as duration)

In this case, count will be the duration in days, and step will be 1 day.
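For example (a made-up illustration, not from the workbook), the call below returns every day of January 2018:

List.Dates(#date(2018, 1, 1), 31, #duration(1, 0, 0, 0))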

Let's say you have a list of tasks, each with a start and finish date, and some kind of budget (cost, revenue, quantities, effort required, etc.).

Budget_0

Let's load the Task table and generate the spread (a sketch of the query follows the list of steps below).

  • First we calculate the duration (Finish - Start).
  • We use List.Dates to generate the list of dates between the start and finish of every task.
  • We expand the list into a column.
  • We merge the list of dates with the calendar table to get the working hours per day.
  • We sum the total working hours per task ID.
  • The % per day = working hours per day / total working hours per task.
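Here is a minimal Power Query sketch of those steps. It assumes a Tasks table (columns ID, Start, Finish, Budget) and a separate Calendar query (columns Date, WorkingHours); all of these names are assumptions, so adjust them to your own workbook.

let
    // assumed table and column names; adjust to your workbook
    Tasks = Excel.CurrentWorkbook(){[Name = "Tasks"]}[Content],
    Typed = Table.TransformColumnTypes(Tasks, {{"Start", type date}, {"Finish", type date}, {"Budget", type number}}),
    // duration in days, inclusive of both the start and the finish day
    AddDays = Table.AddColumn(Typed, "Days", each Duration.Days([Finish] - [Start]) + 1, Int64.Type),
    // one row per task per day
    AddDates = Table.AddColumn(AddDays, "Date", each List.Dates([Start], [Days], #duration(1, 0, 0, 0))),
    Expanded = Table.ExpandListColumn(AddDates, "Date"),
    // working hours per day come from the Calendar query (holidays carry 0 hours)
    WithHours = Table.ExpandTableColumn(
        Table.NestedJoin(Expanded, {"Date"}, Calendar, {"Date"}, "Cal", JoinKind.LeftOuter),
        "Cal", {"WorkingHours"}),
    // total working hours per task, then the daily share of the budget
    Totals = Table.Group(WithHours, {"ID"}, {{"TotalHours", each List.Sum([WorkingHours]), type number}}),
    WithTotal = Table.ExpandTableColumn(
        Table.NestedJoin(WithHours, {"ID"}, Totals, {"ID"}, "T", JoinKind.LeftOuter),
        "T", {"TotalHours"}),
    Spread = Table.AddColumn(WithTotal, "Budget per Day",
        each [Budget] * [WorkingHours] / [TotalHours], type number)
in
    Spread

The last step multiplies the budget by the daily share, which is exactly the % per day spread shown below.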

Budget_1

Voilà, you have the % per day. Now to spread the budget, just multiply the budget by the % per day.

Then we load some dimension tables in order to slice and dice the pivot table, and we add a measure to calculate the cumulative budget over time. As we are using a calendar table, we can show the budget per day, month, year or any other time period.

Here we get a nice little star schema:

Budget_2

 

 

 

Now a pivot table with conditional formatting will show a Gantt-like chart.

Budget_3

And a pivot chart shows the budget per period and the cumulative budget.

Budget_4

You can download the template here. You need Excel 2013+; if you don't have it, you can always download PowerBI desktop for free and import the Excel file, though you will need to redo the charts.

 

Integrated Project Control system using PowerBI

One of the most popular discussions in planning forums is how to have an integrated project controls system. Every practitioner has a different opinion on how it must be done, and of course you get a lot of marketing from people trying to sell their systems.

In this blog, we share a high-level description of a data warehouse built using PowerBI desktop.

Data warehouses are not widespread in the construction industry because the reporting specifications differ from project to project and every client has different systems and tools. Trying to have a standard system works only if you are the main contractor; if you are a subcontractor you have to adopt the client's system.

Another reason is that it requires specialised IT skills. We are just business users, not programmers; we understand data very well, but we don't necessarily have the skills and tools to manage it. The good news is that with the rise of self-service business intelligence, we have exactly that: powerful yet accessible data management tools (assuming you are willing to learn something other than Excel).

So, at a high level, this is how it works.

 Project Controls Data Warehouse

  • The data warehouse was built using PowerBI desktop. I know it should be called a semantic model (for me, a data model and a data warehouse are fundamentally the same thing). Initially it used Excel Power Pivot, but that did not scale well as the volume of data grew.
  • As the data is not always in the format we want, Power Query is very handy, since it can transform virtually any source of data, for example looking up the subsystem using the tag field; trying to do that on 8 million rows using Excel or Access is not feasible.
  • We maintain master tables to integrate all the different sources of data (tags, WBS, subsystems, etc.).
  • Every week, we get a new export from the source systems (Cobra, proprietary database systems, etc.); we load the new data and keep the historical records (a sketch of loading a folder of exports follows this list). The refresh takes 15 minutes, which is quite impressive: the Cobra export alone is a folder of 60 Excel files, nearly 2 GB in size.
  • Usually you would publish your reports to the PowerBI.com service for end users once you refresh your data model. In our case we can't use the cloud for privacy reasons, so instead we use Excel as a reporting tool that pulls the data from PowerBI desktop. The advantage of this approach is that we have different reports for different users: skyline, Gantt chart, client reports (in their required format), management reports, etc.
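As an illustration of that weekly load, here is a minimal Power Query sketch that appends every workbook in a folder into one table; the folder path and the sheet name "Export" are assumptions, not the actual system.

let
    // assumed folder path and sheet name; adjust to your environment
    Files = Table.SelectRows(Folder.Files("C:\Data\Cobra"), each [Extension] = ".xlsx"),
    // read the "Export" sheet from every workbook, with the first row as headers
    Sheets = Table.AddColumn(Files, "Data",
        each Excel.Workbook([Content], true){[Item = "Export", Kind = "Sheet"]}[Data]),
    // keep the file name so every row stays traceable to its weekly export
    Selected = Table.SelectColumns(Sheets, {"Name", "Data"}),
    // this assumes every weekly file has the same column layout
    Expanded = Table.ExpandTableColumn(Selected, "Data", Table.ColumnNames(Selected{0}[Data]))
in
    Expanded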

Some thoughts.

  • As you can see, Primavera P6 is used only as a forecasting tool and to calculate the critical path; the earned value calculation is done in the data model. Personally, I think P6 should not be used as the centre of your project controls system. I remember the first time I started learning Primavera P3 (a long time ago :), we kept asking the trainer how to track the spent hours at the activity level; the answer was that we don't. Actually, deciding at which level you track your spent hours is the most important decision to make when you start a new project.
  • The basic idea here is that in order to have an integrated project controls system, you should stop trying to have one. Data will always be in silos; don't try to change how other departments manage their specific data, it will not work and they will not listen to you anyway. So instead of trying to have one system to rule them all, just use the existing systems and build a data warehouse for reporting and integration with P6.

Mapping Primavera P6

If you work for the main contractor or on an EPC contract, there are already well-established project controls systems and procedures applied enterprise-wide; all projects use the same approach and methodology, and all you have to do is learn it and follow the procedures.

When you work for a subcontractor it becomes tricky: although there are already systems and procedures in place, you need to follow the main contractor's specifications.

In this blog, I will share a generic approach that is agnostic to the client's tools. It is not a step-by-step tutorial, as every client has different systems and terminology, but I hope you can adapt it to your particular case.

First, you can implement the mapping system in any database system you are familiar with: MS Access, SQL Server or any open source database.

I will use Cobra as the progress measurement system and Primavera as the planning tool. This is only an example; any other combination will work, as we exchange data using Excel and use activity codes as a reference, and all planning tools support some kind of activity codes.

Why do we need mapping in the first place?

Updating a schedule is a time-consuming task!! It is also prone to errors and to the personal judgement of the planners involved. Generally, the update process involves two steps:

  • Updating the facts (actual start, actual finish, earned hours, remaining duration).
  • Reviewing the update, analysing the results (critical path, forecast curve), checking for any discrepancies and thinking of mitigation plans if there is slippage.

If step 1 is done manually, then obviously the planner will have less time for step 2, which I think is the more important task.

I think a better approach is to make step 1 fully automated and let the planners do step 2, which cannot be automated (unless computers become self-aware).

 

So how do we do the mapping?

  • Manual: the most common approach. The planner opens the progress spreadsheet and manually copies the hours and the new status, either by filtering, using a pivot, or on an ad hoc basis (a nice way of saying guesswork).

 

  • Manual assignment in Cobra: the Primavera activity ID is manually assigned in Cobra, and Cobra then produces an export with the new status grouped by activity ID. The disadvantages of this approach are that it is hard to maintain when the take-off changes, which is a natural fact of life in construction projects, and that there is no transparency as to why the mapping was done in a particular way. Still, it is much better than the first one.

 

  • Dynamic mapping: instead of manually assigning the activity ID in Cobra, we define a new primary key in P6 that, based on fixed rules, automatically generates the link to Cobra.

How this dynamic mapping works:

Capture

We don't use the Primavera activity ID; instead we define a new natural key, let's call it P6_PK. This key defines the exact scope of work of one activity. In this example we have two types of activities:

We could use this P6_PK as the activity ID, but if you already have a baseline you can't touch the activity ID, and sometimes the default length of the activity ID field is too short.

Let’s define some terms:

Activity Level 3: P6_PK  is  Area|P6 price code

Activity Level 4: P6_PK  is  Group|P6 price code

– Group: a set of tags that share a common characteristic: by subsystem, major package, cables grouped by destination or origin, or whatever the schedule is grouped by; or simply how you think the work will be done.

– P6 price code: the MTO has a very detailed price breakdown; obviously we can't use it as-is, otherwise we end up with a massive schedule, so instead we group the codes to a manageable level.

In P6 (or MS Project or whatever you are using), define two new activity codes: Area and P6 price code.

Create two tables:

P6 price code: defines how you group your MTO price codes into P6 price codes.

Group: defines how you want to group your tags.

The database will look up the group from the Group table and the P6 price code from the P6 price code table, generate P6_PK from them, and from that get the activity ID. A hedged sketch in Power Query is shown below.
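As a rough illustration only (the real system will differ), here is a Power Query sketch of that lookup. It assumes four queries already exist: MTO (with columns Tag and MTO_Price_Code), GroupTable (Tag to Group), PriceCodeTable (MTO_Price_Code to P6_Price_Code) and a P6Activities export keyed on the same P6_PK; every name here is an assumption.

let
    // assumed query and column names; adjust to your own systems
    WithGroup = Table.ExpandTableColumn(
        Table.NestedJoin(MTO, {"Tag"}, GroupTable, {"Tag"}, "G", JoinKind.LeftOuter),
        "G", {"Group"}),
    WithPrice = Table.ExpandTableColumn(
        Table.NestedJoin(WithGroup, {"MTO_Price_Code"}, PriceCodeTable, {"MTO_Price_Code"}, "P", JoinKind.LeftOuter),
        "P", {"P6_Price_Code"}),
    // the natural key: Group|P6 price code (use Area|P6 price code for level 3 activities)
    WithKey = Table.AddColumn(WithPrice, "P6_PK", each [Group] & "|" & [P6_Price_Code], type text),
    // finally look up the activity ID from the P6 export keyed on the same P6_PK
    Mapped = Table.ExpandTableColumn(
        Table.NestedJoin(WithKey, {"P6_PK"}, P6Activities, {"P6_PK"}, "A", JoinKind.LeftOuter),
        "A", {"Activity_ID"})
in
    Mapped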

Obviously, I have just outlined the fundamental concept; in a real project it is much more complex. The P6 price code may have different levels of detail in the same schedule, and some tags may have different groups based on the type of activity; we may keep excavation at level 3 but pouring concrete for foundations at level 4. (Hint: you can add new dimensions to your dictionary tables.)

Please note that I use Level 3/4 just for the purpose of mapping; it is not meant to reflect the level of the schedule, which is an open question (two planners will tell you different things).

 

The outputs of the system:

  • The new status of the activities (imported back into P6).
  • The new scope that is not mapped (update the groups and price codes accordingly); a sketch of finding this unmapped scope follows the list.
  • The activities that used to be mapped but have now lost their scope (status them as completed, as there is no more scope).
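Continuing the hedged sketch above, the second output falls out of the same merge: rows of the hypothetical Mapped table whose Activity_ID is still null are scope with no activity mapped to it yet.

    // rows of the Mapped table (from the earlier sketch) that found no activity
    Unmapped = Table.SelectRows(Mapped, each [Activity_ID] = null)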

The time to update P6 was significantly reduced (by an order of magnitude). The system was implemented using Power Pivot/Power Query, as filtering and manipulating huge lists of data is more convenient in Excel.

 

The challenge

The biggest challenge was how to combine all the weekly Cobra exports in Excel into one table and add data from other systems to get the format required for the mapping. If you are an SQL ninja it is doable, but if you are a planner, what do you do? There was no easy answer before, but now we have one: build a data warehouse in PowerBI. But that's a blog for another day.

How to connect to Primavera SQLite database

If you have a standalone installation of Primavera using SQLite and you want to see the data stored there, either for reporting or just out of curiosity to study the schema, it is pretty easy to do using Power Query.

Unfortunately there is no native connector for SQLite, but you can use ODBC. Please follow this tutorial on how to download and install the drivers (sorry, you need admin rights; that's why you should always be nice to IT).

We don't use a standalone installation at work; I just downloaded one from the Oracle site on my home computer. The installation is valid for one month; once it was installed, I copied the SQLite file generated by Primavera with all the template projects.

The connection string looks like this:

Source = Odbc.DataSource("driver={SQLite3 ODBC Driver};database=C:\sqlite\PPMDBSQLite.db;longnames=0;timeout=1000;notxn=0;dsn=SQLite Datasource", [HierarchicalNavigation=true])
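For context, here is the same connection wrapped in a complete query and navigated to one table (TASK is the P6 activities table); the exact navigation steps depend on the driver, so treat this as a sketch rather than a guaranteed recipe.

let
    Source = Odbc.DataSource(
        "driver={SQLite3 ODBC Driver};database=C:\sqlite\PPMDBSQLite.db;longnames=0;timeout=1000;notxn=0;dsn=SQLite Datasource",
        [HierarchicalNavigation = true]),
    // with hierarchical navigation the tables show up as rows of a navigation table;
    // this assumes TASK appears at the top level, otherwise drill down one more step
    Tasks = Source{[Name = "TASK", Kind = "Table"]}[Data]
in
    Tasks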

Use the same username and password used during the installation.

Don't query the file directly; use a copy, as there is a risk of corrupting your file.

Capture

Keep in mind that the tables in Primavera are highly normalised, and the spread data is not stored (for example, forecast labour units per day per task), so depending on your reporting requirements it can be a non-trivial task.

Anyway, my PowerBI template now uses either XER or SQLite to generate reports.

Using Power Query to read Primavera XER files

You can download the file here  or view the reports online

The last time I tried parsing an XER file, I was using VBA and it was a horrible experience; anyway, that was three years ago. The problem was not VBA per se, but supporting the different combinations of Excel and Windows. I still sometimes receive emails from users complaining that the XER reader does not work for them; after a while I concluded it was not worth the effort.

Yesterday, for no particular reason, I got the idea: why not try Power Query? In 10 minutes I had a working prototype without writing a single line of code!!! Even better, it reads in bulk: copy a bunch of XER files into the same folder and they are all loaded. The only condition is that they need to have the same revision (don't mix P6 rev 7 with rev 8), as different revisions of P6 introduce minor changes in the XER schema (they add or remove some columns).

It is only a proof of concept: I only loaded some tables (activities and activity codes). The data is loaded into the Data Model, and you need at least Excel 2013 with the free Power Query add-in installed. For reference, a hedged sketch of the core idea is shown below.
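The queries themselves were built through the Power Query UI, but the core idea looks roughly like this sketch; the folder path is an assumption, Columns = 100 is just a generous guess at the widest table, and TASK is kept only as an example of one XER table.

let
    // assumed folder; drop your XER files here
    Files = Table.SelectRows(Folder.Files("C:\xer"), each Text.Lower([Extension]) = ".xer"),
    // every XER line is tab separated; the first column carries a marker:
    // %T = start of a table, %F = its field names, %R = a data row
    Read = Table.AddColumn(Files, "Data",
        each Csv.Document([Content], [Delimiter = "#(tab)", Columns = 100, QuoteStyle = QuoteStyle.None])),
    Combined = Table.Combine(Read[Data]),
    // tag every row with the XER table it belongs to, then keep TASK as an example
    Tagged = Table.AddColumn(Combined, "XerTable",
        each if [Column1] = "%T" then [Column2] else null),
    Filled = Table.FillDown(Tagged, {"XerTable"}),
    TaskRows = Table.SelectRows(Filled, each [XerTable] = "TASK" and [Column1] <> "%T")
in
    TaskRows

The %F row that remains in TaskRows holds the column names, ready to be promoted to headers.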

1

Make sure you enter the path to where your XER files are stored.

2

You need to refresh the queries here:

6-6

Here is a pivot with some measures; for now you can group only by project and activity codes, and obviously by activity ID.

6-3

I tested it with two XER files containing 47 projects.