Load polygon geometry into the PowerBI data model

Starting a new project that involves plotting a relatively big map (around 3,000 polygons), I expected it to be straightforward, as I have done it before, but a limitation in my preferred PowerBI custom visual led to an interesting result.

1- Synoptic Panel: in my view it is the best visual for showing a custom map (floor plan, general layout, etc.), but unfortunately not this time. My new map has 3,000 polygons, and when I tried to plot some attributes I got the PowerBI equivalent of the Windows blue screen of death: too many values!


That’s bad. I filed a bug report, and the author was kind enough to reply that it will be fixed in a future release (something to do with incremental loading). Fine, let’s try other options.

2- Shape Map: it did not even render the polygons correctly, and to be honest the visual has not received any update since the first time I used it, two years ago.

3- Mapbox: a showstopper; to load your own shapefile and keep it private, you need a business plan.

I am stuck, I can’t plot a 3,000-polygon map in PowerBI, so let’s try R.

4- Leaflet: a very versatile mapping engine. I currently use it in other projects (outside of PowerBI) where it plots 60K points and nearly 500 polygons in under 2 seconds, but there is a problem: Leaflet’s output is HTML, which is not supported in PowerBI, and I never managed to take a decent screenshot of it, something to do with the zoom.
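For reference, this is roughly the kind of Leaflet call I use in those other projects; a minimal sketch, where points_df (a data frame with lon/lat columns) and polygons_sf (an sf polygon layer) are hypothetical stand-ins for my own data:

library(leaflet)
library(sf)

# points_df and polygons_sf are placeholder names for the point and polygon layers
leaflet(points_df) %>%
  addTiles() %>%                                 # OpenStreetMap base map
  addCircleMarkers(~lon, ~lat, radius = 2) %>%   # draw the individual points
  addPolygons(data = polygons_sf, weight = 1)    # overlay the polygon layer

The result is an interactive HTML widget, which is exactly why it cannot be rendered inside a standard PowerBI R visual.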

5- ggplot2 + sf: a quick Google search and I got this code:

library(sf)
library(tidyverse)

map_shp <- read_sf('C:/Users/mimoune.djouallah/test.geojson')

map <- dataset %>%
  left_join(map_shp, ., by = c('id' = 'id')) %>%
  ggplot() +
  geom_sf(aes(fill = type), lwd = 0)

map


Great, I have my map now (this is only a portion; I can’t show the whole layout as it is proprietary). There is just one little problem: the code takes 25 seconds to render, which is very annoying, especially as in PowerBI, whenever you touch anything, all the visuals render again.

My first thought was that maybe the join between the PowerBI dataset (dataframe) and the geojson is slow. The beauty of R integration in PowerBI is that you click the R icon, a new window opens with an empty dataset, then you add the columns you want to analyse. Yes, it is that simple: no ETL, no messing with the data, thanks to the mighty data model.

Loading geometry data into the PowerBI data model?

 

After reading the sf documentation, I learned that all I need is a dataframe with a column that stores the geometry as WKT (well-known text). Easy enough: just use QGIS, save as CSV, and make sure to select geometry as WKT.
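If you would rather stay in R instead of going through QGIS, sf can write that same WKT CSV directly; a minimal sketch, reusing the geojson path from the earlier snippet (GEOMETRY=AS_WKT is a GDAL CSV driver option):

library(sf)

map_shp <- read_sf('C:/Users/mimoune.djouallah/test.geojson')

# write the layer as a CSV with the geometry serialised into a WKT column
st_write(map_shp, 'map_wkt.csv', layer_options = 'GEOMETRY=AS_WKT', delete_dsn = TRUE)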


I then loaded the CSV file into the data model and joined it with the attribute table.


Here is the new code. Notice that we don’t load any external file; all the data comes from the data model.

library(sf)
library(tidyverse)

map_file <- st_as_sf(dataset, wkt = "geometry")

map <- ggplot(map_file) +
  geom_sf(aes(fill = type), lwd = 0)

map

Basically, you just need to tell sf which column represents the geometry. Unfortunately it is still slow, maybe 24 seconds now.

Another round of googling and I found the issue: ggplot2 is very slow to render. No problem, this is R after all; there are at least a couple of dozen packages that do the same thing 🙂

6- Pure sf

It turns out sf is not only for managing geometric dataframes, it can plot too (who would have thought).

library(sf)

# sf's base plot method draws the polygons directly, coloured by the "type" column
plot(st_as_sf(dataset, wkt = "geometry")["type"])

It renders in less than 2 seconds, happy dance.

 

Bonus point: cross-filtering geometry

My purpose in loading the geometry was just to speed up ggplot2, but hold on: the geometry is in the data model, so not only do the attributes change (installed quantities, category, etc.), I can even filter polygons on the fly! This is quite amazing; I can load all the layers and filter out any shape I want.


Using QGIS to prepare Custom Maps in PowerBI

This blog post documents my first experience with mapping in PowerBI. Usually the reports we produce are time series and pivot tables, but in this case there is a geographic dimension to the data, so I thought it was a good opportunity to try the map chart in PowerBI. It turned out to be a bit more difficult than I thought.

The data is thousands of piles spread over a huge area. My first attempt was simply to load the data into PowerBI and view it on a map, but for some reason PowerBI showed an empty map.

PowerBI expects the data to be in latitude and longitude, but my data uses easting and northing, so I had to convert it. There are plenty of online converters, but they are not practical, as they don’t support batch processing, and converting point by point is not an option.

After some googling, I found the excellent open source software QGIS. It was very straightforward: the software converts the coordinate reference system automatically (I think the technical term is reprojection). My data is GDA94 / MGA zone 55 and the result should be in WGS 84.
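The same reprojection can also be scripted in R with sf; a rough sketch, assuming a hypothetical piles.csv with easting and northing columns (EPSG:28355 is GDA94 / MGA zone 55, EPSG:4326 is WGS 84):

library(sf)
library(tidyverse)

piles <- read_csv('piles.csv')   # hypothetical export with easting/northing columns

# build point geometries in the projected CRS, then reproject to WGS 84
piles_wgs84 <- piles %>%
  st_as_sf(coords = c('easting', 'northing'), crs = 28355) %>%
  st_transform(4326)

# append longitude/latitude columns ready for PowerBI's map visuals
coords <- st_coordinates(piles_wgs84)
piles_out <- piles %>%
  mutate(longitude = coords[, 'X'], latitude = coords[, 'Y'])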

Voila, the data is ready for PowerBI.


That’s a bit disappointing: PowerBI complains it cannot show all the points (PowerBI has a maximum of 3,500 points per chart). As a comparison, this is how Tableau shows the data.


Tableau doesn’t have a limitation on the number of points.

 

Alternative approach

 

As PowerBI cannot show all the points, one solution is to create a shapefile that groups the points into smaller areas, and again this was trivial to do in QGIS.

QGIS will group the points based on a field you provide.
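For what it’s worth, the same grouping can be sketched in R with sf: dissolve the points by the grouping field and wrap each group in a hull (sub_array is a hypothetical column name standing in for whatever field you pick in QGIS):

library(sf)
library(tidyverse)

# piles_wgs84: the point layer from the earlier sketch; sub_array: hypothetical grouping field
sub_areas <- piles_wgs84 %>%
  group_by(sub_array) %>%
  summarise(geometry = st_combine(geometry)) %>%  # merge each group's points into one multipoint
  st_convex_hull()                                # draw a polygon around each group

st_write(sub_areas, 'sub_areas.shp')              # export for mapshaper / PowerBI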


 


And voila.


 

QGIS saves the layer as an ESRI shapefile, but PowerBI requires TopoJSON, so I used the excellent tool mapshaper.org.

Just make sure you import all the files, not only the .shp.


Now the TopoJSON is ready to be loaded in PowerBI


 

Time to celebrate 🙂 Not really, there is a problem: I want to show a different colour based on status. If a sub-array is completed, I want it to show yellow; if it is > 75%, I want another colour, and so on. The Shape Map in PowerBI does not offer this option.

Fortunately Synoptic Panel has more options; it uses SVG as the map format, which I got from mapshaper.org.


 

Preparing the custom map is a one-time operation; the colours will change as construction progresses, you just need to assign which measures you want to show.


What’s Next

As long as your data model is properly built, you can do a lot of interesting things: you can filter by type of work (piling, tracker installation, PV modules), or click on one sub-array and see in detail what has been installed and what is missing.

 

 

Integrated Project Control system using PowerBI

One of the most popular discussions in planning forums is how to have an integrated project controls system. Every practitioner has a different opinion on how it must be done, and of course you get a lot of marketing from people trying to sell their systems.

In this blog, we share a high-level description of a data warehouse built using PowerBI Desktop.

Data warehouses are not widespread in the construction industry, because the reporting specifications differ from project to project, every client has different systems and tools, and trying to have a standard system works only if you are the main contractor; if you are a subcontractor you have to adopt the client’s system.

Another reason is that it requires specialised IT skills. We are business users, not programmers; we understand data very well, but don’t necessarily have the skills and tools to manage it. The good news is that with the rise of self-service business intelligence we have exactly that: powerful yet accessible data management tools (assuming you want to learn something other than Excel).

So at a high level, this is how it works.

 Project Controls Data Warehouse

  • The data warehouse was built using PowerBI Desktop. I know it should be called a semantic model (for me, a data model and a data warehouse are fundamentally the same thing). Initially it used Excel PowerPivot, but that did not scale well as the volume of data increased.
  • As the data is not always in the format we want, PowerQuery is very handy here, as it can transform virtually any source of data, for example looking up the subsystem using the tag field; trying to do that over 8 million rows in Excel or Access is not feasible.
  • We maintain master tables to integrate all the different sources of data (tags, WBS, subsystems, etc.).
  • Every week we get new exports from the source systems (Cobra, proprietary database systems, etc.); we load the new data and keep the historical records. The refresh takes 15 minutes, which is quite impressive: Cobra alone is a folder of 60 Excel files, nearly 2 GB in size.
  • Usually you would publish your reports to the PowerBI.com service for end users once you refresh your data model. In our case we can’t use the cloud for privacy reasons, so instead we use Excel as a reporting tool that pulls data from PowerBI Desktop; the advantage of this approach is that we can have different reports for different users: skyline, Gantt chart, client reports (in their required format), management reports, etc.

Some thoughts.

  • As you can see, Primavera P6 is used only as a forecasting tool and to calculate the critical path; the earned value calculation is done in the data model. Personally I think P6 should not be the centre of your project controls system. I remember the first time I started learning Primavera P3 (a long time ago 🙂), we kept asking the trainer how to track spent hours at the activity level, and the answer was: we don’t. Deciding at which level you track your spent hours is actually the most important decision to make when you start a new project.
  • The basic idea here is that the way to get an integrated project controls system is to stop trying to have one. Data will always be in silos; don’t try to change how other departments manage their specific data, it will not work and they will not listen to you anyway. So instead of trying to have one system to rule them all, just use the existing systems and build a data warehouse for reporting and integration with P6.

How to use Excel as a Front End to Power BI Desktop

You can download the data model, the Excel front ends and the data sets in this folder.

The techniques used in the spreadsheet for connecting to PowerBI Desktop are not officially supported by Microsoft (as of 24 May 2018, Microsoft doesn’t mind people using this approach), but I think they are useful for understanding how the MS BI stack works, especially if you are coming from an Excel background and installing SQL Server Developer is either intimidating or not allowed.

I would really appreciate it if you vote for this idea to make this scenario officially supported by Microsoft.

As far as I am aware, there are two approaches to access PowerBI Desktop, or more correctly the SSAS instance launched by PowerBI Desktop:

  • Connecting using a live connection, detailed here
  • Connecting using PowerQuery, detailed here

 

The live connection is very interesting, but it has the drawback that if you close PowerBI Desktop the pivot table stops working, so you cannot share the spreadsheet. One solution is to use cube formulas, as they are persistent; if you don’t know what a cube formula is, you are missing one of the most powerful features of Excel.

But what if you want a pivot chart, or a pivot table that keeps working even if you lose the connection, or you want to share the results with people who do not have access to the data model? It turns out it is possible: welcome to the Excel pivot cache.

 

  • Invoke the function SSAS_QUERY
  • The parameter is optional:
    • Either write a DAX query to retrieve the fields you want; MDX is supported too. Personally I find MDX more suitable for importing measures with different dimensions; I understand DAX supports this scenario too using CROSSJOIN, but I never managed to make it work correctly.
    • Or just click OK; you can then browse the SSAS cube and select any dimensions and measures you want, but be aware that for a big cube a query fetches the results faster.
  • Keep the PowerQuery query as a connection only.
  • Insert a pivot table, use an external data source, choose connection, select the PowerQuery query, and voila.


 

 

Excel caches the data in the pivot; not only that, it is extremely compressed.

Notice that PowerPivot is not used at all here; Excel is acting as a visualisation layer for PowerBI Desktop, leveraging two well-known capabilities: cube formulas and the pivot cache.

Microsoft plans to release SQL Server v.Next this year, and then we will be able to deploy data models built with PowerBI/PowerPivot/PowerQuery into a production system; that’s what I call a natural growth path from self-service to corporate BI.

Edit Dec-17: I added a new file that uses only VBA, so PowerQuery is not required; it can be useful if you are still on Excel 2007 or can’t install PowerQuery.