P6 Date Formats – Quick Data Hacks!

In dealing with P6 data, sometimes what you expect is not what you get. When it comes to date formats, this is especially relevant. Here is a guide to transforming XER and P6 copy-pasted values into proper date formats.

Problems with XER file format

When dealing with a native XER file, you need to be careful because you can’t always use the field you want. Below is a screen print from a typical TASK dataset.

[Screenshot: date fields in a typical XER TASK dataset]

When an actual start/finish date has been captured (as seen in the “act_start_date” and “act_end_date” fields above), then the dates that are stored in the “early_start_date” and “early_end_date” fields are no longer valid.

Thus, you need a routine to check whether an actual start date exists and, if so, use it in lieu of the early date. There are two obvious places to do this: write the check into your data import routines (if importing into a database), or edit your XER import Excel file to add 2 new columns, simply “start” and “finish”, that run the check for you in the native Excel file before you import the data. This is an easy hack wherever you manage your data. However, a core question remains:

Where will you clean your data?

For me Excel is often easier, but that does mean your processes will not be fully automated. Either way, the fix is fundamentally as seen below:

 START=IF(ISBLANK(act_start_date),early_start_date,act_start_date)

FINISH=IF(ISBLANK(act_end_date),early_end_date,act_end_date)
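
If the check lives in your database import routine instead of Excel, the logic is the same. A minimal sketch in Python/pandas (assuming the XER TASK table is already loaded into a DataFrame with the field names shown above; the helper name is my own):

import pandas as pd

def add_start_finish(task: pd.DataFrame) -> pd.DataFrame:
    # prefer the actual date when it exists, otherwise fall back to the early date
    task = task.copy()
    task["start"] = task["act_start_date"].fillna(task["early_start_date"])
    task["finish"] = task["act_end_date"].fillna(task["early_end_date"])
    return task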

Problems with P6 format (copy-paste)

If you are not dealing with an XER, you will likely copy-paste directly from P6 into Excel. Here again, we have to deal with a few minor but annoying complications. Everyone I know in the planning world deals with this and has their own routines. I wish I could post them all, because some truly elegant solutions exist. The below is not meant to be “do it this way”; it is more an indication that, if you are running into difficulty, “this will work!”.

Start         Finish
22-Oct-18 A   9-Sep-19
09-Sep-19*
09-Sep-19*
10-Sep-19     10-Sep-19
05-Jul-19*    10-Sep-19
27-May-19 A   10-Sep-19
03-Dec-18 A   10-Sep-19

Above we can see a typical copy-paste result from P6. Obviously, this is filled with cells that are not formatted as dates. This is caused by the “ A” indicator for activities that have an actual date, and a “*” for activities that have some sort of constraint applied. I have seen a few equation-based solutions to strip the bad characters out. I find a code-based solution slightly more elegant: it also means I do not have to add extra columns to my file or perform any paste-values steps. But again, I am sure we all have nice solutions.

The easy fix I use: Scrubb_Data().

Sub Scrubb_Data()

    ' Character to strip (the "*" constraint flag); the space before the
    ' " A" actual flag is handled by the Space delimiter below
    Dim xChars As String
    xChars = "*"

    ' Split column V on spaces and "*": field 1 keeps the date,
    ' fields 2 and 3 (the leftover flag text) are skipped
    Sheets("P6_CURRENT").Select
    Columns("V:V").Select
    Selection.TextToColumns Destination:=Range("V1"), DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=False, _
        Semicolon:=False, Comma:=False, Space:=True, Other:=True, OtherChar:=xChars, _
        FieldInfo:=Array(Array(1, 3), Array(2, 9), Array(3, 9)), TrailingMinusNumbers:=True

End Sub

This routine only works on one column at a time. For me, I can simply copy and paste the TextToColumns block for each Excel column with a P6 date (typically just start and finish). After pasting in P6 data, I open the routine and click “play”. You will need a macro-enabled file and will need to be at least a little comfortable with “view code”. There are a few options to call this function automatically: using a button, calling it when you close the file, many different options. They all require a bit of VBA knowledge, although, like seriously, who in our world isn’t at least a little comfortable there?

For the above, I ripped most of this from a routine someone else wrote (a routine that stripped the “ A” off the string); I had to add the xChars = “*” aspect.

For a find/replace approach, it is possible to strip the “*” off by searching for ~*. We have to use the tilde because if we search for * on its own, Excel treats it as a wildcard and replaces the entire string. Again, a million ways to handle this.
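
And if the pasted data ends up in Python rather than Excel, the same scrub is a few lines of pandas. A hedged sketch (assuming the pasted Start/Finish columns are plain text in the d-mmm-yy style shown above):

import pandas as pd

def scrub_p6_dates(s: pd.Series) -> pd.Series:
    # strip the "*" constraint flag and the " A" actual flag, then parse as dates
    cleaned = (
        s.str.replace("*", "", regex=False)
         .str.replace(" A", "", regex=False)
         .str.strip()
    )
    return pd.to_datetime(cleaned, format="%d-%b-%y", errors="coerce")

# usage (df is your pasted table): df["Start"] = scrub_p6_dates(df["Start"])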

Conclusion

Again, there are a hundred ways to skin this cat, and all achieve the same result. When we deal with data from P6, I find it amazing that when you ask 10 people how they deal with the data, you will end up with at least 20 replies. Ultimately, I believe this issue is a telling flaw in the underlying software. In the digital world, we need a completely different paradigm for the way we store and manage data.

Specifically in the project management world, I doubt anyone who uses tools such as JIRA or DevOps has to deal with annoyances such as this. Thus, perhaps we too should be using those tools!

Digital Strategy (the beginnings) – by Darrin Kinney

Project Controls is the heart of every major construction project. Our teams are the ones that typically wade through information that never stops and constantly changes. Although at the core, the fundamentals of our strategy are likely the same as what everyone who manages data will encounter.

Digital Strategy for project controls is a tricky beast. We are bombarded by systems that all promise nirvana (and many can deliver it for some projects). However, before we even discuss systems, we need to understand what our strategy is and what key concepts we want to push. Below are some ideals of mine. As with everything, this is a straw-man discussion: every project will have its own culture, its own focus, its own needs, and your strategic approach has to cater to what you want from your project. Not to mention that the major companies all have quite rigorous project controls plans and detailed procedures; these are great documents that do spell out our scope of services.

The issues are not related to our knowledge of project controls (and project management) fundamentals. We know how to build schedules, track progress, build baselines and budgets, and track costs parsed into thousands of potential contracts and POs. We know how to calculate and track Earned Value, CPI, and SPI. We know change management, trends, change orders, etc. We know how to compile all the information into weekly and monthly reporting. We have the procedures and management plans.

We are the technical experts of our discipline.

However, too often we are not able to step back and understand that the way in which we work can change, needs to change, and, more importantly, that the technology available now can be used to seriously disrupt the way in which we apply our technical knowledge and skills.

Below are some ideas, some concepts we can consider in everything we do, to perhaps bring a more digital and smart way of operating in the new world we live in. These are not innovative in themselves; we live with these ideas already. My intent is to capture them so that, through their tactical implementation, we bring real innovation.

Strategy – Strategy is not Keywords

Too often, we write our strategy or integration documents and load them up with keywords and slick PowerPoint. We have to embrace that we touch real things. Don’t use stick figures and cartoons; use real live examples, get your hands dirty using the systems on your job. Understand the details. Stop writing keywords on a whiteboard and open the application!

The real idea is to be grounded in practical things, not abstract processes.

Open your schedule and look at the activity definitions that exist. Open your material management system and see if the items can be tagged with schedule activities. Look into your progress measurement sheets and determine if you can insert a scheduleID to smartly update the progress in your schedule (I sound like a broken record!).

Strategy – Enter data once

This is talked about by everyone, although we have all seen it: everyone has their own spreadsheet. Every group has its own systems: SAP, Contracts, Procurement, Engineering, Cost Control, Planning. I have illustrated in the past the issue with something as standard as “where is the master contract list?”. Do we really have a strategy to enter data once? If so, really delve into the data people keep on their computers in their own spreadsheets, and try as best we can to pull key reference data out of a source system.

There exist systems that can facilitate this approach and allow for integration. But be careful, and understand that integration is not strategy; the implementation of strategy will likely require integration.

Data Integration vs Digital Strategy

“Enter data once” is bound up with the concept of data integration. Integration is not specifically the idea that “systems talk to each other”; integration is the ability to smartly manage your job. Unless common keys exist between systems, you can’t hope to build that integration inside whatever black box your digital team builds.

A good example of this is the implementation of a web-based cash flow:

Web Based Cashflows – Sharepoint

Strategy – Be Visual

Project Controls teams are the masters of reporting. Being visual is not (only) about creating dashboards or complex databases that feed our dashboards. Being visual is doing exactly what we already do. Stay away from data tables – Long Live Death by S-Curves!

The tabular presentation methods we use are antiquated. Looking at an engineering progress report with 50 rows and 20 columns of various EV metrics too often doesn’t answer the question of whether or not I am on plan. That is what I mean by being visual: look at the questions we want answers to, the core underlying questions, and FOCUS. If the question is in fact “I want to see the productivity factor for 50 different areas, both weekly and cumulative”, perhaps choose a dot plot. Place the values all on one graph, perhaps scale the size of each dot by budget hours, use the horizontal axis for % complete and the vertical axis to show the PF. That is what I mean by “Be Visual”.
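
As a quick hedged sketch of that chart (made-up numbers; matplotlib just for illustration):

import matplotlib.pyplot as plt
import pandas as pd

# hypothetical area-level data: % complete, productivity factor, budget hours
df = pd.DataFrame({
    "pct_complete": [0.20, 0.45, 0.70, 0.90],
    "pf": [1.10, 0.95, 0.85, 1.02],
    "budget_hours": [5000, 12000, 3000, 8000],
})

fig, ax = plt.subplots()
# dot size scaled by budget hours, position by % complete and PF
ax.scatter(df["pct_complete"], df["pf"], s=df["budget_hours"] / 50, alpha=0.5)
ax.axhline(1.0, linestyle="--")  # PF = 1.0 reference line
ax.set_xlabel("% complete")
ax.set_ylabel("Productivity factor (PF)")
plt.show()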

Strategy – Push processes and data ownership to the person actually responsible, not the “function” that is responsible

Payment certificates, progress measurement reports, monthly reports: the list is never-ending. We have data that originates with contractors and a multitude of sources. However, too often the sources of information do not have access to the real database or Word/Excel file used by the project. If possible, structure your data sources such that the owners can manage their own data.

I will likely post a few case studies about this to get people thinking of possibilities.

Strategy – Use Agile management

  • Focus on Output not Documentation
  • Respond to Change as opposed to fixation on Original Plan
  • Focus on interactions between people, not underlying systems or tools

These topics can be very disruptive: tell a company like Fluor or Bechtel NOT to focus on plans or procedures? Good luck. However, this is a disruptive day and age, and we do need to take this approach. I have worked in this business for over 20 years and honestly, in general, project controls already follows these agile fundamentals. We have plans and systems and reams of documentation, but in the end we do (informally) have to follow these simple strategic cultural elements.

What is needed here is firm support from the management team that a proactive view of

“what is going to happen tomorrow”

is far more valuable than the concept of

“what was planned to happen tomorrow”

The concept that

“I need this new report now”

is more relevant than

“This is the report that is defined in the PPM”

AGILE management is not about sprints or stand-ups. It’s a culture: embracing a stance that, while we have general goals and objectives, the focus needs to be on what we are doing today. The reason to have daily stand-ups is to have clarity on what we are doing today. The reason to have sprints is to outline near-term goals. The reason we have retrospectives is to analyze the difference between what we thought we were going to do and what we did (and, more importantly, how to improve the next period). This is just good project management 101! The fact that someone calls it AGILE is not relevant.

JIRA in Construction

Strategy – Allow users to LINK to key fundamental data

This is aligned with “enter data once”, but it is a core strategic ideal that needs to be championed. We will have data housed in source systems, but have we enabled the wider project team to access this information? That is the primary idea here.

Is information shared by way of Excel exports distributed via email to various people at various times? Or do we have a project where information is open and accessible? The latter is what our strategic approach to everything should be.

Strategy – Facilitate live real time data

Ultimately you will need some sort of approach to cadence and integration. Will we use real-time data, or will we instead focus on cutoffs?

This, if we really delve into what it means, can be very disruptive. With real-time data, the need for monthly reporting disappears; the need for weekly reporting disappears. Reporting becomes a natural 24/7 output of the way we operate. This cultural approach to construction management is where I think we need to go to be hyper-reactive to change.

Strategy – Be Hyper Reactive

As I mention above, we want a core ideal that allows us to change course immediately when required. We have submitted the same schedule and cost reports every month – WHY? Do not be afraid to change things up; support a workplace where the team can be hyper-reactive and respond to everything.

For schedule management, this requires inserting activities into the schedule with daily vigor to reflect what is happening today.

For cost management, this requires a detailed ticket-management approach to our tasks, allowing visibility into everything we do 24/7.

Conclusion – Do not be afraid to push boundaries

The list above is again just the beginning of a discussion. These are discussion topics that are occurring in every company around the world.

As we head into the future, we have an entire generation of new employees who are not afraid of technology. Quite the contrary: the newer generations will feel more comfortable updating a web form than a Word file. They will feel more comfortable posting progress updates to a social media site than writing a daily report. They will be more comfortable making mistakes, and fixing them.

The dashboard craze is here to stay. The use of dashboards, and the approach needed to create PowerBI or other application-based analytics, ticks many of the strategic boxes I have outlined here.

In the end – do not be afraid to push boundaries with your approach to construction management.

Reproducing Primavera Select Columns in PowerBI; aka Dynamically adding measures to a Matrix

TL;DR: the online report is here.

Primavera P6 calculates a lot of rich metrics for a schedule. We all know the basics, start date and finish date, but there are all kinds of other values (cost, labor hours, non-labor units, float, duration, etc.).

The challenge is that different people want to see different measures, and once you publish your report, the viewer cannot change the visual. Microsoft is working on this feature, but it will take another couple of months to be released.

The trick was suggested by Kasper de Jonge, and in a random chat @DingbatData and @_Ivan_Bond said they already use it to solve similar issues. Actually, it is very easy.

Anyway, the purpose is to reproduce something like this from Primavera:

  1. Create a new table that contains all the measures

Instead of copying and pasting all the measures, I just used DAX Studio connected to my PowerBI Desktop to generate a list of all the measures; read this link.

The Category column is just to make it easy to select which measure to show; the Index keeps a stable sort in the slicer, as I don’t want to show Actual Finish before Actual Start.
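
A hypothetical few rows of the dummy_meaures table, just to show the shape (your categories and indexes will differ):

MEASURE_Values        Category   Index
Task_Count            General    1
Budget Labor Units    Units      2
Actual Finish         Dates      3
%_Labor_units         Units      4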

2. Create a Master Measure that checks which value is selected.

Using SWITCH and SELECTEDVALUE gives the result. I only had to add a condition to format date results so they show as dates, not numbers; you can vote on this idea.

Here is a snapshot of the first 4 measures (currently I have 29 in the model):

Selected_Measure =
SWITCH (
    SELECTEDVALUE ( dummy_meaures[MEASURE_Values], BLANK () ),
    "Task_Count", [Task_Count],
    "Budget Labor Units", [Budget Labor Units],
    "Actual Finish", SWITCH (
        [Actual Finish],
        BLANK (), BLANK (),
        FORMAT ( [Actual Finish], "mm-yy-yy" )
    ),
    "%_Labor_units", SWITCH (
        [%_Labor_units],
        BLANK (), BLANK (),
        FORMAT ( [%_Labor_units], "0.00%" )
    )
)

3. Add the column MEASURE_Values to the Columns area of the Matrix, and Selected_Measure to the Values area.

4. Voila!

5. Bonus

I am using bookmarks to toggle the Select Columns button, which I think is freaking cool. Hopefully in the short term PowerBI adds more features, so we can build not only dashboards but real application interfaces.

Major Milestone Tracking – By Darrin Kinney

A key quality of project controls management is the communication of major milestones. Every lead will have several spreadsheets with all our milestones listed on rows, and columns for the various interpretations (baseline, contract date, prior forecast, current forecast, contractor forecast, etc.). So when we talk about dates, the difficulty is that everyone will have a different date in their mind. The whole point is to review the changes in all the dates over time.

Connecting PowerBI to Primavera Database – Part 2 (WBS report)

In the first blog post of this series, we showed how to connect to the Primavera SQL Server database. In this blog we build our first report, using only the three tables TASK, PROJECT, and PROJWBS.

You can download the pbix here; an online report is published here.

I will not show the details of every step; you need a basic understanding of PowerQuery and DAX. But I will highlight some aspects of the Primavera database schema that you should be aware of.

Reproduce Project View

This view shows all the projects grouped by WBS, with measures for start date, finish date, and budget labor units.

And here is the equivalent report in PowerBI:

PROJWBS

The table PROJWBS stores the EPS/WBS data for all projects. It is represented in the database as parent/child IDs, which can’t be used directly by PowerBI; first we need to flatten the data into multiple levels so we can show it in a matrix visual. So basically we move from this format

to this format

You have multiple options, using SQL, DAX, or PowerQuery. For PowerQuery, here is an excellent resource by Imke Feldmann; for our example I am using DAX, and the canonical reference is by Marco Russo.
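
For completeness, here is a minimal sketch of the same flattening idea in Python/pandas, in case you stage the data outside the model (column names as in PROJWBS; the helper and the level count are my own assumptions):

import pandas as pd

def flatten_wbs(projwbs: pd.DataFrame, max_levels: int = 10) -> pd.DataFrame:
    # walk each node up to the root, then spread the path into Level_1..Level_n
    names = projwbs.set_index("wbs_id")["wbs_name"]
    parents = projwbs.set_index("wbs_id")["parent_wbs_id"]

    def path(wbs_id):
        chain = []
        while pd.notna(wbs_id) and wbs_id in names.index:
            chain.append(names[wbs_id])
            wbs_id = parents[wbs_id]
        return list(reversed(chain))[:max_levels]

    levels = projwbs["wbs_id"].apply(path)
    out = projwbs.copy()
    for i in range(max_levels):
        out["Level_" + str(i + 1)] = levels.apply(lambda p, i=i: p[i] if i < len(p) else None)
    return out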

Just make sure, when you import a table from SQL Server, to apply the filter [delete_session_id] = null. Primavera doesn’t directly delete data; instead it has something called soft delete, i.e., the item is not shown in the client but is still in the database and will be deleted later. Anyway, for PROJWBS also remove all the template WBS rows (I think they are used by the EPPM web client).
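
In pandas terms, the same soft-delete filter is one line (same column name as in the schema):

task = task[task["delete_session_id"].isnull()]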

TASK

TASK is straightforward: it stores all the tasks of the project. Apply the same filter, [delete_session_id] = null.

PROJECT

We use the PROJECT table only to filter the baselines out. In Primavera, the current project and its baselines are saved in the database in exactly the same way (a very powerful paradigm), but for our report we want to show the activities only for the current project. Too easy: [orig_proj_id] = null. And the best part: we don’t have to write any queries; PowerQuery simply generates the SQL for the database (that’s awesome).

Simple Data Model

The two tables are connected by the field wbs_id. We added another copy of the table TASK as a dimension table for reporting (just activity ID and activity name), and we have this simple data model. I like to save measures in a separate dummy table.

As you can see, building a data model is relatively easy. The complexity starts when you want to add more measures: for example, for total float you need to connect to the table CALENDAR; if you want cost, you need to connect to other tables; and if you want spreads it gets a little trickier (hint: they are not saved in the database).

Hopefully by now, instead of asking how to connect to the Primavera database, the interesting questions become: in which table is the data saved, and how do I join two separate tables to get the report I want?

If you are still reading, I would appreciate it if you could vote on this idea; unfortunately you can’t deactivate table sorting in PowerBI. In this particular report the sorting is already defined by the WBS; if the user clicks on a header, the order will change. They can still reset the order using the measure sort, but it is still very annoying.

Connecting PowerBI to Primavera Database, Part 1

I think one of the most asked questions when people talk about Primavera and PowerBI is how to connect to the database. OK, the good news: the connection itself is easy. The bad news: extracting useful information is a bit of work.

Just to show how it works, I am using a temporary installation on my personal laptop, as obviously I don’t have access to my production database.

I am using a Developer edition of SQL Server and an evaluation copy of EPPM; Oracle allows evaluation use of most of its software for the first 45 days. You can download a copy from here; you need SSMS too.

For the purpose of this blog, we will query the “normal” Primavera tables, not the extended schema, which is a group of tables and views designed specifically for reporting; those extra tables are empty by default and you need to configure the publishing service (I will discuss it in a future blog). Please note I have already blogged about how to connect when using SQLite in the case of standalone P6 Professional.

Connect to SQL server using SSMS

When you install Primavera, you get to define four user accounts:

  • sa: the database admin account (not the admin for the Primavera application)
  • privuser, pubuser: used to connect the Primavera application to the database
  • pxrptuser: user account for reporting

We will use sa to connect to the database.

When you click on connect you get this

The database itself has 320 tables; you can check that by running this SQL script

USE PMDB
GO

SELECT * FROM sys.tables
GO

Create a read only user

Connecting using the admin account is just very bad practice, and I don’t want to mess with the existing accounts, so instead we will create a read-only user account:

  1. Create a new login
  2. Create a password
  3. Map the user to PMDB
  4. Assign a new role

Instead of giving access to all 320 tables, we create a new role (read_only) and assign just the 3 most important tables in the database; you can add more tables later. We granted SELECT only, so no write access.
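
If you prefer to script it rather than click through SSMS, here is a hedged sketch of the equivalent T-SQL, driven from Python (the login, user, and role names are my own placeholders):

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=PMDB;UID=sa;PWD=xxxxxx",
    autocommit=True,
)
cur = conn.cursor()

# create the login/user and a read-only role limited to the three core tables
cur.execute("CREATE LOGIN pbi_reader WITH PASSWORD = 'xxxxxx'")
cur.execute("CREATE USER pbi_reader FOR LOGIN pbi_reader")
cur.execute("CREATE ROLE read_only")
cur.execute("GRANT SELECT ON dbo.TASK TO read_only")
cur.execute("GRANT SELECT ON dbo.PROJECT TO read_only")
cur.execute("GRANT SELECT ON dbo.PROJWBS TO read_only")
cur.execute("ALTER ROLE read_only ADD MEMBER pbi_reader")

conn.close()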

Connect PowerBI to SQL using read only user

And voila, our tables are now visible in PowerBI.

So the answer to “how do I connect to the Primavera database from PowerBI” is: you need a user name, a password, and the server name. The challenge is how to extract meaningful reports from those tables.

What’s next

At this stage, you need to get familiar with the Primavera schema. Yes, it is 320 tables, but the basic ones are three, and usually for my reporting I use around 10. I wrote an introduction to the Primavera schema 6 years ago; I hope it is still relevant.

Part 2 is published here

For the security implications, please read this.

Using WebJobs to scrape a public website and copy data to Azure blob storage

When I started the AEMO dashboard, I had a hard time dealing with the PowerBI gateway: it just sits there, and my laptop has to be online whenever I need to schedule a refresh. It just annoyed me, and I could not understand why cloud-based data needs an on-premises gateway anyway. Obviously I later learned that, strictly speaking, it was not required; there was just an undocumented feature to get away from it (the trick is in the first blog, thanks @Rad_Reza).

But before I was aware of that, I went down some rabbit holes dealing with new tools that are out of my comfort zone, and I think they are worth sharing.

My first thought was: instead of accessing the data directly from the website, let’s copy the data to cloud storage and read it from there. I already have a Google storage account, which is very generous with a free 5 GB of storage, and my data is not big, around 2 GB of zipped CSV.

First setback: there is no native connector to Google storage, and even if there were, we have something called egress fees. In a nutshell, cloud storage is really low cost: loading data is free, but getting your data out is not, unless it stays with the same provider in the same region. Most of the cloud vendors use the same model. As my data will be processed in PowerBI, the clear choice is Azure blob storage.

Azure Blob Storage

The setup is very simple; I used the following options:

  • the same region as my PowerBI region (otherwise you pay egress fees)
  • LRS for replication

As PowerBI doesn’t support Data Lake Storage Gen2, I used the classic blob storage.

Let’s move some data 

Anyway, now that I have my storage, I need a tool to copy the files from http://nemweb.com.au/Reports/Current/Daily_Reports/ to my storage account.

Azure data factory

When you want to copy data, the official tool in Azure is Data Factory. I tried to play around with copy activities; it is straightforward, and my first attempt did work, and it was fast. Actually too fast 😊: no zip was transferred, but rather an HTML page.

Probably Copy Data handles this case just fine, but when you use your own credit card on a cloud tool and you don’t know what you are doing, better to step back and take the time to understand how it works. I deleted the newly created resources and went to the second option: Python!!!

Python

Normally I go with R, but blob storage has no API for R, and I have very limited experience with Python, just using it for the excellent package altair. Let’s try something new.

I was very pleasantly surprised: the amount of documentation for Python is just amazing. Once I asked a question on Stack Overflow and got a very succinct answer in less than a minute; no one was judgemental or downvoted my question (and the question was very basic). The only drawback is that sometimes the code works well for Python 2 but not Python 3, which is what I am using. Anyway, enough talking, let’s show some pseudo-code.

Step 1: get a list of file names from the website

from urllib.request import urlopen
import re

url = "web address where the files are saved"

# read the page and extract every .zip file name referenced in it
result = urlopen(url).read().decode("utf-8")
pattern = re.compile(r"[\w.]*\.zip")
List_website = pattern.findall(result)

Here is a snapshot of the results; the full list is 60 items.

['PUBLIC_DAILY_201904260000_20190427040503.zip', 'PUBLIC_DAILY_201904260000_20190427040503.zip', 'PUBLIC_DAILY_201904270000_20190428040502.zip']

Step 2: get a list of file names from the blob storage

On the first run the list is empty, as we have not loaded anything yet; I loaded a couple of files manually just to test that it was working. The APIs for blob storage are very simple: you only need to provide your storage account name and key, and I love that.

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name="xxxxxx", account_key="xxxxxx")

# list what is already in the container (names only)
generator = block_blob_service.list_blobs(container_name, prefix="current/")
list_azure = [blob.name for blob in generator]

Again, you get a list of names.

Step 3: get the list of files that exist on the website and don’t exist in the storage

The code in Python is very simple: it is just the subtraction of two sets, converted back to a list using the list() function (I get why people like Python).

files_to_upload = list(set(List_website)-set(list_azure))

Step 4: upload the new files to Azure blob storage

The same here: the Azure APIs are very simple and clear. I had only one issue: when the script uploads in a loop, it does not wait until the transfer is completed before jumping to the next file. My workaround was just to wait and poll (a synchronous copy is supported, but not in this scenario where the input is a URL); anyway, I got the answer on Stack Overflow.

import time

for x in files_to_upload:
    # start a server-side copy from the public URL into the container
    block_blob_service.copy_blob(container_name, x, url + x)

    # check the copy status and wait until it is completed
    copy_status = block_blob_service.get_blob_properties(container_name, x)
    while str(copy_status.properties.copy.status) != "success":
        time.sleep(1)
        copy_status = block_blob_service.get_blob_properties(container_name, x)

Basically, wait until the status of the copy is “success” before moving to the next item (did I say I love Python’s syntax?).

the full script is here 

WebJobs: a Free Job Scheduler

OK, so we have a script that works; now we need to run it on a schedule, once per day at 5 AM. Keep in mind the whole purpose of this workflow is to avoid on-premises software; I just need a service that runs a script in the cloud on a schedule, and as I am already on Azure, let’s stick to that ecosystem.

And as it is a personal project, I prefer a free solution; my script runs only once every 24 hours, for a couple of minutes. A quick Google search and I found this little treasure: WebJobs is a service that does just that. I will not repeat the steps here.

Note that the package azure-storage-blob is not part of Python’s standard library; you need to install it first in WebJobs. The schedule functionality is very flexible as it uses CRON; I wish we had something like that in PowerBI Dataflows.
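
To the best of my recollection, the layout WebJobs expects is just the script plus a settings.job file; a hedged sketch (file names per the WebJobs convention):

# Folder zipped and uploaded as the WebJob:
#   run.py         <- the upload script above (WebJobs runs run.* by convention)
#   settings.job   <- {"schedule": "0 0 5 * * *"}  (6-field CRON: every day at 5:00 AM)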

End results 

Every day at exactly 5 AM, a new file shows up in the Azure storage. Although I don’t need those files right now (I am using another approach to load the files directly in PowerBI), it is important to build a data lake (yes, I just said that; I am joking, a data lake is a folder in the cloud where you save the raw files, nothing more). Storage is cheap, but most importantly the requirements may change: I may need to report on another dimension, and it is crucial to keep the raw, unprocessed data.

Take away

  • Python is awesome.
  • The Azure APIs for Python are straightforward.
  • Azure is awesome.
  • Be careful of egress fees.
  • CRON is awesome; I wish it were supported in PowerBI Dataflows.
  • I wish PowerBI Dataflows could save a raw file; PowerQuery is amazing, but it does not copy raw files.
  • I wish WebJobs added support for R.