Believability & Relevance Weighted Project Review Framework
I recently completed Ray Dalio’s book “Principles: Life and Work”. For those who haven’t heard of Dalio, he is what you might describe as a hyper-successful individual. He started from a poor background and founded Bridgewater Associates, a hedge fund that has grown to become the largest and most successful of all time, managing over $130B for clients.
In his book, he attributes his success not to any individual brilliance, but to his approach to solving problems and to managing the recurring challenges he is faced with, which he refers to as “another one of those”.
One of Dalio’s most persistent principles is his approach to decision making in an organisation with many individuals and, inevitably, many differences of opinion. He calls his process “believability weighted decision making”.
The idea is that his company makes decisions neither as an autocracy (where an individual boss makes the final decision) nor as a democracy (where each individual gets an equal vote), but under a system where each individual gets a vote that is weighted in proportion to how believable that individual is on the topic being voted on.
Ray also advocates for project and individual performance reviews to be conducted often and bilaterally (meaning that even the boss should be reviewed!), and more generally, he encourages people to systematise and automate their management processes into algorithms where possible. He credits a lot of his success to this systematic approach to management, and asserts it is generally far more accurate, reliable and timely than more typical subjective assessments.
Energised by Ray’s approach to management (and approaching my first interview for a role with management responsibilities), I set out to replicate it by designing a framework for project reviews that would be reproducible, automatable and insightful.
What I landed on was an extension of Ray’s believability weighted voting system applied to project performance reviews. The framework is detailed below.
Step 1 — Define Believability and Relevance weightings
A project has a certain number of roles/individuals, and each individual is measured on a set of personality/responsibility dimensions. For example, consider the below:
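The original table isn’t reproduced here, but as a minimal sketch (in Python, and assuming the role and dimension names that appear later in this article), the setup could be captured with two simple lists:

```python
# Illustrative sketch only: the project's roles and the dimensions each person
# is reviewed on. Names follow the examples used throughout this article.
roles = [
    "Project Manager",
    "Data Scientist",
    "Data Science Team Lead",
    "Integration Engineer",
    "Stakeholder Lead",
]

dimensions = [
    "Technical Competency",
    "Teamwork",
    "Solution-Design / Creativity",
    "Delivery and Implementation",
]
```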
Now, we also know that it is more important for certain roles in the project to be competent in certain dimensions. For example, in this project it is absolutely imperative that the Data Scientist displays a high degree of technical competency in their domain, while it may be less important that they are strong at teamwork (since their work is largely performed independently). Conversely, the Project Manager might have the opposite relationship for these two dimensions. We can perform a subjective assessment of which measured dimensions are important for which individuals as follows (scored from 1 to 10):
Of course, these are just example values. This assessment should be performed by an experienced manager who knows which dimensions are required, and which are merely beneficial, in different roles.
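As a rough sketch of how such a relevance assessment could be represented in code, a role-by-dimension mapping of 1–10 scores works well. The numbers below are placeholders I have invented for illustration, not the values from the table above:

```python
# Relevance weightings: how important each dimension is for each role (1-10).
# Placeholder values for illustration only.
relevance = {
    "Data Scientist": {
        "Technical Competency": 10,
        "Teamwork": 4,
        "Solution-Design / Creativity": 8,
        "Delivery and Implementation": 7,
    },
    "Project Manager": {
        "Technical Competency": 4,
        "Teamwork": 10,
        "Solution-Design / Creativity": 6,
        "Delivery and Implementation": 9,
    },
    # ... the remaining roles are filled in the same way
}
```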
Dalio was quite passionate that everyone in a team project should be able to review every other team member in a non-hierarchical way. This means that even the Data Scientist should be granted an opportunity to review the Project Manager (just as the inverse would normally be performed in typical review processes). Having said that, there is no ignoring the fact that some team members are certainly more believable in their assessments of certain team members than others. For example, we may know for our case that the Data Science Team Lead is a good judge of the Data Scientist (since he is mainly involved in the project to oversee the Data Scientist’s work), but also a poor judge of the Project Manager, since we expect the two will share little interaction on the project. By assessing how believable the reviewers are with respect to the reviewees, we might arrive at a believability weighting matrix as follows:
For example, in the matrix above we have assumed that the review provided by the Project Manager on the Stakeholder Lead should be given a high weight (8), whereas the review provided by the Data Scientist on the Stakeholder Lead should be given a lower weight (4). These weights are subjectively defined before project reviews are collected, and may be based on factors like the following (a rough code sketch of such a matrix follows this list):
- Level of project collaboration required (for example, the Data Scientist and the Integration Engineer have to collaborate extensively, therefore their reviews of one another should be given a high weight).
- Understanding of one another’s roles and responsibilities (for example, the Data Science Team Lead has a better understanding of the role, responsibilities and objectives of the Stakeholder Lead than the Integration Engineer does, so their review should be given more weight).
- Reviewer seniority (reviewers with a stronger track record are naturally more believable).
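Putting this together, the believability weighting matrix might be sketched as a reviewer-by-reviewee mapping. The only values below taken from the discussion above are the two example weights (Project Manager on the Stakeholder Lead: 8, Data Scientist on the Stakeholder Lead: 4); the rest are invented placeholders:

```python
# Believability weightings: how much weight each reviewer's opinion carries for
# each reviewee (1-10). Placeholder values, apart from the two weights quoted above.
believability = {
    # reviewer: {reviewee: weight, ...}
    "Project Manager": {"Stakeholder Lead": 8, "Data Scientist": 7},
    "Data Scientist": {"Stakeholder Lead": 4, "Integration Engineer": 8},
    "Data Science Team Lead": {"Data Scientist": 9, "Project Manager": 3},
    "Integration Engineer": {"Data Scientist": 8, "Stakeholder Lead": 3},
    "Stakeholder Lead": {"Data Scientist": 4, "Project Manager": 8},
    # ... the remaining reviewer/reviewee pairs are filled in the same way
}
```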
Step 2 — Collecting Reviews
Once that is done, each individual is sent a review to complete. In the review, they have to assess every other team member on every dimension. For example, the Project Manager’s review of the Data Scientist might look as follows:
Once all team members have completed their reviews, it’s time to look at the results. So how can we use all of this information? Let’s take a simple example: suppose the Data Scientist received the following reviews of their work from the different reviewers (scored 1–5 from “Strongly Disagree” to “Strongly Agree”):
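To keep the rest of the walkthrough concrete in code, here is a stand-in for the reviews the Data Scientist might have received. The 1–5 scores are invented placeholders, not the article’s actual figures:

```python
# Reviews received by the Data Scientist: reviewer -> dimension -> score (1-5).
# Placeholder values for illustration only.
ds_reviews = {
    "Project Manager": {
        "Technical Competency": 2,
        "Teamwork": 3,
        "Solution-Design / Creativity": 4,
        "Delivery and Implementation": 2,
    },
    "Data Science Team Lead": {
        "Technical Competency": 2,
        "Teamwork": 4,
        "Solution-Design / Creativity": 5,
        "Delivery and Implementation": 3,
    },
    "Integration Engineer": {
        "Technical Competency": 3,
        "Teamwork": 3,
        "Solution-Design / Creativity": 4,
        "Delivery and Implementation": 2,
    },
    "Stakeholder Lead": {
        "Technical Competency": 2,
        "Teamwork": 4,
        "Solution-Design / Creativity": 4,
        "Delivery and Implementation": 3,
    },
}
```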
But now, remember that not every reviewer is equally important when reviewing the Data Scientist’s performance, and not every dimension is equally important for the Data Scientist to demonstrate. We can overlay the importance of each reviewer to the Data Scientist, as well as of each dimension for the Data Scientist to demonstrate, below:
For example, we can see from the above that “Technical Competency” accounts for 25% of the Data Scientist’s responsibilities as measured by dimensions, and that 36% of their aggregate assessment will come from the Data Science Team Lead.
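Those percentages come from normalising the believability weights of the Data Scientist’s reviewers and the relevance weights of their dimensions so that each set sums to 1. Continuing the sketch (and noting that the placeholder values above will give shares different from the article’s 36% and 25%), the mechanics look like this:

```python
# Normalise the believability of each reviewer *of the Data Scientist* to sum to 1.
reviewer_weights = {r: w["Data Scientist"] for r, w in believability.items()
                    if "Data Scientist" in w}
total_b = sum(reviewer_weights.values())
b_norm = {r: w / total_b for r, w in reviewer_weights.items()}

# Normalise the relevance of each dimension *for the Data Scientist* to sum to 1.
ds_relevance = relevance["Data Scientist"]
total_r = sum(ds_relevance.values())
w_norm = {d: v / total_r for d, v in ds_relevance.items()}

print(b_norm)  # share of the aggregate assessment coming from each reviewer
print(w_norm)  # share of the role's responsibilities carried by each dimension
```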
With that done, we can calculate an aggregate score that consolidates everything within this scope into a single neat score between 1 and 5, representing how well the Data Scientist was perceived to have performed the core functions of their role on this project. It is calculated by summing, over every reviewer-dimension cell in this matrix, the product of the reviewer-dimension score, the normalised believability score for the reviewer, and the normalised relevance score for the dimension.
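In code, and continuing the variables from the sketches above, that aggregation is a doubly weighted sum:

```python
# Aggregate score: sum over all reviewer-dimension cells of
# score * normalised reviewer believability * normalised dimension relevance.
aggregate = sum(
    score * b_norm[reviewer] * w_norm[dimension]
    for reviewer, review in ds_reviews.items()
    for dimension, score in review.items()
)
print(round(aggregate, 2))  # a single 1-5 score for the Data Scientist on this project
```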
So how did our Data Scientist perform? After calculating this out, we see that the Data Scientist had an aggregate project score of 2.9, which is below average (3 being “Neutral”). So why didn’t the Data Scientist perform well? We can visualise the total points available for each dimension alongside the total points scored by the Data Scientist:
What we observe from this is a disconnect between the dimensions we expect to be most important for the Data Scientist to fulfil (the highest blue bars) and where the Data Scientist actually earned most of the points that went into their aggregate performance score. For example, “Technical Competency” alone was expected to account for 25% of the role’s responsibilities across all dimensions (i.e. 1.25 points), yet the reviews indicate that the Data Scientist performed below average in this dimension, capturing only 0.5 of those points (40%). Conversely, the Data Scientist appears to be over-indexing on “Solution-Design / Creativity”, a dimension that isn’t a heavily weighted requirement for their role. The most important dimensions for the Data Scientist to improve moving forward are those with the highest deficit bars (in yellow): by far technical competency, though they should also focus on improving their “Delivery and Implementation”.
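The per-dimension breakdown behind a chart like that can be computed directly from the same quantities. This is again a sketch using the placeholder values above, so the figures won’t match the 1.25 and 0.5 quoted in the article:

```python
# For each dimension: the points available (if every reviewer had scored 5/5)
# versus the points actually captured, and the resulting deficit.
for dimension in dimensions:
    available = 5 * w_norm[dimension]
    scored = sum(
        review[dimension] * b_norm[reviewer] * w_norm[dimension]
        for reviewer, review in ds_reviews.items()
    )
    deficit = available - scored
    print(f"{dimension}: available={available:.2f}, "
          f"scored={scored:.2f}, deficit={deficit:.2f}")
```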
So how can we use this information?
After collecting all reviews, we can repeat the process shown above for the Data Scientist for every team member. Using this information, we can aggregate different views of the data to perform project-level oversight analysis and to identify personalised areas of improvement for our team members (a generic version of the calculation is sketched after the list below). We can, for example, visualise:
- How well our team members’ dimensional strengths align to their roles’ responsibilities (as shown above for the Data Scientist).
- Root cause analysis for project failure — which key dimensions and people failed on the project (and which were most consequential).
- Which team members were assessed by their peers to have performed their roles well for the project (as measured by the aggregate project performance score — 2.9 for the Data Scientist).
- Whether and how team members are improving their aggregate performance across different projects over time (as measured by the aggregate project performance score for each project).
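A generic version of the Data Scientist calculation makes it easy to repeat for every reviewee. The helper below (a hypothetical aggregate_score function, not something prescribed by Dalio) simply wraps the normalisation and weighted sum shown earlier:

```python
# Compute the aggregate project score for any reviewee, given the full set of
# reviews, believability weightings and relevance weightings.
def aggregate_score(reviewee, reviews, believability, relevance):
    """reviews: {reviewer: {dimension: 1-5 score}} for this reviewee."""
    b = {r: w[reviewee] for r, w in believability.items() if reviewee in w}
    b_total = sum(b.values())
    rel = relevance[reviewee]
    rel_total = sum(rel.values())
    return sum(
        score * (b[reviewer] / b_total) * (rel[dimension] / rel_total)
        for reviewer, review in reviews.items()
        for dimension, score in review.items()
    )

# For example, repeating the Data Scientist calculation from above:
print(round(aggregate_score("Data Scientist", ds_reviews, believability, relevance), 2))
```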
The strength of this framework is in its reproducibility, and in the depth of rich, believable information retrieved with relatively low overhead for team members (though the manager responsible for the review process does bear the burden of setting it up and administering it).
I hope to try to put it into practice in the future on some of my own projects.