The ability to learn how to judge outcomes is probably a core skill

One of the things I’ve been mulling about, and am perhaps ~60% confident of, is that: as a leader, one of the core skills one needs to acquire is the ability to learn how to judge outcomes in unfamiliar contexts.

Breaking that down bit-by-bit:

“as a leader” – here, I mean “leader” in a very general sense. Traditional forms of leadership include management roles, but I’d say “leader” also applies in a conversation with an LLM, where you’re the “leader” because you’re “leading” with questions and guiding the conversation. Or if you’re going to a gym class, your gym trainer is the “leader” as they’re guiding you to make certain kind of progress.
“one of the core skills” – I believe that this skill is fundamental to one’s success as a leader (i.e. necessary but not sufficient).
“needs to acquire” – I believe that the ability to learn this skill itself can be learned.
“learn how to [..] in unfamiliar contexts” – A leader generally needs to adapt to different contexts; it’s generally not possible to magically get good without any “reps”, so learning how to get better in varying contexts – especially those in which one has had less experience – is necessary for getting better overall.
“judge outcomes” – This is one sub-part of the Study component of Plan-Do-Study-Act.

I’ll try to illustrate this point with three examples:

Interactions with an LLM
Being the Tech Lead for a team of Software Engineers
Founding a company and leading it through different stages of growth

Before you read further, I should warn you that I likely have ~low believability on most of these. At the time of writing, I have:

Used LLMs for various things such as translation, language learning, programming etc. but I’m not at the cutting-edge in terms of trying the latest techniques, the large variety of tools etc.
Have been Tech Lead for a small team of 2-3 Software Engineers for about ~1.5 years.
Have not founded a company. However, I have listened to 100 hours+ of podcasts and likely read 200K+ words worth of books and blog posts on topics such as leadership, engineering management, and product management.

So maybe all of this is wrong. 😄 After the examples, I’ll discuss some potential counter-points to this idea. In the final section, I wrap up with some pointers to research.

Outcome judgement with an LLM

One of the things I’ve learned when interacting with LLMs is that in case I need the response from the LLM to match some specific criteria or meet a certain standard, I actually need a way of checking that the output does match the criteria or meet the standard.

“Isn’t that just a long-winded way of saying that LLMs are fallible?” – No, it’s a more action-oriented frame, rather than an observation-oriented frame.

Specifically, whenever I ask an LLM a question, I try to be conscious about asking myself “how is fitness-for-purpose defined here” and “how am I going to judge the fitness-for-purpose of the output.” This includes factors such as correctness and thoroughness (more meaningful in the context of programming), but extends beyond them. However, sometimes I still do tend to make mistakes such as:

Asking a question and then realize I don’t have a way of judging the answer.One interesting aspect I find about certain tools’ LLM functionality (notably, Slack) is how they try to consistently provide citations, which I find helpful in verifying answers. I hope this is adopted more broadly as a good UX pattern.
Not asking an LLM a question where I can easily verify whether it is correct, and verifying 2-3 answers is likely faster than trying to learn the answer myself.

With an LLM, often I’m operating in an unfamiliar context, e.g. I might discuss PostgresQL query plans, or Japanese knowledge, or perhaps ask for some suggestions on how to structure some code. This means constantly learning to what extent the LLM can help with certain kinds of questions.

Focusing on being able to judge outcomes helps me provide a better frame for my discussions with an LLM. Before I adopted this frame, I would often get frustrated at getting poor answers, but focusing more on outcomes helps me think of the LLM as just another tool which sometimes makes stuff up, and I find that I can be more deliberate and less emotional when looking at an LLM’s output.

Outcome judgement as a Tech Lead

As a Tech Lead for a product area that was largely developed when I was not present at the company, one of the challenges has been diving into unfamiliar and undocumented code.

At one time, a few months in after having been made Tech Lead, my manager came to me and brought up a concern with me that some work that another engineer was doing was seemingly taking too long. At the time, my initial response was that I thought that the engineer was focused working on that, so if it is taking long, it must be because the problem was more complicated than we had anticipated. I also mentioned that the engineer had told me that he was having issues with collaborating across timezones with people from other teams.

Only later did I consider some other possibilities:

Maybe the engineer was stuck in a mental block, and was spinning his wheels.
Maybe me pairing on the issue would’ve helped understand some things faster.
Maybe I could’ve alleviated some of the timezone-related friction by talking to someone on the other team synchronously (e.g. by gathering context + questions and forwarding them).

and so on.

More importantly, what I had missed during this whole situation was – not only did I not have any estimate for how long the task should take at the beginning (I could’ve thought about it as a range with a high uncertainty, instead of it being a point estimate, but I didn’t think at that level), but that I was not thinking about understanding the details of the work, so that I could have a better estimate in the future in case we needed to change something in that area in the future.

More generally, I noticed that my feeling around not wanting to appear micro-managey was overriding one of my core responsibilities as a Tech Lead, which was to make sure our team delivered on our outward-facing commitments within reasonable time.

Ben Kuhn points out something similar in Some mistakes I made as a new manager:

I read a bunch of management books that warned me against micromanaging my reports, so I resolved not to do that. I would give my team full autonomy, and participate in their work only by “editing” or helping them reach a higher quality bar. “These people are smart,” I thought. “They’ll figure it out, or if they get stuck they’ll ask me for help.”

[..]

Instead of “don’t micromanage,” the advice I wish I’d gotten is:

Manage projects according to the owner’s level of task-relevant maturity

People with low task-relevant maturity appreciate some amount of micromanagement (if they’re self-aware and you’re nice about it).

Since that time, I’ve been trying to be more cognizant about estimating work and understanding uncertainty when working in different parts of the codebase. For example, I’ve tried to adopt the following:

When someone is working in an area I’m unfamiliar with, try to ask them more questions to better understand footguns and tech debt in different parts of the codebase.
Trying to ensure that our team posts regular updates in the async stand-ups, to identify if people are blocked, or even just to check if something is taking longer than I thought it might originally (in which case, I potentially need to re-calibrate my expectations etc.).
Trying to use time-tracking for my own work, to get a better sense of how much time is spent where.

Outcome judgement as a founder

One of the challenges as a founder seems to be that as a company goes through various stages of growth is how should the founder navigate that. One of the primary questions seems to be around time allocation: should the founder focus on areas which they’re particularly good at or on areas where they’re particularly bad at? Should they focus more on strategy, vision and alignment, or should they focus more on digging into the details of different parts of the business? (Meta: Are these actually false dichotomies?)

Depending on who you ask, the answer to this question is likely going to vary.

For simplicity, let’s consider a founder operating as the CEO. The CEO of a company that has ~1000 people needs many different skills compared to the CEO of a company of ~10 people. At the scale of ~1000 people, the CEO of a tech company likely has leaders from several key areas – Sales, Marketing, Product, Engineering, Design, HR etc. – reporting to them.

If the CEO has an Engineering or Design background, they probably do not have deep experience leading Marketing or Sales departments of 50~150 people. And yet, they need to be able to make sure regardless of who exactly is leading Marketing or Sales, they’re doing a good job along various axes (deliverables, cross-functional collaboration, hiring and retention, etc.) when taking into account various external factors (macroeconomic trends, industry trends, competition etc.). These external factors are likely to continue evolving over time, including situations which the company may not have encountered in its history.

At the time of hiring or promoting someone into a VP of Sales or VP of Marketing role, the CEO needs to be responsible for figuring out whether the person is likely to be successful in the upcoming role, and part of this requires understanding how successful they were in the past. Once the person is appointed to this post, the CEO needs to be responsible for coaching the new VP (or identifying someone else who can coach them!), which requires the CEO to judge the outcomes of the department under the new VP, as well as the VP’s role in them.

Counter-points

The original statement at the start of this post was:

as a leader, one of the core skills one needs to acquire is the ability to learn how to judge outcomes in unfamiliar contexts.

Here is a non-exhaustive list of potential counter-points to the above statement:

This skill does not appear to be an important or fundamental aspect of being a successful leader.
There are many other skills which are more important for being a successful leader.
There is probably no general “ability to learn outcome judgement”; many domains are too ill-structured to support easily learning outcome judgement, or for transferring outcome judgement skills from other domains.
The “ability to learn outcome judgement” is an innate skill, not something that can be acquired/trained.

Let me go over each of these points one by one.

This skill does not appear to be an important or fundamental aspect of being a successful leader.

I currently believe that “goodness” (= having decision-making skills which increase likelihood for success for the overall organization) and “success” as a leader are two different things. I don’t have good sense of how correlated these are (I hope that they are strongly correlated), but they are certainly not perfectly correlated. E.g. sometimes people “fail upwards”, succeed due to nepotism, can hide the problems they cause due to having more charisma etc.

By definition, a leader needs to delegate at least some decisions to other people. Without developing outcome judgement, the only options the leader has are to:

Only delegate decisions which do not have a meaningful impact on the overall area they are responsible for: This likely introduces a bottleneck where the leader has to make a lot of decisions, as well as reduces autonomy and purpose for the delegatees.
Trust the person and not check their work: This increases the length of feedback loops, where the delegatee only learns from their mistakes at a much later stage, if at all. This also means that when disputes arise and are escalated to the leader, the leader cannot make decisions with strong justified conviction.

There are many other skills which are more important for being a successful leader.

This may very well be the case. For example, depending on the context, listening skills, persuasion skills, domain expertise etc. may be more important. I have not done a deeper analysis or looked at the literature here.

There is probably no general “ability to learn outcome judgement”; many domains are too ill-structured to support easily learning outcome judgement, or for transferring outcome judgement skills from other domains.

I think this may be true. Given the prevalance of people changing careers and succeeding in widely different contexts, and to a much lesser extent the presence of polymaths, I believe that there might be some common underlying ability which enables people to accelerate their development of expertise across contexts. However, as Commoncog’s summary of Accelerated Expertise points out:

everything in the expertise literature is difficult to generalise. Some methods work well in some domains but not in others [..]

a great many things about training can probably never be known. [..] it is nearly impossible to isolate the factors that result in successful training in real world contexts

Studying expertise and training methods is hard, and despite a lot of studies in the area, there are many aspects to expertise we don’t understand well.

So it’s not obvious that such a general ability exists. And even if it does exist, perhaps it makes sense to focus on methods which are known to work in a particular domain (and not all domains may have such methods/training material etc. easily available!). This is also why this blog post doesn’t have a “here’s how you acquire this ability” section.

The “ability to learn outcome judgement” is an innate skill, not something that can be acquired/trained.

There are several ideas connected to outcome judgement which can be systematized, taught and learned, such as running Plan-Do-Study-Act loops. Additionally, domain-specific skills can also be taught and learned, to varying extents depending on the domain. So it seems plausible to me that this meta-skill can also be taught/learned/acquired (although doing so may be hard, because of reliance on tacit knowledge etc.).

Wrapping up

If you’re in a leadership position, but haven’t thought about “Trust but verify” as a useful frame for leading,The interaction of “verify” with “trust” is probably a worthy topic to discuss by itself, but I haven’t thought about it deeply, so I don’t have any comments to add there. I do recommend reading Ben Kuhn’s blog post linked earlier, especially the point about task-relevant maturity, which seems like a helpful frame.

it might be valuable to do so.

There is some research supporting this line of reasoning by Claus Langfred in the context of self-managing teams:

A high level of trust can make the members of self-managing work teams reluctant to monitor one another. If low monitoring combines with high individual autonomy, team performance can suffer. Data from 71 self-managing teams of MBA students demonstrated this effect. High trust was associated with higher team performance when individual autonomy was low but with lower performance when individual autonomy was high. (paper)

Langfred also has a longitudinal study: The downside of self-management: A longitudinal study of the effects tf conflict on trust, autonomy, and task interdependence in self-managing teams (paywall).

However, since this research is focused on self-managing teams and MBA students, it’s not obvious that the findings should carry over apply in other contexts such as paid work, where relationships often have more hierarchy involved. It seems plausible based on my own experience and what I’ve heard in various podcasts, but I haven’t found any related research. That said, I have not done a literature deep dive, so it’s quite possible that there is existing research which supports/contradicts the viewpoint I’ve advocated for in this post. If you know of such research, please do email me (contact info on homepage). 😄