According to a recent survey from Rethink Priorities, when asked what best describes their current career, respondents replied:

- Building flexible career capital and will decide later (18.7%)
- Still deciding what to pursue (17.2%)

I was struck by how closely this aligns with the optimal solution to the Secretary Problem (SP): given N options, and subject to certain constraints, you should evaluate 37% of them before committing (those two survey responses add up to 35.9%). This is mostly coincidental, but it leads to a longer exploration of real-life applications of SP-like dynamics.

Of course, SP is just a toy model. For our use, the two most problematic assumptions are **Binary Payoff:** the evaluator’s goal is merely to maximize their probability of selecting the best candidate, meaning 2nd best is just as bad as the worst, and **No Opportunity Cost:** evaluation is treated as free, so there’s no cost to making a selection after seeing 100 candidates rather than after 10. I refer to an adjusted SP without these assumptions as the Modified Secretary Problem (MSP).

This post is in two parts:

1. Optimal Stopping Points for the MSP
2. Discussion

All code is available here as a Colab Notebook.

### Optimal Stopping Points for the MSP

**Setup:** To correct for these disanalogous assumptions, we’ll make the following modifications:

- Utility is proportional to the quality of the selected candidate. Unlike the regular SP, 2nd best is almost as good as 1st, and much better than worst.
- Utility is proportional to the number of candidates remaining after selection. This gives an opportunity cost to evaluation, and introduces a kind of explore-exploit tradeoff.

Additionally, we’ll consider possible scenarios where:

- Candidate quality is uniformly, normally or log-normally distributed
- Utility is gained in one of four ways:
  - Purely based on whether or not you picked the best candidate (BINARY)
  - Directly proportional to the quality of the selected candidate (PROPORTIONAL)
  - Directly proportional to the quality of the selected candidate, multiplied by the number of remaining candidates (TIME)
  - At each time step, directly proportional to the quality of the candidate currently being evaluated, plus the quality of the selected candidate multiplied by the number of candidates remaining at the end (TIME_CONTINUOUS)

Since BINARY payoff doesn’t depend on the quality distribution, we run a 3x3 matrix of scenarios, plus the initial base condition.
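The scenario definitions above can be sketched roughly as follows. This is not the notebook's code: the names `SAMPLERS` and `payoff` are hypothetical, the distribution parameters are placeholders, and the exact bookkeeping for TIME_CONTINUOUS (whether the chosen candidate's own evaluation step counts) is my assumption.

```python
import random

# Quality samplers. The parameters are illustrative placeholders,
# not values taken from the original simulation.
SAMPLERS = {
    "UNIFORM": lambda rng: rng.uniform(0.0, 1.0),
    "NORMAL": lambda rng: rng.gauss(0.0, 1.0),
    "LOG_NORMAL": lambda rng: rng.lognormvariate(0.0, 1.0),
}


def payoff(kind, qualities, chosen):
    """Utility of committing to qualities[chosen] (0-based position)."""
    remaining = len(qualities) - chosen - 1  # candidates never evaluated
    q = qualities[chosen]
    if kind == "BINARY":
        return 1.0 if q == max(qualities) else 0.0
    if kind == "PROPORTIONAL":
        return q
    if kind == "TIME":
        return q * remaining
    if kind == "TIME_CONTINUOUS":
        # Assumption: utility accrues for every candidate evaluated
        # (including the chosen one), then at the chosen quality afterwards.
        return sum(qualities[: chosen + 1]) + q * remaining
    raise ValueError(f"unknown payoff kind: {kind}")
```

Keeping the payoff rule separate from the selection strategy means the same strategy can be scored under each of the four rules without being rewritten.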

**Approach:** There is an analytical solution to the Secretary Problem described in Appendix A, but we’ll be focusing on a numerical solution (i.e. simulation). I start by replicating the original result to test the validity of the simulation approach, then find optimal stopping points for the nine modified scenarios.
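As a reference point for the simulation, the standard closed-form success probability for the classic problem can be evaluated directly: if the first k of n candidates are only observed, the chance of ending up with the best one is (k/n) · Σ 1/i for i from k to n−1. The sketch below uses that textbook formula; the function names are hypothetical.

```python
def success_probability(n, k):
    """P(select the best of n) when the first k candidates are only observed."""
    if k == 0:
        return 1.0 / n  # no benchmark, so the first candidate is always taken
    return (k / n) * sum(1.0 / i for i in range(k, n))


def best_cutoff(n):
    """Number of candidates to skip that maximizes the success probability."""
    return max(range(n), key=lambda k: success_probability(n, k))


n = 100
k = best_cutoff(n)
print(k, round(success_probability(n, k), 4))
```

For n = 100 this gives a cutoff of 37 candidates and a success probability of about 0.371, which is the benchmark the simulated BINARY result should approach.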

**Pseudocode**

- Generate n=100 candidates with quality sampled from the scenario’s distribution
- For each possible stopping point (S), run t=10,000 trials:
  - Find the best_initial candidate from the first S candidates
  - Iterate through the remaining candidates in order, selecting the first candidate with quality better than the best_initial candidate (defaulting to the last candidate if none is better)

This is an implementation of the optimal strategy for the original SP described in Analysis of heuristic solutions to the best choice problem (Stein, Seale and Rapoport, 2003).
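A minimal Python sketch of the pseudocode for the BINARY case, assuming uniform quality and a fresh candidate pool on every trial (the pseudocode does not say whether the pool is regenerated). The function names are hypothetical and this is not the notebook's code.

```python
import random


def cutoff_success_rate(n, stop, trials, rng):
    """Fraction of trials in which the cutoff rule lands on the best candidate."""
    wins = 0
    for _ in range(trials):
        # Assumption: draw a fresh pool of uniform qualities for each trial.
        qualities = [rng.random() for _ in range(n)]
        best_initial = max(qualities[:stop], default=float("-inf"))
        chosen = n - 1  # default to the last candidate if nobody beats the bar
        for i in range(stop, n):
            if qualities[i] > best_initial:
                chosen = i
                break
        wins += qualities[chosen] == max(qualities)
    return wins / trials


def best_stopping_point(n, trials, seed=0):
    """Sweep every stopping point and return the best one plus the full curve."""
    rng = random.Random(seed)
    rates = [cutoff_success_rate(n, s, trials, rng) for s in range(n)]
    return max(range(n), key=rates.__getitem__), rates
```

Calling `best_stopping_point(n=100, trials=10_000)` mirrors the settings above, though in pure Python that is on the order of 10^8 steps, so smaller values are more practical for a quick check.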

**Results**

First, we manage to replicate the original result, yielding a stopping point of 34%. This is close to the optimal solution of 37%, but clearly noisy.

Next, we’ll run the MSP for the matrix of initial conditions.

**Table of Results**

| | BINARY | PROPORTIONAL | TIME | TIME_CONTINUOUS |
|---|---|---|---|---|
| UNIFORM | 35 | 9 | 2 | 3 |
| NORMAL | N/A | 8 | 1 | 5 |
| LOG_NORMAL | N/A | 24 | 7 | 12 |

**Charts**

**Sanity Checks**

Let’s begin with a quick sanity check of the results. All scenarios remain characterized by a U-shaped curve. Stopping too early sets the bar too low, so you settle too early. But stopping too late leaves too few candidates to evaluate, and risks going home empty-handed (technically, defaulting to the last candidate in the pool).

The time-sensitive conditions all result in substantially earlier stopping points than their non-temporal equivalents. Again, this makes sense. Taking into account opportunity costs will push the decision time earlier, providing more time to “exploit” the benefits of a good candidate selection.

It also makes sense that TIME_CONTINUOUS stops a bit after TIME. Since utility is gained during the evaluation process, the effect of opportunity costs is dulled.
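The direction of the time effect can be checked with a small sweep. The sketch below is not the notebook's code: it assumes uniform quality, uses a smaller pool (n=30) to keep the run short, and all function names are hypothetical. It scores the same cutoff strategy under the PROPORTIONAL and TIME payoffs and reports the best stopping point for each.

```python
import random


def run_cutoff(qualities, stop):
    """Index chosen by the cutoff rule; falls back to the last candidate."""
    bar = max(qualities[:stop], default=float("-inf"))
    for i in range(stop, len(qualities)):
        if qualities[i] > bar:
            return i
    return len(qualities) - 1


def mean_utilities(n, trials, seed=0):
    """Average PROPORTIONAL and TIME utility for every stopping point."""
    rng = random.Random(seed)
    proportional = [0.0] * n
    time_weighted = [0.0] * n
    for _ in range(trials):
        qualities = [rng.random() for _ in range(n)]  # uniform quality (assumption)
        for stop in range(n):
            i = run_cutoff(qualities, stop)
            proportional[stop] += qualities[i] / trials
            time_weighted[stop] += qualities[i] * (n - i - 1) / trials
    return proportional, time_weighted


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


proportional, time_weighted = mean_utilities(n=30, trials=3000)
print("PROPORTIONAL optimum:", argmax(proportional))
print("TIME optimum:", argmax(time_weighted))
```

Each trial's pool is reused across all stopping points, which reduces noise when comparing neighbouring points on the curve.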

### Discussion

**The Difficulty of Reasoning from Toy Models, and Ambiguity of Increased Realism**

We’ve adapted the secretary problem to resolve a couple of key disanalogies with the real-life problem of selecting a career. These tend to push the optimal stopping point much earlier, with one scenario advocating a stopping point of just 1%. This bears two interpretations:

- Object-level: Take this result literally, and apply it to major life decisions
- Meta-level: Since the solution is so highly subject to model parameters, don’t take any of this too literally

Note that the model parameters are not arbitrary, so this is different from just conducting a sensitivity analysis and declaring the whole model unreliable. I genuinely feel that the modifications made to the original SP make the model more realistic.

Having said that, it is not necessarily true that a more realistic model will yield better results. Going from UNIFORM+PROPORTIONAL to NORMAL+PROPORTIONAL, you are arguably getting a more realistic model, but the stopping point goes from 7% to 19%. NORMAL+TIME_CONTINUOUS is perhaps even more realistic, but results in a subsequent drop of the stopping point from 19% down to 4%.

So the fact that this discussion is more nuanced than the original problem has some benefits, but doesn’t necessarily indicate that results are more applicable.

This is reminiscent of The Atlantic’s “The Curse of Econ 101”, which argues that economic reasoning, applied too naively, can be more misleading than useful. Econ 101 is an important step on the path to reasoning rigorously about difficult problems, but there’s no guarantee that taking the step will make your decisions better in the short run.

**Intuitions are Arbitrarily Bad**

Given the flaws of formal models, we might wish to retreat to a more intuitive stance. Abstractly, you could even extend this to a broader critique against rationalism, or against modernism, or against planning and so forth.

Tanner Greer disagrees. Although intuition, and its cousin cultural tradition, were helpful in the past, our current world is too bizarre. He goes on to conclude:

The trouble with our world is that it is changing… What traditions could their grandparents give them that might prepare them for this new world? By the time any new tradition might arise, the conditions that made it adaptive have already changed… This may be why the rationalist impulse wrests so strong a hold on the modern mind. The traditions are gone; custom is dying. In the search for happiness, rationalism is the only tool we have left.

Intuition is not quite the same as custom, but it’s related. Your intuitions might stem from an evolutionary background, or advice from your parents, or a broader set of cultural norms. But these are all maladapted to the current moment, and to your current circumstances.

At the extremes, intuition can easily veer into neurosis. It’s easy to feel “I can’t commit to a career path until I’ve seen more of them, I’m always afraid that there’s a better opportunity around the corner.” Or alternatively, “What I have now is good enough. I should be grateful for this opportunity, and not try too hard to improve my life.”

Formal models might not be right, but they can at least help disabuse us of even worse mental models.

**Further Disanalogies and Alternative Strategies**

So far, we’ve analyzed different optimal stopping points, but only for a single strategy (look at the first S candidates, identify the best_initial, then pick the first remaining candidate better than best_initial). This is one reasonable approach, but it’s not the only one. From Stein et al. on alternative strategies:

The Cutoff Rule is the one we’re familiar with, and results in the highest peak, making it superior for the original SP. However, the Successive Non-Candidate Rule peaks earlier, making it potentially superior for scenarios that incorporate opportunity cost. Modifying the simulation code to incorporate this strategy is a promising avenue for future work.
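As a starting point for that future work, here is a minimal sketch of the Successive Non-Candidate Rule as I understand it: a "candidate" is anyone who is the best seen so far, and you commit to the first candidate who arrives after at least k consecutive non-candidates. That reading of the rule, the function name, and the fallback to the last applicant are assumptions, not code from the notebook.

```python
def successive_non_candidate(qualities, k):
    """Return the index picked by the Successive Non-Candidate Rule.

    Picks the first best-so-far applicant who follows a streak of at
    least k applicants that were not best-so-far.
    """
    best_so_far = float("-inf")
    streak = 0  # length of the current run of non-candidates
    for i, q in enumerate(qualities):
        if q > best_so_far:  # applicant i is a "candidate"
            if streak >= k:
                return i
            best_so_far = q
            streak = 0
        else:
            streak += 1
    return len(qualities) - 1  # assumption: forced to take the last applicant


# Example: the streak of two weaker applicants makes the fourth one eligible.
print(successive_non_candidate([0.5, 0.3, 0.2, 0.9, 0.1], k=2))
```

Because it returns a chosen index just like the cutoff strategy, it could be scored under the same payoff rules by swapping it into the simulation loop.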

In real life, there are all sorts of other strategies we can imagine. The candidate pool is not just an ordered list of opportunities to step through linearly, it’s a huge and dynamic space of possibilities you can jump to in any order.

Job searches in particular are not nearly as blind. You can take a job, consider the particular features that would have improved it, and then seek out subsequent opportunities on the basis of that knowledge. You might also ask friends, read about other people’s careers, take some kind of career aptitude evaluation, and so on.

Additionally, the SP and MSP consider only relative knowledge. As the evaluator, all you know about a candidate is how it ranks compared to previous candidates. In real life, there is some capacity, albeit limited, for more absolute evaluations. A job in a coal mine would (probably) not just be worse than any job I’ve had before, it would be clearly and dramatically so.

Finally, job searches are highly path-dependent. You don’t just “try out” being a PhD student to see if you like it. You pursue that particular credential in the service of gaining access to specific further opportunities, some of which you might not even get to evaluate before taking on a massive commitment. Similarly, you don’t get to “try out” being a billionaire startup CEO until you spend years in other “jobs” on the path to get there.

**Empirical Data from the Effective Altruism Community**

According to a recent survey from Rethink Priorities, when asked what best describes their current career, Effective Altruists’ replies included:

- Building flexible career capital and will decide later (18.7%)
- Still deciding what to pursue (17.2%)

These percentages add up to 35.9%, which is surprisingly close to the optimal solution to the original SP (~36.8%), but very far from the solutions I propose here.

It’s worth acknowledging that this is not a uniform sample of people at random points in their career. The survey also notes that the mean age is just 30 (median 27). So the respondents are largely early-career, and precisely in the period of life where “exploration” takes precedence over “exploitation”.

There are two additional dynamics to consider.

First, many Effective Altruists view causes as having incredibly high variance, on a very skewed distribution. This results in a very high “Moral Value of Information”. Per Bykvist, Ord, and MacAskill in Moral Uncertainty:

it’s plausible that the most important problem really lies on the meta-level: that the greatest priority for humanity, now, is to work out what matters most, in order to be able to truly know what are the most important problems we face.

Analogously, career choice may involve various meta-strategies, including:

- Spending (relatively) a lot of time evaluating different paths before committing
- Working directly on cause prioritization
- Building flexible career capital while we (collectively) make progress on identifying the most important problems

This tendency is further encouraged by the EA community’s appreciation for exponential growth curves. Rather than more linear views where it’s important to take advantage of known and proximate opportunities, the exponential view broadly encourages investment on the meta-level, or investments in the rate of growth itself.

Second, and in stark contrast, Effective Altruists might face an increased sense of urgency, and a need to begin doing direct work as soon as possible. As I argued earlier:

According to an Open Philanthropy estimate and AI Expert Surveys, there’s a 50% chance of transformative Artificial Intelligence emerging by around 2050… If you take this idea seriously, we should be obsessed with the short term to the exclusion of all other timescales.

So while a human lifespan (in the UK) is 81 years, with a retirement age of 65, timelines might be aggressively compressed if everything changes in 29 years. For the median EA at age 27, working life might only last until age 56.

**Proleptic Career Choice**

So far, we’ve assumed that the perceived quality of applicants is stable. In fact, the process of exploration may itself entail a shift in the evaluator’s desiderata. Perhaps working a job changes their beliefs about the quality of subsequent jobs, alters their personal circumstances, or even effects a deep transformation on the level of values. Consider:

- Carol, an ambitious young Stanford grad, initially ascribes high value to Venture Capital and Entrepreneurship. After taking a job as an associate at an investment firm and seeing hundreds of failed startups, she becomes more hesitant to start a company herself.
- Seeking financial stability, Peter initially places the greatest value in investment banking, followed by software engineering. After a stint in software, he’s earned enough money to retire, and now places more weight on non-financial aspects of future jobs, causing i-banking to fall in relative rank.
- After moving to Chicago and experiencing frigid winters, Eve starts to value warmth more heavily and places higher value on future jobs in California and Florida.

Particularly savvy agents may actually take a job they don’t value, expecting it to change their values for the better:

- A burgeoning Effective Altruist from London has no firsthand experience with direct aid, and can’t really relate to the plight of the very poor. Nevertheless, they take a job in global development, hoping that they’ll develop a better appreciation for the role once they’re already doing it.

Agnes Callard describes this internal tension at length:

One characteristic of someone motivated by these complex reasons… is some form of embarrassment or dissatisfaction with oneself. She is pained to admit, to herself or others, that she can “get herself” to listen to music only through those various stratagems. She sees her own motivational condition as in some way imperfectly responsive to the reasons that are out there. Nonetheless, her self-acknowledged rational imperfection does not amount to akrasia, wrongdoing, error, or, more generally, any form of irrationality. Something can be imperfect in virtue of being undeveloped or immature, as distinct from wrong or bad or erroneous. (There is something wrong with a lion that cannot run fast, but there is nothing wrong with a baby lion that cannot run fast.) When the good student of music actively tries to listen, she exhibits not irrationality but a distinctive form of rationality. … Thus I will defend the view that you can act rationally even if your antecedent conception of the good for the sake of which you act is not quite on target—and you know that. In these cases, you do not demand that the end result of your agency match a preconceived schema, for you hope, eventually, to get more out of what you are doing than you can yet conceive of. I call this kind of rationality “proleptic.”

In some version of this view, career choice is not merely a matter of evaluation and selection, but of active exploration, information-seeking, and intentional self-modification.

It’s important to understand this as a dynamic process. Leaving behind the toy model, I’m suggesting that career choice takes part in a self-modulating cycle of:

- Trying out a job
- Updating your beliefs and values as a result
- Imposing a new ranking function on the basis of those changes
- Seeking out a next job on the basis of that novel ranking
- …and so on

This is not merely path-dependence. It is a kind of profound illegibility. If the loss function is itself updating in real-time, all optimization techniques fail.

I can think of one avenue for salvation. Earlier, we discussed the case of an Effective Altruist trying to “change their values for the better.” Rather than “values” on the level of “care for animals” or “financial stability”, agents could be modeled as having “meta-values” on the level of, for example:

- Taking on values that lead to long-term satisfaction.
- Aligning emotional motivations with cognitive beliefs about what is right.
- Better approximating a “correct” moral view.

If these meta-values, at least, were stable, the problem would be partially resolved.

**See also**

Robert Wiblin – How replaceable are the top candidates in large hiring rounds?

Stein, Seale and Rapoport, 2003 – Analysis of heuristic solutions to the best choice problem

Chapter 1 of Algorithms to Live By.

Robert Wiblin – The ‘secretary problem’ is too bad a match for real life to usefully inform our decisions — so please stop citing it