I was going to take a Braves Journal hiatus after Sam called Ububba and me “weirdos,” but then I realized I actually am a weirdo (not speaking for Ububba here) so I decided to answer Alex’s request to explain how a published article can claim that Jerry Manuel was the 10th best manager of the last 40 years.

Let’s take a step back. How do you rate a manager? What you have to do is infer what the team would have been like if another manager were there. Of course that’s strictly impossible, so anybody who tries and puts forward anything interesting deserves some credit for the attempt.

Second, it’s almost impossible for a manager to be worth as much as, say, a star player. We know this in several ways: a) managers make a lot less; b) when a manager comes on the market, other teams don’t usually act like changing managers is worth 10 games; and c) we’ve already managed to figure out win responsibilities from the players’ output alone, and it sums to pretty close to the team’s output. (To be fair, this assumes that the manager knows who to put on the field.)

So enter Brian Goff of Western Kentucky. His article “Contributions of Managerial Levels: Comparing MLB and NFL” just appeared in the journal Managerial and Decision Economics. (I’m not sure that this is the best-edited journal, since his name appears on the cover page as “Goff Brian.”)

In this article he estimates a nested random effects model to try to figure out the relative contributions of owners, GMs, and managers. (Players have no effect, since in his model they’re just the raw materials with which owners, GMs, and managers work. This simplifies things a lot, since a team has only one owner, one GM, and one manager at a time, but lotsa players.)

So this is a nested model, which means that first we estimate an effect for each owner; then, within each owner, we estimate a GM effect for each GM (so that, for example, Omar Minaya has a different effect with the Expos than with the Mets); and within each owner-GM pair, we estimate an effect for each manager (so that Joe Torre not only has a different effect with the Dodgers, Braves, Mets, etc., but his effect changes when, as in LA, the ownership changes).
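To make the structure concrete, here’s a minimal sketch of what fitting a model like this could look like in Python with statsmodels. To be clear, the data file and column names here are my inventions, not Goff’s, and his actual specification includes more controls:

```python
# A minimal sketch of a nested random effects (hierarchical) model,
# assuming a hypothetical data frame with one row per team-season.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("team_seasons.csv")  # hypothetical: owner, gm, manager, win_pct, log_metro_pop

# Make lower-level IDs unique within their parent, so the same GM under two
# owners (or the same manager under two GMs) gets a separate effect
# (the Minaya and Torre examples above).
df["owner_gm"] = df["owner"] + "/" + df["gm"]
df["owner_gm_mgr"] = df["owner_gm"] + "/" + df["manager"]

# Random intercept for each owner, plus variance components for GMs nested
# within owners and managers nested within owner-GM pairs. log_metro_pop
# stands in for the crude market-size control.
model = sm.MixedLM.from_formula(
    "win_pct ~ log_metro_pop",
    groups="owner",
    vc_formula={"gm": "0 + C(owner_gm)", "mgr": "0 + C(owner_gm_mgr)"},
    data=df,
)
result = model.fit()
print(result.summary())  # variance attributable to each managerial level
```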

There are a few other variables in the model which account, crudely, for market size and some other variables which account, crudely, for inheriting good or bad teams. But that’s about it.

What Goff finds is that managers matter a fair bit, though he presents his results in a way that is difficult for a fan (or even an economist/statistician) to get a handle on. His critical result is that “8.5% of the variance in winning is attributable to variation between MLB managers with the GM effect nearly the same at 6%.” Owners, by contrast, contribute nothing. (Take that, George Steinbrenner!)

This is not phrased in the most intuitive units. From 1970 to 2011 (the period of Goff’s data), the Atlanta Braves averaged a 51.6 percent winning percentage with an annual variance of 0.007. If the manager explains 8.5 percent of that, the variance explained is about 0.0006; take the square root and you get a standard deviation of about 2.4 percentage points of winning percentage, or roughly 4 games over a 162-game season. The GM is responsible for another 3 games. That sounds roughly right to me, though the standard of comparison is not exactly clear. (This is not a precise calculation, but, hell… this is a blog.)
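If you want to check that arithmetic, here it is in a few lines of Python (the 0.007 variance is the figure quoted above; everything else follows from it):

```python
# Back-of-envelope: convert "percent of variance explained" into games per year.
var_win_pct = 0.007  # Braves' annual variance in winning percentage, 1970-2011
games = 162          # games in a season

for label, share in [("manager", 0.085), ("GM", 0.06)]:
    sd = (share * var_win_pct) ** 0.5  # standard deviation in winning percentage
    print(f"{label}: sd = {sd:.3f}, about {sd * games:.1f} games a year")
# manager: sd = 0.024, about 4.0 games a year
# GM: sd = 0.020, about 3.3 games a year
```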

But nobody cares about this part; they want the dirt on individuals, and Forbes Magazine wants to know who is a genius and who sucks. And Goff obliges. He “discovers” that Bobby Cox is the best manager and John Schuerholz is the third-best GM. No gigantic surprises there. But Alex was clearly surprised that Jerry Manuel was the 10th-best manager and that Danny Murtaugh rates better than Earl Weaver and Walter Alston.

Two things to note, however. With respect to Alston, he’s only in the data for six years — his last six, after Sandy Koufax had already retired. Danny Murtaugh is similarly measured only over the last six years of his career, and those are six pretty damn good years: http://www.baseball-reference.com/managers/murtada01.shtml

Finally we come to Jerry Manuel. I live in New York and I know he wasn’t a good manager. But there are two important things to remember here. First, these rankings have big margins of error, and the closer you get to the middle, the more arbitrary the rankings become. Goff doesn’t present any uncertainty around his estimates, and without that you can’t draw conclusions about the rankings. If Jerry Manuel is 10th best but statistically indistinguishable from whoever is 20th best, then the rankings are just too volatile to be useful. It’s like using one year’s home run totals to rank home run hitters: good data, but too far from perfect to actually rank hitter A over hitter B.

Second, the methodology ranks Jerry Manuel as a function of the people around him. If you have Omar Minaya as your GM and Fred Wilpon as your owner, managing to generate a winning overall record makes you a minor genius; that’s an implication of the nested model.

So, as always, IWOTM rules. (Plus the White Sox.)

Note from Alex: I got an email from Brian Goff. Here is his explanation for his methodology:

“As a quick synopsis, I’ve taken yearly team winning-percentage data for 1970-2011 and estimated a hierarchical regression (the same thing known in other circles as a random effects model, with lower managerial levels nested within higher levels). Besides the individual manager-GM-owner effects, I’ve calculated an “endowment” variable for each person that equals the winning percentage of the team in the year prior to that individual’s arrival. This endowment measure then converges back to 50 percent (league average) over a few years (I did a search for the optimal years for convergence), so that the initial endowment does not matter once the person has been there for a while. I’ve also included metro area population (updated every ten years at the census midpoint), first-year expansion, and a term to account for correlation from one year to the next, AR(1).

So, whether in some broad sense these are the “best,” I can’t say. In the narrow sense of my study, my results indicate which individuals had the highest contributions to winning percentage after taking into account the other management decision makers with the team, the state of the team at the person’s arrival, city size advantage, and year-to-year carryover or momentum effects. Hope this clarifies things.”
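We don’t have Goff’s code, but the endowment idea is easy to sketch. Here’s one hypothetical way to implement a prior-year winning percentage that converges back to .500; the geometric decay and the half-life value are my assumptions, since he says only that he searched for the optimal convergence speed:

```python
def endowment(prior_win_pct: float, tenure_years: int, half_life: float = 2.0) -> float:
    """Winning pct of the team the year before this person arrived,
    decayed geometrically toward the league average (.500) as tenure grows.
    The geometric form and the half_life value are illustrative assumptions;
    Goff searched for the optimal convergence speed."""
    weight = 0.5 ** (tenure_years / half_life)
    return 0.500 + weight * (prior_win_pct - 0.500)

# A manager who inherits a .580 team gets less and less credit for that
# inheritance as the years go by:
for year in range(7):
    print(year, round(endowment(0.580, year), 3))  # 0.580, 0.557, 0.540, ... -> .500
```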


Later, he emailed me back to write this:

“The issue of managers whose careers are not fully captured by my 1970-2011 sample means that I’m really comparing the back end of a career like Murtaugh’s to the full careers of others. At the time I put the data together, I didn’t have access to full GM and owner data for some teams prior to 1970. Even if I did, the farther back one goes, the more other issues of non-comparability arise, such as racial integration, divisions, … I would not suggest 1970 as the only reasonable starting point, but as a defensible one. Maybe down the line I can go back and redo the list to contain only those managers whose careers started after 1970.

On your other questions, let me emphasize that the focus of my study is to estimate individual manager, GM, and owner contributions to winning by estimating coefficients for each person, taking account of the hierarchy of management and controlling for a few other factors (team performance prior to the manager’s arrival, population, expansion). Nobody, to my knowledge, had accounted for the hierarchical managerial aspect. In addition, tying each manager to an “endowment” was a key feature. Many of the questions that arise in the sabermetrics world, while interesting in themselves, were not at the heart of my study.

With that said, the topic of the best measure of team or managerial performance is tricky. I would make the case that, whatever may be most highly correlated with long-run winning percentage, yearly winning percentage (or league position) is really the yardstick of success. The 1965 and 1966 Giants weren’t better than the Dodgers in those years, in spite of the Giants having the higher winning percentage for the decade. One can certainly make a reasonable argument that alternative performance measures like Pythagorean winning percentage, or similar measures such as simple run differential, may in some respects provide a better gauge of team strength or performance (whether in the short run or the long run). Nonetheless, it’s winning that ultimately matters. In fact, looking at winning percentage relative to scoring (whether by computing a ratio or in a regression) has actually been used as a measure of managerial efficiency in a short-run, “technical efficiency” sense. It gets at the idea of how well a manager is making use of the runs his team scores and gives up. In these models, a higher run differential with less winning marks poorer performance by the manager, not better.”