A critical look at the Cubs batting order: RE24 and other musings
By ryanrc,
Transactions and Roster Construction
Dr. Robert Conan Ryan here - your resident management/entrepreneurship professor and economist.
I've been having tons of fun playing with simulations of batting orders, including revisitations of classic ones like the 1927 Yankees; similarly, I've looked at modern roster patterns that don't resemble anything anyone would recognize 100 years ago. My quest: to unlock counter-intuitive and superior results that hunt for a classical/modern synthesis.
Here are three principles I uncovered, which should apply to the Cubs' current lineup. The Cubs are a particularly tricky team for batting order construction because of the overall parity between player talents and the various internal contradictions in the player skillsets; thus, these principles clarify the problem.
#1: MICROTOOLS LED TO STRONG PARITY ACROSS A CONTENDING TEAM'S ORDER.
The ideal batting order evolved across baseball history to match the increasingly complex distribution of skills in the league; therefore, optimization of contemporary lineups is a far less lopsided affair than in the past. It is now very sensitive to a team's "portfolio of microtools".
#1a) Classic batting orders were based on player fit into consistent archetypes, with "macro tools" like the basic hit, slug, and smallball (leadoff) archetypes. Contending teams still had large chasms between the handful of archetypal superstars in the top of their order and the rank-and-file replacement players in the back (glove-first players).
#1b) So-called "modern" lineups - the first few decades of modern free agency coinciding with the serious for-profit shift in club philosophy - relied more on the "micro tool" manipulation of splits, pinch hitting, and platoons to keep costs down and make the back of the order more useful. On the other hand, steroids seemed to uphold the persistence of superstar archetypes to fill classic roles for big market super-squads.
#1c) Contemporary lineups have a much smoother distribution of niche talents, with most major league players on contending teams having at least some significant hitting value, thereby improving parity between the front and back of a contender's batting order. Even the superstars are less likely to fit a classic archetype across all metrics. Furthermore, the international nature of the game has drawn in talents from various cultures around the world, each with their own quirks, especially the ability to hit the homespun, familiar styles of pitching that have also come into the majors.
#2. SIMPLE METRIC ORDERS DON'T WORK. WE LIVE IN THE "DOZEN+ VARIABLE ORDER" ERA
#2a) Classic orders were based on two archetypal stats. The 1927 Yankees worked like this: Leadoff, highest average, best baserunner. Second, lowest strikeout rate, best small ball skill player for advancing runners. Third, best home run slugger, highest walk rate. Fourth, best extra base hitter with runners in scoring position. Fifth and sixth, the best OBP and then SLG among remaining players. The rest was not heavily analyzed, as it was generally assumed the back of the order of most teams was close to parity; however, it was common to "reset" the bottom order at 6 with a second leadoff guy, and then the rest in order of OPS.
#2b) The "modern" era was deeply divided by the designated hitter rule and corrupted by steroids, so I won't bother analyzing it carefully; however, the major updates involved using a more fine-tuned bench for matchups. Classic bench players were usually runners and fielders, with awareness of lefty/righty builds; however, big market teams started carrying an extra bat-first player who could platoon either in the DH role or as a substitute for elite glove-first or opposite-handed starters. This was a departure from the older rosters, which tended to prefer that the best bats always start the game to establish a lead.
#2c) Contemporary media people circulate metric systems that aren't accurate. For example, the most popular single-variable pattern is to just sort players by OBP, from top to bottom, down the order. The best two-variable improvement is to favor slugging over OBP around #4-5 and around #7-8. To make it a three-variable pattern, restore the idea of the leadoff player being an elite runner, even if they are merely #2-3 best in OBP, and put great runners also around the #2, #6-7, and #9 holes. This new step assumes the back of the order plays more smallball and the front plays more home run derby. By this point, it becomes obvious that most teams will struggle to build an ideal squad on three-variable systems - yet, even this isn't enough.
Now add left/right splits, which may radically shake up an order. Okay, it gets worse: add the value of high contact and low strikeout profiles, particularly around #2-3, #5, and perhaps #7. Even still, runners in scoring position can sometimes create counter-intuitive flips, where high contact players actually give you more chances to succeed by extending rallies than you get by relying on a strikeout-prone homer masher. So, #3, #5, and #7 become the critical rally-extending spots. But it doesn't end there - you can further improve by finding a fast, high OBP player for the #9 hole, but if, and only if, you have enough parity across your roster that the back of the order is highly productive. In such a case, the #9 spot can steal thunder from the #6 because of the frequency of opportunities the #9 hitter has to "reset the order" in crucial late-game innings. Again, once you build a bench, things get even more complicated, because you can use subs as golden bats and golden runners to squeeze more flexibility out of the batting order.
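To make the first two layers concrete, here is a minimal Python sketch of the single-variable pattern (sort by OBP) and one possible reading of the two-variable pattern (let slugging win in the #4-5 and #7-8 holes). The Hitter class, the greedy slot-filling rule, and every number in ROSTER are hypothetical illustrations, not real player data and not the simulator behind this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hitter:
    name: str
    obp: float
    slg: float


def order_by_obp(hitters):
    """Single-variable pattern: highest OBP bats first, lowest bats last."""
    return sorted(hitters, key=lambda h: h.obp, reverse=True)


# Slots where slugging is preferred over OBP (the #4-5 and #7-8 holes).
SLUG_SLOTS = frozenset({4, 5, 7, 8})


def order_obp_with_slug_slots(hitters, slug_slots=SLUG_SLOTS):
    """Two-variable pattern, filled greedily from slot 1 down (an assumption):
    in a slugging slot take the best remaining SLG, otherwise the best OBP."""
    remaining = list(hitters)
    order = []
    for slot in range(1, len(hitters) + 1):
        if slot in slug_slots:
            pick = max(remaining, key=lambda h: h.slg)
        else:
            pick = max(remaining, key=lambda h: h.obp)
        remaining.remove(pick)
        order.append(pick)
    return order


# Hypothetical roster: made-up numbers purely for illustration.
ROSTER = [
    Hitter("A", 0.380, 0.400), Hitter("B", 0.360, 0.520),
    Hitter("C", 0.350, 0.450), Hitter("D", 0.340, 0.560),
    Hitter("E", 0.330, 0.410), Hitter("F", 0.320, 0.480),
    Hitter("G", 0.310, 0.390), Hitter("H", 0.300, 0.500),
    Hitter("I", 0.290, 0.370),
]

print([h.name for h in order_by_obp(ROSTER)])
print([h.name for h in order_obp_with_slug_slots(ROSTER)])
```

Even on this toy roster the two rules disagree from the #5 hole on, which is the point: each added variable reshuffles everything below it.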
What are the dozen core variables that are minimally capable of predicting a good batting order?
Baserunning tool, Contact rate, Smallball tool (sacrifices, bunts, place hitting success rate), xOBP, xSLG, xwOBA (recent rolling average), AVG, WAR, HR/AB, K/AB, RISP, L/R Splits, RE24 Splits
#3: RE24 COMBINED WITH STATCAST METRICS MAKES BATTING ORDERS TRULY DYNAMIC
RE24 is a statistic that didn't exist in older eras, so we only need to evaluate the contemporary situation. RE24 (run expectancy based on the 24 base-out states) starts from a basic matrix of the runs an average team is expected to score from each situation a batter can face. There are 24 such situations, the eight possible arrangements of runners on base times zero, one, or two outs - such as 1 runner on, 2 out, or 2 runners on, 1 out - and the expectation can be computed for a particular stadium or generically across the whole league in a particular season. Then, we can measure how well a player performs in every situational split versus the average player. Optimizing a batting order becomes dynamic, because every time you move one player up or down in the order, it has a chain reaction of effects on whether other players are more or less likely to find themselves facing their ideal split combinations. It gets even MORE COMPLICATED if you add weights to the RE24 matrix based on advanced Statcast analysis, such as predicting RE24 based on a player's average exit velocity, squared-up rate, chase rate, and so on. Suddenly, you cannot merely assume the best hitters should be clustered at the top of an order, because that can cause a chain reaction of inefficient RE24 across a 9-inning game.
For example, suppose the Cubs have Nico Hoerner lead off because of his baserunning, high average, poor home run rate, and balanced pitch type splits. However, his RISP hitting is elite, and in the leadoff spot it has a low percentage chance of being used where it's most needed in the RE24 cycle across a 9-inning game. If a team is lucky enough to have TWO elite RISP players, you may sidestep this problem. However, if he's by far the best in RISP, which is the case on the current Cubs roster, having him in the one hole creates a suboptimal TEAM result. As the elite rally extender, his profile is closest to the ideal RISP hitter for the #3 or #5 spots, the ideal places to ensure an inning doesn't end.
Here's another example. Some analysts prefer Michael Busch as leadoff because of his high on base percentage and general hitting ability, with decent running ability. However, they again overlook that his RE24 analysis would be suboptimal for the team unless there were enough alternatively good all-around hitters with elite xwOBA and high HR/AB for the #2 and #4 spots. The Cubs do have options: Suzuki or Bregman could fill those holes.
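The bookkeeping behind RE24 is simple enough to sketch in a few lines of Python. Under the standard definition, the run value of one plate appearance is the run expectancy after the play, minus the run expectancy before it, plus any runs that scored; a hitter's RE24 is the sum over all of his plate appearances. In this sketch the function names are hypothetical and the RUN_EXPECTANCY numbers are round PLACEHOLDER values chosen for easy arithmetic, not real league data (a real matrix has all 24 base-out states).

```python
# Run expectancy keyed by (bases, outs); bases is a 3-character string for
# 1st/2nd/3rd, "-" meaning empty. PLACEHOLDER values for illustration only:
# a real matrix covers all 24 states and comes from league play-by-play data.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("1--", 2): 0.20,
    ("-2-", 2): 0.30,
}


def run_value(re_matrix, before, after, runs_scored):
    """Run value of one plate appearance.

    `after` is None when the plate appearance ends the inning, in which
    case the remaining run expectancy is zero.
    """
    re_before = re_matrix[before]
    re_after = 0.0 if after is None else re_matrix[after]
    return re_after - re_before + runs_scored


def player_re24(re_matrix, plate_appearances):
    """Sum run values over (before, after, runs_scored) tuples."""
    return sum(run_value(re_matrix, b, a, r) for b, a, r in plate_appearances)


# Three hypothetical plate appearances for one hitter:
example_pas = [
    (("---", 0), ("1--", 0), 0),  # leadoff single
    (("-2-", 2), ("1--", 2), 1),  # two-out RBI single, runner scores from 2nd
    (("-2-", 2), None, 0),        # inning-ending strikeout, runner stranded
]
print(round(player_re24(RUN_EXPECTANCY, example_pas), 2))
```

Notice that the same hitter gets very different credit for the same event depending on the state he walks into, which is why moving him in the order changes his value.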
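One rough way to check how often each lineup slot actually comes up with runners in scoring position is a toy simulation. The Python sketch below is only an illustration under loud assumptions: simulate_slots is a hypothetical name, every hitter is reduced to a single made-up on-base probability, and every time a batter reaches, all runners move up exactly one base. It is not the simulator used for this article.

```python
import random


def simulate_slots(on_base_probs, n_games=2000, seed=0):
    """Count plate appearances (PA) and RISP plate appearances per slot.

    on_base_probs: nine per-slot probabilities of reaching base (hypothetical).
    Toy model (an assumption): a batter who reaches takes first base and
    every runner advances one base; otherwise the batter makes an out.
    """
    rng = random.Random(seed)
    pa = [0] * 9
    risp_pa = [0] * 9
    for _ in range(n_games):
        slot = 0  # the leadoff hitter opens every game
        for _inning in range(9):
            outs = 0
            bases = [False, False, False]  # occupied flags for 1st, 2nd, 3rd
            while outs < 3:
                pa[slot] += 1
                if bases[1] or bases[2]:
                    risp_pa[slot] += 1
                if rng.random() < on_base_probs[slot]:
                    # Batter to first, runners up one base, runner on 3rd scores.
                    bases = [True, bases[0], bases[1]]
                else:
                    outs += 1
                slot = (slot + 1) % 9
    return pa, risp_pa


# Hypothetical lineup: stronger on-base hitters up top, weaker at the bottom.
probs = [0.36, 0.35, 0.34, 0.34, 0.33, 0.32, 0.31, 0.30, 0.29]
pa, risp_pa = simulate_slots(probs)
for slot, (n, r) in enumerate(zip(pa, risp_pa), start=1):
    print(f"#{slot}: {n} PA, {r / n:.1%} with RISP")
```

Swapping two entries in probs and rerunning shows the chain reaction directly: the RISP share changes for the slots behind the move, not just for the players who were swapped.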
Here's a back-order quandary: where do you put PCA? Clearly when he's hot, he could be an asset just about everywhere because of his rare toolset. However, his strikeout rate and low OBP are a huge buzzkill for the top of the order. Due to all of his internal contradictions, my analyses usually place him in the #6 (when hitting well) or #8 hole (when slumping). Still, he's an outrageously good asset to have in the late order because of his many ways to help win a game - he's just too random to be trusted.
PUTTING IT ALL TOGETHER: TWO BASE ORDERS FOR THE CUBS
Look, the true answer is obviously to adapt the order continuously to the situation Craig Counsell finds himself facing. The Cubs are an unusually deep and balanced order - part of Jed Hoyer's roster design strategy is to have a smooth order with no true weak spots. When your "weak spots" are projected to be Matt Shaw and Dansby Swanson - both strongly tooled hitters and excellent runners - you really can't go wrong. Players go hot and cold, have good and bad days in practice, play with minor injuries, have very long and detailed split data, etc. Thus, this base order is simply based on long term projections, and is likely to track actual results more closely as the season progresses.
Today's managers will need to constantly tweak these orders to deal with so many factors and the streaky habits of ballplayers. Still, we can use a long run analysis to project the best base orders versus LH and RH pitching. Using all twelve variables, and assuming all the younger players take a step forwards this year, here's what I came up with. Assume vs LHP, Shaw plays RF, Suzuki is DH, and Ballesteros sits out, with Kevin Alcantara at center.
#   VS RHP, base    VS LHP       DOMINANT LOGIC
1   HAPP            SHAW         BEST RE24 FROM LEADOFF SPOT
2   BUSCH           SUZUKI       HIGHEST WAR HITTER, TOP 3 OBP, TOP 3 SLG, CAN RUN
3   HOERNER         HOERNER      RALLY EXTENDER (ELITE RISP, ELITE CONTACT, LOW K RATE)
4   SUZUKI          KELLY/HAPP   BEST REMAINING BALANCE OF ALL 12 STATS
5   BREGMAN         BREGMAN      HIGHEST OBP REMAINING, WORST RUNNER
6   PCA             ALCANTARA    LEADOFF/POWER HYBRID, HIGH K RATE
7   BALLESTEROS     HAPP/AMAYA   HIGHEST REMAINING OBP, CONTACT RATE, AND RISP
8   KELLY/AMAYA     BUSCH        POWER WITH PLATE DISCIPLINE
9   SWANSON         SWANSON      RESETS ORDER, HIGH K RATE, SPORADIC HEROICS
Note: In an early draft of the vs RHP order, I had moved Ballesteros up to #4 cleanup, assuming he lived up to full expectations, which elevated Suzuki to #3 and demoted Hoerner to #6 (PCA's hole), with PCA hitting at #7 in place of Ballesteros. This is the best LONG RUN ALTERNATIVE VERSION, assuming he heats up, but I'm trying not to make this analysis too complex to understand, so let's leave it at that.
WRAP-UP
Here's a fun game: Using contemporary thinking, how would you change the 1927 Yankees order?
Let's remember, before we proceed, that when you have a dead spot in the order from pitching, you have a fundamental dilemma. If the pitcher bats 9th, then you starve the elite bats of baserunners the next time the order turns over to the top. However, batting the pitcher 7th would increase the pitcher's at-bats substantially. Thus, most lineups are improved by simply moving the pitcher up to 8th and placing a speedy high average player into the 9 hole. You still don't want to move a serious bat down to #9 because of how it would reduce the overall RE24 of the middle of the order, and because that hitter won't get enough at bats across a season that way. The exception is of course the playoffs: you don't care about maximum use of at bats when you're focusing more on situational heroics.
I ran simulations that showed the ideal playoff order would be a bit counter-intuitive (I didn't optimize the regular season for this exercise). Think about how the Dodgers have gotten tremendous work from Kike Hernandez in October, despite merely average season performance, year after year. Well, despite the many criticisms of using Mark Koenig high in the order, when modernists would prefer elevating Meusel, I found that the modernists' order didn't produce as many expected runs as mine.
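The at-bat cost of dropping a hitter down the order is plain arithmetic, sketched here in Python. If a team sends some number of batters to the plate in a game, every slot gets the same number of full turns through the order, and only the first few slots get one extra turn. The function names and the sample game totals are hypothetical, chosen only to show the mechanics.

```python
def plate_appearances_by_slot(team_pa):
    """Split one game's total plate appearances across the nine slots.

    Every slot gets `full_turns` trips; the first `extra` slots get one more.
    """
    full_turns, extra = divmod(team_pa, 9)
    return [full_turns + (1 if slot < extra else 0) for slot in range(9)]


def season_pa_by_slot(game_totals):
    """Add up per-slot plate appearances over a list of per-game totals."""
    totals = [0] * 9
    for team_pa in game_totals:
        for slot, n in enumerate(plate_appearances_by_slot(team_pa)):
            totals[slot] += n
    return totals


# A game with 38 team plate appearances: only slots #1 and #2 bat a fifth time.
print(plate_appearances_by_slot(38))      # [5, 5, 4, 4, 4, 4, 4, 4, 4]

# Three hypothetical games with 36, 38, and 40 team plate appearances.
print(season_pa_by_slot([36, 38, 40]))    # [14, 14, 13, 13, 12, 12, 12, 12, 12]
```

Feed in a full season of real per-game totals and the gap between the #1 and #9 slots becomes the number of trips to the plate a serious bat gives up by hitting last.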
#   Classic    Modernists   My order, best runs expected
1   Combs      Combs        Combs
2   Koenig     Ruth         Ruth
3   Ruth       Meusel       Koenig
4   Gehrig     Gehrig       Gehrig
5   Meusel     Collins      Meusel
6   Lazzeri    Lazzeri      Lazzeri
7   Dugan      Koenig       Dugan
8   Collins    Pitcher      Pitcher
9   Pitcher    Dugan        Collins
Here's a link to the stats!
1927 New York Yankees Statistics | Baseball-Reference.com

