Proof of the Pythagorean Win Percentage?

chuckywang · May 8, 2007

This is to all math/stat oriented people on NSBB.

With the recent discussion of the Cubs' Pythagorean win percentage here, I want to pose the question of how to derive a more robust statistic: the expected number of wins given the number of games played.

The formal problem description is this:

Suppose the Cubs have scored X number of runs, while giving up Y number of runs over G number of games. Assume each "run distribution" is equally likely.

For example, if G = 3, X = 10, and Y = 9, then the Cubs' run distribution could be (0,10,0), or (1,9,0), or (4,5,1), etc., all equally likely. Likewise, the opponents' run distribution could be (1,4,4), (0,9,0), (2,5,2), etc., again all equally likely.

What is the expected number of Cubs wins, in terms of G, X, and Y?

Can anybody think of an analytical way to solve this problem, or would I need to bust out Matlab?
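Under the stated assumption, the question can be answered exactly without simulation. The sketch below is one way to do it, not necessarily what the poster had in mind. It assumes the Cubs' split and the opponents' split are independent (the post does not say so explicitly). By symmetry every game has the same marginal run distribution, so by linearity of expectation the expected win total is G times the probability of outscoring the opponent in a single game. The function names `marginal` and `expected_record` are made up for this illustration. Ties are reported separately, since the model allows equal scores in a game.

```python
from fractions import Fraction
from math import comb


def marginal(total, games):
    """P(k runs in one given game), k = 0..total, when every way of
    splitting `total` runs across `games` games is equally likely."""
    if games == 1:
        # With a single game, all the runs land in that game.
        return [Fraction(int(k == total)) for k in range(total + 1)]
    # Stars and bars: comb(total + games - 1, games - 1) splits in all;
    # fixing k runs in one game leaves comb(total - k + games - 2, games - 2).
    denom = comb(total + games - 1, games - 1)
    return [Fraction(comb(total - k + games - 2, games - 2), denom)
            for k in range(total + 1)]


def expected_record(G, X, Y):
    """Expected (wins, ties, losses) over G games, assuming the two teams'
    splits are independent and uniformly distributed (an assumption)."""
    px, py = marginal(X, G), marginal(Y, G)
    win = tie = Fraction(0)
    below = Fraction(0)  # P(opponent scores fewer than a runs)
    for a in range(X + 1):
        win += px[a] * below
        if a <= Y:
            tie += px[a] * py[a]
            below += py[a]
    return G * win, G * tie, G * (1 - win - tie)


if __name__ == "__main__":
    wins, ties, losses = expected_record(3, 10, 9)
    print(float(wins), float(ties), float(losses))
```

Exact fractions are used so the three expectations always add up to G; for a full season (G = 162) the same call works, just with larger integers.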

Rob · May 8, 2007

I think the problem right there is the assumption that each run distribution is equally likely. If G = 162, X = 850, and Y = 750, there is a near-zero chance that the set begins (0, 0, 850, 0, 0, 0, ...).
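To put a number on this objection, here is a small sketch of what the equally-likely assumption implies for the season in this example. Counting splits with the standard stars-and-bars formula, the chance of being held to zero runs in any one given game works out to (G - 1) / (X + G - 1). The function names are hypothetical and the code only describes the model, not real scoring data.

```python
from fractions import Fraction
from math import comb


def number_of_splits(total_runs, games):
    """How many ways `total_runs` can be split across `games` games."""
    return comb(total_runs + games - 1, games - 1)


def shutout_probability(total_runs, games):
    """P(zero runs in one given game) when all splits are equally likely.
    Requires games >= 2."""
    remaining = comb(total_runs + games - 2, games - 2)  # splits over the other games
    return Fraction(remaining, number_of_splits(total_runs, games))


if __name__ == "__main__":
    p = shutout_probability(850, 162)
    print(p, float(p))  # 161/1011, roughly 0.16
```

So under this model a team scoring 850 runs would be shut out in about 16% of its games, and zero would be the single most likely run total in any game.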

kroth1342 · May 9, 2007

I would agree - I think the starting point would be a distribution of all run outcomes. I'd think this distribution wouldn't be a normal curve, but rather right-skewed. Not sure how to proceed from there.

Sarcastic · May 9, 2007

I don't think there can be a mathematical "proof" for a formula for number of games won based on runs scored and allowed, since you can't predict run distribution with 100% accuracy. The Pythagorean win record isn't a theorem; it is just an estimate created to give the smallest margin of error for the most cases.

EDIT: The best evidence that the formula is accurate is simply to compare expected wins to actual wins, and determine the margin of error.
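The comparison described above could be sketched as follows. This assumes the usual form of the Pythagorean estimate, runs scored squared over the sum of runs scored squared and runs allowed squared, and uses root-mean-square error as the "margin of error" (one reasonable choice; the post does not name a metric). The function names are made up for illustration, and no real season data is included; the only numbers used are the X = 850, Y = 750, G = 162 example from earlier in the thread.

```python
from math import sqrt


def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins from the Pythagorean estimate with the given exponent."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def rmse(expected, actual):
    """Root-mean-square error between expected and actual win totals."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(e - a) ** 2 for e, a in zip(expected, actual)]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # The 850 scored / 750 allowed / 162 games example from the thread.
    print(round(pythagorean_wins(850, 750, 162), 1))  # about 91 wins
```

To judge the formula, you would compute `pythagorean_wins` for each team-season in a real data set and pass those values, along with the actual win totals, to `rmse`.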

chuckywang · May 9, 2007

I assumed that every run distribution is equally likely to simplify things. Obviously this is not true in reality, but I wonder how well it performs.

Mephistopheles · May 9, 2007

It's a sum-of-squares regression, not a mathematical property. Finding the correlation, RMSE, or anything like that isn't all that hard. I just need to go to sleep because I have two finals tomorrow and one starts in 7 hours.

Jehrico · May 9, 2007

I used to be pretty damned good at all of that stuff when I was in college. I took a lot of stats and analysis classes. I haven't used any of it in over 8 years. You guys just made me realize that about half of my college education is now lost and worthless. Thanks a lot. :x

Rob · May 9, 2007

At any rate, there are slightly better ways to predict won-loss records than the straight Pythagorean formula.

For instance, using X = ((rs + ra) / g)^.285 as the exponent (X here is the exponent, not runs scored; rs, ra, and g are runs scored, runs allowed, and games) adjusts for league-wide scoring better. You're also better off using AEqR and AEqRA in place of raw runs, where runs are predicted from the batting line and then adjusted for the strength of opposition pitching and the opponent's lineup, respectively.
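A minimal sketch of the variable-exponent version described above, assuming the exponent simply replaces the fixed 2 in the Pythagorean formula. The function names are invented for this example, and the inputs are plain runs scored and allowed; the AEqR/AEqRA adjustments are not implemented here.

```python
def run_environment_exponent(rs, ra, g, power=0.285):
    """Exponent X = ((rs + ra) / g) ** power, as given in the post."""
    return ((rs + ra) / g) ** power


def expected_win_pct(rs, ra, g, power=0.285):
    """Win percentage rs^X / (rs^X + ra^X) with the run-dependent exponent."""
    x = run_environment_exponent(rs, ra, g, power)
    return rs ** x / (rs ** x + ra ** x)


if __name__ == "__main__":
    # Same 850 / 750 / 162 example used earlier in the thread.
    x = run_environment_exponent(850, 750, 162)
    pct = expected_win_pct(850, 750, 162)
    print(round(x, 2), round(pct, 3), round(162 * pct, 1))
```

For that example the exponent comes out a little under 2, so the estimate lands slightly below the roughly 91 wins the fixed-exponent formula gives.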

Ender · May 9, 2007

There isn't enough data to come up with a meaningful expected W-L record.

Derwood · May 9, 2007

i like pizza
