North Side Baseball
Posted

This is to all math/stat oriented people on NSBB.

 

With the recent discussion of the Cubs' Pythagorean Win Percentage here, I want to pose the question of how to work out what might be a more robust statistic: the expected number of wins given the number of games played.

 

The formal problem description is this:

 

Suppose the Cubs have scored X runs while giving up Y runs over G games. Assume each "run distribution" is equally likely.

 

For example, if G = 3, X = 10, and Y = 9, then the Cubs' run distribution could be (0,10,0), (1,9,0), (4,3,3), etc., each equally likely. Likewise, the opponents' run distribution could be (1,4,4), (0,9,0), (2,5,2), etc., again each equally likely.

 

What is the Cubs' expected number of wins, in terms of G, X, and Y?

 

Can anybody think of an analytical way to solve this problem, or would I need to bust out Matlab?
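
One possible analytical route, sketched under the post's own assumption plus one extra assumption that the Cubs' and the opponents' splits are independent of each other: by linearity of expectation, the expected win total is G times the chance of winning any single game. With every split equally likely, the chance of scoring exactly k runs in a given game is C(X-k+G-2, G-2) / C(X+G-1, G-1), which is a stars-and-bars count. The function names below are made up for illustration, and ties are simply not counted as wins.

```python
from fractions import Fraction
from math import comb


def first_game_pmf(total, games):
    """P(runs in one given game = k) for k = 0..total, when every way of
    splitting `total` runs over `games` games is equally likely."""
    if games == 1:
        return [Fraction(int(k == total)) for k in range(total + 1)]
    n_all = comb(total + games - 1, games - 1)  # number of possible splits
    # Fix k runs in this game; the other games-1 games share the rest.
    return [Fraction(comb(total - k + games - 2, games - 2), n_all)
            for k in range(total + 1)]


def expected_wins(x, y, g):
    """Expected number of games with strictly more runs scored than allowed.
    Assumes the two splits are independent; ties are not counted as wins."""
    px = first_game_pmf(x, g)
    py = first_game_pmf(y, g)
    p_win = Fraction(0)
    cum_y = Fraction(0)  # P(opponent scores fewer than k runs)
    for k in range(x + 1):
        p_win += px[k] * cum_y
        if k <= y:
            cum_y += py[k]
    return g * p_win  # same win probability in every game, by symmetry


print(expected_wins(10, 9, 3))  # -> 31/22
```

For the G = 3, X = 10, Y = 9 example this works out to 31/22, or about 1.41 expected wins, with the remaining games being losses or ties.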


Old-Timey Member
Posted
I think the problem right there is assuming each run distribution is equally likely. If G = 162, X = 850, and Y = 750, there is a near zero chance that the set begins (0, 0, 850, 0, 0, 0, etc...)

Posted
I would agree - I think the starting point would be a distribution of all run outcomes. I'd think this distribution wouldn't be a normal curve, but rather right-skewed. Not sure how to proceed from there.

Posted

I don't think there can be a mathematical "proof" for a formula for number of games won based on runs scored and allowed, since you can't predict run distribution with 100% accuracy. The Pythagorean win record isn't a theorem; it is just an estimate created to give the smallest margin of error for the most cases.

 

EDIT: The best evidence that the formula is accurate is simply to compare expected wins to actual wins, and determine the margin of error.
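
A rough sketch of that comparison, assuming the usual exponent-2 Pythagorean formula and using root-mean-square error as the "margin of error". The function names and the tuple layout are made up for illustration, and the season rows are placeholders, not real team records.

```python
from math import sqrt


def pythagorean_wins(rs, ra, games, exponent=2.0):
    """Expected wins from runs scored (rs) and runs allowed (ra)."""
    pct = rs**exponent / (rs**exponent + ra**exponent)
    return games * pct


def rmse(seasons, exponent=2.0):
    """Root-mean-square gap between expected and actual wins.
    `seasons` is a list of (rs, ra, games, actual_wins) tuples."""
    errors = [pythagorean_wins(rs, ra, g, exponent) - wins
              for rs, ra, g, wins in seasons]
    return sqrt(sum(e * e for e in errors) / len(errors))


# Placeholder rows only (NOT real seasons); swap in actual team records.
seasons = [
    (850, 750, 162, 90),
    (700, 700, 162, 83),
    (650, 800, 162, 66),
]
print(round(pythagorean_wins(850, 750, 162), 1))
print(round(rmse(seasons), 2))
```

Running `rmse` over many real team-seasons, and again with a different exponent, would show how much the estimate typically misses by.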

Posted
I assumed that every run distribution is equally likely to simplify things. Obviously this is not true in reality, but I wonder how well it performs.
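
One way to check how the equally-likely assumption performs without Matlab is a quick simulation. This is only a sketch: it assumes the two teams' splits are drawn independently, does not count ties as wins, and uses made-up function names. The 850/750/162 figures are the ones from the earlier reply.

```python
import random


def random_split(total, games, rng):
    """Draw one way of splitting `total` runs over `games` games, uniformly
    over all such splits (stars and bars: pick where the bars go)."""
    bars = sorted(rng.sample(range(total + games - 1), games - 1))
    edges = [-1] + bars + [total + games - 1]
    # Runs in each game = number of stars between consecutive bars.
    return [edges[i + 1] - edges[i] - 1 for i in range(games)]


def simulate_wins(x, y, g, trials=2000, seed=0):
    """Average number of games won (strictly more runs) over random splits."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ours = random_split(x, g, rng)
        theirs = random_split(y, g, rng)
        wins += sum(a > b for a, b in zip(ours, theirs))
    return wins / trials


def pythagorean_wins(x, y, g):
    """Standard exponent-2 Pythagorean estimate, for comparison."""
    return g * x**2 / (x**2 + y**2)


if __name__ == "__main__":
    print("uniform-split model:", simulate_wins(850, 750, 162, trials=500))
    print("Pythagorean:        ", pythagorean_wins(850, 750, 162))
```

Because the simulated total leaves out tied games, it is not directly comparable to the Pythagorean number unless ties are split or replayed in some way.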
Posted
It's a sum-of-squares regression, not a mathematical property. Finding the correlation, RMSE, or anything like that isn't all that hard. I just need to go to sleep because I have two finals tomorrow and one starts in 7 hours.
Posted
I used to be pretty damned good at all of that stuff when I was in college. I took a lot of stats and analysis classes. I haven't used any of it in over 8 years. You guys just made me realize that about half of my college education is now lost and worthless. Thanks a lot. :x
Old-Timey Member
Posted

At any rate, there are slightly better ways to predict won-loss records than the straight Pythagorean formula.

 

For instance, using x = ((rs+ra)/g)^.285 as the exponent in place of 2 (rs = runs scored, ra = runs allowed, g = games) adjusts for league-wide scoring better, and you're better off using AEqR and AEqRA, where runs are predicted from the batting line and then adjusted for the strength of opposition pitching and the opponent's lineup, respectively.
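
A small sketch of that variable-exponent version, using the .285 constant from the post. The function names are made up for illustration, and the 850/750/162 inputs are the example figures from earlier in the thread, not a real season.

```python
def variable_exponent(rs, ra, games, k=0.285):
    """Exponent based on total runs per game: ((rs + ra) / games) ** k."""
    return ((rs + ra) / games) ** k


def expected_win_pct(rs, ra, games, k=0.285):
    """Pythagorean-style win percentage using the variable exponent."""
    x = variable_exponent(rs, ra, games, k)
    return rs**x / (rs**x + ra**x)


rs, ra, games = 850, 750, 162
print("exponent:", round(variable_exponent(rs, ra, games), 3))
print("expected wins:", round(games * expected_win_pct(rs, ra, games), 1))
```

At roughly ten total runs per game the exponent lands a little below 2, so the estimate is pulled slightly toward .500 compared with the straight formula.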
