I was inspired by Aaron to spend some time on the Baseball Savant website. I chose to study my favorite enigma: how Jorge Soler can hit the ball so damn hard and yet have mediocre results. I'm still getting my arms around using the site to do good analysis, so I figured I'd take an easy first step and just create some fun charts. So, I downloaded Jorge's available data and pulled it into my favorite data analysis tool.
The first chart I looked at was a scatter plot of Exit Velocity vs Launch Angle.
This chart shows every single batted ball captured by Statcast in 2015 and 2016. The color coding works like this: outs are in gray; singles are in yellow; doubles are in light green; home runs are in dark green. This is just one player's data, but a few things are already obvious from this chart. Hitting the ball hard is good - that's where all the home runs and doubles live. Getting some loft on the ball is also obviously beneficial for hitting for power, but there are a lot more hits when the ball is at a low launch angle than when the ball gets too much loft without the velocity required to get it out of the park.
These charts were fascinating to me. I was expecting a normal (or bell curve) distribution for both charts, but the exit velocity chart is very obviously skewed (the tail is much longer on the left side and the curve peaks towards the right). The chart shows what our eyes see - Jorge consistently hits the ball hard. What is interesting about the combination of the charts is that launch angle appears to be much more highly correlated with the outcome than exit velocity is. You can see this in how just about every outcome type shows up in each of the exit velocity groupings, while each of the launch angle groupings typically has only a few potential outcomes. This makes good intuitive sense - you can't hit a ground ball when you get loft at launch; conversely, you can't get a flyout when you pound it into the ground.
Another way to look at this is through the use of box plots. For each of the bars, the center line is the average; the box itself represents the middle 50% of the data; the lines show the maximum and minimum values for each type of event. Again, this is just another way of looking at the same data, so it shows the same things. For example, doubles and home runs have a very similar profile in exit velocity. But home runs tend to have just a bit more loft to them when leaving the bat.
Most of this is pretty intuitive.
But it will be more fun to start diving into comparative data and looking at the profile of how Jorge hits the ball versus average hitters and elite hitters.
Leave some comments below on how you'd like to see the data and what comparisons would be interesting.
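For anyone who wants to play along at home, here is a rough sketch of how the box plot numbers could be computed in Python using only the standard library. Everything in it is an assumption for illustration: the rows are made-up placeholder values (not Jorge's actual data), and the names `batted_balls`, `five_number_summary`, and `summarize_by_event` are mine, not anything from Baseball Savant. It reports both the mean and the median, since many tools draw the median as the center line of the box.

```python
import statistics
from collections import defaultdict

# Made-up placeholder rows, NOT real Statcast data for any player.
# Layout (an assumption): (event, exit velocity in mph, launch angle in degrees)
EXIT_VELOCITY = 1
LAUNCH_ANGLE = 2

batted_balls = [
    ("out", 80.0, -15.0), ("out", 88.0, 40.0), ("out", 92.0, 2.0),
    ("out", 95.0, 50.0), ("out", 101.0, -5.0),
    ("single", 85.0, 5.0), ("single", 90.0, 8.0), ("single", 96.0, 10.0),
    ("single", 99.0, 12.0), ("single", 104.0, 15.0),
    ("double", 102.0, 14.0), ("double", 104.0, 16.0), ("double", 106.0, 18.0),
    ("double", 108.0, 20.0), ("double", 110.0, 22.0),
    ("home_run", 103.0, 24.0), ("home_run", 105.0, 26.0), ("home_run", 107.0, 28.0),
    ("home_run", 109.0, 30.0), ("home_run", 111.0, 32.0),
]


def five_number_summary(values):
    """Return the numbers a box plot draws, plus the mean."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }


def summarize_by_event(rows, column):
    """Group one measurement by event type and summarize each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row[column])
    return {event: five_number_summary(vals) for event, vals in grouped.items()}


for label, column in (("exit velocity", EXIT_VELOCITY), ("launch angle", LAUNCH_ANGLE)):
    print(label)
    for event, stats in summarize_by_event(batted_balls, column).items():
        print(f"  {event:9s} box {stats['q1']:.1f} to {stats['q3']:.1f}, "
              f"median {stats['median']:.1f}, mean {stats['mean']:.1f}, "
              f"range {stats['min']:.1f} to {stats['max']:.1f}")
```

Swapping the placeholder rows for a real download should be a matter of reading the file and filling `batted_balls` with the same three fields per row.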