Tuesday, July 15, 2014

So who wins the Goon of the Year?

Now that it’s the off-season, it’s the perfect time to determine this year’s recipient of the “Goon of the Year” award. Let’s look at some pretty Gephi pictures.

[image]

So we have the beautiful galaxy of all the fights that occurred in the 2013~14 season. Obviously, we want to examine the big cluster in the middle.

[image]

Let’s zoom in a little bit more…

[image]

There you go. We can see that this little planet called “Tom Sestito” has some kind of gravitational force that attracts all the goons in the NHL galaxy.

Tom Sestito wins this year’s “Goon of the Year” award. He led the league with 213 Penalty Minutes. He had 19 major penalties, 7 different dance partners, and 1 game misconduct. Congrats!

There are no new scripts for this blog post because I’m using the R script I came up with 6 months ago. It can be found here.

Friday, July 11, 2014

Steve Downie

I haven’t been sleeping well the last few days, so I decided to get back to writing R scripts.

The Pittsburgh Penguins recently signed a new goon: the notorious Steve Downie. But how notorious is he? First, I’ll lay out some statistics about Steve Downie.

  • He played 62 games last season, split between Colorado and Philadelphia. 
  • He had the 26th highest Penalty Minutes in the entire league at 106 PIM.
  • If he had played a full season, he would’ve had 140 PIM, which would rank him among the top 10 most penalized players.

So great. He takes a lot of penalties. One might argue, “Sure! That’s expected of him because he’s a goon. He’s expected to fight and get a few game misconducts here and there!” (he had 4 fights and 2 game misconducts last season to be exact). So then let’s look at his Minor Penalty statistics to see if he takes primarily dumb penalties or fighting penalties:


There are a few takeaways here:

  • Steve Downie had the 18th most minor penalties last season, with 33.
  • If he had played a full season, he would’ve had 44 Minor penalties. 
  • This would catapult him into the top 3 in the NHL for taking dumb penalties.
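The full-season numbers quoted in this post (140 PIM and 44 minors) are simple pro-rated paces. Here is a minimal sketch of that arithmetic in Python (not the R I actually used), assuming a “full season” means the standard 82 games. The function name `full_season_pace` is made up for illustration.

```python
SEASON_GAMES = 82  # assumption: "full season" means the 82-game NHL schedule


def full_season_pace(total, games_played, season_games=SEASON_GAMES):
    """Scale a season total up to what it would be over a full schedule."""
    return total / games_played * season_games


# Numbers from the post: 106 PIM and 33 minor penalties in 62 games.
pim_pace = full_season_pace(106, 62)
minors_pace = full_season_pace(33, 62)

print(round(pim_pace), round(minors_pace))  # 140 44
```

This is a straight-line projection, so it assumes he would keep taking penalties at the same rate over the 20 games he missed.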

Another thing I’d like to take away from the table above is that the players with red boxes around their names are goons (plus Steve Downie). The rest are top 6 forwards or top 4 defensemen who log a lot of minutes for their respective teams. Out of the top 30 most penalized players in terms of Minor Penalties, only six are bottom six goons. And the Pittsburgh Penguins just acquired one of those six. 

So then we can play devil’s advocate and question the sample size. Below are the R scripts I ran for calculating “Number of Minor Penalty Minutes per Game” and “Number of Fights Per Game” since Steve Downie entered the league in 2007. And below the data.frame(), I have also displayed the linear regressions for both statistics. 
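For anyone who wants the idea without the R, here is a rough Python sketch of the two steps: turn season totals into per-game rates, then fit a straight line through the rates by ordinary least squares (which is what a one-predictor linear regression does). The season numbers below are toy placeholders chosen to fall on a perfect line. They are NOT Steve Downie’s stats, and the function names are hypothetical.

```python
def per_game(total, games):
    """Rate of an event per game played."""
    return total / games


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy placeholder data (hypothetical, not real player stats).
seasons = [1, 2, 3, 4]
games = [10, 20, 20, 40]
fights = [4, 6, 4, 4]

rates = [per_game(f, g) for f, g in zip(fights, games)]
slope, intercept = fit_line(seasons, rates)
print(slope, intercept)  # a negative slope means the rate is dropping each season
```

A negative slope is what “dwindling a little bit each year” looks like in regression terms.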


Year 2012~13 was an injury year for Stevie. But the rest of the numbers seem consistent. Attached below is a simple graph of the two statistics I calculated:


As the graph suggests, Steve Downie’s Minor (or dumb) Penalty Minutes per Game sits at around 1.2 every year. This should put him in the top 5 in the league. His Fights per Game ratio seems to be dwindling a little bit each year. You can expect him to fight about once every 15 games, in line with last season’s 4 fights in 62 games. The bottom line is, we picked up someone who takes a lot of dumb penalties. Expect the Pittsburgh PK unit to log a lot of time next year.
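As a sanity check on those per-game figures, here is the same arithmetic for last season alone, sketched in Python. It uses only the numbers quoted in this post (62 games, 33 minors, 4 fights) and assumes every minor penalty counts for 2 minutes, which is my simplification.

```python
GAMES = 62
MINOR_PENALTIES = 33
FIGHTS = 4
MINUTES_PER_MINOR = 2  # assumption: each minor is a standard 2-minute penalty

minor_pim_per_game = MINOR_PENALTIES * MINUTES_PER_MINOR / GAMES
fights_per_game = FIGHTS / GAMES
games_per_fight = GAMES / FIGHTS

print(round(minor_pim_per_game, 2))  # minor penalty minutes per game
print(round(fights_per_game, 3))     # fights per game
print(games_per_fight)               # games between fights, on average
```

Last season by itself comes out a bit under the multi-year minor penalty figure, and at roughly one fight every 15 or 16 games.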

So that’s it! That’s all I have for you. As always, you can find all the scripts on my Github, and larger versions of the attached images can be found on my imgur. Thanks, and I’ll try to write more.

Sunday, December 1, 2013

Visualizing Goons

Okay, so it’s finally here. I spent an unnecessarily large portion of the Thanksgiving Break finishing up the HockeyFights.com scraper. It’s in my Github (hockeyfights.R). The scraper took so long because the HTML table format of HockeyFights.com is incredibly annoying to work with. In addition, the website blocks you if you scrape too often.
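One common way to avoid getting blocked is to pause between requests. Here is a small Python sketch of that idea (again, not the R in hockeyfights.R). The helper name `fetch_politely` and the 5-second delay are hypothetical, and I have no idea what rate the site actually tolerates. The `fetch` argument is whatever function downloads a page, passed in so the pacing logic can be tried without touching the network.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in order, pausing between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages.append(fetch(url))
    return pages


# Dry run with stand-ins for the downloader and the sleep function.
pauses = []
demo_pages = fetch_politely(
    ["page-1", "page-2", "page-3"],
    fetch=lambda url: "contents of " + url,
    delay_seconds=5.0,
    sleep=pauses.append,
)
print(demo_pages)
print(pauses)  # two pauses for three requests
```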

Anyway, the script spits out “final.table.” It has all the fights that occurred in the 2013/2014 season. All the goon names are displayed in the first two columns. The third column displays who won the fight, but in a Gephi-friendly format. It was a bit tricky to write because first, you have to determine who “won” the fight based on the awful HockeyFights.com tables. Then, you have to put it in a Gephi-acceptable format. What I mean by that is, if goon1 beats goon2, in order to show that as a Directed Edge in Gephi, you have to use a concatenated string form like this: “goon1;goon2.” Make sense? So here are some entries:


Once you have that, the rest is just importing the data and playing around with it with Gephi. Here is something I created:


All the pussy goons who got into just one or two fights are in the “perimeter.” The real enforcers are in the middle, beautifully intertwined with one another. They are in a league of their own. Tom Sestito looks like the front runner in this year’s goon race. Anyway, I will create more visual graphs when I have the time. It’s amazing how much you can accomplish once you have some nice clean data. 
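To make the “goon1;goon2” edge format concrete, here is a short Python sketch of the encoding step. It is not the code from hockeyfights.R, the function name is made up, the fighters are placeholders, and it assumes every fight has a clear winner (no draws).

```python
def edge_for_fight(fighter_a, fighter_b, winner):
    """Return a 'winner;loser' string, i.e. one directed edge per fight."""
    if winner == fighter_a:
        loser = fighter_b
    elif winner == fighter_b:
        loser = fighter_a
    else:
        raise ValueError("winner must be one of the two fighters")
    return winner + ";" + loser


# Placeholder rows: (first goon, second goon, who won).
fights = [
    ("goon1", "goon2", "goon1"),
    ("goon3", "goon1", "goon1"),
    ("goon2", "goon3", "goon3"),
]

edges = [edge_for_fight(a, b, w) for a, b, w in fights]
print(edges)  # ['goon1;goon2', 'goon1;goon3', 'goon3;goon2']
```

The winner always goes first, so the arrow in the graph points from the winner to the loser.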

One last note before I finish. I’ve noticed that you can’t view the images in full size with Tumblr. So I decided to upload all the images I’ve used for the blog to imgur. My imgur URL is: http://goonstats.imgur.com . Here, you can find all the old images from the blog and view them in full size. Anyway, cheers!

Sunday, November 24, 2013

Busy being Lazy

Nope, no post today. I didn’t even get to finish my HockeyFights scraper. And also, I’m pretty behind with my research. I’ve been busy being lazy. Sigh….

Happy Thanksgiving, and I’ll be back next week.

Sunday, November 3, 2013

74, The Magic Number

So, let’s answer this fucking question: “Does body size matter in goaltending?” I spent a good portion of this week writing a script for scraping Goalie data from NHL.com. It can be found in my Github, along with all the other R code I’ve written so far for this blog. To answer this question, I’m using the complete goalie data from the 2012~13 season. 

Now that I’m armed with the NHL Goalie data that tells you Height, Weight, Birthplace, Save Percentage, Wins, Losses, and all the useless statistics you can think of, the very first thing I did was to plot every goalie according to their Height and Weight (fortunately, there were only 82 goalies that played in the 2012~13 season). And here’s what we get:

[image]

Alrighty… It looks like there are a lot of goalies that are 74 inches tall, but the weights are much more spread out. So let’s verify this. The average height of a goalie is 73.7 inches, with a median of 74 inches and a standard deviation of only 1.8 inches. When it comes to weight, the average is 197.7 lbs, with a median of 197 lbs and a standard deviation of a whopping 14 lbs. Now that we’ve settled that Height clusters tightly around 74 inches (in a very elementary statistical way), let’s one-up this. 
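Since inches and pounds aren’t directly comparable, one way to put the two spreads on the same footing is the coefficient of variation (standard deviation divided by the mean). This is an extra check sketched in Python, not something from my original script, and it uses only the summary numbers quoted above.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative spread: standard deviation as a fraction of the mean."""
    return std_dev / mean


height_cv = coefficient_of_variation(1.8, 73.7)    # inches
weight_cv = coefficient_of_variation(14.0, 197.7)  # lbs

print(round(height_cv * 100, 1))  # height spread as a percent of the mean
print(round(weight_cv * 100, 1))  # weight spread as a percent of the mean
```

By this measure, weight varies roughly three times as much as height relative to its average, which backs up the eyeball impression from the scatter plot.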

In the scatter plot below, I’ve plotted the same Weight vs. Height data, but added an extra element to it. Each point is scaled by the number of wins the goaltender had in the 2012~13 season (so the bigger the point, the more wins the goalie had):

[image]

What’s amazing about this is that some of the best goalies in the league (starting goalies with over 20 wins in a shortened 48-game season) are all 74 inches tall (or very close to it)! Look at how many starting goalies are centered around that 74-inch ballpark: Braden Holtby, Tuukka Rask, Sergei Bobrovsky (who won the Vezina in 2013), Marc-Andre Fleury, Antti Niemi, Henrik Lundqvist, etc. The list goes on. For me, this is an incredible discovery. Unfortunately, statistics often do not tell us the “Why?” or the “How?” All we know is that these elite goalies are all of roughly equal height. We can only assume that this magic number could be the “preferred” size for NHL goalies.

Below is another plot I created during this experiment, but I wasn’t sure whether to include it in the blog post. It’s a distribution of Height/Weight Proportion vs. Number of Goalie Wins. This is a weak argument for trying to show that weight doesn’t have a significant effect on goaltending. The correlation value for Height/Weight Proportion vs. Number of Goalie Wins turns out to be only 0.215. It didn’t occur to me at the time that showing the correlation value would’ve been a stronger argument than plotting the distribution:

[image]
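For reference, the usual “correlation value” is the Pearson correlation coefficient (I’m assuming that is what the 0.215 is). Here is a bare-bones Python sketch of how it is computed. The function name is mine and the data is a toy example, not the goalie data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data: a perfectly linear relationship and its mirror image.
up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(round(up, 6), round(down, 6))  # 1.0 -1.0

# Squaring r gives the share of variance a straight line would explain.
r_squared = 0.215 ** 2
print(round(r_squared, 3))  # 0.046
```

So a correlation of 0.215 means a linear fit would explain under 5% of the variation in wins.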

Anyway, I had a lot of fun scraping Goalie data, and doing some analysis on it. Let me know if there is any specific topic I should tackle. Cheers.

Sunday, October 27, 2013

What’s Gephi Really Good For?

Shortly after I posted about Bryzgalov last week, an asshole friend of mine asked me a really interesting question: “Does body size matter in goaltending?” What a great question, asshole! I thought about this a lot and really wanted to write about it for this week. But quite honestly, I couldn’t commit myself to writing a scraping script. I don’t think it would be hard because on NHL.com, player height and weight information is readily available with (slightly misleading) career statistics. But I just couldn’t commit myself (things have been coming up in my personal life lately). So the answer to the question from my asshole friend will have to wait until next Sunday.

On a very non-hockey related note, I got an e-mail from Tumblr saying that today was the 1-year anniversary of one of my old blogs (http://isomorphicgraph.tumblr.com). I had a total of 4 blog posts for that one. I mostly wrote about Data Visualization using Gephi. At the time I was writing on Isomorphic Graph, I really thought Gephi was the next up-and-coming software for visualizing relatively large data (that you wouldn’t call “Big Data”). I thought its ability to represent data as nodes and edges (like you do in graph theory) was really cool. I used it to visualize my Facebook network. I used it to visualize Twitter hashtags. I used it to visualize random datasets that were already provided in the Gephi database. 

But looking back, I think Gephi’s usefulness ends there. Today, I thought about somehow applying Gephi to show some kind of stunning random relationships in hockey. But what relationship? After mulling this over for a good hour, I gave up on the idea. I couldn’t think of anything because the real world data you work with is never a large N x 2 edge list that shows relationships between two elements (or nodes in our case). Gephi is great for showing social networks such as Facebook and Twitter. Its overall concept is to create pretty visual aids to show relationships in data. But unfortunately, Big Data is not a series of binary relationships.

Let me know if you think I’m completely wrong, or I’m just stupid. Below are some of the useless visual aids I came up with using Gephi from those delusional days:




Sunday, October 20, 2013

Another Way of Looking at Mr. Universe

Once upon a time, Ilya Bryzgalov was the most “exciting” goalie in the NHL. He popularized the phrase, “Why You Heff to be Mad?” during his early years in Anaheim. In 2010, he single-handedly led the underdogs of the West, the Phoenix Coyotes, to a Stanley Cup playoff berth, which resulted in his nomination for the Vezina Trophy. He signed a whopping 9-year, $51 million contract with the Philadelphia Flyers shortly after his stellar 2010~11 season. He brought his profound knowledge of the Universe to the National Hockey League, and preached the “Don’t Worry, Be Happy” attitude to the young Philadelphia locker room. And finally, he was bought out in the offseason of 2013. As of today, he is an “Emergency Backup Goaltender” for the ECHL’s Las Vegas Wranglers. 

So what went wrong with Mr. Universe? Shit, I don’t freakin’ know (if I did, I would be an NHL coach). But what we can do is visually observe his decline over the last few years. 

My initial thought process was that I should just plot all the goals Bryz gave up on a season-by-season basis, and try to explain the change over time. Well, so I did:

[image]

This is a plot of all the shots that Bryzgalov faced in the 2009~2010 season (the year he was nominated for the Vezina Trophy) on axes scaled to a hockey rink. As you might’ve guessed, the red dots are the “goals.” 

So, how useful is this scatter plot? Not very. In fact, if I created scatter plots for the last few years, you wouldn’t be able to tell the difference between any of them (when we should clearly notice a significant difference in his final year in the NHL). In other words, they are not visually helpful. 

So, another way of looking at the same data is by looking at the Kernel Density of the goals scored. Kernel Density Estimation is a smoothing technique for finite sample data. You can draw a connection with Polynomial Regression in the sense that it tries to find a “smooth” curve that runs across the points. But Kernel Density Estimation places a small bump (the kernel \(K\)) on each observation, adds the bumps up to estimate how densely the observations are packed, and normalizes the curve using the “smoothing” parameter, or bandwidth, \(h\). This is the formula for the Kernel Density Estimator:

\[\hat{f}(x) = \frac{1}{nh} \sum^n_{i=1} K\Big(\frac{x-x_i}{h}\Big)\]
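Here is the formula written out as a small Python sketch, to show what each piece does. It is a one-dimensional version with a Gaussian kernel, which is my choice for illustration (the formula works with any kernel \(K\)), and it is not the two-dimensional R function used for the plots in this post. The sample points are toy values, not shot data.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Kernel density estimate at x: (1 / (n * h)) * sum of K((x - x_i) / h)."""
    n = len(samples)
    return sum(gaussian_kernel((x - xi) / h) for xi in samples) / (n * h)


samples = [-1.0, 0.0, 1.0]  # toy observations

narrow = kde(0.0, samples, h=0.5)
wide = kde(0.0, samples, h=2.0)
print(narrow, wide)
```

Evaluating the same point with two different bandwidths shows how strongly \(h\) shapes the estimate.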

So, the larger the bandwidth \(h\), the smoother the \(KDE\) becomes, while a smaller \(h\) follows the individual data points more closely. Now, we apply the KDE2D function on all the seasons from his Vezina nomination year until his downfall (2009~2013), and this is what we get:

[image]

These are kind of hard to see because I used the default setting of 25 (if KDE2D here is kde2d from R’s MASS package, 25 is the default for n, the number of grid points per axis, rather than for the bandwidth \(h\)). But if you look at the 2009 season and compare it to the 2011 and 2012 seasons, the colors in the “eyes” become brighter and stronger. This shows that Ilya started to give up more and more goals in these areas of the ice. Also, the 2011 and 2012 seasons have more random yellow spots outside of the “eyes.” This means that Ilya also started to give up goals in uncommon areas.

For your viewing pleasure, I created the Kernel Density plots of the 2009 season and the 2012 season with a much higher setting of 200:

[image]

[image]

Now, it’s easier to see the huge difference in performance, and kind of guess what happened to Bryz. The Kernel Density plots show that in 2009, Mr. Universe was actually okay. In 2012, he shat all over the place.