Regression Analysis on STHS Attributes in the Hybrid Era (S83-85 Regular Seasons)


jacobcarson877

Introduction

 

I have always been interested in hockey analytics and data science as a whole. I took a stats class a couple of years ago and that got me interested in the idea for the first time, but it wasn't until I took a data science course last year and worked with regression for the first time that I started to really put together the resources I would need to undertake this project. I had meant to start it a while ago, when Nykonax first pointed out to me the work done in 2019 by the likes of Eaglesfan and Motzaburger, and thus began my path towards this attempt to update those resources for the hybrid attributes. While I didn't actually use the hybrid attributes themselves as part of the process (because they aren't real and would bring out some even more wonky conclusions than I already have), there are obviously some major changes regarding how we can and do build.

 

How the Data was Gathered

 

I took the season-ending attributes for each player for each regular season from S83 to S85. I obviously cannot track everyone's spending live during the season and I wouldn't want to anyways. I originally had bots in the mix, just to fill out the dataset more, but they were skewing the results too much so I scrapped them. I still ended up with 448 season totals, so I'm alright drawing some conclusions from those. Anyways, I'll be breaking a lot of the rules of data science here, so don't expect this to pass peer review or anything. I'll mostly just be commenting on trends, surprises and my thoughts on why those things happened.

 

I used an old homework assignment of mine as the basis for this analysis. It uses an optimization function to minimize the residual sum of squares, which is essentially the sum of the squared distances between each real point and the regression line. I used this mostly for plotting, so I could visualize the relationships, but I gathered most of my R² values (how much variance each independent variable explained) for both simple linear and multiple regression using Leave One Out Cross Validation, and used Bootstrapped Confidence Intervals to once again check the directions of the relationships. I also made sure to use the corrected assignment that my prof returned to me and not the one I submitted. I did all of my coding in Python, although I probably would have attempted this in R if I didn't have nearly completed code sitting in my documents folder.
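For anyone curious what that workflow looks like, here is a minimal sketch of the three pieces for a single attribute. This is not my assignment code: it swaps in the closed-form least squares solution instead of a numerical optimizer (both minimize the same residual sum of squares), the function names are made up for illustration, and the toy numbers at the bottom are invented, not VHL data.

```python
import random

def fit_line(xs, ys):
    """Least squares slope and intercept, i.e. the line that minimizes RSS."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rss(xs, ys, slope, intercept):
    """Residual sum of squares: squared gaps between real and predicted y."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def loocv_r2(xs, ys):
    """R^2 built from leave-one-out predictions: 1 - PRESS / total SS."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without point i, then score the prediction for point i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1 - press / tss

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope (checks its direction)."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip resamples where every x is identical
        slopes.append(fit_line(bx, by)[0])
    slopes.sort()
    low = slopes[int((alpha / 2) * len(slopes))]
    high = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return low, high

# Invented toy data: an attribute value and a season stat for 8 players.
attribute = [60, 65, 70, 75, 80, 85, 90, 95]
stat = [8, 12, 11, 18, 22, 21, 30, 33]

slope, intercept = fit_line(attribute, stat)
print("slope:", slope, "RSS:", rss(attribute, stat, slope, intercept))
print("LOOCV R^2:", loocv_r2(attribute, stat))
print("95% slope interval:", bootstrap_slope_ci(attribute, stat))
```

If the whole bootstrap interval sits above zero, the relationship is reliably positive; if it straddles zero, the direction is not something I'd trust.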

 

Goals/Shots

SC is king here, as we all likely expected. PH and SK had a decent showing here, but PH was more impressive, and SK likely did as well as it did due to the fact that we've all just been upgrading SK because what else is there to upgrade. DF and FO both did well, because both establish possession, meaning you had a puck to shoot and score with. I expected a bit more from DF, but it will have its chance to shine later. PS and ST get a little messy here, as the model can interpret increasing ST and PS as positives, where it is really just the act of building a gap between PA and SC that matters. PA alone however doesn't seem to really have an effect either way, which is reassuring. I lumped goals and shots here together because they are generally affected by the same attributes, and generally shots linearly affect goals, since every shot in STHS is as likely as any other to go in.

 

SC +++
PH++
SK/DF/FO +

ST/PS ?

 

Assists

 

There seems to have always been a debate as to what makes a good playmaker, or what makes you get more assists. It is hard to really tell, as the assist isn’t the end result, the goal is. You can’t necessarily achieve an assist on your own. But it seems as though someone who holds and doesn’t lose the puck tends to get more assists. What is interesting to note, is that despite pre-hybrid testing, PA does in fact have somewhat of a minor positive correlation with assists. Now PH, SK and DF are definitely much more important, but PA may not be as detrimental as it used to be.

 

PH/DF +++

SK/SC ++
PA +

 

Hits/PIMs

 

So basically CK is the only thing that matters here. SK and DF seem to correlate a bit, but I assume that is mostly due to people's previous assumptions as to how they should build a defensive player. Hits are rather controversial as a stat anyways, at least without the assistance of takeaways as well, as every contact made in STHS counts as a hit, regardless of who retrieves the puck afterwards. One could reasonably assume that someone who hits a lot recovers the puck a lot, but that simply isn't a definite fact. While CK obviously will result in puck retrievals, it also is the leading contributor to PIMs. In fact it is the only really relevant attribute when talking about PIMs. One thing to note is how DI affects both stats. DI does effectively reduce PIMs, but it reduces Hits at a much greater level. So when it comes to ratios, you will find your Hits:PIMs ratio get worse, while both of the totals go down as well. Regardless of whether you want to get Hits or not, DI is likely a waste of your TPE. Sorry everyone.

 

CK +++

DF/SK +
DI -
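The Hits:PIMs point is easier to see with numbers. A quick sketch with completely made-up totals (assuming DI knocks 30% off a player's Hits but only 10% off their PIMs, which is just an illustration and not a measured effect):

```python
def hits_per_pim(hits, pims):
    """Hits earned per penalty minute taken."""
    return hits / pims

# Hypothetical season totals, invented for illustration only.
before = hits_per_pim(200, 100)  # no DI invested
after = hits_per_pim(140, 90)    # Hits down 30%, PIMs down 10%

print(before, after)
```

Both totals drop, but because Hits fall proportionally more than PIMs, the ratio drops too.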

 

Shot Blocks

 

This one is probably the least pleasant result I found. In previous reports, there was some lack of meaningful results, but nothing like mine. While attributes like DF do have some reasonable correlation, nothing really added up to a reasonable sum that would be enough to describe the trend. So I was forced to come up with another answer here, one that some of our particularly defensive-minded players may not like. It seems as though the best way to earn more shot blocks, not shockingly, is to have more pucks shot at you. And by that I mean be on a worse team. Obviously even playing on the PK on a good team will get you some extra shot blocks, but seriously, no combination of attributes even came close to explaining the variance of shot blocks. Perhaps if I isolated just D, there would be a trend, but that sounds like too much extra work (it really wouldn't be).

 

DF +

 

Save Percentage

 

Alright so here is where the real fun begins. So many people have no idea what the goalie attributes do or mean (me included for the most part) so we can actually glean some moderately interesting information from this. I know Save Percentage is an inherently biased stat, and tends to help overworked goalies feel better. But we don't have a lot to go on here so we have to make do with what we have. Right off the bat HS starts working its magic. In the next tier though there is a lot of competition, with SC, RT, AG, RB and SZ each explaining 25-35% of the variance by themselves.

 

HS/SC +++
RT/SZ ++

RB/AG +

 

Goals Against Average

 

This one is somewhat similar to Save Percentage, but with somehow a little less certainty. Obviously our data set is rather small, and I had to take out bot goalies, but I couldn't remove player backup goalies and still pretend any conclusions would be remotely valid. Goals Against Average is also largely not the goaltender's fault; it usually just reflects how good the team in front of them is. But I do have some conclusions here that might mean something. Somewhat surprisingly it is the typical secondary attributes that dominate in explaining the variance, but it isn't too hard to put together why. There isn't a lot of build variety when it comes to goaltenders, and it isn't a particularly large feat to hit very high values in all the core 4 stats. What then makes the difference is the secondary stats that the highest TPE players can afford. Are AG and SZ suddenly super-stats? Probably not, but they definitely shouldn't be ignored.

 

AG/SZ/HS/RT +++

RB/SC ++

 

Wins

 

If you wanted more subjective stats that may or may not imply anything at all about a goaltender's talent then here you go.

 

AG/SZ/HS/RT +++
RB/SC ++


Conclusion

 

I managed to solve a lot fewer mysteries than I hoped to with this, but I figured I may as well share my rather mundane conclusions with you all.

- Puck Handling is super underrated.

- Skating is moderately overrated. I think so long as you have like 83-85+, you should be fine; I have noticed that people not meeting that threshold do have uncharacteristically bad results.

- DEFENSE IS GOD (although more so than one can really show using the scoreboard stats)

- Passing isn't bad, but building a 15+ gap between Scoring and Passing is an absolute must, or a 10-20 point gap the other way between Passing and Scoring, although I'd still recommend continuing to crank up the OV.
- Leadership unfortunately has no real measurable effect in this era. Now I must say there are so few people in the VHL with it increased that it could very well be useful but we don't know.

- Same with Experience and Strength in my opinion. It is really hard to tell whether it is correlation or causation with these two attributes, since SS increases SC and the gap, and EX is gained by doing well. So of course both make you appear to be doing well.

- If you're going to increase Checking, just accept the PIMs.

- Agility and Size are potentially majorly underrated, perhaps even better than Rebounds.

- Hand Speed, Style Control and Reaction Time are a step ahead of Rebounds.

- Best way to look like a great goaltender is to play on a hot offensive team with mediocre defense, so you can win games, have the puck mostly in your opponent's net and still face a few shots.

 

That's honestly it. If you want to ask how certain attributes relate to stats let me know and I can run something for you, it doesn't take much effort now that I've set up the file nicely.

 

I may at some point suck it up and buy STHS and run some dummy data through it just to see what I can find with more normalized data, instead of the obviously biased data we have created. I thought about doing this for the VHLM as well, as that is likely to be far more normalized, but I imagine it will likely result in some very wacky conclusions due to the funky nature of the VHLM to begin with.

 

1800 words, but I'll be writing something else soon so I may or may not come back to this in a few weeks to claim!

 


This is awesome!  I am always fascinated by data like this and running dummy sims would be super cool to see!  You should start your own dummy league if you do end up getting STHS.  I'd tune in for those games for sure!  You could base a whole podcast on your findings on a week by week basis if you wanted to make it your life lol

 

Meta King Jacob is on the horizon!


Awesome read! As I said I would, here is my review! Fantastic information gather, but my poor plan of getting Discipline looks to come to a sad abrupt ending. DI is useless? :( I need to change my build again 🤣  Great use of titles and paragraph structure. Could have used more colour though. I'm still wondering how the +++ comes about, but that's a different story. Also was goaltending stats really relatable for this analysis as there was no change to goaltenders with the hybrid change? Or am I wrong? Regardless very good read. Looks like I need to get more Checking, I don't hit enough for a 6'8 d-man. 10/10

P.S Why do people like Centered writing so much? I personally find them a little bit harder to read.


  • Commissioner
6 hours ago, jacobcarson877 said:

What is interesting to note, is that despite pre-hybrid testing, PA does in fact have somewhat of a minor positive correlation with assists.

If I recall we always found that correlation to exist. The issue was that it didn’t equate to more assists any more than adding more scoring did so it was more cost effective to ignore it.


8 hours ago, Rhynex Entertainment said:

I'm still wondering how the +++ comes about, but that's a different story. Also was goaltending stats really relatable for this analysis as there was no change to goaltenders with the hybrid change? Or am I wrong?

I had originally written out the R² values I got from the functions, but then I really didn't want to have to explain that R² doesn't actually tell you whether the relationship is positive or negative, simply how much of the variance the attribute can explain. So in the effort of saving space I just kinda loosely grouped attributes together.

+++ was positive and explained at least 30% of the variance (some upwards of 45%)
++ was positive and roughly 20-30% variance
+ was anything else positive and above 10% variance
-/+ was anything between like 7-10% variance
- was honestly anything above 10% variance that was negative (not much)
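If it helps, the grouping above can be written out as a tiny function. This is only a sketch: the exact cut-offs are my loose boundaries turned into hard numbers (so treat them as assumptions), `tier` is a made-up name, and the slope is only used for its sign, since R² on its own has no direction.

```python
def tier(r2, slope):
    """Turn variance explained (0 to 1) and slope sign into the shorthand."""
    if r2 < 0.07:
        return ""       # too weak to list at all
    if r2 <= 0.10:
        return "-/+"    # weak either way, direction not worth claiming
    if slope < 0:
        return "-"      # negative and above 10% variance
    if r2 >= 0.30:
        return "+++"
    if r2 >= 0.20:
        return "++"
    return "+"          # any other positive above 10%

print(tier(0.45, 1.2), tier(0.25, 0.5), tier(0.15, -0.2))
```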

Secondly while goalie attributes haven't changed, that's correct, their stats have. Since we're not breaking the sim with mind-boggling offense, the goaltenders actually have a fighting chance now. I did get very similar trend to those in the past, but I think I got much more solid answers than those before, just due to the state of the current meta.
