Using the Power of Statistics to Have a 22% Chance at Predicting How Many Points You'll Get


Nykonax

Hi All,

I have nothing better to do, so I'm back with another statistical analysis. I have come a long way since my last one, which used basic bitch Z-scores: I've taken 2 more whole statistics classes at university since then! So I figured, what better way to procrastinate on real work than to reuse code from my most recently submitted university assignment and apply it to the VHL? This time we are using decision trees!

Decision trees can be thought of as statistical flowcharts. They are usually used for classification problems (example: here is some information about a mushroom; based on all this previous information about mushrooms, is this one going to be poisonous?), but they can also be used for regression predictions. Essentially, they look at a bunch of previous data and its outcomes, decide which parameters are most important, and use those in a flowchart to determine the expected outcome.

In our case, I have scraped the last 6 seasons of VHL Hybrid player attributes (the STHS ones from Portal) along with each player's stats from that season, and I will be using them to build what is essentially a flowchart that predicts how many points a player will get.
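For anyone curious how a flowchart like this gets built, here is a minimal sketch of a regression tree in plain Python. It is not the code behind this post: the function names, the stopping rules (`max_depth`, `min_split`), and the four toy players are all made up for illustration. The idea is just that every possible "attribute < cutoff" question is tried, the one that leaves the two groups with the smallest squared error is kept, and each final group predicts its average points.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean; 0 for an empty list."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, features):
    """Try every 'feature < threshold' question; keep the lowest total error."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            yes = [y for row, y in zip(rows, targets) if row[feature] < threshold]
            no = [y for row, y in zip(rows, targets) if row[feature] >= threshold]
            if not yes or not no:
                continue  # a question that sends everyone one way is useless
            error = sse(yes) + sse(no)
            if best is None or error < best[2]:
                best = (feature, threshold, error)
    return best


def build_tree(rows, targets, features, max_depth=3, min_split=4):
    """Split recursively until the depth limit or too few players remain."""
    split = None
    if max_depth > 0 and len(rows) >= min_split:
        split = best_split(rows, targets, features)
    if split is None:
        return {"predict": mean(targets)}  # leaf: average points of the group
    feature, threshold, _ = split
    yes = [(r, y) for r, y in zip(rows, targets) if r[feature] < threshold]
    no = [(r, y) for r, y in zip(rows, targets) if r[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "yes": build_tree([r for r, _ in yes], [y for _, y in yes],
                          features, max_depth - 1, min_split),
        "no": build_tree([r for r, _ in no], [y for _, y in no],
                         features, max_depth - 1, min_split),
    }


def predict(tree, row):
    """Walk the flowchart: answer each question until a leaf is reached."""
    while "predict" not in tree:
        branch = "yes" if row[tree["feature"]] < tree["threshold"] else "no"
        tree = tree[branch]
    return tree["predict"]


# Made-up players, NOT rows from the scraped dataset.
toy_rows = [
    {"PH": 60, "SK": 70},
    {"PH": 65, "SK": 80},
    {"PH": 85, "SK": 72},
    {"PH": 90, "SK": 78},
]
toy_points = [20, 22, 70, 74]

toy_tree = build_tree(toy_rows, toy_points, ["PH", "SK"])
print(toy_tree)
print(predict(toy_tree, {"PH": 88, "SK": 75}))
```

Real tree packages add things like pruning and smarter stopping rules, so treat this as the bare idea only.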

Anyways, here's the flowchart. Sorry, dark mode users.

5e4f0c8b1671a40864ecafc3b58de57e.png

This can be interpreted like a flowchart. Look at the question: if the answer is yes, go left; if the answer is no, go right. The top number in the circle is the expected number of points, and the bottom number is the % of people in the category. For example, if you have a PH > 82, ST < 80, and SK > 80, you would expect to score 67 points (right, left, right). It's interesting to see which variables the model treats as important cutoff points: DF and SC don't appear at all, and it's only PH/SK/ST. My theory is that PH, SK, and ST aren't really tied to anything and have their own ratios, allowing you to increase them much higher, which makes your STHS attributes higher relative to your TPE compared to someone upgrading DF and SC.

Now, is this accurate? Eh, kind of. It depends on how much of a margin of error you're willing to accept.

9c64cdc93497dcb903d04434898a5489.png
Here's a histogram of the differences between the actual points and the predictions. Each bar represents 5 points. So for example, about 40 predictions were 5 under, while >60 were 5 over. If you want exact numbers, 22% of predictions were within 5 points, 43% within 10, and 74% within 20. Honestly, I think that isn't bad considering the variability of STHS, especially at the higher end, where you're predicted a max of 86 points but still end up in the 100s or 120s.
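The "within 5 / 10 / 20" numbers are just the share of predictions whose absolute error is at or under a given margin. Here is a small sketch of that calculation. The function name and the toy numbers are made up, and counting an error of exactly 5 as "within 5" is an assumption on my part.

```python
def share_within(actual, predicted, margin):
    """Fraction of predictions whose absolute error is <= margin."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= margin)
    return hits / len(actual)


# Made-up numbers for illustration, NOT values from the dataset.
toy_actual = [50, 62, 90, 30]
toy_predicted = [48, 70, 86, 55]

for margin in (5, 10, 20, 40):
    print(margin, share_within(toy_actual, toy_predicted, margin))
```

Widening the margin can only ever raise the share, which is why any model looks great once you allow a 40-point miss.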

We can run the same analysis on goals and assists:
Goals:
3b44ffd1ec1eb9e2809c72de2c2d2f1e.png 39b9d1fb7fe9e3e87727a9187fe03b37.png


27% of predictions were within 3 goals, 43% within 5, and 72% within 10.

Assists:
62470447c791f700ab85e3acf8783402.png 281f32a3c259823fdbfcf9b7ddbbacdf.png
22% of predictions were within 3 assists, 36% within 5, and 59% within 10.

Conclusion: I can make a pretty bad prediction of how many points you'll score. But if you're willing to accept a margin of error of 40 points, I have a 98.5% chance of successfully predicting how many points you'll get. For real though, I think it's interesting to see which variables the tree chooses as important: the traditional attributes aren't really present, and it puts more emphasis on PH, SK, and ST. My theory is that these attributes can be upgraded at good ratios compared to SC and DF, so if you're spending your TPE on them while someone else is spending theirs on SC and DF, you'll end up with a higher PH/SK than their DF/SC, which means you get more effective STHS attributes for your TPE. I think this can give some interesting insight into build paths and advice: it may be beneficial to get your STHS attributes as high as possible, even if they aren't the optimal ones, as you'll just end up statchecking other players.

I'd like to follow this up with a VHLM analysis, but I'm not sure how reliable that will be, considering how much more player attributes change during the season. Also, if anyone is wondering why I didn't just use regression, it's cause these are cooler (and I can't write as much on regression). But here's a regression model anyways:

4a91c2d75fe3492ca3f961ca3e21923b.png
Adjusted R-squared = 0.5369
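For reference, adjusted R-squared is the usual R-squared penalized for the number of predictors in the model. Below is a sketch of the standard formula. The function name is made up, and the example inputs are hypothetical, since the sample size and number of predictors behind the 0.5369 figure aren't given here.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_observations - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n_observations - 1) / dof


# Hypothetical inputs, NOT the values from the regression above.
print(adjusted_r_squared(0.5, 11, 2))
```

The penalty shrinks as the number of observations grows relative to the number of predictors, so with a large dataset the adjusted value sits very close to the plain R-squared.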

Also, if anyone wants the dataset, let me know and I can send it to you.

Edited by Nykonax