League of Legends CSPM Predictor

Nov 23, 2020

League of Legends creeps fighting in lane. Source: https://notagamer.net/

Summary

This article shows how to implement a Creep Score Per Minute (CSPM) predictor that estimates a player’s average creep score per minute in a League of Legends match from four parameters (CS%P15, GOLD%, DMG%, DPM), using a hand-made dataset of professional games from 5 major regions.

Keywords: Linear Regression, League of Legends, Data Mining, Python.

Introduction

League of Legends is a team-based strategy game where two teams of five powerful champions face off to destroy the other’s base. You can choose from 140 champions to make epic plays, secure kills, and take down towers as you battle your way to victory.[1]

LoL has multiple game mechanics, and one of the most important is Csing. In the game you need gold to become powerful, and there are two basic ways to earn it: killing and farming. We are going to focus on the second one. Farming means Csing: you kill creeps to get gold, and with gold you become powerful, simple as that. Not everything in life is easy, though, so while Csing you will face many scenarios that make it harder or easier to do.

Csing wins games

It is widely held in LoL that a high CSPM is the mark of a good player, because Csing is at the core of LoL gameplay and is vitally important in every game. That is why I decided to focus on it: to help players judge their own Csing by comparing it with the professional level.

Irelia farming creeps. Source: https://aminoapps.com/

Let’s get more technical

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable.[2]

That is basically what we are trying to do: predict the CSPM (the outcome) using CS%P15, GOLD%, DMG% and DPM. But what are those?

CS%P15: average share of the team’s total CS after the 15-minute mark

GOLD%: average share of the team’s total gold earned

DMG%: average share of the team’s total damage dealt to champions

DPM: average damage to champions per minute

Those are our model’s X’s (the explanatory variables); we will come back to how they were chosen later.

The multiple linear regression formula is given by:

yi = β0 + β1·xi1 + β2·xi2 + ... + βp·xip + ϵ

MLR Formula. Source: https://corporatefinanceinstitute.com/

Where:

yi is the dependent or predicted variable
β0 is the y-intercept, i.e., the value of y when both xi1 and xi2 are 0.
β1 and β2 are the regression coefficients that represent the change in y relative to a one-unit change in xi1 and xi2, respectively.
βp is the slope coefficient for the p-th independent variable, xip.
ϵ is the model’s random error (residual) term.[3]
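To make the formula concrete, here is a minimal sketch of fitting a multiple linear regression by ordinary least squares. It assumes NumPy is available and uses synthetic, noise-free data; the helper names `fit_mlr` and `predict_mlr` are made up for this illustration and are not the code used for the article’s model.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1:]

def predict_mlr(intercept, slopes, X):
    """Evaluate the fitted formula for every row of X."""
    return intercept + np.asarray(X, dtype=float) @ np.asarray(slopes, dtype=float)

# Synthetic data generated from y = 1 + 2*x1 - 3*x2 (no error term),
# so the fit should recover exactly these coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]

b0, slopes = fit_mlr(X_demo, y_demo)
print("intercept:", round(float(b0), 6))
print("slopes:", np.round(slopes, 6))
```

With real data the error term ϵ is not zero, so the recovered coefficients are estimates rather than exact values.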

In our case our model is given by this formula:

CSPM Model.
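In symbols, a model with the four predictors named earlier has the general form below. The coefficients are left symbolic on purpose: the fitted values are not reproduced in the text, so none are assumed here.

```latex
\widehat{\mathrm{CSPM}} = \beta_0
  + \beta_1 \,\mathrm{CS\%P15}
  + \beta_2 \,\mathrm{GOLD\%}
  + \beta_3 \,\mathrm{DMG\%}
  + \beta_4 \,\mathrm{DPM}
```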

But how did we reach this?

Exploratory data analysis

There is no magic in engineering. To arrive at the 4 “X’s” of our model we did an exploratory data analysis (EDA), both to summarize the main characteristics of our data and to measure the correlation between each candidate “X” and our “y”. But how do you do that?

Boom: we simply plot the correlation matrix of our dataset to see which “X’s” are relevant to our “y”. The 4 highest values in the CSPM row are our model’s “X’s”.

With this information, we now need to know how well our model fits our data.
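The selection step described above can be sketched as a small ranking function. This is an illustration under assumptions: it uses NumPy, ranks by the absolute value of the Pearson correlation (so strong negative correlations would also count), and runs on synthetic columns with made-up names rather than the article’s dataset.

```python
import numpy as np

def top_correlated(columns, target, k):
    """Return the k column names with the largest |Pearson r| against target."""
    y = np.asarray(columns[target], dtype=float)
    scores = {}
    for name, values in columns.items():
        if name == target:
            continue
        r = np.corrcoef(np.asarray(values, dtype=float), y)[0, 1]
        scores[name] = abs(r)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Synthetic stand-in for a stats table: one target and three candidate features.
rng = np.random.default_rng(1)
n = 200
base = rng.normal(size=n)
demo_columns = {
    "target": base,
    "strong": base + 0.1 * rng.normal(size=n),  # tracks the target closely
    "medium": base + 1.0 * rng.normal(size=n),  # tracks it loosely
    "noise": rng.normal(size=n),                # unrelated to the target
}

selected, corr_scores = top_correlated(demo_columns, "target", k=2)
print(selected)
```

On the real dataset, `target` would be the CSPM column and `k` would be 4.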

R-Squared (R²), Root Mean Square Error (RMSE) and Cross-Validation

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively.[4]
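As a minimal sketch, R² can be computed directly from its definition: one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the toy numbers are illustrative only.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

observed = [1.0, 2.0, 3.0, 4.0]

# Perfect predictions explain all of the variance.
print(r_squared(observed, [1.0, 2.0, 3.0, 4.0]))  # 1.0

# Always predicting the mean explains none of it.
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # 0.0
```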

The RMSE is the square root of the variance of the residuals. It indicates the absolute fit of the model to the data, that is, how close the observed data points are to the model’s predicted values.[5]
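A minimal sketch of the RMSE calculation follows; the function name `rmse` and the toy values are illustrative only. Because the result is in the same units as the target, an RMSE for this model reads as "CS per minute".

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Every prediction is off by exactly 2, so the RMSE is 2.
print(rmse([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]))  # 2.0
```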

Cross-validation is a technique for evaluating ML models by training several ML models on subsets of the available input data and evaluating them on the complementary subset of the data. Use cross-validation to detect overfitting, i.e., failing to generalize a pattern.[6]

We use these three to check that a linear model is a suitable choice for our data and to measure how well it performs.
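The idea can be sketched as a hand-rolled k-fold loop. This is an illustration under assumptions: it uses NumPy, scores each fold with R², fits by least squares, and runs on synthetic data; the function name `kfold_r2` and the choice of 5 folds are not taken from the article.

```python
import numpy as np

def kfold_r2(X, y, k=5, seed=0):
    """Fit a linear model on k-1 folds and score R^2 on the held-out fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once, then cut them into k roughly equal folds.
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        design_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coeffs, *_ = np.linalg.lstsq(design_train, y[train_idx], rcond=None)
        design_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        pred = design_test @ coeffs
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)

# Synthetic data: a linear signal plus a small amount of noise.
rng = np.random.default_rng(42)
X_cv = rng.normal(size=(100, 2))
y_cv = 3.0 + 1.5 * X_cv[:, 0] - 0.5 * X_cv[:, 1] + rng.normal(scale=0.1, size=100)

fold_scores = kfold_r2(X_cv, y_cv, k=5)
print("per-fold R^2:", np.round(fold_scores, 3))
print("mean R^2:", round(float(fold_scores.mean()), 3))
```

A mean held-out score far below the score on the training data would be a sign of overfitting.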

Results

After training our model and testing it with our dataset, these are the final results.

Our RMSE, R² and Mean Cross-Validation (MCV) values are:

RMSE — 0.6050913077558205

R² — 0.5464690260259888

MCV — 0.59

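An R² of about 0.55 means the model explains a little over half of the variance in CSPM, and an RMSE of about 0.61 means its predictions are typically off by roughly 0.6 CS per minute. So the model’s performance is not that bad, and it has learned something from our data.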

Now it is time to test our model with different data manually.

Testing

For this purpose we will use 4 data cells from a 2020 CBLOL champion dataset, specifically the Ashe row, which has 0.30 CS%P15, 0.25 GOLD%, 0.22 DMG% and 459 DPM. The expected CSPM is 8.9.

CBLOL 2020 Champion Dataset. Source: https://oracleselixir.com/

After running our model, the final results are:

This tells us that our model is in a good spot as far as its training goes.

Conclusions

Based on the results obtained, the goal of the project was reached.

Even though the model has successfully predicted the CSPM, there is room for changes and upgrades that could improve its performance, such as testing other linear models.

Bibliography

[1]How to Play — League of Legends. [Online]. Available: https://na.leagueoflegends.com/en-us/how-to-play/. [Accessed: 23-Nov-2020].

[2]W. Kenton, “How Multiple Linear Regression Works,” Investopedia, 21-Sep-2020. [Online]. Available: https://www.investopedia.com/terms/m/mlr.asp. [Accessed: 23-Nov-2020].

[3]“Multiple Linear Regression — Overview, Formula, How It Works,” Corporate Finance Institute, 02-Jun-2020. [Online]. Available: https://corporatefinanceinstitute.com/resources/knowledge/other/multiple-linear-regression/. [Accessed: 23-Nov-2020].

[4]J. Frost, “How To Interpret R-squared in Regression Analysis,” Statistics By Jim, 03-Nov-2020. [Online]. Available: https://statisticsbyjim.com/regression/interpret-r-squared-regression/. [Accessed: 23-Nov-2020].

[5]J. Moody, “What does RMSE really mean?,” Medium, 06-Sep-2019. [Online]. Available: https://towardsdatascience.com/what-does-rmse-really-mean-806b65f2e48e. [Accessed: 23-Nov-2020].

[6]“Cross-Validation,” Amazon Machine Learning Developer Guide, Amazon Web Services. [Online]. Available: https://docs.aws.amazon.com/machine-learning/latest/dg/cross-validation.html. [Accessed: 23-Nov-2020].