Abstract
One of the significant challenges in the sports industry is identifying the factors that influence match results and their relative weights. To make appropriate recommendations to team management and players, prediction models are needed that both predict match outcomes and quantify the important factors. A second requirement is to identify talented emerging players and to analyze how the important factors are associated with match-winning outcomes. This paper formulates a hybrid machine learning-clustering-association rule framework and implements it for cricket, one of the most popular sports in the world, watched by billions. We predict match outcomes for One Day Internationals (ODIs) and Twenty20s (T20s), the fifty-over and twenty-over formats of cricket respectively, using state-of-the-art machine learning algorithms: Random Forest, Gradient Boosting, and deep neural networks. Variable importance is computed with these machine learning techniques and statistically validated through a regression model. Emerging talented players are identified through clustering, and association rules are generated to determine the combinations of factors most likely to produce a win. The results show that environmental conditions are as crucial as internal quantitative factors in determining a match result. The model can thus help team management and players improve their winning strategy and discover emerging players to form an unbeatable team.
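The association-rule step can be illustrated with a minimal sketch. It assumes hypothetical binary match conditions (`won_toss`, `home`, `batted_first`) and a `win` item that marks a victory; these names and records are illustrative and are not the paper's variables or data. Rules of the form {conditions} → {win} are scored by support, confidence, and lift. For brevity the antecedents are enumerated exhaustively rather than pruned as the Apriori algorithm would; the scores are the same either way.

```python
from itertools import combinations

# Hypothetical match records: each set lists the conditions observed in one match.
# A match without the "win" item is a loss. Illustrative only, not data from the paper.
MATCHES = [
    {"won_toss", "home", "batted_first", "win"},
    {"won_toss", "home", "win"},
    {"home", "batted_first", "win"},
    {"won_toss", "batted_first"},
    {"home", "win"},
    {"won_toss"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rules_for(consequent, transactions, min_support=0.3, min_confidence=0.7):
    """Return (antecedent, support, confidence, lift) for rules antecedent -> consequent.

    Exhaustive enumeration of antecedent itemsets; Apriori would prune this
    search, but the scoring of each rule is identical.
    """
    items = sorted(set().union(*transactions) - {consequent})
    target = frozenset({consequent})
    base = support(target, transactions)  # P(consequent), used for lift
    rules = []
    for size in range(1, len(items) + 1):
        for antecedent in combinations(items, size):
            a = frozenset(antecedent)
            sup_a = support(a, transactions)
            if sup_a == 0:
                continue
            sup_rule = support(a | target, transactions)
            if sup_rule < min_support:
                continue
            confidence = sup_rule / sup_a  # P(consequent | antecedent)
            if confidence >= min_confidence:
                rules.append((a, sup_rule, confidence, confidence / base))
    # Strongest rules first: highest confidence, then highest support.
    return sorted(rules, key=lambda r: (-r[2], -r[1]))


if __name__ == "__main__":
    for antecedent, sup, conf, lift in rules_for("win", MATCHES):
        print(sorted(antecedent), "-> win",
              f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 means the conditions co-occur with winning more often than the overall win rate alone would suggest; the thresholds `min_support` and `min_confidence` are arbitrary choices for this toy example.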
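The player-clustering step might look like the following sketch. The abstract does not name the clustering algorithm, so k-means (Lloyd's algorithm) with deterministic farthest-point seeding is an assumption here, as are the two features (batting average and strike rate) and the player values, which are purely illustrative. Both features are standardized first because strike rates are numerically larger than averages and would otherwise dominate the Euclidean distances.

```python
import math
import statistics

# Hypothetical player features: (batting average, strike rate).
# Illustrative values only; these are not data from the paper.
PLAYERS = {
    "A": (52.0, 92.0),
    "B": (48.5, 88.0),
    "C": (21.0, 70.0),
    "D": (18.5, 65.0),
    "E": (35.0, 130.0),
    "F": (33.0, 138.0),
}


def standardize(points):
    """Z-score each feature so that no single scale dominates the distances."""
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(p, means, sds)) for p in points]


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members (kept if empty).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(map(statistics.fmean, zip(*members))) if members else centroids[j])
        if updated == centroids:
            break  # converged: the assignments no longer move any centroid
        centroids = updated
    return labels


if __name__ == "__main__":
    labels = kmeans(standardize(list(PLAYERS.values())), k=3)
    for name, label in zip(PLAYERS, labels):
        print(name, "-> cluster", label)
```

In this toy data, the group pairing a moderate average with a high strike rate (players E and F) is the kind of cluster an analyst might then examine for emerging talent; the choice of `k=3` is likewise an assumption.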
Notes
Retrieved from: https://stats.espncricinfo.com/ci/engine/stats/index.html.
Cite this article
Srivastava, P.R., Eachempati, P., Kumar, A. et al. Best strategy to win a match: an analytical approach using hybrid machine learning-clustering-association rule framework. Ann Oper Res 325, 319–361 (2023). https://doi.org/10.1007/s10479-022-04541-6
DOI: https://doi.org/10.1007/s10479-022-04541-6