Abstract:
Athletics is a popular sport that includes a variety of events such as running, jumping, and
throwing. Marathon running, contested over a standardized distance of 42.195 kilometers
(26.2 miles), is one of the most popular sports worldwide. It is a symbol of endurance and
human achievement, drawing participants ranging from elite athletes to recreational runners.
Although Ethiopia has a rich history of success in long-distance running, concerns have
recently emerged about a slight decline in its marathon performance. Previous studies on
predicting marathon performance using artificial intelligence have faced several
limitations. These include a narrow set of variables, small sample sizes, and a lack of
consideration for individual runners' specific physiological, training, and environmental
variables. This study aims to develop an explainable machine learning model to predict the
performance levels of Ethiopian marathon runners using 2,316 time-series records from the
Ethiopian Athletics Federation, recorded from 2011 to 2015 E.C. (Ethiopian Calendar). The recorded data
were preprocessed, and features were selected using XGBoost feature importance scores.
The prepared data were fed to K-Nearest Neighbors (KNN), Artificial Neural Network (ANN),
Random Forest (RF), Decision Tree (DT), and XGBoost models, which were compared to
select the best performer. The developed models were evaluated
using accuracy, precision, recall, F1 score, and ROC AUC. XGBoost showed the best
performance before the Synthetic Minority Over-sampling Technique (SMOTE) was applied,
and it continued to outperform the other models afterward, reaching an accuracy of 99%.
LIME and SHAP were applied to explain the model's predictions. The final model provides a reliable tool for predicting
marathon performance, which can help athletes, coaches, and sports scientists optimize
training strategies and performance analysis. This research aims to enhance the
competitiveness and national pride of Ethiopian marathon runners by addressing the
limitations of previous studies and incorporating a larger and more detailed dataset.
Keywords: Marathon, endurance running, machine learning, physiological factors,
training factors, environmental factors
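
For readers unfamiliar with SMOTE, the idea behind the oversampling step can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name `smote_oversample`, the two example features, and all values below are hypothetical and chosen only for illustration. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows: [weekly distance in km, resting heart rate].
minority_rows = [[120.0, 48.0], [135.0, 46.0], [128.0, 50.0], [140.0, 45.0]]
new_rows = smote_oversample(minority_rows, n_new=6, k=2, seed=42)
```

Because every synthetic row is an interpolation between two real minority rows, its feature values stay within the range already observed for that class. In practice, oversampling of this kind is usually applied to the training split only, so that evaluation data contain no synthetic samples.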