Abstract:
Software maintenance costs can account for up to 67% of total expenses in the Software Development Life Cycle (SDLC) and often exceed 50% of the total cost across all phases. This makes accurate cost estimation essential for effective project planning and resource management. Traditional approaches, such as expert judgment and algorithmic models, often fail to capture the complexities arising from changing requirements, aging codebases, and evolving technologies. This research develops a machine learning-based model for more reliable software maintenance cost estimation, using data gathered from domestic software companies through interviews with project managers and analysis of archival project records. Factors affecting maintenance cost were identified through a literature review and refined through the interviews. Seven machine learning algorithms were evaluated: Linear Regression, Ridge Regression, Decision Trees, Random Forest, Support Vector Regression (SVR), Gradient Boosting Regressor, and XGBoost. XGBoost performed best, recording the lowest errors (MAE: 437.51, RMSE: 620.05) and the highest R² (0.95). To enhance transparency and build stakeholder trust, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) were used to explain the model's predictions. The research concludes with the development and integration of the XGBoost model for practical software maintenance cost prediction, improving the accuracy of cost estimation and supporting resource management for software projects.
Keywords: Software Maintenance Cost Estimation, Machine Learning, XGBoost, SHAP, LIME
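
For readers unfamiliar with the three evaluation metrics reported above (MAE, RMSE, and R²), the following minimal sketch shows how each is computed from actual and predicted costs. The function names and the sample values are illustrative assumptions only; they are not code or data from this study.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    mean_sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)

def r2(y_true, y_pred):
    """Coefficient of determination: share of variance explained by the model."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical maintenance costs (arbitrary units), invented for illustration.
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"R2:   {r2(actual, predicted):.2f}")
```

Lower MAE and RMSE indicate smaller prediction errors in the units of the cost variable, while an R² closer to 1 indicates that the model explains more of the variance in the observed costs.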