Abstract:
Tax is the backbone of a country, funding public services and infrastructure for
the people in need. However, most taxpayers in Ethiopia are unwilling to pay the tax
they owe to the tax authority. Because most taxpayers are not compliant, the Ethiopian
Ministry of Revenues (MOR) developed, in June 2019, criteria for measuring taxpayers'
compliance level by analyzing taxpayers' data. The main goal of this compliance level
classification is to recognize and reward high-compliance taxpayers while influencing
low-compliance taxpayers and mitigating their tax fraud. Since
this taxpayer compliance level classification method was first developed by MOR in 2019,
existing studies do not address it. Although identifying taxpayers' compliance level is a
useful basis for subsequent action, the identification criteria are complex and broad,
and analyzing a large number of taxpayers' records without the support of advanced
computing technology is tedious and error-prone. To assist human experts and help
mitigate this problem, an assistive model that can predict taxpayers' compliance level
from the provided data should be developed. The aim of this study is to create an
assistive model using machine learning
techniques that can learn from training data and predict taxpayer compliance levels,
helping MOR experts work more efficiently and effectively and thereby helping to
mitigate tax evasion. The study follows an experimental research design. We used machine
learning approaches to construct a taxpayer compliance level prediction model, working
in the Anaconda Python environment with Jupyter. We tested four classification
algorithms and
obtained accuracies of 96.74%, 96.09%, 99.67%, and 79.15% for the Random Forest,
K-Nearest Neighbour, Support Vector Machine, and Naive Bayes classifiers, respectively.
Support Vector Machine was the best-performing algorithm, with an accuracy of 99.67%
after SMOTE.
Keywords: Compliance Level, Machine Learning, SMOTE, Taxpayers
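
The abstract reports its best result "after SMOTE" (Synthetic Minority Over-sampling Technique). For readers unfamiliar with the idea, the following is a minimal, simplified sketch of the interpolation step at the heart of SMOTE, written with the Python standard library only. It is not the study's code and not the implementation found in any particular library; the function name `smote_sketch`, its parameters, and the sample records are hypothetical and chosen purely for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Choose a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature records for an under-represented compliance class.
minority_class = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_sketch(minority_class, n_new=6, k=2)
```

Because each synthetic record lies between two real minority records, the added points stay inside the region the minority class already occupies. In practice the oversampling would normally be applied to the training split only, so that the reported test accuracy is measured on real records.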