The classification and clustering of bank telemarketing data using extreme gradient boost and k-prototype techniques

Researcher: Tselahale Serongwa, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

The banking sector needs to categorise the customer data it holds in order to enable business intelligence analytics and improve its marketing strategies. Banks require simple automated models that yield interpretable results so that they comply with financial regulations. An XGBoost model was built to determine the minimum number of attributes with the greatest impact on the potential of the 45211 customers to subscribe to a term deposit. The Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset, and eleven important attributes with 79% prediction power were identified from 39 attributes. The model had an F1-score and testing accuracy of 93%, whilst the model’s reliability was 86%. Four clusters were determined using the k-prototype clustering technique to group customers for tailored marketing strategies. It was determined that the bank had a better chance of getting business from the 6859 customers in the three most valuable clusters and should consider cheaper marketing options for the remainder of its customers.
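
The k-prototype technique groups records that mix numeric and categorical attributes. The following is a minimal sketch of the usual k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches), not the study's actual code. The attribute names, the prototype values and the value of `gamma` are hypothetical and chosen only for illustration.

```python
def kprototype_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Mixed-type dissimilarity between two records.

    Numeric parts contribute squared Euclidean distance; categorical parts
    contribute gamma for every attribute whose values differ.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Return the index of the prototype closest to the record."""
    distances = [
        kprototype_distance(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))


# Hypothetical prototypes: (scaled age, scaled balance), (job, marital status)
prototypes = [
    ((0.2, 0.1), ("student", "single")),
    ((0.7, 0.8), ("management", "married")),
]
# Hypothetical customer record in the same layout
customer = ((0.6, 0.9), ("management", "single"))

cluster = nearest_prototype(customer[0], customer[1], prototypes, gamma=0.5)
print(cluster)
```

A full k-prototypes run alternates this assignment step with an update step that recomputes each prototype (means for numeric attributes, modes for categorical ones) until the assignments stop changing.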



Optimisation of hybrid neural network techniques used for stock market predictions

Researcher: Mohammad Rehman, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

The combination of traditional technical and fundamental analysis techniques and machine learning techniques has become a common practice for informed stock price prediction.

The focus of this research was to stochastically optimise a long short-term memory (LSTM) prediction model using a genetic algorithm (GA) and test its viability for predicting next-day stock prices.

The hybrid GA-optimised LSTM model achieved an RMSE of 247.30, an MAE of 190.22 and a MAPE of 1.52%, indicating that a viable prediction model was constructed.
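
For readers unfamiliar with the three reported error measures, the sketch below computes RMSE, MAE and MAPE from their standard definitions. The price values are made up for illustration and are not from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical next-day prices and model predictions
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE and MAE are expressed in the units of the price series, whereas MAPE is scale-free, which is why a large RMSE can coexist with a small MAPE when prices are high.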



Rural Governance and Financial Inclusion: Does Rural Governance Matter in the Financial Inclusion in Developing Countries

Researcher: Takunda Pfigu, University of the Witwatersrand, Johannesburg
Supervisor: Dr Nyasha Mahonye, University of the Witwatersrand, Johannesburg

Financial inclusion is the process that ensures access to and usage of basic formal financial services for all. It is crucial for local economic development. Rural people, women and poor people are disproportionately unbanked.



Prediction of Lightning in South Africa using an LSTM Neural Network Model using Historical Lightning Data

Researcher: Yaseen Essa, University of the Witwatersrand, Johannesburg
Supervisors: Dr Ritesh Ajoodah and Dr Hugh Hunt, University of the Witwatersrand, Johannesburg

We evaluated the ability of a Long Short-Term Memory (LSTM) recurrent neural network model to predict short-term lightning flash densities within South Africa using historical lightning events. We predicted the lightning flash densities for one-hour periods for two areas within South Africa using data from the South African Lightning Detection Network. Models were trained using four years of data and predictions were made for every one-hour interval for one year. The models were tested repeatedly and cross-validated. We found a combined Mean Absolute Error of 2.87 lightning flashes per hour and a combined Mean Squared Error of 1209. The model predicted 3 in 10 lightning events. We believe that LSTM models are useful tools to manage lightning risk.
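
Sequence models such as an LSTM are commonly trained on fixed-length windows of past observations paired with the next value. The sketch below shows that windowing step only. The lookback length and the hourly counts are hypothetical, and the abstract does not state how the study framed its inputs.

```python
def make_windows(series, lookback):
    """Turn an hourly series into (past window, next value) training pairs."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])  # the previous `lookback` hours
        targets.append(series[end])                # the hour to be predicted
    return inputs, targets


# Hypothetical hourly flash counts for one area
hourly_flashes = [0, 0, 3, 12, 7, 1, 0]

X, y = make_windows(hourly_flashes, lookback=3)
for window, target in zip(X, y):
    print(window, "->", target)
```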



Electoral Accountability in South Africa

Researcher: Leslie Dwolatzky, University of the Witwatersrand, Johannesburg
Supervisor: Prof. Rod Alence, University of the Witwatersrand, Johannesburg

This research project investigates the relationship between change in support for the ANC and the change in the provision of public services at the electoral ward level. The project replicates the study conducted by de Kadt and Lieberman (2017) and seeks to contribute further to their analysis. Contrary to expectations, there is a significant, negative relationship between service delivery and support for the ANC: the ANC is more likely to experience a decrease in support in electoral wards where it has done best at improving service provision.



Modelling Gold Production Using Sigmoid Models

Researcher: Ignitious Chauke, University of Venda
Supervisor: Dr Caston Sigauke, University of Venda

This study approximates monthly gold production from the Sibanye-Stillwater South Africa (SA) gold operations based on five sigmoid models. The studied models were Gompertz, Gaussian, Probit and the Hill, which were used to forecast future production. Although all five estimated models offered a good, realistic estimate, the Hill model was found to better approximate the observed gold output pattern in the Sibanye-Stillwater mines. The model was chosen based on its high explained variance (R²) and its lowest error (RMSE) and information loss (AIC) values. The model indicated that the production of gold would be too low by 2035 if the current trend in gold production at the Sibanye-Stillwater (South Africa operation) mines persists. The Hill model findings were also backed by an ARIMA(0,1,2)(1,0,1)[4] model, which showed that monthly gold production will continue to decrease until 2025.
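
As an illustration of the model-selection criteria named above, the sketch below defines two of the sigmoid curves in their common textbook forms and scores a fit by R², RMSE and AIC. The parameterisations, the parameter values and the least-squares form of AIC (n·ln(RSS/n) + 2k, valid up to an additive constant under Gaussian errors) are assumptions. The study's own formulation may differ.

```python
import math


def hill(t, a, k, n):
    """Hill curve: rises towards the asymptote a, reaching a/2 at t = k."""
    return a * t ** n / (k ** n + t ** n)


def gompertz(t, a, b, c):
    """Gompertz curve: asymmetric sigmoid with asymptote a."""
    return a * math.exp(-b * math.exp(-c * t))


def fit_scores(observed, fitted, n_params):
    """R^2, RMSE and least-squares AIC for a fitted curve (requires RSS > 0)."""
    n = len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    mean = sum(observed) / n
    tss = sum((o - mean) ** 2 for o in observed)
    return {
        "r2": 1 - rss / tss,
        "rmse": math.sqrt(rss / n),
        "aic": n * math.log(rss / n) + 2 * n_params,
    }


# Hypothetical observations and fitted values from a two-parameter model
scores = fit_scores([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], n_params=2)
print(scores)
```

Under these criteria a model is preferred when its R² is higher and its RMSE and AIC are lower. AIC additionally penalises models with more parameters.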



Investigation of supervised learning methods to classify cell types from single-cell RNA sequencing data

Researcher: Warren Freeborough, University of the Witwatersrand, Johannesburg
Supervisors: Prof. Terence van Zyl, University of Johannesburg and Nikki Gentle, University of the Witwatersrand, Johannesburg

The study of living systems has prompted improvements in sequencing technology, which in turn has led to biological science entering the field of big data. Adequately studying single-cell RNA sequencing (scRNA) data requires the use of data science methods.

The aim of the study is to replicate the results produced by Grabski and Irizarry, using the same datasets, whilst exploring alternative supervised learning methods. In doing so, this study hopes to provide support for the model's usage in scRNA classification or provide promising alternatives to explore further.



Predicting Insurance Claims Fraud using Random Forest

Researcher: Khanya May, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

Fraud presents a threat that has serious consequences in the insurance industry. In recent years the use of machine learning and analytical techniques for fraud detection has been a topic of several research projects. This research aims to explore the capabilities of random forest in fraud detection. The random forest model is applied to TSA insurance claims data in an effort to predict fraud. The model achieved an accuracy and F-measure of 69.72% and 75.04%, respectively. The performance of the model is better than that of an unskilled model, which would predict 54.67% of claims correctly. There is a 76.09% chance that the model will be able to distinguish between fraudulent and legitimate claims.
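
The last two figures can be read as a baseline accuracy and an area under the ROC curve (AUC). The sketch below assumes that the unskilled model always predicts the majority class and that the "chance of distinguishing" is the AUC, i.e. the probability that a randomly chosen fraudulent claim receives a higher score than a randomly chosen legitimate one. Both readings are assumptions, and the labels and scores are invented.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical claims: 1 = fraudulent, 0 = legitimate, with model scores
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.5, 0.2]

print(majority_baseline_accuracy(labels), auc(labels, scores))
```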



Credit Card Default Payment Prediction using Random Forest

Researcher: Thanganedzo Beverly Mashamba, University of Venda
Supervisor: Dr W Chagwiza, University of the Witwatersrand, Johannesburg

Lending money to customers is a good investment if the credit card is held by a customer with good credit standing [1, 2]. Credit card applicants are usually subjected to thorough background checks before their applications are approved or declined. By doing so, banks ensure that their credit cards are issued only to clients with the proven ability to repay the credit. For banks to identify those customers who are not worthy of being given a credit card, they must have in place models that reliably predict any risky behaviour on the part of those who apply for credit cards. This study uses the random forest machine-learning algorithm to predict which customers deserve to be given a credit card.



Customer churn prediction in telecoms industry using random forest

Researcher: Fortune Mhlanga, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

Telecommunications companies around the world lose customers to their competitors daily. The churning of customers always leads to reduced revenues. Predictive models that identify customers who are likely to churn could therefore significantly increase the revenue of the industry.

This study seeks to develop a customer churn prediction model using the random forest algorithm. The performance of the model is measured using statistical measures such as accuracy, precision, recall, F-measure, Cohen’s kappa, Gini coefficient and Matthews correlation coefficient. This study explores the scikit-learn default filter method, step forward feature selection, step backward feature selection and exhaustive feature selection to find features that can be used as inputs to the model. This study shows that the random forest algorithm is not the best at predicting customer churn for the telecommunications industry, since the best performing model has a recall of 52.2%.
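
Most of the measures listed above follow directly from a binary confusion matrix. The sketch below computes them from their standard definitions. The counts are hypothetical, and the Gini coefficient is left out because it is derived from ranked scores rather than from the confusion matrix.

```python
import math


def churn_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
        "mcc": mcc,
    }


# Hypothetical counts: churners are the positive class
metrics = churn_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Recall is the measure that matters most when missing a churner is costly, which is why a recall of about one half can outweigh an otherwise acceptable accuracy.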

