Predicting Insurance Claims Fraud using Random Forest

Researcher: Khanya May, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

Fraud is a serious threat to the insurance industry. In recent years, the use of machine learning and analytical techniques for fraud detection has been the subject of several research projects. This research explores the capabilities of random forest in fraud detection. The random forest model is applied to TSA insurance claims data in an effort to predict fraud. The model achieved an accuracy of 69.72% and an F-measure of 75.04%. This is better than an unskilled model, which would correctly predict 54.67% of claims. There is a 76.09% chance that the model will be able to distinguish between fraudulent and legitimate claims.
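The measures quoted in this abstract can be sketched from a confusion matrix and a set of model scores. The sketch below is illustrative only: the counts and scores are invented, the "unskilled model" is assumed to be a majority-class predictor, and the "chance of distinguishing" figure is assumed to be an area under the ROC curve computed by pairwise ranking. The abstract confirms neither assumption.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all claims that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the fraud class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(tp, fp, fn, tn):
    """Accuracy of always predicting the more common class (assumed baseline)."""
    positives = tp + fn
    negatives = fp + tn
    return max(positives, negatives) / (positives + negatives)


def auc_by_ranking(fraud_scores, legit_scores):
    """Probability that a random fraudulent claim scores above a random
    legitimate one; ties count as half a win."""
    wins = 0.0
    for f in fraud_scores:
        for g in legit_scores:
            if f > g:
                wins += 1.0
            elif f == g:
                wins += 0.5
    return wins / (len(fraud_scores) * len(legit_scores))


# Hypothetical counts and scores, not taken from the study.
tp, fp, fn, tn = 40, 10, 20, 30
print(accuracy(tp, fp, fn, tn))
print(f_measure(tp, fp, fn))
print(majority_baseline_accuracy(tp, fp, fn, tn))
print(auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Comparing the model's accuracy with the baseline shows how much of the score comes from class imbalance alone.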



Credit Card Default Payment Prediction using Random Forest

Researcher: Thanganedzo Beverly Mashamba, University of Venda
Supervisor: Dr W Chagwiza, University of the Witwatersrand, Johannesburg

Lending money to customers is a good investment when the credit card is held by a customer with good credit standing [1, 2]. Credit card applicants are usually subjected to thorough background checks before their applications are approved or declined. By doing so, banks ensure that their credit cards are issued only to clients with a proven ability to repay the credit. To identify customers who should not be given a credit card, banks must have models in place that reliably predict risky behavior among applicants. This study uses the random forest machine-learning algorithm to predict which customers should be given a credit card.



Customer churn prediction in telecoms industry using random forest

Researcher: Fortune Mhlanga, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

The telecommunications industry loses customers to competitors every day, worldwide. Customer churn always leads to reduced revenue. Predictive models that identify customers who are likely to churn could significantly increase the industry's revenue.

This study seeks to develop a customer churn prediction model using the random forest algorithm. The performance of the model is measured using statistical measures such as accuracy, precision, recall, F-measure, Cohen’s kappa, the Gini coefficient and the Matthews correlation coefficient. The study explores the scikit-learn default filter method, step forward feature selection, step backward feature selection, and exhaustive feature selection to find features that can be used as inputs to the model. It shows that the random forest algorithm is not the best at predicting customer churn for the telecommunications industry, since the best-performing model has a recall of 52.2%.
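Several of the measures named above can be computed directly from the four cells of a binary confusion matrix. The following is a minimal sketch with invented counts, not the study's data or code. It covers recall, Cohen's kappa and the Matthews correlation coefficient, and the function names are illustrative.

```python
import math


def recall(tp, fn):
    """Share of actual churners that the model flagged."""
    return tp / (tp + fn)


def cohens_kappa(tp, fp, fn, tn):
    """Agreement between predictions and labels, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of the marginals for each class, summed.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


def matthews_corrcoef(tp, fp, fn, tn):
    """Correlation between predicted and actual labels, in [-1, 1]."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts, not taken from the study.
tp, fp, fn, tn = 45, 5, 15, 35
print(recall(tp, fn))
print(cohens_kappa(tp, fp, fn, tn))
print(matthews_corrcoef(tp, fp, fn, tn))
```

A low recall, as reported in the abstract, means many customers who actually churn are missed even when the overall accuracy looks acceptable.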



Event classification for gamma-hadron separation for H.E.S.S

Researcher: Wandile Lesejane, University of the Witwatersrand, Johannesburg
Supervisor: Prof Nukri Komin, University of the Witwatersrand, Johannesburg

H.E.S.S. is one of the leading imaging atmospheric Cherenkov telescopes (IACTs) and is crucial for studying cosmic particles, particularly gamma-induced particles. It is able to detect particles with energies ranging from tens of GeV to TeV. The challenge stems from the influx of hadronic air showers, which are more common and can obscure the detection of gamma particles. Deep neural networks were employed to discriminate gamma events from hadron events using data simulated with the KASKADE and SMASH software. At its best, the model had an accuracy of 97.42% and a loss of 7.21%; at its poorest, it had an accuracy of 53%.



Attention-based LSTM algorithm with ARIMA on wavelet denoised Bitcoin prices

Researcher: Ndamulelo Innocent Nelwamondo, University of the Witwatersrand, Johannesburg
Supervisor: Dr Farai Mlambo, University of the Witwatersrand, Johannesburg

The cryptocurrency market is recognized for its intense uncertainty and instability, and people are still searching for a reliable and convenient way to guide cryptocurrency trading. In total, four models were used to forecast the price of Bitcoin: ARIMA, LSTM, attention-based LSTM, and a hybrid attention-based LSTM-ARIMA. The hybrid attention-based LSTM-ARIMA model on wavelet-denoised Bitcoin prices, with MSE = 42816.51, RMSE = 206.92, MAE = 167.34 and R² = 0.8172, was found to be the best-fitting model.
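The four error measures reported above follow standard definitions. A minimal sketch of how they are computed for any forecast is shown below; the price series is invented for illustration, and `regression_metrics` is a hypothetical helper name, not part of the study.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sum_squared_errors = sum(e * e for e in errors)
    mse = sum_squared_errors / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    total_sum_squares = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),  # RMSE is the square root of MSE
        "mae": mae,
        "r2": 1 - sum_squared_errors / total_sum_squares,
    }


# Hypothetical prices and forecasts, not taken from the study.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 127.0]
print(regression_metrics(actual, predicted))
```

Because RMSE is the square root of MSE, the two reported values can be checked against each other: the square root of 42816.51 rounds to 206.92.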



Surveying informal settlements in the Gauteng province using Machine Learning

Researcher: Ronewa Nemalili, University of the Witwatersrand, Johannesburg
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg

One good way for the government to provide better services to its people is by surveying informal settlements, which is why the techniques for surveying them always need improvement. In this project, funded through DSI-NICIS NEPTTP, the classification of informal settlements was improved by surveying all kinds of dwellings instead of focusing only on areas known to consist of informal settlements.



The Relationship Between Climate Change and Human Fertility

Researcher: Ayrton Altorio, University of the Witwatersrand, Johannesburg
Supervisor: Prof. Nicole de Wet-Billings, University of the Witwatersrand, Johannesburg

Climate change, as we know it, has been a key point of academic discussion for many decades. Throughout these years, little research interest has been directed towards the relationship between climate change and human fertility outcomes. This study investigates that relationship by applying Poisson regression analyses to longitudinal cross-national data for 109 countries over the fifteen-year period from 2000 to 2015. Our dependent variable is the Total Fertility Rate (TFR). The two main independent variables are associated with climate change and measure (1) annual precipitation levels and (2) percentage of arable land. The control variables span a range of demographic indicators that are known to be accurate predictors of TFR, such as GDP per capita. The results of our Poisson regression models indicate that both of our climate change indicators are significant in predicting change in TFR, with changes in arable land having the largest estimate of all our predictor variables.
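For readers unfamiliar with Poisson regression, a generic log-link specification consistent with the variables described above might look as follows. This is a sketch only: the symbols, the indexing by country i and year t, and the exact set of controls are assumptions, not the study's stated model.

```latex
\log \mathbb{E}\left[\mathrm{TFR}_{it}\right]
  = \beta_0
  + \beta_1\,\mathrm{Precip}_{it}
  + \beta_2\,\mathrm{Arable}_{it}
  + \boldsymbol{\gamma}^{\top}\mathbf{x}_{it}
```

Here \mathbf{x}_{it} stands for the control variables (for example GDP per capita). Under this form, a one-unit increase in a predictor multiplies the expected TFR by the exponential of its coefficient.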



The prevalence and probability of hypertension among youth 15-34 years old, in South Africa

Researcher: Lucas Banda, University of the Witwatersrand, Johannesburg
Supervisor: Prof. Nicole de Wet-Billings, University of the Witwatersrand, Johannesburg

Limited and outdated information from nationally representative data exists on the prevalence of hypertension among youth aged 15-34 in South Africa, although the prevalence is over 30% among the adult population aged 15+ (Seedat, Rayner, and Veriava, 2014). Youth in South Africa face challenges such as unemployment and environments that expose them to poor dietary habits and to alcohol and substance abuse (Peltzer and Phaswana-Mafuya, 2013). Understanding the probability of hypertension among youth aged 15-34 in South Africa, conditional on demographic, socioeconomic and behavioral factors, is therefore critical.



How adequate is access to Ante-Natal Care for South African women in public hospitals

Researcher: Babalwa N.C. Dingiswayo, University of the Witwatersrand, Johannesburg
Supervisors: Mr Michael Jana and Prof. Rod Alence, University of the Witwatersrand, Johannesburg

The aim of this research is to assess how adequate access to antenatal care (ANC) is for South African women in public hospitals, particularly the provision of ANC to South African women living with HIV. The objective of the study is to define the relationship between ANC in public health care in South Africa and HIV testing as part of ANC for pregnant patients.

  • How adequate is ANC according to World Health Organisation (WHO) standards?
  • How accessible is ANC in public health care for South African women?
  • What is the relationship between selected socio-economic factors, HIV status and ANC attendance by South African women?



Describing the magnitude spectrum with symbolic notation in musical chords

Researcher: Ruan Jean du Randt, University of Pretoria
Supervisor: Dr. Ritesh Ajoodha, University of the Witwatersrand, Johannesburg

Mapping chords from the magnitude spectrum to symbolic notation paves the way for a plethora of advances in algorithmic music. It allows chords in musical accompaniment to be recognised and described symbolically, and is a significant step toward algorithmic music composition. This research presents various methods for mapping the magnitude spectrum to symbolic notation using algorithmic chord recognition.

This research also evaluates the importance of various features within the magnitude spectrum for algorithmic chord recognition. The results show that Mel Frequency Cepstral Coefficients are the most important features, and that the Fuzzy Lattice Reasoning classifier obtains the highest accuracy, at 99.0417%. This provides an effective way of mapping chords from the magnitude spectrum to symbolic notation and gives a foundation for future research.

