Modelling Gold Production Using Sigmoid Models

Researcher: Ignitious Chauke, University of Venda
Supervisor: Dr Caston Sigauke, University of Venda

This study approximates monthly gold production from Sibanye-Stillwater's South African gold operations using five sigmoid models. The models studied included the Gompertz, Gaussian, Probit and Hill models, which were used to forecast future production. Although all five estimated models provided reasonably realistic estimates, the Hill model was found to best approximate the observed gold output pattern in Sibanye-Stillwater mines. It was chosen because it had the highest coefficient of determination (R²) and the lowest root mean square error (RMSE) and Akaike information criterion (AIC) values. The model indicated that gold production would fall to very low levels by 2035 if the current production trend at Sibanye-Stillwater's South African operations continues. The Hill model findings were also supported by an ARIMA(0,1,2)(1,0,1)[4] model, which showed that monthly gold production will continue to decrease until 2025.
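The selection rule described above (highest R², lowest RMSE and AIC) can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code: the decreasing Hill and Gompertz parameterisations, the synthetic monthly series and the parameter values (standing in for nonlinear least-squares estimates) are all hypothetical, and AIC uses the common least-squares form n·ln(RSS/n) + 2k.

```python
import math

def hill(t, a, k, n):
    """Decreasing Hill curve (assumed form): starts near a, falls to a/2 at t = k."""
    return a / (1.0 + (t / k) ** n)

def gompertz(t, a, b, c):
    """Decreasing Gompertz curve (assumed form): a * exp(-b * exp(c * t))."""
    return a * math.exp(-b * math.exp(c * t))

def fit_metrics(observed, predicted, n_params):
    """Return (R^2, RMSE, AIC) for one candidate model's predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - rss / tss
    rmse = math.sqrt(rss / n)
    aic = n * math.log(rss / n) + 2 * n_params  # least-squares AIC, constant dropped
    return r2, rmse, aic

# Hypothetical monthly series: a Hill curve plus alternating +/-0.5 "noise".
months = list(range(1, 61))
observed = [hill(t, 100.0, 30.0, 3.0) + (0.5 if t % 2 else -0.5) for t in months]

# Hypothetical parameter values standing in for least-squares estimates.
candidates = {
    "Hill": (lambda t: hill(t, 100.0, 30.0, 3.0), 3),
    "Gompertz": (lambda t: gompertz(t, 100.0, 0.01, 0.1), 3),
}

results = {}
for name, (curve, k) in candidates.items():
    predicted = [curve(t) for t in months]
    results[name] = fit_metrics(observed, predicted, k)

# Prefer the model with the lowest AIC (it also has the best R^2 and RMSE here).
best = min(results, key=lambda name: results[name][2])
for name, (r2, rmse, aic) in results.items():
    print(f"{name:9s} R2={r2:.4f} RMSE={rmse:.3f} AIC={aic:.1f}")
print("Best by AIC:", best)
```

With equal parameter counts, AIC ranks the candidates in the same order as RMSE; the penalty term only matters when the sigmoid models differ in the number of parameters.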


Investigation of supervised learning methods to classify cell types from single-cell RNA sequencing data

Researcher: Warren Freeborough, University of the Witwatersrand, Johannesburg
Supervisors: Prof. Terence van Zyl, University of Johannesburg and Nikki Gentle, University of the Witwatersrand, Johannesburg

The study of living systems has prompted improvements in sequencing technology, which in turn has led biological science into the field of big data. Adequately studying single-cell RNA sequencing (scRNA) data requires data science methods.

The aim of the study is to replicate the results produced by Grabski and Irizarry, using the same datasets, while exploring alternative supervised learning methods. In doing so, this study hopes either to support the use of these models in scRNA classification or to identify promising alternatives to explore further.


Predicting Insurance Claims Fraud using Random Forest

Researcher: Khanya May, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

Fraud poses a threat with serious consequences for the insurance industry. In recent years, the use of machine learning and analytical techniques for fraud detection has been the topic of several research projects. This research explores the capabilities of random forest in fraud detection. The random forest model is applied to TSA insurance claims data in an effort to predict fraud. The model achieved an accuracy of 69.72% and an F-measure of 75.04%. This performance is better than that of a no-skill model, which would achieve an accuracy of 54.67%. There is a 76.09% chance that the model will be able to distinguish between fraudulent and legitimate claims.
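The metrics quoted above can be computed as in the following Python sketch. It assumes that the 54.67% no-skill figure is a majority-class baseline and that the 76.09% figure is the area under the ROC curve (AUC), which is commonly read as the probability that a randomly chosen fraudulent claim receives a higher score than a randomly chosen legitimate one. The labels, scores and 0.5 threshold are hypothetical.

```python
from itertools import product

def accuracy(y_true, y_pred):
    """Share of claims whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the fraud class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def no_skill_accuracy(y_true):
    """Accuracy of always predicting the majority class (assumed baseline)."""
    share_fraud = sum(y_true) / len(y_true)
    return max(share_fraud, 1 - share_fraud)

def auc(y_true, scores):
    """P(random fraud score > random legitimate score); ties count as half."""
    fraud = [s for t, s in zip(y_true, scores) if t == 1]
    legit = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if f > g else 0.5 if f == g else 0.0
               for f, g in product(fraud, legit))
    return wins / (len(fraud) * len(legit))

# Hypothetical labels (1 = fraud) and model scores.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F-measure:", f_measure(y_true, y_pred))
print("no-skill accuracy:", no_skill_accuracy(y_true))
print("AUC:", auc(y_true, scores))
```

Unlike accuracy and the F-measure, the AUC does not depend on the classification threshold, which is why it is reported as a separate ranking measure.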


Credit Card Default Payment Prediction using Random Forest

Researcher: Thanganedzo Beverly Mashamba, University of Venda
Supervisor: Dr W Chagwiza, University of the Witwatersrand, Johannesburg

Lending money to customers is a good investment when credit cards are issued to customers in good credit standing [1, 2]. Credit card applicants are usually subjected to thorough background checks before their applications are approved or declined. In this way, banks ensure that their credit cards are issued only to clients with a proven ability to repay the credit. To identify customers who should not be given a credit card, banks must have models in place that reliably predict risky behavior by credit card applicants. This study uses the random forest machine-learning algorithm to predict which customers should be given a credit card.


Customer churn prediction in telecoms industry using random forest

Researcher: Fortune Mhlanga, University of the Witwatersrand, Johannesburg
Supervisor: Dr Wilbert Chagwiza, University of the Witwatersrand, Johannesburg

Telecommunications companies around the world lose customers to their competitors daily, and customer churn always leads to reduced revenue. Predictive models that identify customers who are likely to churn could therefore significantly increase the industry's revenue.

This study develops a customer churn prediction model using the random forest algorithm. Model performance is measured using accuracy, precision, recall, F-measure, Cohen’s kappa, the Gini coefficient and the Matthews correlation coefficient. To find features that can be used as model inputs, the study explores the scikit-learn default filter method, step forward feature selection, step backward feature selection and exhaustive feature selection. The results indicate that random forest is not the best algorithm for predicting customer churn in the telecommunications industry, since the best-performing model achieves a recall of only 52.2%.
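Cohen's kappa and the Matthews correlation coefficient both correct for class imbalance, which matters in churn data where churners are usually a minority. The Python sketch below computes them, together with precision, recall and F-measure, from confusion-matrix counts. The counts are hypothetical, and the Gini coefficient is taken as 2·AUC − 1, the convention commonly used for binary classifiers (an assumption about how the study defines it).

```python
import math

def churn_metrics(tp, fp, fn, tn):
    """Binary-classification metrics from confusion-matrix counts.

    Assumes every row and column of the confusion matrix is non-empty.
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # share of actual churners the model catches
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "kappa": kappa, "mcc": mcc}

def gini_from_auc(auc):
    """Gini coefficient under the assumed 2 * AUC - 1 convention."""
    return 2 * auc - 1

# Hypothetical counts for an imbalanced churn test set.
for name, value in churn_metrics(tp=47, fp=20, fn=43, tn=390).items():
    print(f"{name:9s} {value:.3f}")
```

On such imbalanced counts, accuracy looks high because most customers do not churn, while recall, kappa and MCC expose how many churners are missed.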


Event classification for gamma-hadron separation for H.E.S.S.

Researcher: Wandile Lesejane, University of the Witwatersrand, Johannesburg
Supervisor: Prof Nukri Komin, University of the Witwatersrand, Johannesburg

The High Energy Stereoscopic System (H.E.S.S.) is one of the best imaging atmospheric Cherenkov telescope (IACT) arrays and is crucial for studying cosmic particles, particularly gamma rays. It can detect particles with energies ranging from tens of GeV to TeV. The challenge stems from the influx of hadronic air showers, which are more common than gamma-ray showers and can obscure their detection. Deep neural networks were employed to discriminate gamma events from hadron events using data simulated with the KASKADE and SMASH software. At its best, the model achieved an accuracy of 97.42% with a loss of 7.21%; at its poorest, its accuracy was 53%.


Attention-based LSTM algorithm with ARIMA on wavelet denoised Bitcoin prices

Researcher: Ndamulelo Innocent Nelwamondo, University of the Witwatersrand, Johannesburg
Supervisor: Dr Farai Mlambo, University of the Witwatersrand, Johannesburg

The cryptocurrency market is known for its intense uncertainty and instability, and traders are still searching for a reliable and convenient way to guide cryptocurrency trading. Four models, namely ARIMA, LSTM, attention-based LSTM and a hybrid attention-based LSTM-ARIMA, were used to forecast Bitcoin prices. The hybrid attention-based LSTM-ARIMA model applied to wavelet-denoised Bitcoin prices was found to be the best-fitting model, with MSE = 42816.51, RMSE = 206.92, MAE = 167.34 and R² = 0.8172.
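Wavelet denoising removes high-frequency fluctuations from the price series before it is passed to the forecasting models. The abstract does not state the wavelet, decomposition level or thresholding rule, so the Python sketch below assumes a single-level Haar transform with soft thresholding and a hypothetical threshold `lam`; practical work would more likely use a multilevel decomposition from a wavelet library. As a consistency check on the reported metrics, RMSE is the square root of MSE, and √42816.51 ≈ 206.92, matching the reported RMSE.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_denoise(prices, lam):
    """Single-level Haar wavelet denoising with soft thresholding.

    Assumes an even-length series; consecutive pairs (x[2i], x[2i+1])
    are transformed, thresholded and reconstructed.
    """
    if len(prices) % 2:
        raise ValueError("series length must be even for this sketch")
    out = []
    for a, b in zip(prices[0::2], prices[1::2]):
        approx = (a + b) / SQRT2  # low-frequency (trend) coefficient
        detail = (a - b) / SQRT2  # high-frequency (noise) coefficient
        # Soft thresholding: shrink the detail towards zero by lam.
        detail = math.copysign(max(abs(detail) - lam, 0.0), detail)
        # Inverse Haar transform for this pair.
        out.append((approx + detail) / SQRT2)
        out.append((approx - detail) / SQRT2)
    return out

# Hypothetical price series; lam = 0 reproduces it, a large lam
# replaces each pair by its mean.
prices = [100.0, 104.0, 103.0, 99.0, 101.0, 107.0]
print(haar_denoise(prices, lam=2.0))
```

A larger threshold removes more short-term variation but also more genuine price movement, so the threshold is a tuning choice rather than a fixed constant.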


Surveying informal settlements in the Gauteng province using Machine Learning

Researcher: Ronewa Nemalili, University of the Witwatersrand, Johannesburg
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg

Surveying informal settlements is one way in which the government can provide better services to its people, which is why survey techniques always need improvement. In this project, funded through the DSI-NICIS NEPTTP, the classification of informal settlements was improved by surveying all kinds of dwellings instead of focusing only on areas known to contain informal settlements.


The Relationship Between Climate Change and Human Fertility

Researcher: Ayrton Altorio, University of the Witwatersrand, Johannesburg
Supervisor: Prof. Nicole de Wet-Billings, University of the Witwatersrand, Johannesburg

Climate change, as we know it, has been a key point of academic discussion for many decades. Over these decades, however, little research has been directed towards the relationship between climate change and human fertility outcomes. This study investigates that relationship by applying Poisson regression analyses to longitudinal cross-national data for 109 countries over the fifteen-year period from 2000 to 2015. The dependent variable is the Total Fertility Rate (TFR). The two main independent variables are associated with climate change and measure (1) annual precipitation levels and (2) percentage of arable land. The control variables span a range of demographic and socioeconomic indicators that are known predictors of TFR, such as GDP per capita. The results of our Poisson regression models indicate that both climate change indicators are significant predictors of change in TFR, with changes in arable land having the largest estimate of all predictor variables.
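Poisson regression models the logarithm of the expected response as a linear function of the predictors, so exp(β) gives the multiplicative change in expected TFR per unit increase in a predictor. The Python sketch below fits a single hypothetical predictor by Newton-Raphson, which for this model is equivalent to iteratively reweighted least squares. The data are synthetic; a multi-predictor model like the study's would normally be fitted with a statistics package.

```python
import math

def poisson_regression(x, y, iterations=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information X^T W X with W = diag(mu).
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Synthetic data generated exactly from log(mu) = 0.5 + 0.3 * x.
x = [i * 0.5 for i in range(11)]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
b0, b1 = poisson_regression(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, rate ratio exp(b1) = {math.exp(b1):.4f}")
```

Because the synthetic responses lie exactly on the model curve, the fit recovers the generating coefficients; with real data the estimates would carry standard errors used to judge significance.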


The prevalence and probability of hypertension among youth 15-34 years old, in South Africa

Researcher: Lucas Banda, University of the Witwatersrand, Johannesburg
Supervisor: Prof. Nicole de Wet-Billings, University of the Witwatersrand, Johannesburg

Little recent, nationally representative information exists on the prevalence of hypertension among South African youth aged 15-34 years, although the prevalence exceeds 30% among adults aged 15 and older (Seedat, Rayner, and Veriava, 2014). Youth in South Africa face challenges such as unemployment and environments that expose them to poor dietary habits and to alcohol and substance abuse (Peltzer and Phaswana-Mafuya, 2013). Understanding the probability of hypertension among youth aged 15-34 years in South Africa, conditional on demographic, socioeconomic and behavioral factors, is therefore critical.

