Reviews & Ratings: Do they match?

Researcher: Olatomiwa Akinlaja, Sol Plaatje University
Supervisor: Dr M.S Mosia, Sol Plaatje University

Reviews and ratings influence us every day. We have all encountered them in one form or another, usually on applications and online platforms that ask us to leave a review based on our experience. Many authors have asked whether we should analyze the writer’s intentions or perceptions. Multiple studies within natural language processing (NLP) have sought to extract value from text. We use deep learning and sentiment analysis to extract value from reviews in order to check whether each review justifies its respective rating.



“They Steal Our Jobs”: An analysis of Group Threat Theory toward immigrants. A case study of South Africa and the European Refugee Crisis

Researcher: Yasmin Sizwe, University of the Witwatersrand, Johannesburg
Supervisor: Professor Rod Alence, University of the Witwatersrand, Johannesburg

This research tests group threat theory, which hypothesizes that economic conditions and the size of immigrant populations are the main influences on negative perceptions of immigrants in host countries. The analysis uses gross domestic product (GDP) growth and net migration as the main predictors, and a perception index is regressed against them using a multiple regression model. The model indicates that net migration is more influential than economic conditions in shaping perceptions of immigrants. This relationship is further explored below.
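The regression step described above can be sketched in a few lines. This is a minimal illustration only, not the study's code: the function name `fit_ols`, the variable names, and all numbers are assumptions, and the data are synthetic values generated from made-up coefficients purely to show that the fit recovers them.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples, e.g. (gdp_growth, net_migration)
    y:    list of responses, e.g. a perception index
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0, *r] for r in rows]              # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic, illustrative data only (not from the study).
gdp_growth = [1.0, 2.0, 0.5, 3.0, -1.0]
net_migration = [0.2, 0.1, 0.5, 0.3, 0.4]
# Made-up "true" relationship used to generate the response.
perception = [1.0 + 2.0 * g - 3.0 * m for g, m in zip(gdp_growth, net_migration)]

coefs = fit_ols(list(zip(gdp_growth, net_migration)), perception)
print(coefs)  # intercept, GDP growth coefficient, net migration coefficient
```

To compare how influential each predictor is, the predictors would normally be standardized first so that the coefficients are on the same scale.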



Bank Credit Card Default Prediction Using Machine Learning Techniques

Researcher: Nompumelelo Sibiya, University of the Witwatersrand, Johannesburg
Supervisor: Mr Rendani Mbuvha, University of the Witwatersrand, Johannesburg

Due to the increasing popularity of artificial intelligence and machine learning, and growing data availability, most banks are renewing their business models. Credit risk prediction is one of the most vital inputs to evaluation and decision making. This study builds binary classifiers based on two machine learning models, namely Support Vector Machines and Gradient Boosting Machines, as well as classic Logistic Regression, and applies them to real credit card data to predict loan default probability. Results reveal that class distribution has a major effect on the performance of the models; however, gradient boosting can cater for this and produce robust performance.



Using text mining on reviews of Accommodation establishments on TripAdvisor to draw insights on perceived South African weather

Researcher: Subhashini Pillay, University of the Witwatersrand, Johannesburg
Supervisor: Dr Jennifer Fitchett, University of the Witwatersrand, Johannesburg

Social media is becoming an important platform for deriving the opinions, views and perceptions of tourists (O’Connor, 2010);
TripAdvisor is the most popular reviewing platform for tourists (Litvin and Dowling, 2018); and
With the sheer amount of data being generated from TripAdvisor, text mining techniques may be the easiest way to determine tourists’ views and perceptions related to the weather (Berezina et al., 2016).



Survival Analysis: Modelling Time to Event Data with Time Varying Features using Survival Trees and Forests

Researcher: Andisani Nemavhola, University of Venda
Supervisor: Dr Justine Nasejje, University of the Witwatersrand, Johannesburg

This study presents and implements random survival forests (RSF) and two of their split rules: the 𝐿1 rule and the logrank rule.

The research questions for this study are the following:

  1. Can the proposed 𝐿1 split rule for building RSF be used in the presence of non-proportional hazards or time-varying covariates?
  2. Does the proposed 𝐿1 split rule for RSF perform better than the logrank split rule?



Deep Learning & Its Application To Searches Beyond The Standard Model With The ATLAS Detector At The Large Hadron Collider

Researcher: Asiphe Mzaza, University of the Witwatersrand, Johannesburg
Supervisor: Prof. Bruce Mellado, University of the Witwatersrand, Johannesburg

Anomalies in the Run 1 and Run 2 data from the ATLAS detector at the LHC suggest physics beyond the Standard Model. The Madala Hypothesis attempts to explain these anomalies with new hypothetical scalars H and S with 2mh < mH < 2mt and mh < mS < mH, where mh and mt are the masses of the Higgs boson and the top quark. These bosons are being searched for in leptonic decay channels, such as H→SS→4W [1], which have extremely low signal-to-background ratios. A discriminative machine learning algorithm is developed to distinguish between signal and background. Specifically, the viability of a deep neural network is assessed. A neural network model with useful diagnostic power is constructed using simulations of the di-lepton decay process.



Market Fraction Hypothesis: Application to the South African Financial Market

Researcher: Patrick Mthisi, University of the Witwatersrand, Johannesburg
Supervisor: Dr Yudhvir Seetharam, University of the Witwatersrand, Johannesburg

This project uses a disaggregate modeling approach to model the behaviour of key financial market players in order to assess the quality of the South African financial market. To achieve this, an Agent-based Model [1] is used as the primary tool. In addition, the theoretical concepts that underlie the Market Fraction Hypothesis [2] are used, and the market fractions of the agents’ risk clusters are observed over a period of time to assess the quality of the financial market.



Understanding The Jet Activities In Association With The Dileptons Using Machine Learning Techniques With The Use Of The ATLAS Detector At LHC

Researcher: Mashaka Molepo, University of Venda
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg

Neural networks are used in the study of Monte Carlo simulations for the detection of hadronically decaying W bosons, including strong jet quarks [1]. The need for data science algorithms in deep neural networks leads to a strong rejection of the fundamental one-jet correlation substructure [1]. Deep learning methods have been noted in the context of increasingly deep neural networks used to enhance context exclusion, including the use of implantation sub-substructures and mass taggers [1]. The corresponding classifiers are often linearly associated with the parameters in their substructures.



Comparative Analysis of Logistic Regression over Homomorphically Encrypted Data and Decrypted Data

Researcher: Linhle Mbombo, University of Venda
Supervisors: Professor Augustine Munagi and Professor Turgay Çelik, University of the Witwatersrand, Johannesburg

Machine learning (ML) algorithms are improving automated tasks, using data to make predictions and solve clustering problems. The data used to fit the models often comes from institutions and organizations that hold sensitive data, such as personal, medical and financial records. The research seeks to bridge this security gap and design secure measures for preserving the privacy of the data used in the process. Homomorphic encryption allows computations to be performed on encrypted data. The research uses the Paillier algorithm as its encryption scheme.
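As a rough illustration of why Paillier suits this setting, the sketch below shows its additive homomorphism: multiplying ciphertexts adds the plaintexts, and raising a ciphertext to an integer power scales the plaintext, which is enough to evaluate the weighted sum inside a logistic regression on encrypted features. It is a toy under stated assumptions, not the study's implementation: the function names are hypothetical, the primes are far too small to be secure, the generator is fixed to g = n + 1, and the features and weights are made-up non-negative integers.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier keys from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)          # modular inverse; valid because g = n + 1
    return n, (lam, mu)           # public key n, private key (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + m * n) % n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add(n, c1, c2):
    """Ciphertext of m1 + m2, computed without decrypting."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Ciphertext of k * m for a plaintext integer k >= 0."""
    return pow(c, k, n * n)


n, priv = keygen(61, 53)

# Addition on encrypted values.
total = add(n, encrypt(n, 17), encrypt(n, 25))
print(decrypt(n, priv, total))    # 42

# Encrypted features, plaintext integer weights: the linear part of a
# logistic regression score, evaluated on ciphertexts.
features = [encrypt(n, x) for x in (3, 5, 2)]
weights = (4, 1, 6)
score = encrypt(n, 0)
for c, w in zip(features, weights):
    score = add(n, score, scale(n, c, w))
print(decrypt(n, priv, score))    # 29
```

Paillier supports only addition and plaintext scaling, so the non-linear sigmoid would have to be applied after decryption or approximated separately; real-valued weights also need a fixed-point encoding.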



The development of Extreme Learning Machine with di-lepton data from the ATLAS detector at the LHC

Researcher: Makgoka Nkoana, University of the Witwatersrand, Johannesburg
Supervisor: Dr Farai Mlambo, University of the Witwatersrand, Johannesburg

Wavelet Neural Networks were inspired by neural networks as well as wavelet decomposition; Zhang and Benveniste sought to combine the two concepts. This type of network is often used in the financial industry for stock market forecasting, and this investigation therefore sought to compare the stock market forecasts made by several implementations of the Wavelet Neural Network with different activation functions. The main finding was that the Shannon Wavelet Neural Network and the Gaussian Neural Network performed best, followed by the Mexican Hat Neural Network.
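For readers unfamiliar with these activation functions, the sketch below gives commonly used textbook forms of the three wavelets and a one-hidden-layer wavelet network output. The abstract does not state which exact forms or normalizations were used, so the formulas, function names and parameters here are assumptions for illustration.

```python
import math


def mexican_hat(x):
    """Mexican hat (Ricker) wavelet, unnormalized: (1 - x^2) * exp(-x^2 / 2)."""
    return (1.0 - x * x) * math.exp(-x * x / 2.0)


def gaussian_wavelet(x):
    """First-derivative-of-Gaussian wavelet, unnormalized: -x * exp(-x^2 / 2)."""
    return -x * math.exp(-x * x / 2.0)


def shannon(x):
    """Real Shannon wavelet: (sin(2*pi*x) - sin(pi*x)) / (pi*x), with limit 1 at 0."""
    if x == 0.0:
        return 1.0
    return (math.sin(2.0 * math.pi * x) - math.sin(math.pi * x)) / (math.pi * x)


def wavelet_network(x, weights, translations, dilations, psi):
    """Output of a single-input wavelet network: sum_j w_j * psi((x - b_j) / a_j)."""
    return sum(w * psi((x - b) / a)
               for w, b, a in zip(weights, translations, dilations))


# Hypothetical parameters; in practice they are learned from data.
weights = [0.5, -1.0, 2.0]
translations = [-1.0, 0.0, 1.0]
dilations = [1.0, 0.5, 2.0]

for name, psi in [("Mexican hat", mexican_hat),
                  ("Gaussian", gaussian_wavelet),
                  ("Shannon", shannon)]:
    print(name, wavelet_network(0.3, weights, translations, dilations, psi))
```

Swapping `psi` while keeping the rest of the network fixed is the kind of comparison the investigation describes.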

