Fingerprint Image Compression using DCT-based Algorithms

Researcher: Promise Magoma, University of Venda
Supervisor: Dr Farai Mlambo, University of the Witwatersrand, Johannesburg

The main aim of this research project is to develop an image compression model based on the discrete cosine transform (DCT) that reduces redundancy in fingerprint image data.
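
As a rough illustration of how DCT-based compression removes redundancy, the sketch below level-shifts an 8x8 block, applies a 2-D DCT, quantises the coefficients and reconstructs the block. It is a minimal pure-Python sketch, not the project's model: the 8x8 block size, the uniform quantisation step `step` and all helper names are assumptions borrowed from JPEG-style coding.

```python
import math

N = 8  # block size; 8x8 blocks are an assumption borrowed from JPEG-style codecs


def _alpha(k):
    # Orthonormal scale factors, so that idct2(dct2(b)) == b up to rounding.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """2-D orthonormal DCT-II of an N x N block given as a list of lists."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse transform (2-D orthonormal DCT-III)."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def compress_block(block, step=10.0):
    """Level-shift, transform and quantise with a uniform step (hypothetical
    parameter). Small coefficients round to zero; those zeros are the
    redundancy an entropy coder would then exploit."""
    coeffs = dct2([[p - 128 for p in row] for row in block])
    return [[round(c / step) for c in row] for row in coeffs]


def decompress_block(qcoeffs, step=10.0):
    """Dequantise, invert the DCT, undo the level shift and clip to 0..255."""
    rec = idct2([[c * step for c in row] for row in qcoeffs])
    return [[min(255, max(0, round(p + 128))) for p in row] for row in rec]


# A smooth gradient standing in for an 8x8 patch of a fingerprint image.
block = [[100 + 4 * y for y in range(N)] for x in range(N)]
coeffs_q = compress_block(block)
nonzero = sum(1 for row in coeffs_q for c in row if c != 0)
rec = decompress_block(coeffs_q)
max_err = max(abs(a - b) for ra, rb in zip(block, rec) for a, b in zip(ra, rb))
print(f"non-zero coefficients: {nonzero}/64, max pixel error: {max_err}")
```

For a smooth block such as this gradient, only a handful of coefficients survive quantisation; a complete codec would then entropy-code the mostly-zero coefficients, a step omitted here.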


Automated Text Analysis as an Exploratory Aid to Traditional Content Analysis

Researcher: Thomas Lancaster, University of the Witwatersrand, Johannesburg
Supervisor: Professor Rod Alence, University of the Witwatersrand, Johannesburg

The current study examined the utility of text mining methods within the broader process of qualitative content analysis, asking whether text mining, in the form of word frequency analysis and topic modelling, could serve as an exploratory text analysis method. To address this question, text mining was applied to the concept of spirituality in extracts from the narrative sections of both the AA and NA primary texts. The study worked under the assumption that words associated with spirituality would be heavily represented in the text corpus.
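
Word frequency analysis of the kind described can be sketched in a few lines of Python. The stop-word list, the spirituality lexicon and the example sentence are hypothetical placeholders, not the study's preprocessing or term list; the sketch counts non-stop words and reports what share of them fall in the lexicon.

```python
import re
from collections import Counter

# Hypothetical stop-word list and spirituality lexicon, for illustration only.
STOP_WORDS = {"a", "an", "and", "i", "in", "of", "the", "to", "we", "was"}
SPIRITUAL_TERMS = {"faith", "god", "power", "prayer", "spiritual"}


def word_frequencies(text):
    """Lower-case, split into word tokens and count everything except stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def lexicon_share(freqs, lexicon):
    """Fraction of counted tokens that fall in the lexicon (0.0 for empty input)."""
    total = sum(freqs.values())
    return sum(freqs[w] for w in lexicon) / total if total else 0.0


# Invented sentence, not a quotation from the AA or NA texts.
extract = "Prayer gave me hope. Through prayer and faith I found a new way of life."
freqs = word_frequencies(extract)
share = lexicon_share(freqs, SPIRITUAL_TERMS)
print(freqs.most_common(3), f"spirituality share = {share:.2f}")
```

A high lexicon share would be consistent with the study's working assumption, but the result depends heavily on which terms the lexicon contains.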


The use of machine learning to extract a physics model from the ATLAS experiment at the LHC

Researcher: Thanyani Gumani, University of Venda
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg

The Standard Model of particle physics describes our current understanding of the fundamental particles and their interactions. Extending this understanding has required experiments at ever higher energies and intensities, which generate extremely large and detailed data samples. Machine learning methods are transforming the analysis of these samples and greatly expand the capacity of current and future research to explore them. The paper gives an overview of the ATLAS experiment, the LHC and decision trees, discusses the possible insights and issues of the approach, and examines the connections between machine learning and high-energy physics analysis. We consider supervised machine learning classification and apply the proposed multivariate analysis (MVA) methods to di-lepton data from the ATLAS experiment at the LHC to assess their performance. The results demonstrate the good performance of the chosen MVA methods; TMVA is used for the computation.


Population Density and National-Level Governance in Africa

Researcher: Pranish Desai, University of the Witwatersrand, Johannesburg
Supervisor: Professor Rod Alence, University of the Witwatersrand, Johannesburg

Previous studies of the effect of population density on governance in Africa have been mainly qualitative because reliable population data were lacking. In recent years population data have become more reliable, but no quantitative study of the density-governance relationship has been carried out, a gap this study aimed to fill.


Deep Machine Learning in the search for new bosons at the Large Hadron Collider

Researcher: Nkateko Baloyi, University of the Witwatersrand, Johannesburg
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg

The search for new bosons requires protons in the LHC to collide at very high energies and high luminosity. High luminosity increases the probability of discovering new particles beyond the Standard Model (BSM), but it also increases the background. The aim is to develop a machine learning algorithm that suppresses the background to enhance the signal and can be applied in the search for new bosons.
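
One common way to quantify the signal enhancement gained from background suppression is to cut on a classifier's output and compare the approximate significance s/√b before and after the cut. The sketch assumes that approximation (reasonable only when s << b) and unweighted, made-up classifier scores; `best_cut` is a hypothetical helper, not the algorithm developed in the project.

```python
import math


def best_cut(sig_scores, bkg_scores):
    """Scan thresholds on a classifier output and return (threshold, Z) that
    maximise the approximate significance Z = s / sqrt(b), counting events
    with score >= threshold. Events are unweighted in this sketch."""
    best_thr, best_z = None, 0.0
    for thr in sorted(set(sig_scores) | set(bkg_scores)):
        s = sum(1 for x in sig_scores if x >= thr)
        b = sum(1 for x in bkg_scores if x >= thr)
        if b > 0 and s / math.sqrt(b) > best_z:
            best_thr, best_z = thr, s / math.sqrt(b)
    return best_thr, best_z


# Toy classifier outputs (made-up numbers): higher means more signal-like.
signal = [0.9, 0.8, 0.7, 0.4]
background = [0.1, 0.2, 0.3, 0.5, 0.75, 0.2, 0.1, 0.3]
no_cut_z = len(signal) / math.sqrt(len(background))
thr, z = best_cut(signal, background)
print(f"no cut: Z = {no_cut_z:.2f}; cut at {thr}: Z = {z:.2f}")
```

The toy numbers are far too small for s/√b to be meaningful; they only show the mechanics of trading signal efficiency for background rejection.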


The Impact of Initialization Strategies on the K-Means Convergence

Researcher: Tshauambea Murendeni, University of Venda
Supervisor: Ms Nothabo Ndebele, University of the Witwatersrand, Johannesburg

Clustering groups information items so as to maximize the similarity within clusters and the dissimilarity between clusters [1]. The k-means algorithm is a commonly used unsupervised partitioning clustering algorithm that is simple and easy to implement. Its convergence to the optimal solution depends on the initialization strategy. This study uses three initialization strategies, namely random, k-means++ and farthest-first traversal, to experiment with the k-means algorithm. The experiments were conducted on various consumer segmentation data sets of different sizes and data structures, and the strategies were compared on the number of iterations the k-means algorithm took to reach its optimal solution. The experiments show that all the initialization strategies lead to the same optimal solution of the k-means algorithm; however, k-means++ reaches it in fewer iterations than the other initialization strategies used in this study. For the data sets used in this research, k-means++ initialized k-means reaches its optimal solution faster than the k-medoids algorithm does.
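
The three initialization strategies and the iteration count used for comparison can be sketched as follows. This is a minimal pure-Python version on synthetic 2-D blobs, not the study's code or consumer data; the function names, the convergence rule (stop when assignments no longer change) and the seeds are assumptions.

```python
import random


def _d2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_random(points, k, rng):
    return rng.sample(points, k)


def init_kmeanspp(points, k, rng):
    # k-means++: each new centre is sampled with probability proportional to
    # its squared distance to the nearest centre chosen so far.
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(_d2(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def init_farthest(points, k, rng):
    # Farthest-first traversal: each new centre is the point farthest
    # from its nearest existing centre (deterministic after the first pick).
    centres = [rng.choice(points)]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(_d2(p, c) for c in centres)))
    return centres


def kmeans(points, centres, max_iter=100):
    """Lloyd's algorithm. Returns (centres, iterations, SSE), where iterations
    counts assignment passes up to and including the one that changed nothing."""
    centres = list(centres)
    assign = None
    for iteration in range(1, max_iter + 1):
        new_assign = [min(range(len(centres)), key=lambda j: _d2(p, centres[j]))
                      for p in points]
        if new_assign == assign:
            break
        assign = new_assign
        for j in range(len(centres)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    sse = sum(_d2(p, centres[a]) for p, a in zip(points, assign))
    return centres, iteration, sse


rng = random.Random(0)
# Three well-separated synthetic blobs, not the study's data.
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(50)]
results = {}
for name, init in [("random", init_random), ("k-means++", init_kmeanspp),
                   ("farthest-first", init_farthest)]:
    _, iters, sse = kmeans(points, init(points, 3, random.Random(1)))
    results[name] = (iters, sse)
    print(f"{name:15s} iterations={iters:3d}  SSE={sse:.1f}")
```

k-means++ draws later centres with probability proportional to squared distance, whereas farthest-first traversal always takes the single farthest point, which makes it deterministic after the first pick but sensitive to outliers.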


Supervised versus Semi-supervised Machine Learning in the search for new Physics with the ATLAS data

Researcher: Lindiwe Malobola, University of the Witwatersrand, Johannesburg
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg

Present-day machine learning methods can be used to build very powerful models for difficult collider physics problems. In many applications, these models are built using fully supervised methods, which require fully labelled data. In collider physics, however, the data are often unlabelled, so the norm is to generate computer-simulated data that mimic the real data and to build models on the simulation. Because the simulated data do not exactly match the truth-level data, such models can be problematic: they may learn features that are not present in nature but arise from the simulation procedure. This study investigates semi-supervision as a machine learning methodology for new particle discovery. With semi-supervision, models can be built without labelled data, which means they can be trained directly on real collision data. We present a comparison of full supervision and a semi-supervised method called weak supervision. Weak supervision extracts information from data using only partial knowledge of it; for example, if the background is well understood, unknown signals can be extracted from the data.
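
One widely used weak-supervision scheme in collider physics, often called classification without labels (CWoLa), trains a classifier to separate two mixed samples with different, unknown signal fractions. The study's exact method is not specified here, so treat this as an assumed example. The sketch uses toy Gaussian features and a hand-written logistic regression; the true labels are used only to evaluate the result.

```python
import math
import random


def sample_event(rng, is_signal):
    # Toy 2-D features: only x separates signal from background; y is noise.
    return (rng.gauss(1.0 if is_signal else -1.0, 1.0), rng.gauss(0.0, 1.0))


def make_mixture(rng, n, signal_fraction):
    """Return (events, true_labels) for a sample with the given signal fraction."""
    truth = [rng.random() < signal_fraction for _ in range(n)]
    return [sample_event(rng, s) for s in truth], truth


def train_logistic(X, targets, lr=0.1, epochs=200):
    """Batch-gradient-descent logistic regression; targets are 0/1."""
    w0 = w1 = b = 0.0
    n = len(X)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), t in zip(X, targets):
            err = 1.0 / (1.0 + math.exp(-(w0 * x0 + w1 * x1 + b))) - t
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0, w1, b = w0 - lr * g0 / n, w1 - lr * g1 / n, b - lr * gb / n
    return w0, w1, b


def auc(scores, truth):
    """Probability that a random signal event outscores a random background event."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(42)
# Two mixed samples, e.g. a signal-enriched and a background-dominated region.
X1, _ = make_mixture(rng, 500, 0.8)
X2, _ = make_mixture(rng, 500, 0.2)
# Train on *sample* labels only; the true event labels are never used here.
w0, w1, b = train_logistic(X1 + X2, [1] * len(X1) + [0] * len(X2))
# Evaluate against the true labels on an independent sample.
X_test, truth_test = make_mixture(rng, 1000, 0.5)
test_auc = auc([w0 * x0 + w1 * x1 + b for x0, x1 in X_test], truth_test)
print(f"weights=({w0:.2f}, {w1:.2f}), AUC on true labels = {test_auc:.3f}")
```

When both mixtures share the same signal and background distributions, the optimal mixture-versus-mixture classifier is a monotonic function of the signal-versus-background likelihood ratio, which is why a classifier trained only on sample labels can still rank signal above background.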


Identifying themes in online harassment using text analytics

Researcher: Miss Safiyyah Ismail, University of the Witwatersrand, Johannesburg
Supervisor: Professor Rod Alence, University of the Witwatersrand, Johannesburg

Nothing quite affects human behaviour as much as the internet. Under a cloak of invisibility and anonymity in cyberspace, a new phenomenon known as cyber-harassment has emerged. It has been suggested that these behaviours are magnified online by the Online Disinhibition Effect and deindividuation [7], and that they occur under five major themes: political, racial/religious, sexual, appearance-related and intelligence-related. The current study therefore aimed to provide a mixed-method analysis of textual data to explore the nature of cyber-harassment on Twitter, a popular social networking platform well known for aggressive and hostile online behaviour. Latent Dirichlet Allocation (LDA) topic modelling was used to explore the nature of online harassment by identifying its key themes. The results showed that political and racial/religious themes are the most prevalent. The study contributes to the existing literature by exploring the nature of the themes previously identified in online harassment. Although the results did not fully answer the research question, they form part of the foundation for understanding the nature of online behaviours.
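
LDA topic modelling can be illustrated with a compact collapsed Gibbs sampler. This is a minimal sketch, assuming pre-tokenised documents and symmetric priors `alpha` and `beta`; the function names and the toy documents are invented and say nothing about the study's Twitter data or settings.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.
    Returns (topic_word_counts, doc_topic_counts)."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})      # vocabulary size
    ndk = [[0] * k for _ in docs]                  # tokens of doc d in topic t
    nkw = [defaultdict(int) for _ in range(k)]     # count of word w in topic t
    nk = [0] * k                                   # tokens in topic t
    z = []                                         # current topic of every token
    for d, doc in enumerate(docs):                 # random initial assignment
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                ndk[d][t] -= 1                     # remove the token's current assignment
                nkw[t][w] -= 1
                nk[t] -= 1
                # Full conditional p(z_i = j | all other assignments), unnormalised.
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return nkw, ndk


def top_words(nkw, n=3):
    """Most frequent words per topic, ties broken alphabetically."""
    return [[w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
            for counts in nkw]


# Invented, pre-tokenised toy documents with two obvious themes.
docs = [["vote", "election", "party", "vote"],
        ["party", "election", "minister"],
        ["vote", "minister", "election"],
        ["hair", "face", "weight", "hair"],
        ["face", "weight", "looks"],
        ["looks", "hair", "face"]]
topic_word, doc_topic = lda_gibbs(docs, k=2)
print(top_words(topic_word))
```

Labelling the resulting topics as, say, political or appearance-related remains a human interpretive step, which is where the qualitative side of a mixed-method analysis comes in.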


Using multisource machine learning to map Prosopis glandulosa in the Griekwastad area

Researcher: Wessel Bonnet, University of the Witwatersrand, Johannesburg
Supervisors: Elhadi Adam and Turgay Çelik, University of the Witwatersrand, Johannesburg

The project seeks to show that the accuracy with which Prosopis glandulosa can be mapped for the Griekwastad area in the Northern Cape using satellite imagery is improved by including surrounding pixels and an additional higher-resolution RGB image in the analysis.
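
Including surrounding pixels and a higher-resolution RGB image amounts to building a larger feature vector for each satellite pixel. The sketch assumes the two images are co-registered, that the RGB grid is exactly `scale` times finer, and that windows are clamped at the image border; the helper names are hypothetical, and the resulting vectors would then be passed to whichever classifier is used.

```python
def window_features(image, row, col, radius=1):
    """Band values of every pixel in the (2*radius+1)^2 window centred on
    (row, col), flattened in row-major order. Coordinates are clamped at the
    image border, which is one of several possible edge-handling choices."""
    rows, cols = len(image), len(image[0])
    feats = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r = min(max(row + dr, 0), rows - 1)
            c = min(max(col + dc, 0), cols - 1)
            feats.extend(image[r][c])
    return feats


def rgb_block_mean(rgb, row, col, scale):
    """Mean R, G, B of the scale x scale high-resolution pixels lying under
    one satellite pixel (assumes co-registered images)."""
    block = [rgb[r][c]
             for r in range(row * scale, (row + 1) * scale)
             for c in range(col * scale, (col + 1) * scale)]
    return [sum(channel) / len(block) for channel in zip(*block)]


def pixel_features(satellite, rgb, row, col, radius=1, scale=2):
    """Multisource feature vector for one satellite pixel."""
    return window_features(satellite, row, col, radius) + rgb_block_mean(rgb, row, col, scale)


# Toy data: a 3x3 two-band satellite image and a co-registered 6x6 RGB image.
satellite = [[(10 * r + c, 1) for c in range(3)] for r in range(3)]
rgb = [[(r, c, 0) for c in range(6)] for r in range(6)]
features = pixel_features(satellite, rgb, 0, 0)
print(len(features), features[-3:])
```

With `radius=1` each satellite pixel contributes nine pixels' worth of bands, so the feature count grows quickly with the window size; that trade-off between context and dimensionality is a design choice rather than something fixed by the method.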


The use of machine learning in the search for di-photons in association with missing energy

Researcher: Theodore Cwere Gaelejwe, University of the Witwatersrand, Johannesburg
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg

The Large Hadron Collider (LHC) generates petabytes of data per second during each data-taking period and has long-term data storage on the order of exabytes. Sophisticated machine learning (ML) techniques are used at the trigger and final-state levels to analyse these data. Boosted Decision Trees (BDTs) in particular have been the default ML tool for this task. In recent years, however, more modern techniques such as deep learning have emerged, and there is growing justification for their use in High Energy Physics (HEP). We conduct a comparative study of BDTs and Deep Neural Networks (DNNs) for classifying signal and background events in the H → γγ + X decay channel. A fully supervised and a weakly supervised model are also compared. The results suggest that DNNs outperform BDTs, and that the weakly supervised model outperforms the fully supervised model, although the fully supervised model is more robust.
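
To make the BDT side of such a comparison concrete, the sketch boosts one-level decision trees (stumps) with discrete AdaBoost. It is a simplified stand-in on toy 2-D events; HEP analyses typically use deeper trees and tools such as TMVA, and the settings here (30 rounds, stumps, the helper names) are arbitrary assumptions.

```python
import math
import random


def fit_stump(X, y, w):
    """Best one-feature threshold rule under sample weights w; y is +1/-1.
    Returns (weighted_error, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] > thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, rounds=30):
    """Discrete AdaBoost with decision stumps as the weak learners."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        # Up-weight the events this stump misclassified, then renormalise.
        w = [wi * math.exp(-alpha * yi * (pol if xi[f] > thr else -pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Boosted score; positive means signal-like."""
    return sum(a * (pol if x[f] > thr else -pol) for a, f, thr, pol in model)


rng = random.Random(7)
# Toy events whose true boundary (x0 + x1 > 0) is diagonal, so a single
# axis-aligned cut cannot separate them but a boosted sum of cuts can.
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(120)]
y = [1 if x0 + x1 > 0 else -1 for x0, x1 in X]
model = adaboost(X, y)
accuracy = sum((score(model, x) > 0) == (t == 1) for x, t in zip(X, y)) / len(X)
print(f"training accuracy after {len(model)} stumps: {accuracy:.2f}")
```

Training accuracy alone says nothing about generalisation; a fair BDT-versus-DNN comparison would evaluate both on held-out events with the same metric.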
