Supervised versus Semi-supervised Machine Learning in the Search for New Physics with the ATLAS Data
Researcher: Lindiwe Malobola, University of the Witwatersrand, Johannesburg
Supervisor: Professor Bruce Mellado, University of the Witwatersrand, Johannesburg
Present-day machine learning methods can be used to build very powerful models for difficult collider physics problems. In most applications, the models are built using fully supervised machine learning methods, which require fully labelled data. In collider physics the data is usually unlabelled, so the norm is to generate simulated data that mimics the real data and then to build models on this simulated data. Simulated data does not exactly reproduce real data, so building models on it can be problematic: the models can sometimes learn features that are not present in nature but arise from the simulation procedure. This study investigates semi-supervision as a machine learning methodology for new particle discovery. With semi-supervision, the models can be built without labelled data, which means we can start training models on real collision data. We present a comparison of full supervision and a semi-supervised method called weak supervision. Weak supervision allows us to extract information from data using only partial knowledge of that data. For example, if we know the background well, we can extract unknown signals from the data.
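The weak-supervision idea described above can be illustrated with a toy sketch. This is not the analysis used in the study; it is a minimal example under stated assumptions: a single hypothetical feature, Gaussian toy distributions, a 10% signal fraction, and a hand-written logistic regression. The classifier is trained only to tell a background-only reference sample apart from a mixed "data" sample, and never sees which events are signal. The truth flags are kept solely to check the result afterwards.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_samples(n=2000, signal_fraction=0.1, seed=0):
    # Toy assumption: background ~ N(0, 1), signal ~ N(3, 0.5) in one feature.
    rng = random.Random(seed)
    reference = [rng.gauss(0.0, 1.0) for _ in range(n)]  # background only
    mixed, is_signal = [], []
    for _ in range(n):
        sig = rng.random() < signal_fraction
        mixed.append(rng.gauss(3.0, 0.5) if sig else rng.gauss(0.0, 1.0))
        is_signal.append(sig)  # truth flag, used for evaluation only
    return reference, mixed, is_signal


def train_weak_classifier(reference, mixed, epochs=200, lr=0.5):
    # Labels say which SAMPLE an event came from, not whether it is signal.
    xs = reference + mixed
    ys = [0.0] * len(reference) + [1.0] * len(mixed)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        # Full-batch gradient descent on the logistic loss.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


reference, mixed, is_signal = make_samples()
w, b = train_weak_classifier(reference, mixed)

# Score the mixed sample and compare true signal with true background.
scores = [sigmoid(w * x + b) for x in mixed]
sig_scores = [s for s, t in zip(scores, is_signal) if t]
bkg_scores = [s for s, t in zip(scores, is_signal) if not t]
mean_sig = sum(sig_scores) / len(sig_scores)
mean_bkg = sum(bkg_scores) / len(bkg_scores)
print(f"mean score, signal: {mean_sig:.3f}  background: {mean_bkg:.3f}")
```

In this toy setting the classifier assigns higher scores to the hidden signal events even though no per-event labels were used in training, because the only systematic difference between the two samples is the signal component. In a real analysis the reference sample, the features, and the model would all need to be chosen with care.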