Thesis Defense - Master of Science Rwiddhi Chakraborty

Master of Science Rwiddhi Chakraborty will hold his thesis defense for the PhD degree in Science on Friday, December 13th, 2024, at 12:15. The title of the thesis is:

«Model and Data Diagnosis under Limited Supervision in Modern AI»

Abstract:

Deep Learning in modern Artificial Intelligence (AI) has witnessed unprecedented success across a variety of domains over the past decade, ranging from computer vision to natural language reasoning tasks. This success is owed primarily to the availability of large, annotated datasets, the existence of powerful mathematical models, and the mechanism to train large models on such data with advanced compute resources. However, this success has led to increased scrutiny of the failure points of models trained on suspect data. Issues such as model and data bias, reliance on spurious correlations, and poor generalization capability on challenging test data, to name a few, have surfaced in the research community. As a result, it seems imperative to diagnose such systems for generalization performance on challenging test data, and to uncover potential biases hidden in datasets. In this thesis, we address these key challenges through the following directions: first, generalization capabilities with limited labeled data, covering few-shot learning, semi-supervised learning, and unsupervised learning; second, bias discovery in existing models and datasets, particularly in unsupervised group robust learning and debiased synthetic data generation. Our two broad directions are encapsulated by a common challenge: the paucity of labeled data, since manually annotating large datasets is a time-consuming and expensive process for humans. This motivation is relevant today due to the exponential growth in the sizes of models and datasets in use. It is becoming more and more intractable for humans to annotate billions of data points, leading to large benchmark datasets that are not well calibrated with human expectations on fairness. These issues, if left unchecked, are inevitably exacerbated when models train on such datasets. We consider these two directions, i.e. model generalization with limited labels and the existence of biased data, to be two sides of the same coin, and thus coin the framework encapsulating such research as Model and Data Diagnosis. This work proposes novel contributions in few-shot learning, semi-supervised learning, unsupervised learning, and in data diagnosis and debiasing techniques. Further, we show that model and data diagnosis do not necessarily exist as disparate entities, and can be viewed in a co-dependent context. Finally, this thesis hopes to amplify the scrutiny surrounding model capabilities, however impressive, trained on datasets, however vast.

Supervisory Committee:

Professor Michael Kampffmeyer, IFT, UiT (Main Supervisor)
Professor Robert Jenssen, IFT, UiT (Co-Supervisor)
Associate Professor Benjamin Ricaud, IFT, UiT (Co-Supervisor)
Associate Professor Shujian Yu, IFT, UiT (Co-Supervisor)

Evaluation Committee:

Assistant Professor Yuki M. Asano, University of Amsterdam, Netherlands (1. opponent)
Professor Adín Ramírez Rivera, IFI, UiO (2. opponent)
Associate Professor Elisabeth Wetzer, IFT, UiT (internal member and leader of the committee)

Streaming Site:

The disputas and trial lecture will be streamed from these sites:

Disputas (12:15 - 16:00)
Trial Lecture (10:15 - 11:00)

Thesis:

The thesis is available at Munin.