Worldwide Thesis Database & PhD tips

Online learning in the presence of strategic adversary - PhDData

Access database of worldwide thesis

Online learning in the presence of strategic adversary

The thesis was published by Dinh, Le Cong, in May 2023, University of Southampton.

Abstract:

This thesis offers a comprehensive exploration of the online learning problem in which an agent needs to strategise against a strategic adversary (also known as a no-regret adversary). Through examination of three interrelated settings, we have devised novel algorithms that achieve improved performance guarantees and last round convergence to the Nash Equilibrium both in theoretical and empirical contexts. Our findings open the door to further investigation of complex problems in online learning and game theory, where strategic adversaries play a crucial role in a multitude of applications.In the first of the three main chapters comprising our study, we examine the problem of playing against a strategic adversary under a two-player zero-sum game setting. In this scenario, we introduce a new no-dynamic regret algorithm, namely the Last Round Convergence of Asymmetric Games (LRCA), that achieves last round convergence to the minimax equilibrium. Building on this work, the second main chapter investigates the more general problem of online linear optimization and proposes several new algorithms, including Online Single Oracle (OSO), Accurate Follow the Regularized Leader (AFTRL), and Prod-Best Response algorithm (Prod-BR). These algorithms achieve state-of-the-art performance guarantees against a strategic adversary, such as no-forward regret and no-dynamic regret. Additionally, we show that a special case of AFTRL, the Accurate Multiplicative Weights Update (AMWU), can achieve last round convergence to the Nash equilibrium in self-play settings. In the third and final main chapter, we extend our results to the challenging setting of Online Markov Decision Processes (OMDPs), which have many significant applications in practice. Here, we propose two new algorithms, MDP-Online Oracle Expert (MDP-OOE) and Last Round Convergence-OMDP (LRC-OMDP), that achieve no-policy regret and last round convergence to the Nash equilibrium, respectively, against a strategic adversary.

The full thesis can be downloaded at :
https://eprints.soton.ac.uk/476759/
https://eprints.soton.ac.uk/476759/1/Le_Cong_Dinh_PhD_Thesis_pdf_A_3b_final.pdf

Read the last PhD tips

2022
September

Why are PhDs doctors of philosophy?
2022
October

What rude awakening are graduate and Ph.D. students in for?
2022
September

How do PhD students keep their motivation up?
2022
October

The Hidden Downsides of Having a Ph.D.
2022
September

Can a Ph.D. supervisor fire you?