I am a Staff Research Scientist/Deep Learning researcher at MediaTek Research UK. My work focuses on reinforcement learning and its application to challenging real-world problems, in domains such as chip design, industrial and experimental design, drug discovery, robotics, visual navigation, and human-computer interaction. I am particularly interested in incorporating human-like flexibility, adaptivity and autonomy into machine learning algorithms, with the aim of developing the theoretical and practical approaches needed to understand, predict, and assist human behaviour, or to complement human expertise.

In pursuit of these goals I use tools from Reinforcement Learning, Bayesian Statistics and Deep Learning to develop algorithms for lifelong and continual learning problems. Within this framework I have worked on, for example, multi-task and continual RL and flexible planning, as well as transfer learning in computer vision and combinatorial optimization.

I received my PhD from New York University, where my advisor was Joe LeDoux, and my thesis combined computational neuroscience and experimental work. Following my PhD, I spent some time as a postdoc at the University of Geneva and at the University of Oxford. Before turning to science, I studied and worked as a cellist for many years in the U.K., Germany, and France.

You can find my CV and resume by clicking on these links. To get in touch (e.g. to request code, or with any comments) please email tamas*at*nyu*dot*edu, or leave a message here.

Selected Publications

Two central challenges preventing RL from autonomously solving real-world problems are breaking down complex tasks into manageable sub-tasks and transferring knowledge from one task or sub-task to another. In a series of papers and ongoing work I have contributed new algorithmic tools to decompose and reason about the structure of complex, non-Markovian reinforcement learning problems, while also allowing efficient planning in new settings. By representing temporal and causal invariances in the reward functions of such tasks, and by re-evaluating past experience in light of the current problem setting, this approach allows fast generalization to new tasks within a task family. ‘LPI: Learned Positional Invariances for Transfer of Task Structure and Zero-shot Planning’ was published at the Workshop on Responsible Decision Making in Dynamic Environments at ICML 2022, and ‘Learning transferable task schemas by representing causal invariances’ at the Causal Learning and Decision Making Workshop at ICLR 2020.

An important problem when transferring knowledge in the RL setting is knowing how to act when our goals or the available rewards in the environment change. How do we know when these changes happen, which policies to reuse, how to quickly adapt them, and how to discover newly available rewards? Our paper, ‘Better transfer learning with inferred successor maps’, gives some answers to these questions. By continually evaluating task similarity online, clustering similar tasks, and quickly approximating optimal policies for new reward functions, the Bayesian Successor Representation algorithm quickly solves new tasks as rewards or tasks change, without explicit task boundaries. It outperforms competing approaches on continual learning problems by proposing a new way to combine nonparametric clustering methods with a factorised representation of the value function. The paper was awarded a spotlight talk at NeurIPS 2019, with the code available on GitLab. For the neuroscience enthusiasts, we also show how the brain might do something similar, and how this explains hippocampal remapping and so-called splitter cells as goal-directed inference.

During my postdoc with Alex Pouget, we collaborated with Alan Carleton’s group to examine task-dependent learning strategies for olfaction. The resulting paper was published in Neuron. For this work I developed a computational model of neural dynamics during olfactory learning that highlights the importance of cortical feedback in representation learning and odour discrimination, and explains the differences in circuit behaviour between unsupervised and supervised/reinforcement learning.

A large part of my thesis investigated how the brain deals with ambiguities in the environment and learns statistical structure for predictive models, as summarised in this paper published in Nature Neuroscience with Joshua Johansen and others. It relates a computational model of structure learning in probabilistic graphical models to the behaviour and neural circuitry of rodents learning about ambiguous predictive relationships. This thesis work was also awarded NYU’s Samuel J. and Joan B. Williamson Dissertation Fellowship.

Thanks for visiting!