Academic Partnerships

The Discovery: Two Sigma PhD Symposium

An event for sharing knowledge and innovation with fellow PhD students, professors, and Two Sigma researchers.

Event Overview

The Discovery, Two Sigma’s PhD Symposium, brings together top PhD students from across North America to exchange ideas with their peers, distinguished professors, and Two Sigma researchers at our Soho headquarters. This invite-only event offers PhD students a unique opportunity to share their current research and solicit feedback on it.

Connect with Two Sigma

The Academic Partnerships program offers universities, research labs, professors, and students many ways to connect with Two Sigma. Our mission is to foster inclusive communities where academics of all backgrounds feel empowered to expand frontiers in STEM.

2024 PhD Participants

Daniel Halpern
Wanqiao Xu
Shubham Ugare
Haque Ishfaq
Awni Altabaa
Yecheng (Jason) Ma

Daniel Halpern

Harvard University
Computer Science

Social Choice Under Uncertainty

This talk will touch on a few projects in social choice under uncertainty. Social choice theory gives a variety of tools for making a decision for a group of people based on the individuals’ heterogeneous preferences. These solutions often enjoy provable mathematical guarantees about the quality of the outcome. However, in many cases, we will not have enough data on people’s preferences in order to make the recommended decision. The question guiding this research agenda is what kind of theoretical guarantees can we make using this limited information?

We then focus on one project inspired by the opinion aggregation website pol.is. On pol.is, participants submit free-text opinions on a topic and then vote on other users’ submitted opinions, indicating either agreement or disagreement. The platform’s goal is to collect enough of this agreement/disagreement data to output a summary of the population’s preferences. If we knew every participant’s view on every submitted opinion, tools from social choice theory would allow us to select a diverse subset of opinions that “represent” the entire population in a formal sense. However, in this practical scenario, people can feasibly respond to only 20 of the other users’ opinions, even though thousands have been submitted. We show that if this data is collected in the canonical way, by showing random opinions from other users, finding a representative set will, in some cases, be impossible. On the other hand, we devise an adaptive sampling method that provably finds a representative set with high probability.

Wanqiao Xu

Stanford University
Management Science & Engineering

RLHF and independence of irrelevant alternatives (IIA)

Modern generative AIs ingest trillions of data bytes from the Internet to produce a large pretrained model. Trained to imitate what is observed, this model can exhibit undesirable behaviors alongside the intended ones. Further training through human interaction, even on fewer than a hundred thousand bits of data, has proven to greatly enhance usefulness and safety, enabling the remarkable AIs we have today. This process of reinforcement learning from human feedback (RLHF) steers AIs toward the more desirable among behaviors observed during pretraining.

With all the effort and resources directed at RLHF, it is worth asking whether current algorithms rest on firm foundations. Maybe not, as these algorithms are based on models that assume independence of irrelevant alternatives (IIA), which intuitively means that, when making a choice between two options, the introduction of a third option should not alter preferences between the original two. This work demonstrates that human preferences for text content violate IIA. In this talk, I will present evidence of how seemingly minor changes in the feedback format or learning algorithm can lead to unintended outcomes. This raises concerns about the robustness of current RLHF methodologies. The research proposes avenues for future exploration to develop more flexible and human-centric approaches to guide generative AI training.
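As an illustration of what IIA means, the sketch below uses a multinomial logit (softmax) choice model, a standard model family that has this property by construction. It is not taken from the talk: the option names and utility values are hypothetical, and the model is only a stand-in for the kind of preference model the abstract refers to.

```python
import math

def choice_probs(utilities):
    """Multinomial logit: P(option) is proportional to exp(utility)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    weights = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical utilities for two candidate responses.
two = choice_probs({"A": 1.0, "B": 0.0})

# Add a third option that duplicates A (same hypothetical utility).
three = choice_probs({"A": 1.0, "B": 0.0, "A_copy": 1.0})

# Under IIA, the odds of A over B do not depend on the third option.
ratio_two = two["A"] / two["B"]
ratio_three = three["A"] / three["B"]
print(ratio_two, ratio_three)
```

In this model the odds of A over B are identical in both cases, even though a duplicate of A might plausibly draw choices mainly away from A when real people choose. A model that assumes IIA cannot express that kind of shift.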

Shubham Ugare

University of Illinois Urbana-Champaign
Computer Science

SynCode: Structured LLM Generation

LLMs are widely used in complex AI applications. These applications require LLM outputs to adhere to a specific format so that they can be integrated with other components of the system. Typically, the format rules (for example, for data serialization formats such as JSON or YAML, or for code in a programming language) are expressed as a context-free grammar (CFG). Because LLMs hallucinate and are unreliable, getting them to adhere to a specified syntax is an increasingly important challenge.

In this talk, I will discuss my recent work, SynCode, a novel framework for efficient and general syntactical decoding with LLMs that addresses this challenge. SynCode leverages the CFG of a formal language, using an efficient lookup table constructed offline, called the DFA mask store, which is based on the deterministic finite automaton (DFA) of the terminals of the language grammar.
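The general idea of masking tokens with a precomputed lookup table can be sketched with a toy example. This is not SynCode's implementation: SynCode works with a CFG and the DFAs of its terminals, whereas the sketch below assumes a single hand-written DFA for the whole language (digit groups separated by single commas). The vocabulary, scores, and all function names are hypothetical.

```python
# Toy DFA: accepts strings such as "1", "23,4", "5,6,7".
DFA = {
    ("start", "digit"): "num",
    ("num", "digit"): "num",
    ("num", "comma"): "start",
}
STATES = ("start", "num")
ACCEPTING = {"num"}

def char_class(ch):
    if ch.isdigit():
        return "digit"
    if ch == ",":
        return "comma"
    return None  # any other character is rejected by the DFA

def advance(state, token):
    """Run the DFA over each character of a token; None means rejected."""
    for ch in token:
        state = DFA.get((state, char_class(ch)))
        if state is None:
            return None
    return state

# Hypothetical multi-character vocabulary, as an LLM tokenizer might have.
VOCAB = ["1", "23", ",", ",4", ",,", "a", "5,"]

def build_mask_store(vocab):
    """Offline step: for every DFA state, mark which tokens are allowed."""
    return {s: [advance(s, t) is not None for t in vocab] for s in STATES}

MASK_STORE = build_mask_store(VOCAB)

def constrained_greedy(scores_per_step):
    """Greedy decoding restricted to tokens allowed in the current state."""
    state, out = "start", []
    for scores in scores_per_step:
        allowed = [i for i, ok in enumerate(MASK_STORE[state]) if ok]
        best = max(allowed, key=lambda i: scores[i])
        out.append(VOCAB[best])
        state = advance(state, VOCAB[best])
    return "".join(out), state in ACCEPTING

# Made-up scores in which invalid tokens ("a", ",,") score highest.
scores = [
    [0.1, 0.2, 0.9, 0.3, 0.8, 0.95, 0.0],
    [0.1, 0.2, 0.3, 0.9, 0.95, 0.99, 0.0],
]
result = constrained_greedy(scores)
print(result)
```

Because the mask for each state is computed once ahead of time, the decoding loop only needs a table lookup per step rather than re-parsing the output for every candidate token.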

Haque Ishfaq

Montreal Institute for Learning Algorithms & McGill University
Computer Science

Bridging Theory and Practice: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

I will present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead sample the Q function directly from its posterior distribution using Langevin Monte Carlo, an efficient Markov chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis of the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a sublinear regret bound. We apply this approach to deep RL by using the Adam optimizer to perform gradient updates. Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
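The "noisy gradient descent" update at the core of Langevin Monte Carlo can be sketched on a toy problem. The example below is an assumption-laden illustration, not the method from the talk: it samples a single scalar from a one-dimensional Gaussian target rather than Q-function parameters, uses a fixed step size rather than Adam, and all names and constants are hypothetical.

```python
import math
import random

def langevin_samples(grad_neg_log_post, theta0, step, n_steps, burn_in, rng):
    """Unadjusted Langevin: gradient step plus Gaussian noise of scale sqrt(2 * step)."""
    theta = theta0
    samples = []
    for t in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        theta = theta - step * grad_neg_log_post(theta) + math.sqrt(2.0 * step) * noise
        if t >= burn_in:  # discard early iterates that still depend on theta0
            samples.append(theta)
    return samples

# Toy "posterior": Gaussian with mean 2.0 and standard deviation 0.5.
MU, SIGMA = 2.0, 0.5

def grad_neg_log_post(theta):
    # Gradient of (theta - MU)^2 / (2 * SIGMA^2).
    return (theta - MU) / SIGMA**2

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = langevin_samples(grad_neg_log_post, theta0=0.0, step=0.01,
                           n_steps=50_000, burn_in=5_000, rng=rng)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

With a small step size, the empirical mean and variance of the iterates should land close to those of the target (2.0 and 0.25 here), with a small bias that comes from the discretization. No Gaussian approximation of the posterior is fitted; the samples come from the noisy updates themselves.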

Awni Altabaa

Yale University
Statistics

Disentangling Relational and Sensory Information in Transformer Architectures

The Transformer architecture is at the heart of many modern-day applications of artificial intelligence, including the large language models that have taken the world by storm with the advent of ChatGPT. The Transformer achieves this through a surprisingly simple architecture that implements a process of iterative information retrieval (attention), followed by local processing (position-wise MLP). Two distinct types of information are essential under this general computational paradigm: 1) “sensory” information about individual objects, and 2) “relational” information describing the relationships between objects. The standard attention mechanism of Transformers naturally encodes the former, but does not explicitly encode the latter.

In this talk, I will describe some of our recent work that proposes an extension of the Transformer architecture where multi-head attention is augmented to comprise two distinct types of attention heads, each routing information of a different type. The first type is the standard attention mechanism of Transformers, which captures object-level features, while the second type is a novel attention mechanism we propose to explicitly capture relational information. The two types of attention heads each possess different inductive biases, giving the resulting architecture greater efficiency and versatility.

Yecheng (Jason) Ma

University of Pennsylvania
Computer Science

Foundation Reward Models for Robot Reinforcement Learning

A promising path towards general-purpose robots is to learn reward models that enable robots to learn various tasks autonomously with minimal human supervision. In this talk, I will present my recent progress in learning foundation reward models that can jumpstart skill learning across robots, tasks, and perceptual inputs. First, I will present Eureka, an LLM-based reward design system that writes reward function code that can outperform reward functions written manually by domain experts; Eureka is capable of self-improving, adapting to human preference, and unlocking new dexterity tasks such as pen spinning in simulation. Then, I will present DrEureka, which extends Eureka to have LLMs supervise the full sim-to-real pipeline; DrEureka enables a quadruped robot to balance and walk on a yoga ball in the real world despite training only in simulation. Finally, I will discuss a separate line of work that shows how to directly use in-the-wild human videos to supervise general-purpose value functions that can zero-shot produce dense rewards for manipulation tasks specified in images and text.

Participating Professors

Arthur Jacot

New York University
Deep Neural Networks

Breaking the Curse of Dimensionality with DNNs: Learning Features and Symmetries

Neural scaling laws describe how much data and compute are needed for deep neural networks (DNNs) to achieve a certain performance, and they have become central tools in the design of the recent generation of large-scale models. I will discuss a few theoretical results that allow us to predict the exponent that drives these laws. In the absence of feature learning, and relying only on the ‘smoothness’ of the task, the amount of data required to achieve a certain error scales exponentially with the dimension. This is called the curse of dimensionality, and it suggests that efficient learning in high dimensions is impossible in general. In the presence of feature learning, however, the amount of data scales with the dimensionality of the learned features, which can be much lower than the input dimension. This could explain the success of deep learning on tasks that were previously thought almost impossible. It also illustrates the importance of learning low-dimensional features, which can be interpreted as the network learning symmetries of the task, and suggests that the main task of feature learning is to forget as much irrelevant information as possible.
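One common way to make the curse of dimensionality concrete is the classical minimax rate for nonparametric regression. It is quoted here only as an illustration and is not necessarily the bound derived in the talk. The symbols are introduced for this sketch: n is the number of samples, ε the root-mean-squared error, s the smoothness of the target function, d the input dimension, and d_feat the dimension of the learned features.

```latex
\varepsilon(n) \asymp n^{-\frac{s}{2s+d}}
\quad\Longrightarrow\quad
n(\varepsilon) \asymp \varepsilon^{-\left(2 + \frac{d}{s}\right)}
```

For fixed s, the number of samples needed to reach a given error grows exponentially in d. If feature learning effectively reduced the problem to d_feat dimensions, with d_feat much smaller than d, the same heuristic would put d_feat in place of d in the exponent.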

Noah Snavely

Cornell University
Computer Vision and Computer Graphics

Reconstructing 3D Shape and Motion from the World’s Imagery

Computer vision and machine learning methods are getting pretty good at reconstructing beautiful 3D models from 2D images. I’ll talk about two ongoing, fundamental challenges: reconstructing 3D scenes that exhibit symmetries and repeated patterns, and reconstructing scenes that are in motion. These methods have applications ranging from VR and AR to generative AI for synthesizing 3D shape and video.

Learn More About Two Sigma's Academic Partnerships Program

Our Academic Partnerships program aims to support and recognize outstanding students and educators who share our passion for learning and seeking a deeper understanding of the world around us.
