ICLR 2019: Our Favorite Machine Learning Talks and Papers

Two Sigma researchers highlight a few particularly insightful papers, talks, and presentations from ICLR 2019.

The International Conference on Learning Representations (ICLR) is one of the “Big 3” international machine learning conferences (along with ICML and NeurIPS). ICLR is steadily gaining a reputation for the cutting-edge research presented there each year, with a particular focus on deep learning as well as other areas of artificial intelligence, data science, and statistics.

Two Sigma is always eager to stay abreast of the latest machine learning research, and to contribute back to the community when we can. As in past years, we sponsored ICLR 2019, which took place in New Orleans, and several Two Sigma researchers and engineers were in attendance. Below, they highlight a handful of particularly insightful papers, talks, and presentations that may be of interest to practitioners in many different fields.

Special thanks to Rachel Malbin, Kishan Patel, Mike Schuster, and Rhys Ulerich for their contributions to this write-up.

Learning without a Goal

Task-agnostic reinforcement learning received a lot of attention at ICLR 2019, with a workshop dedicated to recent advancements in the field.

Recent work has focused on two problems: connecting experience gathered without a goal to an action plan once a goal is specified, and exploring efficiently while the agent is in free play.

  • Skew-Fit by Vitchyr Pong et al. optimizes state coverage by training a generative model to pick a diverse set of goals for the agent to reach. Over successive iterations, the generative model approaches a uniform distribution across all reachable states.
  • In State Marginal Matching with Mixtures of Policies by Lisa Lee et al., the agent seeks to maximize the entropy of its marginal state distribution, using a mixture of policies, to accelerate exploration and generalization to unseen tasks.
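To make the idea of skewing goals toward rarely visited states concrete, here is a minimal sketch. It assumes a small discrete state space and plain visit counts in place of a learned generative model, so it illustrates the reweighting idea only and is not the authors' implementation. The function names, the exponent `alpha`, and the visit counts are all hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def skewed_goal_distribution(visit_counts, alpha=-1.0):
    """Reweight each visited state by p(s)**alpha, with alpha in [-1, 0).

    The reweighted distribution is proportional to p(s)**(1 + alpha), so
    alpha = -1 is uniform over visited states and values near 0 barely
    change the empirical distribution.
    """
    counts = np.asarray(visit_counts, dtype=float)
    p = counts / counts.sum()
    skewed = np.zeros_like(p)
    visited = p > 0
    skewed[visited] = p[visited] ** (1.0 + alpha)
    return skewed / skewed.sum()

# Hypothetical visit counts for five states; the last has never been reached.
visit_counts = [90, 5, 3, 2, 0]
empirical = np.asarray(visit_counts, dtype=float) / sum(visit_counts)
skewed = skewed_goal_distribution(visit_counts, alpha=-0.5)

print("entropy before skewing:", round(entropy(empirical), 3))
print("entropy after skewing: ", round(entropy(skewed), 3))
```

Sampling goals from the skewed distribution sends the agent toward under-visited states more often. Repeating the count-and-skew cycle is what would gradually flatten coverage in this toy setting.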

More Efficient Exploration for Reinforcement Learning

Reinforcement learning has recently performed strongly in environments with dense rewards, but games with very sparse rewards, or only a single final reward, have proven harder to learn. Montezuma’s Revenge is one such game: rewards are extremely sparse, and the agent has to backtrack to rooms it has already seen to pick up key items.

  • Recent work from OpenAI by Burda et al. has achieved record-breaking performance on Montezuma’s Revenge by providing the agent a reward for “curiosity,” or exploring states it has not seen before.
  • Previous curiosity algorithms suffered from the “noisy TV problem,” in which an agent gets stuck watching a source of randomness in the environment because each unpredictable change yields a large curiosity reward.
  • Burda et al. avoid the noisy TV problem through random network distillation: a fixed, randomly initialized network produces a deterministic feature vector for each state, and the curiosity reward is the error of a second network trained to predict that vector, which sidesteps the need to predict noisy transitions.
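The random network distillation idea can be sketched in a few lines of NumPy. This is a toy illustration under strong assumptions, not the paper's implementation: states are small unit-norm vectors, the predictor is a single linear layer, and every name and size here (`W_target`, `W_pred`, `STATE_DIM`, the learning rate) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEATURE_DIM = 8, 16

# Fixed, randomly initialized target network. It is never trained, so its
# output for a given state is deterministic.
W_target = rng.normal(size=(STATE_DIM, FEATURE_DIM))

# Predictor network (one linear layer for brevity), trained to match the target.
W_pred = np.zeros((STATE_DIM, FEATURE_DIM))

def target_features(state):
    return np.tanh(state @ W_target)

def intrinsic_reward(state):
    """Mean squared prediction error: large for unfamiliar states."""
    error = state @ W_pred - target_features(state)
    return float((error ** 2).mean())

def update_predictor(state, lr=0.5):
    """One gradient step (up to a constant factor) on the squared error."""
    global W_pred
    error = state @ W_pred - target_features(state)
    W_pred -= lr * np.outer(state, error)

def random_unit_state():
    s = rng.normal(size=STATE_DIM)
    return s / np.linalg.norm(s)

familiar, novel = random_unit_state(), random_unit_state()

reward_before = intrinsic_reward(familiar)
for _ in range(50):          # the agent keeps revisiting the same state
    update_predictor(familiar)
reward_familiar = intrinsic_reward(familiar)
reward_novel = intrinsic_reward(novel)

print("familiar state, before training:", reward_before)
print("familiar state, after training: ", reward_familiar)
print("novel state:                    ", reward_novel)
```

Because the target is a deterministic function of the state alone, revisiting a state drives its bonus toward zero however random the environment's transitions are. A state the predictor has not been trained on keeps a comparatively large bonus.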

The potential benefits of such research aren’t limited to video games, of course. It may help improve agent robustness in a variety of settings with uncontrolled environments (e.g., driving).

Image source: https://medium.com/@awjuliani/on-solving-montezumas-revenge-2146d83f0bc3

End-to-End Optimization with Reinforcement Learning

Recent work has framed the tuning of outer-loop hyperparameters as a meta-learning task for reinforcement learning. In these problems, an RL agent receives a reward based on the performance of some inner loop, and then modifies the hyperparameters of the inner-loop problem to further maximize its reward.

  • Reinforcement learning has been incorporated as a meta-learning loop for finding optimal net architectures, such as in Xie et al.
  • Ruiz et al. explored using RL to modify the hyperparameters of a simulator so that it produces different data distributions, with the goal of maximizing performance on an external validation set.
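The outer-loop/inner-loop structure described above can be illustrated with a deliberately small example. The sketch below is not taken from either paper. It assumes the inner problem is gradient descent on a one-dimensional quadratic, the only hyperparameter is a learning rate drawn from a hypothetical candidate list, and the outer agent is a REINFORCE-style softmax policy with a running-average baseline.

```python
import numpy as np

def inner_loop_loss(learning_rate, steps=20):
    """Inner problem: gradient descent on f(w) = (w - 3)^2; returns the final loss."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * 2.0 * (w - 3.0)
    return (w - 3.0) ** 2

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def tune_with_reinforce(candidates, iterations=2000, step=0.05, seed=0):
    """Outer loop: a softmax policy over candidate hyperparameter values."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(candidates))
    baseline = 0.0
    for _ in range(iterations):
        probs = softmax(logits)
        choice = rng.choice(len(candidates), p=probs)
        # The reward is the (negated) result of running the inner loop.
        reward = -inner_loop_loss(candidates[choice])
        advantage = reward - baseline
        # Gradient of log pi(choice) with respect to the logits.
        grad_log_prob = -probs
        grad_log_prob[choice] += 1.0
        logits += step * advantage * grad_log_prob
        baseline += 0.1 * (reward - baseline)
    return softmax(logits)

# Hypothetical candidate learning rates for the inner loop.
candidates = [0.001, 0.01, 0.1, 1.0]
probs = tune_with_reinforce(candidates)
print("preferred learning rate:", candidates[int(np.argmax(probs))])
```

The agent never sees the gradient of the inner problem, only a scalar reward per trial. That is what lets the same loop wrap non-differentiable choices such as an architecture or simulator settings.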

Inspiration Drawn from Differential Equations and Max-Affine Spline Operators

Researchers in deep learning continue to incorporate well-known results from other disciplines:

Continued Refinement of the Great Tricks

Researchers continued to better understand and automate many common approaches. For example:

  • Zhang et al. studied exactly what regularization effect weight decay provides.
  • Luo et al. proposed “Switchable Norm,” which learns how to combine layer, channel, and mini-batch normalization instead of committing to one of them in advance.
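As a rough illustration of a normalizer that chooses among layer, channel, and mini-batch statistics, the sketch below blends the three sets of means and variances with softmax weights. It is a simplified forward pass under stated assumptions: inputs have shape (batch, channels, height, width), the learnable scale and shift are omitted, and the mixing logits are passed in by hand rather than learned. The names `switchable_norm`, `mean_logits`, and `var_logits` are hypothetical.

```python
import numpy as np

def softmax(logits):
    logits = np.asarray(logits, dtype=float)
    z = np.exp(logits - logits.max())
    return z / z.sum()

def switchable_norm(x, mean_logits, var_logits, eps=1e-5):
    """Normalize x of shape (batch, channels, height, width).

    Statistics are computed three ways and blended:
      channel (instance): per sample and channel, over (height, width)
      layer:              per sample, over (channels, height, width)
      mini-batch:         per channel, over (batch, height, width)
    """
    axes = [(2, 3), (1, 2, 3), (0, 2, 3)]
    means = [x.mean(axis=a, keepdims=True) for a in axes]
    variances = [x.var(axis=a, keepdims=True) for a in axes]
    w_mean, w_var = softmax(mean_logits), softmax(var_logits)
    mean = sum(w * m for w, m in zip(w_mean, means))
    var = sum(w * v for w, v in zip(w_var, variances))
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(4, 3, 5, 5))

# Equal logits blend all three normalizers evenly.
y_blend = switchable_norm(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

# Logits that strongly favor the third entry recover mini-batch normalization.
y_batch = switchable_norm(x, [-30.0, -30.0, 30.0], [-30.0, -30.0, 30.0])
print(y_batch.mean(axis=(0, 2, 3)))
```

In a trainable layer the logits would be parameters updated by backpropagation, so each layer can settle on its own mix.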

Sophisticated Analysis of Training Processes

ICLR 2019 featured work, including a best paper, on characterizing how networks learn:

  • Achille et al. quantified how image classification networks respond to temporary “visual defects” during the training process.
  • The Lottery Ticket Hypothesis paper by Frankle and Carbin showed that, inside deep networks, both the connection structure and the initialization of sparse subnetworks are critical to achieving high accuracy.

These findings are part of the field’s ongoing aim to speed training and reduce inference costs.
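The lottery-ticket point about connection structure and initialization can be made concrete with a small sketch: prune by weight magnitude after training, then reset the surviving connections to their original initial values. This assumes one-shot magnitude pruning on a single weight matrix, and it uses random perturbations as a stand-in for real training, so it shows only the bookkeeping. All names are hypothetical.

```python
import numpy as np

def magnitude_mask(trained_weights, keep_fraction):
    """Boolean mask that keeps the largest-magnitude fraction of weights."""
    magnitudes = np.abs(trained_weights)
    k = max(1, int(round(keep_fraction * magnitudes.size)))
    threshold = np.sort(magnitudes.ravel())[-k]
    return magnitudes >= threshold

def winning_ticket(initial_weights, trained_weights, keep_fraction=0.2):
    """Sparse subnetwork: structure from the trained net, values from the init."""
    mask = magnitude_mask(trained_weights, keep_fraction)
    return mask, initial_weights * mask

rng = np.random.default_rng(0)
initial = rng.normal(size=(10, 10))
# Stand-in for training: perturb the initial weights (not a real optimizer).
trained = initial + rng.normal(scale=0.5, size=(10, 10))

mask, ticket = winning_ticket(initial, trained, keep_fraction=0.2)
print("connections kept:", int(mask.sum()), "of", mask.size)
```

The sparse `ticket` would then be retrained with the mask held fixed. The comparison of interest is against the same mask with freshly sampled weights, which tests whether the initialization matters.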

Domain-Aware Loss Function Construction

Several researchers bested prior state-of-the-art performance by incorporating more domain knowledge into their training processes.

  • In language-translation tasks, Wang et al. found it helpful to combine multiple agents, e.g., several English→German and several German→English models, within a dual formulation.
  • Improving on prior work in geometrically inspired embeddings for learning hierarchies, Li et al. annealed the compactness of their concept representations and demonstrated the approach’s utility in an imbalanced WordNet setting.

Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Hawthorne et al. of the Google Magenta group have built a massive public dataset of MIDI music recorded by semi-professional pianists. It is a great dataset for many kinds of sequence learning problems, not just music.

  • The group built a WAV2MIDI2WAV system using TensorFlow that allows researchers to go from waveforms to piano roll (a live MIDI format) and back again to waveforms.
  • A transformer architecture is trained on the music and then sampled, which brings the system closer to producing actual, coherent music.


This article is not an endorsement by Two Sigma of the papers discussed, their viewpoints or the companies discussed. The views expressed above reflect those of the authors and are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). The information presented above is only for informational and educational purposes and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. Additionally, the above information is not intended to provide, and should not be relied upon for investment, accounting, legal or tax advice. Two Sigma makes no representations, express or implied, regarding the accuracy or completeness of this information, and the reader accepts all risks in relying on the above information for any purpose whatsoever.
