Vienna Alignment Workshop
21 July 2024 | Vienna, Austria
The Vienna Alignment Workshop brought together researchers and leaders from around the world to build a better understanding of potential risks from advanced AI systems and strategies for addressing them. Machine learning researchers from academia, industry, government, and nonprofits came together to debate and discuss current issues in AI safety, including topics in Guaranteed Safe AI & Robustness, Interpretability, and Governance & Evaluations. To leverage the convening power of ICML, the Vienna Alignment Workshop was held the day before the conference. The day after the workshop, the participant-led Monday Unconference enabled attendees to strengthen connections made during the event.
As part of the Alignment Workshop Series, talks in Vienna built on the content from our highly rated events in San Francisco and New Orleans (2023), sharing progress in the field since then. Given our positive experience with an invite-only model, around 150 attendees were hand-selected by our program committee. For those who were unable to attend, presentations are now freely available on our YouTube channel, where we are building a large, publicly accessible repository of high-quality content on AI alignment.
Keynote Speaker: Jan Leike
Supervising AI on hard tasks
Alignment Workshop Speakers
Introduction and Panel
Topic: Current Issues in AI Safety
Panelists: Victoria Krakovna, David Krueger, Gillian Hadfield, Robert Trager
Moderator: Adam Gleave
Guaranteed Safe AI and Robustness
Stuart Russell - "AI: What if we succeed?"
Nicholas Carlini - "Some Lessons from Adversarial Machine Learning"
Interpretability
Neel Nanda - "Mechanistic Interpretability: A Whirlwind Tour"
David Bau - "Resilience and Interpretability"
Governance and Evaluations
Helen Toner - "Governance for advanced general-purpose AI: Status check, hurdles, & next steps"
Mary Phuong - "Dangerous capability evals: Basis for frontier safety"
Keynote
Jan Leike - "Supervising AI on hard tasks"
Lightning Talks - AM
Aditya Gopalan - Towards reliable alignment: Uncertainty-aware RLHF
Oliver Klingefjord - What are human values, and how do we align AI to them?
Vincent Conitzer - Social choice for AI alignment
Stephen Casper - Generalized Adversarial Training and Testing
Dmitrii Krasheninnikov - Stress-Testing Capability Elicitation With Password-Locked Models
Lightning Talks - PM
Jelena Luketina & Herbie Bradley - An Update from the UK AI Safety Institute
Ben Bucknall - Open Problems in Technical AI Governance
Zhaowei Zhang - Research Proposal: The Three-Layer Paradigm for Implementing Sociotechnical AI Alignment
Alex Tamkin - Societal Impacts Research at Anthropic: Recent Directions
Vikrant Varma - Challenges with unsupervised LLM knowledge discovery
Sophie Bridgers - Scalable Oversight: A Rater Assistance Approach
Monday Unconference (July 22)
The Monday Unconference, held on July 22 from 9:30 AM to 6:00 PM, was an optional, participant-led event for Alignment Workshop attendees. It included lightning talks on research, informal topic sessions, one-on-ones, speed networking, breakout sessions, and coworking.
Program Committee
Brad Knox
Professor, UT Austin
Mary Phuong
Research Scientist, Google DeepMind
Nitarshan Rajkumar
Co-founder, UK AISI
Robert Trager
Co-Director, Oxford Martin AI Governance Initiative
Adam Gleave
Founder, FAR AI