Vienna Alignment Workshop
21 July 2024 | Vienna, Austria
The Vienna Alignment Workshop brought together researchers and leaders from around the world to build a better understanding of potential risks from advanced AI systems and strategies for addressing them. Machine learning researchers from academia, industry, government, and nonprofits came together to debate and discuss current issues in AI safety, including topics in Guaranteed Safe AI & Robustness, Interpretability, and Governance & Evaluations. To leverage the convening power of ICML, the Vienna Alignment Workshop was held the day before the conference. The day after the workshop, the participant-led Monday Unconference enabled attendees to strengthen connections made during the event.
As part of the Alignment Workshop Series, talks in Vienna built on the content from our highly rated events in San Francisco and New Orleans (2023), sharing progress in the field since then. Given our positive experience with an invite-only model, around 150 attendees were hand-selected by our program committee. For those who were unable to attend, presentations are now freely available on our YouTube channel, where we are building a large, publicly accessible repository of high-quality content on AI alignment.
Keynote Speaker: Jan Leike
Supervising AI on hard tasks
Alignment Workshop Speakers
Introduction and Panel
Topic: Current Issues in AI Safety
Panelists: Victoria Krakovna, David Krueger, Gillian Hadfield, Robert Trager
Moderator: Adam Gleave
Guaranteed Safe AI and Robustness
Stuart Russell - "AI: What if we succeed?"
Nicholas Carlini - "Some Lessons from Adversarial Machine Learning"
Interpretability
Neel Nanda - "Mechanistic Interpretability: A Whirlwind Tour"
David Bau - "Resilience and Interpretability"
Governance and Evaluations
Helen Toner - "Governance for advanced general-purpose AI: Status check, hurdles, & next steps"
Mary Phuong - "Dangerous capability evals: Basis for frontier safety"
Keynote
Jan Leike - "Supervising AI on hard tasks"
Lightning Talks - AM
Aditya Gopalan - Towards reliable alignment: Uncertainty-aware RLHF
Oliver Klingefjord - What are human values, and how do we align AI to them?
Vincent Conitzer - Social choice for AI alignment
Stephen Casper - Generalized Adversarial Training and Testing
Dmitrii Krasheninnikov - Stress-Testing Capability Elicitation With Password-Locked Models
Lightning Talks - PM
Jelena Luketina & Herbie Bradley - An Update from the UK AI Safety Institute
Ben Bucknall - Open Problems in Technical AI Governance
Zhaowei Zhang - Research Proposal: The Three-Layer Paradigm for Implementing Sociotechnical AI Alignment
Alex Tamkin - Societal Impacts Research at Anthropic: Recent Directions
Vikrant Varma - Challenges with unsupervised LLM knowledge discovery
Sophie Bridgers - Scalable Oversight: A Rater Assistance Approach
Monday Unconference (July 22)
The Monday Unconference, held on July 22 from 9:30 AM to 6:00 PM, was an optional, participant-led event for Alignment Workshop attendees. It included lightning talks on research, informal topic sessions, one-on-ones, speed networking, breakout sessions, and coworking.
Program Committee
Brad Knox
Professor, UT Austin
Mary Phuong
Research Scientist, Google DeepMind
Nitarshan Rajkumar
Co-founder, UK AISI
Robert Trager
Co-Director, Oxford Martin AI Governance Initiative
Adam Gleave
Founder, FAR AI