Vienna Alignment Workshop

21 July 2024 - Vienna, Austria

The Vienna Alignment Workshop aims to bring together researchers and leaders to better understand potential risks from advanced AI systems and strategies for mitigating them. Machine learning researchers from academia, industry, government, and nonprofits come together to debate and discuss current issues in AI safety, including topics within Guaranteed Safe AI & Robustness, Interpretability, and Governance & Evaluations. Leveraging the convening power of ICML, the Vienna Alignment Workshop is held the day before ICML. The day after the event, the participant-led Monday Unconference will take place to strengthen connections made during the workshop.

As part of the Alignment Workshop Series, talks at the Vienna Alignment Workshop build on the content from our highly rated events in San Francisco and New Orleans (2023), sharing progress in the field since then. Given our positive experience with an invite-only model, around 150 attendees have been hand-selected by our program committee. Presentations will be made freely available on our YouTube channel, where we are building a large, publicly accessible repository of high-quality content that we also disseminate via social media.

All ICML attendees are welcome to attend our Vienna Alignment Workshop Open Social taking place on Sunday night after the event!

Alignment Workshop Speakers

Current Issues in AI Safety (Introduction and Panel)

Moderator: Adam Gleave

Panelists: Victoria Krakovna, David Krueger, Gillian Hadfield, Robert Trager


Guaranteed Safe AI and Robustness

Stuart Russell - "AI: What if we succeed?"
Nicholas Carlini - "Some Lessons from Adversarial Machine Learning"

Interpretability

Neel Nanda - "Mechanistic Interpretability: A Whirlwind Tour"
David Bau - "Resilience and Interpretability"


Governance and Evaluations

Helen Toner - "Governance for advanced general-purpose AI: Status check, hurdles, & next steps"
Mary Phuong - "Dangerous capability evals: Basis for frontier safety"


Keynote

Jan Leike - "Supervising AI on hard tasks"

Lightning Talks - AM

Aditya Gopalan - Towards reliable alignment: Uncertainty-aware RLHF
Oliver Klingefjord - What are human values, and how do we align AI to them?
Vincent Conitzer - Social choice for AI alignment
Stephen Casper - Generalized Adversarial Training and Testing
Dmitrii Krasheninnikov - Stress-Testing Capability Elicitation With Password-Locked Models

Lightning Talks - PM

Jelena Luketina & Herbie Bradley - An Update from the UK AI Safety Institute
Ben Bucknall - Open Problems in Technical AI Governance
Zhaowei Zhang - Research Proposal: The Three-Layer Paradigm for Implementing Sociotechnical AI Alignment
Alex Tamkin - Societal Impacts Research at Anthropic: Recent Directions
Vikrant Varma - Challenges with unsupervised LLM knowledge discovery
Sophie Bridgers - Scalable Oversight: A Rater Assistance Approach

Monday Unconference (July 22)

The Monday Unconference on July 22nd, running from 9:30 AM to 6 PM, is an optional, participant-led event for Alignment Workshop participants. With extra time for lightning talks on research, informal topic sessions, 1-1s, speed networking, breakout sessions, and coworking, participants can build on connections they made during the Vienna Alignment Workshop the day before.

Program Committee

Brad Knox
Professor, UT Austin

Mary Phuong
Research Scientist, Google DeepMind

Nitarshan Rajkumar
Co-founder, UK AISI

Robert Trager
Co-Director, Oxford Martin AI Governance Initiative

Adam Gleave
Founder, FAR AI
