Atoosa Kasirzadeh - Value pluralism and AI value alignment

Transcript

Hello everyone. I'm going to talk about value pluralism and AI value alignment. In the space of AI value alignment, there are some recent efforts that try to prioritize value pluralism. When you read these papers, the concern is not so much with safety questions; the concern is more like "how can we bring the perspectives of a variety of different people into the AI value alignment problem?" Here is a sample of different works that try to connect the idea of value pluralism to the AI value alignment problem. Sometimes they go under different names, like collective alignment, democratic alignment, or pluralistic alignment, but they all share the vision of "let's bring value pluralism into AI value alignment".


These projects start as attempts to do empirical work on value pluralism for AI value alignment. But after reviewing all of these efforts, I found that many of them are not rooted in any theoretical understanding of value pluralism. Value pluralism is an area that has been studied extensively by psychologists, economists, and anthropologists. In order to actually compare these efforts to one another and say that one approach to pluralistic value alignment is better than another, we need to root them in the various theories that already exist.


And if we don't do that, I think we will unfortunately end up with things like "pluralistic washing", where we just try to bring some general perspectives into value alignment without really engaging with deep questions about theories of collectivity, democracy, and pluralism.


So, in order to think about all of these normative questions that developers, whether they admit it or not, are dealing with when they build pluralistic AI value alignment, I developed a two-tier approach to thinking about pluralistic alignment. It is informed by reviewing a variety of theories developed by psychologists, anthropologists, and economists about what value pluralism is and how it can be put into practice.


The schema might sound a bit complicated, but it's actually not. The short summary is that when we build pluralistic value alignment approaches, we make decisions at two different levels.


One of these levels, tier one, concerns decisions about what we mean by values. Do we mean helpfulness, harmlessness, and honesty, or a different set of criteria? Which representative population is going to tell us what those values are? How do we measure and represent them? Do we do pairwise comparisons and preference measurements, or do we need to go to principles and take a principle-based approach?
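To make the preference-measurement option a bit more concrete, here is a minimal sketch, with hypothetical comparison counts and response names that are not from the talk, of one standard way pairwise comparisons get turned into per-response scores: fitting a Bradley-Terry model. The point is only to show the kind of measurement choice being made at tier one, not to endorse this particular method.

```python
# Minimal illustrative sketch (hypothetical data, not from the talk): one way
# to do "pairwise comparison" preference measurement is to fit a Bradley-Terry
# model, turning "annotator preferred X over Y" counts into one score per response.
import numpy as np

responses = ["A", "B", "C"]
# wins[i][j] = number of annotators who preferred responses[i] over responses[j]
wins = np.array([[0., 7., 5.],
                 [3., 0., 6.],
                 [5., 4., 0.]])

# Standard minorization-maximization updates for Bradley-Terry strengths.
strengths = np.ones(len(responses))
for _ in range(200):
    updated = np.empty_like(strengths)
    for i in range(len(responses)):
        total_wins = wins[i].sum()
        denom = sum((wins[i, j] + wins[j, i]) / (strengths[i] + strengths[j])
                    for j in range(len(responses)) if j != i)
        updated[i] = total_wins / denom
    strengths = updated / updated.sum()  # normalize at each iteration

# Higher score = more preferred under this measurement choice.
print({r: round(float(s), 3) for r, s in zip(responses, strengths)})
```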


Then there are questions about how we are going to aggregate values and resolve conflicts. For each of these value decisions there are many different options, and when you look at these various, very interesting efforts, there seems to be a bit of randomness in how some of those choices are made.
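As a toy illustration of why the aggregation choice is itself a substantive decision, here is a small sketch with made-up annotator rankings; the names, numbers, and rules are mine, not from the talk. Two common aggregation rules applied to the same rankings pick different "winning" responses.

```python
# Toy sketch (hypothetical data): the same annotator rankings over three
# candidate responses, aggregated with two different rules, can disagree.
from collections import Counter

# Each tuple is one annotator's preference order, best first.
rankings = [("A", "B", "C")] * 4 + [("B", "C", "A")] * 3 + [("C", "B", "A")] * 2

def plurality_winner(rankings):
    """Pick the response ranked first by the most annotators."""
    first_choices = Counter(r[0] for r in rankings)
    return first_choices.most_common(1)[0][0]

def borda_winner(rankings):
    """Score each response by its position in every ranking (Borda count)."""
    scores = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    return scores.most_common(1)[0][0]

print(plurality_winner(rankings))  # "A" -- most first-place votes
print(borda_winner(rankings))      # "B" -- highest overall Borda score
```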


But in fact, those choices are made according to implicit assumptions a developer holds about what counts as a legitimate way to make each of the first-level decisions. These legitimacy assumptions remain implicit in the development of these approaches.


And here is a sample of the different grounds we can choose as the basis for the legitimacy of the value decisions we make. Different papers choose different grounds, and there is not much justification for why one is chosen over another.


And actually, I don't have time to present it here, but I also did some computational analysis of some fine-tuning datasets, just to look at the specific topics they cover. It seems like we might be able to bring in interesting value theories from psychology, like moral foundations theory or Schwartz's theory of basic human values.


These theories give us an interesting hierarchy of what we mean by values and of how we can encode values into models, instead of just asking people "tell us what you prefer" or "how do you think about helpfulness" and trying to bring those answers in. So I think for each of these aspects we can bring a lot of theoretical knowledge into how we think about these questions.
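As one hedged illustration of what a theory-grounded encoding could look like, the sketch below tags data with Schwartz's ten basic values using a toy keyword lexicon. The lexicon and prompts are invented for illustration; a real analysis would use validated instruments or trained classifiers rather than keyword matching.

```python
# Toy sketch (hypothetical lexicon and prompts): annotating data with
# Schwartz's ten basic human values instead of only "helpful/harmless/honest"
# labels. Keyword matching stands in for a proper measurement instrument.
SCHWARTZ_VALUES = {
    "self-direction": ["choose", "freedom", "independent"],
    "stimulation":    ["exciting", "novel", "adventure"],
    "hedonism":       ["pleasure", "enjoy", "fun"],
    "achievement":    ["success", "ambitious", "accomplish"],
    "power":          ["control", "authority", "status"],
    "security":       ["safe", "stable", "protect"],
    "conformity":     ["obey", "polite", "rules"],
    "tradition":      ["custom", "heritage", "religious"],
    "benevolence":    ["help", "care", "loyal"],
    "universalism":   ["equality", "justice", "environment"],
}

def tag_values(text: str) -> list[str]:
    """Return the Schwartz value categories whose keywords appear in the text."""
    lowered = text.lower()
    return [value for value, keywords in SCHWARTZ_VALUES.items()
            if any(word in lowered for word in keywords)]

prompts = [
    "Help me write a polite email that follows my company's rules.",
    "Is it safe to protect my savings in a single bank account?",
]
for prompt in prompts:
    print(prompt, "->", tag_values(prompt))
```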


I'm going to stop here. Thank you.