Zhaowei Zhang – Research Proposal: The Three-Layer Paradigm for Implementing Sociotechnical AI Alignment

Transcript

 Hi, I'm Zhaowei Zhang from Peking University, and today I will share my perspective on the three-layer paradigm for implementing sociotechnical AI alignment through computational methods.


First, I want to introduce the scope of sociotechnical AI alignment problems. Generally speaking, the sociotechnical approach in the field of AI alignment describes a methodology that emphasizes the dynamics within human-AI systems. But I think that may not be enough, especially because it lacks a computational component. I think this should be regarded as a new problem: designing computational algorithms to study how AI systems can be deployed in accordance with human intentions and values within an organizational system.


The current sociotechnical approach to this field usually focuses on the area marked by the letter C on the slide, where there is a gap in implementing this kind of problem with computational methods. Therefore, the problem I discuss is the area marked by the letter D, which is the intersection of the three fields.


At this point, however, this problem remains very abstract, and different focal points may require interdisciplinary participation from various efforts. In this section, I will use an example to illustrate the different layers of the problem. Imagine an AI system that accelerates chemical research for humans; let's call it Bob. The first layer is the interaction layer, which focuses on the problem of how to control AI systems in real time.


Although the model may have acquired some preliminary capabilities during the development stage, this does not fully guarantee that it can satisfy specific and complex usage in the real world. Take Bob as an example: Bob may have some knowledge of drugs, but may not fully understand the innovative usage of certain special drugs in real experiments. Additionally, in this layer we should also clarify the real intention of the users, perhaps with methods such as cooperative games.
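To make the intent-clarification idea concrete, here is a minimal sketch of Bayesian intent inference, one simple way a real-time controller could maintain a belief over what the user actually wants from repeated observations. The intent and request names are purely illustrative assumptions, not taken from the talk, and the approach stands in for the cooperative-game methods mentioned above.

```python
def infer_intent(observed_requests, likelihoods, prior):
    """Return a posterior over latent user intents given observed requests.

    likelihoods[intent][request] = P(request | intent).
    All names and probabilities below are made-up illustrations.
    """
    posterior = dict(prior)
    for req in observed_requests:
        for intent in posterior:
            # Multiply in the likelihood of this request under each intent;
            # unseen requests get a small floor probability.
            posterior[intent] *= likelihoods[intent].get(req, 1e-9)
    total = sum(posterior.values())
    return {intent: p / total for intent, p in posterior.items()}

# Toy example: Bob observes two requests about reaction yields.
prior = {"optimize_yield": 0.5, "synthesize_novel": 0.5}
likelihoods = {
    "optimize_yield": {"ask_yield": 0.8, "ask_novel_route": 0.1},
    "synthesize_novel": {"ask_yield": 0.2, "ask_novel_route": 0.7},
}
post = infer_intent(["ask_yield", "ask_yield"], likelihoods, prior)
```

After two yield-related requests, the posterior concentrates on the yield-optimization intent, which the controller could then use to steer Bob's behavior.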


After we obtain an algorithm to steer the AI assistant's behavior in real time, we come to the second layer, the scenario layer. In this layer, we mainly focus on the question of "aligning with whom." For example, if we deploy Bob in a lab and most of its computational resources come from the machines running experiments, that may create a mismatch in efficiency between the experiments and the research, which may in turn decrease the speed of the experiments. We do not want this outcome, because we want the AI system to improve the overall efficiency of our research process. So, in this layer, we may need to separately consider the goals of the entire organizational system and of each stakeholder within it, and we may use methods such as social choice theory or mechanism design to aggregate them.
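As a sketch of what such aggregation could look like, here is a Borda count, one of the simplest social-choice rules, applied to made-up stakeholder rankings in Bob's lab. The stakeholders and objectives are illustrative assumptions; a real deployment would likely need a richer mechanism.

```python
def borda_aggregate(rankings):
    """Aggregate stakeholder preference orders with a Borda count.

    rankings: list of lists, each a stakeholder's preference order
    (best first). Returns alternatives sorted by total Borda score.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, alt in enumerate(ranking):
            # An alternative ranked at position pos earns n - 1 - pos points.
            scores[alt] = scores.get(alt, 0) + (n - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical stakeholders in Bob's lab and their priorities.
rankings = [
    ["throughput", "safety", "cost"],   # principal investigator
    ["safety", "throughput", "cost"],   # experimentalist
    ["safety", "cost", "throughput"],   # Bob's operator
]
order = borda_aggregate(rankings)
```

Here the aggregate order places safety first even though no single stakeholder's ranking is adopted wholesale, which is the kind of organization-level target the scenario layer is after.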


The top layer is the macroscopic layer, which focuses on the question of how to regulate these systems. Building on the first two layers, we obtain control algorithms and aligned targets with clarity and transparency, so we can use simulations to study the externalities of the AI systems, and, with each part made clear, we can then study their social impact. Through this three-layer paradigm, I think it might be possible for us to construct an interpretable, technical alignment approach in a bottom-up fashion.
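The simulation idea can be illustrated with a toy agent-based model: Bob consumes a share of the lab's shared compute, and we measure experiment throughput as a crude proxy for an externality. Every quantity here is an invented assumption for illustration, not a model from the talk.

```python
import random

def simulate_lab(steps, ai_share, seed=0):
    """Toy simulation of a lab where an AI assistant takes `ai_share`
    of shared compute; returns average experiment throughput.
    All parameters are made up for illustration."""
    rng = random.Random(seed)
    throughput = 0.0
    for _ in range(steps):
        capacity = 1.0 - ai_share              # compute left for experiments
        throughput += capacity * rng.uniform(0.8, 1.2)  # noisy productivity
    return throughput / steps

# Compare a modest versus an aggressive resource allocation to the AI.
low = simulate_lab(1000, ai_share=0.2)
high = simulate_lab(1000, ai_share=0.8)
```

Even this trivial model surfaces the externality discussed in the scenario layer: giving the AI most of the compute depresses experimental throughput, and richer simulations of this kind could inform regulation at the macroscopic layer.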

And that's all I want to share today. If you want to know more details, you can scan the QR code on the left to read the full article, or the one on the right to contact me. Thank you, everybody.