What is 'Superalignment'?

It's probably no secret that many researchers expect a technological singularity to bring about superintelligence: AI that outperforms all human intelligence combined. This raises a question scientists have long wondered about: can humans oversee an intelligence that is better than human? The goal of superalignment is to ensure that such a superintelligence stays aligned with, and adheres to, human values and intentions. On December 14, 2023, OpenAI published a new study on exactly that.

Link to OpenAI document

Generalization of the superalignment problem

The core idea of the research was to extrapolate the future relationship between humans and superintelligence onto technology that exists today: can a smaller, less capable AI model, acting as a "weak supervisor," effectively supervise a larger, more capable model? To test this, the researchers had GPT-2 supervise GPT-4, fine-tuning the stronger model on labels produced by the weaker one. If GPT-2 could effectively elicit the capabilities of GPT-4, the hypothesis goes, then humans may analogously be able to supervise superhuman AI.
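The setup above can be sketched in miniature. This is a hedged illustration only, using scikit-learn stand-ins rather than language models: a logistic regression restricted to a few features plays the "weak supervisor" (GPT-2's role), and a larger MLP plays the "strong student" (GPT-4's role), trained only on the weak model's noisy labels.

```python
# Toy weak-to-strong supervision sketch (illustrative stand-ins, not OpenAI's code).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic classification task.
X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=15, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=0.25,
                                                  random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest,
                                                    test_size=0.5,
                                                    random_state=0)

# 1. Train the weak supervisor on ground truth, but handicap it:
#    it only sees the first 5 of 20 features (a crude capability gap).
weak = LogisticRegression(max_iter=1000).fit(X_weak[:, :5], y_weak)

# 2. The weak supervisor labels the data the strong model will learn from.
weak_labels = weak.predict(X_train[:, :5])

# 3. Train the strong model ONLY on those weak (noisy) labels,
#    never on the ground truth.
strong = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                       random_state=0).fit(X_train, weak_labels)

# 4. Compare on held-out data: how much capability did weak supervision elicit?
print(f"weak supervisor accuracy: {weak.score(X_test[:, :5], y_test):.3f}")
print(f"weak-to-strong accuracy:  {strong.score(X_test, y_test):.3f}")
```

In this toy version the interesting question is the same as in the paper: whether the strong student merely imitates its supervisor's mistakes or generalizes beyond them.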


What was the result?

In the end, the researchers found that GPT-4 supervised by GPT-2 performed at a level between GPT-3 and GPT-3.5, meaning the weak supervisor was able to elicit much of the capability of the stronger model. Of course, these results do not prove that humans will be able to supervise superhuman AI, and the researchers acknowledge this. Still, they say their findings pave the way for new methodologies and techniques for superalignment research, and they plan to conduct further experiments.
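The paper quantifies this with a metric it calls performance gap recovered (PGR): the fraction of the gap between the weak supervisor and a strong model trained on ground truth that weak-to-strong training closes. A minimal sketch, with made-up illustrative accuracies:

```python
def performance_gap_recovered(weak_acc: float,
                              weak_to_strong_acc: float,
                              strong_ceiling_acc: float) -> float:
    """Fraction of the weak->strong performance gap recovered.

    PGR = (weak_to_strong - weak) / (strong_ceiling - weak)
    PGR = 1 means weak supervision fully recovered the strong model's
    ground-truth performance; PGR = 0 means no improvement over the
    weak supervisor.
    """
    return (weak_to_strong_acc - weak_acc) / (strong_ceiling_acc - weak_acc)


# Illustrative numbers only (not figures from the paper):
pgr = performance_gap_recovered(0.60, 0.75, 0.80)
print(f"PGR = {pgr:.2f}")  # → PGR = 0.75
```

A PGR well above zero, as the researchers report, is what justifies the claim that the weak supervisor elicited "most" of the strong model's capability rather than just imposing its own ceiling.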