Sandhini Agarwal: We have a lot of next steps. I definitely think how viral ChatGPT has gotten has made a lot of issues that we knew existed really bubble up and become critical: things we want to solve as soon as possible. Like, we know the model is still very biased. And yes, ChatGPT is very good at refusing bad requests, but it's also quite easy to write prompts that make it not refuse what we wanted it to refuse.
Liam Fedus: It's been exciting to watch the diverse and creative applications from users, but we're always focused on areas to improve upon. We think that through an iterative process where we deploy, get feedback, and refine, we can produce the most aligned and capable technology. As our technology evolves, new issues inevitably emerge.
Sandhini Agarwal: In the weeks after launch, we looked at some of the most terrible examples that people had found, the worst things people were seeing in the wild. We assessed each of them and talked about how we should fix it.
Jan Leike: Sometimes it's something that's gone viral on Twitter, but we also have people who reach out quietly.
Sandhini Agarwal: A lot of the things we found were jailbreaks, which is definitely a problem we need to fix. But because users have to try these convoluted methods to get the model to say something bad, it isn't as if this was something we completely missed or that came as a big surprise to us. Still, that's something we're actively working on right now. When we find jailbreaks, we add them to our training and testing data. All of the data that we're seeing feeds into a future model.
Jan Leike: Every time we have a better model, we want to put it out and test it. We're very optimistic that some targeted adversarial training can improve the jailbreaking situation a lot. It's not clear whether these problems will go away entirely, but we think we can make a lot of the jailbreaking much more difficult. Again, it's not as if we didn't know jailbreaking was possible before the release. I think it's very difficult to really anticipate what the real safety problems with these systems are going to be once you've deployed them. So we're putting a lot of emphasis on monitoring what people are using the system for, seeing what happens, and then reacting to that. This isn't to say that we shouldn't proactively mitigate safety problems when we do anticipate them. But it is very hard to foresee everything that will actually happen when a system hits the real world.
In January, Microsoft revealed Bing Chat, a search chatbot that many assume to be a version of OpenAI's officially unannounced GPT-4. (OpenAI says: "Bing is powered by one of our next-generation models that Microsoft customized specifically for search. It incorporates advancements from ChatGPT and GPT-3.5.") The use of chatbots by tech giants with multibillion-dollar reputations to protect creates new challenges for the people tasked with building the underlying models.