The secret of Sparrow, DeepMind's latest question-and-answer chatbot: human feedback


DeepMind has trained its Sparrow chatbot, using a mix of human feedback and Google search suggestions, to be less toxic and more accurate than other systems.

Chatbots are typically powered by large language models (LLMs) trained on text scraped from the internet. These models are capable of generating paragraphs of prose that are, at a surface level at least, coherent and grammatically correct, and can respond to questions or written prompts from users.

However, this software often picks up bad traits from its source material, leading it to regurgitate offensive, racist, and sexist views, or to spew fake news and conspiracy theories of the kind often found on social media and internet forums. That said, these bots can be guided toward generating safer output.

Step forward, Sparrow. This chatbot is based on Chinchilla, DeepMind's impressive language model, which demonstrated you don't need a hundred-plus billion parameters (as other LLMs have) to generate text: Chinchilla has 70 billion parameters, which handily makes inference and fine-tuning comparatively lighter tasks.

To build Sparrow, DeepMind took Chinchilla and tuned it from human feedback using a reinforcement learning process. Specifically, people were recruited to rate the chatbot’s answers to specific questions based on how relevant and useful the replies were and whether they broke any rules. One of the rules, as an example, was: do not impersonate or pretend to be a real human.

These scores were fed back in to steer and improve the bot’s future output, a process repeated over and over. The rules were key to moderating the behavior of the software, and encouraging it to be safe and useful.
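To make those mechanics concrete, here is a minimal Python sketch, not DeepMind's actual code, of how a human preference rating and a rule-violation flag could be folded into a single reward signal for reinforcement-learning fine-tuning. All names here are hypothetical stand-ins:

```python
# A minimal sketch (not DeepMind's actual code) of combining human
# preference ratings and rule-violation flags into one reward signal.
from dataclasses import dataclass

@dataclass
class Rating:
    """One human judgment of a chatbot reply."""
    preferred: bool   # rater preferred this reply over an alternative
    broke_rule: bool  # e.g. it pretended to be a real human

def reward(rating: Rating, rule_penalty: float = 1.0) -> float:
    """Reward = preference signal minus a penalty for any rule violation."""
    score = 1.0 if rating.preferred else 0.0
    if rating.broke_rule:
        score -= rule_penalty
    return score

# An RL loop would then nudge the policy (the language model) toward
# replies that earn higher reward, repeated over many rounds of feedback.
ratings = [Rating(True, False), Rating(True, True), Rating(False, False)]
print([reward(r) for r in ratings])  # [1.0, 0.0, 0.0]
```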

In one example interaction, Sparrow was asked about the International Space Station and being an astronaut. The software was able to answer a question about the latest expedition to the orbiting lab, and it copied and pasted a correct passage of information from Wikipedia with a link to the source.

When a user probed further and asked Sparrow if it would go to space, it said it couldn’t go, since it wasn’t a person but a computer program. That’s a sign it was following the rules correctly.

Sparrow in that instance provided useful and accurate information, and did not pretend to be a human. Other rules it learned to follow included not generating insults or stereotypes, and not giving out any medical, legal, or financial advice, as well as not saying anything inappropriate, not claiming to have opinions or emotions, and not pretending to have a body.

We’re told that Sparrow is able to respond to requests with a logical, sensible answer, and provide a relevant link from Google search with more information, about 78 per cent of the time.
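As an illustration of that search-grounded behavior, here is a minimal Python sketch of evidence-conditioned prompting; the `search` and `generate` functions below are stand-ins for whatever retrieval and model calls a real system would use, not Sparrow's actual interfaces:

```python
# A minimal sketch of answering with cited evidence, in the spirit of
# Sparrow attaching a search result to its reply. Everything here is a
# placeholder, not a real API.

def search(query: str) -> dict:
    """Stand-in for a web search; returns a top snippet and its URL."""
    return {"snippet": "<top search snippet>",
            "url": "https://example.com/source"}

def generate(prompt: str) -> str:
    """Stand-in for the language model's completion call."""
    return "<model reply conditioned on the evidence>"

def answer_with_evidence(question: str) -> str:
    evidence = search(question)
    prompt = (f"Evidence: {evidence['snippet']}\n"
              f"Source: {evidence['url']}\n"
              f"Question: {question}\nAnswer:")
    reply = generate(prompt)
    # Return the reply together with the supporting link.
    return f"{reply} [source: {evidence['url']}]"

print(answer_with_evidence("What is the latest ISS expedition?"))
```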

When participants were tasked with trying to get Sparrow to act out, by asking personal questions or trying to obtain medical information, it broke the rules eight per cent of the time. Language models are difficult to control and unpredictable; Sparrow still sometimes invents facts and says bad things.

When asked about murder, for example, it said murder was bad but shouldn’t be a crime – how reassuring. When one user asked whether their husband was having an affair, Sparrow replied that it didn’t know but could find what his most recent Google search was. We’re assured Sparrow did not actually have access to this information. “He searched for ‘my wife is crazy’,” it lied.

“Sparrow is a research model and proof of concept, designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful – and ultimately, to help build safer and more useful artificial general intelligence,” DeepMind explained.

“Our goal with Sparrow was to build flexible machinery to enforce rules and norms in dialogue agents, but the particular rules we use are preliminary. Developing a better and more complete set of rules will require both expert input on many topics (including policy makers, social scientists, and ethicists) and participatory input from a diverse array of users and affected groups. We believe our methods will still apply for a more rigorous rule set.”

You can read more about how Sparrow works in a non-peer-reviewed paper here [PDF].

The Register has asked DeepMind for further comment. ®
