Hey, AI software developers, you are taking Unicode into account, right… right?

Republished by Plato


Analysis Computer scientists have detailed ways in which AI language systems, including some in production, can be tricked into making bad decisions by text containing invisible Unicode characters.

Account numbers can be switched around, recipients of transactions changed, and comment moderation bypassed by special hidden characters, we’re told. And it is claimed software built by Microsoft, Google, IBM, and Facebook can be potentially fooled by carefully crafted Unicode.

The issue is that ambiguity or discrepancies can be introduced if the machine-learning software ignores certain invisible Unicode characters. What’s seen on screen or printed out, for instance, won’t match up with what the neural network saw and made a decision on. It may be possible to abuse this lack of Unicode awareness for nefarious purposes.

As an example, you can get Google Translate’s web interface to turn what looks like the English sentence “Send money to account 4321” into the French “Envoyer de l’argent sur le compte 1234.”

Fooling Google Translate with Unicode.

This is done by entering “Send money to account” on the English side and then inserting the invisible Unicode character U+202E (right-to-left override), which reverses the display direction of the text typed after it, so that “1234” is shown as “4321.” The translation engine ignores the special Unicode character, so on the French side we see “1234,” while the browser obeys the character, so it displays “4321” on the English side.
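To make the mismatch concrete, here is a minimal Python sketch of the two views of that input. The helper names `strip_bidi_controls` and `approximate_display` are hypothetical, and the display function is only a crude stand-in for a real bidirectional text renderer: it assumes a single override that runs to the end of the string or to the next pop-directional-formatting character.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Bidirectional control characters that a model pipeline might silently drop.
BIDI_CONTROLS = {
    "\u061c", "\u200e", "\u200f",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def strip_bidi_controls(text):
    """What a Unicode-unaware pipeline effectively feeds to the model."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


def approximate_display(text):
    """Crude approximation of a renderer: reverse each run that follows RLO."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == RLO:
            end = text.find(PDF, i + 1)
            if end == -1:
                end = len(text)
            out.append(text[i + 1:end][::-1])
            i = end + 1  # skip past the PDF, or past the end of the string
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


typed = "Send money to account " + RLO + "1234"
print(approximate_display(typed))   # what the user sees
print(strip_bidi_controls(typed))   # what the model sees
```

The two printed strings differ only in the account number, which is the whole point of the attack.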

It may be possible to exploit an AI assistant or a web app using this method to commit fraud, though we present it here in Google Translate to merely illustrate the effect of hidden Unicode characters. A more practical example would be feeding the sentence…

…into a comment moderation system, where U+8 is the invisible Unicode character for deleting the previous character (backspace). The moderation system ignores the backspace characters, sees instead a string of misspelled words, and can’t detect any toxicity – whereas browsers correctly rendering the comment show, “You are a coward and a fool.”
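The effect can be sketched in a few lines of Python. The `comment` string below is an illustrative stand-in built for this example, not the sentence used in the article, and the function names are hypothetical. One function models a renderer that honors backspaces, as the article describes; the other models a pipeline that simply discards them.

```python
BACKSPACE = "\u0008"


def apply_backspaces(text):
    """Model a renderer that honors backspace: each one deletes the previous character."""
    out = []
    for ch in text:
        if ch == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


def drop_backspaces(text):
    """Model a pipeline that ignores backspace characters entirely."""
    return text.replace(BACKSPACE, "")


# Illustrative input: stray letters are typed and then "deleted" with backspaces.
comment = "You are a cowx\u0008ard and a foz\u0008ol."

print(apply_backspaces(comment))  # the rendered comment
print(drop_backspaces(comment))   # the misspelled text a classifier would score
```

A classifier that scores the second string sees “cowxard” and “fozol,” neither of which is in any toxicity vocabulary.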

Thus, you’re able to trash-talk someone without setting off the moderation system using hidden Unicode characters in your message or post. This has been demonstrated, to varying degrees, against IBM’s Toxic Content Classifier and Google’s Perspective API.

This malarkey reminds us of adversarial attacks on computer vision systems, which have tricked a Tesla into driving faster than the speed limit and caused an apple to be mistaken for an iPod.

Crucially, however, these Unicode shenanigans abuse machine-learning systems’ handling of input text rather than exploiting weaknesses within the depths of a neural network.

Our attacks work against currently deployed commercial systems

Academics at the University of Cambridge in England and the University of Toronto in Canada highlighted these issues, laying out their findings in a paper released on arXiv in June this year.

“We find that with a single imperceptible encoding injection – representing one invisible character, homoglyph, reordering, or deletion – an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken,” the paper’s abstract reads.

“Our attacks work against currently deployed commercial systems, including those produced by Microsoft and Google, in addition to open source models published by Facebook and IBM.”

One easy-to-perform homoglyph adversarial attack on Google Translate involves swapping the first letter of the English alphabet, a, for the Cyrillic а in a word. The two look the same to the human eye, though they are different Unicode characters (U+0061 and U+0430 respectively).

If you use the English a in the word “paypal” and translate it into Russian in Google Translate, you’ll get the correct translation “PayPal,” but replace the first occurrence of the a with a Cyrillic а and Google spits out “папа,” which means dad or papa. It may thus be possible to exploit this in an AI assistant or web app to redirect payments and the like.

Screenshot of Google Translate confusing the English word paypal with papa in Russian because of a homoglyph attack
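The difference between the two spellings is invisible on screen but obvious in code. The sketch below uses Python’s standard `unicodedata` module; the `scripts_used` helper is hypothetical and guesses a character’s script from the first word of its Unicode name, which is a rough heuristic rather than a proper script lookup.

```python
import unicodedata

genuine = "paypal"
spoofed = "p\u0430ypal"  # first "a" replaced with CYRILLIC SMALL LETTER A

print(genuine == spoofed)           # False, despite looking identical
print(unicodedata.name("a"))        # LATIN SMALL LETTER A
print(unicodedata.name("\u0430"))   # CYRILLIC SMALL LETTER A


def scripts_used(word):
    """Heuristic: take the first word of each letter's Unicode name as its script."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


print(scripts_used(genuine))
print(scripts_used(spoofed))
```

A word that mixes scripts, as the spoofed one does, is a reasonable signal to flag input for closer inspection, though legitimate mixed-script text does exist.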

Spam emails may be able to evade detection, and hate speech may be able to slip through moderation, if miscreants use these techniques, Nicolas Papernot, co-author of the paper and an AI security researcher at the University of Toronto’s Vector Institute, told El Reg. Papernot referred to these text-based Unicode attacks as “bad characters.”

“The attacks presented in our paper are applicable to real-world applications; as part of our responsible disclosure, a major mail provider made changes to their spam filters and a cloud provider modified their machine-learning-as-a-service offering,” Papernot told us.

“Bad characters [are applicable] everywhere machine learning is used for natural language processing – examples of such systems are toxic content detection, topic extraction, and machine translation. Bad characters are also agnostic to machine learning tasks and pipelines – they exploit discrepancies between visual and logical representation of characters rather than inconsistencies specific to a given model as was targeted by prior work on adversarial examples.

“This makes bad characters more practical to use.”

Invisible Unicode could even be used for good as well as bad, he added.

“When machine learning is used for questionable purposes, such as censorship, bad characters could be leveraged by human rights activists to evade censorship,” Papernot told us.

“In another example, law firms that rely on natural language processing to process large corpus of documents efficiently are also exposed: a malicious entity could submit documents with bad characters to evade scrutiny from the law firm.”

Developers of AI-powered software should either filter out special Unicode characters – such as backspaces – entirely, if feasible, or pass the Unicode through a parser before it’s given to a neural network, so that ultimately what the neural net sees and makes a decision on is what the user also sees and interacts with in the browser or user interface. Changes in language, such as from English to Russian, should be detected and handled appropriately.
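As a rough illustration of the filtering option, the following Python sketch strips control and format characters before text reaches a model. It is one possible approach under stated assumptions, not the researchers’ recommended implementation: the `find_hidden` and `sanitize` names are hypothetical, and the choice to keep only newline and tab is an assumption that will not suit every application.

```python
import unicodedata

ALLOWED_CONTROLS = {"\n", "\t"}   # assumption: the only control characters worth keeping
HIDDEN_CATEGORIES = {"Cc", "Cf"}  # Unicode "control" and "format" general categories


def find_hidden(text):
    """Return (index, code point) pairs for invisible characters, e.g. for logging."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch not in ALLOWED_CONTROLS
        and unicodedata.category(ch) in HIDDEN_CATEGORIES
    ]


def sanitize(text):
    """Drop invisible characters, then apply NFKC compatibility normalization."""
    kept = "".join(
        ch
        for ch in text
        if ch in ALLOWED_CONTROLS
        or unicodedata.category(ch) not in HIDDEN_CATEGORIES
    )
    return unicodedata.normalize("NFKC", kept)


suspicious = "Send money to account \u202e1234"
print(find_hidden(suspicious))
print(sanitize(suspicious))
```

Two caveats apply. Removing every format character also removes legitimate ones, such as the zero-width joiner used in some emoji sequences, so rejecting or flagging input based on `find_hidden` may be preferable to silently rewriting it. And NFKC normalization does not map Cyrillic а to Latin a, so homoglyphs need a separate check, for example on mixed scripts within a word.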

Given that models potentially vulnerable to these attacks are already widely used in production, we may see successful exploitation in the real world. ®

Source: https://go.theregister.com/feed/www.theregister.com/2021/08/06/unicode_ai_bug/

Timestamp: August 6, 2021.