Generative AI has burst onto the scene with a bang, and while image generation technology is not perfect yet, it is getting more and more sophisticated. Due to the way the technology works, a model needs to be trained on existing art, and most of the models on the market right now have been trained on artwork available on the internet, whether or not it was in the public domain. Because of this, multiple lawsuits have been filed against AI companies by artists.
Unfortunately this has not stopped AI models from using these images as training data, so while the question is being debated in the courts, researchers over at the University of Chicago have created a new tool called Nightshade that allows artists to poison the training data for AI models. This functionality will be an optional setting in their prior product Glaze, which cloaks digital artwork by altering its pixels to confuse AI models about its style. Nightshade goes one step further by making the AI learn the wrong names for the objects in a given image.
The paper describes Nightshade as an optimized prompt-specific poisoning attack. Nightshade uses multiple optimization techniques (including targeted adversarial perturbations) to generate stealthy and highly effective poison samples, with four observable benefits (a rough sketch of the perturbation idea follows the list):
- Nightshade poison samples are benign images shifted in the feature space. Thus a Nightshade sample for the prompt “castle” still looks like a castle to the human eye, but teaches the model to produce images of an old truck.
- Nightshade samples produce stronger poisoning effects, enabling highly successful poisoning attacks with very few (e.g., 100) samples.
- Nightshade samples produce poisoning effects that effectively “bleed-through” to related concepts, and thus cannot be circumvented by prompt replacement, e.g., Nightshade samples poisoning “fantasy art” also affect “dragon” and “Michael Whelan” (a well-known fantasy and SciFi artist).
- We demonstrate that when multiple concepts are poisoned by Nightshade, the attacks remain successful when these concepts appear in a single prompt, and actually stack with cumulative effect. Furthermore, when many Nightshade attacks target different prompts on a single model (e.g., 250 attacks on SDXL), general features in the model become corrupted, and the model’s image generation function collapses.
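To make the "benign image shifted in feature space" idea concrete, here is a minimal sketch of a targeted feature-matching perturbation. This is not the authors' code: it uses a torchvision ResNet-18 as a stand-in feature extractor (Nightshade works against the text-to-image model's own encoder), and the perturbation budget and step counts are illustrative values, not the paper's.

```python
# Sketch: nudge a benign image ("castle") so its features match a target
# image ("truck") while keeping the pixel changes small enough that it
# still looks like a castle to a human.
import torch
import torchvision.models as models

# Stand-in feature extractor: ResNet-18 with its classification head removed.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()
for p in encoder.parameters():
    p.requires_grad_(False)

def nightshade_style_perturb(benign, target, eps=8 / 255, steps=200, lr=0.01):
    """Return `benign` plus a small perturbation whose features resemble `target`."""
    delta = torch.zeros_like(benign, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    target_feat = encoder(target).detach()
    for _ in range(steps):
        poisoned = (benign + delta).clamp(0, 1)
        # Pull the poisoned image toward the target concept in feature space.
        loss = torch.nn.functional.mse_loss(encoder(poisoned), target_feat)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Project back into the L-infinity budget so the image stays visually benign.
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return (benign + delta).clamp(0, 1).detach()

# Example with random tensors standing in for real images (N, C, H, W in [0, 1]).
benign_castle = torch.rand(1, 3, 224, 224)
target_truck = torch.rand(1, 3, 224, 224)
poisoned_castle = nightshade_style_perturb(benign_castle, target_truck)
```

The key design point is that the optimization only touches the pixels within a tight budget, so a human still labels the image "castle", while any model that scrapes it learns features that belong to "truck".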
In their tests the researchers poisoned images of dogs with pixel-level information that made them appear to an AI model as cats. After sampling and learning from just 50 poisoned image samples, the AI began generating images of dogs with strange legs and unsettling appearances. After 100 poison samples, it reliably generated a cat when a user asked for a dog. After 300, any request for a dog returned a near-perfect-looking cat.
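For context on how few samples that is, here is a rough sketch of how such poisoned pairs would end up mixed into an otherwise clean fine-tuning set. This is an assumed workflow, not the paper's code; the file names, prompt text, and `make_poison` helper are hypothetical placeholders, and only the dose levels (50, 100, 300) come from the experiment described above.

```python
# Sketch: mix a handful of poisoned (prompt, image) pairs into a scraped dataset.
import random

def build_training_set(clean_pairs, poison_count, poison_prompt="a photo of a dog",
                       make_poison=None):
    """Return (prompt, image_path) pairs containing `poison_count` poisoned entries.

    clean_pairs : list of (prompt, image_path) tuples scraped as usual.
    make_poison : hypothetical callable standing in for the Nightshade
                  perturbation step (returns a cat-feature-shifted dog image).
    """
    poisoned = [(poison_prompt, make_poison(i)) for i in range(poison_count)]
    mixed = clean_pairs + poisoned
    random.shuffle(mixed)  # poison samples look like ordinary scraped data
    return mixed

# Dose levels from the experiment: 50 distorts dogs, 100 flips dog -> cat,
# 300 makes the flip near-perfect.
for dose in (50, 100, 300):
    dataset = build_training_set(
        clean_pairs=[("a photo of a dog", f"clean_{i}.png") for i in range(10_000)],
        poison_count=dose,
        make_poison=lambda i: f"poison_{i}.png",  # hypothetical perturbation output
    )
```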
Obviously this is not a permanent solution, as the companies training AI models will start working on countermeasures immediately, and then the whack-a-mole process of fixes and counter-fixes to one-up each other will continue (similar to how virus and anti-virus programs have been at it) for the foreseeable future.
Full paper: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models (PDF)
Source: Venturebeat: Meet Nightshade, the new tool allowing artists to ‘poison’ AI models
– Suramya