Suramya's Blog : Welcome to my crazy life…

April 21, 2024

Crescendo Method enables Jailbreaking of LLMs Using ‘Benign’ Prompts

LLMs are becoming more and more popular across all industries and that creates a new attack surface for attackers to target to misuse for malicious purposes. To prevent this LLM models have multiple layers of defenses (with more being created every day), one of the layers attempts to limit the capability of the LLM to what the developer intended. For example, a LLM running a chat service for software support would be limited to answer questions about software identified by the developer. Attackers attempt to bypass these safeguards with the intent to achieve unauthorized actions or “jailbreak” the LLM. Depending on the LLM, this can be easy or complicated.

Earlier this month Microsoft published a paper showcasing the “Crescendo” LLM jailbreak method called “Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack“. Using this method a successful attack could usually be completed in a chain of fewer than 10 interaction turns.

Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as “jailbreaks”, seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model’s replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we introduce Crescendomation, a tool that automates the Crescendo attack, and our evaluation showcases its effectiveness against state-of-the-art models.

Microsoft has also published a Blog post that goes over this attack and potential mitigation steps that can be implemented along with details on new tools developed to counter this attack using their “AI Watchdog” and “AI Spotlight” features. The tools attempt to identify adversarial content in both input and outputs to prevent prompt injection attacks.

SCM Magazine has a good writeup on the attack and the defenses against it.

– Suramya

Source: Slashdot: ‘Crescendo’ Method Can Jailbreak LLMs Using Seemingly Benign Prompts

March 22, 2024

Please don’t use AI to identify edible mushrooms or anything else for that matter

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 8:16 PM

AI proponents claim to solve all problems just with the addition of their magical-AI pixie dust. But that claim doesn’t hold up in a majority of the cases when dealing with real world situations. The latest example of this is highlighted in Citizen.org’s report “Mushrooming Risk: Unreliable A.I. Tools Generate Mushroom Misinformation” published earlier this week where they found that: “Emerging A.I. technologies are being deployed to help beginner foragers find edible wild mushrooms. Distinguishing edible mushrooms from toxic mushrooms in the wild is a high-risk activity that requires real-world skills that current A.I. systems cannot reliably emulate. Individuals relying solely on A.I. technology for mushroom identification have been severely sickened and hospitalized after consuming wild mushrooms that A.I. systems misidentified as edible”

Some risk comes from the seeming simplicity of using identification apps. Automation bias – the human tendency to place excess faith and trust in decisions made by machines – must be resisted. Because of how these apps are marketed, users may understandably believe that identifying a mushroom is as simple as snapping a photo of the mushroom and allowing the A.I. to deliver a reliable identification.

To identify a mushroom with confidence, a basic understanding of its anatomy is required – an understanding that many casual users lack. A photo of the top of a mushroom’s cap, for example, will almost never provide enough information to identify its species with any degree of confidence. Physical features on the underside of the cap, the cap margin, the stipe (stem), and the base of the stipe all should be taken into consideration, as should the mushroom’s substrate (i.e., whether it’s growing on the ground or on wood, and what species of wood). Some mushrooms bruise when cut, such as from yellow to blue, and whether they bruise and how quickly are additional identifying characteristics. Smell also can be a key identifying feature – and, for experienced identifiers, so can taste (followed by immediately spitting out the tasted portion). A.I. species-identification tools are not capable of taking any factors into consideration aside from the mushroom’s immediate appearance.

Australian poison researchers tested three applications that are often used by foragers to identify wild mushrooms and they found the following:

  • The best-performing app (Picture Mushroom) provided accurate identifications from digital photos less than half (49%) of the time, and identified toxic mushrooms 44% of the time;
  • In terms of which app was most successful at identifying the death cap (Amanita phalloides), Mushroom Identificator performed the best, identifying 67% of the specimens, compared to Picture Mushroom (60%) and iNaturalist (27%);
  • In some of the apps’ misidentification errors, toxic mushrooms were misidentified as edible mushrooms;

A 49% accuracy might sound ok for a first run of the AI datamodel which has no real world impact, but when you take into account that there is a 51% chance that the app is incorrectly identifying toxic mushrooms as edible mushrooms which can (and in fact has resulted) in deaths, you realize that the Apps are actively dangerous and about as accurate as flipping a coin.

My request to everyone trying out AI applications is to use that for reference only and don’t rely on them for expert opinion but instead leverage human expertise in situations where there is a realworld impact.

Source: Washington Post: Using AI to spot edible mushrooms could kill you

– Suramya

March 19, 2024

Is it possible to untrain a LLM?

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 6:45 PM

We are seeing a lot of cases (I am being polite) where LLM’s are trained on copyright protected data/images or has been trained with incorrect data. Currently as far as I know there is no easy way to fix this other than to train the entire model again from scratch excluding the problematic dataset. This is obviously not feasible and scalable at all.

Another sticky point is the Right to be forgotten which is a part of the GDPR and a few other countries. It requires systems to remove private information about a person from Internet searches and other directories under some circumstances. With LLM’s starting to infest search engines it means that in order to be compliant they need to be able to remove information from the model as well.

So it got me thinking if it would be possible to create an algorithm/process that allows us to untrain an LLM. A search across academic papers and the Internet shows that it is an emerging field of research and as of now mostly theoretical. Primarily because of the way the models work (or are supposed to work) we also claim that the models do not contain any information about a specific image/text by an artist.

Examples of ongoing Research on Transformer editing are Locating and Editing Factual Associations in GPT and Mass-Editing Memory in a Transformer. I did try reading though the papers and understood parts of them, the others kind of went over my head but still this is a research field I will be keeping a close eye on as it will have a large impact of the future of LLM’s and their usefulness.

– Suramya

March 13, 2024

Computers/Technology is not the cure to the worlds problems and its time we stop pretending otherwise

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 11:56 PM

As a software developer we tend to be pretty confident that software or algorithms can solve all the problems in the world because we are using ‘technology’/AI/LLM/Blockchain or whatever the buzzword of the day is to solve a problem. This is an issue because when we look at a problem from an outsider’s perspective it looks fairly simple because we don’t know enough to realize the complexity. Or put another way we don’t know enough to know what we don’t know (the unknown unknowns). As always XKCD has a comic that talks about this:


Megan: Our field has been struggling with this problem for years.
Cueball: Struggle no more! I’m here to solve it with algorithms!
Six months later:
Cueball: Wow, this problem is really hard.
Megan: You don’t say.

To be fair, computers have solved a lot of problems in the world and have had a tremendous impact on it, but that doesn’t mean that they are the key solving for every problem. There is a saying that I love quoting “When all you have is a hammer, everything looks like a nail” and as a developer/techie a lot of us tend to forget this. We look at a problem and think that its an easily solved problem and in most cases that is true during the testing in controlled situations. Once you try the same in the real world things turn out a lot more differently. For example, in a 2020 study, a deep learning model was shown to be more accurate in predicting whether bladder cancer has spread in a patient and other models also showed similar results. Unfortunately, when the model was implemented in the real world the results where a lot more ambiguous and not as rosy as we thought.

The major problem we have right now is that AI can give us information at sounds authoritative and accurate especially if it is about a topic you know nothing about because you don’t quite know well enough to identify the nonsense it sprouts. This is similar to how movies and TV shows portray technology or medical science, they will bombard us with buzz words and if you know nothing about the topic it sounds impressive otherwise you are either completely confused or rolling on the floor laughing.

We need to actually look at the problem, understand it and then start implementing a solution. Move fast and break things is not a feasible working model unless you just want to create a buzz so that your technology/company gets acquired and then it is not your problem to get it to work.

– Suramya

March 7, 2024

Cloudflare announces Firewall for LLMs to protect them

Filed under: Artificial Intelligence,Computer Security,My Thoughts — Suramya @ 10:52 PM

As is always the case when the attackers invent technology / systems to attack a system the defenders will immediately come up with a technology to protect (might not always be great protection at the beginning). Yesterday I posted about Researchers demo the first worm that spreads through LLM prompt injection and today while going through my feeds I saw the news that earlier this week cloudflare announced a Firewall for AI . Initially when I read the headline I thought it was yet another group of people who are claiming to have created a ‘perfect firewall’ using AI. Thankfully that was not the case and in this instance it looks like an interesting application that will probably become as common as the regular firewall.

What this system does is quite simple, it is setup in front of a LLM so that all interactions with the LLM goes through the firewall and every request with an LLM prompt is scanned for patterns and signatures of possible attacks. As per their blog post attacks like Prompt Injection, Model Denial of Service, and Sensitive Information Disclosure can be mitigated by adopting a proxy security solution like Cloudflare Firewall for AI.

Firewall for AI is an advanced Web Application Firewall (WAF) specifically tailored for applications using LLMs. It will comprise a set of tools that can be deployed in front of applications to detect vulnerabilities and provide visibility to model owners. The tool kit will include products that are already part of WAF, such as Rate Limiting and Sensitive Data Detection, and a new protection layer which is currently under development. This new validation analyzes the prompt submitted by the end user to identify attempts to exploit the model to extract data and other abuse attempts. Leveraging the size of Cloudflare network, Firewall for AI runs as close to the user as possible, allowing us to identify attacks early and protect both end user and models from abuses and attacks.

OWASP has published their Top 10 for Large Language Model Applications, which is a fantastic read and a good overview of the security risks targeting LLM’s. As per cloudfare this firewall mitigates some of the risks highlighted in OWASP for LLM’s. I would suggest taking the announcement with a grain of salt till we have independent validation of the claims. That being said it is def a step in the correct direction though.

– Suramya

Source: Hacker News: Cloudflare Announces Firewall for AI

March 6, 2024

Researchers demo the first worm that spreads through LLM prompt injection

Filed under: Artificial Intelligence,Computer Security,Computer Software — Suramya @ 10:17 PM

In the past year we have seen an uptick in the tech industry looking towards embedding LLM (Large Language Models) or AI as they are being pitched to the world in all possible places. Windows 11 now has built in Copilot that is extremely hard to disable. Email systems are using LLM’s to get additional details/information using the data from the email to add context etc. This creates new attack surfaces that attackers can target and we have seen instances where attackers have used prompt injection to gain access to data or systems that were restricted.

Building on top of that researchers have now created (and demo’d) the first worm that spreads through prompt injection. This is breakthrough work similar to how the Morris Worm was in the late 80’s. Basically, researchers created an email which has an adversarial prompt embedded in it. This prompt is then ingested by an LLM (using Retrieval-Augmented Generation which allows it to enhance the reliability of the LLM by fetching data from external sources when the email is processed by the LLM) where it jailbreaks the GenAI service and can steal data from the emails (or do whatever else the attacker wants such as changing email text, removing data etc). In addition the prompt also has the ability to make the email assistant forward the email with the malicious prompt to other email addresses allowing it to spread. The researchers have christened their worm as Morris II giving homage to the first email worm.

Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?

This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.

This is pretty fascinating work and I think that this kind of attack will start becoming more common as the LLM usage goes up. The research paper is available at: ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.

– Suramya

January 20, 2024

NFTs, AI and the sad state of Thought Leaders/Tech Influencer’s

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 11:59 PM

NFTs became such a big thing in last few years, going from millions of dollars to 95% of them being worth $0 in Sept 2023. The whole concept of a JPG of an ugly drawing never made sense to me but you won’t believe the no of people who tried to convince me otherwise.

Today I was watching Lift on Netflix and the first 20 minutes are this group of thieves stealing an NFT Oceans 11 style. It is one of the most ridiculous things that I have seen that someone would spend so much effort showing a NFT heist but the movie was scripted in 2021 when the NFT craze was starting to become insane. Haven’t finished the full movie yet and I doubt I will ever do so as it is very slow/corny and has poor acting and script (as if the whole NFT heist thing didn’t give that away).

It is interesting that all the folks who were shilling NFTs a few years ago have ‘pivoted’ to AI now. If you read the posts from Infuencers you will think that AI is the best thing since sliced bread. Saw the following in my feed and I did question the sanity of the person posting such ‘thought leadership’.

I can suggest an equation that has the potential to impact the future: E=mc?+ Al This equation combines Einstein's famous equation E=mc?, which relates energy (E) to mass (m) and the speed of light (c), with the addition of Al (Artificial Intelligence). By including Al in the equation, it symbolizes the increasing role of artificial intelligence in shaping and transforming our future. This equation highlights the potential for Al to unlock new forms of energy, enhance scientific discoveries, and revolutionize various fields such as healthcare, transportation, and technology.
Technology Consultant’s thoughts on AI

Each influencer keeps posting things like this to make them sound more technical and forward thinking but if you start digging into it then you will find out that they are just regurgitating a word salad that really doesn’t mean much but sounds impressive. Actually now that I think about it, they are just like an AI bot that sounds impressive if you are not experienced in the that area but when you start digging into it, you find out that there is no substance to what they are stating.

The current state of AI is basically a massive hype machine which is trying to get folks to buy things or invest in companies because they are working creating an intelligent entity. Whereas in reality, what we have today is a really good Auto Complete or in some cases really nice Machine learning system. It does some things quite well but is nowhere close to being “Intelligent”. What we have now is something that is really good at extrapolating and guessing which can reduce manual efforts in a lot of things but it is not the cure all that everyone is making it out to be.

For example, Github Copilot automates a lot of grunt work while coding allowing users to reduce the time spent of writing code, but in a recent study it was found that Users Write More Insecure Code with AI Assistants. Now this might change in the future with advances in compute power, data and something that we haven’t even thought of yet. But the problem is that in the short term these can cause immense harm and problems.

– Suramya

December 11, 2023

ChatGPT is changing how we search for information and that is not good as it hallucinates often

Filed under: Artificial Intelligence,My Thoughts — Suramya @ 8:23 PM

Much as I dislike it, ChatGPT has changed the way we do things and look for things. Initially I thought that it was a fad/phase and when people would realize that it gives incorrect information mixed with correct info they would stop using it but that doesn’t seem to be the case. A couple of days ago we were having a discussion on worms and how much protein they have in them in a group chat of friends and Surabhi was tried to gross Anil out, instead of getting grossed out Anil asked for a recipe he could use to cook the worms. Immediately Surabhi went on ChatGPT and asked it for a recipe but it refused to give it stating that it is against their policies and might be disturbing to see. Before ChatGPT she would have searched on Google for the recipe and gotten it (I did that in a few mins after I saw her comment). The a few days later another friend commented similarly where they couldn’t find something on ChatGPT so decided to give up instead of searching via a search engine.

Other people have stated that they use it for tone policing of emails to ensure they are professional. Personally I would recommend The Judge for that as I had stated in my review of their site earlier this year.

The problem I have with ChatGPT is highlighted by the following quote shared by @finestructure (Sven A. Schmidt) “Whether it did it correctly I don’t have the expertise to evaluate but it was very impressive sounding.”. The way GPT works it gives information in a very well crafted manner (and that is super impressive) but the fact that it can have errors or it hallucinates from time to time makes it useless for detail oriented work for me. If I have to verify the output generated by ChatGPT using a browser then I might as well use the browser directly and skip a step.

I have screenshots of so many examples of how ChatGPT/Bing/Bard hallucinate and give wrong information. I think I should do a follow up post with those screenshots. (I have them saved in a folder titled AI nonsense 🙂 ).

– Suramya

December 5, 2023

Near real-time Generative AI art is now possible using LCM-LoRA model

Filed under: Artificial Intelligence,My Thoughts — Suramya @ 6:21 PM

There are a lot of advancements happening in Generative AI and while I don’t agree that we have created intelligence (at least not yet) the advances in the Computer generated art are phenomenal. The most recent one is LCM-LoRA, short for “Latent Consistency Model- Low-Rank Adaptation” developed by researchers at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University in China. Their paper LCM-LORA: A Universal Stable-Diffusion Acceleration Module (PDF) has been published on Arxiv.org last week.

This model allows a system to generate an image given a text prompt in near real-time instead of having to wait a few seconds which was the case earlier. So you can modify the prompt as you go and get immediate feedback which can then be used to modify a prompt. You can test it out at Fal.ai

Latent Consistency Models (LCMs) (Luo et al., 2023) have achieved impressive performance in accelerating text-to-image generative tasks, producing high quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ∼32 A100 GPU training hours. This report further extends LCMs’ potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5 (Rombach et al., 2022), SSD-1B (Segmind., 2023), and SDXL (Podell et al., 2023), we have expanded LCM’s scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs with-out training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM (Song et al., 2020), DPM-Solver (Lu et al., 2022a;b), LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.

The technique works not only for 2D images, but 3D assets as well, meaning artists could theoretically quickly create immersive environments instantly for use in mixed reality (AR/VR/XR), computer and video games, and other experiences. I did try going over the paper but a majority of it went over my head. That being said it is fun playing with this tech.

The model doesn’t address the existing issues with AI Art such as how should the artist’s whose art was used as part of the training data sets be compensated, or the issue of copyright infringement as the art is not public art. We also need to start thinking about who would own the copyright to the art generated using AI. There are a few open court cases on this topic but as of now the courts have refused to give any copyright protection to art generated by AI which would make it a non-starter for use in any commercial project such as a movie or game etc.

– Suramya

Source: Realtime generative AI art is here thanks to LCM-LoRA

October 28, 2023

New tool called Nightshade allows artists to ‘poison’ AI models

Filed under: Artificial Intelligence,Tech Related — Suramya @ 12:20 AM

Generative AI has burst into the scene with a bang and while the Image generation tech is not perfect yet it is getting more and more sophisticated. Due to the way the tech works, the model needs to be trained on existing art and most of the models in the market right now have been trained on artwork available on the internet whether or not it was in the public domain. Because of this multiple lawsuits have been filed against AI companies by artists.

Unfortunately this has not stopped AI models from using these images as training data, so while the question is being debated in the courts, the researchers over at University of Chicago have created a new tool called Nightshade that allows artists to poison the training data for AI models. This functionality will be an optional setting in the their prior product Glaze, which cloak’s digital artwork and alter its pixels to confuse AI models about its style. Nightshade goes one step further by making the AI learn the wrong names for objects etc in a given image.

Optimized prompt-specific poisoning attack we call Nightshade. Nightshade uses multiple optimization techniques (including targeted adversarial perturbations) to generate stealthy and highly effective poison samples, with four observable benefits.

  • Nightshade poison samples are benign images shifted in the feature space. Thus a Nightshade sample for the prompt “castle” still looks like a castle to the human eye, but teaches the model to produce images of an old truck.
  • Nightshade samples produce stronger poisoning effects, enabling highly successful poisoning attacks with very few (e.g., 100) samples.
  • Nightshade samples produce poisoning effects that effectively “bleed-through” to related concepts, and thus cannot be circumvented by prompt replacement, e.g., Nightshade samples poisoning “fantasy art” also affect “dragon” and “Michael Whelan” (a well-known fantasy and SciFi artist).
  • We demonstrate that when multiple concepts are poisoned by Nightshade, the attacks remain successful when these concepts appear in a single prompt, and actually stack with cumulative effect. Furthermore, when many Nightshade attacks target different prompts on a single model (e.g., 250 attacks on SDXL), general features in the model become corrupted, and the model’s image generation function collapses.

In their tests the researchers poisoned images of dogs to include information in the pixels that made it appear to an AI model as a cat. After sampling and learning from just 50 poisoned image samples, the AI began generating images of dogs with strange legs and unsettling appearances. After 100 poison samples, it reliably generated a cat when asked by a user for a dog. After 300, any request for a dog returned a near perfect looking cat.

Obviously this is not a permanent solution as the AI training models will start working on fixing this issue immediately and then the whack-a-mole process of fixes/updates to one up will continue (similar to how virus & anti-virus programs have been at it) for the foreseeable future.

Full paper: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models (PDF)
Source: Venturebeat: Meet Nightshade, the new tool allowing artists to ‘poison’ AI models

– Suramya

Older Posts »

Powered by WordPress