Suramya's Blog : Welcome to my crazy life…

April 21, 2024

Crescendo Method enables Jailbreaking of LLMs Using ‘Benign’ Prompts

LLMs are becoming more and more popular across all industries and that creates a new attack surface for attackers to target to misuse for malicious purposes. To prevent this LLM models have multiple layers of defenses (with more being created every day), one of the layers attempts to limit the capability of the LLM to what the developer intended. For example, a LLM running a chat service for software support would be limited to answer questions about software identified by the developer. Attackers attempt to bypass these safeguards with the intent to achieve unauthorized actions or “jailbreak” the LLM. Depending on the LLM, this can be easy or complicated.

Earlier this month Microsoft published a paper showcasing the “Crescendo” LLM jailbreak method called “Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack“. Using this method a successful attack could usually be completed in a chain of fewer than 10 interaction turns.

Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as “jailbreaks”, seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model’s replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we introduce Crescendomation, a tool that automates the Crescendo attack, and our evaluation showcases its effectiveness against state-of-the-art models.

Microsoft has also published a Blog post that goes over this attack and potential mitigation steps that can be implemented along with details on new tools developed to counter this attack using their “AI Watchdog” and “AI Spotlight” features. The tools attempt to identify adversarial content in both input and outputs to prevent prompt injection attacks.

SCM Magazine has a good writeup on the attack and the defenses against it.

– Suramya

Source: Slashdot: ‘Crescendo’ Method Can Jailbreak LLMs Using Seemingly Benign Prompts

April 2, 2024

Soon it will be possible to update Apple Devices while still in the box

Filed under: Computer Security,My Thoughts — Suramya @ 11:43 PM

Apple has come up with an interesting new technology that allows stores to install the latest updates to an iPhone without removing it from the box. If the technology works (and it looks like it does) it will remove one of the major hassles of buying a new phone or device which is to install the latest updates and patches on the phone.

This device can wirelessly turn on the iPhone, update its software and then power it back down. We still don’t have a full explanation on how it works but based on at a guess, it leverages the fact that the NFC chip in the phone can work potentially work even when the phone is switched off (it already works with a low battery). Placing the phone in the device would potentially trigger the NFC chip which would then start the phone in a special mode that allows it to connect to the WiFi and download the updates. Post completion the system would shutdown the phone and it would be ready to use.

In theory this sounds like a great enhancement but I fear that unless the system has sufficient controls and checks around it it will open up a whole new attack vector. Previously, there have been attacks where Nation States or Criminal organizations would intercept hardware being delivered to a target open the package, make changes and then reseal and send it on to the target. This is a sure shot way of ensuring that a device is compromised before it reaches the target, however it requires a lot of resources and manual effort to implement and there is a risk of exposure since multiple folks are involved. With this new update option an attacker just has to have physical access to the device and can be done by simply taking the packaged device and putting it in the updater for a little while.

This assumes that the security checks and authentication built around the process can be bypassed. That being said, once the tech is live there are going to be a lot of very smart people trying to bypass the checks to be able to update the phone. Keep in mind that there is nothing stopping anyone from updating the phone using this method even after someone is actively using it.

Source: arstechnica

March 7, 2024

Cloudflare announces Firewall for LLMs to protect them

Filed under: Artificial Intelligence,Computer Security,My Thoughts — Suramya @ 10:52 PM

As is always the case when the attackers invent technology / systems to attack a system the defenders will immediately come up with a technology to protect (might not always be great protection at the beginning). Yesterday I posted about Researchers demo the first worm that spreads through LLM prompt injection and today while going through my feeds I saw the news that earlier this week cloudflare announced a Firewall for AI . Initially when I read the headline I thought it was yet another group of people who are claiming to have created a ‘perfect firewall’ using AI. Thankfully that was not the case and in this instance it looks like an interesting application that will probably become as common as the regular firewall.

What this system does is quite simple, it is setup in front of a LLM so that all interactions with the LLM goes through the firewall and every request with an LLM prompt is scanned for patterns and signatures of possible attacks. As per their blog post attacks like Prompt Injection, Model Denial of Service, and Sensitive Information Disclosure can be mitigated by adopting a proxy security solution like Cloudflare Firewall for AI.

Firewall for AI is an advanced Web Application Firewall (WAF) specifically tailored for applications using LLMs. It will comprise a set of tools that can be deployed in front of applications to detect vulnerabilities and provide visibility to model owners. The tool kit will include products that are already part of WAF, such as Rate Limiting and Sensitive Data Detection, and a new protection layer which is currently under development. This new validation analyzes the prompt submitted by the end user to identify attempts to exploit the model to extract data and other abuse attempts. Leveraging the size of Cloudflare network, Firewall for AI runs as close to the user as possible, allowing us to identify attacks early and protect both end user and models from abuses and attacks.

OWASP has published their Top 10 for Large Language Model Applications, which is a fantastic read and a good overview of the security risks targeting LLM’s. As per cloudfare this firewall mitigates some of the risks highlighted in OWASP for LLM’s. I would suggest taking the announcement with a grain of salt till we have independent validation of the claims. That being said it is def a step in the correct direction though.

– Suramya

Source: Hacker News: Cloudflare Announces Firewall for AI

March 6, 2024

Researchers demo the first worm that spreads through LLM prompt injection

Filed under: Artificial Intelligence,Computer Security,Computer Software — Suramya @ 10:17 PM

In the past year we have seen an uptick in the tech industry looking towards embedding LLM (Large Language Models) or AI as they are being pitched to the world in all possible places. Windows 11 now has built in Copilot that is extremely hard to disable. Email systems are using LLM’s to get additional details/information using the data from the email to add context etc. This creates new attack surfaces that attackers can target and we have seen instances where attackers have used prompt injection to gain access to data or systems that were restricted.

Building on top of that researchers have now created (and demo’d) the first worm that spreads through prompt injection. This is breakthrough work similar to how the Morris Worm was in the late 80’s. Basically, researchers created an email which has an adversarial prompt embedded in it. This prompt is then ingested by an LLM (using Retrieval-Augmented Generation which allows it to enhance the reliability of the LLM by fetching data from external sources when the email is processed by the LLM) where it jailbreaks the GenAI service and can steal data from the emails (or do whatever else the attacker wants such as changing email text, removing data etc). In addition the prompt also has the ability to make the email assistant forward the email with the malicious prompt to other email addresses allowing it to spread. The researchers have christened their worm as Morris II giving homage to the first email worm.

Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?

This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.

This is pretty fascinating work and I think that this kind of attack will start becoming more common as the LLM usage goes up. The research paper is available at: ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.

– Suramya

March 5, 2024

Yet another example on why we need controls and audit logs around sensitive data

Filed under: Computer Security,My Thoughts — Suramya @ 11:40 AM

People like the one in the example below are why cyber security and privacy policies insist on having access control rules and oversight on who has access to data and audit logs for why they are accessing that data.

My favourite thing about working in HR is being able to look up anyone's age or salary. Its like having a version of IMDB, but for real people.
My favorite thing about working in HR is being able to look up anyone’s age or salary. Its like having a version of IMDB, but for real people.

If you are building of maintaining a system that has sensitive data, or PII (Personal Identifiable Information) you need to ensure that you not just have access controls around the data but also have a way to audit who is accessing the data and for what reason. If the reason is not work related then action should be taken and their access revoked. Law enforcement has access to various monitoring systems and there have been multiple examples in the past where law enforcement officers looked up their ex’s, stalked people etc. Again, that is something that can be prevented in part by having strictly enforced policies on who can access the data and for what purpose.

In one of my previous companies, everytime you accessed any production or critical systems you had to give a reason and link to either a support or incident ticket. Then a manager and the system owner would review the access log along with the keylogs from the sessions and sign off on it. With that they were personally confirming that all activity was required and was justified. If that wasn’t the case then action would be taken against the person who signed off on the logs along with the person who did the accessing.

We need more of that in all systems.

– Suramya

September 4, 2023

Mashing Enter can allow you bypass full disk encryption in certain scenarios

Filed under: Computer Security,Linux/Unix Related,My Thoughts — Suramya @ 12:30 PM

When folks think about hacking and people bypassing secure systems they have this mental image of folks writing complex code or physically reading the data byte by byte but that is not always true. Sometimes, it is as simple as just keeping the enter key pressed while the system is booting up. Yes, you read that right. A few days ago a vulnerability was found in a TPM-protected system that is configured to implement unattended unlocking for LUKS full disk encryption using RedHat’s Clevis and dracut software along with systemd.

Generally, a Linux computer using TPM-protected unattended disk encryption will still allow a user to view the output of the boot process and optionally manually enter a decryption password with the keyboard. This allows for situations where the computer fails to boot and needs someone to troubleshoot the startup process. While the unattended TPM unlocking is taking place, the user is still presented with the password prompt and an opportunity to enter input.

There’s a limited window of time before the TPM will unlock the disk and the boot process will proceed automatically to the login prompt, so how can we effectively fuzz this input opportunity? What if we could type faster than a human being? Using an Atmel ATMEGA32U4 microcontroller (such as you’d find in an Arduino Leonardo development board) we can emulate a keyboard that sends virtual keypresses at essentially the maximum rate that the computer will accept. The following short Arduino program sets up a Leonardo as a keyboard emulator:

#include "Keyboard.h"
void setup() {
void loop() {;

One second after being plugged in this program begins to simulate pressing the Enter key on a virtual keyboard every 10 milliseconds. This is about 10x faster than the usual keyboard repeat rate you’d get simply holding down a key, and Linux seems to recognise around 70 characters per second using this method, or one keypress approximately every 15 milliseconds.

Sending keypresses this fast quickly hits the maximum number of password entry retries, while keeping the system from unlocking the disk automatically due to password guess rate limiting, and systemd eventually gives up trying to unlock the disk. It takes a minute or two but the recovery action in this failure scenario is to give us a root shell in the early boot environment

The simplest way to address the most immediate problem: Add and rd.emergency=reboot to the kernel command line. This ensures that if anything fails during the early boot process the computer will reboot immediately rather than dropping into a root shell.

However, this goes to show us that the old statement about security is still absolutely valid: “Physical access is root access. You can’t spend thousands on protecting the cyber threat landscape and ignore physical security such that people can just walk up to your computer and stick things inside. That being said, having a physical security program doesn’t necessarily protect your from an insider threat so that is also something to keep in mind.

Source: Pulsesecurity: Mashing Enter to bypass full disk encryption with TPM, Clevis, dracut and systemd

– Suramya

August 29, 2023

Excel holding up the Global Financial System, now with Python support

Filed under: Computer Security,Computer Software,My Thoughts,Tech Related — Suramya @ 1:12 PM

It is both impressive and scary how much of the world’s financial systems is being run using Microsoft Excel. Folks have created formulars/macros/scripts/functions etc in Excel that allows them to generate data that is used to take major financial decisions with real world impact.

In one of my previous companies we actually had a full discussion on how to get an inventory of all the Excel code in use at the company and how to archive it so that we have backups and version control on them. Unfortunately, I left before much headway was made but I did learn enough about excel use to scare me. (Especially since I am not the biggest fan of Microsoft software 😉 )

Now you might ask why so many people are using excel when there are better tools available in the market and these companies have inhouse teams to create custom software for the analyst and I asked the exact same questions when I started. I think it is probably because the tool makes it easy for folks to come up with formulas and scripts that get their work done instead of having to wait for an external team to make the changes etc that they need.

Now, a few days ago Microsoft made a surprise announcement that going forward they are going to support running Python inside an Excel file. Yikes!! In order to use this functionality you will need to be part of the Microsoft 365 Insider program and then you can type Python code directly into cells using the new =PY() function, which then gets executed in the cloud. From what I have read, this will be enabled by default and needs to be disabled via a registry key.

Since its inception, Microsoft Excel has changed how people organize, analyze, and visualize their data, providing a basis for decision-making for the millions of people who use it each day. Today we’re announcing a significant evolution in the analytical capabilities available within Excel by releasing a Public Preview of Python in Excel. Python in Excel makes it possible to natively combine Python and Excel analytics within the same workbook – with no setup required. With Python in Excel, you can type Python directly into a cell, the Python calculations run in the Microsoft Cloud, and your results are returned to the worksheet, including plots and visualizations.

We already have issues with Excel Macros being used as vectors for malware & viruses, this just opens a whole new front in that war. Now, admins will have to worry about attackers using Python in Excel to infiltrate the organization or to send data outside the org. I can see how it is useful for people working with datasets and MS is adding this functionality to keep up with other tools such as Tableau etc which are more powerful but still I feel that this is a bad move.

Another problem that folks are going to face is that now your Excel sheets have Python programs inside them, how are we supposed to version the code, how is code review done? Basically this code should be going through the standard SDLC (Software Development Life Cycle) process but wouldn’t. We also need to ensure that all changes are reviewed and monitored to protect against insider attacks but the way the system is setup this is going to be extremely difficult (We have already seen that with Macros and Formulas etc).

Lets see how folks address this risk profile.

– Suramya

June 20, 2023

It is now possible to track someone using SMS Receipt Messages

Filed under: Computer Security,Interesting Sites,My Thoughts,Tech Related — Suramya @ 6:04 PM

With modern technology it is getting more and more easy to track someone. There are many apps, devices etc that allow a target to be tracked in near realtime by someone. This can be done using an App on your phone, find my phone functionality, family phone track etc etc. As someone who is worried about getting tracked they can disable GPS, get a new dumb phone that doesn’t support GPS etc which can mitigate the threat to a large extent. Unfortunately, now there is a new attack surface that allows an attacker to approximately locate a target with up to 96% accuracy.

Researchers have figured out how to deduce the location of an SMS recipient by analyzing timing measurements from typical receiver location. Basically they measure the time elapsed between sending a SMS and the receipt of the Delivery report and then use a ML model to predict the location area where the target could be located. The other advantage of this attack is that it doesn’t require any specialized equipment or access to restricted systems but can be executed via a simple smartphone.

Short Message Service (SMS) remains one of the most popular communication channels since its introduction in 2G cellular networks. In this paper, we demonstrate that merely receiving silent SMS messages regularly opens a stealthy side-channel that allows other regular network users to infer the whereabouts of the SMS recipient. The core idea is that receiving an SMS inevitably generates Delivery Reports whose reception bestows a timing attack vector at the sender. We conducted experiments across various countries, operators, and devices to show that an attacker can deduce the location of an SMS recipient by analyzing timing measurements from typical receiver locations. Our results show that, after training an ML model, the SMS sender can accurately determine multiple locations of the recipient. For example, our model achieves up to 96% accuracy for locations across different countries, and 86% for two locations within Belgium. Due to the way cellular networks are designed, it is difficult to prevent Delivery Reports from being returned to the originator making it challenging to thwart this covert attack without making fundamental changes to the network architecture.

The biggest problem with this method is that it doesn’t depend on any software or anything that needs to be installed on the target phone. You just need a phone that supports SMS, which is pretty much all phones in the market. There is an option to disable delivery reports which would mitigate the threat to an extent but is an opt-out setup rather than an opt-in. One way to reduce this vector would be for manufacturers to disable the delivery report by default and folks who need it can enable it from settings instead of the other way round which is the case right now.

Source: HackerNews: Freaky Leaky SMS: Extracting user locations by analyzing SMS timings
Full Paper: Freaky Leaky SMS: Extracting User Locations by Analyzing SMS Timings

– Suramya

June 12, 2023

A DIY Robot for automating a Cold boot attack now exists

Filed under: Computer Hardware,Computer Security,My Thoughts,Tech Related — Suramya @ 11:58 PM

A Cold boot Attack has been around for a while (It was first demo’d in 2008) but it has been a fairly manual tricky operation till now. But now there is a new DIY Robot has been created that reduces the manual effort for this attack. Now you might be asking what on earth is a Cold Boot Attack? No, it is not referring to having to wear cold shoes in winter. It is actually a very interesting attack where the attacker freezes the RAM chips of a system while it is running and then shuts it down, after which they remove the RAM chip and put it in another device to read the data from it. Because the chip has been cooled significantly it retains the information even after the system is shutdown long enough for information to be extracted from it. The original cold boot attack involved freezing a laptop’s memory by inverting a can of compressed air to chill the computer’s DRAM to around -50°C so that it persists for several minutes, even after the system was powered down.

Ang Cui, founder and CEO of Red Balloon Security has created a process & robot to extract the chip from the system. The robot is a CNC machine which is has a FGPA (field-programmable gate array) connected to it. The robot chills the RAM chips one at a time, extracts them from the board and then inserts them into the FGPA that reads the contents of the chip allowing them to extract the data from it. To make it easier and allow them more time to remove the chip, the system monitors the electromagnetic emanation of the device which allows them to identify when the system is running CPU bound operations. Once they identify that, they can extract the chip when the system is using the CPU and not reading/writing to the RAM. This gives the robot a window of ~10 milliseconds to extract the chips instead of having to do it in nanoseconds.

Cui and colleagues demonstrated their robot on a Siemens SIMATIC S7-1500 PLC, from which they were able to recover the contents of encrypted firmware binaries. They also conducted a similarly successful attack on DDR3 DRAM in a CISCO IP Phone 8800 series to access the runtime ARM TrustZone memory.

They believe their technique is applicable to more sophisticated DDR4 and DDR5 if a more expensive (like, about $10,000) FPGA-based memory readout platform is used – a cost they expect will decline in time.

Cold boot attacks can be countered with physical memory encryption, Cui said.

This is not an attack the average user has to worry about but it is something that folks working on critical systems like banking servers, government systems, weapons etc need to be aware of and guard against. More details on the attack will be provided during a talk at the REcon reverse engineering conference in Canada titled “Ice Ice Baby: Coppin’ RAM With DIY Cryo-Mechanical Robot

Source: Hacker News: Robot can rip the data out of RAM chips

– Suramya

May 19, 2023

KeePass exploit helps retrieve cleartext master password – Fix ETA July 2023

Filed under: Computer Security,My Thoughts,Tech Related — Suramya @ 8:06 PM

Security is hard to do and no matter how careful you are while coding every software will have bugs in it and some of these bugs have major security implications. Keepass which is a very popular password manager is vulnerable to extracting the master password from the application’s memory, allowing attackers who compromise a device to retrieve the password even with the database is locked. The bug is being tracked as CVE-2023-32784.

The issue was discovered by a security researcher known as ‘vdohney’ who has unfortunately also published PoC code that exploits the vulnerability called the “KeePass Master Password Dumper” on GitHub.

KeePass Master Password Dumper is a simple proof-of-concept tool used to dump the master password from KeePass’s memory. Apart from the first password character, it is mostly able to recover the password in plaintext. No code execution on the target system is required, just a memory dump. It doesn’t matter where the memory comes from – can be the process dump, swap file (pagefile.sys), hibernation file (hiberfil.sys), various crash dumps or RAM dump of the entire system. It doesn’t matter whether or not the workspace is locked. It is also possible to dump the password from RAM after KeePass is no longer running, although the chance of that working goes down with the time it’s been since then.

Tested with KeePass 2.53.1 on Windows (English) and KeePass 2.47 on Debian (keepass2 package). PoC might have issues with different encodings (languages), but that’s not confirmed as of now (see issue #3). Should work for the macOS version as well. Unfortunately, enabling the Enter master key on secure desktop option doesn’t help in preventing the attack.

The attack does require either physical access to the system or the system would need to be infected with Malware that give an attacker remote access with the ability to perform thread dumps. They can also extract the password from the process dump, swap file (pagefile.sys), hibernation file (hiberfil.sys) or RAM dump of the entire system.

The fix for the problem is in the works and the initial testing looks promising. Personally I think that the security researcher should have waited to release the PoC code till the fix is available but to each their own I guess.

Source: KeePass exploit helps retrieve cleartext master password, fix coming soon

Older Posts »

Powered by WordPress