April 21, 2024

Crescendo Method enables Jailbreaking of LLMs Using ‘Benign’ Prompts

LLMs are becoming more and more popular across all industries, and that creates a new attack surface that attackers can target and misuse for malicious purposes. To prevent this, LLMs have multiple layers of defenses (with more being created every day). One of these layers attempts to limit the capability of the LLM to what the developer intended. For example, an LLM running a chat service for software support would be limited to answering questions about the software identified by the developer. Attackers attempt to bypass these safeguards to make the LLM perform unauthorized actions, which is known as “jailbreaking” the LLM. Depending on the LLM, this can be easy or complicated.

Earlier this month Microsoft published a paper titled “Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack” that showcases the “Crescendo” LLM jailbreak method. Using this method, a successful attack can usually be completed in a chain of fewer than 10 interaction turns.

Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as “jailbreaks”, seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model’s replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we introduce Crescendomation, a tool that automates the Crescendo attack, and our evaluation showcases its effectiveness against state-of-the-art models.

Microsoft has also published a blog post that goes over this attack and the mitigation steps that can be implemented, along with details on new tools developed to counter it using their “AI Watchdog” and “AI Spotlight” features. The tools attempt to identify adversarial content in both inputs and outputs to prevent prompt injection attacks.

SC Magazine has a good writeup on the attack and the defenses against it.

Source: Slashdot: ‘Crescendo’ Method Can Jailbreak LLMs Using Seemingly Benign Prompts

April 20, 2024

Don’t define yourself so narrowly that your wife not being impressed by vim is a reason for a divorce

When I first saw the screenshot below I actually thought it was a troll posting, but then I remembered that there are people in the world who define their whole personality and existence based on a single tool, movie, series, comic, etc. For these people nothing is more important than their pet obsession. Case in point: we have a person here whose personality is so one-dimensional that the fact that their wife is unimpressed by Vim is enough for them to consider leaving their wife of 10 years.

doobltroobl - My wife was unimpressed by Vim - please advise. Last evening I made a small demo to my wife. Nothing fancy, just jumping around the page, moving lines around, deleting several words at a time, the kind of things that blew my mind when I first saw Vim. Alas, my wife couldn't care less, and she even told me so. I've been married for 10 years, but I'm starting to have some doubts. So I'm appealing to this fine community in this moment of crisis. Where can I go from here? What path should I take?
My wife was unimpressed by Vim – please advise

I mean, I am a geek and I have bored the ears off Jani talking about the work I do. In fact, one of my criteria for a compatible wife (before I married Jani) was that the girl should be a techie so that she could understand what I am talking about when I get excited about things. Then I grew up and realized that the ability to understand tech is not the most important thing in a partner. We are polar opposites in most things except for the core principles we both live by, and that makes/keeps the marriage interesting. She talks to me about immigration & HR policies and a lot of it goes over my head, but we both support each other’s interests, which is what is needed in a relationship.

I don’t get these people. Why would you base your entire existence on a single point/item/thing? The problem is that because they only have this one item that they think makes them special, they tend to react badly to people changing it. New people joining the group, or even hinting that they like it as well, have to prove themselves to these people as being ‘worthy’ of being called fans.

A constant remark you will hear from these folks is that the change/reboot/continuation has ‘ruined their childhood’. Personally, I don’t think any single change has the power to ruin my childhood because I had so many different experiences and things I did as a child (reading/gymnastics/singing/soccer/mountaineering/family time etc) that even if I don’t like the changes to one of them I can ignore it and go on with my life.

I do realize that not everyone has had a happy childhood and that can cause people to fixate on things, but that is no way to live… Therapy is not just for weak-minded people, it is a legitimate tool that helps you. Once you stop trying to fit everything into a single point of view and stop obsessing about things, you will find that there are so many more things in the world that you can enjoy and people you can meet.

Don’t define yourself using a single data point. Go out and explore this amazing world we have and have fun in it.

April 19, 2024

Would Tesla cars still work if Tesla went out of business?

Dave Winer asked the following question on Mastodon: “If Tesla went out of business, would my Model Y stop working?” At first glance it sounds like a ridiculous question. In fact, if you had told someone even 15 years ago that you were worried that your car would stop working if the company that manufactured it went out of business, they would have laughed at you. But thanks to the over-proliferation of Things as a Service, which a lot of manufacturers use to control and profit from features that should be included, this is no longer the case.

Auto manufacturers are now adding functionality as a service to their cars for things that were included for free earlier. For example, BMW started selling Seat Heating as a Service in 2022. Tesla has subscriptions for Premium connectivity and ‘self-driving’. Mercedes goes even further and charges an extra $1200/year to unlock a fully functional accelerator.

However, the big problem with Tesla (and other cars) is that all the critical software components are protected by DRM. Once a device has DRM on it, Section 1201 of the DMCA makes it a felony to bypass that DRM, even for legitimate purposes.

We have already seen cases where owners were unable to start their cars from the mobile app when the Tesla servers went down (apparently the manual key worked in this case). Others have seen problems starting their car when they lost connectivity during software updates. I do seem to remember reading somewhere that there is a phone-home system built into Teslas that would stop the car from working fully if it could no longer talk to the company servers, but I can’t find the link to the story anywhere.

So, long story short, if Tesla went out of business a lot of the functionality in the car would stop working. As per a forum post on ‘Tesla Motors Club’ from 2021, the following would stop working if the car didn’t have connectivity (I can’t verify this because I don’t have a Tesla and have no desire to get one):

  • control aircon remotely (turn on/off, adjust temperature)
  • turn sentry mode on/off
  • control heated seats and heated steering wheel
  • open/close trunk
  • check location/speed of the car
  • unlock remotely
  • allow someone to drive the car (while you’re in a different location to the car)
  • Smart summon
  • vent or close the windows
  • sentry mode alarm alerts
  • restrict speed
  • valet mode

I think some of these might work with physical controls but I am not sure. I think I will stick with my Honda City for now 🙂

April 18, 2024

Debris from Space Station crashes through two floors of a Florida home

A long time ago I watched a show called ‘Dead Like Me‘ where the main character (George) is killed in the pilot episode by a toilet seat falling from the deorbiting Mir space station. At that time it was portrayed as an absurd way to die and George is understandably upset about it.

Showing that at times life does imitate fiction, last month a piece of space junk from the International Space Station crashed through the roof and two floors of a Florida home. This was confirmed by NASA earlier this week. NASA and others have been dumping things into orbit with the assumption that they will burn up during re-entry and this debris was from a cargo pallet intentionally released from the space station three years ago.

The piece of space junk is roughly cylindrical in shape and is about 4-inches tall and 1.6-inches wide. NASA said agency staff studied the object’s features and metal composition and matched it to the hardware that had been jettisoned from the space station in 2021.

At that time, new lithium-ion batteries had recently been installed at the space station, so the old nickel hydrogen batteries were packed up for disposal. The space station’s robotic arm released the 5,800-pound cargo pallet containing the batteries over the Pacific Ocean, as the outpost orbited 260 miles above the Earth’s surface, according to NASA.

I think that this habit is a bad idea and should be reconsidered. When items burn up in the atmosphere they release toxic byproducts that pollute the environment and if the item doesn’t burn up completely (as was the case here) they can cause significant damage when they crash into the Earth.

April 17, 2024

When an Engineering Manager submits a PR

Saw this in my feed and it made me laugh:

When the team lets the engineering manager submit a PR. 33 year old fruit bat with arthritis goes on 'flights' to keep him active
When the team lets the engineering manager submit a PR

Dedicated to all the other EMs out there. 🙂

April 16, 2024

Creating a Tic-Tac-Toe game using a single printf statement in a loop

The printf function in C/C++ (and other languages) looks fairly innocuous: it prints formatted information to the screen (or any other output stream). At least that is what I thought until I read JWZ’s blog post (The Turing Police say “X Wins”) and found that it is much more powerful than that. In fact, a single printf statement in a loop can be used to create a fully interactive game of tic-tac-toe. Nicholas Carlini has implemented exactly this, and you can view the code over at their GitHub Repo: tic-tac-toe in a single call to printf.

Apparently, this was inspired by the International Obfuscated C Code Contest. The repo has an explanation of how it works, and I am still going through it to wrap my head around it fully. Check it out if you have some time.
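To get a feel for why printf can compute anything at all, below is a minimal sketch of what I understand to be the core trick: the `%n` conversion prints nothing and instead stores the number of characters produced so far into an int you point it at. The function names are my own and are not taken from the repo, and I use snprintf (same formatting rules as printf) so the output goes into a buffer rather than the screen.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the repo): formats "<text>!" into buf and
 * returns how many characters had been produced when %n was reached,
 * which is the length of text. */
int chars_before_marker(const char *text, char *buf, size_t buflen) {
    int written = -1;
    assert(text != NULL && buf != NULL);
    /* %n writes the running character count into `written`. */
    snprintf(buf, buflen, "%s%n!", text, &written);
    return written;
}

/* Hypothetical helper (not from the repo): adds two small non-negative
 * numbers using only the formatter. "%*s" pads an empty string to the
 * given width, so a + b padding characters are produced before %n
 * records the count. */
int add_with_printf(int a, int b) {
    char buf[256];
    int sum = 0;
    assert(a >= 0 && b >= 0 && a + b < (int)sizeof buf);
    snprintf(buf, sizeof buf, "%*s%*s%n", a, "", b, "", &sum);
    return sum;
}
```

With this, `add_with_printf(3, 4)` returns 7, because seven padding characters were formatted before `%n` stored the count. The real program goes a lot further than this, so treat it as a toy illustration and not a description of how the repo is built. Also keep in mind that some C libraries restrict or disable `%n` for security reasons, so this may not behave the same everywhere.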

April 15, 2024

Hiring goons to beat up your manager when they pressure you to work harder is a bad idea

All of us have had managers who push us to do more work and work harder, and there are various options for dealing with them depending on the situation. Two folks in Bengaluru had a unique approach to this: instead of working with the manager or switching jobs, they decided the best option was to hire goons to beat up the manager. Once hired, the goons attacked the manager in the middle of the road in broad daylight. The whole thing was caught on camera and the video has since gone viral.

The victim, identified as Suresh, is said to be an auditor in a private firm. He joined the firm about a year ago. He reportedly has been pressuring two other employees of the firm, Umashankar and Vinesh, to work faster. He reportedly used to pressure the duo to clear transactions daily, which they used to take days to complete.

Feeling aggrieved, they purportedly engaged goons to attack the auditor.

I do get the temptation to have a manager beaten up when you feel that they are putting undue pressure on you, but that is the wrong way to deal with such a situation. They could have reached out to more senior leaders and reported the pressure being put on them by the manager so that it could be addressed. They could also have complained to HR or looked for another job. There are so many other ways to handle this that I am stunned they thought this was a good idea.

April 2, 2024

Soon it will be possible to update Apple Devices while still in the box

Apple has come up with an interesting new technology that allows stores to install the latest updates to an iPhone without removing it from the box. If the technology works (and it looks like it does) it will remove one of the major hassles of buying a new phone or device which is to install the latest updates and patches on the phone.

This device can wirelessly turn on the iPhone, update its software and then power it back down. We still don’t have a full explanation of how it works, but at a guess, it leverages the fact that the NFC chip in the phone can potentially work even when the phone is switched off (it already works with a low battery). Placing the phone in the device would trigger the NFC chip, which would then start the phone in a special mode that allows it to connect to the WiFi and download the updates. Once the update completes, the system would shut down the phone and it would be ready to use.

In theory this sounds like a great enhancement, but I fear that unless the system has sufficient controls and checks around it, it will open up a whole new attack vector. Previously, there have been attacks where nation states or criminal organizations would intercept hardware being delivered to a target, open the package, make changes and then reseal it and send it on to the target. This is a sure-shot way of ensuring that a device is compromised before it reaches the target. However, it requires a lot of resources and manual effort to implement, and there is a risk of exposure since multiple folks are involved. With this new update option, an attacker just needs physical access to the packaged device and can compromise it by simply putting it in the updater for a little while.

This assumes that the security checks and authentication built around the process can be bypassed. That being said, once the tech is live there are going to be a lot of very smart people trying to bypass the checks to be able to update the phone. Keep in mind that there is nothing stopping anyone from updating the phone using this method even after someone is actively using it.

Source: Ars Technica

April 1, 2024

ISRO successfully tested their Reusable launch vehicle Pushpak

Filed under: Astronomy / Space,My Thoughts,Science Related — Suramya @ 6:00 PM

ISRO successfully tested the latest version of their Reusable launch vehicle (RLV) technology through the RLV LEX-02 landing experiment. The lander, called Pushpak (RLV-TD), landed autonomously with precision on the runway after being released from an off-nominal position.

RLV-LEX-02/Pushpak landing autonomously (Pic Credit: ISRO)

The winged vehicle, called Pushpak, was lifted by an Indian Airforce Chinook helicopter and was released from 4.5 km altitude. After release at a distance of 4 km from the runway, Pushpak autonomously approached the runway along with cross-range corrections. It landed precisely on the runway and came to a halt using its brake parachute, landing gear brakes and nose wheel steering system.

This mission successfully simulated the approach and high-speed landing conditions of RLV returning from space. With this second mission, ISRO has re-validated the indigenously developed technologies in the areas of navigation, control systems, landing gear and deceleration systems essential for performing a high-speed autonomous landing of a space-returning vehicle. The winged body and all flight systems used in RLV-LEX-01 were reused in the RLV-LEX-02 mission after due certification/clearances. Hence reuse capability of flight hardware and flight systems is also demonstrated in this mission. Based on the observations from RLV-LEX-01, the airframe structure and landing gear were strengthened to tolerate higher landing loads.

This was the second successful test of the system, and the winged body and all flight systems used in RLV-LEX-01 were reused in RLV-LEX-02, demonstrating the reuse capability of flight hardware and flight systems. This system is essential to the creation and use of RLV technology in future launches, which will enable us to reduce the cost of launches going forward. This will also allow us to increase the number of launches and the payload we can put in orbit in a given time period. Another key point to note is that all the technology used in the craft was developed indigenously in India.

Source: ISRO achieves yet another success in the RLV Landing Experiment

