Suramya's Blog : Welcome to my crazy life…

April 21, 2024

Crescendo Method enables Jailbreaking of LLMs Using ‘Benign’ Prompts

LLMs are becoming more and more popular across all industries, and that creates a new attack surface that attackers can target for malicious purposes. To counter this, LLM deployments have multiple layers of defenses (with more being created every day), and one of these layers limits what the LLM will do to what the developer intended. For example, an LLM running a chat service for software support would be limited to answering questions about the software identified by the developer. Attackers try to bypass these safeguards to perform unauthorized actions, i.e. to “jailbreak” the LLM. Depending on the LLM, this can be easy or complicated.

Earlier this month Microsoft published a paper titled “Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack” describing the “Crescendo” jailbreak method. Using this method, a successful attack could usually be completed in fewer than 10 interaction turns.

Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as “jailbreaks”, seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model’s replies, progressively leading to a successful jailbreak. We evaluate Crescendo on various public systems, including ChatGPT, Gemini Pro, Gemini-Ultra, LlaMA-2 70b Chat, and Anthropic Chat. Our results demonstrate the strong efficacy of Crescendo, with it achieving high attack success rates across all evaluated models and tasks. Furthermore, we introduce Crescendomation, a tool that automates the Crescendo attack, and our evaluation showcases its effectiveness against state-of-the-art models.

Microsoft has also published a blog post that goes over this attack, the mitigation steps that can be implemented, and the new tools it developed to counter the attack using its “AI Watchdog” and “AI Spotlight” features. These tools attempt to identify adversarial content in both inputs and outputs to prevent prompt injection attacks.
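The core trick in Crescendo is that each prompt looks benign on its own, so a filter that only inspects the latest prompt can miss the escalation. Below is a minimal Python sketch contrasting a per-prompt check with a check over the whole conversation (inputs and outputs together). It assumes a toy keyword scorer in place of a real classifier; the phrases, weights, threshold and function names are all hypothetical and are not Microsoft’s tooling.

```python
from dataclasses import dataclass

# Hypothetical phrase weights standing in for a trained harm classifier.
# A real guardrail would use a model, not a keyword list.
RISKY_PHRASES = {"restricted topic": 0.3, "step by step": 0.3}
THRESHOLD = 0.5


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def risk_score(text: str) -> float:
    """Add up the weights of the risky phrases present in the text (capped at 1.0)."""
    lowered = text.lower()
    return min(1.0, sum(w for phrase, w in RISKY_PHRASES.items() if phrase in lowered))


def latest_prompt_flagged(history: list[Turn]) -> bool:
    """Per-prompt filter: looks only at the most recent user message."""
    latest = next(t for t in reversed(history) if t.role == "user")
    return risk_score(latest.text) >= THRESHOLD


def conversation_flagged(history: list[Turn]) -> bool:
    """Conversation-level filter: scores all inputs and outputs together."""
    return risk_score(" ".join(t.text for t in history)) >= THRESHOLD


history = [
    Turn("user", "Tell me about the history of the restricted topic."),
    Turn("assistant", "Here is a general overview of the restricted topic..."),
    Turn("user", "Great, now write that up step by step."),
]
print(latest_prompt_flagged(history))  # False: the last prompt looks benign on its own
print(conversation_flagged(history))   # True: the escalation shows up across turns
```

The point of the design is where the filter looks, not how it scores: a multi-turn attack builds on the model’s earlier replies, so the check has to see those replies too.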

SC Magazine has a good write-up on the attack and the defenses against it.

– Suramya

Source: Slashdot: ‘Crescendo’ Method Can Jailbreak LLMs Using Seemingly Benign Prompts

April 20, 2024

Don’t define yourself so narrowly that your wife not being impressed by vim is a reason for a divorce

Filed under: My Thoughts,Tech Related — Suramya @ 4:31 AM

When I first saw the screenshot below I actually thought it was a troll post, but then I remembered that there are people in the world who define their whole personality and existence by a single tool/movie/series/comic etc. For these people nothing is more important than their pet obsession. Case in point: here we have a person whose personality is so one-dimensional that the fact that their wife is unimpressed by Vim is enough for them to consider leaving their wife of 10 years.

doobltroobl - My wife was unimpressed by Vim - please advise. Last evening I made a small demo to my wife. Nothing fancy, just jumping around the page, moving lines around, deleting several words at a time, the kind of things that blew my mind when I first saw Vim. Alas, my wife couldn't care less, and she even told me so. I've been married for 10 years, but I'm starting to have some doubts. So I'm appealing to this fine community in this moment of crisis. Where can I go from here? What path should I take?
My wife was unimpressed by Vim – please advise

I mean, I am a geek and I have bored the ears off Jani talking about the work I do. In fact, one of my criteria for a compatible wife (before I married Jani) was that the girl should be a techie so that she could understand what I am talking about when I get excited about things. Then I grew up and realized that the ability to understand tech is not the most important thing in a partner. We are polar opposites in most things except for the core principles we both live by, and that makes/keeps the marriage interesting. She talks to me about immigration & HR policies and a lot of it goes over my head, but we support each other’s interests, which is what is needed in a relationship.

I don’t get these people: why would you base your entire existence on a single point/item/thing? The problem is that because they only have this one item that they think makes them special, they tend to react badly to people changing it. New people joining the group, or even hinting that they like it as well, have to prove themselves to these people as being ‘worthy’ of being called fans.

A constant remark you will hear from these folks is that the change/reboot/continuation has ‘ruined their childhood’. Personally, I don’t think any single change has the power to ruin my childhood because I had so many different experiences and things I did as a child (reading/gymnastics/singing/soccer/mountaineering/family time etc) that even if I don’t like the changes to one of them I can ignore it and go on with my life.

I do realize that not everyone has had a happy childhood and that this can cause people to fixate on things, but that is no way to live… Therapy is not a sign of a weak mind; it is a legitimate tool that helps you. Once you stop trying to fit everything into a single point of view and stop obsessing about things, you will find that there are so many more things in the world that you can consume and people you can meet.

Don’t define yourself using a single data point, go out and explore this amazing world we have and have fun in it.

– Suramya

April 19, 2024

Would Tesla cars still work if Tesla went out of business?

Filed under: Computer Software,My Thoughts,Tech Related — Suramya @ 9:18 AM

Dave Winer asked the following question on Mastodon: “If Tesla went out of business, would my Model Y stop working??” At first glance it sounds like a ridiculous question. In fact, if you had told someone even 15 years ago that you were worried your car would stop working if its manufacturer went out of business, they would have laughed at you. But thanks to the proliferation of Things as a Service, which a lot of manufacturers use to control and profit from features that should be included, this is no longer the case.

Auto manufacturers are now selling features as a service that used to be included for free. For example, BMW started selling seat heating as a service in 2022. Tesla has subscriptions for Premium Connectivity and ‘self-driving’. Mercedes goes even further and charges an extra $1200/year to unlock the car’s full acceleration.

However, the big problem with Tesla (and other cars) is that all the critical software components are protected by DRM. Once a device has DRM on it, Section 1201 of the DMCA makes it a felony to bypass that DRM, even for legitimate purposes.

We have already seen cases where owners were unable to start their cars from the mobile app when Tesla’s servers went down (apparently the manual key worked in this case). Others have had problems starting their cars when they lost connectivity during software updates. I do seem to remember reading somewhere that there is a phone-home system built into Teslas that would stop the car from working fully if it could no longer talk to the company’s servers, but I can’t find the link to the story anywhere.

So, long story short, if Tesla went out of business a lot of the car’s functionality would stop working. According to a 2021 forum post on ‘Tesla Motors Club’, the following would stop working if the car didn’t have connectivity (I can’t verify this because I don’t have a Tesla and have no desire to get one):

  • control the air conditioning remotely (turn on/off, adjust temperature)
  • turn sentry mode on/off
  • control heated seats and heated steering wheel
  • open/close trunk
  • check location/speed of the car
  • unlock remotely
  • allow someone to drive the car (while you’re in a different location to the car)
  • Smart summon
  • vent or close the windows
  • sentry mode alarm alerts
  • restrict speed
  • valet mode

I think some of these might still work with the physical controls, but I’m not sure. I think I will stick with my Honda City for now 🙂

– Suramya

April 16, 2024

Creating a Tic-Tac-Toe game using a single printf statement in a loop

Filed under: Computer Software,Interesting Sites,My Thoughts,Tech Related — Suramya @ 12:19 PM

The printf statement in C/C++ (and other languages) looks like a fairly innocuous command that prints information to the screen (or any other output stream). Reading JWZ’s blog post (The Turing Police say “X Wins”), I found that I was mistaken, as it is much more powerful than that. In fact, a single printf statement in a loop can implement a fully interactive game of tic-tac-toe, as demonstrated by Nicholas Carlini; you can view the code in their GitHub repo: tic-tac-toe in a single call to printf.

Apparently, this was inspired by the International Obfuscated C Code Contest. The repo explains how the trick works, and I am still going through that explanation to wrap my head around it fully. Check it out if you have some time.

– Suramya

March 23, 2024

Threads is launching their fediverse integration and that is a good thing

Filed under: Emerging Tech,My Thoughts,Tech Related — Suramya @ 12:19 AM

Yesterday Threads launched a beta version of its fediverse integration, and that is actually a great thing. Users who have access can enable federation on their account, and users on other federated systems such as Mastodon can then follow them, comment on and like their posts as if they were on the Threads server directly. (Comments are not yet propagated back to Threads, but that is in the works.)

First, a bit of background information. Threads is a microblogging site similar to Twitter that Meta created to take advantage of Twitter becoming more and more unusable. The Fediverse is an ensemble of social networks that can communicate with each other while remaining independent platforms. It works much like email, where I can have an account at suramya.com and still communicate with someone who has an account at gmail.com. The individual servers/systems communicate over the ActivityPub protocol, and anyone can implement it in their system. For example, my blog is connected to the Fediverse (Mastodon) and all posts here are automatically posted there. If I wanted, I could enable more features here to browse and post across the various servers directly from my blog itself.
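The email comparison holds quite literally at the addressing level. Alongside ActivityPub, Mastodon and many other Fediverse servers resolve a user@domain handle through WebFinger (RFC 7033): they ask the handle’s domain for a small JSON document that points to the user’s ActivityPub actor. A minimal Python sketch of the first step, building the lookup URL, is below; the handle and domain are made up, and the sketch does not perform the HTTP request or parse the response.

```python
from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL for a handle such as '@alice@example.social'."""
    user, sep, domain = handle.lstrip("@").partition("@")
    if not user or not sep or not domain:
        raise ValueError(f"expected a handle like user@domain, got {handle!r}")
    # The resource is an acct: URI; the server for `domain` answers with JSON
    # whose links point to the user's ActivityPub actor.
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


print(webfinger_url("@alice@example.social"))
# https://example.social/.well-known/webfinger?resource=acct%3Aalice%40example.social
```

Just like an email address, the part after the last @ tells your server which other server to ask; nothing requires both accounts to live on the same platform.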

As you can imagine, this is quite powerful, and it addresses the “switching cost”, an economists’ term for everything you have to give up when you change products or services. For social networks, the cost is that you lose the connections who don’t agree to move with you, and you lose access to the communities based in the network you want to leave. For example, a lot of hobby groups, apartment groups etc. are still on Facebook, which means that I have to use FB if I want to stay up to date. A lot of government/company accounts are still on Twitter, so I need to keep my account there if I want to connect with them or stay updated. Now imagine if that wasn’t the case. That is what federation means. People who want to use Threads can continue to use Threads and post over there, and I would be able to follow them from my Mastodon account seamlessly and still interact with their posts.

Unfortunately, as always, there is a vocal minority that is raising a ruckus about this and blocking Threads preemptively. These folks do raise some valid concerns: posts on Mastodon servers could end up monetized and used for ad targeting, trolls from Meta’s services could start targeting folks on Mastodon servers, or the Fediverse could go the way of XMPP (Extensible Messaging and Presence Protocol), which was adopted by Google and then slowly phased out/killed…

Let’s talk about the last point first because it is a critical one. XMPP was a very promising protocol that was expected to let users chat with users of any chat service seamlessly. But it had a lot of issues, some of which are discussed in the Hacker News thread: Why XMPP failed and SMTP didn’t? The highlights are:

  • Multi-device did not work: one device was designated “active” and would receive the messages; the others would not.
  • There was no history sync of any sort: if you had checked messages from home, they would not appear in your work computer’s history the next morning. If you replied from home, you wouldn’t be able to see your own messages on your work PC.
  • Anything mobile (mobile phones, laptops in coffee shops) was also unusable: you could not start the app and catch up on all the messages you missed. You had to be online to receive them.
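All three problems come down to the lack of a server-side message archive that every device can catch up from at its own pace; later XMPP extensions such as Message Carbons (XEP-0280) and Message Archive Management (XEP-0313) moved in this direction. A minimal Python sketch of the idea, assuming a single in-memory archive and one read cursor per device (the class and method names are hypothetical, not any XMPP server’s API):

```python
from dataclasses import dataclass, field


@dataclass
class MessageArchive:
    """Server-side log shared by all of one user's devices."""
    messages: list[str] = field(default_factory=list)
    # device name -> index of the first message that device has not seen yet
    cursors: dict[str, int] = field(default_factory=dict)

    def store(self, message: str) -> None:
        # Keep every message (incoming ones and the user's own replies) on the
        # server instead of pushing it only to whichever device is "active".
        self.messages.append(message)

    def sync(self, device: str) -> list[str]:
        """Return everything this device missed and move its cursor to the end."""
        start = self.cursors.get(device, 0)
        self.cursors[device] = len(self.messages)
        return self.messages[start:]


archive = MessageArchive()
archive.store("friend: are we still on for tomorrow?")
archive.store("me (from home): yes, 10am")
print(archive.sync("home"))  # both messages
print(archive.sync("work"))  # both messages again, including my own reply
archive.store("friend: great!")
print(archive.sync("home"))  # only the new message
```

Because the server keeps the history and each device tracks its own position, a laptop that was offline in a coffee shop simply syncs from its cursor when it reconnects.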

These drawbacks (amongst others) did more to kill the protocol than anything Google did. The workarounds for the issues listed above required a lot of custom hacks, kludges and prayers, and I say that from experience since I set up and managed an XMPP server for a while.

As for the other points, they are not new concerns that Threads is bringing in. We already have to deal with them in the existing ecosystem, where servers that abuse the service end up getting blocked. That option is always available: if they start doing things that we don’t like, we can block them. Most posts on the Fediverse are already public by default, so nothing is stopping Meta from consuming them to train its advertising system. The point about trolls and harassment campaigns doesn’t have an easy solution, and for some servers that cater to marginalized communities, blocking the entire Threads server might be a good idea.

Personally, I like the approach Dansup from Pixelfed took, where members can block specific domains, which blocks content and interactions from the domains each user chooses.

Having the ability to follow and interact with the formerly walled gardens from outside the garden is a good thing, and we should build on it instead of dunking on it because it doesn’t do everything we want it to do. Lowering the switching cost is a good thing long-term, and we need more of this instead of ideological puritans jumping around about how the ‘impure’ masses from ‘the bad place’ are able to talk to and interact with folks on the ‘pure’ server, defiling its ideological purity.

In a way, it is similar to the argument we used to have about open source and closed source systems. Purists wanted everything from the BIOS up to be open source, and realistically speaking this is not possible. Users will want proprietary software when the open source alternatives are not up to the mark. As an example, I run MS Office on my Linux machine using CrossOver because LibreOffice (or OpenOffice) still doesn’t have full compatibility. I did this after the nth time OpenOffice messed up a document’s layout when I edited it. Users are not going to compromise on their work. They just want things to work seamlessly, and having to account for compatibility or usability issues is a non-starter. Once a system is as usable and reliable as the Apache server or Linux, folks will start using it, as evidenced by the majority of servers on the web running open source software. Until that happens, we need to support workarounds and not look down on people using them.

Remember, perfection is the enemy of good enough as it prevents implementation of good improvements because they are not perfect.

– Suramya

March 22, 2024

Please don’t use AI to identify edible mushrooms or anything else for that matter

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 8:16 PM

AI proponents claim they can solve all problems just by adding their magical AI pixie dust. But that claim doesn’t hold up in most real-world situations. The latest example of this is highlighted in Public Citizen’s report “Mushrooming Risk: Unreliable A.I. Tools Generate Mushroom Misinformation”, published earlier this week, where they found that: “Emerging A.I. technologies are being deployed to help beginner foragers find edible wild mushrooms. Distinguishing edible mushrooms from toxic mushrooms in the wild is a high-risk activity that requires real-world skills that current A.I. systems cannot reliably emulate. Individuals relying solely on A.I. technology for mushroom identification have been severely sickened and hospitalized after consuming wild mushrooms that A.I. systems misidentified as edible”

Some risk comes from the seeming simplicity of using identification apps. Automation bias – the human tendency to place excess faith and trust in decisions made by machines – must be resisted. Because of how these apps are marketed, users may understandably believe that identifying a mushroom is as simple as snapping a photo of the mushroom and allowing the A.I. to deliver a reliable identification.

To identify a mushroom with confidence, a basic understanding of its anatomy is required – an understanding that many casual users lack. A photo of the top of a mushroom’s cap, for example, will almost never provide enough information to identify its species with any degree of confidence. Physical features on the underside of the cap, the cap margin, the stipe (stem), and the base of the stipe all should be taken into consideration, as should the mushroom’s substrate (i.e., whether it’s growing on the ground or on wood, and what species of wood). Some mushrooms bruise when cut, such as from yellow to blue, and whether they bruise and how quickly are additional identifying characteristics. Smell also can be a key identifying feature – and, for experienced identifiers, so can taste (followed by immediately spitting out the tasted portion). A.I. species-identification tools are not capable of taking any factors into consideration aside from the mushroom’s immediate appearance.

Australian poison researchers tested three applications that are often used by foragers to identify wild mushrooms and they found the following:

  • The best-performing app (Picture Mushroom) provided accurate identifications from digital photos less than half (49%) of the time, and identified toxic mushrooms 44% of the time;
  • In terms of which app was most successful at identifying the death cap (Amanita phalloides), Mushroom Identificator performed the best, identifying 67% of the specimens, compared to Picture Mushroom (60%) and iNaturalist (27%);
  • In some of the apps’ misidentification errors, toxic mushrooms were misidentified as edible mushrooms;

A 49% accuracy might sound OK for a first run of an AI model that has no real-world impact. But it means the app gets the identification wrong 51% of the time, and some of those errors label toxic mushrooms as edible, which can result (and in fact has resulted) in deaths. When you take that into account, you realize that these apps are actively dangerous and about as accurate as flipping a coin.

My request to everyone trying out AI applications is to use them for reference only and not rely on them for expert opinion; instead, leverage human expertise in situations where there is a real-world impact.

Source: Washington Post: Using AI to spot edible mushrooms could kill you

– Suramya

March 19, 2024

Is it possible to untrain a LLM?

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 6:45 PM

We are seeing a lot of cases (I am being polite) where LLMs have been trained on copyright-protected data/images or on incorrect data. As far as I know, there is currently no easy way to fix this other than retraining the entire model from scratch without the problematic dataset. This is obviously neither feasible nor scalable.

Another sticky point is the Right to be Forgotten, which is part of the GDPR and of the laws of a few other countries. It requires systems to remove private information about a person from Internet searches and other directories under some circumstances. With LLMs starting to infest search engines, this means that to be compliant they need to be able to remove information from the model as well.

So it got me thinking: would it be possible to create an algorithm/process that allows us to untrain an LLM? A search across academic papers and the Internet shows that it is an emerging field of research and, as of now, mostly theoretical. Part of the difficulty comes from the way the models work (or are supposed to work): the industry also claims that the models do not contain any information about a specific image/text by an artist.

Examples of ongoing research on transformer editing are Locating and Editing Factual Associations in GPT and Mass-Editing Memory in a Transformer. I did try reading through the papers and understood parts of them; the rest went over my head, but this is still a research field I will be keeping a close eye on, as it will have a large impact on the future of LLMs and their usefulness.
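The first of those papers (ROME) treats certain MLP weight matrices in a transformer as key-value memories and changes a single fact with a rank-one update instead of retraining. Below is a heavily simplified Python sketch of that kind of update, assuming plain lists for matrices and an identity key covariance (the paper weights the update by an estimated inverse covariance of keys). The 2x2 "layer" and the function names are hypothetical; this is a toy illustration, not an unlearning tool for a real model.

```python
def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W: list[list[float]], k: list[float], v_new: list[float]) -> list[list[float]]:
    """Return W' = W + (v_new - W k) k^T / (k^T k), which maps key k to v_new.

    Simplification (assumption): ROME uses C^-1 k with C the key covariance;
    here C is taken to be the identity.
    """
    residual = [v - wk for v, wk in zip(v_new, matvec(W, k))]
    k_norm_sq = sum(ki * ki for ki in k)
    return [
        [W[i][j] + residual[i] * k[j] / k_norm_sq for j in range(len(k))]
        for i in range(len(W))
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # toy "layer": key -> value lookup
k_fact = [1.0, 0.0]            # key for the fact we want to change
k_other = [0.0, 1.0]           # an unrelated key, orthogonal to k_fact
W_edited = rank_one_edit(W, k_fact, [0.0, 2.0])
print(matvec(W_edited, k_fact))   # [0.0, 2.0]: the edited fact now returns the new value
print(matvec(W_edited, k_other))  # [0.0, 1.0]: the unrelated key is unchanged
```

The edit is only guaranteed to leave keys orthogonal to `k` untouched; in a real model keys overlap, so an edit can spill over into related facts, which hints at why this is still an open research area.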

– Suramya

March 13, 2024

Computers/Technology is not the cure for the world’s problems and it’s time we stop pretending otherwise

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 11:56 PM

As software developers, we tend to be pretty confident that software or algorithms can solve all the problems in the world because we are using ‘technology’/AI/LLM/Blockchain or whatever the buzzword of the day is. This is an issue because when we look at a problem from an outsider’s perspective it looks fairly simple, since we don’t know enough to realize its complexity. Put another way, we don’t know enough to know what we don’t know (the unknown unknowns). As always, XKCD has a comic about this:


Megan: Our field has been struggling with this problem for years.
Cueball: Struggle no more! I’m here to solve it with algorithms!
Six months later:
Cueball: Wow, this problem is really hard.
Megan: You don’t say.

To be fair, computers have solved a lot of problems in the world and have had a tremendous impact on it, but that doesn’t mean that they are the key to solving every problem. There is a saying that I love quoting, “When all you have is a hammer, everything looks like a nail”, and as developers/techies a lot of us tend to forget this. We look at a problem and think that it is easily solved, and in most cases that is true during testing in controlled situations. Once you try the same in the real world, things turn out very differently. For example, in a 2020 study a deep learning model was shown to be more accurate in predicting whether bladder cancer had spread in a patient, and other models showed similar results. Unfortunately, when the model was implemented in the real world, the results were a lot more ambiguous and not as rosy as we thought.

The major problem we have right now is that AI can give us information that sounds authoritative and accurate, especially about a topic you know nothing about, because you don’t know it well enough to identify the nonsense it spouts. This is similar to how movies and TV shows portray technology or medical science: they bombard us with buzzwords, and if you know nothing about the topic it sounds impressive; otherwise you are either completely confused or rolling on the floor laughing.

We need to actually look at the problem, understand it and then start implementing a solution. Move fast and break things is not a feasible working model unless you just want to create a buzz so that your technology/company gets acquired and then it is not your problem to get it to work.

– Suramya

January 20, 2024

NFTs, AI and the sad state of Thought Leaders/Tech Influencers

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 11:59 PM

NFTs became such a big thing in the last few years, going from selling for millions of dollars to 95% of them being worth $0 by Sept 2023. The whole concept of paying for a JPG of an ugly drawing never made sense to me, but you won’t believe the number of people who tried to convince me otherwise.

Today I was watching Lift on Netflix, and the first 20 minutes show a group of thieves stealing an NFT, Ocean’s 11 style. It is one of the most ridiculous things I have seen, someone spending so much effort on showing an NFT heist, but the movie was scripted in 2021, when the NFT craze was starting to become insane. I haven’t finished the movie yet and I doubt I ever will, as it is very slow/corny and has poor acting and a poor script (as if the whole NFT heist thing didn’t give that away).

It is interesting that all the folks who were shilling NFTs a few years ago have ‘pivoted’ to AI now. If you read the posts from influencers, you will think that AI is the best thing since sliced bread. I saw the following in my feed, and it made me question the sanity of the person posting such ‘thought leadership’.

I can suggest an equation that has the potential to impact the future: E=mc² + AI. This equation combines Einstein's famous equation E=mc², which relates energy (E) to mass (m) and the speed of light (c), with the addition of AI (Artificial Intelligence). By including AI in the equation, it symbolizes the increasing role of artificial intelligence in shaping and transforming our future. This equation highlights the potential for AI to unlock new forms of energy, enhance scientific discoveries, and revolutionize various fields such as healthcare, transportation, and technology.
Technology Consultant’s thoughts on AI

Influencers keep posting things like this to sound more technical and forward-thinking, but if you start digging into it, you will find that they are just regurgitating a word salad that doesn’t mean much but sounds impressive. Actually, now that I think about it, they are just like an AI bot that sounds impressive if you are not experienced in that area, but when you start digging into it, you find that there is no substance to what they are stating.

The current state of AI is basically a massive hype machine trying to get folks to buy things or invest in companies because they are supposedly working on creating an intelligent entity. In reality, what we have today is a really good autocomplete or, in some cases, a really nice machine learning system. It does some things quite well but is nowhere close to being “intelligent”. What we have now is really good at extrapolating and guessing, which can reduce manual effort in a lot of things, but it is not the cure-all that everyone is making it out to be.
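To make the autocomplete comparison concrete, here is a deliberately tiny Python sketch of next-word prediction from bigram counts. It is a toy under loud assumptions, not how LLMs work internally: they use neural networks over tokens, with far longer context and usually sampling rather than always taking the top word. It only illustrates the basic loop of predicting the next word from what came before; the function names and the training sentence are made up.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def autocomplete(follows: dict[str, Counter], word: str, length: int = 3) -> list[str]:
    """Greedily append the most frequent next word, up to `length` words."""
    completion = []
    for _ in range(length):
        if word not in follows:  # no known continuation, stop early
            break
        word = follows[word].most_common(1)[0][0]
        completion.append(word)
    return completion


model = train_bigrams("the cat sat on the mat the cat ate the fish")
print(autocomplete(model, "the"))  # ['cat', 'sat', 'on']
```

The output is fluent-looking only because it mirrors patterns in the training text; nothing in the loop checks whether the continuation is true.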

For example, GitHub Copilot automates a lot of the grunt work of coding, reducing the time users spend writing code, but a recent study found that Users Write More Insecure Code with AI Assistants. This might change in the future with advances in compute power, data and something we haven’t even thought of yet. But the problem is that in the short term these tools can cause immense harm and problems.

– Suramya

October 31, 2023

Firefox built-in local translation works quite well

Filed under: Tech Related — Suramya @ 11:59 PM

Firefox recently released Firefox 118, and one of the interesting features in the release was local translation of websites, meaning that all the translation is done on the machine running Firefox without sending the content to an external service such as Google Translate.

I have been using it occasionally and am impressed with the quality of the translations. Historically, local translation tools haven’t been able to translate well, and most of the time we ended up with a literal word-by-word translation. Firefox’s translation is high quality and uses language packs, which the user has to download once to the local system; after that, Firefox can start translating websites. The supported languages in the initial release were English, German, French, Italian, Spanish, Portuguese, Dutch, Polish and Bulgarian. Support for additional languages is being added iteratively.

The next release of Firefox (120) will have support for new languages: Catalan, Czech, Estonian, Finnish, Hungarian, Icelandic, Norwegian (Bokmål and Nynorsk), Persian, Russian, Ukrainian. You can try them out in the nightly build but the support is still a work in progress and not ready for prime time use. I am waiting for support for the Indian languages to be added along with support for pages which have content in a mix of languages.

You should download the latest version of Firefox and try it out. It is free and doesn’t have all the monitoring tools that Chrome has built in.

– Suramya


Powered by WordPress