Suramya's Blog : Welcome to my crazy life…

March 26, 2024

Sharpshooters bugs catapult their urine out into the world faster than expected

Filed under: Interesting Sites,Science Related — Suramya @ 11:38 PM

Nature is awesome and we are still trying to figure out how a lot of the things common in nature work and how to artificially create the same. So far most of our efforts have been poor copies of what is there in nature.

A bug called sharpshooter has the ability to catapult their urine out into the world from a special liquid-shooter in their butts. Interestingly the water being expelled is faster than the speed at which it was launched which has implications on rocket science, fluid dynamics and many other areas.

“The sharpshooter gets all its nutrition from the thin, watery liquid inside a plant, called xylem sap, which it sucks out with this tube-shaped stylet. That sap has so little nutrition that sharpshooters need to guzzle nonstop. Taking all that liquid in presents a problem – how to move it out. The sharpshooter has evolved the perfect tool for the job: an anal stylus — or butt flicker. Here’s something incredible: Each drop of pee actually travels faster than the speed at which the butt flicker launched it. Learn about this incredible creature’s super-propulsive pee in this video!”

Source: boingboing.netSharpshooters are bugs that catapult their urine out into the world

– Suramya

March 23, 2024

Threads is launching their fediverse integration and that is a good thing

Filed under: Emerging Tech,My Thoughts,Tech Related — Suramya @ 12:19 AM

Yesterday Threads launched a beta version of their fediverse integration and that is actually a great thing. Users who have access can enable federation on their account and users on other Federated systems such as Mastodon can follow them, comment and like their posts as if they were directly on the Threads server. (Comments are not yet propagated back to Threads but is in the works).

First a bit of background information. Threads is a microblogging site similar to Twitter that was created by Meta to take advantage of the fact that Twitter was becoming more and more unusable. Fediverse is a ensemble of social networks which can communicate with each other, while remaining independent platforms. Basically it works similar to how email works, where I can have an account at suramya.com and still be able to communicate with someone who has an account at gmail.com. The system allows the individual servers / systems to communicate over the ActivityPub protocol, and anyone can implement it in their system. For example, my blog is connected to the Fediverse (Mastodon) and all posts here are automatically posted there. If I want I can enable more features here to allow me to browse/post posts across the various servers directly from my blog itself.

As you can imagine this is quite powerful and addresses the “Switching cost” which is an economists’ term for everything you have to give up when you change products or services. For social media networks the cost is that you will lose your connections on the network who don’t agree to shift with you and you loose access to the communities that are based in the network you want to leave. For example, a lot of Hobby networks, apartment groups etc are still on Facebook which means that I have to use FB if I want to keep myself up to date. A lot of government/company accounts are still on Twitter so I need to keep my account there if I want to connect with them or keep myself updated. Now imagine if that wasn’t the case. That is what federation means. People who want to use Threads can continue to use Threads and post over there and I would be able to follow them from my Mastodon account seamlessly and still interact with their posts.

Unfortunately, as always there is a vocal minority who is raising a ruckus about this and are blocking them preemptively. These folks do raise some valid concerns such as posts on Mastodon servers will end up getting monetized and targeted for Ads, or the trolls from Meta services will start targeting folks on Mastodon servers or Fediverse will go the way of the XMPP (Extensible Messaging and Presence Protocol) that was adopted by Google and then slowly phased out/killed…

Lets talk about the last point first because it is a critical point. XMPP was a very promising protocol that was expected to allow users to chat with users of any chat service seamlessly. But it had a lot of issues some of which are discussed in the Hackernews Thread: Why XMPP failed and SMTP didn’t?. The highlights are:

  • So multi device did not work: One device was designated “active” and would receive the messages, others would not.
  • There was no history sync of any sort: If you had checked messages from home, they would not appear in your work computer’s history next morning. If you replied from home, you won’t be able to see your own messages at work PC.
  • Anything mobile (mobile phone, laptop in coffee shops) was also unusable — you cannot start app and catch up on all missing messages. You had to be online to receive them.

These drawbacks (amongst others) did more to kill the protocol than anything Google did. The workarounds to resolve the issues listed above required a lot of custom hacks, kludges and prayers to get them to work and I talk about that from experience since I did setup and manage a XMPP server for a while.

Coming to the other points they are not a new concern that Threads is bringing in. We already have to worry about them in the existing ecosystem where we have servers that abuse the service and end up getting blocked. That is always an option where if they start doing things that we don’t like we can block them. Most posts on the Fediverse are already public by default so nothing is stopping Meta from consuming them to train their advertising system on it. The point about Trolls and Harassment campaigns doesn’t have an easy solution and for some servers that cater to marginalized communities blocking the entire Threads server might be a good idea.

Personally I like the approach Dansup from Pixelfed took, where they allow members, to block specific domains which blocks content and interactions from the domains the user chooses.

Having the ability to follow and interact with the formerly walled gardens from outside the garden is a good thing and we should build it up. Instead of dunking on it because it doesn’t do everything we want it to do. Lowering the Switching cost is a good thing longterm and we need more of this instead of ideological puritans jumping around about how the ‘impure’ masses from ‘the bad place’ are able to talk to and interact with folks in the ‘pure’ server defiling it’s ideological purity.

In a way it is a similar argument that we used to have about open source and closed source systems. Purists wanted everything from BIOS up to be Open Source and realistically speaking this is not possible. Users will want proprietary software because the open source ones are not up to the mark. As an example, I run MS Office on my Linux machine using Crossover because Libreoffice(or OpenOffice) still doesn’t have full compatibility. I did this after the nth time OpenOffice messed up the layout after I edited it using OO. Asking users to compromise on their work is not going to happen. They just want things to work and work seamlessly and having to account for compatibility issues or usability issues is a non-starter. Once a system is usable and reliable like Apache server or Linux folks will start using it as evidenced by a majority of the servers on the web running Open Source software. Till that happens we need to ensure we support workarounds and not look down on people using the workarounds.

Remember, perfection is the enemy of good enough as it prevents implementation of good improvements because they are not perfect.

– Suramya

March 22, 2024

Please don’t use AI to identify edible mushrooms or anything else for that matter

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 8:16 PM

AI proponents claim to solve all problems just with the addition of their magical-AI pixie dust. But that claim doesn’t hold up in a majority of the cases when dealing with real world situations. The latest example of this is highlighted in Citizen.org’s report “Mushrooming Risk: Unreliable A.I. Tools Generate Mushroom Misinformation” published earlier this week where they found that: “Emerging A.I. technologies are being deployed to help beginner foragers find edible wild mushrooms. Distinguishing edible mushrooms from toxic mushrooms in the wild is a high-risk activity that requires real-world skills that current A.I. systems cannot reliably emulate. Individuals relying solely on A.I. technology for mushroom identification have been severely sickened and hospitalized after consuming wild mushrooms that A.I. systems misidentified as edible”

Some risk comes from the seeming simplicity of using identification apps. Automation bias – the human tendency to place excess faith and trust in decisions made by machines – must be resisted. Because of how these apps are marketed, users may understandably believe that identifying a mushroom is as simple as snapping a photo of the mushroom and allowing the A.I. to deliver a reliable identification.

To identify a mushroom with confidence, a basic understanding of its anatomy is required – an understanding that many casual users lack. A photo of the top of a mushroom’s cap, for example, will almost never provide enough information to identify its species with any degree of confidence. Physical features on the underside of the cap, the cap margin, the stipe (stem), and the base of the stipe all should be taken into consideration, as should the mushroom’s substrate (i.e., whether it’s growing on the ground or on wood, and what species of wood). Some mushrooms bruise when cut, such as from yellow to blue, and whether they bruise and how quickly are additional identifying characteristics. Smell also can be a key identifying feature – and, for experienced identifiers, so can taste (followed by immediately spitting out the tasted portion). A.I. species-identification tools are not capable of taking any factors into consideration aside from the mushroom’s immediate appearance.

Australian poison researchers tested three applications that are often used by foragers to identify wild mushrooms and they found the following:

  • The best-performing app (Picture Mushroom) provided accurate identifications from digital photos less than half (49%) of the time, and identified toxic mushrooms 44% of the time;
  • In terms of which app was most successful at identifying the death cap (Amanita phalloides), Mushroom Identificator performed the best, identifying 67% of the specimens, compared to Picture Mushroom (60%) and iNaturalist (27%);
  • In some of the apps’ misidentification errors, toxic mushrooms were misidentified as edible mushrooms;

A 49% accuracy might sound ok for a first run of the AI datamodel which has no real world impact, but when you take into account that there is a 51% chance that the app is incorrectly identifying toxic mushrooms as edible mushrooms which can (and in fact has resulted) in deaths, you realize that the Apps are actively dangerous and about as accurate as flipping a coin.

My request to everyone trying out AI applications is to use that for reference only and don’t rely on them for expert opinion but instead leverage human expertise in situations where there is a realworld impact.

Source: Washington Post: Using AI to spot edible mushrooms could kill you

– Suramya

March 19, 2024

Is it possible to untrain a LLM?

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 6:45 PM

We are seeing a lot of cases (I am being polite) where LLM’s are trained on copyright protected data/images or has been trained with incorrect data. Currently as far as I know there is no easy way to fix this other than to train the entire model again from scratch excluding the problematic dataset. This is obviously not feasible and scalable at all.

Another sticky point is the Right to be forgotten which is a part of the GDPR and a few other countries. It requires systems to remove private information about a person from Internet searches and other directories under some circumstances. With LLM’s starting to infest search engines it means that in order to be compliant they need to be able to remove information from the model as well.

So it got me thinking if it would be possible to create an algorithm/process that allows us to untrain an LLM. A search across academic papers and the Internet shows that it is an emerging field of research and as of now mostly theoretical. Primarily because of the way the models work (or are supposed to work) we also claim that the models do not contain any information about a specific image/text by an artist.

Examples of ongoing Research on Transformer editing are Locating and Editing Factual Associations in GPT and Mass-Editing Memory in a Transformer. I did try reading though the papers and understood parts of them, the others kind of went over my head but still this is a research field I will be keeping a close eye on as it will have a large impact of the future of LLM’s and their usefulness.

– Suramya

March 13, 2024

Computers/Technology is not the cure to the worlds problems and its time we stop pretending otherwise

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 11:56 PM

As a software developer we tend to be pretty confident that software or algorithms can solve all the problems in the world because we are using ‘technology’/AI/LLM/Blockchain or whatever the buzzword of the day is to solve a problem. This is an issue because when we look at a problem from an outsider’s perspective it looks fairly simple because we don’t know enough to realize the complexity. Or put another way we don’t know enough to know what we don’t know (the unknown unknowns). As always XKCD has a comic that talks about this:


Megan: Our field has been struggling with this problem for years.
Cueball: Struggle no more! I’m here to solve it with algorithms!
Six months later:
Cueball: Wow, this problem is really hard.
Megan: You don’t say.

To be fair, computers have solved a lot of problems in the world and have had a tremendous impact on it, but that doesn’t mean that they are the key solving for every problem. There is a saying that I love quoting “When all you have is a hammer, everything looks like a nail” and as a developer/techie a lot of us tend to forget this. We look at a problem and think that its an easily solved problem and in most cases that is true during the testing in controlled situations. Once you try the same in the real world things turn out a lot more differently. For example, in a 2020 study, a deep learning model was shown to be more accurate in predicting whether bladder cancer has spread in a patient and other models also showed similar results. Unfortunately, when the model was implemented in the real world the results where a lot more ambiguous and not as rosy as we thought.

The major problem we have right now is that AI can give us information at sounds authoritative and accurate especially if it is about a topic you know nothing about because you don’t quite know well enough to identify the nonsense it sprouts. This is similar to how movies and TV shows portray technology or medical science, they will bombard us with buzz words and if you know nothing about the topic it sounds impressive otherwise you are either completely confused or rolling on the floor laughing.

We need to actually look at the problem, understand it and then start implementing a solution. Move fast and break things is not a feasible working model unless you just want to create a buzz so that your technology/company gets acquired and then it is not your problem to get it to work.

– Suramya

March 11, 2024

Fans who froze themselves to watch a match rewarded by having to get their frostbitten limbs amputated

Filed under: My Thoughts — Suramya @ 4:14 PM

People are crazy about sports. In India its cricket, US it’s football, UK is rugby and Brazil has soccer. I am not a huge fan but even if I was I don’t think I would sit outside in minus-27 degrees just to watch a game. However, there are people in the US who actually did this to watch the Kansas city & Chiefs game earlier this year and ended up getting hypothermia & frostbite bad enough that 70% of them need amputations.

Yes you read that right. 70% of the fools who sat outside in the freezing cold are now looking at having their body parts amputated just because they didn’t want to leave a football match. I mean come on…

The other 30% are not out completely either, they’ll have sensitivity and pain for the rest of their lives and always will be more susceptible to frostbite in the future. The weather is not a joke, whether its the heat or the cold. You can pretend you don’t believe in it and that it is not becoming more extreme but you will still be suffering if you don’t take precautions and not take chances by avoiding exposure whenever possible.

– Suramya

March 8, 2024

alphaXiv – Forum to discuss any papers posted on ArXiv

Filed under: Interesting Sites,Science Related — Suramya @ 4:40 PM

Research papers can be hard to understand and sometimes you don’t have people or a community around where you can discuss a given paper and collaborate. In Bangalore we have a community called ‘Papers we love’ which did something similar but for a single paper every week as an in-person event.

There is a site called alphaxiv.org/ that I found out about recently. It is an interactive forum for anyone to comment line-by-line on arXiv papers. This allows you to collaborate and discuss the paper with others who find it interesting as well. One thing to keep in mind is that ArXiv papers can be pre-published versions which are not fully peer reviewed yet.

To start using the site, you need to create an account with them. If you don’t have an educational institute email address then you will have to verify your phone no as well in the near future. As of now you don’t need to do that. I guess they are doing this to prevent automated spam. Once you have an account and are logged in using the site is quite easy. You can browse for currently trending topics or search for a specific paper. If you are reading a paper on arXiv, you can change “arxiv.org” to “alphaxiv.org” in your URL search bar, and you will automatically be redirected to the forum for that paper.

Check it out when you get a chance as it is quite interesting.

– Suramya

March 7, 2024

Cloudflare announces Firewall for LLMs to protect them

Filed under: Artificial Intelligence,Computer Security,My Thoughts — Suramya @ 10:52 PM

As is always the case when the attackers invent technology / systems to attack a system the defenders will immediately come up with a technology to protect (might not always be great protection at the beginning). Yesterday I posted about Researchers demo the first worm that spreads through LLM prompt injection and today while going through my feeds I saw the news that earlier this week cloudflare announced a Firewall for AI . Initially when I read the headline I thought it was yet another group of people who are claiming to have created a ‘perfect firewall’ using AI. Thankfully that was not the case and in this instance it looks like an interesting application that will probably become as common as the regular firewall.

What this system does is quite simple, it is setup in front of a LLM so that all interactions with the LLM goes through the firewall and every request with an LLM prompt is scanned for patterns and signatures of possible attacks. As per their blog post attacks like Prompt Injection, Model Denial of Service, and Sensitive Information Disclosure can be mitigated by adopting a proxy security solution like Cloudflare Firewall for AI.

Firewall for AI is an advanced Web Application Firewall (WAF) specifically tailored for applications using LLMs. It will comprise a set of tools that can be deployed in front of applications to detect vulnerabilities and provide visibility to model owners. The tool kit will include products that are already part of WAF, such as Rate Limiting and Sensitive Data Detection, and a new protection layer which is currently under development. This new validation analyzes the prompt submitted by the end user to identify attempts to exploit the model to extract data and other abuse attempts. Leveraging the size of Cloudflare network, Firewall for AI runs as close to the user as possible, allowing us to identify attacks early and protect both end user and models from abuses and attacks.

OWASP has published their Top 10 for Large Language Model Applications, which is a fantastic read and a good overview of the security risks targeting LLM’s. As per cloudfare this firewall mitigates some of the risks highlighted in OWASP for LLM’s. I would suggest taking the announcement with a grain of salt till we have independent validation of the claims. That being said it is def a step in the correct direction though.

– Suramya

Source: Hacker News: Cloudflare Announces Firewall for AI

March 6, 2024

Researchers demo the first worm that spreads through LLM prompt injection

Filed under: Artificial Intelligence,Computer Security,Computer Software — Suramya @ 10:17 PM

In the past year we have seen an uptick in the tech industry looking towards embedding LLM (Large Language Models) or AI as they are being pitched to the world in all possible places. Windows 11 now has built in Copilot that is extremely hard to disable. Email systems are using LLM’s to get additional details/information using the data from the email to add context etc. This creates new attack surfaces that attackers can target and we have seen instances where attackers have used prompt injection to gain access to data or systems that were restricted.

Building on top of that researchers have now created (and demo’d) the first worm that spreads through prompt injection. This is breakthrough work similar to how the Morris Worm was in the late 80’s. Basically, researchers created an email which has an adversarial prompt embedded in it. This prompt is then ingested by an LLM (using Retrieval-Augmented Generation which allows it to enhance the reliability of the LLM by fetching data from external sources when the email is processed by the LLM) where it jailbreaks the GenAI service and can steal data from the emails (or do whatever else the attacker wants such as changing email text, removing data etc). In addition the prompt also has the ability to make the email assistant forward the email with the malicious prompt to other email addresses allowing it to spread. The researchers have christened their worm as Morris II giving homage to the first email worm.

Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?

This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.

This is pretty fascinating work and I think that this kind of attack will start becoming more common as the LLM usage goes up. The research paper is available at: ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.

– Suramya

March 5, 2024

Yet another example on why we need controls and audit logs around sensitive data

Filed under: Computer Security,My Thoughts — Suramya @ 11:40 AM

People like the one in the example below are why cyber security and privacy policies insist on having access control rules and oversight on who has access to data and audit logs for why they are accessing that data.

My favourite thing about working in HR is being able to look up anyone's age or salary. Its like having a version of IMDB, but for real people.
My favorite thing about working in HR is being able to look up anyone’s age or salary. Its like having a version of IMDB, but for real people.

If you are building of maintaining a system that has sensitive data, or PII (Personal Identifiable Information) you need to ensure that you not just have access controls around the data but also have a way to audit who is accessing the data and for what reason. If the reason is not work related then action should be taken and their access revoked. Law enforcement has access to various monitoring systems and there have been multiple examples in the past where law enforcement officers looked up their ex’s, stalked people etc. Again, that is something that can be prevented in part by having strictly enforced policies on who can access the data and for what purpose.

In one of my previous companies, everytime you accessed any production or critical systems you had to give a reason and link to either a support or incident ticket. Then a manager and the system owner would review the access log along with the keylogs from the sessions and sign off on it. With that they were personally confirming that all activity was required and was justified. If that wasn’t the case then action would be taken against the person who signed off on the logs along with the person who did the accessing.

We need more of that in all systems.

– Suramya

Older Posts »

Powered by WordPress