Suramya's Blog : Welcome to my crazy life…

December 5, 2023

Near real-time Generative AI art is now possible using LCM-LoRA model

Filed under: Artificial Intelligence,My Thoughts — Suramya @ 6:21 PM

There are a lot of advancements happening in Generative AI and while I don’t agree that we have created intelligence (at least not yet), the advances in computer-generated art are phenomenal. The most recent one is LCM-LoRA, short for “Latent Consistency Model – Low-Rank Adaptation”, developed by researchers at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University in China. Their paper LCM-LORA: A Universal Stable-Diffusion Acceleration Module (PDF) was published on Arxiv.org last week.

This model allows a system to generate an image from a text prompt in near real-time instead of having to wait a few seconds, which was the case earlier. So you can modify the prompt as you go and get immediate feedback on each change. You can test it out at Fal.ai

Latent Consistency Models (LCMs) (Luo et al., 2023) have achieved impressive performance in accelerating text-to-image generative tasks, producing high quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ∼32 A100 GPU training hours. This report further extends LCMs’ potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5 (Rombach et al., 2022), SSD-1B (Segmind., 2023), and SDXL (Podell et al., 2023), we have expanded LCM’s scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM (Song et al., 2020), DPM-Solver (Lu et al., 2022a;b), LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.

The technique works not only for 2D images but for 3D assets as well, meaning artists could theoretically create immersive environments almost instantly for use in mixed reality (AR/VR/XR), computer and video games, and other experiences. I did try going over the paper, but a majority of it went over my head. That being said, it is fun playing with this tech.
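If you want to try it locally instead of on Fal.ai, here is a minimal sketch of how the LCM-LoRA module can be plugged into a Stable Diffusion XL pipeline using Hugging Face’s diffusers library. The model IDs are my assumptions based on the publicly released LCM-LoRA weights on the Hugging Face Hub, so adjust them to whatever you actually use:

# Sketch: loading the LCM-LoRA acceleration module into an SDXL pipeline.
# Assumes the diffusers library and the published
# "latent-consistency/lcm-lora-sdxl" weights.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Swap in the LCM scheduler and attach the LCM-LoRA weights
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# With LCM-LoRA, 4-8 inference steps and a low guidance scale are usually
# enough, which is what makes the near real-time feedback loop possible.
image = pipe(
    prompt="a castle on a hill at sunset, digital painting",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("castle.png")

On a reasonably modern GPU the handful of steps is what turns image generation from a batch job into something you can tweak interactively.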

The model doesn’t address the existing issues with AI art, such as how the artists whose work was used in the training data sets should be compensated, or the issue of copyright infringement given that much of that art is not in the public domain. We also need to start thinking about who would own the copyright to art generated using AI. There are a few open court cases on this topic, but as of now the courts have refused to grant any copyright protection to art generated by AI, which would make it a non-starter for use in any commercial project such as a movie or game.

– Suramya

Source: Realtime generative AI art is here thanks to LCM-LoRA

October 28, 2023

New tool called Nightshade allows artists to ‘poison’ AI models

Filed under: Artificial Intelligence,Tech Related — Suramya @ 12:20 AM

Generative AI has burst onto the scene with a bang and while the image-generation tech is not perfect yet, it is getting more and more sophisticated. Due to the way the tech works, the model needs to be trained on existing art, and most of the models in the market right now have been trained on artwork available on the internet whether or not it was in the public domain. Because of this, multiple lawsuits have been filed against AI companies by artists.

Unfortunately this has not stopped AI companies from using these images as training data, so while the question is being debated in the courts, researchers over at the University of Chicago have created a new tool called Nightshade that allows artists to poison the training data for AI models. This functionality will be an optional setting in their prior product Glaze, which cloaks digital artwork by altering its pixels to confuse AI models about its style. Nightshade goes one step further by making the AI learn the wrong names for the objects in a given image.

[We introduce an] optimized prompt-specific poisoning attack we call Nightshade. Nightshade uses multiple optimization techniques (including targeted adversarial perturbations) to generate stealthy and highly effective poison samples, with four observable benefits.

  • Nightshade poison samples are benign images shifted in the feature space. Thus a Nightshade sample for the prompt “castle” still looks like a castle to the human eye, but teaches the model to produce images of an old truck.
  • Nightshade samples produce stronger poisoning effects, enabling highly successful poisoning attacks with very few (e.g., 100) samples.
  • Nightshade samples produce poisoning effects that effectively “bleed-through” to related concepts, and thus cannot be circumvented by prompt replacement, e.g., Nightshade samples poisoning “fantasy art” also affect “dragon” and “Michael Whelan” (a well-known fantasy and SciFi artist).
  • We demonstrate that when multiple concepts are poisoned by Nightshade, the attacks remain successful when these concepts appear in a single prompt, and actually stack with cumulative effect. Furthermore, when many Nightshade attacks target different prompts on a single model (e.g., 250 attacks on SDXL), general features in the model become corrupted, and the model’s image generation function collapses.

In their tests the researchers poisoned images of dogs to include information in the pixels that made them appear to an AI model as cats. After sampling and learning from just 50 poisoned image samples, the AI began generating images of dogs with strange legs and unsettling appearances. After 100 poison samples, it reliably generated a cat when asked by a user for a dog. After 300, any request for a dog returned a near-perfect-looking cat.
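To make the “benign images shifted in the feature space” idea a bit more concrete, here is a conceptual sketch of a targeted perturbation. This is emphatically not Nightshade’s actual code: the real attack targets the diffusion model’s own feature space and is far more sophisticated, while this toy uses an off-the-shelf ResNet as a stand-in feature extractor and hypothetical dog.jpg / cat.jpg files:

# Conceptual sketch only, not Nightshade. It nudges a "dog" photo so that a
# feature extractor sees it as closer to a target "cat" image, while keeping
# the pixel change small enough to be unobtrusive to a human viewer.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in feature extractor (Nightshade works in the generative model's own
# feature space; a ResNet is used here purely for illustration).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).to(device).eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

def load(path):
    return preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

source = load("dog.jpg")   # image that should still look like a dog to humans
target = load("cat.jpg")   # concept the model should be taught instead

with torch.no_grad():
    target_features = feature_extractor(target)

delta = torch.zeros_like(source, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=1e-2)
epsilon = 8 / 255          # cap on the per-pixel change, keeps the edit subtle

for step in range(200):
    poisoned = (source + delta).clamp(0, 1)
    loss = torch.nn.functional.mse_loss(feature_extractor(poisoned), target_features)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Project the perturbation back into the allowed budget
    with torch.no_grad():
        delta.clamp_(-epsilon, epsilon)

poisoned_image = (source + delta).clamp(0, 1)

The poisoned image still looks like a dog to a person, but a model trained on enough of these samples starts associating the “dog” label with cat-like features, which is the behaviour the researchers report.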

Obviously this is not a permanent solution, as the companies training AI models will start working on countermeasures immediately, and then the whack-a-mole process of fixes and counter-fixes to one-up each other will continue (similar to how virus and anti-virus programs have been at it) for the foreseeable future.

Full paper: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models (PDF)
Source: Venturebeat: Meet Nightshade, the new tool allowing artists to ‘poison’ AI models

– Suramya

October 9, 2023

Microsoft AI responds with absolute nonsense when asked about a prominent Cyber Security expert

Filed under: Artificial Intelligence,Computer Software — Suramya @ 11:39 PM

The more I read about Microsoft’s implementation of ‘AI’, the more I wonder what on earth they are thinking. Their AI system is an absolute shambles and about 99% of the output is nonsense. See the example below:

Microsoft AI returns absolute nonsense when asked who Kevin Beaumont is

I did not realise how inaccurate Microsoft’s AI is. It’s really bad. This is just one example – it lists a range of lawsuits I’ve filed, but they’re all fictional – it invented them and made up the citations. It says I gave Microsoft’s data to @briankrebs. It says Krebs is suing me. It says @malwaretech works for me. The list goes on and on. Very eyebrow raising this is being baked into next release of Windows 11 and Office. It will directly harm people who have no knowledge or recourse.

I mean, I can understand it getting one or two facts wrong because the data sources might not be correct, but getting every single detail wrong requires extra skill. The really scary part is that Google’s AI search is not much better, and both companies are in a race to replace their search engines with AI responses. Microsoft is going a step further and including it as a default option in Windows. I wonder how much of the user data stored on a Windows computer is being used to train these AI engines.

There needs to be an effort to create a search engine that filters out these AI-generated responses and websites, so we can go back to the old-style search engines that actually returned useful and correct results.

– Suramya

October 7, 2023

Oxford researchers develop promising 3D printing method for repairing brain injuries

Filed under: Emerging Tech,Science Related — Suramya @ 11:59 PM

Brain injuries are extremely hard to cure with the current state of medical knowledge. Mild cases of traumatic brain injury (TBI), or concussion, can be treated with rest and a slow return to normal activities. However, for severe TBIs the care mostly focuses on stabilizing the patient: ensuring the brain is getting enough oxygen, controlling blood and brain pressure, and preventing further injury to the head or neck. Once the patient is stable, we use therapies to recover functions, relearn skills, etc. But that is just training the brain to use different neurons to perform the tasks that the damaged ones used to do.

Researchers at the University of Oxford have had a breakthrough that brings us closer to providing tailored repairs for those who suffer brain injuries. They demonstrated for the first time that neural cells can be 3D printed to mimic the architecture of the cerebral cortex. This research was published in Nature Communications earlier this month.

Engineering human tissue with diverse cell types and architectures remains challenging. The cerebral cortex, which has a layered cellular architecture composed of layer-specific neurons organised into vertical columns, delivers higher cognition through intricately wired neural circuits. However, current tissue engineering approaches cannot produce such structures. Here, we use a droplet printing technique to fabricate tissues comprising simplified cerebral cortical columns. Human induced pluripotent stem cells are differentiated into upper- and deep-layer neural progenitors, which are then printed to form cerebral cortical tissues with a two-layer organization. The tissues show layer-specific biomarker expression and develop a structurally integrated network of processes. Implantation of the printed cortical tissues into ex vivo mouse brain explants results in substantial structural implant-host integration across the tissue boundaries as demonstrated by the projection of processes and the migration of neurons, and leads to the appearance of correlated Ca2+ oscillations across the interface. The presented approach might be used for the evaluation of drugs and nutrients that promote tissue integration. Importantly, our methodology offers a technical reservoir for future personalized implantation treatments that use 3D tissues derived from a patient’s own induced pluripotent stem cells.

I did try reading the paper but it pretty much went over my head. However, I am extremely happy to see significant progress being made in this field and look forward to reading more about this technology as it is refined and improved.

Source: Oxford researchers develop 3D printing method that shows promise for repairing brain injuries

– Suramya

September 7, 2023

Youtube2Webpage: Create Websites with Text from Videos

In my last post I talked about preferring text content to videos, and coincidentally my Hacker News feed happened to cover a tool that takes a video link and creates a webpage with a transcript generated from the video’s closed captions, paired with screenshots from the video. The program is called Youtube-to-Webpage. It is a Perl script that uses yt-dlp & ffmpeg to do the processing.

I tried it out using the curl video I talked about in the previous post as the input, and the software did a decent job capturing the details. The output is very plain and looks like the following:

Transcription of the Curl training video

Since the program uses the built-in YouTube captions to get the text from the video, the transcription is only as good as the captions themselves. One enhancement that could make it better is to use a speech-to-text engine and use that text in the output; the slightly tricky part would be matching the screenshots with the audio/transcription timestamps. A rough sketch of that idea is below.
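Something along these lines could work, sketched here with OpenAI’s Whisper model as the speech-to-text engine and ffmpeg for the screenshots. The video filename is just a placeholder, and I haven’t wired this into the Perl script:

# Sketch: transcribe a local video with Whisper and grab a screenshot at the
# start of each transcript segment so text and images stay aligned.
import subprocess
import whisper  # pip install openai-whisper

model = whisper.load_model("base")
result = model.transcribe("video.mp4")

for i, segment in enumerate(result["segments"]):
    start = segment["start"]
    # Grab one frame at the segment's start time with ffmpeg
    subprocess.run([
        "ffmpeg", "-y", "-ss", str(start), "-i", "video.mp4",
        "-frames:v", "1", f"shot_{i:04d}.jpg",
    ], check=True)
    print(f"[{start:7.2f}s] {segment['text'].strip()}  -> shot_{i:04d}.jpg")

Since Whisper returns per-segment timestamps, the screenshot/transcript matching mostly falls out for free; the remaining work would be emitting the HTML the way Youtube-to-Webpage does.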

Check it out if you prefer to read text instead of watching videos. I wonder how the output would look if we fed it to an LLM and asked it to rewrite it as an article. That could be something to explore in a future post 🙂

– Suramya

August 31, 2023

Using LLMs to change writing style to hide the author?

Filed under: Artificial Intelligence,My Thoughts,Tech Related — Suramya @ 12:17 PM

It is fairly well known that people can be identified based on their writing style. In fact, there is a whole field of study called Stylometry that analyses writings to try to attribute authorship of documents. This is used when new writings are found that are not attributed to any person, when agencies receive anonymous letters, or when articles are posted on extremist sites.

In 1964, Frederick Mosteller and David Wallace published a three-year study of the distribution of common words in the Federalist Papers and showed that the writing styles of Alexander Hamilton and James Madison differed in subtle ways. They analyzed the disputed papers that both writers claimed to have written and, based on the respective writing styles of the two authors, concluded that the papers were likely to have come from Madison’s pen. Since then the technique has only improved; with AI and machine-learning algorithms coming into play, researchers are able to identify the writing style behind a post quickly and mostly accurately. Keep in mind that this is still more of an art than a science. However, it works well enough that people were able to identify the author of the book “The Cuckoo’s Calling” by Robert Galbraith as none other than JK Rowling.
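To make the Mosteller–Wallace idea concrete, here is a toy sketch of function-word stylometry. Real stylometry uses many more features and proper statistics (Burrows’ Delta and friends); the file names here are hypothetical, and “upon” vs “on” is the classic Hamilton/Madison discriminator:

# Toy illustration of function-word stylometry, not a production attribution tool.
from collections import Counter
import re

FUNCTION_WORDS = ["by", "from", "to", "upon", "while", "whilst", "on", "of"]

def word_rates(text: str) -> dict:
    """Occurrences per 1000 words for each marker word."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return {w: 1000 * counts[w] / total for w in FUNCTION_WORDS}

# Hypothetical files holding known-author and disputed text
hamilton_profile = word_rates(open("hamilton.txt", encoding="utf-8").read())
madison_profile = word_rates(open("madison.txt", encoding="utf-8").read())
disputed_profile = word_rates(open("disputed.txt", encoding="utf-8").read())

# Attribute the disputed text to whichever author profile it sits closer to
dist_h = sum((disputed_profile[w] - hamilton_profile[w]) ** 2 for w in FUNCTION_WORDS)
dist_m = sum((disputed_profile[w] - madison_profile[w]) ** 2 for w in FUNCTION_WORDS)
print("closer to Hamilton" if dist_h < dist_m else "closer to Madison")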

Now, as we all know, “with every action there is an equal and opposite reaction”, and there has been a lot of work put into what is called Adversarial Stylometry, which is the practice of consciously changing your writing style in order to remove the identifying characteristics that could be used to figure out who wrote the text. Historically, this was achieved by paraphrasing the text and by having another person edit & modify the content.

With the recent advances in LLMs, especially systems like ChatGPT, it might be possible to use them to whitewash the text and remove enough identifying marks that it becomes relatively more anonymous. I tried it out with the previous paragraph as the input and got the following as the result:

As commonly understood, the principle of “For every action, there is an opposing reaction” holds true. Considerable effort has been dedicated to a field known as Adversarial Stylometry. This involves intentionally altering one’s writing style to eliminate distinctive traits that could be employed to determine the authorship of a text. In the past, this was accomplished by rephrasing the content and enlisting the assistance of a third party to revise and adjust the material.

As you can see, the output is different enough in words, tone and phrasing that it would be hard to attribute correctly. Do it a few times in a loop and, after ensuring the original meaning is not lost, you can use it and be relatively sure that it will be difficult to identify the author of the text. A quick sketch of that loop is below.
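For the curious, a minimal sketch of the paraphrase-in-a-loop idea using the OpenAI Python client (openai>=1.0). The model name and the prompt wording are my own placeholders, and this is obviously not what the ChatGPT web UI does internally:

# Sketch: repeatedly rewrite a text to strip stylistic fingerprints.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

def paraphrase(text: str, rounds: int = 3, model: str = "gpt-3.5-turbo") -> str:
    """Run the text through the model several times to dilute the original style."""
    for _ in range(rounds):
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "Rewrite the user's text in a neutral, generic style. "
                            "Preserve the meaning exactly; change wording, tone and sentence structure."},
                {"role": "user", "content": text},
            ],
        )
        text = response.choices[0].message.content
    return text

if __name__ == "__main__":
    original = "Now, as we all know, with every action there is an equal and opposite reaction..."
    print(paraphrase(original))

The manual check for meaning drift still matters: each pass can subtly shift what the text says, so you would want to re-read the final output before using it.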

Obviously, if LLMs are going to be used to obfuscate text, then other LLMs will be created that try to de-obfuscate it, and both sides will keep getting smarter and smarter.

– Suramya

August 19, 2023

ChatGPT is not an all-powerful AI: it can’t even perform a simple multiplication

Filed under: Artificial Intelligence — Suramya @ 9:49 PM

In my previous post I talked about how I am usually quite skeptical of the claims being made about AI and how it is going to solve the world’s problems. There are some good use cases, and there are some advantages to adding machine learning to a system; I have written about a few of them here.

But we are nowhere close to the Singularity, and most of the folks who are pushing AI right now are the same ones who were pushing NFTs earlier and Crypto before that… Case in point: according to the founder of OpenAI, AI is an existential threat and ChatGPT is a showcase of how advanced AI has become. However, this ‘advanced’ AI can’t even multiply two numbers and get the correct answer. I asked ‘What is 7467*5730’ and it gave me ‘42,777,810’. Then I asked my calculator to perform the same multiplication and that gave me: (drum roll please…) 42,785,910. (See the screenshots below)


ChatGPT can’t perform a simple multiplication
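For the record, here is a two-line check of the arithmetic. ChatGPT’s answer is not just slightly off; it differs from the exact product by 8,100:

# Verifying the multiplication from the post
chatgpt_answer = 42_777_810
exact = 7467 * 5730
print(exact)                   # 42785910
print(exact - chatgpt_answer)  # 8100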

Others have faced the same issue, and I rest my case about why we need to be very careful using these AI assistants: they give answers that look legitimate and correct but are completely wrong, or have subtle errors that need expertise to identify.

– Suramya

August 18, 2023

The Judge by GoblinTools is a great tool to tell how written text comes across qualitatively

Readers of this blog (and those who know me) know that I am usually quite skeptical of tools that claim to judge the emotional content of text or tell how the text comes across qualitatively. However, this new site called The Judge – Goblin Tools is quite impressive. As per James over on Mastodon, the tool was created to help neurodivergent folks communicate better with the normals, avoiding misunderstandings, bad assumptions about intent, etc.

I tested it out with content from my blog and I have to say that it summarized the tone and content quite nicely. For example, below is the summary it gave for my post Celebrating India’s 76th Independence day!

Emotionally, this text comes across as proud, patriotic, and reflective. It begins by acknowledging and celebrating India’s 76th Independence Day, recognizing the significance of gaining freedom from British rule. It highlights the sacrifices, bloodshed, and tears endured by many individuals who fought for India’s freedom, emphasizing that not all their names are known and that this is a tragic aspect. The text expresses a sense of frustration with the tendency to focus solely on iconic figures like Gandhi and Nehru, disregarding the efforts of numerous other freedom fighters.

As the text looks towards the future, there is a strong emotional undertone of remembrance and gratitude towards those who paved the way for freedom. It cites famous slogans from freedom fighters, adding a sense of inspiration and reverence for their contribution. The passage acknowledges the importance of remembering the sacrifices made by past generations for the betterment of the nation.

There is a shift in emotion when it proudly proclaims India’s progress and achievements in various fields, such as technology, space exploration, and healthcare. It displays a sense of national pride and triumph over past perceptions that diminished India’s image. The text mentions India’s relief efforts during the COVID-19 pandemic and positions India as a nation capable of compassionate and humanitarian actions.

Moving forward, the text acknowledges areas that still require improvement, alluding to societal customs and mindsets. It expresses a desire for growth and change while also highlighting the need for self-reflection in other countries that criticize India, pointing out their own flaws and challenges.

The passage concludes with a message of unity and inclusivity, promoting the idea that humanity should transcend differences in caste, religion, nationality, sexuality, and age. It embraces the concept of a global family and emphasizes the values of humanity and national loyalty.

Overall, this text conveys a mixture of pride in India’s accomplishments, gratitude for the sacrifices made by freedom fighters, and a sense of reflection on the challenges that still exist. It aims to inspire unity, progress, and a commitment to the values that define the nation.

It did manage to capture the exact tone I was going for, so that is quite nice. According to their About Us page, the site uses AI technologies in the backend, including OpenAI’s models. They are planning to move to open-source alternatives as and when those become available and reliable. Keeping that in mind, I would not recommend using this to analyze any text that has private or confidential information, because it might be used to train OpenAI’s models. That being said, I will be keeping an eye on this software, and once they move to an open-source model and allow us to run a local instance of the tool, I certainly wouldn’t mind using it as part of my writing toolkit.

– Suramya

July 27, 2023

GPS Data Could potentially be used to Detect Large Earthquakes in advance

Filed under: Emerging Tech,My Thoughts — Suramya @ 10:31 PM

Earthquakes are extremely devastating, and because we don’t have a way to predict them in advance they end up taking a huge toll on lives. The existing systems for earthquake prediction are fraught with false positives to the point of being useless. However, that hasn’t stopped people from trying, and in a new paper researchers Quentin Bletery and Jean-Mathieu Nocquet claim to have found a way to detect large earthquakes up to 2 hours in advance using GPS data.

They analyzed high-rate GPS time series recorded before 90 different earthquakes of magnitude 7 and above to look for a precursor signal, and they observed a subtle signal that rose out of the noise about 2 hours before these major earthquakes occurred. This looks extremely promising and, if validated, could change how we approach disaster management for earthquakes. However, the study still needs to be validated, and we don’t yet know if the precursor signal could ever be measured for individual events with the accuracy needed to provide a useful warning.

The existence of an observable precursory phase of slip on the fault before large earthquakes has been debated for decades. Although observations preceding several large earthquakes have been proposed as possible indicators of precursory slip, these observations do not directly precede earthquakes, are not seen before most events, and are also commonly observed without being followed by earthquakes. We conducted a global search for short-term precursory slip in GPS data. We summed the displacements measured by 3026 high-rate GPS time series—projected onto the directions expected from precursory slip at the hypocenter—during 48 hours before 90 (moment magnitude ≥7) earthquakes. Our approach reveals a ≈2-hour-long exponential acceleration of slip before the ruptures, suggesting that large earthquakes start with a precursory phase of slip, which improvements in measurement precision and density could more effectively detect and possibly monitor.
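The key trick in the abstract is the stacking step: individually each GPS series is far too noisy, but projecting every station onto the direction expected from slip at the hypocenter and averaging across stations and events lets a weak common signal rise above the noise. Here is a toy numpy sketch of that idea with synthetic data standing in for the real time series; the numbers are made up, only the structure mirrors the paper’s description:

# Toy sketch of the stacking approach, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

n_events, n_stations, n_samples = 90, 30, 48 * 12   # 48 h at 5-minute sampling
# Simulated east/north displacement noise for every station and event
disp = rng.normal(0.0, 1.0, size=(n_events, n_stations, n_samples, 2))

# Unit vectors giving the displacement direction expected at each station
# from precursory slip at the hypocenter (random placeholders here)
expected_dir = rng.normal(size=(n_events, n_stations, 2))
expected_dir /= np.linalg.norm(expected_dir, axis=-1, keepdims=True)

# Inject a tiny exponential acceleration near the end of the window as a
# stand-in for a precursor signal
t = np.arange(n_samples)
precursor = 0.05 * np.exp((t - n_samples) / 12.0)
disp += precursor[None, None, :, None] * expected_dir[:, :, None, :]

# Project each series onto its expected direction, then stack over
# stations and events
projected = np.einsum("esti,esi->est", disp, expected_dir)
stack = projected.mean(axis=(0, 1))

print("mean projected displacement, first hour:", stack[:12].mean())
print("mean projected displacement, last hour :", stack[-12:].mean())

Run on this synthetic data, the last-hour average clearly exceeds the first-hour average even though the injected signal is buried in noise in any single series, which is essentially the effect the authors report for the real 3,026 time series.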

This is an area where machine learning might prove useful for extrapolation and prediction, but that being said, we still need to validate and verify the approach before implementing or depending on it. The paper with their findings was published in Science (DOI: 10.1126/science.adg2565).

Source: Hacker News: Early Warning: GPS Data Could Detect Large Earthquakes Hours Before They Happen

– Suramya

July 26, 2023

New double-sided solar panels nearly double the power production per panel

Filed under: Emerging Tech,Science Related — Suramya @ 11:12 PM

Solar panels have come a long way in the last few decades and their efficiency has been consistently increasing over time as well. We have gone from an average efficiency of ~10% to more than 25% today. Now researchers from the US Department of Energy’s National Renewable Energy Laboratory (NREL) have created a new double-sided (bifacial) solar panel that generates electricity from both sides, using light reflected onto the back of the panel to generate additional power. Even though the power generated by the back side is only ~90% of what the front side produces, adding the two together means that a single panel can generate almost double the power of a traditional panel.

Bifacial photovoltaics (PV) harvest solar irradiance from both their front and rear surfaces, boosting energy conversion efficiency to maximize their electrical power production. For single-junction perovskite solar cells (PSCs), the performance of bifacial configurations is still far behind that of their state-of-the-art monofacial counterparts. Here, we report on highly efficient, bifacial, single-junction PSCs based on the p-i-n (or inverted) architecture. We used optical and electrical modeling to design a transparent conducting rear electrode for bifacial PSCs to enable optimized efficiency under a variety of albedo illumination conditions. The bifaciality of the PSCs was about 91%–93%. Under concurrent bifacial measurement conditions, we obtained equivalent, stabilized bifacial power output densities of 26.9, 28.5, and 30.1 mW/cm2 under albedos of 0.2, 0.3, and 0.5, respectively. We further showed that bifacial perovskite PV technology has the potential to outperform its monofacial counterparts with higher energy yields and lower levelized cost of energy (LCOE).

This is a significant breakthrough, and the research, titled “Highly efficient bifacial single-junction perovskite solar cells”, was published in the journal Joule.

I love the fact that renewable energy is getting so much more of a push nowadays. I have been exploring putting solar at my place, but since I am in an apartment I don’t have many options available that would make financial sense; the panels I could put up would barely supply enough power, making the whole thing not cost effective. My parents have put solar panels on our house in Delhi, and my cousin has done the same at their farm, where most of their power consumption is covered by the solar setup.

– Suramya

