Suramya's Blog : Welcome to my crazy life…

December 5, 2023

Near real-time Generative AI art is now possible using the LCM-LoRA model

Filed under: Artificial Intelligence,My Thoughts — Suramya @ 6:21 PM

There are a lot of advancements happening in Generative AI, and while I don’t agree that we have created intelligence (at least not yet), the advances in computer-generated art are phenomenal. The most recent one is LCM-LoRA, short for “Latent Consistency Model Low-Rank Adaptation”, developed by researchers at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University in China. Their paper, LCM-LoRA: A Universal Stable-Diffusion Acceleration Module (PDF), was published on arXiv.org last week.

This model lets a system generate an image from a text prompt in near real-time, instead of making you wait a few seconds as was the case earlier. So you can modify the prompt as you go and get immediate feedback, which you can then use to refine the prompt further. You can test it out at Fal.ai.

The abstract of the paper describes it as follows:

Latent Consistency Models (LCMs) (Luo et al., 2023) have achieved impressive performance in accelerating text-to-image generative tasks, producing high quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ∼32 A100 GPU training hours. This report further extends LCMs’ potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5 (Rombach et al., 2022), SSD-1B (Segmind., 2023), and SDXL (Podell et al., 2023), we have expanded LCM’s scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM (Song et al., 2020), DPM-Solver (Lu et al., 2022a;b), LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.
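To get a feel for the "LoRA" half of the name, here is a minimal Python sketch of how a low-rank adapter can be added on top of an existing weight matrix. This is not the code from the paper. It assumes that "plugging in" an adapter amounts to adding a scaled low-rank update (B times A) to the base weights, which is how LoRA adapters are generally merged. The function names, the toy 3x3 matrices, the scale values and the 768x768 rank-64 layer size are all made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base, lora_b, lora_a, scale=1.0):
    """Return base + scale * (lora_b @ lora_a) without modifying base."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


def lora_param_count(d_out, d_in, rank):
    """Number of values stored by a rank-`rank` adapter for a d_out x d_in layer."""
    return rank * (d_out + d_in)


# Toy "pre-trained" 3x3 weight matrix (hypothetical values).
base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical rank-1 "acceleration" adapter: B is 3x1, A is 1x3.
accel_b = [[1.0], [2.0], [3.0]]
accel_a = [[0.5, 0.0, -0.5]]
merged = merge_lora(base, accel_b, accel_a, scale=1.0)

# A second hypothetical rank-1 "style" adapter stacked on top, with no retraining.
style_b = [[0.0], [2.0], [0.0]]
style_a = [[1.0, 1.0, 1.0]]
stacked = merge_lora(merged, style_b, style_a, scale=0.5)

print(merged)
print(stacked)

# Why adapters are small: a full 768x768 layer versus a rank-64 adapter for it.
print(768 * 768)                       # 589824 values in the full layer
print(lora_param_count(768, 768, 64))  # 98304 values in the adapter
```

The point of the sketch is that the base weights never have to be retrained: each adapter is a small, separate set of numbers that gets added on top, so an acceleration adapter and a style adapter can in principle be combined by simple addition.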

The technique works not only for 2D images but for 3D assets as well, meaning artists could theoretically create immersive environments very quickly for use in mixed reality (AR/VR/XR), computer and video games, and other experiences. I did try going through the paper, but most of it went over my head. That being said, it is fun playing with this tech.

The model doesn’t address the existing issues with AI art, such as how the artists whose work was used in the training data sets should be compensated, or the issue of copyright infringement, as that art is not in the public domain. We also need to start thinking about who would own the copyright to art generated using AI. There are a few open court cases on this topic, but as of now the courts have refused to give any copyright protection to art generated by AI, which would make it a non-starter for use in any commercial project such as a movie or a game.

– Suramya

Source: Realtime generative AI art is here thanks to LCM-LoRA
