We are seeing a lot of cases (I am being polite) where LLM’s are trained on copyright protected data/images or has been trained with incorrect data. Currently as far as I know there is no easy way to fix this other than to train the entire model again from scratch excluding the problematic dataset. This is obviously not feasible and scalable at all.
Another sticky point is the Right to be forgotten which is a part of the GDPR and a few other countries. It requires systems to remove private information about a person from Internet searches and other directories under some circumstances. With LLM’s starting to infest search engines it means that in order to be compliant they need to be able to remove information from the model as well.
So it got me thinking if it would be possible to create an algorithm/process that allows us to untrain an LLM. A search across academic papers and the Internet shows that it is an emerging field of research and as of now mostly theoretical. Primarily because of the way the models work (or are supposed to work) we also claim that the models do not contain any information about a specific image/text by an artist.
Examples of ongoing Research on Transformer editing are Locating and Editing Factual Associations in GPT and Mass-Editing Memory in a Transformer. I did try reading though the papers and understood parts of them, the others kind of went over my head but still this is a research field I will be keeping a close eye on as it will have a large impact of the future of LLM’s and their usefulness.
– Suramya