March 22, 2005


I was reading the April 2005 issue of the Communications of the ACM and saw this really interesting paper on Self-Plagiarism by Christian Collberg and Stephen Kobourov. In it they discuss what counts as legitimate reuse of your own work and what counts as Self-Plagiarism. In case you are wondering what Self-Plagiarism is, as I was when I first saw the title, here’s a brief definition of the term from Miguel Roig:

In writing, self-plagiarism occurs when authors reuse their own previously written work or data in a “new” written product without letting the reader know that this material has appeared elsewhere.

Now I find this to be a pretty interesting concept, ’cause it’s pretty easy to tell when you’ve stolen from someone else, but how do you tell that you stole from yourself? Programmers are especially susceptible to this (in my opinion), since we are so used to reusing code in our programs that we don’t think twice about reusing parts of old papers/articles we wrote when creating a new one. I know I have done it a couple of times… What about you? According to ACM & IEEE, an article is only considered for publication when more than 25% of it is unpublished work. But that still means up to 75% of the article can consist of results that were previously published elsewhere… Think about it: that leaves room for a lot of papers out there in which up to 75% is old work regurgitated in a new layout with some minor changes…

To demonstrate how widespread the problem is, the University of Arizona’s computer science department has created a tool called the Self-Plagiarism Tool (SPlaT) that searches through publications from over 50 institutions and analyzes them for similarity. Check it out; it’s a pretty cool tool. I wish it would let me check my old articles and stuff to see how much I reuse…
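The paper’s summary here doesn’t say how SPlaT actually measures similarity, so the following is only a minimal sketch of one common technique, word n-gram “shingling”, and not SPlaT’s real method. The function names and the choice of three-word shingles are assumptions for illustration. `containment` estimates what fraction of a new text already appears in an old one (closer to the “how much do I reuse?” question), while `jaccard_similarity` is the symmetric overlap of the two texts.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams ("shingles") in text.

    Assumption: words are runs of letters, digits, or apostrophes;
    punctuation (including hyphens) just separates words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Symmetric overlap: |A & B| / |A | B| over the two shingle sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


def containment(new: str, old: str, n: int = 3) -> float:
    """Fraction of the new text's shingles that also occur in the old text."""
    s_new, s_old = shingles(new, n), shingles(old, n)
    if not s_new:
        return 0.0
    return len(s_new & s_old) / len(s_new)


if __name__ == "__main__":
    # Hypothetical "old" and "new" snippets, made up for illustration.
    old_text = "self plagiarism occurs when authors reuse their own previously written work"
    new_text = "we show that self plagiarism occurs when authors reuse their own work"
    print(round(containment(new_text, old_text), 2))         # 0.6
    print(round(jaccard_similarity(new_text, old_text), 2))  # 0.46
```

Comparing shingles rather than single words means shared common words alone don’t count as reuse; only repeated phrases do. A real tool would also need to handle things like reordered sections and light rewording, which this sketch ignores.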

Original Paper: Communications of the ACM, April 2005 (ACM Membership needed to read the complete paper)

