
How Copyscape is Fraudulent on Plagiarism and Content Fraud (Part 1 of 3)

Copyscape is a popular plagiarism detection service that many folks use to check whether their content is being stolen, as well as whether prepared content has been plagiarized from other sources. Many are happy with Copyscape and the service it provides, presuming that it does a good job of catching plagiarism and content fraud. However, I hate the darn thing, and more professional writers ought to share in my enmity. Copyscape does not do as good a job as people think it does.

My rage stems from an incident a few days ago, when a potential client falsely accused me of plagiarism because of the Copyscape results he received for my article. In our conversation, he never specified what it had flagged; he just said that "chunks" of it were copied. Since I didn't know what it had caught, I had no idea how to defend myself. I guessed that Copyscape had caught the survey statistics I mentioned and offered that as the explanation, but he didn't like that. He said the whole thing was unprofessional and that he didn't want to take the risk of working with me. Obviously, I did not get the gig, and I did not appreciate the quick and harsh accusation.

Worried about the potential damage this could do to my career and credibility, I ran the article through Copyscape myself to see what it flagged. It flagged TWO sentences out of the entire 400-word article. To boot, those two sentences were a technical definition, something you'd want to have verbatim to ensure accuracy. He also never saw the several hyperlinks I had included throughout the article, including one to the web page where I found those two sentences, because technical difficulties forced me to send him a text-only version instead of the actual document with the hyperlinks (in my experience, you can't hyperlink in chat boxes). Had he been able to see the hyperlinks, he would have seen that I had linked this definition to the web page I got it from. I explained the technical difficulty to the client twice, but it didn't seem to matter. All that mattered was that some words matched some other words somewhere else online, and from that he concluded that the whole article was copied and that I'm not to be trusted.

Copyscape had also listed 20 results of copied content, but these were 20 different sites carrying the same two sentences, so really it was one result, not 20. Copyscape also didn't catch the survey statistics, which I actually did pull verbatim from the website. I don't think the client really perused these results, because if he had, he would have seen that they were a false positive.

And I am not the only one. A writer based in El Paso, Texas, who asked to remain anonymous, shared her story with me. Anonymous wrote a piece on gambling addiction, and her editor sent it back to her, saying it contained plagiarism. The Copyscape results pointed to a few phrases and a hotline from a web page as the plagiarized material. Her editor now wants her to rework the piece or write something entirely different. She could rework the piece, but Anonymous fears that the editor won't trust that the rest of her work is original.

I have since run a few more of my articles (ones that are published and live on the web) through the system, with mixed results. It caught some in their entirety. For others, it caught only individual sentences and statistics, not the whole article. For one article it didn't catch anything at all, which leads me to believe that Copyscape isn't as reliable as people hope and think it is.

According to Wikipedia, plagiarism is "the mere copying of text, but also the presentation of another's ideas as one's own, regardless of the specific words or constructs used to express that idea". In other words, for text to be considered plagiarized, it needs to be a copy or close copy of the original text AND lack attribution to the original author or source. Yes, I copied that definition verbatim from Wikipedia, but it's not plagiarism, because I attributed the definition, placed it in quotes, and provided a hyperlink to the very web page I pulled it from. And lovely, lovely Copyscape flagged this paragraph as plagiarism, despite my extra efforts.

Attribution for online content differs from attribution in print content such as an academic paper. It's not as if endnotes or footnotes really look great on a blog or web page. I think that proper online attribution means a hyperlink and/or a statement of the source, with quotation marks around any exact wording. Since hyperlinks help in Google rankings, I don't think anyone would challenge 

In contrast, many so-called plagiarism detection services, LIKE COPYSCAPE, can only detect blatant word-for-word copies of text. A mere word-for-word copy is NOT plagiarism; I repeat, it is NOT plagiarism. It only counts as plagiarism if it is not properly attributed. There are many times when a word-for-word copy is perfectly appropriate, such as a definition, a direct quote, or a set of statistics.

Here, then, is a brief list from the Purdue Online Writing Lab of what needs to be credited or documented:

  • Words or ideas presented in a magazine, book, newspaper, song, TV program, movie, Web page, computer program, letter, advertisement, or any other medium
  • Information you gain through interviewing or conversing with another person, face to face, over the phone, or in writing
  • When you copy the exact words or a unique phrase (which means that a word-for-word copy is okay, as long as it is attributed)
  • When you reprint any diagrams, illustrations, charts, pictures, or other visual materials
  • When you reuse or repost any electronically-available media, including images, audio, video, or other media

There are, of course, certain things that do not need documentation or credit. This is important to note, because services like Copyscape look only at the text itself, not at how the text is used, what it says, or whether it comes with proper attribution. Things that don't need documentation or credit, also taken from the Purdue Online Writing Lab's page on plagiarism, include:

  • Writing your own lived experiences, your own observations and insights, your own thoughts, and your own conclusions about a subject
  • When you are writing up your own results obtained through lab or field experiments
  • When you use your own artwork, digital photographs, video, audio, etc.
  • When you are using "common knowledge," things like folklore, common sense observations, myths, urban legends, and historical events (but not historical documents)
  • When you are using generally-accepted facts, e.g., pollution is bad for the environment, including facts that are accepted within particular discourse communities, e.g., in the field of composition studies, "writing is a process" is a generally-accepted fact.

I suspect that Anonymous and I aren't the only ones who've been wrongly accused of such an unethical deed. This one incident wouldn't be a big deal, except that for a professional writer, an accusation of plagiarism can have widespread and career-damaging consequences, whether the accusation is true or not. After all, a man cleared from death row after 20 years in prison doesn't suddenly have the ordeal over and done with. That sort of thing stays with you long after it ends, and so does an alleged "act" of plagiarism.

Writers who've suffered injustice because of faulty Copyscape results need to come forward with their stories, to show others that they are not alone and that this is a problem. Those who want our content need to understand what plagiarism really is, and to realize that Copyscape's results shouldn't be taken as foolproof and absolute.

In Part II, I will complete a full statistical analysis of Copyscape, running all of my online articles through the system and summarizing the results. I have hundreds of articles live on the web, so the sample should be large enough for the results to be meaningful. In Part III, I will offer alternative ways of catching plagiarism and content fraud.