Wednesday, December 4, 2013

Code reuse and plagiarism

Gautham's been doing a bunch of refactoring of our existing codebase for spot counting, in particular using a lot of object oriented design strategies. One of the primary goals is code reuse, which is commonly accepted as A Good Thing in programming. On the other hand, a student in lab has been writing a grant proposal on a topic that is very similar to another grant proposal we have submitted (and yes, we checked with the funding agency, it's okay in this context). But of course, my student felt compelled to have completely new language so that it wasn't "plagiarism", which is A Bad Thing. Which got me wondering: why are we so concerned about plagiarism in this context? Why is reuse of language in one context a worthy goal in and of itself, and complete blasphemy in another? Here's another example: I was at a thesis defense in which the candidate was strong second author on a paper and had included some of the figure legends from the published paper in the figures in her thesis. One of the committee members complained about this, saying that it was important to "write this in one's own words". I agree that it may be a good exercise, but I'm not convinced that using the figure legends is per se A Bad Thing. There just aren't all that many clear and concise ways of writing the same thing.

Maybe it's time to rethink the plagiarism taboo. Just like you can use someone else's computer code (with appropriate attribution), why not be able to use someone else's language? If someone wrote something really well, what's the point in rewriting it–probably less well–just for the sake of rewriting it? Would anyone rewrite a super efficient algorithm just for the sake of saying "I wrote it myself"? All that you need is a good mechanism for attribution, like quotes and links and stuff, you know, like all that stuff we already use in writing. In fact, I would argue that attribution is far more transparent in writing than in computer code, because typically only the programmer sees the code, not the end user. If I run a program, I typically am not looking at the source code, so I don't really know who did what, even if the README file gives credit to other code sources.

One might object to copy-and-paste writing becoming a mish-mash of ill-fitting pieces. First off, this is just bad writing, just as code that just jams together different pieces can be a mess, and will typically require one to code stuff to make things flow together well. But in a world where writing demands are growing daily (certainly the case for me), maybe it's time to consider text-reuse as a good practice, or at least not necessarily a bad one.

No comments:

Post a Comment