I’m in Victoria, British Columbia, this week attending the Digital Humanities Summer Institute! It’s been fun, disorienting, interesting, and challenging.
What is digital humanities and why am I here? In my understanding, which is still limited at this point, the digital humanities covers any digitally aided scholarship in the humanities where the digital takes a front-seat role. It could be textual analysis, such as digitally comparing a large chunk of text to a corpus (a big body or collection, like thousands of books or newspaper articles) to find words or phrases that are unusual or significant in, say, Jane Eyre as compared to a bunch of other Victorian novels. It could be displaying the results of analysis in a visualization: basically a pretty graph-like image which is both aesthetically pleasing and says something about the text, like how long sentences are in different parts of a book. There are also people here who are learning how to make “enhanced reality” applications, so, for instance, your smartphone could guide you as you literally walk along the route of Leopold Bloom in Dublin, and chunks of the story, pictures, and quotations could come up on your phone as you walk. Others here (there are around 400 people here!) are discussing how to build digital components into the humanities classroom, like blogging or more complex things like having students build digital projects. Still others are getting into the nitty-gritty of building digital humanities tools which compare, compile, collaborate, or visualize stuff.
So what does my plagiarism study have to do with this? The finding part! Up to now, I’ve been relying on the scholarship of others, like Paul Saint-Amour, and my own two eyes to find plagiarism and repetition/self-plagiarism in Oscar Wilde’s work. Some of the passages were pointed out in others’ work, but not entirely clearly, so I’ve done some “by-hand” Google searching to find the exact passages Wilde took from other sources like William Jones’ The History of Precious Stones. But I thought: Wouldn’t it be cool to find plagiarism no one else has found? It is so much easier to find student plagiarism with the help of tools like Turnitin.com, which basically “Googles” every word-chunk against the web, scholarly journals, and its own database of student papers. Can I “turn in” literary authors?
I tried to use Turnitin.com with Wilde’s work, but came up with the expected result: Wilde’s work is 100% “plagiarized”—because it is all republished in some form on the web. What I really want to compare is Wilde’s work (or any single author’s) against a whole digital library like Google Books to search for chunks of similarities, excluding collections of Wilde’s work as well as the rest of the web. So, that’s what I’m trying to do here—find a more effective digital tool to find literary plagiarism.
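To make the “chunks of similarities” idea concrete, here is a minimal sketch in Python of comparing two texts by the word chunks they share. It is only an illustration under stated assumptions, not how Turnitin or any real tool works: the function names `word_chunks` and `shared_chunks` and the default chunk size of five words are made up for this example, and a real search against a whole digital library would need far more than this.

```python
import re


def word_chunks(text, n=5):
    """Return the set of every run of n consecutive words in the text.

    Words are lowercased and punctuation is dropped, so the same phrase
    matches even if the capitalization or commas differ.
    """
    words = re.findall(r"[a-z'’]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_chunks(text_a, text_b, n=5):
    """Return the n-word chunks that appear in both texts, sorted."""
    return sorted(word_chunks(text_a, n) & word_chunks(text_b, n))


if __name__ == "__main__":
    # Made-up sample sentences, not quotations from any real source.
    suspect = "she would often spend a whole day sorting the stones in their cases"
    source = "the collector liked to spend a whole day sorting coins"
    for chunk in shared_chunks(suspect, source, n=4):
        print(chunk)
```

A smaller chunk size catches more borrowings but also more ordinary phrases that any two writers might share, so the chunk size is something to experiment with.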
What I’ve learned so far:
There may not be such a tool in existence. One of the common complaints about software like Turnitin.com, or even free tools like Dustball, Advanced Plagiarism Checker, or Grammarly, is that they do not use Google Books. Why not? Two reasons, I think: 1. Google Books is so darn big that it would take a really long time to search using n-grams, which are basically chunks of text. And 2. Google puts a limit of 1,000 searches on this. So, that’s where I’m stuck at the moment.
However, I did find some code on a website that will allow me to find repetitions within a single text file, and a few other students at DHSI have been helping me get it working. This will at least be very helpful in finding self-plagiarism/repetition in one file, or I can compile a few files into one document for a comparison. It’s a work in progress.
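For illustration only, here is a minimal sketch in Python of what a repetition search within a single text file might look like. It is not the code from the website mentioned above; the function name `repeated_chunks` and the default chunk size of four words are assumptions made for this example.

```python
import re
from collections import defaultdict


def repeated_chunks(text, n=4):
    """Find every run of n consecutive words that occurs more than once.

    Returns a dict mapping each repeated chunk to the list of word
    positions (0-based) where it starts.
    """
    words = re.findall(r"[a-z'’]+", text.lower())
    positions = defaultdict(list)
    for i in range(len(words) - n + 1):
        positions[" ".join(words[i:i + n])].append(i)
    # Keep only the chunks that start at two or more places.
    return {chunk: starts for chunk, starts in positions.items() if len(starts) > 1}


if __name__ == "__main__":
    # A made-up sample; in practice you would read the text from a file,
    # e.g. open("wilde.txt", encoding="utf-8").read() (hypothetical filename).
    sample = "the cat sat on the mat and then the cat sat on the rug"
    for chunk, starts in repeated_chunks(sample, n=3).items():
        print(chunk, starts)
```

Compiling several files into one document, as described above, would work with this sketch too: the reported positions would then point into the combined text.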
Other things I’ve learned:
*There are many tools available out on the web that I will never use, and which I think have very limited use in serious textual analysis. This judgment is based on only a few hours of play, but while Wordle, Mandala, and Voyant are all very pretty, I’m not sure they can tell me anything that WordSmith Tools or another concordance-thingie (very professional name) can’t. Again, I’m not an expert (at all). Perhaps their use is mostly in communication and display. They do offer another way to read a text: a distant reading. This could be useful in generating new and interesting perspectives. But I find that I go to the digital humanities not to form ideas and find curiosities, but to answer specific questions. In my opinion, that’s what computers are particularly good for: taking a question of very limited scope and defined parameters and answering it in an exact, traceable, checkable way.
*If you want it, you may have to make it. There is a big buy-in with tinkering with code. The hope is that there is a big win on the other side. If I can get a good working plagiarism-checker, I think that will be a huge win, as I suspect that there is a ton of undiscovered literary plagiarism out there.
*Gamification. This is unrelated to my plagiarism focus, but I went to a little session on this, which was really interesting. The concept is to turn non-games into game-like activities in order to make them more fun or addictive, to encourage loyalty, to teach…whatever. Basically, it tricks people into doing what you want them to, which is, I have found, a lot of what teaching is about. Instinctively, I’ve not been for this particular approach; ideally, my students should love the subject and want to learn it. But in freshman comp they seem to do much better when they are tricked into it. The students themselves call this “motivated.” So, I’ve been thinking about ways to make my comp class into more of a “game space” (gosh, that sounds fluffy).
*How to navigate the Victoria public transit system!
*And much more that I will expand upon later.