With ChatGPT, is it back to good old pen-and-paper assignments in colleges?

The burden of undertaking an additional evaluation for algiarism, that is algorithmic plagiarism, may undermine the evaluation process in many ways, write the authors.
Representative image of students in a college building

If you graduated in the last 15 or so years from any university, chances are that your dissertation or several of your assignment submissions were inspected for plagiarism by automated software. These tools – which go by interesting names such as Turnitin and iThenticate – compare a submitted document against several online databases to estimate the amount of copied content, i.e., plagiarism. Students often wait anxiously to receive that ever-so-important plagiarism score: if it is 30 or above, it indicates a sticky wicket, and likely puts the onus on the student to justify that the work is her own and not plagiarised. These plagiarism detectors are notoriously buggy – they sometimes flag generic phrases and sentence fragments that happen to appear on the web, beefing up the plagiarism score to the peril of students. Even though most academics would silently admit that these tools are not as accurate as they claim, plagiarism is such a big problem for universities today that they cannot but use them to ensure some form of automated verification.

2023 may be the year when plagiarism matures to an extent far beyond the worst nightmare of university administrators. Yes, we are talking about ChatGPT, a software that Noam Chomsky, one of the most noted philosophers of our time, described in a recent interview as ‘high-tech plagiarism’. Enter algiarism – algorithmic plagiarism – the new beast in the era of generative artificial intelligence (AI) such as ChatGPT.

The likes of ChatGPT can compose text in a query-specific manner. Suppose your assignment brief reads: ‘Write an essay on how science has influenced religion.’ Just paste that very question into ChatGPT’s chat window and you get a well-crafted essay (yes, we have tried it). If you are afraid that every student in your class might do the same and you may be caught for collusion, just mix in a bit of your own interests and write a more specific query. If you were always interested in aviation, you could ask ChatGPT: ‘Write an essay on how science has influenced religion in aviation’, and make your essay quite distinctive.

The essays produced by ChatGPT are not copied from online content, but are newly generated for you by the program, which carefully blends bits and pieces from varied sources on the web. This is in fact the key difference – no plagiarism detector would flag them, since it only knows how to compare against existing web content! This is the scary – or fascinating, depending on your perspective – world of algiarism in a nutshell.

In this article, we unpack the algiarism challenge along two dimensions. First, we consider how algiarism could vary across disciplines. Second, we consider what structural changes it could bring about in the assignment evaluation process and our university ecosystems.

Algiarism challenge across disciplines

There are significant differences in the kinds of assignments set across disciplines. In mathematics- and engineering-oriented disciplines, questions often require calculations and inferences of some kind. In the social sciences, on the other hand, questions often require thinking, reflection, and the composition of thoughts and ideas. Programs like ChatGPT vary in suitability across these disciplines in significant ways. The easiest way to show this is through examples, which is exactly what we will do.

Fig 1

Let us take a very simple mathematical question, suitable even for a primary school student, which reads: ‘There are four houses and six people. Can you prove that there is at least one house with three people?’ Real assignments in mathematical disciplines would be considerably harder, but this is sufficient to illustrate what is at play.

Coming back to the question, even for a lay reader, it may be easy to see that you can house two people each in two houses, and the remaining two people can have one house each for themselves. In other words, we can have a housing plan {2, 2, 1, 1} which does *not* require one house to have three people. The answer to the question is, thus, very simple: “There is no need to have at least one house with three people, since I can come up with a housing plan that does not necessitate it, viz., {2, 2, 1, 1}”. Let us see how ChatGPT performs here, and you can read it for yourself in Fig 1. It calls upon mathematical tools such as the pigeonhole principle (a well-known principle, by the way, but totally unnecessary here!), and then makes an argument that there should be one house with at least two people. However, it then magically changes two to three and calls it proof! Quite absurd, by any standard. Also, it’s quite surprising to note that ChatGPT responds to the mathematical question with contradictory sentences within the same answer!
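The counterexample above is easy to verify mechanically. As a quick illustration (our own sketch, not part of the original argument), the following Python snippet brute-forces all 4^6 ways of assigning six people to four houses and collects every occupancy pattern in which no house ends up with three or more people:

```python
from itertools import product

HOUSES, PEOPLE = 4, 6

# Collect every occupancy pattern (as a sorted tuple, i.e. a multiset)
# in which the fullest house holds at most two people.
plans = set()
for assignment in product(range(HOUSES), repeat=PEOPLE):
    occupancy = tuple(
        sorted((assignment.count(h) for h in range(HOUSES)), reverse=True)
    )
    if occupancy[0] <= 2:  # no house has three or more people
        plans.add(occupancy)

print(plans)  # the housing plan (2, 2, 1, 1) from the text is among them
```

Running it shows exactly two occupancy patterns avoid a triple – (2, 2, 1, 1) and (2, 2, 2, 0) – so the claim that some house must hold three people is indeed false.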

But why does ChatGPT make such an elementary mistake? While it has a lot of mathematical literature within its training data, it does not really *understand* simple mathematics (in fact, it doesn’t understand anything at all, in the way humans understand it). It can create a narrative based on pertinent words, but a meaningful answer for such mathematical questions requires a reasoning based on solid understanding, and ChatGPT evidently fails miserably here. In other words, the way it approaches the problem – as something to be solved by implicitly identifying patterns from similar content and putting them together – is totally inadequate.

Fig 2

Let us try an example from the social sciences, history in particular. Consider an assignment that involves producing a long-form essay on life in a particular region during a particular era. Now, let us say that a student, out of curiosity, tries several combinations on ChatGPT, and settles on ‘Life in Peninsular India during the Greek era’. There is no known Greek era in peninsular India (what we sometimes refer to as south India); however, ChatGPT does indeed produce an essay, part of which is illustrated in Fig 2. It even demarcates a timeframe for the era and gives a vivid description of how Greek ideas influenced life in peninsular India. Needless to say, all of this is fictitious and bears no resemblance to reality – if you chose not to be so mild in your description, you could well and truly call it *fake news*!

Yet, let us view the output from the point of view of evaluation. While the content is not factual, a key aspect is that it remains a ‘realistic sounding’ essay. When a student turns in such an essay, what would you do as an assessor? You may think that there is a realistic chance that the student has done some deep research and identified a latent connection between Alexander’s pursuit and life in peninsular India. Can you dismiss it outright as fiction? What if the student comes back later with the results of her research and embarrasses you? The foolproof way to assess this would be to do some research on your own, focused on the claims in the essay, and verify its truthfulness.

As you can see, this is significantly different from detecting algiarism in a quantitative or mathematical assignment. Algiarism in reflective social science assignments is significantly harder to detect, and it places a massive burden of disproof on the assessor. These issues may be mitigated if the student were asked to provide references, but it is fairly easy to type a few terms into Google Scholar for every other sentence and pepper the essay with references. The burden, once again, is on the assessor to go through those references and disprove the claims.

In short, ChatGPT-driven algiarism may not be a significant challenge in assignments calling for the use of theorems and reasoning based on them, such as those in the mathematical sciences. However, it may significantly complicate the evaluation of assignments involving critical analysis and reflection, as is often the case in the social sciences. This argument can be read the other way too: it may be more tempting for students to employ ChatGPT for reflective essays than for quantitative assignments.

Potential changes in evaluation

Let us now look at the current ‘working assumptions’ in evaluating an assignment. Plagiarism software typically checks the ‘similarity’ of sentences in a document against existing resources curated from the web and from scientific databases. Typical assignments, in pre-ChatGPT times, are already screened by plagiarism software, and thus, by the time an assignment reaches an evaluator, it comes with an implicit reassurance that it is ‘probably not plagiarised’. The assessor could still identify nuanced forms of plagiarism, but it is to be noted that the assessor is not actively checking for it.

In an algiarism era, this working model could change significantly. The assessor may feel an additional burden to check for algiarism, since any downstream identification of it would be an embarrassment. Thus, the assessor may find herself up against two tasks on a regular basis: evaluating the submitted assignment, and critically examining it for the presence of algiarism.

The burden of undertaking an additional evaluation for algiarism may undermine the evaluation process in many ways. It could reduce the objectivity of the evaluation, since the marks could additionally depend on the assessor’s belief about whether the assignment has been algiarised. It could make the entire evaluation process more stressful, lengthen it, and eat into other parts of the academic’s work life, indirectly affecting the quality of instruction. It could also promote a culture of viewing students’ submissions with perpetual suspicion, damaging relationships between learners and teachers in fundamental ways. In any case, it may impoverish the education system, which is hardly in a healthy state now.

There has been talk about software to detect algiarism, but this may hardly change things. Algiarism involves learning patterns from human-generated text and putting them together in new ways. Given ChatGPT and its ilk’s reliance on patterns in human-authored text, it is hard to see how software programs can effectively differentiate these from ‘real’ human-authored text.

Ways forward

As we ponder the challenges that generative AI technologies bring, there are simple, if easily overlooked, solutions that we must not lose track of. The most obvious one is to limit access to digital technologies in assignments. That is exactly what good old paper-based assignments always did. Indeed, some Australian universities are considering going back to pen-and-paper exams. Noam Chomsky, showing a different optimism, suggested that the college essay may finally get replaced by something “more interesting”.

While we grapple with plagiarism and its advancements, it is important not to lose sight of fundamental issues within our universities. Our universities, in an increasingly competitive society, are being viewed as a pathway to a job rather than as a place to experience the joy of learning. Consequently, the relationship between students and teachers, which ought to be naturally synergistic, is increasingly viewed as an adversarial one. It is up to society – and all actors within it, including the government, media, and community organisations – to advance the idea of a university as a place for learning, development, and knowledge production. If our students appreciate the fundamental values of learning, they will engage with it more sincerely and not resort to unethical shortcuts such as plagiarism in the first place.

If ChatGPT and the like can nudge us to step back, see systemic issues, and lead us to deliberations to address them, that may be an unforeseen victory for us as a society.

Deepak P is an Associate Professor of Computer Science at Queen’s University Belfast, with research interests in AI ethics.

Santhosh Kumar G is a Professor of Computer Science at the Cochin University of Science and Technology, with research interests in Natural Language Processing and other allied areas.

Views expressed are the authors’ own.

The News Minute