Time commitment
5–10 minutes
Description
This video and accompanying handout discuss the potential risks of inputting unpublished research and data into large language models (LLMs) and offer potential mitigation strategies.
Video
Transcript
Welcome, everyone, to this presentation, which I've titled "AI and Your Unpublished Writing: Risks of Using Large Language Models," or LLMs. My name is Dr. Jodie Salter, and I am a writing specialist in the McLaughlin Library at the University of Guelph.

This presentation developed out of a project I began earlier this year. I'm interested in how AI tools can help us deepen our thinking as content experts by using generative prompts to challenge our assumptions. I was experimenting with the free version of ChatGPT, and I wanted to create a workshop on how ChatGPT could help graduate students prepare for oral and comprehensive exams. What I discovered in the process was how little I knew about the inherent risks of inputting unpublished writing into ChatGPT. So I paused and switched focus to first educate myself. In doing so, I realized that I should share this information more broadly with all researchers who are writing and publishing their own work, be they faculty, staff, or graduate students.

This presentation has three components and will address the risks of LLMs. First, I'm going to discuss four areas of concern if you input your unpublished writing and data into AI providers and large language models. Then I will propose various strategies for mitigating risk. And I will conclude with some guidance for you to consider and follow when using LLMs with any unpublished writing or data. As an FYI, I've created a handout on the library's Learning Commons website that accompanies my talk.

There are four areas of concern you need to consider if and when inputting your unpublished writing or data into large language models like ChatGPT, and each of these areas poses numerous risks to you as a scholar and a researcher: concerns about retaining your intellectual property rights, concerns about privacy for your subjects, concerns about data security, and reputational concerns.

If you input your unpublished ideas, research, writing, and/or data into an LLM, you could compromise your ability to retain full intellectual property rights. You risk losing the option of publishing your own work in a journal or, if you're a graduate student, as part of your thesis. So be aware of all the risks before using any LLM.

There are two main risks pertaining to intellectual property. First, if unpublished research or writing is intentionally or unintentionally incorporated into a model's training and future outputs, your own work could resurface without attribution, and this could lead to potential accusations of plagiarism against you. Second, if you're developing cutting-edge or proprietary research, using LLMs that store data, even temporarily, may lead to unintentional early disclosure, which could undermine patent eligibility and/or enable independent development by others. So you could unintentionally plagiarize your own work or lose proprietary ownership of your own work because it now exists in the public realm via the AI provider's storage and training outputs.

Privacy is always a concern in the digital space, but with confidential research data, privacy becomes an increasingly important consideration. Third, if sensitive or confidential material is input into an LLM, this sharing could violate internal institutional data governance policies. These slides contain links to important resources, and as I said, I'll make this information available to you after the presentation.
Fourth, if data contains sensitive or personally identifiable information, known as PII, this sharing could violate privacy laws such as FIPPA, Ontario's Freedom of Information and Protection of Privacy Act.

In terms of data security, LLM providers may be operating in jurisdictions or countries with different data protection laws, which can compromise the privacy of your data and could lead to legal complications for you and your research team. And all of the actions I just listed hold reputational risk for you, for the University of Guelph, and for any of your research partners.

So before using any generative AI tool to support your research and writing, educate yourself about its terms of use and privacy policy, required approvals and disclosures, and its data management, storage, and training practices. Read the AI provider's terms of use and privacy policy to understand how input data is used and who owns the generated data. You can also direct your questions to the University of Guelph information security team at infosec@uoguelph.ca or to CCS at ithelp@uoguelph.ca.

To use AI tools for any part of your research or writing process, you must comply with the specified terms and conditions for approved use from your thesis supervisor, lab supervisors, research partners, data owners, project sponsors, peer-reviewed journals (which have their own rules), and granting agencies such as SSHRC, NSERC, and CIHR.

Last year, in November 2024, the tri-council agencies (SSHRC, NSERC, and CIHR) posted a new disclosure policy titled "Guidance on the Use of Artificial Intelligence in the Development and Review of Research Grant Proposals." This policy states that all researchers need to disclose any use of AI when applying for research grants. Quote: "Applicants must state if and how generative AI has been used in the development of their application and are required to follow specific instructions which will be provided for each funding opportunity as they become available." So if you have received tri-council funding, or you work for a research centre or a professor who holds a tri-council grant, you must comply with this policy. The rules and regulations are developing quickly, so familiarize yourself with them and be attentive to any changes.

Inform yourself of the university's organizational research data management and AI use policies and follow them. I've provided a number of links here to various resources from our U of G IT department and the Office of Research, including research data classification, data storage guidelines, the acceptable use policy, safeguarding research, and the information security guidelines for the use of generative artificial intelligence. I'll return to this last resource, the information security guidelines, in a moment to highlight what I think are some of the most important excerpts.

Knowing how AI providers store data and train their models is potentially the biggest risk factor for us. So educate yourself: know which providers store and use data for training, and do not use them for any of your unpublished writing or data. They can be useful tools for research, helping us gather sources, summarize, compare abstracts, generate research questions, challenge our assumptions, and so on. But do not put any of your own unpublished work into them.
And if you are a faculty member who supervises graduate students, do not input any of your graduate students' thesis drafts into an LLM to provide feedback, because you could compromise their ability to retain their copyright.

This slide shows a comparison table of AI providers and their training and data storage practices. It's not a comprehensive list; I selected the top five from a longer list that I created. The first three on the list, the free version of ChatGPT, the free version of Copilot, and Anthropic's Claude, all use your inputs to train their models. They do not offer fully private modes, and they all store your data. Interestingly, while I was preparing this presentation, Anthropic changed its policies on August 28, 2025, going from not training on your data to fully retaining all of your shared inputs. So be careful, because these changes happen fast, and they often happen without public disclosure. We need to be vigilant and share knowledge with our colleagues and research partners to keep our data safe.

The paid versions of ChatGPT vary in cost, with one plan currently at $20 a month. ChatGPT uses your inputs to train its models, but you can opt out. It does offer an incognito mode for some privacy, but your data is stored for 30 days.

Microsoft 365 Copilot, the University of Guelph's licensed version, does not use your data to train its models, so you don't need to opt out, and it is a fully private version because it's hosted by U of G. It caches your data briefly during the session, but it doesn't permanently store your prompts, files, or responses. All U of G students, staff, faculty, and IT admins have access to Microsoft Copilot Chat through their Microsoft 365 A5 subscription, and it is described as an AI-driven assistant that facilitates real-time communication, answers queries, and aids in task completion.

The U of G information security guidelines state, quote, "While public tools may have access to larger training data sets," the advantages of using a private generative AI tool such as Microsoft 365 Copilot "include more control over the data and output, higher security and privacy, and lower ethical and legal risks." I've provided links for more information about Microsoft 365 enterprise data protection, with details on its privacy protections, and I've also included the link to the CCS Microsoft Copilot SharePoint site. I've only just begun to explore and compare Microsoft 365 Copilot with ChatGPT to test the robustness of 365's AI-generated content and responses, and I've shifted away from using ChatGPT for these privacy and security reasons.

I read the U of G information security guidelines for the use of generative artificial intelligence, and I will share with you eight bullets that I think are of particular relevance for when you plan to use AI providers and tools to support your research and writing.

First, do not input your qualitative or quantitative data into an LLM unless you have all applicable approvals. Obtain consent from data subjects if you plan to use AI tools to collect, process, or disclose personal data; inform data subjects of the purpose, scope, and potential risks; and provide an option to opt out. I had a recent conversation with a professor who was gravely concerned when their students enthusiastically proclaimed that they were curious how AI would analyze their qualitative data.
Their lab has numerous binding contracts that would be violated if any of the data were input into an LLM, and these violations have legal implications. The students had no awareness of the overarching regulations that govern their research projects.

Extra care should be exercised whenever sharing information in an AI tool. Remove any personally identifiable information (PII) and any sensitive data before inputting anything into an LLM. Internal (S2), confidential (S3), and restricted (S4) university data should never be entered into a public AI tool such as ChatGPT or used in a generative AI prompt.

Use AI-generated content carefully and appropriately. Ensure system outputs are not identical or substantially similar to any copyright-protected material, and remove any problematic material to minimize the risk of intellectual property infringement. Then give proper attribution where appropriate: indicate explicitly and clearly that generative AI was used to develop content, and if AI use is permitted, be sure to properly cite any AI-generated work that you include. I've provided a link here to the Purdue University Library guide on AI citations.

So again, my name is Jodie Salter, and my email is jsalter@uoguelph.ca. You can also connect with the U of G information security team at infosec@uoguelph.ca. To talk about using AI appropriately and effectively to support your writing, you can book free appointments in the library to meet, either online or in person, with our professional writing specialists, with our trained graduate writing TAs, and with library research assistants.
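As a practical illustration of the transcript's advice to remove PII before anything goes into an LLM, here is a minimal sketch in Python. The regex patterns and the `redact_pii` helper are hypothetical and deliberately simplistic; this is illustrative only, not a substitute for vetted de-identification tools, your institution's data classification rules, or human review.

```python
import re

# Hypothetical redaction patterns -- illustrative only, not exhaustive.
# Real PII scrubbing should use vetted tooling and human review.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    # Canadian Social Insurance Number format (three groups of three digits)
    "SIN": re.compile(r"\b\d{3}[\s-]?\d{3}[\s-]?\d{3}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matches of each pattern with a labelled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

if __name__ == "__main__":
    sample = (
        "Participant 7 (jane.doe@example.com, 519-555-0123) "
        "reported SIN 123-456-789 during intake."
    )
    print(redact_pii(sample))
    # -> Participant 7 ([EMAIL REDACTED], [PHONE REDACTED])
    #    reported SIN [SIN REDACTED] during intake.
```

Note that pattern-based redaction misses context-dependent identifiers such as names, places, and rare conditions, which is one reason the guidelines cited above treat S2, S3, and S4 data as off-limits for public AI tools regardless of preprocessing.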
Downloads
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Recommended
- Ask Chat is a collaborative service
- Ask Us Online Chat hours
- Contact Us
