Large Language Models & Data Privacy in Higher Education

July 24, 2023

Note: This is Part 1 of our three-part series on data privacy in higher education and the risks introduced by generative AI. Part 1 covers FERPA, GDPR, and CCPA; Part 2 covers GLBA; and Part 3 covers NIST 800-171.

Introduction

It’s an exciting era for higher education, with technology paving the way for innovative approaches to teaching, learning, and administration. At Canyon GBS, an Education Technology company, we keep our finger on the pulse of these advancements, eager to harness the potential that they hold. One such trend that has captured the attention of many higher education institutions is the increasing use of large language models (LLMs), particularly OpenAI’s ChatGPT, in the academic sphere.

Large language models, AI systems that are trained to understand and generate human-like text, are revolutionizing the way institutions function, bringing a plethora of benefits to the table. However, as we dive headfirst into this brave new world, it’s crucial to be aware of the associated privacy and data security concerns that come with it.

In this article, we’ll delve into the important state and federal privacy protections in place, the potential pitfalls if protected student data is fed into LLMs like ChatGPT, the importance of exercising caution, and a closer look at the significant benefits these models could bring.

The Dawn of Large Language Models in Higher Education

Technology has been steadily seeping into the realm of higher education for years. From cutting-edge learning management systems to smart predictive analytics tools, universities and colleges worldwide have been using tech solutions to enhance their services, streamline operations, and promote student success. The latest entrant to this list is artificial intelligence, specifically AI-powered language models like ChatGPT.

These large language models use machine learning algorithms to process and generate text in a manner that simulates human understanding. As a result, they’re able to support a range of applications, from tutoring and research assistance to simplifying administrative tasks and beyond. As the buzz around these LLMs grows louder, more and more institutions are getting on board, using these AI platforms to help them analyze large data sets and ease their operational burdens.

However, as Winston Churchill once said, “Where there is great power there is great responsibility.” It’s important that we tread carefully, especially when it comes to handling sensitive data.

Navigating the Complex Landscape of Data Privacy and Protection

Data privacy and protection, particularly in the higher education sphere, is a complex field. It involves navigating a maze of state and federal laws, all of which are aimed at safeguarding personal information. Here, we’ll take a closer look at three of the most prominent ones – the Family Educational Rights and Privacy Act (FERPA), the General Data Protection Regulation (GDPR), and the California Consumer Privacy Act (CCPA).

The Family Educational Rights and Privacy Act (FERPA)

FERPA is a federal law that was enacted to protect the privacy of student education records. It gives parents certain rights with respect to their children’s education records, and those rights transfer to the student when the student turns 18 or enters a postsecondary institution.

Feeding protected student data into LLMs like ChatGPT could result in violations of FERPA. For instance, if an institution submits student data to the model without the student’s consent, it could be viewed as releasing education records without permission. FERPA violations can result in severe penalties, including the loss of federal funding.

To illustrate, let’s consider a hypothetical case where a university decides to use ChatGPT to analyze its student data to predict dropout rates. The university feeds student data, including grades, attendance records, and personal information, into the AI model. However, it does so without obtaining explicit consent from the students. This could be considered a breach of FERPA, as the institution has essentially released the student records to an external party (OpenAI, the company that operates ChatGPT) without proper authorization.

The General Data Protection Regulation (GDPR)

The GDPR is a comprehensive data protection law enacted by the European Union (EU). It enhances the privacy rights of individuals in the EU and can apply to organizations anywhere in the world that process their personal data, for example by offering them goods or services.

One of the key principles of the GDPR is data minimization, which means that personal data collected should be limited to what is necessary for the purpose for which it is processed. If institutions provide large volumes of protected student data to LLMs for analysis, they could potentially be in violation of this principle.

Consider, for example, a university in the U.S. that has a sizable number of EU students. The university decides to use ChatGPT to analyze student data for research purposes and feeds the model with extensive student information, including personal data. In this case, the university could be in violation of the GDPR, especially if it has processed more data than is necessary for its research purposes, or if it has not obtained the necessary consents.
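To make the data minimization principle more concrete, here is a minimal sketch in Python of one way an institution might strip a record down to only the fields a given analysis needs before anything leaves its systems. The field names, the allowlist, and the `minimize` helper are all hypothetical and used purely for illustration; the right set of fields depends on the stated purpose of the processing and on your own legal review.

```python
# Hypothetical allowlist: only the fields this particular analysis needs.
# These field names are illustrative, not a real schema.
ALLOWED_FIELDS = {"credits_attempted", "credits_earned", "term_gpa"}


def minimize(record: dict, allowed: set = ALLOWED_FIELDS) -> dict:
    """Return a copy of the record containing only allowlisted fields."""
    return {field: value for field, value in record.items() if field in allowed}


# Example record (fabricated sample data for demonstration only).
record = {
    "student_id": "S1234567",
    "name": "Jane Doe",
    "email": "jane@example.edu",
    "credits_attempted": 15,
    "credits_earned": 12,
    "term_gpa": 3.1,
}

minimal = minimize(record)
print(minimal)
```

An allowlist is used here rather than a blocklist so that any new field added to the record later is excluded by default. Keep in mind that removing obvious identifiers does not by itself make data anonymous, since a combination of the remaining fields may still point to an individual.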

The California Consumer Privacy Act (CCPA)

The CCPA provides California residents with specific rights regarding their personal information. It includes the right to know what personal information is being collected, how it is used, and whether it is being sold or disclosed and to whom.

If an institution were to use protected data from California students in LLMs without their explicit consent, it could potentially violate their CCPA rights, resulting in hefty fines and damaging lawsuits. For instance, if a Californian student’s data is fed into ChatGPT without their knowledge and consent, the student could potentially sue the institution for violation of their CCPA rights.

Note: Colorado, Connecticut, Utah and Virginia have also enacted comprehensive consumer data privacy laws, and there are proposals being considered all across the country for similar protections at the state level.

The Importance of Caution and Diligence

Considering these legal frameworks, it’s clear that while LLMs like ChatGPT can be incredibly beneficial for higher education, it’s crucial to use them responsibly, especially when it comes to handling sensitive data. Until your institution’s information security and legal departments have completed privacy and security reviews of these tools, it is essential to use only public or anonymized data with them.

One way institutions can navigate this is by implementing a stringent data review process before data is utilized by these models. Another is to explore ways to anonymize data so it can’t be linked to any individual student, thereby reducing the risk of privacy law violations.
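As an illustration of the two ideas above, the following Python sketch shows one possible pre-processing step: it drops direct identifiers, replaces the student ID with a keyed hash, redacts email addresses found in free text, and then runs a simple automated check before a record is cleared for use. Everything here is an assumption made for the example, including the field names, the `deidentify` and `passes_review` helpers, and the email pattern. It is a starting point for discussion with your security and legal teams, not a complete de-identification method.

```python
import hashlib
import hmac
import re

# Hypothetical direct identifiers to drop entirely (illustrative names).
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth"}
# Simple, non-exhaustive pattern for email addresses in free text.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def pseudonymize_id(student_id: str, secret_key: bytes) -> str:
    """Replace an ID with a keyed hash (HMAC-SHA256), shortened for readability."""
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers, pseudonymize the ID, redact emails in text."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue
        if field == "student_id":
            cleaned["pseudonym"] = pseudonymize_id(value, secret_key)
        elif isinstance(value, str):
            cleaned[field] = EMAIL_PATTERN.sub("[REDACTED EMAIL]", value)
        else:
            cleaned[field] = value
    return cleaned


def passes_review(record: dict) -> bool:
    """Automated pre-check only; it does not replace a human review."""
    if DIRECT_IDENTIFIERS.intersection(record) or "student_id" in record:
        return False
    return not any(
        isinstance(value, str) and EMAIL_PATTERN.search(value)
        for value in record.values()
    )


# Fabricated sample data and a placeholder key, for demonstration only.
key = b"replace-with-a-securely-stored-secret"
record = {
    "student_id": "S1234567",
    "name": "Jane Doe",
    "email": "jane@example.edu",
    "term_gpa": 3.1,
    "advisor_note": "Emailed jane@example.edu about advising",
}
cleaned = deidentify(record, key)
print(cleaned, passes_review(cleaned))
```

Note that a keyed hash is pseudonymization rather than true anonymization: anyone holding the key can re-link the records, and under the GDPR pseudonymized data is still treated as personal data. Free-text fields can also contain names or other details that a simple pattern will not catch, which is one reason a human review step remains important.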

Technology is a useful tool, but it must be used ethically, and your college or university is required to use it within the constraints established by local, state, and federal laws and regulations. The responsibility is on each institution to ensure that, as it adopts such innovative tools, it also upholds the principles of privacy and data protection laws.

Unlocking the Potential of Large Language Models in Higher Education

Despite the possible privacy concerns, it’s undeniable that LLMs like ChatGPT hold immense potential for higher education. With their advanced text generation and analysis capabilities, these models can transform the way institutions function, impacting everything from student services to administrative tasks.

For example, ChatGPT could serve as an intelligent assistant, fielding common student queries, helping with assignments, providing research support, and even facilitating learning outside classroom hours. With its vast knowledge base and ability to generate contextually relevant responses, it can enrich the learning experience and provide support to students in ways that were previously impossible.

Similarly, for administrative staff, LLMs can automate routine tasks, analyze data for insights, aid in decision-making, and more. By reducing the time and effort spent on mundane tasks, they allow staff to focus on more strategic initiatives, thereby improving the overall efficiency and effectiveness of the institution.

The transformation these models can bring about is unparalleled, and they represent a significant stride in education technology. However, while we celebrate the potential of these tools, it’s important that we also consider the privacy and security implications.

Conclusion

AI-powered technology advancements in higher education can create exciting opportunities, but guidance for compliant utilization is lagging behind the release of the technology itself. As such, while LLMs like OpenAI’s ChatGPT open up new horizons in teaching, learning, and administration, they also bring up important questions about data security and privacy that require our attention.

The key lies in finding the balance between harnessing the full potential of these models and ensuring that we uphold the principles of data privacy and protection. By navigating this fine line, we can move towards a future where technology and ethics walk hand in hand, making higher education more efficient, effective, and accessible to all.

Important Note

This article is meant to serve as a resource for understanding the potential privacy concerns surrounding the use of large language models in higher education and the associated legal protections in place. It does not, however, constitute legal advice. Laws and regulations often differ from one jurisdiction to another and are subject to change. It is advisable to seek legal counsel before making decisions based on the information provided in this article. Always consult with a qualified professional to address your particular needs and circumstances.

Cindy Licata

Cindy is the Chief Compliance Officer at Canyon GBS, leading enterprise-wide governance, risk, and regulatory compliance for the company’s education-focused SaaS solutions.