USMA Library Homepage: Generative Artificial Intelligence: Data and Privacy Literacy

Data and Privacy Literacy

Watch this two-and-a-half-minute video about being a responsible user of GenAI. Although the speaker does not label it as such, notice how he includes data and privacy literacy skills in his thoughts on responsible GenAI use.

Video shared with license: CC BY-SA 4.0

Data Literacy

Data literacy is the ability to read, work with, and communicate data.

As the use of GenAI tools increases, so does the importance of analyzing their output. Data literacy equips West Point cadets, faculty, and staff with the skills to critically evaluate the accuracy of generated content and to identify potential biases in it.

Data literacy also helps users understand the capabilities and limitations of AI. Recognizing that GenAI relies on statistical models rather than human-like reasoning is key to setting realistic expectations.

- Data literacy is essential for handling data responsibly and for following best practices that protect sensitive information in AI interactions.
- Data literacy aids in understanding and conveying the societal and ethical impacts of GenAI.
- Data literacy provides a basis for informed decisions about GenAI use and supports regulatory compliance.
- Data literacy encourages a culture of continuous learning, which is essential for keeping pace with GenAI advancements and ethical considerations.

Data Literacy Skills

Understanding where the data used in large language models comes from.

Much of the data used to train large language models (LLMs) is "scraped" from the Internet. Data scraping involves using an application to extract information from the Internet automatically. Sources of data include:
- Websites
- Digitized books
- Some social media content
- Other publicly available content, including videos and images
- Scientific or academic articles (used to train certain LLMs)
Dataset problems:
- Bias - What biases are inherent in the dataset?
- Age - How old is the data used to train the LLM?
- Privacy - Is private personal information included in the dataset?
- Copyright - Was copyright violated when the dataset was created?

Evaluating GenAI tools and outputs for accuracy and reliability

GenAI Tools
- "The Generative AI Product Tracker lists GenAI products that are either marketed specifically towards postsecondary faculty or students or appear to be actively in use by postsecondary faculty or students for teaching, learning, or research activities. The Tracker is a living document, which we update regularly as new products enter the market or new information about existing products becomes available. For more information, see our issue brief, Generative AI in Higher Ed: The Product Landscape." - Ithaka S+R
- The University of Sydney's "Different Generative AI Tools" provides a curated list of GenAI tools, including each tool's cost, underlying model, and features.
- West Point currently has licenses for Microsoft Copilot and is in the process of acquiring licenses for Keenious and Scite. These tools are a good place to start when deciding which GenAI tool meets your needs. For the most current information on GenAI tools acquired for West Point use, click here.

Privacy Literacy

Privacy literacy is a set of skills that helps you understand how your personal information can be used and stored, what risks you take by sharing it online, and how you can protect it.

In the context of Generative AI, two major questions deserve consideration:

How is privacy compromised during the training phases of large language models?
- GenAI systems are trained on vast datasets. These typically include books, articles, and various online content, potentially even outdated personal information you shared years ago.
- Much of this training data is collected without explicit consent from its original creators or authors. This raises serious ethical questions about data ownership and usage rights in the GenAI environment.
How can privacy be compromised when using Generative AI tools?
- Consider the privacy implications of using these GenAI tools. Each interaction, whether you are seeking writing assistance or generating images, may involve sharing personal information. Although some companies state that they do not store user prompts, the lack of transparency in this area is concerning.
- Even if individual prompts are not stored, your interactions could still be used to improve the model. This creates a tradeoff: increased usage enhances GenAI capabilities, potentially at the cost of personal privacy.

Additional Security Awareness

As stated in Appendix F of the DAAW:
- "For security reasons, cadets are prohibited from inputting Controlled Unclassified Information (CUI), personally identifiable information (PII), classified information, or any otherwise restricted information into generative AI tools. See AR 380-5 for the classification, downgrading, declassification, transmission, transportation, and safeguarding of information requiring protection in the interests of national security."
As stated in the WREN Acceptable Use Policy (AUP):
- You acknowledge that you will not share CUI (including PII) outside the USMA WREN from your GFE or non-GFE devices via third-party software, cloud services, browser, and other application plugins, including artificial intelligence (AI) capabilities (e.g., AI assistants, generative AI chatbots, multimedia or data generators) that are not approved for CUI processing by USMA.

The policies above safeguard sensitive information, specifically Controlled Unclassified Information (CUI) and Personally Identifiable Information (PII). You may not input such data into any system that USMA has not approved for handling CUI. This rule applies whether you are on the WREN or any other network, and whether you are using a government-issued or personal device. Examples of covered material include cadet papers, grades, departmental documents, and research data. This list is not exhaustive, so whenever information might be sensitive or controlled, err on the side of caution and use only approved systems.

Steps to Protect Your Privacy

Remember, do not enter the following information into a GenAI tool:

- Personally identifiable information
- Health information
- Legal information
- Financial information

Also consider limiting what information you make available on the Internet in general.

When using GenAI tools, websites, and apps, carefully read the privacy policy to determine whether and how your personal information will be used or stored.

While the United States does not have comprehensive federal data privacy legislation, some states are enacting laws to protect citizens against harms such as:

- Using Generative AI in the hiring process
- Using Generative AI to create an individual's likeness and using it for advertising
- Failing to disclose when a chatbot, rather than a human or non-AI interface, is being used
- Using facial recognition systems in law enforcement

The Electronic Privacy Information Center (EPIC) maintains a list of state-enacted laws on Generative AI and privacy protection, although it was last updated in August 2023.

Privacy and Security

A. Golda et al., "Privacy and Security Concerns in Generative AI: A Comprehensive Survey," IEEE Access, vol. 12, pp. 48126-48144, 2024, doi: 10.1109/ACCESS.2024.3381611. Retrieved May 6, 2024, from IEEE Xplore.