Data Privacy Collides with AI in a Rush to Bring the Technology Mainstream
AI is finding its way into everything. OpenAI’s plug-in technology has made it easy to embed large language models like GPT-4 in software systems. Platforms such as Kubeflow are making it easier to corral massive amounts of data and train new models on personal data such as medical records. Microsoft, Wolfram Alpha, and Epic are trialing OpenAI’s GPT inside their software applications.
Working with copilots and interactive assistants inside applications is tempting, but the same ease of integration also makes it easier than ever to mishandle private and personal information. Companies must apply AI technologies selectively to minimize security incidents and protect user privacy.
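To see how little code stands between user input and a third-party provider, consider a minimal sketch using the OpenAI Python client (v1.x interface). The model name and the clinical note are illustrative assumptions, not drawn from any real system:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment; the model name is illustrative.
client = OpenAI()

# Hypothetical user input: every token of it, including the embedded PII,
# is transmitted to the third-party provider when the request is made.
user_note = "Patient Jane Doe, DOB 04/12/1987, reports chest pain after exercise."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Summarize the following clinical note."},
        {"role": "user", "content": user_note},
    ],
)
print(response.choices[0].message.content)
```

Nothing in this snippet stops sensitive details from leaving the organization; any protection has to be added deliberately by the integrator.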
New Awareness Is Needed
The Digital Age introduced a host of threats to information security. Phishing and social engineering were new techniques computer users had to understand and watch for during their ordinary use of software. Companies and agencies introduced IT Security and Awareness training to inform users of the myriad ways attackers can exploit systems and people to steal information. Privacy-conscious companies and government agencies provided Records Management and Information Handling training to instruct personnel on how to handle protected health information and personally identifiable information, protect people from information exposure, and maintain their privacy.
The Intelligence Age will bring about a new set of challenges for data scientists, software designers, users, and security professionals. System architects and software designers must consider strategies to protect user information from data-hungry machine learning models and generative AI assistants.
Algorithmic Privacy Protection
Organizations are seeking to uncover new insights from their stockpiles of information and to offer assistive technologies that increase productivity. You might think an obvious first step toward maintaining user privacy is to remove names, ages, and other directly identifying fields from datasets. Unfortunately, algorithms are very effective at re-identifying individuals from a few pieces of seemingly anonymous data. Research in data privacy has shown that patients can be identified from medical details even after names and IDs have been stripped away.
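A simple linkage attack illustrates why: joining a "de-identified" table with a public dataset on shared quasi-identifiers can re-attach names to sensitive records. The sketch below uses pandas and entirely fabricated records purely for illustration:

```python
import pandas as pd

# "De-identified" medical records: names and IDs removed, but quasi-identifiers remain.
medical = pd.DataFrame({
    "zip": ["02138", "02139", "02140"],
    "birth_date": ["1954-07-31", "1962-01-15", "1971-03-02"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["hypertension", "diabetes", "asthma"],
})

# A publicly available dataset (e.g., a voter roll) that still carries names.
public = pd.DataFrame({
    "name": ["A. Example", "B. Sample", "C. Placeholder"],
    "zip": ["02138", "02139", "02140"],
    "birth_date": ["1954-07-31", "1962-01-15", "1971-03-02"],
    "sex": ["F", "M", "F"],
})

# Joining on the quasi-identifiers re-attaches names to diagnoses.
reidentified = medical.merge(public, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

When the combination of ZIP code, birth date, and sex is unique, the "anonymous" record maps back to exactly one person.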
Newer techniques like differential privacy use randomization and statistical methods to describe patterns across groups within a dataset while withholding information about any individual in it. This lets data scientists and analysts build AI/ML models with a reduced chance of exposing an individual’s private data. Differential privacy isn’t a silver bullet, though: it can be undermined if certain attributes of an individual are already known and can be matched against the patterns modeled under differential privacy.
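A minimal sketch of the idea uses the Laplace mechanism for a counting query; the ages and the epsilon value are illustrative assumptions:

```python
import numpy as np

def dp_count(values, predicate, epsilon=0.5):
    """Differentially private count: add Laplace noise scaled to the query's
    sensitivity (1 for a counting query) divided by the privacy budget epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical ages drawn from a patient dataset.
ages = [34, 67, 45, 29, 71, 52, 38, 60]

# The analyst learns roughly how many patients are over 50, but adding or
# removing any single patient changes the released answer only slightly.
print(dp_count(ages, lambda a: a > 50, epsilon=0.5))
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy; choosing that trade-off is where much of the practical work lies.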
Large Language Models Require New Awareness
Building custom Large Language Models is an alluring proposition for organizations wanting to harness the power of generative AI. The same data privacy safeguards used when training other internal models must also be applied when training homegrown LLMs.
Increasingly inexpensive, cloud-based, API-accessible LLMs like GPT-4 make integration an attractive choice for companies that lack the expertise or budget to create their own models. The risk of privacy exposure, however, is significant: cloud-based generative models require content and context to be transmitted to the provider’s hosted systems, and that kind of exposure can carry a host of legal risks. Applications must therefore implement safeguards to prevent PII and PHI from reaching third-party cloud LLM providers. Additionally, many cloud-based LLM providers let organizations create custom versions of their models using proprietary information; OpenAI, for example, lets users train a custom version of GPT-3 by supplying their own data.
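One common safeguard is to scrub obvious identifiers before any text leaves the organization. The sketch below uses hand-rolled regular expressions purely for illustration; real deployments typically rely on dedicated PII/PHI detection tooling rather than patterns like these:

```python
import re

# Minimal, illustrative redaction patterns -- far from exhaustive.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),              # US Social Security numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),      # email addresses
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),  # US phone numbers
]

def redact(text: str) -> str:
    """Replace obvious identifiers before the text is sent to a cloud LLM."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

# Hypothetical prompt; only the scrubbed version should reach the provider.
prompt = "Email jane.doe@example.com or call 555-867-5309 about claim 123-45-6789."
print(redact(prompt))
```

Redaction of this kind reduces, but does not eliminate, exposure; free-form text can carry identifying details that no pattern list anticipates, which is why contractual and architectural controls still matter.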
Conclusion
AI/ML models and LLMs can provide software users with advanced levels of interaction and productivity. But the rush to integrate copilots and new decision-making capabilities into business and consumer applications also creates risks that traditional security and information-handling training doesn’t account for. Before implementing new AI/ML and LLM capabilities, organizations must weigh the benefits these generative technologies bring to their users against the potential risks.