Responsible Data Handling in the Age of AI


by Michael Wolman, UT Communications

To mark Data Privacy Week in January, two experts on the subject from The University of Texas’s ethical AI research team Good Systems were interviewed: Sharon Strover, former chair of the Good Systems Executive Team and professor in UT’s School of Journalism and Media, and Amy Kristin Sanders, associate professor in both the School of Journalism and Media and the School of Law.

(L-R) UT School of Journalism and Media Professor Sharon Strover and UT School of Journalism and Media and School of Law Associate Professor Amy Kristin Sanders. Both Strover and Sanders are part of UT’s ethical AI research team Good Systems.

How does our increasing reliance on technology and artificial intelligence impact individual privacy?

Sanders: Unfortunately, many people are unaware of just how much data their devices collect as they are using them. Lengthy terms of service and privacy policies aren’t really helpful to the average user, so they just agree to whatever default terms are set. Here in the U.S., that almost always means collecting as much data as possible and then storing it for long periods of time. That stored information is ripe for data breaches, which we see happening more and more frequently.

What considerations should be taken into account to ensure responsible data handling and use?

Strover: One approach that doesn’t receive enough attention is simply ascertaining what data are really needed. At this point many companies, governmental units and other institutions are simply gathering a lot of data because they can. Responsible data handling should begin with figuring out what data are really needed, and then whether or not personal identifiers are necessary. In the U.S., our data policies have been driven by opting out. This means people have to actively choose to not be part of services or data gathering efforts. In the European Union, by comparison, opting in is the norm.

Sanders: When I advise organizations about data handling, I encourage them to be thoughtful about the data they collect. It is important to think about why you need the data. Don’t just collect other people’s data because you can. Especially for non-commercial uses, I tend to preach data minimization: collect the smallest amount of data necessary.

People also need to think about how they are transferring data. Most people don’t use VPNs; they transfer lots of sensitive information using apps on their phones: banking and financial data, medical data, etc. Many people send sensitive data via email without encrypting it. We all need to think about secure ways to share this information, and companies need to be incentivized to encourage their customers and clients to only share information securely.
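As one illustration of the kind of protection Sanders describes, the short Python sketch below encrypts a document before it is shared, so the file itself never travels in plaintext. It relies on the widely available cryptography package; the file names are placeholders, and this is only a sketch of the general approach, not a prescription for any particular organization.

```python
# A minimal sketch of encrypting a document before sharing it, instead of
# attaching it to an email in plaintext. Requires the third-party
# "cryptography" package (pip install cryptography); file names are placeholders.
from cryptography.fernet import Fernet

# Generate a symmetric key once and share it with the recipient over a
# separate, secure channel (never alongside the encrypted file itself).
key = Fernet.generate_key()
fernet = Fernet(key)

with open("bank_statement.pdf", "rb") as f:
    ciphertext = fernet.encrypt(f.read())

with open("bank_statement.pdf.enc", "wb") as f:
    f.write(ciphertext)

# The recipient, holding the same key, can recover the original:
# original = Fernet(key).decrypt(ciphertext)
```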

What trends are you seeing when it comes to how citizens view their privacy and what services they are willing (or not willing) to exchange their data for?

Strover: One result of our investigations is that people are unaware of the many technologies in the physical environment that are gathering data. For the City of Austin, we investigated the presence of cameras in several fairly routine governmental operations, including those of the fire department, the library and the transportation unit, and found that people simply don’t know how many cameras are present in ordinary life. When we explicitly ask people about data gathering from such cameras, there is a certain resignation about it. Many people feel helpless, and some feel as if it is simply an exchange that they make for receiving certain services: give up your privacy and you can use this desirable service or application.

Sanders: I teach classes on surveillance and privacy issues. My students are always alarmed once they know what data is being collected and how it is being used. But they also feel a general sense of inevitability. They say things like, “How am I supposed to stop using Google products?” or “Being on social media is part of my job.” I think many of us are concerned about the privacy and security of our data, but we also don’t feel like we have the power to fight Big Tech. This is where government regulation has to step in and level the playing field.

How are state and local governments using emerging AI technologies to collect personal data or monitor citizens?

Strover: State and local government policies around using AI are in flux right now. We found that about 22 cities in the U.S. had explicit policies for using surveillance technologies within city operations, and I believe that number is growing. Still, that’s a small number. State governments are also grappling with AI technologies. The first target appears to be how state entities are using AI within their own operations. Another evolving question has to do with state authority and control of data produced by AI apparatuses, and whether or not such data constitute public records.

Sanders: The City of Austin just announced it was redeploying license plate scanners. Police say this will help them recover stolen cars given the recent wave of auto thefts, but I have real concerns about the ability to track individuals as they move around the city. I worry that many governments are being sold this technology by companies hungry for profits, and they aren’t being properly briefed on the privacy concerns. In addition, few state and local governments have the staff expertise to develop policies and procedures that will ensure not only that they use these technologies legally and ethically, but also that they store information in ways that don’t jeopardize citizens’ privacy and safety.

What legislative actions could improve the governance of public records?

Strover: An opt-in approach to the kinds of data that are subject to public records could be one consideration. While citizens deserve access to the data that the state is gathering about them, one does not necessarily want that data to be available to the entire world. Another helpful step might be an expert commission or a group of people charged with deliberating on the types of data that should be released and when that data should be de-identified.

Sanders: My research shows the very powerful ways in which AI can be used to improve access to public records. We have developed a tool that can conduct large-scale content analysis of court records to help determine whether the justice system is functioning fairly and efficiently. Similar technology can be used to redact private information from court records and other public records, making it possible to safely release far more government information to the public.
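The interview does not describe how the UT tool works, but the general idea of automated redaction can be sketched with simple pattern matching. The hypothetical Python example below masks Social Security and phone numbers in a court-record excerpt before release; a production system would rely on far more sophisticated detection.

```python
# Hypothetical illustration of pattern-based redaction; it is not the tool
# described in the interview. Masks Social Security and phone numbers in a
# record before public release.
import re

REDACTION_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected identifier with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

record = "Defendant (SSN 123-45-6789) can be reached at 512-555-0142."
print(redact(record))
# Defendant (SSN [REDACTED SSN]) can be reached at [REDACTED PHONE].
```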

How do you see the balance between innovation and privacy protection evolving? What role can ethical guidelines play in shaping this balance?

Strover: Our innovation industries are motivated by being first in a given market, which has led them to trample over privacy concerns. While I see a lot more people talking about ethical guidelines, I still don’t see enforceable actions in this country. The most responsive privacy protections at this point seem to lie in very conventional legal instruments around liability. Even this, however, represents evolving territory.

Sanders: Right now, several major lawsuits are targeting prominent AI developers, alleging that the developers violated copyright law by training their tools on data protected by copyright, including photographs, news stories and literary works. It will be interesting to see how these lawsuits play out, because I have heard some developers say their business is based on taking risks in spaces where they view the law as being “fuzzy.” If that is the mindset, then I fear ethical guidelines alone would be ineffective at protecting our privacy and intellectual property interests.

How can policymakers strike a balance between fostering AI innovation and protecting individual privacy, and what are some key considerations in crafting effective policies?

Strover: This is a really tricky question. A first step should be engaging the public more broadly and deeply around privacy practices and the sorts of benefits that people experience from AI-related technologies. I bring this up in part because we found that a lot of people do appreciate some of the security aspects of surveillance systems. People want to feel safe. However, what becomes of data after it is gathered, who has access to it, and who can reuse it are the sorts of questions that most people—the subjects of the surveillance—can’t answer.

Sanders: The biggest concerns I’m hearing from folks I work with—media lawyers and content creators—have to do with how large language models are trained. Content creators have filed lawsuits alleging developers have used copyrighted material to train their models. Attorneys and other data privacy professionals are raising concerns about the data ingestion that occurs when users type queries into chatbots and other AI tools. If those tools are designed based on reinforcement learning, then anyone entering client information, trade secrets or other proprietary information as part of their prompts is putting their data at risk.
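One practical response to the risk Sanders raises is to keep identifying details out of the prompt in the first place. The sketch below shows one hypothetical approach: client names are swapped for placeholders before the text is sent to a model and restored afterward. The client name "Acme Holdings" and the send_to_model() call are illustrative stand-ins, not a real vendor API.

```python
# Hypothetical sketch: substitute placeholders for client identifiers before
# a prompt leaves the machine, then restore them in the model's reply.
# "Acme Holdings" and send_to_model() are illustrative stand-ins.
def pseudonymize(text: str, secrets: list[str]) -> tuple[str, dict]:
    mapping = {}
    for i, secret in enumerate(secrets):
        placeholder = f"<CLIENT_{i}>"
        mapping[placeholder] = secret
        text = text.replace(secret, placeholder)
    return text, mapping

def restore(text: str, mapping: dict) -> str:
    for placeholder, secret in mapping.items():
        text = text.replace(placeholder, secret)
    return text

prompt, mapping = pseudonymize(
    "Draft a demand letter for Acme Holdings regarding the 2023 audit.",
    secrets=["Acme Holdings"],
)
print(prompt)  # Draft a demand letter for <CLIENT_0> regarding the 2023 audit.
# reply = send_to_model(prompt)   # provider call omitted in this sketch
# print(restore(reply, mapping))
```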

How can AI technologists collaborate with other stakeholders, including legal experts, policymakers, and ethicists, to establish global standards for digital data privacy in the age of AI?

Strover: Global standards are a high bar. There are different approaches around the world to how to protect privacy, and different approaches to the stakes that certain countries have in AI technologies. Some states actively use them, for example, to surveil citizens; others, such as the EU, have newly enacted legislation that establishes allowable data gathering and data privacy standards. I am not optimistic about a global standard at this point.

Sanders: Right now, the U.S. is behind the curve in helping to establish standards for data protection and AI. The EU largely set the standard for data protection with its General Data Protection Regulation, and we’ve seen many countries around the world (except the U.S.) adopt similar comprehensive data protection laws. Similarly, the EU’s AI Act is the world’s first comprehensive AI law, and I suspect many countries will take a similar approach, once again allowing Europe to set the global standard.

How do AI technologies such as facial recognition and large language models (LLMs) pose unique privacy challenges?  What strategies or technical solutions can we employ to mitigate these challenges?

Strover: Facial recognition technologies and LLMs are challenging on several levels. A lot of cities in the U.S. prohibit the use of facial recognition technologies by local government, and some try to regulate their use by local companies as well. One problem is knowing when cameras have facial recognition technology activated. For example, I know of some city cameras here in Austin that are capable of facial recognition, but the city has explicitly chosen not to use that capability. However, there is nothing preventing them from using it in the future.

Large language models are a little different, but they also pose problems of misrepresentation. Their training data predispose them toward biased representations, both textual and visual. It’s difficult to know what the training data are for a given model, and of course some of the recent copyright challenges add another layer of problems to how these models operate.

Sanders: I had my students ask Dall-E to generate images of a law professor. Nearly all the images generated were white males. This isn’t surprising if you know how these tools are trained. These biases have real-world consequences beyond just reinforcing existing stereotypes. Because facial recognition was largely trained on images of white faces, it is far less reliable at identifying Black and brown people. This has serious repercussions for law enforcement and other uses of the technology in our everyday lives.

But these tools also pose real privacy challenges, because technologies like facial recognition are often offered as time-savers or conveniences. Delta recently rolled out facial recognition boarding. It should worry people that government and commercial entities are storing and using their biometric data, because it makes them easier to track and monitor. Even law-abiding citizens should be concerned about this. If the “good guys” can use this data to track you, then so can the “bad guys.”

What do you see as the biggest challenges regarding data privacy in the coming year?

Strover: I would love to see more attention to data privacy. We need some strong laws, and we need knowledgeable people to write them and to implement them. In the U.S., we’ve had a patchwork of privacy protections over the past century, and the lack of comprehensiveness and clearly spelled-out rights regarding people’s authority over their own information, or information about themselves, has been distressing and disempowering.

Sanders: I agree. The U.S. still lacks a comprehensive data protection law, unlike many countries in the world. I’d say it is imperative that Congress take these issues seriously and pass legislation to protect U.S. residents and their data. In the meantime, states are trying, and struggling, to protect their residents by passing state-level legislation.