Your face is in there

10 min read · Mar 29, 2025

Tackling bias in AI

This is an edited transcript of an interview with researchers Neeti Sivakumar and Anya Zhukova about their work investigating biased AI models and datasets. You can read a condensed version here.

Image from the COCO dataset: cocodataset.org

Are you both current students?

NS:

I’m not; I graduated last year from ITP, and I’m currently a research resident there.

AZ:

I’m a current student at IMA Shanghai.

What got you interested in this particular project?

NS:

I think we both started in different places, then came together. I was really interested in A People’s Guide to AI, which talks about AI, machine learning, and algorithms in ways that are easy to understand. I know Anya was already working on this before we both took a class together on ml5.js.

AZ:

Yeah, I took a class called Machine Learning for Artists and Designers, here in Shanghai, about a year and a half ago. We used ml5 in class, and my final project was on bias. I used Stable Diffusion with different prompting techniques to show how outcomes can be biased. I used image-to-image generation, and when I prompted it to generate “a strong person”, even if the input image was a black woman, for example, it still output a white man, five out of six times.
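
For readers who want to try a similar probe, here is a minimal sketch using the diffusers library’s image-to-image pipeline. The checkpoint name, input image path, and settings are illustrative placeholders, not the exact setup described above.

```python
# A rough reproduction of the probe described above: run the same
# image-to-image prompt several times and inspect who the model draws.
# Requires: pip install diffusers transformers torch pillow
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint; any SD 1.x works
    torch_dtype=torch.float16,
).to("cuda")

# "portrait.jpg" is a placeholder for whatever input photo you are testing with
init_image = Image.open("portrait.jpg").convert("RGB").resize((512, 512))

for seed in range(6):  # six runs, echoing the "five out of six times" observation
    generator = torch.Generator("cuda").manual_seed(seed)
    result = pipe(
        prompt="a strong person",
        image=init_image,
        strength=0.75,        # how far the output may drift from the input image
        guidance_scale=7.5,
        generator=generator,
    )
    result.images[0].save(f"strong_person_{seed}.png")
```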

The next semester I spent in New York, and met Neeti, and we started working on this.

I heard about this from Ellen Nickles. Do you know her?

NS:

Yes, I took a class with her, and was inspired by the initial work she had done. I was also working on another project, on experimental design methods, where we used large language models to try to develop a way of inputting assumptions and generating questions. The project involved service workers, a large number of whom are immigrants, especially South Asians, and using the models generated a very odd set of questions.

So we came together to create a resource for testing ml5 models, and to try to understand how you even check for bias, and how it’s defined.

Image labeled “person” in ImageNet dataset: image-net.org

So how do you do it? How do you check?

AZ:

We are trying to figure that out! When we started a year ago, there were maybe two resources we could refer to in terms of methodological approaches. Now Google and others publish information on bias, where they didn’t before. So everyone is trying to figure it out at the same time.

We’re working specifically on ml5. But even narrowing down that far, there are so many approaches. A lot of the time, it comes down to looking at the data that was input for training. Sometimes we look at the outputs, see that something is wrong, and then try to trace it back to the inputs. That’s the general approach.

NS:

When we started, there were a few companies doing research into fairness and developing models to check for bias. We put that in the context of students who are just playing around with machine learning, and tried to understand at what scale these students would be ready to think about bias. We realized, of course, that what people do in industry is hard to apply to a small student project.

We also sent out a survey about what kind of material people might be interested in. We received interesting feedback. For example, we were told that just by sharing the survey only in English, we were demonstrating some bias of our own.

So we put together a document that lists multiple ways bias can show up, and multiple ways you can check for it. Of course, with a small team, you can’t always check for every bias. As Anya said, there are a lot more resources now than there were a year ago, for mitigating biases as well as recognizing existing ones.

Biases can manifest in different ways. It could be the input, how the algorithm is defined, or the way it’s being used. Is it projecting something we don’t want to project? How much will continue to exist because the data has come from the internet?

There’s an oath that you can take as soon as you start thinking about using machine learning. It asks you to think about whether your project could cause potential harm: do you know what you’re making? It’s a question for all the students in our departments. We have a lot of fun playing around with technology, but then you think about scaling up, or students graduate and launch startups. Are we thinking about society?

Image from the COCO dataset: cocodataset.org

Maybe it’s useful to look at an example — one model that you’ve looked at?

NS:

We went through each ml5 model. One thing that was interesting was how Google publishes its model cards. Finding information about individual models is extremely hard; we spent an extraordinary amount of time looking for it, wondering how they had made each model.

A lot of them declare limitations in the dataset, and that’s a good first step. It’s useful to know the limitations of the model you’re using; it might not only be about skin color, but also about people with disabilities. Does a pose estimation model detect someone sitting in a wheelchair? Those models are based on distances and key points.
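
As a rough illustration of that kind of check, here is a small sketch that inspects lower-body keypoint confidence for a seated subject. MediaPipe Pose stands in here for the pose models ml5 wraps, and the image path is a placeholder you supply yourself.

```python
# Probe whether a pose model reports lower-body keypoints with any confidence
# for a seated subject. MediaPipe Pose is used as a stand-in for the pose
# models discussed above.
# Requires: pip install mediapipe opencv-python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
image = cv2.imread("wheelchair_user.jpg")  # placeholder test image

with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks is None:
    print("No pose detected at all.")
else:
    # "visibility" is the model's confidence that the keypoint is present in frame
    for name in ("LEFT_HIP", "LEFT_KNEE", "LEFT_ANKLE"):
        landmark = results.pose_landmarks.landmark[mp_pose.PoseLandmark[name]]
        print(f"{name}: visibility {landmark.visibility:.2f}")
```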

AZ:

We are checking these things ourselves; ideally we need an automated process for testing. With Google (most of the base models used by ml5 were developed by Google), the relevant information is usually buried in a lot of documentation. They do have model cards, and they do some fairness testing across 17 categories, with a benchmark: if the difference in the model’s confidence when detecting a certain group of people is larger than a certain percentage, then it’s considered not fair.
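
The benchmark described here boils down to comparing average detection confidence across groups. Below is a toy sketch of that logic, with made-up numbers and a made-up threshold rather than Google’s actual figures.

```python
# Toy version of a confidence-gap fairness check: group detection confidences
# by an annotated category and flag the model if the gap between the best- and
# worst-served groups exceeds a threshold. The data and threshold are made up.
from statistics import mean

detections = [            # (group annotation, model confidence) per test image
    ("group_a", 0.91), ("group_a", 0.88), ("group_a", 0.93),
    ("group_b", 0.84), ("group_b", 0.79), ("group_b", 0.81),
]
THRESHOLD = 0.05          # illustrative "acceptable gap", not Google's benchmark

by_group: dict[str, list[float]] = {}
for group, confidence in detections:
    by_group.setdefault(group, []).append(confidence)

means = {group: mean(values) for group, values in by_group.items()}
gap = max(means.values()) - min(means.values())
print(means)
print("fails the check" if gap > THRESHOLD else "within threshold", f"(gap = {gap:.3f})")
```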

I think only two or three ml5 models use publicly available datasets. The rest use Google datasets whose contents aren’t revealed, for privacy reasons. Some of that data, for example, was collected from XR and AR applications, which we presume means people’s phones. So of course they’re reluctant to reveal that they’re using this type of data.

We don’t have a reason to completely distrust them. We focus on informing, and encouraging people to think about these things — on the ml5 website, for example.

Image labeled “Hat with a wide brim” from ImageNet dataset: image-net.org

So is it primarily students that you’re doing this for?

NS:

Yeah, basically people trying to pick up machine learning on their own. Anyone who’s playing around with it. It’s quite empowering to know what you’re using and what problems it has. Not everyone can create their own dataset.

And is there such a thing as an unbiased dataset? Can a university create benchmarks for datasets, recognizing that we cannot easily check our own biases? We can critique our own work, as well as other people’s.

Coming back to an example, the sentiment model was one I was very surprised about. They used an academic model, and on the academic website they acknowledge that it is systematically biased. For example, females tended to express higher levels of emotion, both positive and negative.

In that case, it had pulled data from IMDb, YouTube, and Google Images. Even doing just a cursory search, you can see that you wouldn’t always agree with the reviews themselves.
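
One simple way to see this kind of skew is a template probe: score sentences that differ only in a gendered word and compare. In the sketch below, NLTK’s VADER analyzer stands in for whatever sentiment model is under test, and the templates are invented for illustration.

```python
# Template probe: the same sentences with only the gendered word swapped.
# VADER stands in for the sentiment model under test; any model with a
# text-in, score-out interface could be dropped into score().
# Requires: pip install nltk
from statistics import mean
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

def score(text: str) -> float:
    return analyzer.polarity_scores(text)["compound"]  # -1 (negative) to +1 (positive)

templates = [
    "The {} in this movie gave a powerful performance.",
    "The {} complained loudly through the whole film.",
]
for group, word in [("female term", "actress"), ("male term", "actor")]:
    scores = [score(t.format(word)) for t in templates]
    print(group, [round(s, 3) for s in scores], "mean:", round(mean(scores), 3))
```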

Do you think an unbiased model is even possible?

NS:

There are steps you can take: explicitly including diverse perspectives, for example. I personally don’t think it is possible. But something that is crowdsourced rather than scraped can bring more intentionality to the data. Anya, what do you think?

AZ:

This was one of the questions we asked ourselves. Do we want a model to reflect the “real world”? The real world is very biased. Or do we want some type of ideal world?

Of course there are lots of different dimensions. Race and gender are the most frequently discussed, but as Neeti mentioned, there are also people with various disabilities. There is an enormous number of things that can easily be overlooked.

So I agree with Neeti that one of the key things is getting as many perspectives as possible — not dismissing some. Maybe prioritizing some, maybe not, depending on the process.

The related issue is how images are obtained. We all know that a lot of data is obtained unethically, without consent, sometimes even illegally.

Thinking of that ideal world instead of the real world, could we create synthetic datasets that are somehow balanced?

NS:

A number of academic papers talk about bias at each stage. Understanding the end goal of your model helps you see where you might be perpetuating bias. There are real-world biases and historical biases, and you can address these through labelling and categorization. On an individual level, I can look at what is a “right” or “wrong” image and plug in weights accordingly.
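
As one concrete reading of “plugging in weights”, inverse-frequency class weights give under-represented categories more influence during training. A minimal sketch with made-up labels:

```python
# Inverse-frequency class weights: rarer categories get proportionally larger
# weights so the loss isn't dominated by whatever the dataset has most of.
# The labels below are made up for illustration.
from collections import Counter

labels = ["standing", "standing", "standing", "standing", "seated", "wheelchair"]

counts = Counter(labels)
total = len(labels)
weights = {label: total / (len(counts) * n) for label, n in counts.items()}
print(weights)
# {'standing': 0.5, 'seated': 2.0, 'wheelchair': 2.0}
```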

We found a lot of examples in the legal and medical domains. These have a larger impact when they’re used. If we’re dealing with a particular population, should we have a dataset that is only about that particular population, instead of a one-size-fits-all model?

AZ:

We thought a lot about intentionality. Again, understanding what your model is intended to do. How do I state its use in an intentional and clear way? Which things do I highlight? How should it be used?

For ml5 it’s nice because it’s generally noncommercial. The decision processes are crowdsourced. If somebody wants to use a face mesh model for commercial purposes, for example, there has to be a consensus among ml5 community members, from what I understand. ml5 already has an explicit intended use: educational, in classrooms or for personal learning.

Image labeled “crutch” from ImageNet dataset: image-net.org

What are you doing with ml5, or any kind of machine learning, at the moment?

AZ:

I’ve been learning how image generation works. And the synthetic datasets I mentioned are something we could test with ml5, to lead by example. We experimented a great deal with ml5 when we were researching.

NS:

Since we started to share this with people, I’ve had a few people ask me what better datasets they could use. For example, there’s another research resident at ITP who’s creating a drawing tool, but how could that work without the use of your hands? Voice recognition is impossible in that case.

I don’t think there is any way to make an unbiased dataset. When someone says, “Give me an image of xyz,” you can add the word “diverse” or a certain population. But in doing that, you are creating a different kind of bias in the models. The things that we are doing in this project, someone pointed out, are also biased: decisions and choices that we’re making, based on a certain understanding of how a model works. By surfacing one thing, you hide something else.

What else are you working on these days?

NS:

I’m an artist, currently making a video game about caregiving and dementia. It’s a similar approach: thinking about accessibility and depiction of disease, in that case.

I’m also working with a professor on grading rubrics. Asking students to meet a certain benchmark can be unrealistic, because we all have our own benchmarks, and who’s to say that one person’s hour of work is equal to another person’s? How do you evaluate labor?

AZ:

I’m working on AI and education. AI will transform education in ways we don’t know yet. It’s different from the internet, because the internet couldn’t generate content in the same way.

It’s about time for some things to change in universities anyway. There might be more oral assessments. There might be more emphasis on connections — interpersonal relationships, collaborative environments.

Zooming out, what’s your general impression of AI these days? Are you generally positive or negative? Are there interesting things you see happening?

NS:

The “move fast and break things” approach is not a good idea. We saw that with some of the Google models we looked at. A lot of things get left behind and overwritten, rather than carefully thought through, in making models “better” or more efficient.

A People’s Guide to AI is great for this — it takes you through all this in a wonderful way, for learners of all levels. Maybe it’s more important to bring people together around these topics, than to keep throwing more money at it.

AZ:

Companies will use AI to replace human labor. I don’t think that will stop. There will be new professions that emerge. In the next ten years, we might see 80 percent of people in tech working on AI. It’s a great technology; I use it every day.

That means I’m pretty sure all the big companies know everything about me, from the personal data I’ve put in. But it’s not only me, of course. And this question has been around since the internet came along. Is it better to give our data to China or to Elon Musk? Choose your evil.

We can inform people, and we can make informed decisions. So I’m pretty optimistic. Look at DeepSeek: it uses much less energy, and it’s free and open source. I do think that AI should be kind of like the internet, almost a human right. But with some restrictions, consent and so forth.

NS:

It’s not just AI that’s biased.
