AI for Nine Earths

increasingly unclear
21 min read · Jul 14, 2024

Here’s one way AI could directly address climate change

Image from Nine Earths by D-Fuse, used with permission

Concern is growing about the environmental impact of artificial intelligence technologies. For many applications, like the Large Language Models that power chatbots, accuracy increases along with the size of the model. And bigger models mean more data and more computing power, which requires more energy, as well as water to cool down all those computers in data centers.

AI promises huge gains in what we do, and in how (and how fast) we do it. But so far, the costs (financial as well as environmental) outweigh the benefits.

There are ways to mitigate this impact. Smaller models that run locally on devices can perform some tasks better than larger models. The big AI companies are investing in sustainable power sources for all those data centers. AI increasingly plays a role in studying and monitoring climate change. That creates a weird paradox: use more resources in order to reduce resource usage.

I’ve been working on a different approach: Could AI play a more direct role in reducing our consumption of resources, by helping us understand what and how we consume?

This article takes off from a previous one, in which I first introduced the possibility of an AI ethnographer. In this article, I’ll tell you about the work I’ve been doing on our project called Nine Earths, working with artist collective D-Fuse. I’ll focus specifically on the ethnographic work I do for the project, and how AI might play a role as a kind of ethnographer itself.

To introduce the project, I’ll go into some of the background research that went into my work. Then I’ll talk about ethnography, and the AI I’ve been working on that could do some of the things human ethnographers do to help different cultures understand each other. Throughout, I’ll share some of the things I found out from my work.

The big picture

A note here — in this section I get into some fairly political views about climate change. Feel free to skip to the next section, if you want.

Climate change is an abstract concept that’s difficult to see all at once, because it’s so big and because it takes place over long time scales. But it’s grounded in material reality: humans are part of nature, and so are fossil fuels.

One reason it’s so difficult to address in practice is that fossil fuels are so embedded in our contemporary lives (in most cultures) that they pervade everything to such an extent that it’s hard to think beyond them. Just about everything around you right now either contains, or was produced with the help of, fossil fuels.

The academic Heather Davis points out that fossil fuels are designed and marketed to be universal and invisible. “That new car smell”, for example, is the smell of success, pleasure, progress, wellbeing. And it’s the smell of plastic, one of the most important products created from fossil fuels. It’s associated with cleanliness and democracy, through consumer choice. That’s good marketing.

Don’t get me wrong — plastic is an absolutely amazing innovation — it’s strong, lightweight, preserves perishable items, is reusable, and it lasts a long time. A really long time, we now know. And we know that it breaks down but not completely — at a molecular level it persists in tiny particles. When I say fossil fuels are deeply embedded in our world, this is quite literal, since plastic is in all of our bodies, not to mention in the rain and the oceans and many other places.

So we’re all supposed to use less plastic, fewer fossil fuels, less stuff in general, and to be more “sustainable”, whatever that means. This is supposed to apply to companies — including oil companies (sorry, rebranded as “energy companies”).

I didn’t set out to be “political”, but it’s inevitable because the facts are so obvious that it’s impossible to be neutral on this. (I recognise that my views reflect the sources of information I’ve been exposed to.) The best we can do is to not think about climate change most of the time, which is what most people do. Everyone draws their own line somewhere, and everyone is different but also somehow the same — that’s one thing I learned working on this project.

You know that the “carbon footprint” is a marketing term created by British Petroleum, right? It was invented to shift responsibility onto us, “consumers” and away from producers of fossil fuels and their emissions.

Meanwhile, as my colleague Tomislav Medak points out, “The globalisation of capital flows and the concerns over energy security have made it impossible to tax around 100 carbon majors [yep, BP is one] that are responsible for 63% of all historical emissions and 71% of all emissions since 1988.”

You can say that’s a politically biased background to my work, but these are the sources I kept notes on.

Image from Nine Earths by D-Fuse, used with permission

Nine Earths

Nine Earths refers to the fact that some countries use up to nine Earths’ worth of resources each year. Qatar is the highest: if every country used resources at the rate Qatar does, we would need nine Earths’ worth every year. The rest of the countries use less; you can look up by country here.
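To make the "number of Earths" idea concrete: one common way to arrive at such a figure (an assumption here, since this article does not spell out the method) is to divide a country's per-person ecological footprint by the biocapacity the planet has available per person. The sketch below uses made-up placeholder numbers, not real country data, and the function name is hypothetical.

```python
def earths_needed(footprint_per_person: float, biocapacity_per_person: float) -> float:
    """How many Earths would be needed if everyone lived at this footprint.

    Both arguments must be in the same unit (for example, global hectares per person).
    """
    if biocapacity_per_person <= 0:
        raise ValueError("biocapacity_per_person must be positive")
    return footprint_per_person / biocapacity_per_person


# Placeholder values for illustration only, not real measurements.
HYPOTHETICAL_BIOCAPACITY = 2.0  # available per person, worldwide
hypothetical_footprints = {"Country A": 18.0, "Country B": 5.0, "Country C": 1.0}

for name, footprint in hypothetical_footprints.items():
    earths = earths_needed(footprint, HYPOTHETICAL_BIOCAPACITY)
    print(f"{name}: {earths:.1f} Earths")
```

With these invented inputs, "Country A" comes out at nine Earths and "Country C" at half an Earth. The real figures depend entirely on the underlying footprint data.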

That includes both producers and consumers, but our project focuses only on consumption. The aim is to depict a range of countries that consume various planets worth of resources.

The way we do that is simple and counterintuitive — simply by showing normal people going about their everyday lives. People are tired of seeing smokestacks and polar bears; all that abstract, faraway, big-picture stuff about climate change just creates more anxiety, or boredom. Research shows that people prefer to see other people, human scale, and stories.

One study found that information “conveying that one’s country has a significant carbon footprint leads to a sense of collective guilt, and such feelings predict willingness to support sustainable causes and actions.”

But simply comparing countries is too simplistic. While there are big differences in resource usage between countries, inequalities within societies are staggering, reports Medak. In the US, the top 1 percent of carbon emitters releases a hundred times the world’s average. Across the globe, the top ten percent emits 45 percent, while the bottom 50 percent emits just 13 percent.
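A quick back-of-the-envelope calculation makes the global figures easier to compare. Using only the two global shares quoted above (and assuming they refer to the same total), this sketch works out how much an average person in each group emits relative to the world average. The variable and function names are just for illustration.

```python
# Shares quoted above: (share of world population, share of emissions).
groups = {
    "top 10%": (0.10, 0.45),
    "bottom 50%": (0.50, 0.13),
}


def relative_per_person(pop_share: float, emission_share: float) -> float:
    """Average emissions per person in a group, as a multiple of the world average."""
    return emission_share / pop_share


ratios = {name: relative_per_person(*shares) for name, shares in groups.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the world average per person")

# How many times more an average person in the top group emits
# than an average person in the bottom group.
gap = ratios["top 10%"] / ratios["bottom 50%"]
print(f"gap: about {gap:.0f}x")
```

In other words, an average person in the top tenth emits roughly 4.5 times the world average, an average person in the bottom half about a quarter of it, and the gap between the two is around seventeenfold.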

Those numbers don’t paint much of a picture, so consider that “the inequalities of emissions are roughly following the same distribution pattern as inequalities of income,” Medak writes. “Yet, the costs of living are only a small proportion of costs for high-income groups, who can also easily avoid ecological taxes by buying less-polluting technologies.”

This is another thing I learned in the project: you can’t make generalisations about people or places. London alone is so diverse.

Image from Living on Nine Earths by me for D-Fuse, used with permission

Zooming in

Given our aim to focus on average people as consumers, our intention was not to point fingers at some as being heavy consumers, or uphold others as “good” consumers. It was simply to show different stories of average people, viewed through the lens of consumption.

For me, conveying statistics and quotes about resource usage and consumption becomes preachy. (It’s also an approach that AI could easily take up.) So with my anthropologist hat on, I simply wanted to know how people understand consumption, within their own cultural contexts. A classic way to do this in anthropology is to compare what people say with what they do, to see how their stated beliefs and intentions match up with their actions.

Social scientists have studied consumption patterns over the decades, and found various dimensions that can be investigated with different people and contexts. For example, how people find information about the things they get and use, how they decide on one or another, the behavior they engage in when adopting, using, and disposing of it [source].

I don’t like the word “consumer” because today it almost always refers to someone who buys something to use — to literally consume, and dispose of. Marketing research is full of economic theories about how various groups of people shift between consuming goods, services, or ephemeral experiences.

But if you look beyond consumption as a solely economic activity, you can gain insights from, for example, archaeologists who dig through people’s garbage to uncover the things they’ve used and disposed of, whether they purchased or traded for them, made them themselves, or adapted or reused something.

In ethnography, the sub-field of anthropology that I chose to work within for this project, we can expand that from material culture, as archaeologists study, to compare what people do with what they say, by observing them and talking with them. Specifically, we can explore dimensions like social influence, individual actions and attitudes, tangible experiences, and habits that people form and follow [source].

It was especially those habits that interested me, as “behaviors that persist because they have become relatively automatic over time as a result of regularly encountered contextual cues” [source]. Journalist and anthropologist Gillian Tett points out that “if we want to solve personal or collective problems, it sometimes pays to look to physical rituals and habits for the answer,” because habits are basically physical actions that we repeat without really thinking.

(Habits are similar to rituals, but with some important differences. I hope to discuss rituals in a future article.) We can observe such habits and ask people about their routines, and that’s exactly what we did in this project.

My work specifically involved observing these habits in video that we collected. We commissioned artists in 12 diverse countries to film an average day of an average person. I looked through all this video to detect patterns (you can see how AI can come in) across cultures and countries. I grouped these patterns in themes we developed with climate scientist Mark Maslin: transport, water, waste, food, energy, goods, services, care, communication, nature. Then I combined video clips to make films, like Living on Nine Earths.

You can see how I’ve been progressively narrowing my focus — starting from climate change as an overarching subject area, to look specifically at consumption practices by individuals, and more specifically at the habits they develop, often without thinking (and, as Tett says, to replace thinking). Now let me zoom in even more, so that I can go into depth instead of breadth.

We focus specifically on people aged 18 to 34. A 2021 study found that more than half of young people said climate change made them feel powerless, and that humanity is doomed. 39 percent said climate change made them hesitant to have children.

We also focus mainly on cities. In my film, you’ll see Beirut, Dubai, Ho Chi Minh City, Seoul, Tokyo and more. This was partly out of convenience, since that’s where our initial contacts were (in research this is called “convenience sampling” — a convenient way to take a sample of participants in order to say something about the larger population of that group). These initial contacts led us to others (“snowball sampling”).

Cities were also a useful place to focus because more than half the world’s population now lives in them, and they also produce more than two-thirds of the world’s carbon dioxide. Unsurprisingly, transportation makes up the largest proportion of that.

Within this group, we look at what is called “active consumption” — the things people have control over. This means making choices (even if those choices might be constrained by what’s around and available). It means looking at things like people’s involvement versus detachment, and at individual and collective values, situated in particular socio-economic conditions [source].

Taking all that into consideration, I try to say something about global consumption. Not through comparison, just by depicting a multiplicity of individuals, cultures and activities. As the anthropologist George Marcus said, the global is in the local.

Image from Living on Nine Earths by me for D-Fuse, used with permission

De-familiarization

Ethnography is a way of making sense of the relative messiness of the world — usually of a particular culture that’s not your own. It’s about making the strange familiar, but also making the familiar strange.

That generally involves asking questions, like “What’s going on here?” The anthropologist Clifford Geertz called it “reading a culture over someone’s shoulder.” The main way we do that is by engaging in what Geertz called “thick description”: noting as much detail as possible about a particular event, place or scene.

A great example is the book Laboratory Life by Bruno Latour and Steve Woolgar. Here’s an excerpt:

6 mins. 15 secs. Wilson enters and looks into a number of offices, trying to gather people together for a staff meeting. He receives vague promises. “It’s a question of four thousand bucks which has to be resolved in the next two minutes, at most.” He leaves for the lobby.

6 mins. 20 secs. Bill comes from the chemistry section and gives Spencer a thin vial: “Here are your two hundred micrograms, remember to put this code number on the book,” and he points to the label. He leaves the room.

This is “anthropologist from Mars” type stuff — quite basic descriptions by an outsider. AI can do this pretty well, at a certain level: with some (culturally specific) training, it can recognize objects, people and even what they’re doing, to some extent. But the way it does that is purely technical and statistical: by breaking down an image into small parts, then detecting edges of things, then combining these to look for patterns it recognizes, along with some degree of confidence.
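To give a feel for the "detecting edges" step, here is a toy sketch in plain Python. It is not how any particular AI system is implemented (modern models learn their own filters from data). It simply applies a hand-written brightness-difference filter, a simplification assumed here for illustration, to a tiny made-up grayscale image.

```python
# A tiny made-up grayscale image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]


def horizontal_edges(img):
    """For each pixel, the absolute brightness change to its right-hand neighbour."""
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in img
    ]


def edge_columns(img, threshold=5):
    """Columns where the brightness jump exceeds the threshold in every row."""
    diffs = horizontal_edges(img)
    width = len(diffs[0])
    return [x for x in range(width) if all(row[x] > threshold for row in diffs)]


print(horizontal_edges(image)[0])  # brightness changes along the first row
print(edge_columns(image))         # column index where a vertical edge was found
```

The only large jump in brightness sits between the third and fourth pixels of each row, so that is where the sketch reports an edge. A real system stacks many such filters and combines their outputs into larger patterns.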

It can communicate this level of confidence, but what if it could also tell you when it’s not sure of something, or needs more information? Can you imagine ChatGPT asking you questions, instead of the other way around?
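One simple way to picture this, purely as a hypothetical sketch and not a description of any existing system, is a describer that turns low-confidence guesses into questions. The labels, the scores and the 0.6 threshold below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Guess:
    label: str         # what the system thinks it sees
    confidence: float  # between 0.0 and 1.0


def describe_or_ask(guess: Guess, threshold: float = 0.6) -> str:
    """State the guess when confident enough; otherwise ask a question about it."""
    if guess.confidence >= threshold:
        return f"I see a {guess.label}."
    return f"I'm not sure. Could that be a {guess.label}? What is it used for?"


# Invented example guesses.
for g in [Guess("bicycle", 0.92), Guess("coffee roasting machine", 0.35)]:
    print(describe_or_ask(g))
```

The design choice is small but meaningful: below the threshold, the output is a question addressed to a person rather than a label with a low score attached.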

Asking questions instead of generating hypotheses places ethnography closer to art than science. Like art, it’s self-reflexive: The ethnographer doesn’t aim for scientific objectivity but embraces and makes clear their own subjective position. You start from where you are, and acknowledge who you are.

Since every observer observes from a position (both geographical and ethical), observing, describing, “studying” another culture becomes a political act. Do you like what you see? When ethnography got started a century ago, the ethnographer was seen as a scientist who described “exotic”, faraway cultures to scientific peers back home, aiming for authority and objectivity.

No longer. We may still study cultures very different from our own, but this has become a process of interpretation, negotiation and co-construction with people in that particular culture. The ethnographer admits being somehow inside and outside that culture at the same time, and is never a completely disinterested or detached observer.

You could say that AI systems are in a similar process of learning about human cultures. They’re outsiders, but created and fed by humans, they also embody certain cultural assumptions gained from their creators and their datasets. Embracing some sort of self-reflexivity would mean making those cultural assumptions transparent, for a start.

Image from Nine Earths by D-Fuse, used with permission

Getting away from words

Filming and photography have been used in ethnography from its inception, and Visual Ethnography developed as its own sub-field, focusing on analyzing directly observable movement and behavior, how people make and use various tools and objects, and how they communicate in nonverbal ways. We can record things not easily captured by language, and we can compare those with what people say.

In this sense, multimodal AI models are a step forward.

But of course images are subjective too — cameras can see some things better than others, and the frame can only contain so much.

Ethnographers already use thick description to describe what they see, sometimes as a simple inventory of items or behaviors, sometimes only the most notable ones. Images and video can help with this because, unlike in life, you can freeze the action, zoom in, and rewind in the case of video.

This is exactly what I did in viewing the Nine Earths footage. What is that object? What is that person doing? Where are we? Having video helped a lot: I could pause and rewind, and having a moving camera means you often get different perspectives on an object or scene. Oh, that’s a coffee roasting machine. My “field notes”, full of as many details as I could note down, are full of stuff like this — along with my openly subjective opinions. That reminds me of Asian grocery stores we have in California.

You could imagine AI asking questions like this. Context is important here. Today it can refer to how much of a conversation a chatbot holds in memory as you’re conversing with it (its “context window”). But I’m talking about real-world context — who, what, where, when. AI can infer some of this, and such inferences come with some ethical issues.

Another source of subjectivity in our case is the person doing the shooting. Who were they? What were they thinking? Why shoot this and not that? In most cases, the videographer filmed someone they know well. In a few cases I caught a glimpse of the videographer, when they entered the frame to adjust something, or caught their reflection somewhere.

In a few cases, the person doing the shooting and the person being shot were one and the same. You can see this in the film, when one person is seen hailing a taxi, then tells the driver to wait while he comes to retrieve the camera.

That made clear an important issue of staging: In all cases, participants are aware of the camera, and might adjust their actions, appearance or language accordingly. It makes clear that this is a performance, and raises the question of how much of our lives in general, filmed or not, constitutes a conscious performance. The answer, I think, is: almost all.

How could AI possibly pick that up? It knows nothing about fact and fiction, only what it’s seen before.

A classic visual ethnography is called Navajo Film Themselves, produced in 1966. A couple of researchers taught Navajo people some basics of operating a camera (this was way before smartphones and even digital cameras), and asked them to film their own lives in the ways they wanted. What the researchers got back confused them, because it didn’t conform to Western film conventions. But looking closer, that was exactly what was so interesting — a culturally different way to approach filming.

We could compare that to the video footage in our project, which was shot in all cases by a person local to each culture. That was arguably better than someone like me travelling to all these places. But in our case, the videographer was an artist or filmmaker with some technical knowledge. Our project also took place in a fully globalized world. Not only was our footage filmed in vastly different places, there is much more influence of one place on another than in 1966 — primarily Euro-American influence on other places, but not only (I discuss this below).

By conducting my visual ethnography remotely, I undoubtedly missed a lot. On the other hand, I didn’t influence what was shot. I worked on a previous project in which we introduced new technologies to remote farming communities in Kenya, and whenever European researchers went there (as opposed to researchers from the region who understood local languages and customs), there would be formal ceremonies for the visitors; we couldn’t get as much actual fieldwork done.

Images, including documentary-style video like we commissioned, are indexical — we intuitively see them as a direct representation of a single “reality” but it’s really just one version of reality, filmed and framed in a particular way. Seeing images as “truth” has led to all sorts of problems and misunderstandings over the years, and this goes back way before social media. Ethnography, including the films I produced, makes this framing and subjectivity and power relations explicit.

In this way, video can be much better than an interview for getting inside someone’s head. People are good at empathizing with other people — imagining what we’d do if we were that person on-screen. (AI can read emotions and actions to some extent.) But it also says something about the videographer’s headspace, in the decisions they’ve made in choosing and framing shots, and how they’re edited.

That’s the reason I put the word “truth” in quotation marks. Journalism aims for a single, objective truth by gathering “facts” and triangulating multiple sources. Art, though — including ethnography — goes instead for what artist John Keel calls “truthiness” — a kind of truth that’s subjective, not objective. One draws conclusions, the other leaves things open to different interpretations. This means that art is not opposed to journalism or science, especially when it aims for more accurate descriptions, and moreover makes its biases clear. They aim for different kinds of truth.

This doesn’t mean that ethnography presents a “true” or complete record of “reality”, any more than journalism does. Showing one thing means hiding another. And showing one thing also means showing something else at the same time. What’s that building in the distance? As viewers, we have to determine what’s most important to attend to. And we have to acknowledge what is not seen in video. The things that people choose not to show, do or discuss are often the things that reveal the most about some social reality.

Image from Nine Earths by D-Fuse, used with permission

Thinking like a machine

I realize I’ve been pretty tentative, qualifying what I say in order to be careful about making too many assumptions or overstating my claims with too much certainty. This is intentional. I believe AI could help me look through, describe, and even make some interpretations from the hours of video that I watched. But my point is, if human ethnographers need to be careful about the conclusions they draw, then an AI ethnographer needs to be doubly careful, and human oversight should always be included.

All those qualifications given, what did I actually find out? Could AI find similar things?

On a basic level, I figured that AI could indeed take up a lot of this work, because I found that description can be machinic — especially the kind of thick description I did, describing hours of video. Take a single frame and try to describe everything you see. It can contain so much information that (depending on resolution) you can zoom into every small detail; in real life of course, such detail is infinite. What’s that thing on the wall back there? What’s it made of?

I was reminded of this project by Everest Pipkin. He also watched hours of video — in his case, an actual training dataset for AI systems. Before long, he reports that he felt like he was thinking like a machine.

Now I know the feeling. In my case, I was also describing what I saw. At times it felt like filmmaking in reverse — writing a script after the film was shot. Long shot, daytime, interior…

At the same time, I felt like an actual filmmaker — which indeed I was. I color-coded clips I thought were particularly interesting, and I thought of how I could put them together.

And I learned a lot about filmmaking by working in this way. What kind of shots were “good” in terms of narrative or aesthetics? Which ones were useful for ethnographic analysis?

I started paying attention to camera work. “In feature films,” says artist/filmmaker Harun Farocki, “the camera anticipates. In the documentary, the camera pursues.”

Sound was important, and using headphones made a big difference. In our very visual culture, the power of sound is often overlooked. Take a video clip shot from the back seat of a car, camera pointed out the front window, car full of people. Who’s that speaking on the left? Which one is my main participant? I can’t hear everything being said because of the radio.

It was confusing. At one point, one of the Korean participants went to Vietnam on holiday. One of the Indonesian participants worked in a place that looked like a lab, with lots of chemistry equipment, and I had no idea what was being done sometimes; it enabled me to engage in my own version of Laboratory Life.

Describing gender was a challenge, not only with gender-fluid individuals, but in terms of how anyone should be described without sounding biased. This is an ongoing challenge for AI.

You also can’t use adjectives, like “a big container”. What is “big” anyway? It’s all relative. So you need to be pretty precise in your descriptions. Even identifying a container as a “pot”, “bucket” or whatever is culturally specific and may be wrong.

On the other hand, knowing precise names of things like items of clothing can be useful: a tank dress, a polo shirt. In some cases I compared things to brands I know: an ambulance the size of a VW bus.

It was interesting watching the Vietnam footage after I read participants’ interviews; in most cases I watched the footage first. In this case, from the interviews I got a strong impression of close family relations (young people helping to cook and clean, for example) that contrasted with the cultures I’ve lived in.

So I’d built up a certain impression of places in my mind. But the footage completely confounded my expectations. It’s easy for me to say Ho Chi Minh City reminds me of Palermo because it’s so chaotic. You might understand, if you’ve been to one or both places. But such a comparison isn’t fair to either place, and not all neighbourhoods in either city are like that.

Some places (yes, and some people) were more interesting to look at. A rainy street scene in Ho Chi Minh City simply contains a lot more information than the comparatively bland cityscape of Dubai, where chaos is carefully controlled.

It took forever to go through all that footage. But on the other hand, I didn’t want it to end, I enjoyed the process so much. (Could an AI say that?)

Those are some answers to one of my guiding questions going into this project: How feasible and suitable is remote ethnography for illuminating specific local contexts and making cross-cultural comparisons about consumption practices in relation to climate change?

The broad answer to the first part of that question is: This sort of ethnography was indeed feasible — did I mention that the main reason we did this project remotely was due to Covid? Previously, Michael Faulkner, the project lead and head of D-Fuse, did actually fly all around the world. As a highly skilled designer and filmmaker, he produced amazing video, and produced his own visual ethnography with that footage.

This project was different in at least a couple of ways. Asking other people to shoot for us added another layer of mediation between the participants and the viewers of the final films. And because I did the analysis and editing in this project, I lacked any on-the-ground knowledge that Faulkner gained when he was in all of those places.

That said, I think the footage and the process did illuminate local contexts, viewed through the lens of consumption. Just in a different way, as I’ve discussed above. Primarily, you’ve got to account for your own biases, and those of the participants and the videographers, as well as the way the technical limitations frame what you can and cannot do, and what sort of video and information you get.

Image from Nine Earths by D-Fuse, used with permission

Same but different

My other guiding question was: What can video collected by artists and young people in different countries tell us about global consumption? I’ve already provided some answers to that one; here are a few more.

I mentioned the themes we drew up with the climate scientist: food, transport, water, waste, etc. I’ll just address a few of these. Food was a big one, prominent across most of the countries. There were supermarkets in Jamaica and the UK, food markets in Brazil and Colombia, and takeaway and delivered meals in Indonesia and Vietnam. People were shown picking food from personal gardens in Jamaica and the UK. Restaurant meals were seen in Brazil, Japan and Korea.

As I said, it’s impossible to generalize or compare across countries. In many cases, the same individual might cook sometimes and eat out at others. But a few things stood out for me.

Drinks were prominent in Asian countries, whether juice, coffee drinks, bubble tea or something more unusual. Sometimes bought from a truck or stand, sometimes delivered (this was partly during Covid lockdowns), always in a big plastic cup with plastic straw, and often in a perfectly sized plastic bag too. Drinks like bubble tea are now common across the world — cultural influence moves in different directions.

Coffee was ubiquitous. This may have been in part because some of our Indonesian participants were connected to coffee production and retail. But in almost every country, people drank a morning coffee, met over coffee, or bought one from a truck or cafe. Again, it’s a complex flow of cultural influence, and it was fascinating to see the whole chain of production, from picking coffee beans to roasting to retail. So fascinating, in fact, that I made a 14-minute film just about coffee.

Mobile phones were ubiquitous. This will come as no surprise — smartphones are now in the hands of a majority of adults in every single country on Earth. Footage from several countries showed someone having lunch alone with a fork or a sandwich in one hand and a phone in the other. I made a note of this, and then I caught myself doing exactly the same thing.

This is directly related to the cultural influence I just mentioned — social media is now the carrier of such influence. So the next time you hear the term “influencer”, think like an ethnographer in terms of cultural influence. Maybe it’s leading to a kind of cultural homogenisation, but maybe it also helps to spread good practices as well as bad or silly or superficial ones.

And that’s the main takeaway. We’re all connected now, we all increasingly do (and consume) the same things, with the same consequences. But as climate scientist Mark Maslin told me, we do those things slightly differently, perhaps using more or less or different resources, perhaps with slightly different consequences. “Where you are matters,” he said.

This is why an ethnographic perspective is valuable. To try and be an insider and outsider at the same time means embracing not knowing, instead of trying to project an air of scientific authority — that comes with its own cultural biases.

Not knowing means slowing down, pausing, possibly rewinding, and noticing details you might have taken for granted. As I was looking through all that video, I was reading the PhD of one of my students, Stephen Dawkins, who produces immersive documentaries. He mentioned the essay film, which uses “slowness to train [their] spectator to see politically”. So slowing down, looking closely, and choosing not knowing over a singular “truth” is a political act. We’ve come right back to the beginning.

Now: can you imagine an AI system embracing not knowing? Slowing things down, and being aware of the political implications of this? That would be pretty different from the AI systems out there today. And that’s what I’m currently building.
