Women in Big Data - Podcast: Career, Big Data & Analytics Insights

13. Bias in the AI Lifecycle - A Talk With Veronique Van Vlasselaer (SAS)

January 30, 2024 · Help To Grow Talk · Episode 13

Listen and get insights into Bias in the AI Lifecycle in this talk with Veronique Van Vlasselaer, Analytics & AI Lead for South, West & East Europe at SAS.

We talk about: How to define Bias in AI and the different types of Bias in the AI Lifecycle (Data Bias, Algorithmic Bias, Decision Bias); The AI Incident 'Child Care Benefits' at the Dutch Tax Authority; The Algorithm Register of the Dutch Government; AI Developer Training; and the Assessment List for Trustworthy Artificial Intelligence (ALTAI) for Self-Assessment.

Guest Info


Resources

Support the Show: Hey There! Become a supporter, and help us create great Women in Big Data content for listeners everywhere: 



Mentoring Program - Women in Big Data
Mentoring is essential to success at every stage of a woman's career, both as a mentee and as a mentor. The many WiBD mentoring programs are open to WiBD members and cover opportunities for junior, mid-career, and senior women in technology. Not yet a member? No worries. By joining a mentoring program, you automatically become a WiBD member. Both membership and mentoring are free of charge.


Website: Women in Big Data Podcast
LinkedIn: Follow - Women in Big Data
LinkedIn: Follow - Women in Big Data Brussels
Contact us: datawomen@protonmail.com

00:00 - Intro
Hey, hello, welcome to the Women in Big Data podcast, where we talk about big data, analytics, and career topics. We do this to connect, engage, grow, and champion the success of women in big data.

00:17 - Veronique Van Vlasselaer
"I would like to point everybody to the International AI Incident Database. I don't know if it's a well-known database. But it's definitely a very interesting source where the database collects all kinds of AI incidents. Incidents that happened with AI and summarizes it for us."

00:36 - Desiree Timmermans
In this episode, we talk with Veronique Van Vlasselaer about Bias in the AI Lifecycle, and we cover data bias, algorithmic bias, and decision bias.

Veronique is the Analytics and AI Lead for South, West, and East Europe at SAS. And she's a true data science enthusiast.

Let's start.

01:02 - Desiree Timmermans
Veronique, welcome. We are excited to have you on the podcast.

01:07 - Veronique Van Vlasselaer
Thank you. I'm very excited to be here.

01:09 - Desiree Timmermans
We're going to talk about Bias in AI: data, algorithms, decisions. Can you define for us bias in AI and the different types of bias?

01:19 - Veronique Van Vlasselaer
Yes, but maybe before we go to the dark side of AI, I think we need to stress here in this conversation that AI is wonderful and that AI is here to augment our lives, improve our lives, and add value to our lives.

But we have to be honest: AI has a dark side. AI has some bias, and actually, AI is very sensitive to bias, often in a very unintentional way. That means that AI is making decisions, unfair decisions. Decisions that are not acceptable in today's society, decisions that treat people differently based on their background, their gender, their origin, their skin color. Then we're talking about biased AI systems. And that's, of course, something that we need to capture and deal with.

02:04 - Desiree Timmermans
If the data is not good, I can imagine that the algorithm is not good either, and that the decisions I would like to make, for instance as a business leader, are then based on all this bias. How does that work?

02:16 - Veronique Van Vlasselaer
So bias can actually enter an AI system during the different stages of the AI Lifecycle. And in general, we classify it into three big types of bias.

The first bias, as you mentioned, is data bias. AI systems are actually already doomed to be biased before they are even created. Why is that? Well, because they learn from data. They use the data to learn their logic. And the problem is, if we look a little bit deeper under the surface, this data is actually created by all of us, by humans. And we as humans, as good as we try to be, we are just biased by nature. We have to be very honest: we are biased by nature. And this bias is intrinsically present in the data that those AI systems are going to use to learn their logic. The problem is that the implicit bias in the data, the bias that we put in there, is explicitly translated by the AI systems. So that is the first point: even before an AI system is created, bias is already there.

The second type is algorithmic bias. When we create our AI systems, when we develop our AI systems, even if we use the most neutral data, the most innocent data, the assumptions that we as AI developers apply can make the most innocent AI systems biased. And again, it's because of the assumptions that we as AI developers make. There is nothing wrong with that, but we just have to be conscious about those assumptions and, of course, about the potential impact of those assumptions.

And then the last place where bias can enter an AI system is what we call interpretation bias. You have to see it as if an AI system is trying to tell us something and we are misinterpreting it completely. So the AI system is telling us something, we are not listening, and we're just taking the results and making our own truth out of them. And that's, of course, also a very dangerous type of bias.

Now, in summary, I hope that you realize that it's actually the human in the loop that is the reason why AI systems are biased. And for me, sometimes it's a little bit surprising and a little bit disappointing, because we're always pointing the finger at AI systems. We're always blaming the AI systems for being biased, but actually it's just because of us, because of humans.

04:47 - Desiree Timmermans
So, humans are the bottleneck, because we create the data, we create the algorithm, and we make decisions that are biased based on the input that we fed the algorithm.

05:00 - Veronique Van Vlasselaer
Yeah, we have to use AI systems as a mirror for ourselves, for the biased way that we are designed.

05:07 - Desiree Timmermans
I understand. And now we are talking about all this bias. Veronique, do you have an interesting example that you can share with the listeners?

05:15 - Veronique Van Vlasselaer
I would like to point everybody to the International AI Incident Database. I don't know if it's a well-known database, but it's definitely a very interesting source: a database that collects all kinds of AI incidents, incidents that happened with AI, and summarizes them for us.

Before we think: oh, AI incidents, that's something that doesn't happen here. Well, that's not true. One of the more prevalent cases in the AI Incident Database actually didn't happen in Belgium, but in one of our neighboring countries, the Netherlands. And if you ask people from the Netherlands, they will definitely know what we're talking about. It's the childcare benefits scandal.

What happened? The Dutch tax authority used an AI system to decide whether you should be inspected, whether you are suspected of fraud, because you received a social benefit, a social allowance. It turned out that thousands of families were falsely accused of fraud, falsely accused of wrongly receiving childcare allowances. The worst thing is that most of those families came from very specific minority groups in the Netherlands: minority groups that have a dual nationality. And that is, of course, pure discrimination by an AI system.

And then I always start wondering: why? How is it possible that an AI system, which is actually a purely mathematical, neutral statistical system, discriminates? And again, if we look back at the three pillars of bias, at where bias can enter, this example is the best example of data bias. The reason why this AI system is biased is because it learned its logic from data: data coming from past inspections executed by human investigators, human inspectors, who focused on one specific minority group. Which minority group? Of course, the minority group with a dual nationality. And of course, if you focus your inspections on that minority group, you will find fraud there. But if you don't focus your inspections on the other part of society, you will not find any fraud there. What the AI system did is basically just copy our own human behavior, and yeah, it amplified it. And now everybody is pointing at the AI system, saying we cannot use the AI system. But it's not the AI system; it's us humans that made this AI system discriminatory.
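To make the mechanism Veronique describes concrete, here is a small, purely hypothetical simulation (not the actual Dutch system or its data): both groups have exactly the same true fraud rate, but because past inspections targeted one group, a model trained on the detected cases learns to score that group as much riskier.

```python
# Hypothetical simulation (not the real Dutch system): how inspection data that
# targeted one group can, on its own, make a model flag that group, even when
# the true fraud rate is identical in both groups.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20_000
dual_nationality = rng.integers(0, 2, n)     # sensitive attribute, 0 or 1
fraud = rng.random(n) < 0.05                 # same 5% true fraud rate in both groups

# Past human inspections focused almost exclusively on the dual-nationality group,
# and fraud only becomes a label when someone was actually inspected and caught.
inspected = rng.random(n) < np.where(dual_nationality == 1, 0.50, 0.02)
detected_fraud = fraud & inspected

model = LogisticRegression().fit(dual_nationality.reshape(-1, 1), detected_fraud)
risk = model.predict_proba(dual_nationality.reshape(-1, 1))[:, 1]
print("mean risk score, dual nationality:", round(risk[dual_nationality == 1].mean(), 3))
print("mean risk score, others:          ", round(risk[dual_nationality == 0].mean(), 3))
# Prints roughly 0.025 vs 0.001: the model has simply copied the inspection bias.
```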

07:55 - Desiree Timmermans
I can understand, but how is it possible that the government in the Netherlands did not detect this?

08:04 - Veronique Van Vlasselaer
The problem is that AI is actually already a very old technology: AI was already launched around 1950. But it's only since a couple of years ago that we are experiencing the first real applications of AI. And we have to be very realistic about it: we're still in our infancy, and we still have to learn. Over the last 10 years we focused on the development of AI, so the technology has advanced enormously. There are many opportunities at the moment, many applications of AI, but we focused mainly on the development of those AI systems. Now we start to realize that the AI systems we developed can still be improved a lot. We have to start learning about what went wrong and make sure that we do not make the same mistakes in the future.

We are talking today about bias, but there are so many other flavors of where AI can go wrong, where we don't create trustworthy AI. I think about data privacy, for example, or the transparency of your AI system. So there are many other flavors of where it can go wrong. And I think it's important for every person that develops, helps to create, or uses AI systems to be aware of what can go wrong.

09:25 - Desiree Timmermans
I agree with you. And we were talking about this case at the Dutch government, about child care. So what did the Dutch government do once they detected this?

09:37 - Veronique Van Vlasselaer
It was of course a very big scandal, and they had to take some precautions. They made sure that the AI system, first of all, was temporarily taken offline, so that those decisions were no longer based on the AI system. And then, of course, they worked a lot on how to avoid those problems in the future. One thing that the Dutch government did, or is doing, is building an algorithm register, where they describe the different aspects of their AI systems. They also mention how they test their models: how they make sure that the discrimination that happened in the childcare benefits scandal will not occur again in the future.

10:20 - Desiree Timmermans
Okay. And this is a public register?

10:23 - Veronique Van Vlasselaer
Yeah. This algorithm register is the only one that already exists in Europe. It's actually a nationwide initiative in the Netherlands. And it requires public institutions, like cities, to report and publish all the algorithms, all the AI systems, that they use to support their decision-making process, or where decisions rely solely on AI systems.

Every citizen has that transparency and knows which decisions that impact them are based on AI systems. I think it's a good cause, because in that case you as a citizen know: it's not only a human behind the scenes making a decision about me, it's also an AI system. And most of the algorithms in that register are of course not making the decision on their own; it's always a combination of human and machine, of the human and the AI system.

11:17 - Desiree Timmermans
So it's a really good thing that they developed this. Is that a kind of best practice that could be adopted by more countries?

11:27 - Veronique Van Vlasselaer
How it is set up for the moment is that every public institution, every city has its own algorithm register.

I'm a little bit critical about that, because there is no consistency yet across the whole nation. But I think it's a very good idea, and we should definitely think about it. Maybe it's an idea that Europe has to support: that there is some standardization, some consistency, in how this is reported, and also in what kind of elements have to be described about the algorithm. What information do you need to provide?

An example from the city of Amsterdam is an algorithm that helps to prioritize which holiday homes should be inspected first because there might be some illegal activity. What is reported in the algorithm register? Well, first of all, they describe what the algorithm exactly is and why it is there. So it also motivates why an AI system was put in place. Then it describes which data it is built on: which data is used. The data itself is, of course, not made publicly available, but you know which data is used and which elements in the data the AI system bases its decisions on. And then, and this has, I think, everything to do with the childcare benefits scandal, there is a separate subsection for each algorithm that describes which metrics, which measures, they have taken to make sure that the algorithm is not discriminatory. And that's, of course, a very important section.

13:01 - Desiree Timmermans
Do you know a bit more about that section?

13:03 - Veronique Van Vlasselaer
So why is that algorithm for illegal holiday rentals not biased? Well, because they have tested it on two elements: the origin of the owner of the property, and the zip code. And then I'm thinking: okay, wonderful that you have checked it against the background, so the origin and the zip code. But that's not all the potential bias that can slip into the AI system. We actually have to go much further. There are so many factors that might introduce bias into AI systems. Personally, I would test whether the algorithm makes a distinction between men and women. That would be my first guess. But other people belong to other groups, and they will test it for their group. And this is, I think, something that we have to put in place. We have to build teams that have different views on reality, that look at the AI algorithm from different angles. Only in that case can we improve the algorithm.

14:06 - Desiree Timmermans
I fully agree with you. And we already talked about conscious bias, but each developer, each of us has also unconscious bias. And as you said, we can mitigate it by creating more diverse teams. Are there other things that we can do?

14:23 - Veronique Van Vlasselaer
So during the development of AI models, you mean AI systems?

14:27 - Desiree Timmermans
Let's start with that, yes.

14:29 - Veronique Van Vlasselaer
So one big issue is that AI developers are trained, and they often don't know it, but they are trained to put bias into AI systems. Because an AI system learns its logic by finding general patterns in data. And actually, as an AI developer, you are trained to find general patterns. That means that we don't want to develop an AI system that says: ah, if your name is Veronique, then you're definitely not a fraudster. Or: if your name is Veronique, we should offer you the highest discount code ever. That's something that we don't want. We don't want AI systems to focus on our first name, or on first names in general. And in the AI world, we call that the problem of overfitting.

What we do want is that AI systems find general patterns. For example: this is the profile, this is the behavior, of Veronique, and she fits this specific profile. And if we look in our historical data, it turns out that this profile is typically not associated with fraudsters. Or: this is the profile of Veronique, this is her behavior, she fits this specific profile, and it turns out that offering a very high discount to that specific profile works best. This is what we want from an AI system, and that's what we call generalization. And I hope that you already see where I'm going. If we generalize too much, we will ignore, we will neglect, specific patterns. And it's especially people with very unusual tastes, unusual behavior, unusual characteristics who are mistreated by those AI systems. And it's because of us, because of developers, because we can play a little bit with this generalization level. If we overgeneralize, we will ignore specific people, minority people. That's of course something that we do unintentionally, but we do it.
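A rough illustration of this generalization trade-off, on synthetic, hypothetical data (not an example from the episode): a heavily constrained decision tree learns only the majority pattern and gets a small minority group almost entirely wrong, while a slightly less constrained tree still picks up the minority pattern.

```python
# Illustrative sketch (hypothetical data): how the level of generalization
# chosen by the developer can erase a pattern that only holds for a minority group.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# 10,000 "majority" cases where feature x predicts the outcome directly,
# and 200 "minority" cases where the same feature means the opposite.
n_major, n_minor = 10_000, 200
x = np.concatenate([rng.integers(0, 2, n_major), rng.integers(0, 2, n_minor)])
group = np.concatenate([np.zeros(n_major), np.ones(n_minor)])  # 1 = minority group
y = np.where(group == 0, x, 1 - x)                             # opposite rule for the minority
X = np.column_stack([x, group])

# A strongly "generalizing" tree (a single split) learns only the majority rule ...
coarse = DecisionTreeClassifier(max_depth=1).fit(X, y)
# ... while a slightly deeper tree can still pick up the minority pattern.
fine = DecisionTreeClassifier(max_depth=3).fit(X, y)

for name, model in [("coarse", coarse), ("fine", fine)]:
    pred = model.predict(X)
    acc_minority = (pred[group == 1] == y[group == 1]).mean()
    print(f"{name} tree, accuracy on the minority group: {acc_minority:.2f}")
```

The overall accuracy of both trees is nearly identical; the damage only becomes visible when you look at the minority group separately, which is exactly why diverse teams and explicit fairness checks matter.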

16:30 - Desiree Timmermans
So that's also something that needs to be included more and more in the education of developers. It's not only the general patterns: you also have to look at some specific things, because otherwise there will be people who are put at a disadvantage.

16:48 - Veronique Van Vlasselaer
Exactly, but it's a very difficult balance to strike. It's very difficult to make a choice between over- and under-generalization. Now, the good thing is that a couple of years ago there was a big call from industry practitioners, AI developers, who said: you know what, we know that we are creating unfair AI systems; we know that we are doing that, but it's not our intention; we don't want to develop unfair AI systems; so please help us by finding some techniques, by inventing some techniques, that help us identify whether our AI systems are unfair and how unfair they are.

And I have to say that over the last couple of years an enormous amount of research has been done on finding techniques, measures, and metrics that help to assess the fairness of AI systems. Many software vendors, like SAS, already have a whole range of such capabilities in their software that can just be used. And my advice is: use those techniques. As an AI developer, use those techniques, because they will only help you better assess how fair or how unfair your AI system is.
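As a minimal sketch of the kind of group-fairness checks such toolkits expose, here are two common metrics computed in plain NumPy; the function names, variable names, and toy data are hypothetical and not any particular vendor's API.

```python
# Minimal sketch of two common group-fairness checks, in plain NumPy.
# Hypothetical inputs: y_pred are model decisions (1 = flagged), y_true the actual
# outcomes, and group a sensitive attribute with two values (0/1).
import numpy as np

def demographic_parity_difference(y_pred, group):
    """How much more often one group receives a positive decision than the other."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates: are genuinely positive cases in both
    groups flagged equally often?"""
    tpr_0 = y_pred[(group == 0) & (y_true == 1)].mean()
    tpr_1 = y_pred[(group == 1) & (y_true == 1)].mean()
    return abs(tpr_0 - tpr_1)

# Toy example: values close to 0 suggest equal treatment on these two metrics.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_difference(y_pred, group))         # 0.5
print(equal_opportunity_difference(y_true, y_pred, group))  # 0.5
```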

What I really like in this world today, where we create not only accurate AI systems but also responsible AI systems, is that we start to see some legislation around it. There is a very good source from the European Commission: the Assessment List for Trustworthy Artificial Intelligence (ALTAI). It's a list with more than 100 questions, and one of the categories is about fairness and bias in AI. It's actually a very good guideline for being very critical about your AI system. It really helps to evaluate and assess whether your AI system is fair or biased, what actions you need to take, or what actions you didn't take. So I think from a European perspective we can be very proud that something like this exists and can really help us in creating not only accurate AI systems, but also responsible AI systems.

18:59 - Desiree Timmermans
And what really helped you at the beginning? Which resource helped you a lot?

19:06 - Veronique Van Vlasselaer
The resource that helps me a lot, and makes it a little bit more fun, is that AI Incident Database. Go have a look at the AI Incident Database. It's so interesting to read about AI incidents, and it really helps you gain some insight into what's happening behind the scenes.

19:25 - Desiree Timmermans
I understand. So when you understand why something went wrong, you can also focus on mitigating that and making sure that it goes well.

19:36 - Veronique Van Vlasselaer
Exactly.

19:37 - Desiree Timmermans
I think that's really good advice. So I would say, Veronique, thank you very much for sharing your expertise. It was really interesting. So, thank you very much.

19:48 - Veronique Van Vlasselaer
Alright, thank you.

Outro 
Thanks for listening to the Women in Big Data Podcast. For more information and episodes, subscribe to the show or contact us via: datawomen@protonmail.com 

Tune in next time! 

Intro
The Bias in AI Definition and the Different Types of Bias in the AI Lifecycle: Data Bias, Algorithmic Bias, and Decision Bias
The AI Incident 'Child Care Benefits' at the Dutch Tax Authority
The Algorithm Register of the Dutch Government
AI Developer Training
The Assessment List for Trustworthy Artificial Intelligence (ALTAI) for Self-Assessment
Outro