Mitigating Gender Bias in Occupation Classification

With increasing evidence that some outputs of Natural Language Processing (NLP)-based machine learning models may propagate societal biases, we investigate mitigating gender bias in the occupation classification of job biographies. Through the novel use of a graph convolutional network, we find that gender bias in occupation predictions is reduced by removing explicit gender indicators such as “she”. However, significant gaps remain, emphasising the need for further study into de-biasing this NLP task. A demo of our trained model and analysis may be found on GitHub.


Related work investigates reducing gender bias in NLP tasks by adjusting algorithms, de-biasing word embeddings and de-biasing training corpora. We focus on de-biasing training corpora. De-Arteaga et al. (2019)¹ find significant gender gaps in occupation classification with bag-of-words, fastText word embeddings and deep recurrent neural networks, and show that these gaps are reduced by scrubbing gender indicators. We extend this study with the novel use of the text graph convolutional network (TextGCN) of Yao et al. (2018)², which explicitly uses information about the global structure of the training corpus and learns word and document embeddings jointly, thus capturing more latent information.

Our Data

In our experiments, we consider a subset of the BiosBias³ dataset by De-Arteaga et al. (2019), which consists of biographies and corresponding occupation labels taken from the first sentence of each biography. The subset comprises 97,798 biographies with 28 different occupation labels. Because the label is taken from the first sentence, that sentence is excluded from the text used in the classification task. The most frequent occupation is professor (35,136 biographies) and the least frequent is personal trainer (332 biographies). The proportion of female biographies varies from 12.3% (rapper) to 93.5% (dietitian).

Distribution of biographies by occupation and gender

TextGCN Model

The TextGCN takes a single heterogeneous graph as input, with nodes corresponding to words and biographies. The number of nodes in the graph therefore equals the number of biographies plus the number of words in the vocabulary. A word-document edge is built if the word occurs in the document and is weighted by TF-IDF (term frequency-inverse document frequency). Similarly, a word-word edge is built if the word pair is semantically similar, as measured by a positive point-wise mutual information (PMI) value. Our best-performing TextGCN architecture consists of a single hidden layer with 200 units. ReLU activations are used between the graph convolutions, and a softmax function serves as the final classifier. The objective function is the cross-entropy loss over all labelled biographies.
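The graph construction described above can be sketched in a few lines of Python. This is a minimal illustration and not our actual pipeline: the function name `build_text_graph`, the whitespace tokeniser, the smoothed IDF formula, the sliding-window size and the self-loops of weight 1 are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window_size=20):
    """Build a TextGCN-style graph from raw text (illustrative sketch).

    Nodes 0..len(docs)-1 are documents; the remaining nodes are words.
    Returns (word_node, edges), where edges maps (node_i, node_j) -> weight.
    """
    tokenised = [doc.lower().split() for doc in docs]  # naive tokeniser (assumption)
    vocab = sorted({word for tokens in tokenised for word in tokens})
    n_docs = len(docs)
    word_node = {word: n_docs + i for i, word in enumerate(vocab)}

    # Every node gets a self-loop of weight 1 (assumption).
    edges = {(i, i): 1.0 for i in range(n_docs + len(vocab))}

    # Word-document edges, weighted by TF-IDF.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    for doc_node, tokens in enumerate(tokenised):
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed IDF so that very common words keep a non-zero weight (assumption).
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
            edges[(doc_node, word_node[word])] = tf * idf
            edges[(word_node[word], doc_node)] = tf * idf

    # Word-word edges: PMI estimated from co-occurrence in sliding windows.
    windows = []
    for tokens in tokenised:
        if tokens:
            last_start = max(len(tokens) - window_size, 0)
            windows.extend(tokens[s:s + window_size] for s in range(last_start + 1))

    word_count, pair_count = Counter(), Counter()
    for window in windows:
        unique = sorted(set(window))
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    for (a, b), n_ab in pair_count.items():
        pmi = math.log(n_ab * len(windows) / (word_count[a] * word_count[b]))
        if pmi > 0:  # only positive PMI produces an edge
            edges[(word_node[a], word_node[b])] = pmi
            edges[(word_node[b], word_node[a])] = pmi

    return word_node, edges


word_node, edges = build_text_graph(
    ["deep learning", "deep learning", "graph theory"], window_size=2
)
```

In this toy corpus the graph has three document nodes and four word nodes, which matches the node count stated above (biographies plus vocabulary size).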

TextGCN architecture
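To make the architecture concrete, the forward pass of a two-layer graph convolutional network can be written as Z = softmax(Â ReLU(Â X W0) W1), where Â is the normalised adjacency matrix. The dependency-free sketch below follows that formula under some assumptions: symmetric normalisation of the adjacency matrix, one-hot (identity) node features, and toy weights with two hidden units in place of the 200 used in our model. The helper names are hypothetical, and a real implementation would use a tensor library with sparse matrices.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalise_adjacency(adj):
    """Symmetric normalisation D^-1/2 A D^-1/2 (assumed normalisation scheme)."""
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    n = len(adj)
    return [[inv_sqrt_deg[i] * adj[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def gcn_forward(adj, features, w0, w1):
    """Two graph convolutions with a ReLU in between and a softmax on top."""
    a_hat = normalise_adjacency(adj)
    hidden = [[max(0.0, v) for v in row]
              for row in matmul(matmul(a_hat, features), w0)]
    logits = matmul(matmul(a_hat, hidden), w1)
    return [softmax(row) for row in logits]


def cross_entropy(probs, labels):
    """Mean negative log-likelihood over labelled nodes (labels: node -> class)."""
    return -sum(math.log(probs[node][cls]) for node, cls in labels.items()) / len(labels)


# Toy graph with three nodes, two hidden units and two classes.
adj = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
features = [[float(i == j) for j in range(3)] for i in range(3)]  # one-hot features
w0 = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
w1 = [[0.7, -0.8], [0.9, 0.1]]
probs = gcn_forward(adj, features, w0, w1)
loss = cross_entropy(probs, {0: 1, 1: 0})  # only nodes 0 and 1 are labelled
```

Only labelled nodes contribute to the loss, which mirrors the objective described above: word nodes carry no occupation label, so the cross-entropy is taken over biography nodes alone.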


We define the gender gap for an occupation as the difference between genders in the probability that an individual’s occupation is correctly predicted, given their occupation and gender. For example, the gender gap for model is the probability that our trained classifier predicts model for a female model minus the probability that it predicts model for a male model. For some occupations, such as journalist, we observe a gender gap close to zero, whereas for others, such as surgeon, the gap is far from zero. In general, for occupations in which women are underrepresented the gender gap tends to be negative, and vice versa, implying a positive correlation between the gender gap and the gender imbalance. The correlation coefficient is 0.73.

Gender gap against probability of being female by occupation
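The gender gap defined above is a difference in per-occupation true positive rates, and the reported coefficient is a correlation between that gap and the share of women in each occupation. The sketch below shows one way to compute both. The record layout (true occupation, gender, predicted occupation), the "F"/"M" labels, the function names and the use of Pearson correlation are assumptions for the example, and the toy records are made up rather than taken from our data.

```python
import math
from collections import defaultdict


def gender_gaps(records):
    """Return {occupation: TPR_female - TPR_male}.

    records: iterable of (true_occupation, gender, predicted_occupation),
    with gender given as "F" or "M" (assumed encoding).
    """
    total = defaultdict(lambda: {"F": 0, "M": 0})
    correct = defaultdict(lambda: {"F": 0, "M": 0})
    for occupation, gender, predicted in records:
        total[occupation][gender] += 1
        correct[occupation][gender] += int(predicted == occupation)

    gaps = {}
    for occupation, counts in total.items():
        if counts["F"] and counts["M"]:  # skip occupations missing one gender
            tpr_f = correct[occupation]["F"] / counts["F"]
            tpr_m = correct[occupation]["M"] / counts["M"]
            gaps[occupation] = tpr_f - tpr_m
    return gaps


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up toy predictions, for illustration only.
records = [
    ("surgeon", "F", "surgeon"), ("surgeon", "F", "nurse"),
    ("surgeon", "M", "surgeon"), ("surgeon", "M", "surgeon"),
    ("journalist", "F", "journalist"), ("journalist", "M", "journalist"),
]
gaps = gender_gaps(records)
```

With per-occupation gaps and the proportion of female biographies in hand, `pearson` applied to the two lists gives a coefficient of the kind reported above.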

Removing explicit gender indicators shrinks the gender gaps, as shown in the visualisation below, where the line of best fit has a shallower gradient. The correlation coefficient also decreases, from 0.73 to 0.69. However, gender gaps remain large for specific occupations with more pronounced gender imbalances. Therefore, although removing gender indicators reduces the gender bias in our predictions, more work is needed to de-bias this task completely.

Comparison of gender gaps with and without gender indicators
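For readers who want to try the scrubbing step themselves, the sketch below removes explicit gender indicators with a regular expression. The word list is a small illustrative assumption and not the exact list used in our experiments or by De-Arteaga et al.; the function name `scrub_gender_indicators` is likewise hypothetical.

```python
import re

# Illustrative lists only (assumption); extend them for real use.
TITLES = ["mr", "mrs", "ms"]
PRONOUNS = ["he", "she", "him", "her", "his", "hers", "himself", "herself"]

# Titles may be followed by a full stop ("Mr."); pronouns are matched as whole words.
_INDICATORS = re.compile(
    r"\b(?:" + "|".join(TITLES) + r")\b\.?|\b(?:" + "|".join(PRONOUNS) + r")\b",
    flags=re.IGNORECASE,
)


def scrub_gender_indicators(text):
    """Remove explicit gender indicators and tidy the leftover whitespace."""
    scrubbed = _INDICATORS.sub(" ", text)
    scrubbed = re.sub(r"\s+", " ", scrubbed)             # collapse runs of spaces
    scrubbed = re.sub(r"\s+([.,;:!?])", r"\1", scrubbed)  # no space before punctuation
    return scrubbed.strip()


example = scrub_gender_indicators("She is a surgeon and her patients trust her.")
```

Whole-word matching matters here: a naive substring replacement would also strip the "he" inside words such as "The" or "chef". Note that this only removes explicit indicators; first names and other proxies for gender remain in the text, which is consistent with the gaps that persist after scrubbing.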


Recent years have seen a large increase in the adoption of NLP-based machine learning methods for a growing number of tasks, such as automated decision making, recommendation and reading comprehension. As these systems become ubiquitous in our everyday lives, they move from being merely passive systems to having a more active effect on society, influencing which media articles appear on a feed or which job adverts one sees. Within recruitment, we have demonstrated that occupation classifiers may make predictions with gender bias. Removing explicit gender indicators only goes so far in mitigating this bias. We hope that our article underlines the need for further investigation into de-biasing NLP tasks. With a focus on de-biasing occupation classification, we will next perform a comparative analysis of predictions with BERT, ALBERT and RoBERTa.


[1] Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Kalai. (2019). “Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting.” 120–128.

[2] Liang Yao, Chengsheng Mao, and Yuan Luo. (2018). “Graph convolutional networks for text classification.”

[3] BiosBias dataset, De-Arteaga et al. (2019),
