
SPECIAL INPUT: Katy Gaffney

Efficient Innovation or Diversity Disaster? A Critical Examination of ATS Software in Contemporary Recruitment Processes

When it comes to recruiting, companies are increasingly leaving applicant selection to AI-powered applicant tracking systems. Katy Gaffney explains how these systems work and the dangers they pose. A Special Input on algorithm-based discrimination in hiring practices and the need for legal regulation.  

AI and Sustainability

In 2023, Generative Artificial Intelligence (GAI) broke into the mainstream. It’s often referenced in social and entertainment media. We’re having conversations about it at parties and in the workplace, and everywhere you look are advertisements for all kinds of seemingly time-saving efficiency tools “powered by AI”. It’s permeated the zeitgeist so deeply that in many ways, it feels like a completely new invention. This both is and is not the case. Scientists witnessed their first massive breakthrough in Natural Language Processing in 2010, and have been working on Large Language Model development since 2017, but it has only been since 2022 that the average user can directly experience the fruit of these labours (via Conversational Large Language Foundational Models such as ChatGPT).1 In the context of human technological advancement, twelve years is a breathtakingly short amount of time in which to produce a technology of this magnitude; therefore, it stands to reason that there may be some flies in the ointment. 

As it stands, flaws in the development and implementation of these technologies could be doing more harm than good. Long-held biases can easily be transmitted to machines we assume to be impartial, and the current lack of regulation around GAI can facilitate such ‘in-built’ discrimination while also creating an opaque usage landscape in which those who engage with such tools are often unaware of it. One worrying way in which this manifests is through the widespread adoption of AI-powered Applicant Tracking Systems in modern recruitment processes.

What is ATS Software and why is it problematic?

ATS have been in widespread use since the early 2000s, with Natural Language Processing (NLP) elements introduced over the past two years. It’s now common practice in recruitment to employ one of the many ATS available on the market or to develop a version of one’s own, most notably within the tech industry and among larger corporations. According to Jobscan’s 2023 ATS Usage Report, these systems are currently used by an impressive 487 of the Fortune 500 companies.2 So how do these systems operate? In a nutshell, modern ATS compile a database of resumés and highlight those considered suitable or worthy of further review based on keywords stipulated by the recruiter.3 NLP has streamlined this further by allowing the recruiter to automate the process: the job description is scanned as an ideal example, after which the software parses a selection of applications and evaluates them against that ideal. Each application is scored on its similarity to the job description, and those with the highest scores are collected for further review by the human recruiter.4 With the help of Machine Learning technology, it’s now also possible for developers to pre-load an ATS tool with examples of the ideal candidate for a vast number of roles. In theory, this seems like a perfectly logical and supremely useful technology, perhaps even one that removes human bias at the initial screening stage. What could possibly go wrong?
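
To make the scoring step concrete, here is a minimal illustrative sketch of an ATS-style ranker. It assumes a simple bag-of-words cosine-similarity comparison rather than any particular vendor’s implementation, and all names and data are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

def tokens(text: str) -> Counter:
    """Lower-case bag of words; real parsers are far more elaborate."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_applicants(job_description: str, resumes: dict[str, str]) -> list[tuple[str, float]]:
    """Score every resumé against the job description and rank the highest first."""
    ideal = tokens(job_description)
    scored = [(name, cosine_similarity(ideal, tokens(text))) for name, text in resumes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical job description and applicant pool.
job_description = "Data analyst with strong Excel and SQL reporting skills"
resumes = {
    "applicant_a": "Built Excel dashboards and SQL reports for the finance team",
    "applicant_b": "Experienced florist and wedding decorator",
}
print(rank_applicants(job_description, resumes))  # applicant_a ranks first
```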

Let’s begin with the lack of transparency around the usage of these tools. While they are a standard part of the recruitment process, this isn’t yet common knowledge among applicants. Tech professionals are naturally expected to be familiar with the uses and capabilities of AI and to keep up to date with its progress, but ATS are rarely discussed outside recruitment or tech-centric professions. In addition, it isn’t obligatory to indicate in a job advertisement that one of these tools is being used by the recruiting team, meaning that jobseekers can be evaluated by complex data-parsing tools without their knowledge. Considering the recent proliferation and availability of such systems, as well as their possible influence on the outcome of hiring processes, this lack of transparency is alarming.

Assuming candidates are aware that ATS software is being used to evaluate their information, it’s still unclear to what extent. Is it simply to highlight words and skills mentioned in the role requirements? Is it also recognising certain phrasal patterns or figures of speech? How specific is it? If a word has been used in an unexpected context or is perhaps a synonym for one they are looking for, is it still highlighted? 

To answer these questions, let’s look again at how these systems operate. Job descriptions are fed into ATS software to provide a list of keywords and skills that can be used as a scoring framework for applications (e.g. if the job description includes the word “Excel”, the system may give one point to each application that includes that word, and so on). The language used in this description dictates what language will be deemed ‘correct’ by the system, which means that word specificity is extremely important. Synonyms are not taken into account, and even tiny differences in phrasing can render a key skill invisible in the eyes of the machine. Not only this, but the formatting of a resumé can also throw off the parsing system completely. Every document processed by ATS is converted to plain text, meaning uncommon fonts, mildly creative designs, and even the use of colour can make a document completely unreadable.5 In this way, widespread use of ATS software actively discourages creativity and individuality in both written and visual communication, automatically filtering out perfectly adequate or even outstanding applicants due to arbitrary factors such as formatting and style. In a working world that claims to champion diversity and ‘outside-the-box thinking’, this is already reason enough to question the functionality of such software.
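
A toy keyword scorer, again purely hypothetical, illustrates how brittle this exact-match logic is: two applicants describing the same skills receive very different scores purely because of word choice.

```python
import re

# Hypothetical keyword list derived from a job description.
REQUIRED_KEYWORDS = {"excel", "sql", "reporting", "stakeholder"}

def keyword_score(resume_text: str, keywords: set[str]) -> int:
    """One point per required keyword that appears verbatim in the resumé."""
    words = set(re.findall(r"[a-z]+", resume_text.lower()))
    return sum(1 for kw in keywords if kw in words)

resume_a = "Advanced Excel and SQL reporting for stakeholder presentations"
resume_b = "Advanced spreadsheet and database reports for client presentations"

print(keyword_score(resume_a, REQUIRED_KEYWORDS))  # 4: every keyword matched verbatim
print(keyword_score(resume_b, REQUIRED_KEYWORDS))  # 0: equivalent skills, different words
```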

However, the true cost of this automation is far deeper and more systemic than simple issues of stylistic recognition and transparency. One chilling example is a case from 2018, in which Amazon was widely criticised for its use of an internally developed ATS model proven to actively perpetuate a gender bias against female applicants.6

To better understand how this happened, let’s break down what Machine Learning (ML) models are, how they’re developed, and by whom. In essence, these models are a combination of sophisticated algorithms that have been trained to recognise patterns. These patterns are then used to predict future outcomes based on what has happened in the past. For these patterns and predictions to be accurate, vast amounts of data must be fed into the algorithms. What’s key to remember here is that the data being used for training ultimately determines the functionality (and biases) of the model.7 

When we consider the drastically skewed gender demographics in the Science, Technology, Engineering and Mathematics (STEM) fields that produce these models, it becomes clear why gender bias is such a pervasive issue. In an industry that has historically been male-dominated, the vast majority of available data will have been written, researched and/or recorded by men. Moreover, those responsible for using this data to develop and train AI models are also overwhelmingly likely to be male. According to Zippia, women make up just 20.4 percent of data scientists in the United States.8

To summarise: if a model is being trained by male scientists using primarily male data or subjects, then this model has been optimised for male input. Josh Feast of the Harvard Business Review describes how this problem manifested itself in sensory assistance technology: ‘…text-to-speech technology (e.g., Stephen Hawking’s voice) and automatic speech recognition … (e.g., closed captioning) – performed poorly for female speakers as compared to males. This is attributed to the fact that the way the speech was analyzed and modeled was more accurate for taller speakers with longer vocal cords and lower-pitched voices. As a result, speech technology was most accurate for speakers with these characteristics – which are typically males – and far less accurate for those with higher pitched voices – which are typically female.’9

In the Amazon case, the ATS model had been ‘trained to vet applicants by observing patterns in resumés submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry.’ As a result of this training technique, ‘Amazon’s system taught itself that male candidates were preferable’, filtering out or downgrading applications that contained words associated with female candidates, such as ‘women’s’ (e.g., ‘Women’s chess club captain’). It even went so far as to ‘downgrade graduates of two all-women’s colleges’.10
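
The mechanism can be illustrated with a deliberately simplified, hypothetical sketch (not a reconstruction of Amazon’s actual model): a scorer that learns per-token weights from historical hiring outcomes will end up penalising any token, such as ‘women’s’, that rarely appears among past hires.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def learn_token_weights(history: list[tuple[str, bool]]) -> dict[str, float]:
    """Weight each token by how often it appeared in resumés that led to a hire.
    If past hires were overwhelmingly male, tokens typical of their resumés get
    high weights while tokens like "women's" get low ones: the bias is learned."""
    seen, hired = defaultdict(int), defaultdict(int)
    for text, was_hired in history:
        for tok in tokenize(text):
            seen[tok] += 1
            hired[tok] += was_hired
    return {tok: hired[tok] / seen[tok] for tok in seen}

def score(resume: str, weights: dict[str, float]) -> float:
    """Average learned weight of a resumé's tokens; unseen tokens get a neutral 0.5."""
    toks = tokenize(resume)
    return sum(weights.get(t, 0.5) for t in toks) / max(len(toks), 1)

# Hypothetical ten-year history dominated by male applicants and male hires.
history = [
    ("captain of chess club, python developer", True),
    ("rugby team captain, java engineer", True),
    ("women's chess club captain, python developer", False),
    ("hiking club member, python developer", True),
]
weights = learn_token_weights(history)
print(score("women's coding society lead, python developer", weights))  # lower score
print(score("chess club captain, python developer", weights))           # higher score
```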

This is strikingly similar to issues with other engineered designs throughout history, most notably the crash-test dummies used to test car safety systems. According to a Stanford University study, crash-test dummies engineered to resemble and mimic the responses of ‘the average male body’ are most commonly used for testing, while a ‘dummy that models female-typical injury tolerance, biomechanics, spinal alignment, neck strength, muscle and ligament strength, dynamic responses to trauma, and other female-typical characteristics’ has yet to be developed.11 This has left women at far greater risk of severe injury than men in the event of automobile collisions. Similarly, in film photography, dark-skinned subjects have routinely been rendered less visible, blotchy or inaccurately represented, because cameras and film were designed by and optimised for people with lighter skin tones. More recently, this issue has carried over into digital imaging technology, where darker faces are often unrecognisable to facial recognition software.12 What can be seen in these examples is a reflection of the inventors themselves in their creations. When engineers and developers overwhelmingly represent just one section of society, the experiences of wider society are often disregarded, making it abundantly clear that the importance of diversity in STEM fields cannot be overstated.

Outlook and possible solutions

In what ways can this issue be tackled? One long-term solution could lie in altering the demographics of those behind the engineering and training of AI models, through intentional diversification of the Data Science and Machine Learning fields. This would require considerable funding to facilitate access programmes aimed at cisgender women, transgender and non-binary people, people of colour, and other minority groups. It would also necessitate widespread awareness programmes to educate data scientists, engineers and developers about the need to ensure that training data is diverse enough for natural language models to familiarise themselves with patterns of speech that do not fit the hegemonic standard.

While the above acts on root causes, the symptoms of the problem could be greatly reduced in the medium term by systems of regulation and standardisation. This would involve creating independent regulatory bodies to develop a set of standards for the training protocol of Machine Learning models, based on diverse training data and rigorous testing for bias. Indeed, the European Union is currently working on an ‘Artificial Intelligence Act’ that would attempt to regulate some aspects of how these technologies are being used. The Act stipulates that “Generative AI, like ChatGPT, would have to comply with transparency requirements” such as “disclosing that the content was generated by AI” and “publishing summaries of copyrighted data used for training”.13 Such transparency would make it far easier to identify and correct biases should they arise. 

These regulations would make it a legal requirement for employers to disclose details of ATS usage in job descriptions, allowing candidates to make informed decisions about how to structure their applications. They would also require applicants to disclose their use of ChatGPT and other AI-powered language tools in writing their applications, itself now an extremely common practice. The issue of opacity would be largely resolved if these regulations make it through the European Parliament unscathed.

This, however, is not a foregone conclusion. Transparency regulations inevitably raise questions around data protection, a concept that is deeply enshrined in European policy. Furthermore, the internet is a borderless entity, and there are questions as to what extent European regulatory frameworks can dictate the behaviour of non-European companies. It would be hard to imagine the United States, home base and tax residence of so many global tech industry leaders, permitting any kind of regulation that could have a real-world impact on the activity of this industry. Due to these complexities and many others, it may take several years for such a framework to be implemented. 

All the more pressing, then, is the need for immediate solutions. On a purely practical level, the issue can be boiled down to one of language: the language used by applicants to describe their skills and experience, the language used by employers and recruiters to define roles and ideal candidates, and the language in the training data used to educate ATS models. The goal must be to allow for flexibility and diversity in language patterns, which in turn would diversify the demographic of highly-scoring applicants selected by these systems.

For recruitment teams to achieve this, training examples (in this case, job descriptions) would have to be run through a natural language processor to produce a reasonable number of synonymous examples using a variety of words and speech patterns before being fed into the ATS. Tools such as Gender Decoder, which checks texts for linguistic gender coding, could also be used to neutralise the inequality. However, this would add a manual step to an otherwise fully automated process and, aside from being simply impractical, incorrectly places the burden of responsibility on the ATS user to mitigate the failings of the software.
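
Such a mitigation step might look roughly like the sketch below, which assumes a hand-maintained synonym table; in practice the accepted variants could instead come from a thesaurus or a word-embedding model.

```python
# Hypothetical synonym table; in a real pipeline this could be generated
# by a natural language processor rather than maintained by hand.
SYNONYMS = {
    "excel": {"excel", "spreadsheets", "google sheets"},
    "managed": {"managed", "led", "coordinated", "oversaw"},
    "sql": {"sql", "postgresql", "mysql", "database queries"},
}

def expand_keywords(keywords: list[str]) -> dict[str, set[str]]:
    """Map each job-description keyword to the set of phrasings that should count."""
    return {kw: SYNONYMS.get(kw.lower(), {kw.lower()}) for kw in keywords}

def flexible_score(resume_text: str, keywords: list[str]) -> int:
    """One point per keyword for which any accepted variant appears in the resumé."""
    text = resume_text.lower()
    return sum(
        1
        for variants in expand_keywords(keywords).values()
        if any(variant in text for variant in variants)
    )

print(flexible_score("Led a team and built Google Sheets dashboards", ["managed", "excel"]))  # 2
```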

These failings must instead be addressed at the source. Not only is it a question of integrating language neutralisers and decoders; training practices must also be questioned. For example, models with pre-defined ‘ideal candidate’ templates for specific roles are trained using data from historically successful applicants, a practice that operates as a self-perpetuating anti-diversity loop. Basing the hallmarks of success on historic examples ensures that hiring processes remain firmly stuck in the past, strengthening affinity bias and rejecting fresh perspectives by default.

Additionally, every ATS on the market (and there are many) has been trained and developed by a different team of individuals, each of which defines a different set of criteria to be sought out during the hiring process. These criteria can be hard-coded into the software in the form of fixed rules (e.g., higher scores for graduates of top universities) which apply to every role, vary from one ATS to the next, and represent what the developers themselves consider desirable in a candidate. Such practices create huge potential for built-in bias and enable each ATS company to have a direct effect on the outcome of their clients’ hiring processes in a way that could easily become problematic. Risk areas such as these require developers to implement a rigorous system of checks and safeguards designed to catch potential cracks through which bias might infiltrate the machine.
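
One concrete form such a check could take, sketched here on hypothetical audit data, is an adverse-impact test along the lines of the widely used ‘four-fifths rule’: compare ATS pass rates across demographic groups and flag any group whose rate falls below 80 percent of the best-performing group’s.

```python
from collections import Counter

def selection_rates(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of applicants passed by the ATS screen, per self-reported group."""
    totals, passed = Counter(), Counter()
    for group, was_passed in outcomes:
        totals[group] += 1
        passed[group] += was_passed
    return {group: passed[group] / totals[group] for group in totals}

def adverse_impact_flags(outcomes: list[tuple[str, bool]], threshold: float = 0.8) -> dict[str, float]:
    """Flag groups whose selection rate is below `threshold` times the best group's rate."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return {group: rate / best for group, rate in rates.items() if rate / best < threshold}

# Hypothetical audit sample: (self-reported gender, passed the initial ATS screen?)
audit = [
    ("women", True), ("women", False), ("women", False),
    ("men", True), ("men", True), ("men", False),
]
print(adverse_impact_flags(audit))  # {'women': 0.5}: well below the 0.8 threshold
```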

To ensure that this powerful technology continues to help rather than hinder our society, work must be done by policy makers, industry leaders and developers alike to ensure we are using it responsibly. It’s naive to assume that Artificial Intelligence is a catch-all solution for the fallibility of the human experience. Machine Learning models do not simply materialise through divine manifestation: they are built and trained by human scientists, people with their own deeply held beliefs and opinions that cannot fail to affect their work. As Josh Feast so succinctly puts it: ‘Any examination of bias in AI needs to recognise the fact that these biases mainly stem from humans’ inherent biases. The models and systems we create and train are a reflection of ourselves.’14

Footnotes

Sydney Myers: 2023 Applicant Tracking System (ATS) usage report: Key shifts and strategies for job seekers. In: Jobscan. 5 October 2023, retrieved on 27 October 2023.

A brief history of ATS. HireAbility Resume Parsing and Job Parsing Solutions, 28 December 2022, retrieved on 27 October 2023. 

How does an applicant tracking system work? Leading AI Recruitment Software, 19 April 2023, retrieved on 27 October 2023. 

R. Borsellino: Get your resume past the robots and into human hands. In: The Muse. 31 August 2023, retrieved on 01 November 2023.

Jeffrey Dastin: Amazon scraps secret AI recruiting tool that showed bias against women. In: Reuters. 10 October 2018, retrieved on 27 October 2023.

Machine learning models: What they are and how to build them. Coursera, 2023, retrieved on 27 October 2023. 

Data scientist demographics and statistics [2023]: Number of data scientists in the US. Zippia, 21 July 2023, retrieved on 27 October 2023.

Josh Feast: 4 ways to address gender bias in AI. In: Harvard Business Review. 8 October 2020, retrieved on 27 October 2023. 

Jeffrey Dastin: Amazon scraps secret AI recruiting tool that showed bias against women. In: Reuters. 10 October 2018, retrieved on 27 October 2023. 

L. Schiebinger, I. Klinge, I. Sánchez de Madariaga, H.Y. Paik, M. Schraudner, M. Stefanick (Eds.): Gendered Innovations in Science, Health & Medicine, Engineering and Environment. Stanford University, 2011-2021.

Sarah Lewis: The racial bias built into photography. In: The New York Times. 25 April 2019.

European Parliament: EU AI Act: First regulation on artificial intelligence. 14 June 2023, retrieved on 27 October 2023.

Josh Feast: 4 ways to address gender bias in AI. In: Harvard Business Review. 8 October 2020, retrieved on 27 October 2023.

Related Articles

Generative foundation models are particularly large AI systems that are initially trained on huge amounts of unlabelled sample data to generate high-quality text or images in general. From this base, the AI can then be specialised for a very specific task or behaviour. In addition, there is a second training phase with feedback: for example, one could have a text AI generate text and give it negative feedback whenever the text produced contains racist viewpoints.

Natural Language Processing (NLP) has developed at the interface of linguistics and computer science and deals with the automatic processing of natural language. Algorithms from computational linguistics are already used in many areas of everyday life: the spectrum ranges from spelling correction in text editors to automatic translation, voice assistants on smartphones and chatbots such as ChatGPT. In addition, the field of NLP also examines the theoretical possibilities and limitations of machine-aided language processing.

