As it stands, flaws in the development and implementation of these technologies could be doing more harm than good. Long-held biases can easily be transmitted to machines we assume to be impartial, and the current lack of regulation around GAI can facilitate such ‘in-built’ discrimination while also creating an opaque usage landscape in which those who engage with such tools are often unaware of it. One worrying way in which this manifests is through the widespread adoption of AI-powered Applicant Tracking Systems in modern recruitment processes.
What is ATS Software and why is it problematic?
ATS software has been in widespread use since the early 2000s.
Let’s begin with the lack of transparency around the usage of these tools. While they are a standard part of the recruitment process, this isn’t yet common knowledge among applicants. Naturally, tech professionals are required to be familiar with the uses and capabilities of AI and keep up to date with its progress, but ATS is rarely a topic discussed outside recruitment or tech-centric professions. In addition, it isn’t obligatory to indicate in a job advertisement that one of these tools is being used by the recruiting team, meaning that jobseekers can be evaluated by complex data-parsing tools without their knowledge. Considering the recent proliferation and availability of such systems, as well as their possible influence on the outcome of hiring processes, this lack of transparency is alarming.
Even assuming candidates are aware that ATS software is being used to evaluate their information, it’s still unclear to what extent. Is it simply to highlight words and skills mentioned in the role requirements? Is it also recognising certain phrasal patterns or figures of speech? How specific is it? If a word has been used in an unexpected context or is perhaps a synonym for one the system is looking for, is it still highlighted?
To answer these questions, let’s look again at how these systems operate. Job descriptions are fed into ATS software to produce a list of keywords and skills that can be used as a scoring framework for applications (e.g. if the job description includes the word “Excel”, the system may give one point to each application that includes that word, and so on). The language used in this description dictates what language will be deemed ‘correct’ by the system, which means that word specificity is extremely important. Synonyms are not taken into account, and even tiny differences in phrasing can cause a key skill to be deemed incorrect in the eyes of the machine. Not only this, but the formatting of a resumé can also throw off the parsing system completely. Every document processed by ATS is converted to plain text, meaning uncommon fonts, mildly creative designs, and even the use of colour can make a document completely unreadable.
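The keyword-matching approach described above can be sketched in a few lines. This is a deliberately minimal illustration, not any vendor’s actual scoring logic, and the keyword list is hypothetical:

```python
# A minimal sketch of keyword-based application scoring, as described above.
# The keywords and sample texts are hypothetical illustrations only.

def score_application(resume_text: str, keywords: list[str]) -> int:
    """Give one point for each job-description keyword found verbatim."""
    text = resume_text.lower()
    return sum(1 for kw in keywords if kw.lower() in text)

keywords = ["Excel", "Python", "stakeholder management"]

exact = "Advanced Excel and Python; stakeholder management experience."
synonym = "Advanced spreadsheet and scripting skills; managed stakeholders."

print(score_application(exact, keywords))    # 3 -- all keywords matched
print(score_application(synonym, keywords))  # 0 -- synonyms score nothing
```

Note how the second candidate, who describes exactly the same skills in different words, scores zero: literal string matching has no notion of synonymy.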
However, the true cost of this automation is far deeper and more systemic than simple issues of stylistic recognition and transparency. One chilling example is a case from 2018, in which Amazon was widely criticised for its use of an internally developed ATS model proven to actively perpetuate a gender bias against female applicants.
To better understand how this happened, let’s break down what Machine Learning (ML) models are, how they’re developed, and by whom. In essence, these models are a combination of sophisticated algorithms that have been trained to recognise patterns. These patterns are then used to predict future outcomes based on what has happened in the past. For these patterns and predictions to be accurate, vast amounts of data must be fed into the algorithms. What’s key to remember here is that the data being used for training ultimately determines the functionality (and biases) of the model.
When we consider the drastically skewed gender demographics in the Science, Technology, Engineering and Mathematics (STEM) fields that produce these models, it becomes clear why gender bias can be such a pervasive issue. In an industry that has historically been male-dominated, the vast majority of available data will have been written, researched and/or recorded by men. Not only this, but those responsible for using this data to develop and train AI models are also overwhelmingly likely to be male. According to Zippia, women make up just 20.4 percent of data scientists in the United States.
To summarise: if a model is being trained by male scientists using primarily male data or subjects, then this model has been optimised for male input. Josh Feast, writing in the Harvard Business Review, describes how this problem manifested itself in sensory assistance technology: ‘…text-to-speech technology (e.g., Stephen Hawking’s voice) and automatic speech recognition … (e.g., closed captioning) – performed poorly for female speakers as compared to males. This is attributed to the fact that the way the speech was analyzed and modeled was more accurate for taller speakers with longer vocal cords and lower-pitched voices. As a result, speech technology was most accurate for speakers with these characteristics – which are typically males – and far less accurate for those with higher pitched voices – which are typically female.’
In the Amazon case, the ATS model had been ‘trained to vet applicants by observing patterns in resumés submitted to the company over a 10-year period. Most came from men, a reflection of male dominance across the tech industry.’ As a result of this training technique, ‘Amazon’s system taught itself that male candidates were preferable’, filtering out or downgrading applications which contained words the system wasn’t familiar with, such as ‘women’s’ (e.g., ‘Women’s chess club captain’). It even went so far as to ‘downgrade graduates of two all-women’s colleges’.
This is strikingly similar to issues with other engineered designs throughout history, most notably that of crash-test dummies used to test car safety systems. According to a Stanford University study, crash test dummies engineered to resemble and mimic the responses of ‘the average male body’ are most commonly used for testing, while a ‘dummy that models female-typical injury tolerance, biomechanics, spinal alignment, neck strength, muscle and ligament strength, dynamic responses to trauma, and other female-typical characteristics’ has yet to be developed.
Outlook and possible solutions
In what ways can this issue be tackled? One long-term solution could lie in altering the demographics of those behind the engineering and training of AI models, through intentional diversification of the Data Science and Machine Learning fields. This would require considerable funding to facilitate access programmes aimed at cisgender female, transgender and non-binary people, people of colour, and other minority groups. It would also necessitate widespread awareness programmes, to educate data scientists, engineers and developers about the need to ensure that training data is diverse enough for natural language models to familiarise themselves with patterns of speech that do not fit the hegemonic standard.
While the above acts on root causes, the symptoms of the problem could be greatly reduced in the medium term by systems of regulation and standardisation. This would involve creating independent regulatory bodies to develop a set of standards for the training protocol of Machine Learning models, based on diverse training data and rigorous testing for bias. Indeed, the European Union is currently working on an ‘Artificial Intelligence Act’ that would attempt to regulate some aspects of how these technologies are being used. The Act stipulates that “Generative AI, like ChatGPT, would have to comply with transparency requirements” such as “disclosing that the content was generated by AI” and “publishing summaries of copyrighted data used for training”.
These regulations would make it a legal requirement for employers to disclose details of ATS usage in job descriptions, allowing candidates to make informed decisions about how to structure their applications. They would also require applicants to disclose their use of ChatGPT and other AI-powered language tools in writing their applications, a practice that has itself become extremely common. The issue of opacity would be largely resolved if these regulations make it through the European Parliament unscathed.
This, however, is not a foregone conclusion. Transparency regulations inevitably raise questions around data protection, a concept that is deeply enshrined in European policy. Furthermore, the internet is a borderless entity, and there are questions as to what extent European regulatory frameworks can dictate the behaviour of non-European companies. It would be hard to imagine the United States, home base and tax residence of so many global tech industry leaders, permitting any kind of regulation that could have a real-world impact on the activity of this industry. Due to these complexities and many others, it may take several years for such a framework to be implemented.
All the more pressing, then, is the need for immediate solutions. Speaking on a purely practical level, the issue can be boiled down to one of language: the language used by applicants to describe their skills and experience, the language used by employers and recruiters to define roles and ideal candidates, and the language in the training data used to educate ATS models. The goal must be to allow for flexibility and diversity in language patterns, which in turn would diversify the demographic of highly-scoring applicants selected by these systems.
For recruitment teams to achieve this, training examples (in this case, job descriptions) would have to be run through a natural language processor to produce a reasonable number of synonymous examples using a variety of words and speech patterns, before being fed into ATS. Tools such as Gender Decoder, which checks texts for linguistic gender coding, could also be used to neutralise the inequality. However, this would require a manual step in an otherwise fully automated process and, aside from being simply impractical, incorrectly places the burden of responsibility on the ATS user to mitigate the failings of its software.
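A linguistic gender-coding check of the kind performed by Gender Decoder can be sketched as follows. The word lists here are short illustrative samples drawn up for this example, not the tool’s actual lexicon:

```python
# A simplified sketch of linguistic gender-coding checks, in the spirit of
# tools like Gender Decoder. The word lists are illustrative samples only,
# not the real tool's lexicon.
import re

MASCULINE_CODED = {"competitive", "dominant", "ambitious", "assertive"}
FEMININE_CODED = {"collaborative", "supportive", "nurturing", "considerate"}

def gender_coding(job_description: str) -> dict:
    """Return the gender-coded words found in a job description."""
    words = re.findall(r"[a-z']+", job_description.lower())
    return {
        "masculine": sorted(w for w in words if w in MASCULINE_CODED),
        "feminine": sorted(w for w in words if w in FEMININE_CODED),
    }

ad = "We want a competitive, ambitious engineer for our collaborative team."
print(gender_coding(ad))
# {'masculine': ['ambitious', 'competitive'], 'feminine': ['collaborative']}
```

A recruiter could use output like this to rebalance a description’s wording before it is fed into an ATS, though, as noted above, this adds a manual step to an otherwise automated pipeline.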
These failings must instead be addressed at the source. Not only is it a question of integrating language neutralisers and decoders; training practices must also be questioned. For example, models with pre-defined ‘ideal candidate’ templates for specific roles are trained using data from historically successful applicants, a practice that operates as a self-perpetuating anti-diversity loop. Basing the hallmarks of success on historic examples ensures that hiring processes remain firmly stuck in the past, strengthening affinity bias and rejecting fresh perspectives by default.
Additionally, every ATS on the market (and there are many) has been trained and developed by a different team of individuals, each of which defines a different set of criteria to be sought out during the hiring process. These criteria can be hard-coded into the software in the form of fixed rules (e.g., higher scores for top universities) which apply to every role, vary from one ATS to the next, and act as a representation of what the developers themselves consider desirable in a candidate. Such practices create huge potential for built-in bias and enable each ATS company to have a direct effect on the outcome of their clients’ hiring processes in a way that could easily become problematic. Risk areas such as these require developers to implement a rigorous system of checks designed to catch potential cracks through which bias might infiltrate the machine.
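A hard-coded rule of the kind described above might look like the following sketch. The university list, bonus value, and candidate records are hypothetical; no real vendor’s rules are reproduced here:

```python
# An illustrative sketch of a developer-defined fixed rule, as described
# above. TOP_UNIVERSITIES and the +10 bonus are hypothetical choices that
# encode the developers' own notion of a desirable candidate.
TOP_UNIVERSITIES = {"Stanford", "MIT", "Oxford"}

def apply_fixed_rules(candidate: dict, base_score: int) -> int:
    """Apply the same hard-coded prestige bonus to every role."""
    score = base_score
    if candidate.get("university") in TOP_UNIVERSITIES:
        score += 10  # a fixed rule, applied regardless of the role
    return score

print(apply_fixed_rules({"university": "MIT"}, 50))            # 60
print(apply_fixed_rules({"university": "State College"}, 50))  # 50
```

Two equally qualified candidates end up ten points apart purely because of a list the developers baked in, which is exactly the kind of built-in bias the paragraph above warns about.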
To ensure that this powerful technology continues to help rather than hinder our society, work must be done by policy makers, industry leaders and developers alike to ensure we are using it responsibly. It’s naive to assume that Artificial Intelligence is a catch-all solution for the fallibility of the human experience. Machine Learning models do not simply materialise through divine manifestation: they are built and trained by human scientists, people with their own deeply held beliefs and opinions that cannot fail to affect their work. As Josh Feast so succinctly puts it: ‘Any examination of bias in AI needs to recognise the fact that these biases mainly stem from humans’ inherent biases. The models and systems we create and train are a reflection of ourselves.’