
SPECIAL INPUT: Mar Castell Erill

AI's Silent Influence: Balancing the Wonders and Limitations in Our Digital Realm

Convolutional neural networks (CNNs) can recognise objects in images – provided their environment does not change. Mar Castell Erill explains how CNNs work and where they fail, and asks whether AI applications are sometimes overestimated.

AI and Sustainability

In our modern world, artificial intelligence (AI) quietly plays a crucial role, managing various aspects of our digital lives. It's capable of tasks like diagnosing medical conditions, guiding autonomous vehicles, and even composing symphonies. In many ways, AI has surpassed human abilities. It's integrated into our daily lives, from helping with navigation on our phones to translating languages and responding to voice commands in our homes and places of work. 

Despite AI's remarkable progress, neural networks, a fundamental component of modern AI, still have limitations. They lack common-sense reasoning, empathy and a nuanced understanding of human emotions.1 They're not capable of philosophical pondering or of engaging in the creative expression that defines human art. CAPTCHAs are a good example of AI's limitations.2

Hidden within the complex codes and algorithms of the internet lies a battleground where humans and AI engage in a covert competition. Websites, applications and online platforms utilize CAPTCHAs to distinguish human users from automated ones.3 These challenges, though seemingly simple, are based on intricate science that exposes the vulnerabilities of AI. Consider encountering a CAPTCHA that displays distorted characters, digits or symbols, some of which may not be neatly aligned, appearing askew or even rotated. Human cognition easily adapts to this challenge by mentally adjusting character orientations to decode them. However, for AI, this still presents a significant challenge.4
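The asymmetry can be made concrete with a deliberately naive sketch. Everything here is invented for illustration (the 3×3 "glyph", the function names): a bot that matches characters against a stored template succeeds on the upright glyph but fails the moment the CAPTCHA rotates it, whereas a human reads both without effort. Real CAPTCHA solvers are far more sophisticated, but they run into the same underlying problem.

```python
import numpy as np

# A toy 3x3 glyph for the letter "L", and a naive template-matching "bot"
# that only recognises the glyph in exactly its stored orientation.
L_TEMPLATE = np.array([[1, 0, 0],
                       [1, 0, 0],
                       [1, 1, 1]])

def bot_reads_L(glyph):
    """Declare a match only if the glyph equals the stored template pixel for pixel."""
    return np.array_equal(glyph, L_TEMPLATE)

upright = L_TEMPLATE.copy()
rotated = np.rot90(L_TEMPLATE)   # the kind of distortion a CAPTCHA applies

print(bot_reads_L(upright))   # True  -- the easy case
print(bot_reads_L(rotated))   # False -- rotation defeats the naive matcher
```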

This is particularly true of convolutional neural networks (CNNs), a type of deep learning model that aims to replicate the image-processing capabilities of the human visual system. CNNs use convolutional layers to extract hierarchical features from images, which makes them highly efficient at recognising patterns, shapes and objects. This fundamental architecture has positioned CNNs as a leading choice for image recognition tasks.5
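The core operation of a convolutional layer can be sketched in a few lines of NumPy. This is a minimal illustration, not a real CNN: the vertical-edge filter below is written by hand, whereas a CNN would learn such filters from data, and stack many of them across many layers.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image
    and record the filter's response at every position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 5x5 "image" containing a vertical bright stripe in column 2.
image = np.zeros((5, 5))
image[:, 2] = 1.0

# A hand-crafted vertical-edge filter (a CNN would learn filters like this).
vertical_edge = np.array([[-1., 1.],
                          [-1., 1.]])

response = convolve2d(image, vertical_edge)
print(response)  # the strongest responses flank the stripe
```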

One of the fundamental challenges associated with CNNs is their sensitivity to changes in an object's surroundings.6 Unlike humans, who can easily identify an object regardless of factors like orientation, background, texture or lighting, CNNs tend to produce incorrect results when presented with inputs that even slightly deviate from their established patterns. CNNs rely on fixed filters designed to detect specific features in an image. When an image is altered, perhaps due to factors such as rotation or blurriness, these features may not be represented in the same way. Consequently, the filters may struggle to recognise them, resulting in reduced accuracy in object recognition.7
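The fixed-filter problem can be shown with the same kind of toy setup. In this sketch (again purely illustrative, not a trained network), a filter tuned to vertical edges responds strongly to the stripe it was designed for and not at all once the stripe is rotated by 90 degrees:

```python
import numpy as np

def filter_response(image, kernel):
    """Maximum absolute response of a fixed filter slid over an image."""
    kh, kw = kernel.shape
    best = 0.0
    for i in range(image.shape[0] - kh + 1):
        for j in range(image.shape[1] - kw + 1):
            best = max(best, abs(np.sum(image[i:i+kh, j:j+kw] * kernel)))
    return best

# A filter tuned to vertical edges -- standing in for one fixed CNN filter.
vertical_edge = np.array([[-1., 1.],
                          [-1., 1.]])

upright = np.zeros((6, 6))
upright[:, 3] = 1.0          # vertical stripe: the pattern the filter expects
rotated = np.rot90(upright)  # the same stripe rotated 90 degrees

print(filter_response(upright, vertical_edge))  # strong response
print(filter_response(rotated, vertical_edge))  # no response at all
```

A real CNN combines thousands of learned filters, so the failure is rarely this total, but the mechanism is the same: a feature that moves out of the orientation the filters encode is no longer detected reliably.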

The limited robustness of CNNs has significant implications for real-world applications.8 For instance, autonomous vehicles depend on image recognition to navigate. If a CNN-based system encounters rotated road signs or obstacles, it may fail to recognise them, raising safety concerns. Likewise, in the field of medical imaging, the difficulty of dealing with rotated scans could impact the accuracy of disease diagnosis. These limitations, to some extent, hinder the widespread adoption of AI in our daily lives.

Researchers are actively exploring strategies to mitigate CNNs' brittleness to common perturbations that humans easily navigate. The most obvious approach involves data augmentation, where training data is enriched with rotated versions of images.9 This exposes the network to a broader range of orientations during training, improving its ability to handle novel images to some extent. Other scholars have opted for customising the network’s architecture such as manipulating the pooling layers, the loss function or even introducing traditional machine learning algorithms for classification.10
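Rotation-based augmentation is simple to sketch. In this toy example (the training "images" are made up), each image is added to the training set in all four 90-degree orientations, so the network sees every pattern in several orientations during training:

```python
import numpy as np

def augment_with_rotations(images):
    """Enlarge a training set with 90-, 180- and 270-degree rotated copies."""
    augmented = []
    for img in images:
        for k in range(4):               # 0, 90, 180, 270 degrees
            augmented.append(np.rot90(img, k))
    return augmented

# Hypothetical toy training set: two 4x4 "images".
train = [np.eye(4), np.zeros((4, 4))]
augmented = augment_with_rotations(train)
print(len(train), "->", len(augmented))  # 2 -> 8
```

Real pipelines typically also add arbitrary-angle rotations, crops, blurs and brightness changes, at the cost of longer training; the rotations sketched here are the simplest case.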

Despite the challenges presented by neural networks' limited adaptability to physical perturbations, this limitation can be strategically harnessed for our benefit. This interactive relationship between AI and humans illustrates a mutually beneficial connection, where one's weaknesses can complement the strengths of the other. Beyond their primary security function, CAPTCHAs also serve as a benchmark task for evaluating AI technologies, as noted by von Ahn et al., who observed that ‘any program passing the tests generated by a CAPTCHA can be used to tackle challenging unsolved AI problems’.11

In the intricate dance between humans and AI, we've uncovered AI's remarkable capabilities, as well as its flaws. Interestingly, as we depend more on AI, we've learned to leverage its limitations to the point where distinguishing between human and machine intelligence has become a real challenge. As we navigate the hidden AI that shapes our digital world, we find ourselves in a paradoxical embrace in which technology, often written about, is also the author itself. 

Footnotes
1. J. Alrassi, P. J. Katsufrakis, L. Chandran: Technology Can Augment, but Not Replace, Critical Human Skills Needed for Patient Care. In: Academic Medicine. Volume 96, No. 1, 2020, 37–43. https://doi.org/10.1097/acm.0000000000003733; A. Kerasidou: Artificial intelligence and the ongoing need for empathy, compassion and trust in healthcare. In: Bulletin of the World Health Organization. Volume 98, No. 4, 2020, 245–250. https://doi.org/10.2471/blt.19.237198

2. R. Gossweiler, M. Kamvar, S. Baluja: What's up CAPTCHA? In: Proceedings of the 18th International Conference on World Wide Web. 2009. https://doi.org/10.1145/1526709.1526822; J. Kim, W. Chung, H. Cho: A new image-based CAPTCHA using the orientation of the polygonally cropped sub-images. In: The Visual Computer. Volume 26, No. 6–8, 2010, 1135–1143. https://doi.org/10.1007/s00371-010-0469-3; M. Guerar, L. Verderame, M. Migliardi, F. Palmieri, A. Merlo: Gotta CAPTCHA 'Em All: A Survey of Twenty Years of the Human-or-Computer Dilemma. In: ACM Computing Surveys. Volume 54, 2021. https://doi.org/10.1145/3477142

3. M. Conti, L. Pajola, P. P. Tricomi: Captcha Attack: Turning Captchas Against Humanity. In: arXiv. 2022. https://doi.org/10.48550/arxiv.2201.04014

4. E. Bursztein, M. Martin, J. C. Mitchell: Text-based CAPTCHA strengths and weaknesses. In: CCS '11: Proceedings of the 18th ACM Conference on Computer and Communications Security. 2011. https://doi.org/10.1145/2046707.2046724

5. Y. LeCun, K. Kavukcuoglu, C. Farabet: Convolutional networks and applications in vision. In: IEEE. 2010. https://doi.org/10.1109/iscas.2010.5537907; Y. LeCun, Y. Bengio, G. E. Hinton: Deep Learning. In: Nature. Volume 521, No. 7553, 2015, 436–444. https://doi.org/10.1038/nature14539; C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich: Going deeper with convolutions. In: arXiv. 2015. https://doi.org/10.1109/cvpr.2015.7298594

6. I. J. Goodfellow, J. Shlens, C. Szegedy: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations. 2015. https://ai.google/research/pubs/pub43405; A. Nguyen, J. Yosinski, J. Clune: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In: arXiv. 2014. http://export.arxiv.org/pdf/1412.1897; A. Azulay, Y. Weiss: Why do deep convolutional networks generalize so poorly to small image transformations? In: arXiv. 2018. http://export.arxiv.org/pdf/1805.12177

7. T. Cohen, M. Welling: Group equivariant convolutional networks. In: arXiv. 2016, 2990–2999.

8. M. Ozdag, S. Raj, S. L. Fernandes, A. Velasquez, L. L. Pullum, S. K. Jha: On the Susceptibility of Deep Neural Networks to Natural Perturbations. In: International Joint Conference on Artificial Intelligence. 2019; R. C. Maron, J. G. Schlager, S. Haggenmüller, C. von Kalle, J. Utikal, F. Meier, F. F. Gellrich, S. Hobelsberger, A. Hauschild, L. E. French, L. Heinzerling, M. Schlaak, K. Ghoreschi, F. J. Hilke, G. Poch, M. V. Heppt, C. Berking, S. Haferkamp, W. Sondermann, T. J. Brinker: A benchmark for neural network robustness in skin cancer classification. In: European Journal of Cancer. Volume 155, 2021, 191–199. https://doi.org/10.1016/j.ejca.2021.06.047

9. S. Dieleman, K. Willett, J. Dambre: Rotation-invariant convolutional neural networks for galaxy morphology prediction. In: Monthly Notices of the Royal Astronomical Society. Volume 450, No. 2, 2015, 1441–1459. https://doi.org/10.1093/mnras/stv632; I. Kandel, M. Castelli, L. Manzoni: Brightness as an augmentation technique for image classification. In: Emerging Science Journal. Volume 6, No. 4, 2022, 881–892. https://doi.org/10.28991/esj-2022-06-04-015

10. T. Cohen, M. Welling: Group equivariant convolutional networks. In: arXiv. 2016, 2990–2999; D. Marcos, M. Volpi, N. Komodakis, D. Tuia: Rotation Equivariant Vector Field Networks. In: arXiv. 2017. https://doi.org/10.1109/iccv.2017.540

11. L. von Ahn, M. Blum, N. Hopper, J. Langford: CAPTCHA: Using hard AI problems for security. In: Lecture Notes in Computer Science. 2003, 294–311. https://doi.org/10.1007/3-540-39200-9_18


Glossary

Deep learning is a category of machine learning methods based on particularly deep artificial neural networks. 'Deep' here means any artificial neural network with at least three layers. While such models saw little use at the end of the 20th century due to limited computing capacity, this definition applies to almost every artificial neural network today.

Convolutional neural networks (CNN) form filters to recognise patterns, such as objects in images. Each filter detects the presence of characteristic patterns in the input.

A CAPTCHA (abbreviation for 'Completely Automated Public Turing test to tell Computers and Humans Apart') is usually a simple visual test or puzzle designed to determine whether an online user is really a human and not a bot. A human can complete it without much difficulty, but an automated program cannot understand it.

Artificial neural networks (ANNs) are mathematical functions inspired by the functioning of nerve cells in our brain: signal processing between individual 'artificial neurons' is simulated. Each ANN consists of several layers of these neurons. Incoming signals, such as the pixels of an image, are passed from layer to layer until they produce an output (1 or 0, cat or dog) in the final layer. With multiple layers, the network can approximate more complex functions.
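This layer-by-layer signal flow can be sketched in a few lines. Everything below is a toy: the weights are random and untrained, the 4-pixel "image" is made up, and the final sigmoid simply squashes the result into a score between 0 and 1 (the "cat or dog" output mentioned above).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Simple nonlinearity applied between layers."""
    return np.maximum(0.0, z)

def forward(x, weights):
    """Pass an input signal through the layers one by one;
    each layer is just a weight matrix plus a nonlinearity."""
    for W in weights[:-1]:
        x = relu(W @ x)
    # Final layer squashed to (0, 1): interpretable as a class score.
    return 1.0 / (1.0 + np.exp(-(weights[-1] @ x)))

# Hypothetical 4-pixel input and two random (untrained) layers.
pixels = np.array([0.2, 0.9, 0.1, 0.5])
weights = [rng.standard_normal((3, 4)), rng.standard_normal((1, 3))]
score = forward(pixels, weights)
print(score)  # a single number between 0 and 1
```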

Artificial neural networks are widely used in image processing to recognise objects in images. Time series, i.e., data with a temporal dimension, can also be processed with artificial neural networks to complete or continue them. For example, algorithms for language processing or weather forecasting are based on artificial neural networks. A great explanation series on artificial neural networks can be found on YouTube, at https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

