Digital solutions now pervade learning, work, culture and social life, which challenges application and website developers to make digital products accessible to as many people as possible. One such challenge is adapting content to the needs of people with disabilities. Solutions currently on the market are still inadequate and costly, as they largely require manual input of additional descriptions and markings. Sages' innovative project uses artificial intelligence to automate the process of adapting documents and websites to accessibility standards in accordance with the WCAG guidelines.
WCAG (Web Content Accessibility Guidelines) is a set of guidelines for web content accessibility. It contains recommendations on how to create websites and applications so that people with visual, hearing, motor or intellectual disabilities can use them. In Poland, the WCAG standard is promoted by the Accessible Cyberspace Forum, which brings together various organizations, such as the Widzialni Foundation.
Problems with accessibility of websites, even for public entities
The Widzialni Foundation's statutory goal is to prevent digital and social exclusion, i.e. to give all citizens free access to Internet resources regardless of their age, disability, wealth, hardware and software. Every year, the Widzialni Foundation prepares a report on the accessibility of public administration services. According to the latest survey, in 2020 the level of accessibility of public entities' websites was only 58%, which, as we can read on the foundation's website, is not satisfactory [1], especially since a 2019 law obliges these entities to ensure accessibility [2].
What is the reason for this?
The low accessibility rate can be attributed to the long-standing lack of automated solutions on the market that make it possible to adapt a document or service to accessibility requirements easily and quickly. Such a mechanism facilitates the work of website editors and developers and, more importantly, spreads the practice of creating barrier-free websites for people with disabilities.
According to statistics available on the Open Doors Association's portal, 10-15% of citizens of European Union countries are people with various types of disabilities [3], so work on automating these processes is extremely important.
How do people with disabilities use the Internet?
How do people with disabilities use the web and what barriers do they face? We put this question to Sebastian Depta, Digital Accessibility Specialist at the Widzialni Foundation, who knows the problems of website and document accessibility very well, not only because of his position but also from his own experience as a visually impaired person.
As Sebastian Depta explains, a number of programs make it easier for visually impaired and blind people to read digital content. First and foremost are screen readers: computer programs that recognize and interpret the information displayed on a computer monitor and then present it to the user as speech or send it to a Braille output device. Screen readers are a form of assistive technology used by people who are blind, visually impaired, deaf-blind or have learning disabilities, among others.
Many readers are available on the market, depending on the operating system or device. For example, Windows has the free NVDA and Windows Narrator and the paid JAWS and ZoomText Fusion; Apple devices have VoiceOver; Android has VoiceAssistance or TalkBack; Linux has Orca; and the ChromeVox application runs in the Google Chrome browser.
Whether a blind person can take advantage of the available tools to make web content easier to read is determined by the preparation of the website and its compliance with accessibility standards.
"If someone neglects Web accessibility or doesn't think about it at all, I won't get to the content," Sebastian Depta explains in the publication "Accessible Multimedia" [4].
The most common barriers to using the Web detailed by Sebastian Depta are:
- links without clear descriptions
- form controls without associated labels or without any labels at all
- graphics without alternative descriptions or with ambiguous descriptions
- improper header structure
- problems with the logical reading order of content
- code problems, which can make it hard for readers to read a given page or cause them to stop working because of errors
- improper contrast
- problems with keyboard access
Of course, one can list many more problems here, depending on the needs of a particular person with a disability. One can point to a lack of awareness among site developers and editors when it comes to the needs of people with disabilities, although lately this has been changing somewhat for the better. Personally, however, I notice another problem that stems not from a lack of awareness but from the wrong approach to solutions - wrong thinking, misinterpretations, etc. - which can cause even more problems than before.
Documents are similar. Scans are increasingly rare, but you can still find sites that make such documents available. Another problem is documents that have been improperly exported to PDF and contain many errors that make them difficult to read, such as missing subheadings, undescribed links and graphics, or improperly implemented data tables. All this makes reading such documents cumbersome: it requires concentration, or one has to spend too much time to find a particular piece of information.
As we learn from the publication "Accessible Multimedia", another common barrier to using the web for people with disabilities is content that is too difficult. This applies not only to people with intellectual disabilities but also, among others, to deaf people, who may have trouble understanding content in Polish, as it is their second language after sign language.
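Several of the barriers listed by Sebastian Depta can be detected mechanically in a page's source code. The sketch below is a minimal illustration using Python's standard `html.parser` module; the class name `BarrierChecker`, the function `find_barriers` and the wording of the messages are our own assumptions, and it covers only three of the barriers (missing alternative text, links without a text description, skipped heading levels). It is not a substitute for an audit.

```python
from html.parser import HTMLParser


class BarrierChecker(HTMLParser):
    """Collects a few barriers that can be found without human judgment."""

    def __init__(self):
        super().__init__()
        self.issues = []         # human-readable findings, in document order
        self._link_text = None   # text pieces of the current <a>, None outside links
        self._last_heading = 0   # level of the previous heading, 0 if none yet

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            if "alt" not in attrs:
                # An empty alt="" is allowed for decorative images,
                # so only a missing attribute is reported.
                self.issues.append("image without alternative text")
            elif self._link_text is not None:
                # Inside a link, the image's alt text names the link.
                self._link_text.append(attrs["alt"] or "")
        elif tag == "a":
            self._link_text = [attrs.get("aria-label") or ""]
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            level = int(tag[1])
            if self._last_heading and level > self._last_heading + 1:
                self.issues.append(
                    f"heading level jumps from h{self._last_heading} to h{level}"
                )
            self._last_heading = level

    def handle_data(self, data):
        if self._link_text is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link_text is not None:
            if not "".join(self._link_text).strip():
                self.issues.append("link without a text description")
            self._link_text = None


def find_barriers(html):
    """Return the list of barriers found in an HTML string."""
    checker = BarrierChecker()
    checker.feed(html)
    checker.close()
    return checker.issues


sample = (
    "<h1>News</h1><h3>Details</h3>"
    '<img src="chart.png">'
    '<a href="/x"><img src="icon.png" alt=""></a>'
    '<a href="/y">Read the report</a>'
)
for issue in find_barriers(sample):
    print(issue)
```

Barriers such as ambiguous descriptions or content that is too difficult cannot be found this way, which is why audits still rely on experts.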
Adapting sites to accessibility requirements requires manual markings
Sebastian Depta explains how websites are adapted for people with disabilities:
Most often, if a site already exists, this is done through an audit. Experts select sample pages and analyze them against the WCAG 2.1 standard at the AA level. A report is then prepared that lists the errors found and gives examples of solutions, with links to resources on how each problem can be solved. The report is handed over to the creator of the site or application, and here the work of the developers begins: they implement the fixes according to the recommendations.
The second way is that during the creation of the site, the WCAG standard is already taken into account and a given solution is created so that it is digitally accessible.
In short, appropriate corrections are made to the page's source code so that the solution in question complies with the guidelines of the WCAG standard.
This is similar for digital documents. You have to manually describe the various sections of the document using the appropriate HTML/PDF tags - this is an expensive and very time-consuming task - editing one document can even take a whole day.
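Some of the criteria checked in such an audit are numerical and easy to compute. One example is contrast: WCAG 2.1 at level AA asks for a contrast ratio of at least 4.5:1 for normal text and 3:1 for large text. The sketch below follows the relative luminance and contrast ratio formulas published in WCAG 2.1; the function names are our own.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light, as defined in WCAG 2.1."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """Check the WCAG 2.1 level AA contrast threshold for text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


white = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), white), 2))   # black text on white
print(passes_aa((0x77, 0x77, 0x77), white))         # mid-gray text on white
```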
Why is it important to adapt electronic pages and documents to accessibility standards?
"The first and most important point is that removing digital barriers lets people with special needs use websites and documents on a level similar to everyone else. This opens up tremendous opportunities, among other things to obtain information and, in particular, to acquire additional competencies, which gives such people a better chance in the open labor market. The second important element is that accessible websites, mobile applications and documents let them educate themselves, develop their passions, etc. Another is, for example, dealing with official matters, shopping or using e-banking - nowadays I can't imagine being unable to make a transfer (e.g. to pay phone or Internet bills) because a site is inaccessible.
Another important argument is active participation in cultural life. Thanks to accessibility, such people have easier access to films and plays, and can get acquainted with an artist's work (descriptions of paintings, sculptures, etc.)," explains Sebastian Depta.
New Sages project makes it easier for people with disabilities to use the web and digital documents
The aim of the project was to produce an innovative solution that enables automatic adaptation of text and text-graphic documents to accessibility standards, as specified in WCAG 2.1 [5].
The solutions proposed by Sages use deep neural networks, OCR software and image analysis methods. They determine the structure of the document based on visual features and metadata, automatically generate natural-language descriptions of photos, tables, charts and drawings (in Polish, adaptable to other languages, especially English) and recognize these objects in context.
Solutions developed for documents are also applied to websites and applications.
Recognizing document structure and segmenting objects
The first crucial step is identifying the structure of the document. Recognizing the correct reading order of elements such as headings, text blocks (also in multi-page or poster layouts), pagination, footnotes, charts, tables, illustrations, etc. - also in documents that are available only as an image (scan, photo) - allows people with disabilities to navigate freely through the document using, for example, text-to-speech tools. The trained model for predicting elements in text documents makes it possible to scroll through text, skip specific information, or move to the next point or the next table.
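To illustrate what "reading order" means for a multi-column layout, the sketch below orders detected blocks column by column and top to bottom. It is a deliberately naive geometric heuristic, not the trained model described above; the `Block` type, its fields and the assumption of equal-width columns are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str    # e.g. "heading", "paragraph", "table", "footnote"
    x: float     # left edge of the block on the page
    y: float     # top edge of the block on the page
    text: str


def reading_order(blocks, page_width, columns=2):
    """Sort blocks column by column, and top to bottom within a column.

    Assumes equal-width columns; a block belongs to the column
    that contains its left edge.
    """
    column_width = page_width / columns

    def sort_key(block):
        column = min(int(block.x // column_width), columns - 1)
        return (column, block.y)

    return sorted(blocks, key=sort_key)


detected = [
    Block("paragraph", 300, 100, "Right column, first paragraph"),
    Block("footnote", 300, 400, "Footnote"),
    Block("paragraph", 0, 300, "Left column, second paragraph"),
    Block("heading", 0, 0, "Title"),
    Block("paragraph", 0, 100, "Left column, first paragraph"),
]
for block in reading_order(detected, page_width=600):
    print(block.kind, "-", block.text)
```

A heuristic like this breaks down on posters, sidebars or headings that span several columns, which is the kind of layout a learned model is meant to handle.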
Elements in a picture interpreted in context
The WCAG guidelines make it mandatory to use text equivalents for all non-text content. To achieve this on a large scale, automated solutions were needed.
The fact that an algorithm is able to recognize an object in a photo is no longer a surprise to anyone. On the other hand, creating a proper description of a photo in natural language is still not obvious. The problem may be the multifaceted nature of the information conveyed by the image. Photos often contain many elements that can be interpreted in different ways.
Thus, effectively replacing the image with text required not only recognizing the elements in the image, but also deciding which relationships between these elements are most relevant in context. The Sages project envisioned a mechanism that - thanks to an architecture based on deep neural networks - would identify the elements in an image, interpret the scene and generate a textual equivalent of the key information contained in the image in the context of the document's content.
What does this mean? Instead of a lengthy description of individual elements, the algorithm interprets what information is meant in the context of the entire text, taking into account adjacent elements and the original text caption. This allows the generated description to be short and succinct and, according to WCAG guidelines, to serve the same function as the image.
In addition, one element of the work was image normalization - correction of brightness and contrast - which also makes photos easier to read for visually impaired people.
Describing charts, figures and tables is still a challenge for the algorithm
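The article does not say which normalization method was used. As one simple possibility, the sketch below applies linear contrast stretching to a grayscale image stored as rows of 0-255 values, so that the darkest pixel becomes 0 and the brightest becomes 255. The function name and the list-of-lists image format are assumptions made for illustration.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so they span the full 0-255 range.

    `pixels` is a list of rows, each a list of integers from 0 to 255.
    A new image is returned; the input is not modified.
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        # A uniform image has no contrast to stretch.
        return [row[:] for row in pixels]
    scale = 255 / (highest - lowest)
    return [[round((value - lowest) * scale) for value in row] for row in pixels]


washed_out = [
    [50, 100],
    [150, 200],
]
print(stretch_contrast(washed_out))
```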
Adapting charts, tables and drawings to accessibility requirements requires a different kind of mechanism than photo interpretation. For comparison, a mechanism for describing photos would describe a chart as something like "A series of green bars of different lengths on a white background," while the procedure for describing charts aims to generate a description like "Bar chart. On the X-axis the age ranges are shown, on the Y-axis the population. For the value '30-40 years' the population is the most numerous and is ... people".
Interpreting a chart requires a description that captures key information and conclusions, and thus semantic interpretation: recognizing the data and the relationships between the various elements. The method developed by Sages generates textual equivalents comparable to a human interpretation of the chart, so it not only identifies the axes, their descriptions, scales, units and the values of the data series, but also notes key features of the data presented, for example inflection points, minimum, maximum, trend, etc.
Chart and table descriptions were generated automatically by a neural network architecture trained with the use of metadata (title, axis names, etc.). For tables specifically, an encoder-decoder network was trained.
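To make the target output concrete, the sketch below builds a description of this kind from already-extracted chart data using a fixed template. It is a rule-based stand-in for illustration only, not the neural approach described above; the function name, its parameters and the sample numbers are invented.

```python
def describe_bar_chart(x_label, y_label, categories, values):
    """Build a short text equivalent of a bar chart from its extracted data.

    Reports the axes and the categories with the highest and lowest values.
    """
    if not categories or len(categories) != len(values):
        raise ValueError("categories and values must be non-empty and equal in length")
    i_max = max(range(len(values)), key=lambda i: values[i])
    i_min = min(range(len(values)), key=lambda i: values[i])
    return (
        f"Bar chart. The X-axis shows {x_label}, the Y-axis shows {y_label}. "
        f"The highest value is {values[i_max]} for '{categories[i_max]}'; "
        f"the lowest is {values[i_min]} for '{categories[i_min]}'."
    )


# Illustrative data only; these numbers do not come from the project.
print(
    describe_bar_chart(
        "age ranges",
        "population",
        ["20-30 years", "30-40 years", "40-50 years"],
        [120, 180, 150],
    )
)
```

The hard part, which this sketch skips entirely, is recovering the axes, categories and values from the chart image in the first place.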
Another important element of the project was the adaptation of available text generation methods to the specifics of the Polish language.
Automatic identification of elements requiring clarification
The WCAG guidelines indicate the need to help people with smaller vocabularies by linking difficult terms, such as specialized vocabulary, jargon or abbreviations, to their definitions. This makes digital documents and websites easier to understand for people who are deaf, have less education or have intellectual disabilities.
At this stage of the work, the Sages team focused on creating an algorithm that automates the process of linking difficult words to their definitions. The key here was to identify which words need clarification. A Polish WordNet-type ontology was used as the definition dictionary. The task also involved a complex process of Word Sense Disambiguation.
For this purpose, a model based on unsupervised learning was used, which can be applied to words the algorithm has not seen before. Previous solutions for linking concepts with definitions focused on disambiguating words in closed sets. Sages, on the other hand, has introduced a new model in which the algorithm effectively identifies words and assigns definitions in previously unanalyzed texts.
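Word Sense Disambiguation means choosing which dictionary definition of a word fits a given sentence. The sketch below shows the idea with a simplified Lesk-style overlap heuristic: it picks the definition that shares the most words with the surrounding text. This is a classic textbook technique used here only as an illustration; it is not the unsupervised model developed by Sages, and the function names, the stopword list and the sample glosses are assumptions.

```python
import re

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "that", "is", "for"}


def _content_words(text):
    """Lowercased words of a text, without stopwords."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def pick_definition(word, context, definitions):
    """Return the definition sharing the most content words with the context.

    On a tie, the definition listed first wins.
    """
    context_words = _content_words(context) - {word.lower()}
    return max(definitions, key=lambda d: len(_content_words(d) & context_words))


definitions_of_bank = [
    "an institution that accepts deposits and lends money",
    "sloping land beside a river",
]
print(pick_definition("bank", "He sat on the bank of the river", definitions_of_bank))
```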
Automation of other WCAG 2.1 guidelines
In further phases of the work, the other guidelines identified by WCAG were automated. Specifically:
- introduction of methods ensuring that content can be read without relying on the color of individual elements, while remaining just as useful for readers who do perceive color differences
- generation of natural language description of references and links in the text
- identification of languages present in the document
All the developed methods have been implemented in an application that takes as input a document in the form of a set of image files or a PDF, DOCX, EPUB, MOBI, etc. file, and returns a PDF document annotated according to the WCAG 2.1 guidelines. It also works in software-as-a-service mode, i.e. as software available on the Web.
Sages solutions in practice
The mechanism reads both scientific publications and infographics or memes
Because it handles charts, tables and drawings, the mechanism is applicable to scientific publications and official documents. However, its potential does not end with the annotation of such files.
"Basically, this mechanism is needed in very many places, not only in the context of documents, but also on Twitter or Facebook, or on websites and web applications. This spectrum of applications is very wide," points out Lukasz Kobylinski, CSO of Sages and project manager.
In the era of image culture and social media, where information is often conveyed through infographics and images (cf. the Ministry of Health's Facebook messages), many people are excluded from receiving this type of content. Removing digital barriers for people with disabilities means increasing their access to culture and social life, to skills and education, and to work.
Why did we do this? Why Sages?
"We have experience in the technologies and methods needed to accomplish this task. I am confident that we have assembled the best possible team. This project has the potential to change the reality of many people who, until now, have not had access to information or have had this access made difficult. I am glad that we are carrying this out and I hope that it will actually translate into real change for these people. This is an additional value of this project," says Lukasz Kobylinski.
In addition to the social dimension, the Sages solution also has great business potential. The project is applicable, for example, in a commercial document annotation service.
Sages sees the future in artificial intelligence
This is not the company's first such innovative project. Sages works with institutions, universities and companies that have a vision of changing the world using machine learning and artificial intelligence.
Sages' portfolio includes projects such as customer service based on artificial intelligence, creating a voice assistant for e-commerce or optimizing the Big Data processing pipeline, as well as Omega-PSIR software, which is the No. 1 scientific information system in Poland and No. 5 in the world when it comes to the number of software implementations of this type.
Sages is made up of specialists with experience in natural language processing, image processing, system security and data science. They combine research with business practice and help transform an idea or research project to bring it to market. The company is open to challenges and welcomes collaboration on NLP and AI/ML projects.
[4] M. Szczygielska, Accessible Multimedia, https://www.widzialni.org/container/Dostepne-multimedia.pdf