
Global Graphics : Improving PDF accessibility with Structure Tagging

12/02/2020 | 04:47am EST

In this week's post, Global Graphics Software's principal engineer Andrew Cardy explores the structure tagging API in the Mako™ Core SDK. This feature is particularly valuable as it allows developers to create PDFs that can be read by screen readers, such as Jaws®. This helps blind or partially sighted users unlock the content of a PDF. Here, Andy explains how to use the structure tagging API in Mako to tag both text and images:

What can we Structure Tag?

Before I begin, let's talk about PDF. PDF is a fixed-format document: you create it once, and it should (aside from font embedding or rendering issues) look identical across machines. That is great for ensuring your document looks right on your users' devices, but the downside is that some PDF generators create fixed content that is ordered in a way that is hard for screen readers to understand.

Luckily Mako also has an API for page layout analysis. This API will analyze the structure of the PDF, and using various heuristics and techniques, will group the text on the page together in horizontal runs and vertical columns. It'll then assign a page reading order.
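To give a feel for what this kind of analysis involves, here is a minimal sketch of one simple heuristic: group runs of text into columns by horizontal overlap, then read columns left to right and runs top to bottom. This is not Mako's algorithm, which is not described here; the TextRun and Column types and both function names are hypothetical, made up for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a run of text found on a page (not a Mako type).
// Coordinates assume a top-left origin, so a smaller y is higher on the page.
struct TextRun {
    double x;          // left edge
    double y;          // top edge
    double width;      // horizontal extent
    std::string text;  // the characters in the run
};

// A column is a horizontal band plus the runs that fall inside it.
struct Column {
    double left;
    double right;
    std::vector<TextRun> runs;
};

// Assign each run to the first column whose band it overlaps horizontally,
// widening that band as needed; otherwise start a new column.
std::vector<Column> groupIntoColumns(const std::vector<TextRun>& runs) {
    std::vector<Column> columns;
    for (const TextRun& run : runs) {
        const double runRight = run.x + run.width;
        auto match = std::find_if(columns.begin(), columns.end(),
            [&](const Column& c) { return run.x < c.right && runRight > c.left; });
        if (match == columns.end()) {
            Column column{run.x, runRight, {run}};
            columns.push_back(column);
        } else {
            match->left = std::min(match->left, run.x);
            match->right = std::max(match->right, runRight);
            match->runs.push_back(run);
        }
    }
    // Columns read left to right; runs inside a column read top to bottom.
    std::sort(columns.begin(), columns.end(),
              [](const Column& a, const Column& b) { return a.left < b.left; });
    for (Column& column : columns) {
        std::sort(column.runs.begin(), column.runs.end(),
                  [](const TextRun& a, const TextRun& b) { return a.y < b.y; });
    }
    return columns;
}

// Flatten the columns into the order a reader would encounter the text.
std::vector<std::string> readingOrder(const std::vector<TextRun>& runs) {
    std::vector<std::string> order;
    for (const Column& column : groupIntoColumns(runs)) {
        for (const TextRun& run : column.runs) {
            order.push_back(run.text);
        }
    }
    return order;
}
```

Given four runs laid out as two side-by-side columns, readingOrder returns the left column top to bottom, followed by the right column. A real analyzer also has to cope with headings that span columns, tables, captions and rotated text, all of which this sketch ignores.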

The structure tagging API makes it easy to take the layout analysis of the page and use it to tag and structure the text. So, while we're tagging the images, we'll tag the text too!

Mako's Structure Tagging API

Mako's structure tagging API is simple to use. Our architect has done a great job of taking the complicated PDF specification and distilling it down to a number of useful APIs.

Let's take a look at how we use them to structure a document from start to finish:

Setting the Structure Root

Setting the structure root is straightforward. First, we create an instance of IStructure and set it on the document.

Next, we create a document-level IStructureElement and append it to the structure we've just created.

//Get the document that we want to tag from the assembly
IDocumentPtr document = assembly->getDocument(0);
//Create the root structure
const IStructurePtr structure = IStructure::create(jawsMako);
document->setStructure(structure);
//Create the document structure element
IStructureElementPtr documentElement = IStructureElement::create(jawsMako, "Document");
documentElement->setTitle("Mako SDK Tagged Document");
documentElement->setLanguage("en");
//Add the document structure element to the root structure
structure->appendElement(documentElement);

One thing I learnt the hard way is that Acrobat will not allow child structures to be read by a screen reader if their parent has alternative (alt) text set.

Add alternate text only to tags that don't have child tags. Adding alternate text to a parent tag prevents a screen reader from reading any of that tag's child tags. (Adobe Acrobat help)
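The rule is easy to model. The sketch below uses a hypothetical, simplified Element type (not Mako's IStructureElement) and assumes a reader that follows the quoted Acrobat behaviour literally: an element with alt text is announced as that text and its children are skipped. A second helper lists the elements that would hide their children this way.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of a structure element (not the Mako type).
struct Element {
    std::string type;        // e.g. "Document", "P", "Span", "Figure"
    std::string altText;     // empty when no alternate text is set
    std::string actualText;  // text content, typically on leaf elements
    std::vector<Element> children;
};

// Collect what a reader following the quoted rule would announce:
// alt text on an element replaces everything beneath it.
void collectSpoken(const Element& element, std::vector<std::string>& spoken) {
    if (!element.altText.empty()) {
        spoken.push_back(element.altText);
        return;  // children are hidden behind the parent's alt text
    }
    if (!element.actualText.empty()) {
        spoken.push_back(element.actualText);
    }
    for (const Element& child : element.children) {
        collectSpoken(child, spoken);
    }
}

// Report the type of every element that has alt text and also has children.
void findMaskingParents(const Element& element, std::vector<std::string>& offenders) {
    if (!element.altText.empty() && !element.children.empty()) {
        offenders.push_back(element.type);
    }
    for (const Element& child : element.children) {
        findMaskingParents(child, offenders);
    }
}
```

With a Document containing a paragraph and a figure, collectSpoken returns the paragraph text and the figure description. Set alt text on the Document element and it returns only that one string, which is the symptom described next. Running something like findMaskingParents before saving is one way to catch the mistake early.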

Originally, when I started this research project, I had alt text set at the document level, which caused all sorts of confusion when my text and image alt text wasn't read!

Using the Layout Analysis API

Now that we've structured the document, it's time to structure the text. Firstly, we want to understand the layout of the page. To do this, we use IPageLayout. We give it a reference to the page we want to analyze, then perform the analysis on it.

IPageLayoutPtr pageLayout = IPageLayout::create(jawsMako, page->edit());
pageLayout->analyze();

Now that the page has been analyzed, it's easy to iterate through the columns and nodes in the page layout data.

//Get the page layout data from the analysis
const IPageLayoutDataPtr layoutData = pageLayout->getLayoutData();
//Go through the columns that have been found
for (uint32 columnIndex = 0; columnIndex < layoutData->getNumberOfColumns(); columnIndex++)
{
    //Create a paragraph structure element (this will contain the text structures)
    IStructureElementPtr paragraphElement = IStructureElement::create(jawsMako, "P");
    documentElement->appendElement(structure, paragraphElement);
    //Iterate through the nodes (runs of text) in the column
    IPageLayoutNodeCollection nodeCollection = layoutData->getColumn(columnIndex);
    for (uint32 nodeIndex = 0; nodeIndex < nodeCollection.size(); ++nodeIndex)
    {
        const IPageLayoutNodePtr node = nodeCollection[nodeIndex];
        //Skip anything that isn't a text run
        if (node->getType() != ePLTTextRun)
            continue;
        //Create and add the text structure
        ...
    }
}

Tagging the text

Once we've found our text runs, we can tag our text with a span IStructureElement. We append this new structure element to the parent paragraph created while we were iterating over the columns.

We also tag the original source Mako DOM node against the new span element.

//Create a span element to hold the text run
IStructureElementPtr spanElement = IStructureElement::create(jawsMako, "Span");
//Set the actual and alternative text, as read by the screen reader
spanElement->setAlternate(unicode);
spanElement->setActualText(unicode);
//Add the span element to the paragraph
paragraphElement->appendElement(structure, spanElement);
//Tag the span structure element to the source Mako DOM node
structure->tagNode(spanElement, page, glyphs);

Tagging the images

Once the text is structured, we can structure the images too.

Earlier, I used Microsoft's Vision API to take the images in the document and give us a textual description of them. We can now take this textual description and add it to a figure IStructureElement.

Again, we make sure we tag the new figure structure element against the original source Mako DOM image.

//Create the image structure element (figure in PDF specification terminology)
IStructureElementPtr imageElement = IStructureElement::create(jawsMako, "Figure");
imageElement->setTitle("Auto Captioned Image");
imageElement->setLanguage("en");
//Set the description (in this case, from Microsoft's computer vision API)
imageElement->setAlternate(description);
//Then put the image structure into the parent. For example,
//this could be a paragraph or document structure element.
parentElement->appendElement(structure, imageElement);
//Tag the figure structure element to the source Mako DOM image
structure->tagNode(imageElement, page, image);

Notifying Readers of the Structure Tags

The last thing we need to do is set some metadata in the document's assembly, which is straightforward. Setting this metadata helps viewers identify that the document is structure tagged.

//Flag in the metadata that we are tagged
IDOMMetadataPtr metadata = assembly->getJobMetadata();
metadata->setProperty(IDOMMetadata::ePDFInfo, "Marked", PValue(true));
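
For context, the PDF specification signals a tagged document with a Marked entry set to true in the MarkInfo dictionary of the document catalog. How Mako maps the property above onto the file is an assumption here, but as a rough illustration of what a viewer looks for, the sketch below checks a catalog dictionary that has already been extracted as plain text. The function names are hypothetical, and the check only handles an inline MarkInfo dictionary, not an indirect reference.

```cpp
#include <cctype>
#include <string>

// Advance past any whitespace starting at pos and return the new position.
static std::string::size_type skipSpace(const std::string& s, std::string::size_type pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}

// Naive check: does the catalog text contain "/MarkInfo << ... /Marked true ... >>"?
// Assumes the MarkInfo dictionary is written inline and uncompressed.
bool catalogSaysMarked(const std::string& catalog) {
    const std::string markInfoKey = "/MarkInfo";
    const std::string markedKey = "/Marked";
    const std::string::size_type npos = std::string::npos;

    const std::string::size_type key = catalog.find(markInfoKey);
    if (key == npos) return false;

    // The value must be an inline dictionary; an indirect reference is not followed.
    const std::string::size_type open = skipSpace(catalog, key + markInfoKey.size());
    if (catalog.compare(open, 2, "<<") != 0) return false;

    const std::string::size_type close = catalog.find(">>", open);
    if (close == npos) return false;

    const std::string dict = catalog.substr(open + 2, close - open - 2);
    const std::string::size_type marked = dict.find(markedKey);
    if (marked == npos) return false;

    const std::string::size_type value = skipSpace(dict, marked + markedKey.size());
    return dict.compare(value, 4, "true") == 0;
}
```

A catalog such as `<< /Type /Catalog /MarkInfo << /Marked true >> >>` passes the check, while one with no MarkInfo entry, or with Marked set to false, does not. A real reader would parse the file properly rather than search text.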

Putting it all Together

So, after we've automated all of that, we now get a nice structure, which, on the whole, flows well and reads well.

We can see this structure in Acrobat DC:

[Link]

And if we take a look at one of the images, we can see our figure structure now has some alternative text, generated by Microsoft's Vision API. The alt text will be read by screen readers.

[Link] Figure properties dialogue

It's not perfect, but then taking a look at how Adobe handles text selection quite nicely illustrates just how hard it is to get it right. In the image below, I've attempted to select the whole of the title text in Acrobat.

[Link] Layout analysis is hard to get right!

In comparison, our page layout analysis seems to have gotten these particular text runs spot on. But how does it fare with the Jaws screen reader? Let's see it in action!

[Link]

So, it does a pretty good job. The images have captions automatically generated, there is a sense of flow and most of the content reads in the correct order. Not bad.

Printing accessible PDFs

You may be aware that the Mako SDK comes with a sample virtual printer driver that can print to PDF. I want to take this one step further and add our accessibility structure tagging tool to the printer driver. This way, we could print from any application, and the output would be an accessible PDF!

In the video below I've found an interesting blog post that I want to save and read offline. If I were partially sighted, it may be somewhat problematic as the PDF printer in Windows 10 doesn't provide structure tagging, meaning that the PDF I create may not work so well with my combination of PDF reader and screen reader. However, if I throw in my Mako-based structure and image tagger, we'll see if it can help!

[Link]

Of course, your mileage will vary and the quality of the tagging will depend on the quality and complexity of the source document. The thing is, structural analysis is a hard problem, made harder sometimes by poorly behaving generators, but that's another topic in itself. Until all PDF files are created perfectly, we'll do the best we can!

Want to give it a go?

Please do get in touch if you're interested in having a play with the technology, or just want to chat about it.

[Link] Andy Cardy, Principal Engineer at Global Graphics Software

Andy Cardy is a Principal Engineer for Global Graphics Software and a Developer Advocate for the Mako SDK.

Find out more about Mako's features in Andy's coding demo:

SHARPEN THE SAW - A LIVE CODING DEMO USING MAKO™

In this session Andy uses coding in C++ and C# to show you three complex tasks that you can easily achieve with Mako:
• PDF rendering - visualizing PDF for screen and print (15 mins)
• Using Mako in Cloud-ready frameworks (15 mins)
• Analyzing and editing with the Mako Document Object Model (15 mins)


Disclaimer

Global Graphics plc published this content on 26 November 2020 and is solely responsible for the information contained therein. Distributed by Public, unedited and unaltered, on 02 December 2020 09:46:08 UTC


© Publicnow 2020