Like most aspects of American society, the judicial system deals with a diverse population. Many among that population have limited English proficiency and must rely on English-speaking family members or friends who may not be fluent speakers themselves. Even those with a good mastery of the English language may struggle with legal terminology and the unique vocabulary associated with court proceedings.

For perspective, according to the US Census Bureau’s 2018 American Community Survey, a record 67.3 million US residents (native-born and immigrants) spoke a language other than English at home. Of those who spoke a foreign language at home, 38% told the Census Bureau they speak English “less than very well.” If current demographic trends continue, the 2020 Census will likely report even higher numbers.

To help ensure everyone, including those with limited English proficiency, can understand and participate in the US legal system, courts often employ a variety of tactics. Among them: bilingual staff and third-party telephonic interpreter services.

Both can be costly. They're also often limited to a handful of primary languages. Given that at least 350 different languages are spoken in the US, that leaves a vast segment of the population without reliable support for communication and comprehension.

In addition, it's difficult to recruit and retain qualified interpreters and bilingual staff who are well trained in court processes and understand the associated terminology. Ongoing training is time-consuming and resource-intensive.

Voice Assistant Solutions

To mitigate potential language barriers more efficiently, it’s possible for court systems and other government agencies to use voice-to-text interpretation services. Such a project would entail two “voice assistant” apps.

The first would enable court staff and visitors who speak other languages, or who have limited English proficiency, to communicate without an interpreter. The second would allow those with limited English proficiency to access information from a court-sponsored knowledge base without interacting with court staff.

In this blog, we’ll discuss exactly how such apps could be built and the benefits they’d provide for any government agency that faces language translation issues.

The User Interface (UI)

The first step could be to create a UI that would serve as the common front end for the two voice assistant apps. It would accommodate new capabilities and use cases as additional needs are identified and new technologies emerge.

The UI would be built using ReactJS, a JavaScript library for building interactive, component-based interfaces. React's component model promotes stable, maintainable code. React Router's dynamic client-side routing could be used to build the apps as a single-page web app with navigation, so the page never refreshes as the user moves between views.
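To make this concrete, here is a minimal sketch of what the route table for such a single-page app could look like. The paths and view names are illustrative assumptions, not part of the original design; with React Router they would map to components via its router configuration, and the small matcher below stands in for the lookup React Router performs internally.

```typescript
// Hypothetical route table for the single-page app. The view field names a
// component; in the real app React Router would render the component itself.
interface AppRoute {
  path: string;
  view: string;
}

const routes: AppRoute[] = [
  { path: "/", view: "Home" },
  { path: "/assistant", view: "AutomatedVoiceAssistant" },
  { path: "/smart-assistant", view: "SmartVoiceAssistant" },
];

// Minimal client-side matcher: this is the lookup React Router performs
// internally when swapping views without a full page refresh.
function matchRoute(path: string): string {
  const hit = routes.find((r) => r.path === path);
  return hit ? hit.view : "NotFound";
}
```

Navigating to `/assistant` would render the Automated Voice Assistant view; unknown paths fall through to a not-found view.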

MobX could provide the mechanism to store and update the application state that React could then use. The library would make state management simple and scalable by transparently applying functional reactive programming (TFRP).
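As a sketch of what that state might look like, the class below shows the shape a conversation store could take. The field and method names are assumptions for illustration; in the real app, calling `makeAutoObservable(this)` from the mobx package in the constructor would make the fields observable and the methods actions, so React components re-render automatically when they change. That wiring is omitted here so the sketch runs standalone.

```typescript
// Sketch of the conversation state a MobX-style store could hold.
class ConversationStore {
  transcript: string[] = [];     // progressive transcription segments
  translation = "";              // latest translated text
  status: "idle" | "listening" | "translating" = "idle";

  // In MobX these would be actions that trigger reactive updates.
  addSegment(segment: string) {
    this.transcript.push(segment);
  }

  setTranslation(text: string) {
    this.translation = text;
    this.status = "idle";
  }
}

const store = new ConversationStore();
store.addSegment("Buenos días");
store.setTranslation("Good morning");
```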

All code would be written in TypeScript, an open-source language that builds on JavaScript. TypeScript adds static types, which make code easier to read and catch errors before runtime, and it compiles to plain JavaScript for broad browser compatibility.

Front-end UI web development could be facilitated through the use of Sass, a stable, professional-grade CSS extension language, and Twitter Bootstrap, an open-source CSS framework directed at responsive, mobile-first front-end web development.

The AWS SDK for JavaScript could be used at the network layer, where direct integration of AWS services is suitable. The UI could be exposed via Amazon API Gateway, with cross-site scripting protection and security enforced via Transport Layer Security (TLS), AWS Signature Version 4, and custom Amazon Cognito authorizers. The apps would be available only from the court network and to authorized personnel and locations.

To help ensure widespread usability, the app would work in most web browsers released after 2015, including Google Chrome, Chromium, Safari, Microsoft Edge, Firefox, and Opera. All modern platforms, such as macOS, Linux, Windows, iOS, and Android, would also be supported.

Architecture Diagram


App 1: Automated Voice Assistant

The Automated Voice Assistant app would serve as a tool to help people speaking different languages to understand each other, with the specific use case of court visitors interacting with court staff.

When someone using the app begins to talk, the audio stream would be routed to an Amazon Transcribe streaming API. The speech would be recognized in real-time, with breaks in the audio stream based on natural speech segments, such as a change in the speaker or a pause in the audio.
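Amazon Transcribe's streaming API marks in-progress results with an `IsPartial` flag and finalizes each result once a segment is complete. The sketch below models that behavior with simplified result objects (the shapes are a stripped-down assumption, not the SDK's full types) to show how an app could keep only the finalized segments.

```typescript
// Simplified stand-in for Amazon Transcribe's streaming result objects,
// which flag in-progress hypotheses with IsPartial until a segment ends.
interface TranscriptResult {
  transcript: string;
  isPartial: boolean;
}

// Keeps only finalized segments from a stream of progressive results.
function collectFinalSegments(results: TranscriptResult[]): string[] {
  return results.filter((r) => !r.isPartial).map((r) => r.transcript);
}

// A typical stream: partial hypotheses grow until the segment is finalized.
const stream: TranscriptResult[] = [
  { transcript: "Where do", isPartial: true },
  { transcript: "Where do I file", isPartial: true },
  { transcript: "Where do I file this form?", isPartial: false },
];
const segments = collectFinalSegments(stream);
```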

The transcription would then be returned progressively to the app, with each response containing more of the transcribed speech until the entire segment was complete.

The transcription could then be translated into the court employee’s language (English) using Amazon Translate. This neural machine translation service uses deep learning models to deliver accurate, natural-sounding translations.
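A translation request for Amazon Translate's TranslateText operation could be built as shown below. The helper and the `"auto"` source code are illustrative choices: `"auto"` asks the service to detect the visitor's language, while the target is fixed to the court employee's language. In the real app, the AWS SDK for JavaScript would send this input via its TranslateText command.

```typescript
// Input shape for Amazon Translate's TranslateText operation.
interface TranslateTextInput {
  Text: string;
  SourceLanguageCode: string;
  TargetLanguageCode: string;
}

// Hypothetical helper: wraps transcribed speech in a translation request.
function buildTranslationRequest(text: string): TranslateTextInput {
  return {
    Text: text,
    SourceLanguageCode: "auto", // let the service detect the visitor's language
    TargetLanguageCode: "en",   // the court employee's language
  };
}

const request = buildTranslationRequest("¿Dónde presento este formulario?");
```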

From there, Amazon Polly could convert the text into lifelike speech. This text-to-speech service relies on advanced deep learning technologies to synthesize natural-sounding human speech. By using Amazon Polly’s neural text-to-speech (NTTS) voices, the app could further create a more “human-like” conversation.
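The corresponding Polly request could look like the sketch below. Selecting the neural engine is what enables the more natural NTTS voices mentioned above; the specific voice ID and output format here are illustrative assumptions.

```typescript
// Input shape for Amazon Polly's SynthesizeSpeech operation.
interface SynthesizeSpeechInput {
  Text: string;
  OutputFormat: "mp3" | "ogg_vorbis" | "pcm";
  VoiceId: string;
  Engine: "standard" | "neural";
}

// Hypothetical helper: wraps translated text in a speech-synthesis request.
function buildSpeechRequest(text: string): SynthesizeSpeechInput {
  return {
    Text: text,
    OutputFormat: "mp3",
    VoiceId: "Joanna",  // an illustrative US English voice
    Engine: "neural",   // selects the more lifelike NTTS voices
  };
}

const speech = buildSpeechRequest("Where do I file this form?");
```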

When the audio is ready, the app could use Amazon Pinpoint, an outbound and inbound marketing communications service, to send a push notification to the end device.

AWS Step Functions, a serverless function orchestrator, could sequence the multiple AWS services into event-driven workflows that maintain the application state. The output of one step would act as an input to the next.
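The workflow described above could be expressed in Amazon States Language roughly as follows, shown here as the object the state machine definition would serialize to. The state names and the Lambda function ARNs are placeholders, not prescribed values; the key point is that each Task state's output becomes the next state's input.

```typescript
// Sketch of the Step Functions workflow in Amazon States Language.
// Resource ARNs are placeholders for illustration only.
const workflowDefinition = {
  Comment: "Transcribe -> Translate -> Synthesize -> Notify",
  StartAt: "TranscribeSpeech",
  States: {
    TranscribeSpeech: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT:function:transcribe-step",
      Next: "TranslateText",
    },
    TranslateText: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT:function:translate-step",
      Next: "SynthesizeSpeech",
    },
    SynthesizeSpeech: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT:function:polly-step",
      Next: "NotifyDevice",
    },
    NotifyDevice: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT:function:pinpoint-step",
      End: true,
    },
  },
};
```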

At every step, the ML services' outputs could be securely stored in Amazon Simple Storage Service (Amazon S3) object storage buckets. This would make it easier to analyze them for misinterpreted speech, incorrectly supplied advice, and even offensive content. The content could also be used to improve the machine learning models further.
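One way to organize those stored outputs is a date-partitioned key scheme, sketched below, so that a session's artifacts can be located quickly for audit. The bucket layout, prefix, and naming are illustrative assumptions, not part of the original design.

```typescript
// Hypothetical S3 key scheme for archiving each workflow step's output so
// sessions can later be audited for misinterpreted speech or offensive
// content. The prefix layout is illustrative only.
function buildAuditKey(sessionId: string, step: string, timestamp: Date): string {
  const day = timestamp.toISOString().slice(0, 10); // YYYY-MM-DD partition
  return `audit/${day}/${sessionId}/${step}.json`;
}

const key = buildAuditKey("session-042", "translate", new Date("2021-03-15T10:30:00Z"));
// A PutObject call against the audit bucket would store the payload under this key.
```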

Architecture Diagrams


App 2: Smart Voice Assistant

The Smart Voice Assistant app could be architected much the same as the Automated Voice Assistant. The primary difference is that it would include an AI-powered backend: an app user could ask a question that would be answered with information from a court-sponsored knowledge base rather than by a court employee.

It would follow the same workflow as the Automated Voice Assistant until the translated text is available. At that point, instead of being read by a court employee through the app, the text would be passed to Amazon Lex, a service for building conversational interfaces using the same deep learning technologies that power the well-known Amazon Alexa.

When someone using the app asks a question, a designated AWS Lambda function could pass it to Amazon Kendra, an AI-powered search service capable of answering free-form questions from a knowledge base. The response would be analyzed by Amazon Comprehend, a natural language processing (NLP) service that uses machine learning to find text insights and relationships. Amazon Polly could then convert the text into lifelike speech.
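The knowledge-base lookup that Lambda function performs could be built around a Kendra query input like the sketch below. The index ID is a placeholder, and the helper is illustrative; the top-ranked answer returned by Kendra would then flow through Amazon Comprehend and Amazon Polly as described above.

```typescript
// Input shape for an Amazon Kendra query against the court-sponsored
// knowledge base. The index ID is a placeholder.
interface KendraQueryInput {
  IndexId: string;
  QueryText: string;
}

// Hypothetical helper run inside the designated Lambda function.
function buildKnowledgeBaseQuery(question: string): KendraQueryInput {
  return {
    IndexId: "EXAMPLE-KENDRA-INDEX-ID", // placeholder index identifier
    QueryText: question,
  };
}

const query = buildKnowledgeBaseQuery("How do I request an interpreter?");
```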

Architecture Diagrams


Results and Benefits

What impact would voice assistant apps like this have in US court systems and government agencies? They would immediately enhance communication and comprehension for people with limited English proficiency who interact with the courts, helping ensure equal access to justice.

The apps would also ease some of the burden on the court system. Resources allocated for interpretation and translation services could be redirected to other purposes. Fewer errors and instances of miscommunication are also potential benefits, and the built-in security mechanisms would strengthen overall security and privacy.

Finally, the apps would be cost-efficient. A serverless architecture would eliminate the need for dedicated infrastructure and the system administrators to maintain it, freeing those resources for other needs. Compute costs would be based on the capacity actually used, so there would be no overprovisioning or paying for idle resources. Capacity could scale in milliseconds, not minutes, to meet fluctuating demand and high-traffic periods.

Is It Time to Talk?

In this blog, we looked at how two apps could potentially be created to provide impactful voice-to-text interpretation services. If you’re interested in learning how you can integrate machine learning, AI, voice-to-text technologies, and more into your apps, talk to ClearScale. We have extensive experience in integrating a variety of AWS services to create custom, often complex apps, including those employing AI and other advanced technologies.