Dr John Yardley, CEO of Threads Software Ltd, shares his thoughts on improving business productivity through speech recognition.
In the current climate, business productivity has never been more important. And making effective use of the best technology and communications methods is key to that.
If your business has had a telephone system installed in the last five years, chances are that you have a VoIP telephone system. VoIP stands for “Voice over Internet Protocol”, and in plain terms, this means that telephone connections are made over a standard office data network. Analogue voice signals are converted into digits (0s and 1s) to become data, and because VoIP telephone handsets can be connected using existing data networks, subscribers are no longer locked into expensive proprietary systems using dedicated (and messy) wiring.
Why Voice?
One reason many people continue to use emails and texting when a phone call would be more effective is that email provides a permanent record – one you can find at a later date. Yet a phone call is immediate, interactive and in many cases, what’s needed to close a transaction. It’s often assumed that phone calls evaporate into the ether even though nowadays, most VoIP phone systems can record calls, something many users are unaware of. Even so, it can still be difficult to retrieve a specific call unless you can remember exactly the date and time it was made. It is clearly impractical to listen to dozens of calls made over an estimated time frame to find one specific call.
Even for small businesses, the technology now exists to enable them to search recorded phone calls in the same way as emails using Automatic Speech Recognition or ASR. The technology is now well within the budgets of most businesses. But you do need to have a VoIP phone system – even if the system itself does not record the calls – and to be realistic about how well current ASR systems perform and what they’re best at.
ASR Today
People often judge the quality of ASR by those frustrating chatbot interactions typically forced upon customers by banks – the ones that request a “yes” or “no” answer, make you repeat it 10 times, and then finally give up or, if you are lucky, pass the transaction to a human operator. Customers’ time costs the banks nothing, so they have little incentive to provide good systems.
So it may also come as a surprise that computers are often as good as humans at recognising individual words – or “word spotting.” They are less good at transcribing complete conversations, which is better described as speech understanding, but that said, ASR systems can correctly transcribe anything between 60 and 90 percent of a telephone conversation – and this is improving all the time.
To be really effective when searching either emails or phone calls, it’s best to search for unusual words – that is, those words that occur relatively infrequently. Unusual words tend to be phonetically more complex and hence easier for ASR systems to spot.
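As a rough illustration of why unusual words make better search terms, the sketch below counts how many transcripts each word appears in and picks the rarest words from a query. The transcripts, function names and query are invented for the example, and a real system would work on ASR output rather than clean text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a transcript and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def document_frequency(transcripts):
    """Count how many transcripts each word appears in."""
    counts = Counter()
    for text in transcripts:
        counts.update(set(tokenize(text)))
    return counts


def rarest_terms(query, transcripts, top_n=2):
    """Return the query words found in the fewest transcripts, rarest first."""
    df = document_frequency(transcripts)
    # Keep each query word once, and only if it occurs in some transcript.
    words = [w for w in dict.fromkeys(tokenize(query)) if df[w] > 0]
    return sorted(words, key=lambda w: df[w])[:top_n]


# Invented transcripts, for illustration only.
transcripts = [
    "thanks for the call about the invoice",
    "the delivery of the turbocharger is delayed",
    "can you call me about the delivery",
]

print(rarest_terms("the call about the turbocharger delivery", transcripts))
```

Here "turbocharger" appears in only one transcript, so it narrows the search far more than "the" or "call" would.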
Call transcription (ASR) currently costs about £3.00 per hour, and the price is falling. In a business where employees are on the phone all day, this could mount up, but even if a business decides that the ability to search any call is not worth the cost, it’s still worthwhile recording calls. ASR can be applied at any time in the future, and as time goes on ASR will improve – at some point outperforming humans. If calls are not recorded when they are made, the opportunity to transcribe and search them will be lost forever and it will never be possible to benefit from the information they contain.
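To get a feel for how the cost might mount up, here is a back-of-the-envelope calculation. It assumes the £3.00 rate applies per hour of recorded call time. The staff numbers, hours on the phone and working days are made-up inputs, not figures from any particular business.

```python
def monthly_transcription_cost(employees, call_hours_per_day,
                               working_days=20, rate_per_hour=3.00):
    """Estimate the monthly ASR bill in pounds.

    Assumes the rate is charged per hour of recorded call time.
    """
    total_hours = employees * call_hours_per_day * working_days
    return total_hours * rate_per_hour


# Hypothetical office: 10 people, each on the phone 4 hours a day.
cost = monthly_transcription_cost(employees=10, call_hours_per_day=4)
print(f"Estimated monthly cost: £{cost:,.2f}")  # Estimated monthly cost: £2,400.00
```

Changing the inputs shows how quickly the bill scales with call volume, which is why some businesses may prefer to record everything now and transcribe selectively later.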
The Practicalities
Installing a VoIP telephone system is straightforward. In most cases, an existing local network and internet connection may be used. An early decision to make is where to locate the Private Branch Exchange or PBX. A PBX is like a mini telephone exchange that allows internal extensions to connect to one another and to the public network. A local (office-based) VoIP PBX requires some hardware – just a PC-type computer – to connect handsets in the office network to the public telephone network. Alternatively, a PBX may be located outside the subscriber’s offices – perhaps in the Cloud.
As employees often use a mobile phone as their main method of communication, you may well ask how this fits with ASR. Ironically, the answer is not particularly well. Although mobiles use the same VoIP technology, the operators’ networks (Vodafone, O2, etc) are not “open” in the same way as the Internet is. This means that in order to be transcribed, calls have to be routed via an office PBX, captured by the network service provider, or captured before they leave the mobile phone. Provided VoIP is used, it’s possible to automatically ingest phone calls directly from a subscriber’s network so they can be stored (recorded) and indexed in a database for easy retrieval.
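The idea of storing and indexing calls for retrieval can be sketched with a small database. The example below is only an illustration, not how any particular product works: it uses Python's built-in sqlite3 module, and the table names, column names and sample calls are all hypothetical. Each transcript is saved alongside the call's date and caller, and every word is written to an index table so that a call can be found by a word rather than by its date and time.

```python
import re
import sqlite3


def open_store(path=":memory:"):
    """Create (or open) a call store with a simple word index."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "id INTEGER PRIMARY KEY, started_at TEXT, caller TEXT, transcript TEXT)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS word_index ("
        "word TEXT, call_id INTEGER, PRIMARY KEY (word, call_id))"
    )
    return db


def add_call(db, started_at, caller, transcript):
    """Store one transcribed call and index each distinct word in it."""
    cur = db.execute(
        "INSERT INTO calls (started_at, caller, transcript) VALUES (?, ?, ?)",
        (started_at, caller, transcript),
    )
    call_id = cur.lastrowid
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    db.executemany(
        "INSERT OR IGNORE INTO word_index (word, call_id) VALUES (?, ?)",
        [(w, call_id) for w in words],
    )
    db.commit()
    return call_id


def find_calls(db, word):
    """Return (id, started_at, caller) for every call containing the word."""
    rows = db.execute(
        "SELECT c.id, c.started_at, c.caller FROM calls c "
        "JOIN word_index w ON w.call_id = c.id "
        "WHERE w.word = ? ORDER BY c.started_at",
        (word.lower(),),
    )
    return rows.fetchall()


# Invented calls, for illustration only.
db = open_store()
add_call(db, "2021-03-01T09:15", "alice", "The turbocharger delivery is delayed")
add_call(db, "2021-03-02T14:00", "bob", "Please resend the invoice")
print(find_calls(db, "turbocharger"))
```

A production system would also need to cope with misrecognised words, but the principle is the same: once the words are indexed, finding a call no longer depends on remembering when it was made.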
Both the storage and transcription of calls can also be instigated manually – on a call-by-call basis – but this is very time consuming. It involves capturing the call in the correct format, uploading it to an ASR subscription service, then filing the transcription for later inspection. This is far better done automatically. Automation also means that user-specific contextual knowledge sources can be applied to the ASR system to improve its performance.
Five years ago, call transcription would have been the preserve of large corporate call centres with big budgets. But now, any small business can run a VoIP phone system and catalogue, record and search company-wide calls for less than the cost of a proprietary telephone system, without being locked in. The businesses that will succeed in these changed times are those that make the most of what new and emerging technologies can offer to put them ahead of their rivals.