Dragon NaturallySpeaking 10 SDK Frequently Asked Questions
Q. What is the difference between the Client and Server SDK editions and the boxed versions of Dragon NaturallySpeaking?
Dragon NaturallySpeaking is the world’s best-selling speech recognition software. It can be deployed in two ways. The first is client deployment, in which the Dragon NaturallySpeaking software runs locally on the client machine. The primary benefit of a client deployment is instant transcription of dictation. A client deployment can be implemented either with the stand-alone, packaged editions of Dragon NaturallySpeaking or with the Dragon NaturallySpeaking SDK Client (DSC) Edition.
Dragon NaturallySpeaking (DNS) is an end-user, UI-based continuous speech recognition program that lets users dictate into any Windows application. DSC makes it possible to integrate the speech-recognition capabilities of Dragon NaturallySpeaking into any Windows application without using the Dragon NaturallySpeaking user interface. Using the ActiveX components supplied with DSC, developers can build new speech-aware applications or add speech recognition to existing ones.
The second way to deploy Dragon NaturallySpeaking is on a server, using the Dragon NaturallySpeaking SDK Server (DSS) Edition. DSS extends Dragon NaturallySpeaking with new and improved speech-recognition functionality geared toward back-end transcription. Back-end means that the recognition engine resides on a remote machine in a network (that is, not where the user is dictating), which makes it possible to transcribe audio files from telephony systems, recorders, or broadcasts. Although DSS is not itself a transcription workflow system, it is designed to be integrated into a dictation/transcription workflow system. The primary benefit of a server deployment is that the people who dictate don’t need to change their current dictation habits to accommodate client-based speech recognition.
Q. What is the difference between Dragon NaturallySpeaking SDK Server Edition and Dragon NaturallySpeaking SDK AudioMining?
The Dragon SDK offers two sets of models that can be integrated into a server environment to run recognition, each suited to a different use case: Dragon NaturallySpeaking SDK Server (DSS) Edition and Dragon SDK AudioMining.
DSS is a speaker-dependent system designed to produce highly accurate transcriptions. It requires each speaker’s audio to be on a separate channel, and it yields significantly better results when a user profile is defined for each speaker and refined over time. A typical use case for DSS is transcribing recorded dictations from known speakers.
AudioMining, by contrast, is a speaker-independent system designed to produce rough transcriptions that are useful for indexing, word search, and similar tasks. Because it does not need user profiles, it can perform offline recognition of audio from individual speakers or groups of speakers. A typical use case for AudioMining is an organization that screens all of its recorded audio for specific content, such as harassment.
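DSS needs each speaker’s audio on its own channel. If a two-party recording was captured in stereo with one speaker per channel, it can be split into two mono files before submission. The following is a minimal sketch using Python’s standard `wave` and `array` modules, assuming 16-bit PCM stereo input; the function name `split_stereo` is hypothetical, and this preprocessing step is not part of DSS itself.

```python
import wave
from array import array


def split_stereo(path: str, left_out: str, right_out: str) -> None:
    """Split a 16-bit stereo PCM WAV into two mono WAV files, one per channel.

    Assumes each speaker was recorded on their own channel.
    """
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = src.getframerate()
        samples = array("h", src.readframes(src.getnframes()))

    # Interleaved layout is L0 R0 L1 R1 ..., so even indices are the
    # left channel and odd indices are the right channel.
    channels = (samples[0::2], samples[1::2])
    for out_path, channel in zip((left_out, right_out), channels):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
```

For example, `split_stereo("call.wav", "caller.wav", "agent.wav")` (hypothetical file names) writes one mono file per speaker. Slicing the sample array copies each 2-byte sample unchanged, so the sketch does not depend on the machine’s byte order.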
Q. What is AudioMining?
AudioMining uses speech recognition to index audio and video media. This makes audio and video content more accessible, makes multimedia content “deep” searchable (searchable by the words spoken in it), and enables data mining of multimedia content.
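Making content “deep” searchable generally means building an index from each recognized word to the places where it was spoken. A minimal sketch of such an inverted index follows, assuming the recognizer’s output for each media file is a list of `(word, start_time_in_seconds)` pairs; that input format and the names `build_index` and `search` are hypothetical and do not describe AudioMining’s actual output or API.

```python
from collections import defaultdict

# A hit is (media_file, start_time_in_seconds).
Hits = list[tuple[str, float]]


def build_index(transcripts: dict[str, list[tuple[str, float]]]) -> dict[str, Hits]:
    """Map each lower-cased word to every position where it was recognized."""
    index: dict[str, Hits] = defaultdict(list)
    for media_file, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((media_file, start))
    return dict(index)


def search(index: dict[str, Hits], word: str) -> Hits:
    """Return all hits for a word, or an empty list if it was never recognized."""
    return index.get(word.lower(), [])


transcripts = {
    "board_meeting.wav": [("budget", 12.4), ("review", 12.9)],
    "newscast.wav": [("Budget", 3.1)],
}
idx = build_index(transcripts)
print(search(idx, "budget"))  # [('board_meeting.wav', 12.4), ('newscast.wav', 3.1)]
```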
Q. Does Dragon NaturallySpeaking SDK AudioMining support broadcast quality audio?
Yes, but only for US English. For languages other than US English, Dragon AudioMining SDK supports telephony models only.
Q. Does Dragon NaturallySpeaking SDK AudioMining support video indexing?
No. However, you can use freely available tools to extract the audio track from a video file, and Dragon AudioMining can then index that audio. The extraction must run as a separate process before the AudioMining indexer is invoked.
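As one example of a freely available tool, FFmpeg can drop the video stream and write the audio track to a WAV file. The sketch below builds and runs such a command from Python, assuming `ffmpeg` is installed and on the `PATH`; the helper names are hypothetical. The mono, 16-bit, 16 kHz output format is an illustrative assumption, so check the SDK documentation for the audio formats the indexer actually accepts.

```python
import shutil
import subprocess


def extraction_command(video_path: str, wav_path: str, rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that strips the video and saves mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it already exists
        "-i", video_path,        # input video file
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit little-endian PCM
        "-ac", "1",              # downmix to a single channel
        "-ar", str(rate),        # resample (16 kHz is an assumption, not an SDK rule)
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the extraction as its own step, before the indexer is invoked."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(extraction_command(video_path, wav_path), check=True)
```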
Q. How can I convert analog tape to digital format?
To convert content from analog tape to digital format, you need an analog playback device with an audio output (analog recorders usually have an earphone or line-out jack). Connect one end of an audio cable to this output and the other end to the line-in jack of the computer’s sound card. Set the Windows mixer’s recording source to “Line-In,” then launch an audio capture program (such as Creative’s Wave Studio) and record the audio in .wav format.
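Once the tape has been recorded, it is worth confirming that the .wav file has the format you intended before processing it further. A minimal sketch using Python’s standard `wave` module; the helper name `describe_wav` is hypothetical, and the values it reports are not checked against any SDK requirement.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return the basic format details of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "bits_per_sample": 8 * w.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, `describe_wav("side_a.wav")` (a hypothetical file name) reports whether the capture program recorded in mono or stereo, at what sample rate, and for how long.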
Q. How does the Dragon SDK Server impact traditional transcription workflow?
Traditionally, an assigned transcriptionist produces a document by listening to a recorded dictation and typing it using specialized transcription software and hardware. This is a labor-intensive, and therefore expensive, process.
Adding state-of-the-art speech recognition to the transcription workflow partially automates the transcription process, increasing transcription productivity and reducing transcription costs. In such a workflow, speech recognition creates a draft transcript of the recorded dictation. The transcriptionist listens to the dictation, verifies the draft, and makes corrections and further edits. Productivity improves because correcting recognition errors and editing text is faster than typing the entire dictation.
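One way to make the productivity argument concrete is to count the word-level edits (substitutions, insertions, deletions) needed to turn the draft into the final text, and compare that count with the number of words the transcriptionist would otherwise type. The sketch below uses the standard Levenshtein distance over words; it illustrates the idea rather than any Nuance metric, and the sample sentences are invented.

```python
def word_edits(draft: str, final: str) -> int:
    """Minimum word substitutions, insertions, and deletions turning draft into final."""
    a, b = draft.split(), final.split()
    prev = list(range(len(b) + 1))  # distances from an empty draft prefix
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete wa from the draft
                curr[j - 1] + 1,           # insert wb into the draft
                prev[j - 1] + (wa != wb),  # substitute (free when the words match)
            ))
        prev = curr
    return prev[-1]


draft = "patient reports mild pain in the left need"
final = "patient reports mild pain in the left knee"
print(f"{word_edits(draft, final)} edit(s) versus {len(final.split())} words to type")
# prints: 1 edit(s) versus 8 words to type
```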
Q. What is the difference between back-end and front-end speech recognition?
Front-end speech recognition takes place in real-time at the location where the user is dictating. On the other hand, back-end speech recognition takes place on a remote machine in a network. This is not real-time recognition; the results are produced in batch mode.
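The batch character of back-end recognition can be sketched as a job queue: the dictating user only submits a recording, and a worker on the server transcribes the queued recordings later. In this sketch, `BatchServer` and the `recognize` callback are hypothetical placeholders for the server-side engine, not DSS APIs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BatchServer:
    """Toy back-end: queue recordings now, transcribe them later in one batch."""
    pending: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)

    def submit(self, audio_file: str) -> None:
        # The dictating user only uploads the file; no recognition happens here.
        self.pending.append(audio_file)

    def run_batch(self, recognize) -> int:
        # Later, a worker drains the queue; `recognize` stands in for the engine.
        count = 0
        while self.pending:
            audio_file = self.pending.popleft()
            self.results[audio_file] = recognize(audio_file)
            count += 1
        return count


server = BatchServer()
server.submit("dr_smith_0900.wav")
server.submit("dr_jones_0915.wav")
# ...some time later, for example overnight:
done = server.run_batch(lambda f: f"<draft transcript of {f}>")
```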
Back-end speech recognition for transcription is inherently more difficult than speech recognition for interactive dictation (front-end speech recognition). One reason is that back-end recognition is typically tasked with transcribing speech recorded over the phone or on a voice recorder. Such recordings have higher noise levels and lower bandwidth than recordings made with high-quality, noise-canceling, close-talking microphones. Recordings used for back-end transcription also tend to contain faster and less clearly enunciated speech.
Q. What is a “Vocabulary”?
Dragon NaturallySpeaking SDK Client (DSC) Edition comes with a set of Vocabularies, each of which contains:
- A list of words (and phrases) that can be recognized by the software.
- Pronunciation information for those words as well as category information (such as “singular noun”, “last name”, “adjective”).
- Statistics about how frequently each of these words typically appears and how the words are typically combined into phrases. (This is called the language model.)
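The three components listed above can be pictured as a simple data structure. In this minimal sketch, each word entry carries a pronunciation and a category, and the language model is reduced to word and word-pair (bigram) counts. The class names, phoneme notation, and sample text are invented for illustration and do not reflect DSC’s internal Vocabulary format.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    word: str
    pronunciation: str  # illustrative phoneme string, not Dragon's notation
    category: str       # e.g. "singular noun", "last name", "adjective"


@dataclass
class Vocabulary:
    entries: dict[str, Entry] = field(default_factory=dict)            # word list
    unigrams: dict[str, int] = field(default_factory=dict)             # word counts
    bigrams: dict[tuple[str, str], int] = field(default_factory=dict)  # word-pair counts

    def add_word(self, entry: Entry) -> None:
        self.entries[entry.word] = entry

    def observe(self, text: str) -> None:
        """Update the frequency statistics (the language model) from sample text."""
        words = text.lower().split()
        for w in words:
            self.unigrams[w] = self.unigrams.get(w, 0) + 1
        for pair in zip(words, words[1:]):
            self.bigrams[pair] = self.bigrams.get(pair, 0) + 1

    def next_word_probability(self, prev: str, word: str) -> float:
        """Maximum-likelihood estimate of P(word | prev); 0.0 if prev never had a successor."""
        total = sum(c for (p, _), c in self.bigrams.items() if p == prev)
        return self.bigrams.get((prev, word), 0) / total if total else 0.0


vocab = Vocabulary()
vocab.add_word(Entry("knee", "n iy", "singular noun"))
for sentence in ("left knee pain", "left knee swelling", "left side"):
    vocab.observe(sentence)
print(vocab.next_word_probability("left", "knee"))  # about 0.667: 2 of 3 words after "left"
```

Production language models also smooth these estimates so that unseen word pairs do not receive zero probability; the sketch omits that step.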
Starting with the right Vocabulary and customizing it to each speaker’s usage and preferences are important factors in achieving the best possible accuracy for that speaker. In DSC, as in Dragon NaturallySpeaking, a Vocabulary combines a word list, which includes information about all the words the program can recognize, with a statistical language model, which contains usage information about those words. DSC uses the Vocabulary to decide what to transcribe based not only on the sound of the words but also on their context.
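To see how context helps, consider homophones such as “to”, “too”, and “two”, which sound the same. The sketch below adds a log acoustic score to a log bigram language-model score, the general way speech recognizers combine the two; all probabilities and the name `pick_word` are invented for illustration, and this is not DSC’s decoder.

```python
import math

# Invented bigram probabilities P(word | previous word).
BIGRAM = {
    ("at", "two"): 0.20, ("at", "to"): 0.01, ("at", "too"): 0.001,
    ("want", "to"): 0.30, ("want", "two"): 0.01, ("want", "too"): 0.005,
}
FLOOR = 1e-6  # probability assumed for word pairs not in the table


def pick_word(prev: str, acoustic: dict[str, float]) -> str:
    """Choose the candidate maximizing log P(sound | word) + log P(word | prev)."""
    def score(word: str) -> float:
        return math.log(acoustic[word]) + math.log(BIGRAM.get((prev, word), FLOOR))
    return max(acoustic, key=score)


# The three homophones sound identical, so their acoustic scores tie
# and the language model (the context) decides.
same_sound = {"to": 0.33, "too": 0.33, "two": 0.33}
print(pick_word("at", same_sound))    # two  ("meet at two")
print(pick_word("want", same_sound))  # to   ("want to go")
```

When the previous word gives no usable context (every pair falls back to `FLOOR`), the acoustic scores alone decide.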