speech recognition

April 28, 2008

Speech recognition for cellphones: one upgrades, one quits

First the one that quit.

I had signed up for Sprint's Voice Command ($5/month) for my Treo 700p over a month ago, only to recently receive a letter saying that they are discontinuing this service as of July. Basically, you dialed "*" and then a prompt would come on. You then spoke the contact's name, but this wasn't using the contact manager on your phone (or at least on the Treo; I don't know how other phones work). You had to go online to the Sprint Web site and enter these contacts into a contact database. I recall using their import function to pull the names off my phone, but this didn't work at all. Maybe that's why this service is being stopped.

The Sprint Web site is working a lot better than it did not too long ago. Whenever you change your plan, they send you an SMS and an e-mail. If you have a question about your plan, you can chat with a rep online, and then the transcript of this conversation is sent to you via e-mail.

And the one that I upgraded.

I was also trying out the Voice Control based on Nuance technology ($6/month), but initially I didn't think that I'd keep it.

The Nuance Voice Control Web page is very helpful for setting up your phone and learning how to operate the software. It even has screens customized to represent your phone. (The screenshot here is showing an old version of the software.)

On the Treo, after you install the software, the side button is reprogrammed so that when you hold it down, it launches the software and beeps to let you know it's ready. While still holding down the button, you speak the command according to the list they provide on the Nuance Web site. It sends the info to their servers, which then send the text and commands back to the phone to do the task. The time it takes to do this is the big issue.

I decided to finally review this product after getting the Sprint letter to see if I was going to discontinue both services. Voice Control prompted me to download the new version 1.5, which also made me download the new Palm installer before that. The upgrade was free and went on without a hitch, so I began my testing.

The fastest function is a Google search, and it works remarkably well. I was able just to say "Search New York Cancer Consortium" while still on the dial pad screen, and in less than 10 seconds it brought up the Treo browser and some hits on Google. This particular search would have taken some effort with thumbtyping, so for searches like this one, it's a real convenience. Or, you can say "Find bookstore 10019," and it will give you a Google map of bookstores in Manhattan. Another real convenience.

You can also create a small e-mail message, or add an appointment to your calendar, but this is only a convenience if you're on the go and can set the phone down or put it back in your pocket while it does its thing. It might take 20 seconds or more, and sometimes it doesn't complete at all. I would like to have the ability just to make a quick note in the memo pad, but it won't allow this.

As far as calling people in your phone's contact manager, it averages around 5-10 seconds, so if you're willing to accept this wait, it works well.

I have to say that overall, I wish Voice Control would work faster, but even so I think it's worth keeping.

April 02, 2008

Quick links: speech recognition

Now that CTIA 2008 is underway, the stream of new product/service announcements is starting.

Nuance Communications is announcing:

Voicemail to Text, a voicemail transcription service that delivers high-quality readable messages to mobile devices. Offered through carriers, the service uses Nuance’s world-leading speech recognition and transcription workflow solutions to convert voicemails left on any voicemail box into text. Transcribed messages are sent to users as SMS or email messages.

Nuance is also providing its Mobile Speech Platform to TeleNav for mobile GPS support. From the Nuance Web site:

Mobile users can conveniently access TeleNav GPS Navigator on a device they already carry with them and receive information in real-time based on their current location. Voice destination entry, which makes navigation services easier to use on mobile phones, has the ability to significantly enhance an already fast-growing market for mobile navigation. Industry analyst firm, In-Stat predicts that the total number of mapping and navigation mobile phone subscribers could exceed 70 million worldwide by 2012.

From the blog Wireless Speech Recognition about another service that uses the Nuance software:

France Loisirs, the French version of a mini-Amazon.com, has now selected Atos Worldline to host and operate a new automated telephone order service, based on speech recognition. The service is named “Commande Flash” & both smooths out / combines the different purchasing channels already available and reduces human-handled calls.

Looking at the healthcare sector, the Speech Recognition blog's Risking a SpeechMagic-Dragon Comparison? seems to have started much discussion.

February 28, 2008

Sony releasing new crop of digital voice recorders in April; speech recognition supported

Here's a photo of Sony's new digital voice recorders, the ICD-UX70 ($100) and ICD-UX80 ($150), providing 1GB and 2GB of memory, respectively. Both support stereo MP3 recording and USB mass storage.

Their ICD-SX68 ($150) and ICD-SX68DR9 ($200) both have 512MB and support Dragon NaturallySpeaking speech recognition. The DR9 is bundled with the Dragon software.

Finally, they're releasing the ICD-P620 ($60) and ICD-B600 ($40), both with 512MB and USB mass storage. The P620 includes Digital Voice Editor software.

February 20, 2008

BusinessWeek mobile killer apps: two do speech recognition

This is part of their Wireless World special report.

The particular story I want to talk about, "Killer Mobile Apps," is Handango's list of most downloaded software for smartphones. But I can't link to it because it's a slide show that comes up as a pop-up window.

Here're the ten:

MobiTV - $10 per month to watch MSNBC, ABC, Discovery Channel, etc.

Spb Mobile Shell 1.5 - uncluttering Windows Mobile

Spb Pocket Plus - customizing the Windows Mobile display

VoiceControl - Provided by Nuance Communications for $6 a month, the software enables users of devices based on the Palm OS to voice-dial, dictate e-mail and text messages, and view Web content without pressing a button.

Ringtone Megaplex for BlackBerry - "Designed for Research In Motion's (RIMM) BlackBerry devices, this software offers more than 1,000 distinctive ringtones in genres from classical to pop."

Microsoft Voice Command - From Microsoft for $39.99. This is another speech recognition application that makes it easier to use voice commands, this time to look up contacts, make phone calls, and choose music.

Pocket Mirror Standard Edition - sync MS Outlook features to Palm OS

Ringphonic Lite - $7 per month to let BlackBerry users customize their ringtones

Pocket Informant 2007 - For Pocket PC users, it allows them to create and move appointments and tasks

Spb Diary 2.5 - Launcher for Windows Mobile Pocket Outlook

UPDATE: I signed on for Sprint's Voice Command, which will be billing me $5/month. I was able to set up the service through a rep chat service on the Sprint Web site. I then hit asterisk and dial, which started some voice instructions that said something about adding names to an address book on the Sprint Web site. Now, I'm told by another Sprint rep that this isn't necessary.

You basically dial "*" and then are prompted for a name or number. It works for names in my address book, but I'll have to give it a thorough test to come to a final conclusion.

I also downloaded Nuance's VoiceControl which is embedded speech recognition software for navigating through Palm OS allowing you to do a Web search or add an item in your address book.

You hold down the side button on the Treo 700p, wait for the beep, dictate a command, and then let go of the button.

Here's a sampling of the commands available (the full list):

  • Phone Calls - Call [Contact's name] + [phone type (optional)].    
    • Call John Doe.
    • Call John Doe home.
    • Call John Doe office.
    • Call John Doe mobile.
  • Phone Dialing - "Dial" [phone number].    
    • Dial 555-1212.
    • Dial 513-555-1234
  • Call Voicemail - "Call Voicemail"
  • Quick E-mails - "E-mail" [Contact's name] Subject [subject text] Body [body text].    
    • E-mail John Smith subject meeting body John, I'm going to be late to the meeting.  Go ahead and start it without me.
    • E-mail John Smith body Call me as soon as you get this.  We need to talk.
  • Calendar - "Add Appointment" [appointment text] Date [date of appointment] Time [time of appointment].    
    • Add appointment dentist date January fifth time ten o'clock.
    • Add appointment Joe's birthday date March second.
  • Web sites - "Go to website" [website URL]    
    • Go to website google.com    
  • Quick Web    
    • Weather [ZIP code]    
      • Weather 10001
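To make the shape of these commands concrete, here's a minimal Python sketch of how a recognized phrase could be split into a command type and its fields. This is only my illustration of the grammar in the list above, not how Nuance's software works internally; the names PATTERNS, PHONE_TYPES and parse_command are hypothetical, and the set of phone types is assumed from the "Call John Doe" examples.

```python
import re

# Assumed from the examples above: the optional phone type that can end a "Call" command.
PHONE_TYPES = {"home", "office", "mobile"}

# One pattern per command form in the list above (hypothetical, for illustration only).
# Order matters: "Call Voicemail" must be tried before the generic "Call <contact>".
PATTERNS = [
    ("voicemail", re.compile(r"^call voicemail$", re.I)),
    ("dial", re.compile(r"^dial (?P<number>[\d\- ]+)$", re.I)),
    ("call", re.compile(r"^call (?P<contact>.+)$", re.I)),
    ("email", re.compile(
        r"^e-?mail (?P<contact>.+?)(?: subject (?P<subject>.+?))? body (?P<body>.+)$",
        re.I)),
    ("appointment", re.compile(
        r"^add appointment (?P<text>.+?) date (?P<date>.+?)(?: time (?P<time>.+))?$",
        re.I)),
    ("website", re.compile(r"^go to website (?P<url>\S+)$", re.I)),
    ("weather", re.compile(r"^weather (?P<zip>\d{5})$", re.I)),
]


def parse_command(utterance):
    """Return (command_type, fields) for a recognized phrase, or None if nothing matches."""
    # Drop surrounding whitespace and the trailing period used in the written examples.
    text = utterance.strip().rstrip(".")
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if not match:
            continue
        # Keep only the fields that were actually spoken (optional ones may be absent).
        fields = {k: v for k, v in match.groupdict().items() if v is not None}
        if kind == "call":
            # Peel off a trailing phone type such as "home" or "mobile", if present.
            words = fields["contact"].split()
            if len(words) > 1 and words[-1].lower() in PHONE_TYPES:
                fields["phone_type"] = words[-1].lower()
                fields["contact"] = " ".join(words[:-1])
        return kind, fields
    return None


if __name__ == "__main__":
    for phrase in ["Call John Doe mobile.",
                   "Add appointment dentist date January fifth time ten o'clock.",
                   "Weather 10001"]:
        print(phrase, "->", parse_command(phrase))
```

The keywords "subject," "body," "date" and "time" act as field separators, which is presumably why the command list has you say them out loud. A sketch like this would trip over a message body that itself contains one of those words.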


February 16, 2008

Pogue on "Star Trek-ishly accurate" Dragon NaturallySpeaking 9, now licensed for the Mac

Normally I don't watch the very theatrical videos on technology topics produced by NY Times tech columnist David Pogue, but this is a CNBC segment entitled "Digital Dictation" where he's interviewed in their studio by their anchorwoman, in addition to his Commedia dell'arte clip.

Long story short:

  • He's been using Dragon NaturallySpeaking (DNS) for 15 years now as a result of tenosynovitis that developed from his writing/piano playing gigs
  • He's excited about the capabilities of DNS 9, which he has described as being "Star Trek-ishly accurate" [98.9%] even without training the software by reading passages, compared to when he had to dictate word-by-word with older versions
  • He's a Mac person who writes the highly regarded "Missing Manual" books
  • He's almost in heaven now that the next version of MacSpeech Dictate (MSD) will be based on the DNS speech recognition engine
  • He'll have to spend some time in Purgatory, though. This new version of MSD does not allow correction by voice command (you have to fix transcription errors the old mouse and keyboard way, plus these corrections won't serve to train the software and increase accuracy), but that feature will appear in the second version

See, the message was delivered without extras, costumes and special effects. Plus, I didn't get called out by the anchorwoman for calling her "Dude!"

Talking to your PC: KnowBrainer Command Set

I'm continuing my exploration of the Dragon NaturallySpeaking (DNS) 9 Medical speech recognition software by going back to the KnowBrainer forum where I remember picking up a lot of useful information in the past.

I asked some questions about a few things I needed to work out, and in the process I received some great tips from the folks there about how to use this software to its best advantage.

Based upon a strong recommendation for the KnowBrainer 2007 Command Set from a person on the forum, I decided to install it and see how it could improve my workflow.

It basically adds 10,000 commands to DNS, tweaks some of the settings, provides their own Command Browser for creating advanced-scripting commands, and a scripting language called VerbalBasic to create commands verbally.

When I first tried out Dragon's speech recognition software, the Pro version 9, the first thing that really amazed me was reading an article from the New York Times Sunday Magazine, and seeing how accurately it could transcribe a variety of words with very little training. Words like "kibbutz" for example, or even some proper names or acronyms.

I think when most people try out this software at a computer convention or any other demo, they judge it by what they try to dictate extemporaneously, which doesn't work as well. When you read something, it produces a flow and cadence that works better with the software.

This makes you realize that you need to improve your dictation skills, both by pronouncing clearly and thinking far enough ahead so that you can speak a full sentence in a continuous flow, also including the punctuation where needed. So, while you're still getting the hang of dictation, you find yourself trying to remember commands to select, edit or correct your mistakes by moving the cursor and making it do what you want it to do. Now things slow up. This is where the Command Set begins to show its usefulness.

While DNS obviously comes with its own set of commands, some of which are available in the printed manual, all of which are searchable within the program, the list in the Command Set PDF manual gives you the full roster of the commands you can use. What I did was print out the pages with the commands list, scan it for the commands that I've been needing, and then try them out. So far it's increased my speed in using DNS.

The second amazing thing is realizing that you can navigate through various Windows programs with spoken commands.  It really improves on the tedious task of using the mouse and keyboard to pick through screen menus. And what's really impressive is opening documents without searching through the file folders. If you want to check something in the manual, you simply say "How do I," and the PDF opens. Supposedly it's also possible to search for any file on your PC using desktop search software which I'll be trying out next.

I think anything that saves you from peering at the screen to navigate is a wonder. I'm not particularly in love with typing either.

February 13, 2008

First try with Dragon NaturallySpeaking 9 Medical speech recognition software

Yesterday afternoon, as the snow continued to fall here in Brooklyn, the DHL delivery man brought me the latest medical speech recognition software from Nuance Communications, Dragon NaturallySpeaking 9 Medical. I'm interested not only in streamlining my workflow when writing on oncology and medical technology topics, but also in speech-driven clinical documentation. This allows the physician to customize patient data in an EMR in ways that pick lists won't allow.

This story by Eric Fishman, MD of EMRConsultant.com, tells of his experience using templates for capturing data in an EMR versus using speech recognition. [Disclosure: He's a reseller for this software.] His point is that although much of the patient interview process is about clearly defined responses to standard questions, eventually these patient records begin to look alike if you're limited to using pick lists.

While data dictated by speech recognition and transcribed as free text is not easily parsed and distributed to third parties, it does have some advantages:

  • It helps the physician create a record that paints a mental picture of each patient, so that they can be remembered individually
  • A plan of care can be described so that when the physician selects "chest pain" as a symptom, he or she can elaborate why a cardiac cause has been ruled out

It does take some effort on the part of the person dictating to develop a good dictation technique. When I was a pathology resident, I gained expertise in dictating observations as I did my dissections of specimens. These tapes were later given to a transcriptionist who typed out the reports.

As far as producing a convincing story that explains a topic in an interesting way, some forethought is needed, but you still can expect to do some editing afterwards. If you don't, you wind up with a written piece that's a tad bit too chatty--you're caught up in a web of circumlocution, and the reader doesn't have a clear idea of the points you're making.

Now for the unboxing. You can see in the photo that the package contains 3 CDs, a manual and a headset with a boom mike.

Installation went without a hitch, and it was just a matter of plugging the headset into the sound card of my PC with the headphone and mike jacks.

It had me read a few paragraphs for the initial training. Then it scanned my Word docs and Outlook e-mail to get an insight into how I write. I wonder if they should give you the option of choosing which files to look at because some of my Word docs are culled from the Web, and my e-mail tends to be telegraphic with a few exceptions including the occasional angry rant at Apple for something iTunes screwed up again.

Next, I was given the choice of reading a medical passage, ranging from easy to hard. I chose the hard one just to get the most mileage from my time spent doing this. It was a typical surgical procedure dictation. This one was about placing a cardiac catheter via a groin stick. I started the video training feature, but it sparked my memory of when I was using the regular version of Pro 9, so I skipped it and proceeded with testing its accuracy.

You have to keep in mind that this is the hardest test for this software, since as you use the software more, it becomes better acclimated to your speech. Both of these were first attempts at reading these oncology passages. The first paragraph is what I was reading, and the second in red is what the software transcribed:

In Burkitt's lymphoma, the c-myc oncogene is activated by translocation of genetic material from chromosome 8 to chromosome 14. Chronic myelogenous leukemia (CML) is defined by a reciprocal translocation of the long arms of chromosomes 9 and 22, resulting in the generation of a fusion protein (BCR-ABL) with tyrosine kinase activity.

In Burkitt's lymphoma, the C. MIC on go gene is activated by translocation of genetic material from chromosome 8 to chromosome 14. Chronic myelogenous leukemia (CML) is defined by reciprocal translocation of the long arms of chromosomes and 9 and 22, resulting in the generation of the fusion protein (BCR-ABL) with tyrosine kinase activity.

I picked this passage specifically for the "c-myc oncogene" term. What I said was "see mick oncogene," which is the way I would pronounce it if I were giving a talk. From what I understand so far, I can produce a voice macro that will allow me to say something like, "Charlie hyphen m-y-c oncogene" and it will produce this term with the proper italicization.

Treatment of HER-2/neu-positive early-stage breast cancer with the combination of chemotherapy and the targeted agent trastuzumab has resulted in striking improvements in outcome so much so that finding this gene not only predicts response to treatment but also a lower risk of recurrence.

Treatment of HER-2/neu positive early-stage breast cancer with the combination of chemotherapy and a targeted agent trastuzumab has resulted in striking improvements in outcome so much so that finding this gene not only predicts response to treatment but also a lower risk of recurrence.

This second passage really impressed me, considering the only error I found was a missed hyphen after "neu," which may not really be needed, and the lack of italicization. Getting "trastuzumab" correct shows the benefit of having a comprehensive medical vocabulary.
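If you'd rather put a number on comparisons like these than eyeball them, one common yardstick is word error rate: the number of word substitutions, insertions and deletions needed to turn the transcript into the original, divided by the number of words in the original. Here's a minimal Python sketch of that calculation. It's my own illustration, not how Nuance measures accuracy; the function names are hypothetical, and splitting on whitespace is a simplifying assumption, so punctuation stuck to a word counts as part of that word.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # word missing from the transcript
                curr[j - 1] + 1,                  # extra word in the transcript
                prev[j - 1] + substitution_cost,  # same word, or a substituted one
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edits needed to fix the transcript, divided by the reference word count."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # A short excerpt from the first passage above: what I read vs. what came back.
    what_i_read = "the c-myc oncogene is activated"
    what_it_typed = "the C. MIC on go gene is activated"
    print(word_error_rate(what_i_read, what_it_typed))
```

Keep in mind that on a short excerpt one mangled term dominates the score, so this is only meaningful over whole passages.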

In future posts, I'll try to give some more helpful hints for using this software. In the meantime, you can discuss this topic on these speech recognition forums:

ScanSoft

SpeechComputing

VoiceRecognition

KnowBrainer


February 12, 2008

Revolabs' xTag wireless microphone now offered by Nuance for its healthcare speech recognition software

From the Revolabs press release:

The xTag wireless microphone's wide band audio combined with HIPAA compliance security [128-bit encryption], complements the Dragon NaturallySpeaking Medical dictation software, which is the most widely used and successful general-purpose speech-enabled clinical documentation solution in the healthcare industry.  Now, medical personnel do not need to be tethered to a computer as the xTag wireless microphone allows for natural mobility and is rechargeable -- eliminating the need to wear a headset or a bulky transmitter.

My experience with using the Nuance Dragon NaturallySpeaking software has been with a headset and a boom mike. If I recall correctly, the manual said that it is important to keep the mike in the same position for good results.  The photos on their Web site show folks with this mike clipped to their shirt pocket. If it produces transcribed documents that are just as accurate, this will be a big jump up in convenience.

Here're the specs and descriptions from Revolabs' Medical Applications Web page.

This vendor's selling it for $250.

February 06, 2008

Speech recognition: will it promote the use of EMRs?

Yes, I think so.

This story on eWeek, "Speech Recognition Makes a Statement in Health Care: Is data entry the hurdle blocking electronic medical records from wide adoption?," is about the Dragon NaturallySpeaking Medical speech recognition software produced by Nuance Communications.

I've been using their professional version for dictating technology and oncology articles with much success. It's amazing how little training is needed.

January 30, 2008

Superlative MacWorld '08 software product uses the Nuance speech recognition engine

Glenn Fleishman of Wi-Fi Net News fame is reporting that he was "suitably impressed" with the new MacSpeech Dictate (scroll down to Most Welcome Brain Transplant) speech recognition software demo'ed under the noisy conditions present at the recent MacWorld 08 Expo. Turns out, this "superlative product" uses the Nuance Communications engine which drives the Dragon NaturallySpeaking 9 software that the Windows users have been using for a while now. I was using it satisfactorily with 1.5 GB of RAM on my PC. It is amazing how you can get by with very little time spent training the software.

I would advise using more than 2 GB of RAM, since I did have problems dictating in memory intensive apps like MS Word. Plus, you really have to be careful with the cadence of your dictating, pronouncing each word consistently, and using complete phrases. If you don't, you'll find yourself constantly saying "scratch that," the command for "delete that nonsensical chatter I've been blathering." With a laptop, you might also have to invest in an external sound processor to make it work well.

The software tunes itself to your voice using the headset that comes with the software. Forget about thinking you can transcribe a recorded conversation between two people, although there are portable voice recorders that supposedly do a good job of transcribing once you get back to your PC.

Pogue of the NY Times gives the background on the deal, along with the usual hammy video he's famous for. Now, if they could only get this to work with a smartphone.

The MacSpeech Dictate Web site.