Playback

For anyone interested in sound and sound recordings

Speech to Text

This group is for anyone interested in speech-to-text and voice recognition systems. Technologies that can convert the spoken word into text are growing in number, from voice-activated commands for smartphones to the conversion of digital speech archives into readable and searchable text.

The particular focus of this group is the conversion of speech archives. It is interested in the competing technologies, cost-effective models for archives and libraries, and the uses of transcribed speech collections (audio and video) for researchers.

Members: 12
Latest Activity: Jun 16

GROUP MATTERS

The Comment section is for general information and conversation about speech-to-text matters. The Discussion Forum will be used to list specific packages and to discuss issues relating to these.

Discussion Forum

Interviewy

Started by Luke McKernan Jan 17. 0 Replies

iMIRACLE by AT&T WATSON

Started by Mari King. Last reply by Mari King May 29, 2013. 2 Replies

Palaver + VoxForge

Started by Luke McKernan May 16, 2013. 0 Replies

CONTENTUS - Next Generation Multimedia Libraries

Started by Mari King. Last reply by Mari King May 7, 2013. 1 Reply

BBC Snippets

Started by Mari King Apr 10, 2013. 0 Replies

DARPA

Started by Luke McKernan Mar 9, 2013. 0 Replies

Luxid

Started by Luke McKernan Mar 6, 2013. 0 Replies

AVAtech

Started by Richard Ranft Mar 5, 2013. 0 Replies

Comment Wall

Comment by Luke McKernan on June 16, 2014 at 8:02

Google Glass Offers Disabled People Access to a Bigger World

http://www.usnews.com/news/stem-solutions/articles/2014/06/10/googl...

The photo is a blur. A wide swath of blue – the photographer’s torso or maybe someone else’s – spreads across the left half of the image. A dark square and rectangles of brown, like the open flaps of a cardboard box, fill the right.

As a picture, it’s unremarkable, an image taken apparently at random and perhaps by mistake. But for the photographer, it’s nothing short of momentous.

Ashley Lasanta has cerebral palsy, and for the first time in her 23 years, she was able to snap – and then share – a photograph, all without the use of her hands.

"It was awesome," she says. "I take pictures of just about anything."

The device she used wasn’t a traditional camera. It was Google Glass, the thumb-sized computer that's worn like a pair of glasses. With just a tilt or a nod of the head and a few spoken phrases, Lasanta can record videos, send emails, browse the web far faster than before, play games and, thanks to the wealth of recipes online, hang out in the kitchen and help with cooking ...

[more text]

Comment by Luke McKernan on May 28, 2014 at 8:01

What could be a big leap forward in the application of speech-to-text is being demonstrated by Microsoft, who are testing live video translation for Skype. Demonstration video here:

http://qz.com/214106/watch-skype-translate-a-video-conversation-in-...

Comment by Luke McKernan on February 17, 2014 at 8:10

OK Google

http://techcrunch.com/2014/02/16/ok-google/

Article on Google and voice recognition.

"There can be little doubt that, just like Microsoft thinks touch is the future of computing, Google seems to believe voice will be the user interface of the future..."

Comment by Luke McKernan on January 26, 2014 at 18:38

Interesting piece on using YouTube's automatic captions feature to generate rough transcripts from which to produce more accurate records.

Dirty, Fast, and Free Audio Transcription with YouTube

http://waxy.org/2014/01/dirty_fast_and_free_audio_transcription_wit...

Five years ago, I wrote about how I transcribe audio with Amazon's Mechanical Turk, splitting interviews into small segments and distributing the work among dozens of anonymous people. It ended up as one of my most popular posts ever, continuing to draw traffic and comments every day.

Lately, I've been toying with a free, fast way to generate machine transcriptions: repurposing YouTube's automatic captions feature.

How It Works

Every time you upload a video, YouTube tries to generate a caption file. If there's audible speech, you can grab a subtitle file within a few minutes of uploading the video.

But how's the quality? Pretty mediocre! It's about as good as you'd expect from a free machine-generated transcript. The caption files have no punctuation between sentences, speakers aren't broken out separately, and errors are very common.

But if you're transcribing interviews, it's often easier to edit a flawed transcript than to start from scratch. And YouTube provides a solid interface for editing your transcript alongside the audio and getting the results in plaintext.

I used TunesToTube, a free service for uploading MP3s to YouTube, to upload the first 15 minutes of our New Disruptors interview, with permission from Glenn Fleishman.

It took about 30 seconds for TunesToTube to generate the 15-minute-long video, three seconds to upload it, and about a minute for the video to be viewable on my account.

It takes a bit more time for YouTube to generate the audio transcriptions. Testing in the middle of a weekday, it took about six minutes to transcribe a two-minute video, and around 30 minutes for the 15-minute video. Fortunately, there's nothing you need to do while it processes. Just upload and wait.

I ran a number of familiar film monologues through YouTube's transcription engine, and the results vary from solid to laughably bad. I've posted the videos below with the automatic transcriptions and their actual text.

As you'd expect, it works best with clear enunciation and spoken word. Soft words over background music, like in the Breakfast Club clip, fall apart pretty quickly. But some, like Independence Day, aren't terrible.

[See full article for examples]
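
As a rough sketch of the final step in this workflow (an editorial illustration, not part of the article above), the short Python script below assumes you have exported the finished captions from YouTube as a SubRip (.srt) file, and simply strips out the cue numbers and timestamp lines to leave plain text:

import re
import sys

# Matches SRT timestamp lines such as "00:01:02,345 --> 00:01:05,678"
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")

def srt_to_text(path):
    lines = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, cue numbers and timestamps; keep the caption text.
            if not line or line.isdigit() or TIMESTAMP.match(line):
                continue
            lines.append(line)
    return " ".join(lines)

if __name__ == "__main__":
    print(srt_to_text(sys.argv[1]))

From there the flat text can be corrected by hand, which, as the article says, is usually quicker than transcribing from scratch.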

Comment by Luke McKernan on October 21, 2013 at 17:09

Speaker Diarization Boosts Automatic Speaker Recognition In Audio Recordings

http://www.science20.com/news_articles/speaker_diarization_boosts_a...

An important goal in spoken-language-systems research is speaker diarization - computationally determining how many speakers feature in a recording and which of them speaks when.

To date, the best diarization systems have used supervised machine learning; they're trained on sample recordings that a human has indexed, indicating which speaker enters when. In a new paper, MIT researchers show how they can improve speaker diarization so that it can automatically annotate audio or video recordings without supervision: No prior indexing is necessary.

They also discuss a compact way to represent the differences between individual speakers' voices, which could be of use in other spoken-language computational tasks.

"You can know something about the identity of a person from the sound of their voice, so this technology is keying in to that type of information," says Jim Glass, a senior research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and head of its Spoken Language Systems Group. "In fact, this technology could work in any language. It's insensitive to that."

To create a sonic portrait of a single speaker, Glass explains, a computer system will generally have to analyze more than 2,000 different acoustic features; many of those may correspond to familiar consonants and vowels, but many may not. To characterize each of those features, the system might need about 60 variables, which describe properties such as the strength of the acoustic signal in different frequency bands.

The researchers' new algorithm represents every second of speech as a point in a three-dimensional space. In an iterative process, it then groups the points together, associating each group with a single speaker.

E pluribus tres

The result is that for every second of a recording, a diarization system would have to search a space with 120,000 dimensions, which would be prohibitively time-consuming. In prior work, Najim Dehak, a research scientist in the Spoken Language Systems Group and one of the new paper's co-authors, had demonstrated a technique for reducing the number of variables required to describe the acoustic signature of a particular speaker, dubbed the i-vector ...
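
To make the clustering idea concrete, here is a deliberately simplified Python sketch (an editorial toy example, not the MIT system and not the i-vector technique): it takes one feature vector per second of audio, assumed to have been computed elsewhere, and groups the seconds into a fixed number of speakers with a plain k-means loop.

import numpy as np

def cluster_speakers(features, k=3, iterations=50, seed=0):
    """Toy diarization: features has shape (n_seconds, n_dims), one row per second.
    Returns labels, where labels[t] is the speaker cluster assigned to second t."""
    rng = np.random.default_rng(seed)
    # Start with k randomly chosen seconds as the initial speaker centroids.
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        # Assign every second to its nearest centroid.
        distances = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the seconds assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels

# Example run on random stand-in "features" for a two-minute recording.
if __name__ == "__main__":
    fake_features = np.random.default_rng(1).normal(size=(120, 60))
    print(cluster_speakers(fake_features, k=3)[:20])

Real systems differ in almost every detail: the features are far richer, the number of speakers is not fixed in advance, and the grouping is probabilistic rather than a hard assignment.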

Comment by Luke McKernan on August 22, 2013 at 8:12

Nuance’s “talking ads” speak their first words – in Swedish

http://gigaom.com/2013/08/19/nuances-talking-ads-speak-their-first-...

In April, Nuance Communications decided to bring its speech recognition and natural language understanding technology to a new industry: advertising. It created what it called the Voice Ad, a marketing medium that allows mobile users to speak to a digital advertisement on their phones and receive an answer back.

Nuance teamed up with Millennial Media, Jumptap (the two are set to merge) and AdMarvel to bring these new interactive ads to market, but it turns out that a European ad network beat all of them to the punch. Widespace is debuting the first Nuance-powered voice ad in two Swedish media company apps: those of Nordic daily newspaper Expressen and television programming guide Tv24.

There are no details yet on what form the ads will take, but in general Nuance’s voice ads are supposed to be self-contained brand-specific versions of a virtual assistant like Siri. Users can interact with the ads by asking them plain-speech questions. Nuance’s language servers in the cloud interpret the question and provide the appropriate response either via text, video or spoken word.

Nuance and its partners are encouraging brands and their ad agencies to use their traditional spokespeople as the voice blueprint for the ads. So, for instance, if you’re selling insurance and the face of your TV ads is Morgan Freeman, Freeman’s pre-recorded voice can answer your questions within the ad.

Comment by Luke McKernan on August 8, 2013 at 17:26

The BL's Opening up Speech Archives project concluded last week. We will continue to investigate the best ways in which to assimilate speech-to-text technologies into our discovery systems. A project web page at http://www.bl.uk/reshelp/bldept/soundarch/openingup/speecharchives.... has an overview of the work undertaken. A full report is in preparation.

Comment by Luke McKernan on August 5, 2013 at 10:01

CyberAlert Launches Nationwide Radio Monitoring Service

http://www.businesswire.com/news/home/20130801005948/en/CyberAlert-...

CyberAlert, the all-in-one media monitoring and measurement company, announced today the launch of a comprehensive radio monitoring service for public relations and marketing.

CyberAlert Radio monitors more than 250 news and talk radio stations in the Top 50 U.S. markets. The monitoring covers all local and national news along with local and syndicated talk shows. Using advanced speech-to-text technology, the new radio monitoring service identifies radio clips based on key words specified by CyberAlert’s clients and delivers the text of the radio broadcast. Clients can also order high-quality downloadable audio files of broadcasts from most radio markets.

CyberAlert’s radio monitoring service can be ordered as a stand-alone service or in an integrated package with online, TV news and social media monitoring.

“Radio is the missing component in many media monitoring services, yet it is a channel that heavily impacts public opinion,” said William J. Comcowich, CEO of CyberAlert. “With the addition of radio to our media monitoring services, CyberAlert now truly is the all-media service covering the full scope of online news, broadcast news and social media. Our clients now have the benefit of a fully-integrated and low-cost monitoring and measurement service so they won’t miss a mention, no matter where it occurs.”

Like its other media monitoring services, CyberAlert Radio offers customized key word searches and delivers all clips overnight to the client’s email. CyberAlert also stores each client’s news radio monitoring clips in an online digital clip archive with a full-featured dashboard for clip management and unlimited storage.

More information on radio monitoring and CyberAlert’s online and social media monitoring and measurement services can be found at www.cyberalert.com.
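
For anyone wondering what keyword-based clip identification over transcribed broadcasts looks like in principle, here is a generic Python sketch (purely illustrative, and nothing to do with CyberAlert's actual technology): it scans a transcript for client-specified key words and returns a snippet of surrounding context for each hit.

import re

def find_clips(transcript, keywords, context=60):
    """Return a list of {keyword, snippet} hits for each key word found in the transcript."""
    hits = []
    for word in keywords:
        for match in re.finditer(re.escape(word), transcript, flags=re.IGNORECASE):
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(transcript))
            hits.append({"keyword": word, "snippet": transcript[start:end]})
    return hits

# Example with a made-up snippet of transcribed radio output.
print(find_clips("today the city council debated the new library budget for next year",
                 ["library", "budget"]))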

About CyberAlert:

Founded in 1999 as one of the very first SaaS and cloud computing services, CyberAlert (http://www.cyberalert.com) is a worldwide news monitoring, broadcast monitoring, social media monitoring and media measurement service. CyberAlert® 5.0 worldwide online news monitoring service monitors 55,000+ online news sources each day in 250+ languages in 191 countries. The company’s TV broadcast monitoring service monitors the closed caption text and video feed of over 2,100 news programs on over 600 TV stations in all 210 markets in the United States. CyberAlert’s radio monitoring service covers more than 250 radio stations in the top 50 U.S. markets. For social media monitoring, CyberAlert monitors over 75 million blogs worldwide, 100,000 Web message boards and Usenet news groups, and over 200 video sharing sites like YouTube, as well as Twitter and Facebook, for consumer insight about companies, products, key issues and trends. CyberAlert offers a no-risk 14-day free media monitoring trial for most media monitoring services.

Comment by Paul Wilson on July 26, 2013 at 17:09

A timely Radio 4 programme this morning, Klatt's Last Tapes, had Lucy Hawking exploring the history of speech-to-text's sister technologies - text-to-speech and speech synthesis. Incredibly, it all began back in 1769 (sic) with Wolfgang von Kempelen's mechanical speech synthesizer, a reconstruction of which is heard in the programme. On the BBC iPlayer until 1st August:

http://www.bbc.co.uk/programmes/b03775fy

Comment by Luke McKernan on July 26, 2013 at 17:02

Semantic Media @ British Library

A workshop on semantic media and the challenges of time-based navigation of large collections of media documents is being held at the British Library on 23 September 2013. The event is being organised by the Semantic Media Network of Queen Mary, University of London.

Details of the event and how to book (it is free) are here: http://semanticmedia.org.uk/?q=semanticmedia-at-bl-2013

 
