Apple files patent for an audio interface for the iPod

Posted by Dennis Sellers May 4, 2006 at 4:47am

On May 4, the US Patent & Trademark Office revealed an Apple patent titled “Audio user interface for computing devices,” originally filed in November 2004.

Patent FIG. 8 illustrates a media item that includes a header, an audio tag, and a media file, according to one embodiment of the present invention.

Brief summary of the invention

The present invention is directed to an audio user interface that generates audio prompts that help a user navigate through the features of a computing device. The audio prompts provide audio indicators that allow a user to focus his or her visual attention upon other tasks such as driving an automobile, exercising, or crossing a street. In one embodiment the computing device is a media player (e.g., a portable audio device). In some embodiments, the computing device is a hand-held device that may have a scaled-down computer architecture that facilitates the device’s portability.

One aspect of the present invention pertains to techniques for providing the audio user interface by efficiently leveraging the computing resources of a host computer system. The relatively powerful computing resources of a host computer system create audio files based upon text strings that are then transferred to a smaller computing platform, such as a hand-held device. The host computer system performs the process intensive text-to-speech conversion so that the computing device only needs to perform the less intensive task of playing the audio file. This approach of utilizing the host computer system in addition to the computing device allows for increased quality for the text-to-speech conversions and helps reduce not only the computational requirements but also the size and weight of hand-held computing devices.

As a method

As a method, one embodiment of the present invention relates to providing an audible user interface for a user of a hand-held device. The method includes at least receiving a selection of a user interface control on a hand-held device, selecting an audio file associated with the selected user interface control, and playing the selected audio file such that an audio prompt is audiblized (i.e., aurally presented) for the user. The audio prompt describes the selected user interface control or a displayed user interface item corresponding to the selected user interface control.
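The three steps of this method (receive a selection, select the associated audio file, play it) can be sketched in a few lines of Python. This is only an illustration: the table contents, the function names, and the `play_audio` callback are assumptions and do not come from the patent.

```python
# Hypothetical table mapping user interface controls to audio files
# that were created on the host and stored on the device.
PROMPT_FILES = {
    "play": "play.aiff",
    "next": "next.aiff",
    "previous": "previous.aiff",
}


def select_prompt(control, table=PROMPT_FILES):
    """Return the audio file for a selected control, or None if there is none."""
    return table.get(control)


def on_control_selected(control, play_audio, table=PROMPT_FILES):
    """Receive a control selection, pick its audio file, and play it."""
    audio_file = select_prompt(control, table)
    if audio_file is not None:
        play_audio(audio_file)  # the device only plays; it never synthesizes
    return audio_file
```

Passing the playback routine in as a callback keeps the sketch independent of any particular audio hardware.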

As a method, an alternative embodiment of the present invention relates to creating an audio file at a host computer system. This method includes at least receiving a text string at a text-to-speech conversion engine, creating an audio file based upon the text string, and associating the audio file with a media file.

As a method, an alternative embodiment of the present invention relates to providing an audio user interface. This method includes at least creating, at a host system, an audio file based upon a text string, downloading the audio file from the host system to a hand-held device, selecting a user interface element on the hand-held device pertaining to the text string, and playing the audio file such that an audio prompt is made audible to a user.

As an apparatus

As an apparatus, one embodiment of the present invention relates to a hand-held device that includes at least a user interface having a plurality of user interface controls and at least one menu that contains one or more menu items, a communications port for receiving audio files created by a host computer system, the audio files describing at least one of the user interface controls or one of the menu items, a memory that stores the audio files, and a user interface control module that plays one of the audio files in response to a user selection of one of the user interface controls.

Excerpts from “Detailed description of the Invention”

The present invention pertains to an audio user interface that generates audio prompts that help a user interact with a user interface of a computing device. The audio prompts provide audio indicators that allow a user to focus his or her visual attention upon other tasks such as driving an automobile, exercising, or crossing a street, yet still enable the user to interact with the user interface. As examples, the audio prompts provided can audiblize the spoken version of the user interface selection, such as a selected function or a selected (e.g., highlighted) menu item of a display menu. The audio prompts are produced by voice generation techniques, which are also referred to as speech feedback techniques.

The computing device can be various types of devices such as, but not limited to, media players, mobile phones (e.g., cell phones), personal hand-held devices, game players, video players, digital cameras, and digital video cameras. The computing device can be a hand-held device (e.g., a portable music player) or a stationary device (e.g., a personal desk computer).

In alternative embodiments, media player 102 may be computing devices that are not specifically limited to playing media files. For example, media player 102 can also be a mobile telephone or a personal digital assistant. The types of media transferred between personal computer 104 and media player 102 can also take the form of text files and any type of content that can be digitally stored on a computer.

TTS: Text-to-Speech engine

The text-to-speech conversion engine 214 is a software module that converts text strings into audio files that can be played to generate a user interface audio prompt that audiblizes (verbalizes) a respective text string. Such text-to-speech (TTS) engines can use various techniques for creating the audio files. For example, some algorithms use a technique of breaking a word down into fragments or syllables for which a certain sound is then designated. Then, a word can be verbalized through combining individual sounds. In the case where the media content pertains to music, these text strings may, for example, correspond to song titles, album names, artist names, contact names, addresses, phone numbers, and playlist names.

The audio file database 216 stores audio files that are generated by the TTS engine 214. In some embodiments, the audio files may be additionally or alternatively stored in media database 208. For example, audio files that are attached to associated media files can be conveniently stored together in media database 208.

The media database 210 has a number of media files and playlist files, which are used to classify, identify and/or describe media files in the media database 210. The media files can be, for example, song files. Each song file may contain media information that describes each song file. The media information might include, for example, the names of songs, the artist, the album, the size of the song, the format of the song, and any other appropriate information. Of course, the type of information may depend on the type of media. A video file might additionally have director and producer fields, but may not use the album field. In typical embodiments of media player 202, media files are non-editable when located within media player 202.

The audio UI process

FIG. 4 illustrates a flow diagram of a process 300 for providing a hand-held device with an audio user interface according to one embodiment of the present invention. The process 300 generally involves creating audio files at a host computer system, loading the audio files into a hand-held computing device (e.g., media player), and then playing the audio files when appropriate at the hand-held device.

The process 300 begins where a media player is connected to a host computer system. As shown in FIGS. 1 and 2, a media player can be connected to a host computer system through a cable such as a FireWire or USB cable. In alternative embodiments, the connection can be through a wireless communications protocol. Then, in block 304, a synchronization process is performed between the media player and the host computer system. Media files and text strings (or audio files) stored on the media player and host computer system are compared. Based on the comparison, appropriate files or text strings are copied between the media player and the host computer system. Hence, in block 304, not only are media files synchronized between the different platforms, but also text strings (or audio files) are synchronized between the different platforms. In one embodiment, text strings resident on the media player that require conversion into audio files can be uploaded into the host computer system for conversion.
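The comparison step in block 304 can be pictured as simple set arithmetic. The sketch below is an assumption about how such a comparison might be organized, not the patent's implementation; the function name, its parameters, and the result keys are all invented for illustration.

```python
def plan_sync(host_strings, player_strings, host_audio):
    """Decide what to copy or convert during synchronization.

    host_strings / player_strings: text strings present on each platform.
    host_audio: dict mapping a text string to an audio file already created.
    """
    host = set(host_strings)
    player = set(player_strings)
    all_strings = host | player
    return {
        # Strings entered on the player are uploaded to the host for conversion.
        "upload_to_host": sorted(player - host),
        # Strings with no audio file yet must go through the TTS engine.
        "needs_conversion": sorted(s for s in all_strings if s not in host_audio),
    }
```

For instance, a playlist name typed on the player appears in both lists: it is uploaded first and then queued for conversion on the host.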

In block 306, a text-to-speech (TTS) conversion engine at the host computer system converts text strings to audio files. The newly created audio files are stored at the host computer system and are also made ready for downloading onto the media player. The audio files are typically stored in the audio file database 216, but can also be stored in the media database 208, as shown in FIG. 2.

Next, in block 308, the audio files that have been created are downloaded into the media player from the host computer system. The audio files are typically stored in the audio file database 218, but can also be stored in the media database 210, as shown in FIG. 2. In one embodiment, a user can configure the extent to which audio files are created and/or downloaded. For example, a user can designate that all new audio files present at the host computer system be automatically downloaded into the media player. Alternatively, a user can manually select which of the newly generated audio files are to be downloaded. The downloading of audio files may cause pointers or lookup tables that store or reference the audio files at the media player to be updated. The process of downloading the audio files provides media player 202 with audio files that can be played by the user interface control module 220 to guide a user with user interface audio prompts. The audio files can be of higher quality since they are generated on the host computer system 204, which can support a more robust TTS engine 214 than could the media player 202, thereby enabling a richer user experience and seamless use.

In block 310, the media player is thereafter optionally disconnected from the host computer system so that the user can then freely use the media player without confinement to the personal computer 204. In block 312, the media player 202 plays the audio files in response to the user’s interaction (e.g., navigation) through the audio user interface. The process 300 for providing an audio user interface can be repeated each time the media player is reconnected to the host computer system.

Patent FIG. 6 illustrates a process 500 for creating audio files at a host computer system according to one embodiment of the present invention.

Process 500 begins at block 502 where the host computer system retrieves configuration settings for a text-to-speech conversion process. The configuration settings can control various aspects of the text-to-speech conversion process. For example, the configuration settings can determine which text strings are to be converted into audio files, the quality of the TTS conversions, the gender of the voice that verbalizes the text strings, the speed at which an audio prompt is audiblized (e.g., a speaking rate can be increased as the user gets more familiar with the audio prompts), and the customization of voices for different subtasks (e.g., the controls and functions can be audiblized with one voice while data such as songs and contact names can be audiblized with a different voice). Furthermore, a configuration setting can handle adept manipulation of user interface controls by playing only a part of an audio prompt as a user navigates. For example, while browsing through contact names lexicographically, only the letter (a, b, c . . .) is rendered until the user reaches the contact names that start with a desired letter (for example, j, as in Jones).
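To make these settings concrete, here is a rough Python sketch of a settings record together with the letter-only browsing behavior. Every field name and default value is an assumption made for illustration; the patent text does not specify a data layout.

```python
from dataclasses import dataclass


@dataclass
class TTSConfig:
    # All field names and defaults are illustrative assumptions.
    voice_gender: str = "female"
    speaking_rate: float = 1.0       # can be raised as the user grows familiar
    controls_voice: str = "voice_a"  # voice for controls and functions
    data_voice: str = "voice_b"      # voice for songs and contact names
    letter_only_browsing: bool = True


def browse_prompt(contact_name, config, reached_target):
    """Return the text to audiblize while browsing contacts alphabetically."""
    if config.letter_only_browsing and not reached_target:
        return contact_name[:1].lower()  # only the initial letter
    return contact_name
```

With the default settings, a user scrolling past "Jones" would hear only "j" until the desired letter is reached, after which the full name is spoken.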

Secondly, a user can input the text strings directly into the host computer system or into the media player. For example, a user can input text corresponding to a new playlist name or text relating to a new contact. The text relating to a new contact can pertain to information about the contact such as a person’s name, address, phone number, email address, and other related contact information. A user may also desire to enter textual descriptions for a media file, for example, a song title, album name, artist name, or a comment.

Text strings that require TTS conversion can be entered directly into a media device, for example, when a media device contains its own user input device. For example, some media players or hand-held devices, such as mobile phones or PDAs, have their own keypad for entering alphanumeric characters. Such text strings can be identified by the host computer system as requiring audio files so that a user interface audio prompt can be incorporated into an audio user interface at the media player or other hand-held device.

A text string can be a single word, a phrase, or single letters and/or numbers. Various sound synthesizer rules and engines can be used to generate the audio file. A generalized example of a process for converting a word into an audio file can operate as follows. The process for converting the word “browse” begins by breaking the word into fragments that represent diphone units or syllables, such as “b”, “r”, “ow”, “s”. Then various techniques generate audio prompts for each component, which can then be combined to form an intelligible word or phrase. The audio file is typically given an extension that corresponds to the type of audio file created. For example, the audio file for “browse” can be identified by a browse.aiff filename, wherein the .aiff extension indicates an audio file.
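The fragment-and-combine idea can be sketched as follows. This is a deliberately naive illustration: the `unit_sounds` table of pre-rendered fragments is hypothetical, plain byte concatenation stands in for real signal processing, and replacing spaces with underscores in the filename is an assumption rather than anything the patent states.

```python
def audio_filename(text, extension=".aiff"):
    """Name the audio file after its text string, e.g. 'browse' -> 'browse.aiff'."""
    return text.strip().lower().replace(" ", "_") + extension


def synthesize(fragments, unit_sounds):
    """Concatenate the stored sound for each fragment into one byte string.

    unit_sounds is a hypothetical table of pre-rendered audio per fragment;
    a real engine would also smooth the joins between units.
    """
    missing = [f for f in fragments if f not in unit_sounds]
    if missing:
        raise KeyError(f"no sound for fragments: {missing}")
    return b"".join(unit_sounds[f] for f in fragments)
```

Calling `synthesize(["b", "r", "ow", "s"], unit_sounds)` would return the four stored sounds joined in order, ready to be written out under the name given by `audio_filename("browse")`.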

It is noted that text strings that correspond to standard text strings can have pre-recorded audio files. Such text strings may correspond to common user interface controls, such as “play”, “stop”, “previous”, etc., and to common menu items such as “Music”, “Extras”, and “Backlight”. These audio files can be created using a voice talent or speech synthesized from the voice talent’s recordings. The other text displayed as part of the media player user interface that is usually user specific, such as contacts and customized playlist names, can all be synthesized by building a voice from the voice talent recordings. This provides consistency by having the same voice for all textual data to be presented to the user.

Audio prompts

FIG. 10 illustrates a flow diagram 900 that describes a process for generating audio prompts that guide a user through a user interface according to one embodiment of the present invention. The process 900 begins at block 902 where a user makes a user interface control selection while navigating through the user interface of the media player. For instance, a user can make a control selection by using one of the user interface controls as shown in FIG. 3 (e.g., select button 259 or previous button 264). Some of the control selections will cause a cursor to highlight different menu items in the display screen 250.

In some embodiments, control selections are accompanied by an audio prompt that confirms the selection to the user. For example, “play” can be audiblized to the user to provide feedback that the play/pause button 266 was actually depressed. These embodiments may involve a repeated user action to make a user interface control selection. For example, a user would make multiple “clicks” of a user interface control to make the selection. A first “click” would cause the hand-held device to audiblize the selected user interface control. For example, “play” would be audiblized when a user presses the play button. This first audio prompt provides audio guidance as to which button has been depressed, which is helpful to a user when not directing visual attention upon the hand-held device. A subsequent “click” would then cause the hand-held device to perform the action corresponding to the user interface control. Continuing with the example, a media file will then be played. On the other hand, the audio prompt may have informed the user that an unintended selection is about to be made. Therefore, the user can attempt to select a different user interface control. For example, the user may then attempt to press a “next” button 262, rather than proceeding to press the play button 266 for a second time.
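The announce-then-confirm behavior amounts to a small state machine. The sketch below is one possible reading of it; the class name and the two callbacks are assumptions introduced for illustration, not terms from the patent.

```python
class ConfirmingControls:
    """First press announces a control; a second press of the same control acts."""

    def __init__(self, announce, perform):
        self.announce = announce   # callback that plays the audio prompt
        self.perform = perform     # callback that carries out the action
        self.pending = None        # control announced but not yet confirmed

    def press(self, control):
        if control == self.pending:
            # Second click on the same control: carry out the action.
            self.pending = None
            self.perform(control)
        else:
            # First click, or a different control replacing an unintended one.
            self.pending = control
            self.announce(control)
```

Pressing "play", then "next", then "next" again would announce "play", announce "next", and finally perform "next", matching the example of a user who corrects an unintended selection.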

Neo’s Note: This sounds a lot like the way Nike was using Apple’s technology in their phone patent.

At block 910, the audio prompts are played according to the selected audio interface mode. When a media player is not playing an audio file, only audio files corresponding to the user interface are played and made audible to the user. However, when a media file is being played back, the audio interface mode can be set to mix the media file and audio file playback in different manners. According to one setting, the volume for playing back a media file is reduced when an audio prompt is to be played. For example, the volume for playing back a song or a movie clip is lowered during the playback of the audio prompt. According to another setting, playback of a media file is paused during the playback of an audio prompt and then restarted after the audio prompt has been played. If the process 900 detects that a user is making multiple user control selections in a certain time frame, the playback of the media file can be paused for a short period of time so that the playback of the media file need not be paused and restarted multiple times. This can avoid a repeated interruption of a song’s playback. For instance, playback of a media file can be paused for five seconds if a user makes at least three user control selections within five seconds. The time periods and number of user control selections may vary depending upon a user’s preference. Some audio interface modes can designate that the audio prompts be played through a left, right, or both speakers or earphone channels.
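The burst rule (pause for five seconds after at least three selections within five seconds) can be expressed with a sliding window of timestamps. The following is a sketch under stated assumptions: the class, its parameter names, and the action labels "duck", "pause", and "prompt_only" are invented here, and only the default numbers come from the example in the text.

```python
from collections import deque


class PromptPolicy:
    """Decide how media playback reacts when an audio prompt is played."""

    def __init__(self, mode="duck", burst_count=3, window=5.0, hold=5.0):
        self.mode = mode                # "duck" lowers volume, "pause" pauses
        self.burst_count = burst_count  # selections that count as a burst
        self.window = window            # seconds in which the burst must occur
        self.hold = hold                # seconds to stay paused after a burst
        self.recent = deque()           # timestamps of recent selections

    def on_selection(self, now, media_playing):
        """Return (action, seconds) for a selection made at time `now`."""
        if not media_playing:
            return ("prompt_only", 0.0)
        self.recent.append(now)
        # Drop selections that fell out of the time window.
        while now - self.recent[0] > self.window:
            self.recent.popleft()
        if len(self.recent) >= self.burst_count:
            return ("pause", self.hold)  # one long pause, not many short ones
        return (self.mode, 0.0)
```

Because the thresholds are constructor arguments, they can vary with a user's preference, as the text notes.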

Notice

Macsimum News presents only a brief summary of patents with associated graphic(s) for journalistic news purposes as each such patent application and/or grant is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent applications and/or grants should be read in its entirety for further details.

If you have an opinion about this patent report, please contact me at neo@macsimumnews.com

Related patent material

“Apple’s Roadmap: Voice-activated media management system for the iPod”

“Apple files patent for wireless iPod with ringtones, micro browser”




Contributor

Dennis Sellers

Dennis has been a newspaper editor/reporter (seven years) and teacher (seven years). He has over 10,000 magazine, newspaper and online articles to his credit. He has also covered the Mac and tech industries for over a decade for such online publications as MacCentral, MacMinute and now Macsimum News.
