JPH086589A

JPH086589A - Telephone line voice input system

Info

Publication number: JPH086589A
Application number: JP6138828A
Authority: JP
Inventors: Takao Watanabe; 隆夫渡辺
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1994-06-21
Filing date: 1994-06-21
Publication date: 1996-01-12
Anticipated expiration: 2012-09-17
Also published as: JP2655086B2

Abstract

PURPOSE: To improve the convenience of information services accessed over a telephone line network by achieving high-precision speech recognition, with the recognition vocabulary grammar kept on the remote information service system side of the network. CONSTITUTION: An information service system 1 holds the vocabulary grammar it uses in a recognition vocabulary grammar storage section 12. As the dialogue between the user and the system progresses, a control section 11 retrieves the vocabulary grammar information needed at each point from section 12 and transmits it to the user terminal 3 via the telephone line network 2. The terminal 3 stores the received vocabulary grammar information in a vocabulary grammar buffer 22. A voice recognition section 23 reads this information from buffer 22 and recognizes the user's voice input signal from a voice input section 4, and the recognition result is sent back to the system through a control section 21 and the network 2.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention: The present invention relates to a system in which a user interacts by voice with an information service system via a telephone line network.

[0002]

2. Description of the Related Art: Conventional speech recognition uses the word as its unit of recognition, so the recognition vocabulary of a speech recognizer is fixed. Consequently, when a user interacts by voice with an information service system over a network, the information service system itself has had to provide the speech recognition function and perform recognition according to vocabulary and grammar information specific to that system.

[0003] This method has the following problems. First, the user's speech must be sent to the information service system over the telephone line network, where it is distorted, which makes speech recognition difficult; large distortions such as line distortion cannot be avoided when speech is transmitted over a telephone line. Second, because the information service system must receive and recognize the speech of a large number of users, it must perform speaker-independent recognition, and the performance of speaker-independent recognition drops significantly for some speakers. Performance can be improved by preparing a standard pattern for each user, but preparing per-user standard patterns on the information service system side is very costly and impractical.

[0004]

Problem to Be Solved by the Invention: To solve the above problems, the present invention places the speech recognition function on the user terminal side and has the vocabulary grammar information needed for recognition sent from the information service system via the telephone line network. The user terminal can then achieve highly accurate speech recognition by preparing standard patterns dedicated to its user. Such speech recognition can be provided, for example, by the personal-computer speech recognition software described in Document 1 (Proceedings of the 47th National Convention of the Information Processing Society of Japan, pp. 2-375 to 2-376). With this software, speech recognition using speech standard patterns dedicated to the user can be realized on a personal computer.

[0005]

Means for Solving the Problem: The telephone line voice input system of the present invention is a telephone line voice input system in which an information service system and a user terminal are connected by a telephone line and the user terminal gives instructions to the information service system by voice. The information service system comprises a vocabulary grammar storage unit for storing a vocabulary grammar, and the user terminal comprises a vocabulary grammar buffer for receiving and storing the vocabulary grammar from the information service system, and a voice recognition unit for performing voice recognition using that vocabulary grammar.

[0006]

Operation: The vocabulary and grammar used for recognition vary with the information service application and with the scene within that application. For example, a vocabulary grammar for choosing a service is used first; once the service has been decided, the vocabulary grammar for input sentences changes according to that service. In addition to controlling service selection and the like, the information service system controls the switching of the recognition vocabulary grammar from scene to scene and transmits the recognition vocabulary grammar information to the user terminal. When the user terminal runs speech recognition, it performs recognition according to the vocabulary grammar transmitted for the current scene.
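The scene-by-scene switching described above can be sketched as follows. This is a minimal illustration under assumed names: the scene labels, example words, `SCENE_VOCABULARY`, `SERVICE_SCENE`, `next_scene` and `TerminalGrammar` are hypothetical and do not come from the patent, and each vocabulary grammar is reduced to a plain set of words. It shows the two roles: the service side chooses the vocabulary for the current scene, and the terminal recognizes only against the vocabulary it received most recently.

```python
# Hypothetical scene -> vocabulary table held by the information service system.
SCENE_VOCABULARY = {
    "service_selection": {"天気", "株価"},
    "weather": {"明日", "東京", "大阪", "の", "天気"},
    "stock": {"日経平均", "の", "終値"},
}
# Hypothetical mapping from the word that chooses a service to that service's scene.
SERVICE_SCENE = {"天気": "weather", "株価": "stock"}


def next_scene(scene, recognized_word):
    """Service side: once a service is chosen, switch to that service's vocabulary."""
    if scene == "service_selection":
        return SERVICE_SCENE.get(recognized_word, scene)
    return scene  # later scenes would define their own transitions


class TerminalGrammar:
    """Terminal side: only the most recently received vocabulary is active."""

    def __init__(self):
        self.vocabulary = frozenset()

    def load(self, vocabulary):
        # A newly transmitted vocabulary replaces the previous scene's vocabulary.
        self.vocabulary = frozenset(vocabulary)

    def allows(self, word):
        return word in self.vocabulary


# Usage: choose the stock service, then recognize within the stock vocabulary.
scene = "service_selection"
terminal = TerminalGrammar()
terminal.load(SCENE_VOCABULARY[scene])
chose_stock = terminal.allows("株価")
scene = next_scene(scene, "株価")
terminal.load(SCENE_VOCABULARY[scene])
```

Replacing the vocabulary outright, rather than merging it with earlier ones, is an assumption of this sketch; the text only says recognition follows the vocabulary grammar sent for each scene.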

[0007] The vocabulary grammar information consists of word dictionary information, which gives the pronunciation of each word, and grammar information on word sequences, which specifies which sequences of words may be input as sentences. If the speech input allows only isolated word utterances and not sentence utterances, the grammar information on word sequences is unnecessary.
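One possible in-memory shape for this vocabulary grammar information is sketched below using Python dataclasses. The class name, field names and example words are illustrative only, and the word-sequence grammar is simplified here to an explicit set of allowed sentences (the embodiment instead uses a finite state network). Setting `sentences` to `None` models the isolated-word case, in which no word-sequence grammar is sent.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence, Tuple


@dataclass
class VocabularyGrammar:
    # Word dictionary information: word -> kana pronunciation.
    dictionary: Dict[str, str]
    # Grammar information on word sequences: the allowed sentences.
    # None models isolated-word input, where no sequence grammar is needed.
    sentences: Optional[FrozenSet[Tuple[str, ...]]] = None

    def is_acceptable(self, words: Sequence[str]) -> bool:
        seq = tuple(words)
        # Every word must have a pronunciation in the dictionary.
        if not seq or any(w not in self.dictionary for w in seq):
            return False
        if self.sentences is None:
            # Isolated-word mode: exactly one dictionary word per utterance.
            return len(seq) == 1
        return seq in self.sentences


# Usage (hypothetical vocabularies):
isolated = VocabularyGrammar({"天気": "テンキ", "株価": "カブカ"})
sentence = VocabularyGrammar(
    {"東京": "トウキョウ", "の": "ノ", "天気": "テンキ"},
    frozenset({("東京", "の", "天気")}),
)
```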

[0008] The voice recognition unit on the user terminal side must be based on a method whose unit of recognition, such as the phoneme or the syllable, can represent any word through combination. The speech recognition described in Document 1 is one example of such a method. With this kind of recognizer, vocabulary grammar information can be received from the information service system and used for recognition. Conventional, commonly used speech recognition, which registers a standard pattern for each word as its unit, cannot be used for this purpose. Because the voice recognition unit is installed on the user side, standard patterns can be prepared for that user. This is known to improve recognition performance, so higher-performance speech recognition can be achieved than by installing a speaker-independent recognizer on the information service system side. It also avoids the high cost of implementing per-user standard patterns in a speech recognizer on the information service system side.
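The toy sketch below, which is not the recognizer of Document 1, illustrates why subword recognition units make a downloaded vocabulary usable: the terminal stores only per-user unit patterns and builds a model for any dictionary word by concatenating the patterns for its kana reading. Assumptions: each kana character is treated as one unit (small kana and long-vowel marks are ignored), features are single scalars instead of acoustic vectors, and matching uses plain dynamic time warping as a stand-in template-matching method. `USER_UNIT_PATTERNS`, `build_word_model`, `dtw_distance` and `recognize` are hypothetical names.

```python
# Hypothetical per-user unit patterns: kana -> short sequence of scalar "frames".
USER_UNIT_PATTERNS = {
    "テ": [1.0, 1.1],
    "ン": [2.0],
    "キ": [3.0, 3.2],
    "カ": [4.0],
    "ブ": [5.0, 5.1],
}


def build_word_model(reading, unit_patterns):
    """Concatenate the user's unit patterns for each kana in the reading."""
    frames = []
    for kana in reading:
        if kana not in unit_patterns:
            raise ValueError(f"no unit pattern for {kana!r}")
        frames.extend(unit_patterns[kana])
    return frames


def dtw_distance(a, b):
    """Dynamic time warping distance between two scalar frame sequences."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


def recognize(utterance, dictionary, unit_patterns):
    """Return the dictionary word whose concatenated model is closest."""
    best_word, best_dist = None, float("inf")
    for word, reading in dictionary.items():
        dist = dtw_distance(utterance, build_word_model(reading, unit_patterns))
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word


# Vocabulary received from the information service system (hypothetical).
received = {"天気": "テンキ", "株価": "カブカ"}
```

A word-unit recognizer would need registered patterns for 天気 and 株価 themselves and so could not use a vocabulary it had never seen; in this sketch only the unit patterns are user-specific, which is where per-user standard patterns would reside.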

[0009] As shown in FIG. 2, the vocabulary grammar information gives, for example, the word dictionary information as each word's kana-kanji notation together with its kana notation, and the grammar information as a finite state network whose arcs are labeled with word names.
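A finite state network whose arcs carry word names can be sketched as below. FIG. 2 is not reproduced in this text, so the dictionary entries and network here are invented examples rather than the figure's contents; the names `DICTIONARY`, `ARCS`, `accepts` and `kana_readings` are likewise hypothetical. The acceptance check tracks a set of current states, so it also handles a state with several arcs carrying the same word.

```python
from typing import Dict, List, Tuple

# Word dictionary: kana-kanji notation -> kana notation (hypothetical entries).
DICTIONARY: Dict[str, str] = {
    "明日": "アシタ", "東京": "トウキョウ", "大阪": "オオサカ",
    "の": "ノ", "天気": "テンキ",
}

# Finite state network: state -> list of (word name on the arc, next state).
ARCS: Dict[int, List[Tuple[str, int]]] = {
    0: [("明日", 0), ("東京", 1), ("大阪", 1)],
    1: [("の", 2)],
    2: [("天気", 3)],
}
START_STATE = 0
FINAL_STATES = {3}


def arcs_covered_by_dictionary() -> bool:
    """Every arc label must be a word the dictionary can pronounce."""
    return all(label in DICTIONARY for arcs in ARCS.values() for label, _ in arcs)


def accepts(words: List[str]) -> bool:
    """Follow arcs word by word; accept if a final state is reached at the end."""
    states = {START_STATE}
    for word in words:
        states = {nxt for s in states for label, nxt in ARCS.get(s, []) if label == word}
        if not states:
            return False
    return bool(states & FINAL_STATES)


def kana_readings(words: List[str]) -> List[str]:
    """Pronunciations a subword-unit recognizer would model for a word sequence."""
    return [DICTIONARY[w] for w in words]
```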

[0010]

Embodiment: FIG. 1 shows an embodiment of the present invention. The information service system 1 holds the vocabulary grammar it uses in the recognition vocabulary grammar storage unit 12. As the dialogue between the user and the system progresses from scene to scene, the control unit 11 retrieves the vocabulary grammar information used in each scene from the storage unit 12 and sends it to the user terminal 3 via the telephone line network 2. The user terminal 3 stores the received vocabulary grammar information in the vocabulary grammar buffer 22. The voice recognition unit 23 reads the vocabulary grammar information from the buffer 22, recognizes the user's voice input signal from the voice input unit 4, and sends the recognition result to the information service system 1 via the control unit 21 and the telephone line network 2. The information service system 1 performs processing and responds according to the recognition result, then advances to the next scene. This process is repeated until the service is completed.
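The processing loop of this embodiment can be simulated in a few lines of Python, as a sketch under stated assumptions. The telephone line network 2 is reduced to passing JSON strings, the vocabulary grammar to one word-to-kana dictionary per scene, and voice recognition unit 23 to an exact match on a kana reading supplied in place of audio. The scene names, example words and all class and function names are hypothetical; only the division of work among units 1, 3, 11, 12, 21, 22 and 23 follows the text.

```python
import json

# Hypothetical contents of recognition vocabulary grammar storage unit 12:
# one vocabulary (word -> kana reading) per dialogue scene.
GRAMMAR_STORE = {
    "select": {"天気": "テンキ", "株価": "カブカ"},
    "weather": {"東京": "トウキョウ", "大阪": "オオサカ"},
}


class InformationServiceSystem:
    """Stands in for information service system 1 and its control unit 11."""

    def __init__(self, store):
        self.store = store        # storage unit 12
        self.scene = "select"     # the dialogue starts with service selection
        self.log = []             # (scene, recognized word) pairs processed

    @property
    def finished(self):
        return self.scene == "done"

    def grammar_message(self):
        # Control unit 11: serialize the current scene's vocabulary grammar
        # for transmission over telephone line network 2.
        return json.dumps(self.store[self.scene], ensure_ascii=False)

    def handle_result(self, word):
        # Process the recognition result, respond, and advance the scene.
        self.log.append((self.scene, word))
        if self.scene == "select" and word == "天気":
            self.scene = "weather"
        else:
            self.scene = "done"


class UserTerminal:
    """Stands in for user terminal 3."""

    def __init__(self):
        self.buffer = {}          # vocabulary grammar buffer 22

    def receive_grammar(self, message):
        self.buffer = json.loads(message)

    def recognize(self, reading):
        # Placeholder for voice recognition unit 23: the "utterance" is a
        # kana reading matched against the buffered vocabulary.
        for word, word_reading in self.buffer.items():
            if word_reading == reading:
                return word
        return None               # outside the current scene's vocabulary


def run_session(system, terminal, utterances):
    for utterance in utterances:
        if system.finished:
            break
        terminal.receive_grammar(system.grammar_message())
        result = terminal.recognize(utterance)
        if result is not None:
            # Returned via control unit 21 and network 2.
            system.handle_result(result)
    return system.log
```

With the utterances テンキ then オオサカ, the session selects the weather service and then recognizes 大阪 in the weather scene. An utterance outside the current scene's vocabulary, such as トウキョウ during service selection, is simply not recognized, reflecting the scene-specific grammar sent to the terminal.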

[0011]

Effect of the Invention: As described above, the present invention separates the vocabulary grammar information for speech recognition from the voice recognition unit and places it on the remote information service system side across the telephone line network. This enables high-precision speech recognition and thereby improves the convenience of information services provided over a telephone line network.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention.

FIG. 2 shows an example of a vocabulary grammar.

[Explanation of symbols]

1 Information service system
2 Telephone line network
3 User terminal
4 Voice input unit
11 Control unit
12 Vocabulary grammar storage unit
21 Control unit
22 Vocabulary grammar buffer
23 Voice recognition unit

Claims


1. A telephone line voice input system in which an information service system and a user terminal are connected by a telephone line and the user terminal gives instructions to the information service system by voice, wherein the information service system comprises a vocabulary grammar storage unit for storing a vocabulary grammar, and the user terminal comprises a vocabulary grammar buffer for receiving and storing the vocabulary grammar from the information service system and a voice recognition unit for performing voice recognition using the vocabulary grammar.