TW394926B - Speech recognition system employing multiple grammar networks - Google Patents
Speech recognition system employing multiple grammar networks Download PDFInfo
- Publication number
- TW394926B TW394926B TW87116807A TW87116807A TW394926B TW 394926 B TW394926 B TW 394926B TW 87116807 A TW87116807 A TW 87116807A TW 87116807 A TW87116807 A TW 87116807A TW 394926 B TW394926 B TW 394926B
- Authority
- TW
- Taiwan
- Prior art keywords
- candidates
- grammatical
- recognition
- text
- patent application
- Prior art date
Links
Landscapes
- Machine Translation (AREA)
Abstract
Description
A7A7
五、發明説明(/ ) 經濟部中央標準局員工消費合作社印袋 本案部份地是美國專利申請序號:〇8/642,766,1996 年5月6日建檔,標題"採用連續語音的呼叫路由裝置”,申 請人 Jean-Claude Junqua 和 Michae 丨 Galler,之連續。 本發明一般關於電腦製作的語音辨識。尤其是,本發 明關於使用多重語法網絡處理語音資料之方法和裝置》多 重網絡之使用導致語音資料的不同分段,以便利從不具有 實用性的語音抽取具有實用性的語音。 雖然本發明具有許多用途,此處說明適於電話呼叫路 由應用型式之一種所拼名稱辨識系統。在展示的實施例 中,第一和第二組語法網絡被使用以分別地檢測N組最佳和 Μ組最佳文字序列。第一組語法網絡假設使用者反應於系統 提示將即時地開始拼音而被組態。第二語法網絡假設所拼 名稱文字序列以系統無法辨識的外來雜訊或者語調開始而 被組態》Ν組最佳和Μ組最佳文字序列分別地接受對於有效 名稱字典之動態規劃匹配,以抽取對應至Ν組最佳和Μ組最 佳文字序列之各組的Ν組最佳和Μ組最佳名稱假設。接著從 這些名稱假設集選擇最佳候選者而形成辨識決定。 目前的語音辨識技術涉及語音資料中樣型的辨識以及 那些樣型與系統所辨識的一組預定字典登入集之相關性。 語音辨識問題因爲有許多不同的變形而相當地具有挑戰 性。一般而言,語音辨識器將數位形式的進入語音資料應 用至依據預定模式轉換該數位資料成爲參數之數學辨識程 序。 習見地,該模式被使用足夠大訓練集合預先加以訓練 本紙張尺度適用中國國家揉準(CNS ) Α1規格(210Χ297公釐) (請先聞讀背面之注意事項再填寫本頁)V. Description of the Invention (/) Printed bags of the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs This case is part of the US patent application serial number: 08 / 642,766, filed on May 6, 1996, title " continuous call routing "Apparatus", applicants Jean-Claude Junqua and Michae 丨 Galler, continuous. The present invention generally relates to computer-made speech recognition. In particular, the present invention relates to a method and device for processing speech data using a multi-grammar network. The use of multiple networks leads to Different segments of voice data to facilitate the extraction of practical speech from speech that is not practical. Although the present invention has many uses, a spelled name recognition system suitable for telephone call routing application types is described here. In the embodiment, the first and second sets of grammatical networks are used to detect the N best and M best text sequences, respectively. The first set of grammatical networks is assumed that the user will immediately start pinyin in response to a system prompt Configuration. The second grammatical network assumes that the spelled name text sequence is external noise or intonation that the system cannot recognize. Started and configured> Best group of N and best text sequences of group M respectively accept dynamic programming matching for a valid name dictionary to extract groups of N corresponding to each group of best and best text sequences of group N Best and M group best name hypotheses. Then select the best candidate from these name hypothesis sets to form a recognition decision. Current speech recognition technology involves the identification of patterns in speech data and a set of patterns and systems recognized by the system Relevance of predetermined dictionary login sets. Speech recognition problems are quite challenging because of many different variants. Generally speaking, speech recognizers apply digital form of incoming speech data to transform the digital data into parameters according to a predetermined pattern. Mathematical identification procedure. It is customary that this model is pre-trained with a sufficiently large training set. The paper size is applicable to the Chinese National Standard (CNS) Α1 specification (210 × 297 mm) (please read the precautions on the back before filling this page)
1 A7 —______B7五、發明説明(2 ) 經濟部中央標準局員工消費合作社印製 以至於分別的說話者變異被大幅地減低。模式爲主的辨識 程序區分進入的資料成爲基礎構件,例如音素,它們接著 利用與被訓練的模式比較而被標示。在一型的辨識器中, 一旦分別的音素被標示,則該等音素資料與預先儲存於系 統字典中的字語比較。這比較經由將容納由於不正確音素 辨識以及在所給予的序列之內音素的插入和刪除之不精確 匹配的對齊處理程序而達成。該系統依據機率基礎工作》 習見地,語音辨識器將選擇導自先前說明的分段、標示和 對齊處理程序之最可能字語候選者。 由於他們的性質,目前的語音辨識器從預先定義的字 典選擇字語候選者並且因此它們將僅辨識預先定義的字 語。這形成一個問題,尤其是在依據語音辨識結果而達成 進一步地決定的系統中。外來的雜訊或者在字典中不被發 現的字語之語音語調時常被不正確地翻譯爲在字典中所發 現的字語。依據此種不正確辨識的後續決定可能導致錯誤 的系統性能。 爲了展示此問題,考慮一種所拼名稱電話呼叫路由應 用。使用者被合成聲音指示拼出呼叫應該被引導之人員的 名稱。如果使用者遵從這些指令,則語音辨識器確認各個 被說出的文字並且接著可以與字典對齊文字順序而找尋所 拼名稱。該系統接著使用在字典中所發現路由資訊而將呼 叫引導至適當的延伸。但是如果使用者先發出外來的資 訊,例如在拼音之前發音人員名稱,則該辨識程序極可能 失敗。這是因爲該辨識系統僅預期接收一系列被說出的文 (請先閱讀背面之注意事項再填寫本頁) 訂 本紙張尺度適用中國國家標準(CNS ) A4规格(210X297公釐) A7 _____ B7 五、發明説明(3 ) 字並且將嘗試”辨識"所說的名稱爲一組_或者多組文字。習 見的系統不具備以適當地區分進入的語音資料,因爲製造 系統的基礎模式假設所有的資料是對於系統有用的或者有 意義的等效單元(所說的文字)。 本發明經由採用並且結合多重語法網辂以產生多組的 辨識候選者,有些依據假設外來語音存在之一組模式,並 且有些依據假設外來語音不存在之一組模式的一組語音辨 識系統而解決上述問題。該兩組模式之結果被使用以依據 分別的匹配機率計分而選擇最可能候選者以提供最後的辨 識決定》 經濟部中央標準局貝工消費合作社印製 依據本發明的一組論點,語音資料被以導致不同的語 音資料區分之不同的第一和第二語法網絡分別地處理》以 此方式,該系統將從不具有實用性之語音抽取具有實用性 的語音。對於各語法網絡,多數個辨識候選者被產生。較 佳實施例使用一組第一語法網絡產生N組最佳候選者並且使 用一組第二語法網絡產生Μ組最佳候選者,其中N和Μ是大 於一的整數並且可以是相同。第一和第二多數個辨識候選 者(Ν組最佳,Μ組最佳)依據至少一組對於具有實用性之語 音的先前限制而被轉換。該轉換可包含,例如,匹配候選 者至被系統所辨識的所拼名稱之字典。辨識決定接著取決 於轉換辨識候選者。 如將更完全地被說明於下,本發明將語音資料分割爲 兩組或者更多組各被不同地處理之通道。—組通道使用依 據僅供應實用性語調(例如文字)的假設之第一語法網絡而被 _' _6 本紙張尺度適用中國國家標準(CNS ) Α4規格(210X297公釐) A7 _________B7五、發明説明u ) 經濟部中央標隼局員工消費合作社印製 處理。另一通道使用假設在有用的語音之前有外來、無用 的語音之一種不同的語法網絡而被處理。不同的語法網絡 因此導致不同的資料區分》 被各通道所產生之辨識候選者可依據各候選者匹配分 別模式的程度而各被計分。不需要兩組通道在這步驟比較-以便選擇具有最高計分的單一候選者-兩組辨識候選者被保 持分離。在這步驟中,辨識候選者代表N組最佳以及Μ組最 佳文字順序假設。爲了選擇最佳候選者的假設•該兩組被 分別地匹配至被系統辨識之所有名稱的字典。 該字典其實是有關對於系統具有實用性之語音的先前 限制之一組集合。因此某種文字順序假設可被計分爲較不 可能,因爲那些文字序列與儲存在字典中的文字序列匹配 不良。本較佳實施例使用Ν組最佳以及Μ組最佳文字序列以 從字典選擇Ν組最佳以及Μ組最佳名稱。因此來自兩組通道 的貢獻被包含在決定形成的程序中。最後,Ν組最佳以及Μ 組最佳名稱可被組合以形成輸入語調被應用之字典候選者 的簡化集合。 這減低尺度的字典可被使用以建立從Ν組最佳以及Μ組 最佳名稱候選者所建立之一組動態語法。這動態語法將依 據輸入語調是否包含外來的語音而傾向偏好一組候選者或 者其他一組。如果有外來的語音存在,被設定以確認以及 排斥外來語音之語法網絡將傾向於產生較佳的辨識結果’ 並且這些結果將被反映爲從Ν組最佳和Μ組最佳名稱候選者 所構成之動態語法中的較佳候選者。另一方面,如果沒有 ^ 14-- (請先閲讀背面之注意事項再填寫本頁) ,1Τ 象. 本紙張尺度適用中國國家標準(CNS ) Α4規格(210Χ297公釐) 經濟部中央標準局員工消費合作社印製 A7 ______B7 五、發明説明(3·) 外來的語音存在,另一語法網絡將產生較佳的辨識結果, 並且它將被反映爲在動態語法中的較佳候選者。 一旦動態語法被構成,輸入語音資料可依據動態語法 使用一組辨識器以抽取單一最有可能的名稱候選者作爲辨 識名稱而被處理。該辨識名稱接著被使用以存取一組適當 的資料庫而適當地引導電話呼叫。 爲了更完全地了解本發明、其目的和優點,可參考下 面的說明以及附圖。 第1圖是使用本發明之呼叫路由元件的範例系統之方塊 圚; 第2圖是本發明之呼叫路由元件的一種範例實施例之方 塊圖; 第3圖展示假設所拼名稱文字順序以有效文字開始而被 組態的語法網絡G1之狀態圖; 第4圖展示假設所拼名稱文字順序以外來的雜訊或者不 被系統辨識之語調開始而被組態之語法網絡〇2的狀態圖; 第5圖是本發明之較佳辨識系統的詳細方塊圖; 第6圖是展示不同型式的辨識錯誤之圖形; 第7圖是展示PLP-RASTA濾波器係數之最佳化以減少替 代、刪除以及插入錯誤之數目的圖形; 第8圖是展示改進的格子式N組最佳技術之圖形; 第9圖是進一步地描述在辨識的往回追蹤步驟時如何達 成假設產生之圖形。 本發明之原理將以一組呼叫路由元件加以展示並且說 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) 11-丨·----Λ------IT------A (請先閲讀背面之注意事項再填寫本頁) A7 ____;_B7_五、發明説明(么) 經濟部中央標準局員工消費合作社印製 明•其提示使用者利用逐字地拼出名稱進入系統以供應呼 叫路由資訊。因此,爲了幫助了解該-語音辨識系統,將先 提供可採用語音辨識系統之一組呼叫路由元件的摘要說 明。但是,應該注意到,本發明之語音辨識系統不受限制 於呼叫路由元件。反之,該辨識系統可應用於有用的語音 必須從外來的雜訊或者不是有用的語音中被抽取之廣大範 圍的不同應用中。 系統綜觀和基本的操作 採用連續語音辨識的呼叫路由元件將被展示於一組實 施例中,其適用於現有的PBX交換機之隨插即用的連接, 或者在製造時被結合至PBX設備。參看至第1圖,PBX交換 機2 10被以習見的裝置,例如電話線214,連接到電話網絡 下部構造212。爲了方便起見,在所示實施例中展示三組線 路。這不是有意作爲本發明的限制,因爲本發明可以被使 用在具有更多或者更少數目之電話線路的系統中。 該PBX交換機是習見的設計,可將來自網絡212的進入 呼叫引導至任何被選擇的電話元件,例如話機216。本發明 之拼出名稱辨識呼叫路由器218被連接,如同話機216被連 接,至PBX交換機2 10上的另外延伸或者出入埤。如將被更 完全地討論,本較佳實施例經由攜帶聲音交通的多數條線 路220並且經由攜帶允許呼叫路由器與存在PBX系統整體地 工作之控制邏輯信號的另外線路222而連接至PBX交換機。 第2圖更詳細地展示呼叫路由器218» PBX交換機210以 及線路220和222也同時被展示。依據PBX系統之結構,呼· 本紙張尺度適用中國國家標準(CNS ) Α4規格(210Χ297公釐)1 A7 —______ B7 V. Description of the invention (2) Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs, so that the speaker variation is greatly reduced. Pattern-based recognition programs distinguish incoming data as basic building blocks, such as phonemes, which are then labeled using comparisons with trained patterns. In a type 1 recognizer, once the respective phonemes are marked, the phoneme data is compared with words previously stored in the system dictionary. This comparison is achieved via an alignment process that will accommodate inexact matches due to incorrect phoneme recognition and phoneme insertion and deletion within the given sequence. The system works on a probabilistic basis. It is customary that the speech recognizer will select the most probable word candidate derived from the segmentation, labeling, and alignment processing procedures described previously. Due to their nature, current speech recognizers select word candidates from a pre-defined dictionary and therefore they will only recognize pre-defined words. This poses a problem, especially in systems that make further decisions based on speech recognition results. Alien noise or the intonation of words not found in the dictionary are often incorrectly translated into words found in the dictionary. Subsequent decisions based on such incorrect identification may lead to erroneous system performance. To illustrate this problem, consider a phone call routing application with a spelled name. The user is synthesized with a voice indicating the name of the person who should be directed to call. If the user follows these instructions, the speech recognizer confirms each spoken text and can then align the text order with the dictionary to find the spelled name. The system then uses the routing information found in the dictionary to direct the call to the appropriate extension. However, if the user sends out external information first, such as the pronunciation of a person's name before pinyin, the recognition process is likely to fail. This is because the recognition system only expects to receive a series of spoken texts (please read the notes on the back before filling this page). The size of the paper is applicable to the Chinese National Standard (CNS) A4 specification (210X297 mm) A7 _____ B7 5. Description of the invention (3) words and will try to "recognize" the name is a set of _ or multiple sets of text. The conventional system does not have the voice data to properly enter, because the basic model of the manufacturing system assumes The data are useful or meaningful equivalent units (the words) for the system. The present invention adopts and combines multiple grammar networks to generate multiple sets of identification candidates, some of which are based on the assumption that a set of patterns of foreign speech exists, And some solve the above problem based on a set of speech recognition systems that assumes that there is no set of patterns of foreign speech. The results of the two sets of patterns are used to select the most likely candidate based on the respective matching probability scores to provide the final recognition Decision "The Central Standards Bureau of the Ministry of Economy The different first and second grammatical networks that are distinguished by different speech materials are processed separately. In this way, the system extracts practical speech from non-practical speech. For each grammar network, most Identification candidates are generated. The preferred embodiment uses a set of first syntax networks to generate N sets of best candidates and a set of second syntax networks to generate M sets of best candidates, where N and M are integers greater than one and Can be the same. The first and second majority identification candidates (N-best, M-best) are transformed based on at least one set of previous restrictions on useful speech. The transformation may include, for example, Dictionary of matching candidates to the spelled name recognized by the system. The recognition decision then depends on the conversion recognition candidate. As will be explained more fully below, the present invention divides the voice data into two or more groups, each of which is Channels handled differently. — Group channels are _ '_6 used in this paper standard based on the hypothetical first grammatical network that supplies only practical tones (eg text) National Standard (CNS) A4 Specification (210X297 mm) A7 _________B7 V. Invention Description u) Printed and processed by the staff consumer cooperative of the Central Bureau of Standards of the Ministry of Economic Affairs. The other channel uses the assumption that there is foreign and useless voice before useful voice. A different grammatical network is processed. Different grammatical networks therefore lead to different data distinctions. The identification candidates generated by each channel can be scored according to the degree to which each candidate matches the respective pattern. Two sets of channels are not required In this step comparison-in order to select the single candidate with the highest score-the two sets of identification candidates are kept separated. In this step, the identification candidates represent the best of N groups and the best text order assumptions of group M. In order to select the most Good candidate hypothesis • The two sets are individually matched to a dictionary of all names recognized by the system. This dictionary is actually a collection of one of the previous restrictions on speech that is useful to the system. It is therefore less likely that certain text order hypotheses can be scored because those text sequences do not match well with the text sequences stored in the dictionary. This preferred embodiment uses the N-best and M-best text sequences to select the N-best and M-best names from the dictionary. The contributions from the two sets of channels are therefore included in the decision-making process. Finally, the group N best and group M best names can be combined to form a simplified set of dictionary candidates to which the input intonation is applied. This reduced-scale dictionary can be used to build a set of dynamic grammars built from the best set of N and best set name candidates of the M set. This dynamic grammar will tend to favor one set of candidates or the other depending on whether the input intonation contains foreign speech. If there is a foreign voice, the grammatical network set to confirm and exclude the foreign voice will tend to produce better recognition results' and these results will be reflected as being composed of the best name candidates in group N and group M A better candidate for dynamic grammar. On the other hand, if there is no ^ 14-- (please read the notes on the back before filling this page), 1T image. This paper size applies the Chinese National Standard (CNS) A4 specification (210 × 297 mm) Employees of the Central Standards Bureau of the Ministry of Economic Affairs Printed by the Consumer Cooperative A7 ______B7 V. Description of the Invention (3 ·) Another foreign language will produce better recognition results, and it will be reflected as a better candidate in dynamic grammar. Once the dynamic grammar is constructed, the input speech data can be processed based on the dynamic grammar using a set of recognizers to extract a single most likely name candidate as a recognized name. This distinguished name is then used to access a suitable set of databases to properly direct the phone call. For a more complete understanding of the invention, its objects and advantages, reference is made to the following description and accompanying drawings. Fig. 1 is a block diagram of an exemplary system using the call routing element of the present invention; Fig. 2 is a block diagram of an exemplary embodiment of the call routing element of the present invention; The state diagram of the configured grammar network G1 at the beginning; Fig. 4 shows the state diagram of the grammar network 02 which is configured assuming that the noise of the spelling name is not in the order of words or the tone of the system is not recognized by the system; Figure 5 is a detailed block diagram of a better recognition system of the present invention; Figure 6 is a graph showing different types of recognition errors; Figure 7 is a diagram showing the optimization of PLP-RASTA filter coefficients to reduce substitution, deletion, and insertion Figure of the number of errors; Figure 8 is a figure showing the improved grid-type N-group best technique; Figure 9 is a figure that further describes how to achieve the hypothesis generated during the identified backward tracking step. The principle of the present invention will be demonstrated by a set of call routing elements and said that the paper size is applicable to the Chinese National Standard (CNS) A4 specification (210X297 mm) 11- 丨 · ---- Λ ------ IT-- ---- A (Please read the precautions on the back before filling this page) A7 ____; _B7_ V. Description of the invention (?) Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs • It reminds users to use verbatim Spelling the name into the system to provide call routing information. Therefore, in order to help understand the speech recognition system, a summary description of a set of call routing elements that can adopt the speech recognition system will be provided first. It should be noted, however, that the speech recognition system of the present invention is not limited to call routing elements. In contrast, the recognition system can be applied to a wide range of different applications where useful speech must be extracted from external noise or not useful speech. System overview and basic operation Call routing elements using continuous speech recognition will be shown in a set of embodiments that are suitable for plug-and-play connections to existing PBX switches, or incorporated into PBX equipment during manufacturing. Referring to Fig. 1, the PBX switch 210 is connected to the telephone network substructure 212 with conventional devices, such as a telephone line 214. For convenience, three sets of lines are shown in the illustrated embodiment. This is not intended to be a limitation of the present invention, since the present invention can be used in a system having a larger or smaller number of telephone lines. The PBX switch is a conventional design that directs incoming calls from network 212 to any selected telephone element, such as telephone 216. The spelled name identification call router 218 of the present invention is connected as if the telephone 216 is connected to another extension or access on the PBX switch 210. As will be discussed more fully, the preferred embodiment is connected to the PBX switch via a plurality of lines 220 carrying voice traffic and via another line 222 carrying control logic signals that allow the call router to operate integrally with the PBX system. Figure 2 shows the call router 218 »PBX switch 210 and lines 220 and 222 in more detail at the same time. According to the structure of the PBX system, the paper size applies to the Chinese National Standard (CNS) Α4 specification (210 × 297 mm)
、1T (請先閲讀背面之注意事項再填寫本頁) -装. 五、發明説明(7 ) A7 B7 經濟部中央梂準局員工消費合作社印製 叫路由器218可以多種不同的方式被組態。在展示的實施例 中,該呼叫路由器具有三組分離音訊通道分別地連接到三 組線路220。當然,所需的通道數目將取決於電話系統的結 構。此處示出三組通道以展示一組系統如何可同時地在三 組進入的電話線路214的各組上提供三位呼叫者所拼名稱辨 識。爲了支援另外的呼叫者,另外的音訊通道可被包含或 者多工電路可被包含以允許通道被分享。 各音訊通道具有一組數位信號處理器(DSP)224以及相 關的類比-至-數位/數位-至-類比轉換電路226。數位信號處 理器被耦合至包含儲存著所有參考者或者名稱的一組資料 儲存器230之主處理器228 »資料儲存器23〇可以是任何適當 的數位儲存媒體,例如隨機存取記憶體。資料儲存器230儲 存著可被系統辨識的所有名稱之連績語音辨識字典’以及 相關的電話交換號碼。如將更完全地說明於下’該較佳實 施例使用一組特別的語音辨識器,那是連續拼出名稱之說 話者無關辨識的最佳化辨識器。 同時也有一組呼叫切換邏輯232耦合至主處理器228(或 者配合爲主處理器之部份)。這切換邏輯連接至信號線222 並且依據PBX交換機所指定之通訊協定與PBX切換系統通 訊。 在繼續語音辨識器之詳細說明之前,簡單地說明呼叫 路由器218之操作是有益處的。參看至第1和2圖。當一組進 入的呼叫經由一組電話線路2 14抵達PBX交換機時,它可以 被一位操作員處理而不被本發明的呼叫路由器所介入。但 (請先閲讀背面之注意事項再填寫本頁) |在_ 訂 良- 私紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) 經濟部中央梯準局員工消費合作社印製 A 7 ______B7 五、發明説明(2) 是,如果該操作員不能夠處理該呼叫(例如,該呼叫在正常 上班時間之後沒有操作員時進入),該PBX交換機被規劃以 傳送該呼叫至呼叫路由器218。這可簡單地被交換機利用依 據在線路222上傳送的切換指令而排定該呼叫至呼叫路由器 的音訊通道之一(至線路220之一組)而達成。如果需要,當 第一線呼叫是忙碌時PBX交換機可被規劃以跳到在路由器 218之內不同的音訊通道上之不同的信號線。完成這動作 後,進入的呼叫接著與被選擇的一組DSP處理器224通訊。 處理器供應任何所需的聲音提示給進入呼叫者(要求呼叫者 拼出所需人員的名稱)並且它同時也處理呼叫者所拼名稱反 應。DSP處理器224使用的語音辨識演算法將被詳細地說明 於下》 作爲辨識程序的一部份,DSP處理器224從主處理器228 下載一組備份的分享語音辨識資源,亦即反映所有的參考 者名稱以及他們的相關的電話延伸號碼之資料。當使用N組 最佳策略以供即時辨識時,DSP製作的語音辨識器從資料儲 存器230選擇最可能候選者。這候選者的名稱被使用DSP處 理器說回到呼叫者以供應一組語音合成信號或者重放被選 擇人員名稱之預先記錄音訊信號。呼叫者接著被要求回答 指示候選者是否正確的"是"或者"否"。如果是,則主處理器 228使用呼叫切換邏輯232以指示PBX交換機將該呼叫從一 組線路220傳送至被選擇的話機216之一》在發生切換之 後,呼叫路由器的音訊通道可再次被用以處理新的進入呼 叫。 11 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) (請先Μ讀背面之注意事項再填寫本頁) 訂 ,A_ 經濟部中央標準局員工消費合作社印製 A7 ______B7 五、發明説明(?) 較佳語音辨識處理之細節 本較佳語音辨識系統可被當作爲一種多回合步驟,其 中最後回合僅在先前(對齊)回合不產生一組單一辨識名稱作 爲輸出時被使用。第一和最後回合採用隱藏式馬克夫 (Markov)模式辨識,而對齊回合採用與字典動態規劃對 齊。如將被更完全地討論,第一回合(隱藏式馬克夫模式辨 識)本身被分割爲多數個平行的子經道。在第5圖中,第 一、第二以及第三回合被展示出》該注意的是第一回合分 叉成爲分別的隱藏式馬克夫模式辨識塊26a和26b。 所展示的實施例被設計以辨識包含經由呼叫者電話話 機1〇的輸入而供應至辨識系統的一序列文字之連續地拼出 名稱。爲了展示有用以及沒有用的輸入範例,兩組話機10 被展示出。在一組話機中呼叫者正確地使用該系統而供應 文字序列:H-A-N-S-0-N。在另一組話機中,呼叫者說出所 拼名稱以及文字序列:"Hanson" H-A-N-S-0-N而錯誤地使 用該系統。如將說明於下,、該系統被設計以容納正確使用 以及錯誤使用,而產生更強健的辨識系統。 在12所展示的辨識系統包含在13所展示的一組名稱取 出系統。如將被討論,名稱取出系統具有組成代表在名稱 字典中被發現項目之被選擇子集之動態語法的能力》在第 二回合中未達成辨識並且處理前進至第三回合的情況時使 用動態語法。 文字的輸入序列可被饋送至一組適當的語音分析模組 14。這模組進行前端最佳化以減少替代、刪除和插入錯誤 12 ^氏張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) I -1-- — I--装------訂------4· (請先閱讀背面之注$項^3;寫本頁) 經濟部中央標準局員工消費合作社印裝 A7 B7 五、發明説明(/0 ) 的數目。在連續所拼出名稱中,替代錯誤是將一組錯誤文 字替代正確的文字。第6圖在16和18展示在所拚名稱 JOHNSON之辨識中造成的替代錯誤的範例。刪除錯誤是在 連續所拼出名稱的一組或者更多組文字之消失。這被展示 在第6圖的20處。插入錯誤是在連續所拼出名稱中包含原先 沒有被拼出的另外文字。插入錯誤之範例被展示在第6圖的 22和 24。 語音分析模組14被設計以便在數位化語音資料上操 作。因此如果使用一組類比語音輸入系統,類比信號應該 首先被數位化。這可利用可被包含在語音分析模組14中的 適當類比-至-數位電路而被達成》 本較佳語音分析模組使用一組8階PLP-RASTA處理程序 以補償通訊通道之效應。有關更多關於PLP-RASTA補償之 資訊,可參考 H.Hermansky、N.Morgan、A.Bayya和 P.Kohn 所著,EUROSPEECH’91,第 1 367- 1 370頁,1991 年》本較佳 實施例使用一組10毫秒框移位以及一組20毫秒分析視窗》 RASTA濾波器係數被最佳化以減少替代、刪除以及插入錯 誤之數目。最佳濾波器係數被選擇爲0.90之値。 在決定最佳化RASTA濾波器係數中,能量,能量之第 —導數以及靜止倒頻譜係數C1至C8(在7音框之上計算)之第 —導數被交互地與靜態倒頻譜係數組合以形成語音參數表 示(總共18係數)。第7圖展示將減少替代、刪除以及插入錯 誤之數目的最佳化RASTA濾波器係數。在這圖中,1>1^-1T (Please read the notes on the back before filling this page)-Installation. V. Description of the invention (7) A7 B7 Printed by the Consumers' Cooperative of the Central Government Bureau of the Ministry of Economic Affairs The router 218 can be configured in many different ways. In the embodiment shown, the call router has three separate audio channels connected to three sets of lines 220, respectively. Of course, the number of channels required will depend on the structure of the telephone system. Three sets of channels are shown here to show how one set of systems can simultaneously provide identification of three callers' names on each of the three sets of incoming telephone lines 214. To support additional callers, additional audio channels may be included or multiplexing circuits may be included to allow channels to be shared. Each audio channel has a set of digital signal processors (DSP) 224 and related analog-to-digital / digital-to-analog conversion circuits 226. The digital signal processor is coupled to a main processor 228 containing a set of data storage 230 that stores all references or names. The data storage 23 may be any suitable digital storage medium, such as random access memory. The data storage 230 stores a continuous speech recognition dictionary 'of all names that can be recognized by the system, and related telephone exchange numbers. As will be explained more fully below, the preferred embodiment uses a special set of speech recognizers, which are optimized recognizers for speaker-independent recognition of consecutive spelled names. There is also a set of call switching logic 232 coupled to the main processor 228 (or part of the main processor). This switching logic is connected to the signal line 222 and communicates with the PBX switching system according to the communication protocol specified by the PBX switch. Before proceeding to the detailed description of the speech recognizer, it is beneficial to briefly explain the operation of the call router 218. See Figures 1 and 2. When a group of incoming calls arrive at the PBX switch via a group of telephone lines 214, it can be handled by an operator without intervention by the call router of the present invention. But (Please read the notes on the back before filling this page) | In _ Dingliang-Private paper size applies Chinese National Standard (CNS) A4 specification (210X297 mm) Printed by A 7 ______B7 V. Description of Invention (2) Yes, if the operator is unable to handle the call (for example, the call enters without an operator after normal working hours), the PBX switch is planned to route the call to the call router 218. This can be achieved simply by the switch using one of the audio channels (to a group of lines 220) to schedule the call to the calling router according to the switching instruction transmitted on line 222. If desired, the PBX switch can be planned to jump to different signal lines on different audio channels within router 218 when the first line call is busy. After this is done, the incoming call then communicates with the selected group of DSP processors 224. The processor supplies any required audible prompts to the incoming caller (requiring the caller to spell out the name of the person required) and it also handles the name response spelled by the caller. The speech recognition algorithm used by the DSP processor 224 will be described in detail below. As part of the recognition process, the DSP processor 224 downloads a set of backup shared speech recognition resources from the main processor 228, which reflects all the Information about the names of the referees and their associated phone extensions. When N sets of optimal strategies are used for real-time recognition, the speech recognizer produced by the DSP selects the most likely candidate from the data memory 230. This candidate's name is spoken back to the caller using a DSP processor to supply a set of speech synthesis signals or replay a pre-recorded audio signal of the selected person's name. The caller is then asked to answer " yes " or " no " indicating whether the candidate is correct. If so, the main processor 228 uses the call switching logic 232 to instruct the PBX switch to transfer the call from a group of lines 220 to one of the selected telephones 216. After the switching occurs, the audio channel of the calling router can be used again Handle new incoming calls. 11 This paper size is in accordance with Chinese National Standard (CNS) A4 specification (210X297 mm) (please read the precautions on the back before filling this page). A_ Printed by the Consumers Cooperative of the Central Standards Bureau of the Ministry of Economic Affairs A7 ______B7 V. Invention Explanation (?) Details of better speech recognition processing This better speech recognition system can be used as a multi-round step, where the final round is used only when the previous (aligned) round did not produce a single set of recognition names as output. The first and last rounds use hidden Markov pattern recognition, while the alignment rounds align with dictionary dynamic programming. As will be discussed more fully, the first round (recognition of hidden Markov patterns) is itself divided into a number of parallel sub meridians. In Figure 5, the first, second, and third rounds are shown. "It should be noted that the first round bifurcation becomes a separate hidden Markov pattern recognition block 26a and 26b. The illustrated embodiment is designed to recognize a continuously spelled name that includes a sequence of text supplied to the recognition system via input from a caller's telephone 10. To show useful and useless input examples, two sets of telephones 10 are shown. The caller uses the system correctly in a group of phones to provide a text sequence: H-A-N-S-0-N. In another set of telephones, the caller used the system by saying the spelling name and text sequence: "Hanson" H-A-N-S-0-N. As will be explained below, the system is designed to accommodate both correct and incorrect use, resulting in a more robust identification system. The identification system shown in 12 includes a set of name extraction systems shown in 13. As will be discussed, the name extraction system has the ability to form a dynamic grammar that represents a selected subset of the items found in the name dictionary. "Dynamic grammar is used when identification is not reached in the second round and processing proceeds to the third round . The input sequence of text can be fed to a suitable set of speech analysis modules 14. This module is optimized for front-end to reduce substitution, deletion and insertion errors. The 12 ^ 's scale is applicable to the Chinese National Standard (CNS) A4 specification (210X297 mm) I -1-- — I-- 装 ----- -Order ------ 4 · (Please read the note at the back of the item ^ 3; write this page) A7 B7 printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs 5. The number of invention descriptions (/ 0). In consecutive spelled names, replacing errors is replacing a set of incorrect text with the correct text. Figure 6 shows examples of substitution errors in the identification of the spelled name JOHNSON at 16 and 18. Deletion error is the disappearance of one or more groups of words in consecutive spelled names. This is shown at 20 in Figure 6. The insertion error is that the consecutive spelled names contain additional text that was not previously spelled. Examples of insertion errors are shown in Figures 22 and 24 of Figure 6. The speech analysis module 14 is designed to operate on digitized speech data. So if a set of analog voice input systems is used, the analog signal should be digitized first. This can be achieved using appropriate analog-to-digital circuits that can be included in the speech analysis module 14. The preferred speech analysis module uses a set of 8-stage PLP-RASTA processing programs to compensate for the effects of the communication channel. For more information on PLP-RASTA compensation, please refer to H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, EUROSPEECH'91, pp. 1367-1370, 1991. The example uses a set of 10 millisecond box shifts and a set of 20 millisecond analysis windows. The RASTA filter coefficients are optimized to reduce the number of substitution, deletion, and insertion errors. The optimal filter coefficient was chosen to be a fraction of 0.90. In determining the optimized RASTA filter coefficients, the first derivative of energy, the energy, and the first derivative of the static cepstrum coefficients C1 to C8 (calculated above the 7-tone frame) are interactively combined with the static cepstrum coefficient to form Speech parameter representation (18 coefficients in total). Figure 7 shows optimized RASTA filter coefficients that will reduce the number of substitutions, deletions, and insertion errors. In this figure, 1 > 1 ^-
rasta^mmm ' > mmmmmww.nk'SL 13 本紙張尺度適用中國國家榇準(CNS ) A4規格(2丨0X297公釐) ----I--I------^------1T------2 /.·»<· (請先閱讀背面之注意事項再填寫本頁) A7 B7 五、發明説明(/ / ) 靜態倒頻譜係數之第一導數的組合。 雖然PLP-RASTA最佳化是目前較佳的一種,也可使用 其他形式的最佳化。例如,可另外地使用一種暗色頻率倒 頻譜係數(MFCC)分析》可使用14階MFCC分析而得到適當 的結果。在MFCC分析中,11組靜態倒頻譜係數(包含C〇)以 16毫秒的框移位和32毫秒的分析視窗而被計算》 不同的辨識精確度可使用不同的特性集合而被得到》 這些特性集合可分別地並且組合地包含靜態特點和動態特 點。爲了展示本發明中所使用之參數的強健,淸除以及濾 取資料被使用。爲了得到濾波資料以供本較佳實施例中之 測試,一組失真濾波器被使用並且測試資料被濾波以在訓 練集和測試集之間人工地產生一組不匹配。在這方面,參 看 H.Murveit 1 J.Butzberger和 H.Weintraub所著,Darpa 硏 討會,語音和自然語言,第280-284頁,丨992年2月。 經濟部中央標準局員工消費合作社印製 (請先聞讀背面之注意事項再填寫本頁) 返回至第5圖,語音分析模組14之輸出被分割成爲兩組 通道,一組是與隱藏式馬克夫模式相關的辨識塊26a以及另 一組是與隱藏式馬克夫模式相關的辨識塊26b »辨識塊26a 與28a所示出的一組預先定義文字語法G1-起作用。辨識塊 26b與28b所示出的一組預先定義文字語法G2—起作用》這 些不同的文字語法分別地構成第3和4圖中展示之語法網 絡。這些語法網絡是包含與各種可能文字相關的節點以及 節點-至-節點的可能轉移之圖形。該等語法包含被文字迴路 跟隨著的靜音節點,其中任何文字可跟隨任何文字。第3圚 之語法G1以一組靜音(Sil)節點50而開始,轉移至分別的開 14 本紙張尺度適用中國國家標準(CNS ) A4规格(210X297公着) 經濟部中央標準局員工消費合作社印製 A7 ___B7 五、發明説明(A?) 始文字A、B、C*»第4圖所示出的語法G2,以一組充塡器 節點52代表外來的語音或者在拼音之前所說的雜訊而開 始。充塡器節點轉移到靜音節點52並且接著到分別的文字 節點,如G1-在本較佳製作中,辨識塊26 a和26b是採用 Viterbi解碼的框同步、一階、連續密度隱藏式馬克夫模式 辨識器。 本較佳實施例採用產生N組最佳或者Μ組最佳假設(取 代單一假設)之一種修改式Viterbi解碼器》Viterbi解碼器通 常依據在HMM模式和測試語調之間的匹配機率而被設計爲 僅提供最佳假設。在本發明中,這標準Viterbi解碼器被修 改而使得它依據在HMM模式和測試語調之間的匹配最高機 率而提供N組最佳或者Μ組最佳假設。辨識塊26a和26b各產 生他們獨有的N組最佳或者Μ組最佳假設。雖然在較佳賁施 例中使用相同數目(例如,Ν = Μ= 10),如果需要,這兩組辨 識塊不需要產生相同數目的假設。因此在第5圖中,辨識塊 26a產生Ν組最佳假設並且辨識塊26b產生Μ組最佳假設。如 先前所述,符號Ν和Μ可以是任何大於1的整數。爲整數Ν和 Μ所選擇的精確値可取決於處理器速率和記憶體大小。產生 Ν組最佳(或者Μ組最佳)文字候選者之技術將更完全地被討 論於下。將可了解,產生Ν組最佳(或者Μ組最佳)假設之技 術在兩種情況中是大致相同。 在26a和26b被採用的隱藏式馬克夫模式辨識器具有被 設計以限制捜尋空間的束捜尋能力,使得辨識器更快速地 處理進入的語音》隱藏式馬克夫模式辨識器產生代表在輸 15 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) --:--1---------1T------^ j · (請先閲讀背面之注意事項再填寫本頁) 經濟部中央標準局貝工消費合作社印製 A7 _____^_B7_ 五、發明説明(Ο) 入語音和參考語音之間匹配可能性的一組計分。沒有束捜 f機構時,辨識器必須在捜尋程序時將在各音框所有的可 能通道計分。具有束搜尋辨識器時,僅考慮計分從最佳計 分偏移不超出等於束寬度之數量的那些通道。取代尋找整 個搜尋空間,製作一種束搜尋而使最不可能搜尋通道被修 剪,使得僅有最佳假設。 從辨識器263和26b得到的N組最佳(或者Μ組最佳)假設 接著分別地被送到動態程式規劃(DP)對齊模組38a和38b。 動態程式規劃對齊模組可存取用以比較N組最佳(或者Μ組 最佳)假設之一組相關的名稱字典39。動態程式規劃被使用 以負责塞入、替代以及刪除錯誤。 在一些例子中,動態程式規劃對齊之結果將產生不具 有其他候選者的一組單一名稱》當DP對齊僅產生一組候選 者時,決定策略模組40檢測這名稱並且將辨識名稱提供爲 输出。但是,在大多數的情況中,並不產生單一候選者, 在此情況中,決定策略模組將Ν組最佳和Μ組最佳假設傳送 到模組42以建立一組動態語法。 模組42使用DP對齊模組所提供的Ν組最佳和Μ組最佳候 選者建立一組語法。高度限制的辨識器44接著被啓動以使 用動態語法42衡量Ν組最佳和Μ組最佳候選者》辨識器44也 可是一組隱藏式馬克夫模式辨識器。即使高度地被限制, 因爲動態語法是小的並且因爲參數表示(在14中計算)不需要 被重新計算,經由這辨識器而傳送的資料並不耗時》如果 需要,一組神經網絡鑑別器可被施加在辨識器26a和26b或 16 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) j--丨Μ----裝------訂------扃 (請先閱讀背面之注意事項再填寫本頁) A7 ___B7__ 五、發明説明(/4) 者辨識器44之輸出。 在附錄A中的列表展示本發明之系統如何進行辨識所拼 名稱WILSON。在列表中指定爲[第一回合(First Pass)]的部 份展示被兩種語法產生的所有假設。其中並沒有名稱 WILSON。 在標示著[DP對齊]的部份中,頂部(最佳)候選者被 列出:名稱WILSON(10個中第1候選者)包含在列表中。 在標示著[成本限制回合(Costly Constrained Pass)]的部 份中,輸入語調僅與在DP對齊時被選擇之候選者比較。在 此情況中,辨識器正確地檢測名稱WILSON。 N組最佳處理技術 經濟部中央標準局員工消費合作社印製 (請先Μ讀背面之注意事項再填寫本頁) -年 N組最佳或者Μ組最佳候選者使用一種N組最佳選擇演 算法而被選擇。關於這技術細節,參看R. Schwartz和Steve Austin,"N組最佳捜尋之有效益、高性能演算法",DARPA 語音辨識硏討會,第6-11頁,1990年》在語音辨識中,進 入的語音資料被分解成爲時間框並且被以一種框接著框的 基礎而分析。任何所給予語調,可有許多可能的假設。本 較佳N組最佳(或者Μ組最佳)演算法僅依據先前文字並且不 依據在先前文字之前的文字選擇一組文字最佳開始時間。 當各文字被說出並且分析時,隱藏式馬克夫模式辨識器將 爲各模式產生機率計分。因爲該系統最後目的是選擇最可 能文字序列|該系統儲存代表可能所拼文字組合的多數個 通道。 爲了使該系統較佳地作爲一組即時辨識器•兩組不同 Ί7 本紙張尺度適用中國國家標準(CNS ) Α4規格(210Χ297公釐) 經濟部中央標準局員工消費合作社印製 A7 _ B7 五、發明説明(/5) 的資料修剪位準被製作。兩種位準的修剪技術涉及比較所 給予假設的機率以及一組機率臨限。如果所給予通道的機 率是在臨限之下,則它被抛棄。更明確地說,修剪以一區 域位準以及一廣域位準發生。區域位準修剪涉及拋棄在文 字位準上代表低機率匹配的通道並且廣域修剪涉及拋棄從 語調開始到最後發現的文字之代表低機率匹配的通道。因 此。在所拼名稱之末端,一種巡迴式往回追蹤被達成以抽 取N組最佳(或者Μ組最佳)名稱假設。當往回追蹤操作被進 行時,區域和廣域修剪已經減低需要被分析的記憶體空間 之尺度。 除了區域和廣域修剪之外,本較佳系統同時也可使用 一種適應式臨限因而使修剪臨限在系統執行時被動態地調 整。 在混淆字語的情況中,辨識器使用狀態連結以幫助集 中在字語的區分部份並且減少評估參數的數目。這些連結 文字是(m,η)、(1,r)、Cp,t)以及(b,d)。在本較佳實施 例中除了文字W之外的所有文字被以一種六組狀態HMM模 式表示。文字W是以十二組狀態HMM模式表示並且靜音模 式是以一組狀態表示。依據文字混淆程度,文字模式具有 不同數目的高斯(Gaussian)密度。"E-組"文字·· b、c、d、 e、g、p、t、v、和z,以及文字m、n、s和f全是以6組高斯 密度模式化。其餘文字是以3組高斯密度模式化。 第8圖展示用以進行N組最佳(或者Μ組最佳)假設分析的 本發明之另一較佳技術。此處稱爲格子式Ν組最佳技術,在 18 本紙張尺度適用中國國家標準(〇阳)八4規格(210乂297公釐) I-^丨丨-------f------IT------Λ. (請先閲讀背面之注意事項再填寫本頁) 經濟部中央橾準局員工消費合作社印製 Α7 Β7 五、發明説明(/办) 各音框中該步驟計算各語法節點的可能性並且儲存到達該 節點之最佳反應。它接著儲存機率以及假設有作用的音框 數目。此技術因此保存N組最佳(或者Μ組最佳)假設並且經 由節點傳輸最佳的一組,所有其他組被最大可能性通道所 包含。rasta ^ mmm '> mmmmmww.nk'SL 13 This paper size is applicable to China National Standard (CNS) A4 specification (2 丨 0X297 mm) ---- I--I ------ ^ --- --- 1T ------ 2 /.·»<· (Please read the notes on the back before filling this page) A7 B7 V. Description of the Invention (//) The first derivative of the static cepstrum coefficient combination. Although PLP-RASTA optimization is currently the better one, other forms of optimization can be used. For example, a dark frequency cepstrum coefficient (MFCC) analysis may be used in addition. A 14th order MFCC analysis may be used to obtain appropriate results. In the MFCC analysis, 11 sets of static cepstrum coefficients (including C0) are calculated with a frame shift of 16 milliseconds and an analysis window of 32 milliseconds. "Different recognition accuracy can be obtained using different feature sets." These characteristics Sets can contain static and dynamic features separately and in combination. In order to demonstrate the robustness of the parameters used in the present invention, erasure and filtering data are used. In order to obtain filtered data for testing in the preferred embodiment, a set of distortion filters is used and the test data is filtered to manually generate a set of mismatches between the training set and the test set. In this regard, see H. Murveit 1 J. Butzberger and H. Weintraub, Darpa 讨 Symposium, Speech and Natural Language, pp. 280-284, February 992. Printed by the Consumer Cooperative of the Central Standards Bureau of the Ministry of Economic Affairs (please read the notes on the back before filling out this page). Return to Figure 5. The output of the voice analysis module 14 is divided into two groups of channels. The Markov pattern-related recognition block 26a and another set are the recognition block 26b »recognition blocks 26a and 28a related to a set of pre-defined text syntaxes G1-, which function as hidden Markov patterns. The different sets of pre-defined text grammars G2—functions shown in the recognition blocks 26b and 28b constitute the grammatical networks shown in Figs. 3 and 4, respectively. These grammatical networks are graphs containing nodes related to various possible texts and possible transitions from node-to-node. These grammars include silent nodes followed by text loops, where any text can follow any text. The third grammar G1 starts with a set of Sil nodes (Sil) 50, and is transferred to a separate paper. This paper size applies the Chinese National Standard (CNS) A4 specification (210X297). Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs. System A7 ___B7 V. Description of the invention (A?) Initial text A, B, C * »The grammar G2 shown in Figure 4, with a set of filler nodes 52 representing foreign speech or miscellaneous words before Pinyin News began. The charger node is transferred to the mute node 52 and then to the respective text node, such as G1- In this preferred production, the identification blocks 26 a and 26 b are frame-synchronized, first-order, continuous-density hidden markoff using Viterbi decoding Pattern recognizer. This preferred embodiment uses a modified Viterbi decoder that produces N-best or M-best hypotheses (instead of a single hypothesis). The Viterbi decoder is usually designed based on the probability of a match between the HMM mode and the test tone. Provide only the best assumptions. In the present invention, this standard Viterbi decoder is modified so that it provides the N-best or M-best hypotheses depending on the highest probability of matching between the HMM pattern and the test tone. The identification blocks 26a and 26b each produce their unique N-group best or M-group best hypotheses. Although the same number is used in the preferred embodiment (for example, N = M = 10), the two sets of identification blocks need not generate the same number of hypotheses if desired. Therefore, in Fig. 5, the identification block 26a generates N sets of best hypotheses and the identification block 26b generates M sets of best hypotheses. As stated previously, the symbols N and M can be any integer greater than one. The exact value chosen for the integers N and M may depend on the processor speed and memory size. The technique for generating the best (or best in the M group) text candidate for the N group will be discussed more fully below. It will be appreciated that the technique for generating the hypothesis of group N (or group M) is roughly the same in both cases. The hidden Markov pattern recognizer adopted in 26a and 26b has a beam search capability designed to limit the search space, so that the recognizer can process the incoming speech more quickly. 15 This paper size is applicable to China National Standard (CNS) A4 specification (210X297 mm)-: --- 1 --------- 1T ------ ^ j · (Please read the note on the back first Please fill in this page for further information.) Printed by the Central Standards Bureau of the Ministry of Economic Affairs, Shellfish Consumer Cooperatives. A7 _____ ^ _ B7_ V. Description of the invention (0) A set of points for the possibility of matching between the input voice and the reference voice. When there is no beam mechanism, the recognizer must score all possible channels in each frame during the search procedure. With a beam search identifier, only those channels whose scores are offset from the best score by an amount equal to the beam width are considered. Instead of finding the entire search space, make a beam search so that the least likely search channel is trimmed so that only the best hypothesis is available. The N-best (or M-best) assumptions obtained from the identifiers 263 and 26b are then sent to the dynamic program planning (DP) alignment modules 38a and 38b, respectively. The dynamic programming alignment module has access to a set of related name dictionaries 39 for comparing one of the N-best (or M-best) hypotheses. Dynamic programming is used to plug, replace, and delete errors. In some examples, the result of dynamic programming alignment will generate a set of single names without other candidates. ”When DP alignment only generates a set of candidates, the decision strategy module 40 detects this name and provides the distinguished name as an output. . However, in most cases, no single candidate is generated. In this case, the decision strategy module transmits the N-best and M-best assumptions to the module 42 to establish a set of dynamic grammars. Module 42 uses the N-best and M-best candidates provided by the DP alignment module to create a set of grammars. The highly constrained recognizer 44 is then activated to use the dynamic grammar 42 to measure the best set of N and the best candidates of set M. The recognizer 44 may also be a set of hidden Markov pattern recognizers. Even if highly restricted, because the dynamic syntax is small and because the parameter representation (calculated in 14) does not need to be recalculated, the data transmitted via this recognizer is not time-consuming. ”If necessary, a set of neural network discriminators Can be applied to the identifiers 26a and 26b or 16 This paper size is applicable to China National Standard (CNS) A4 specification (210X297 mm) j-- 丨 M ---- install -------- order ----- -扃 (Please read the precautions on the back before filling this page) A7 ___B7__ 5. Description of the invention (/ 4) The output of the identifier 44. The list in Appendix A shows how the system of the present invention recognizes the spelled name WILSON. The part designated as [First Pass] in the list shows all hypotheses generated by the two syntaxes. It does not have the name WILSON. In the section labeled [DP Alignment], the top (best) candidate is listed: the name WILSON (the first candidate out of 10) is included in the list. In the section labeled [Costly Constrained Pass], the input intonation is compared only with the candidate selected during DP alignment. In this case, the recognizer correctly detects the name WILSON. Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economics and Technology of the N Group (please read the notes on the back before filling this page)-The best candidate of the N group or the best candidate of the M group uses the best choice of the N group Algorithm. For technical details, see R. Schwartz and Steve Austin, "Effective, High-Performance Algorithms for the Best Search in N Groups," DARPA Speech Recognition Symposium, pp. 6-11, 1990. In recognition, the incoming speech data is decomposed into time frames and analyzed on a frame-by-frame basis. There are many possible hypotheses for any given intonation. The preferred N-best (or M-best) algorithm selects the best start time for a group of characters based on previous characters only and not on characters preceding the previous characters. As each text is spoken and analyzed, the hidden Markf pattern recognizer will generate probability scores for each pattern. Because the system's final purpose is to select the most probable text sequence | the system stores the most channels that represent possible combinations of text. In order to make the system better as a set of real-time identifiers • Two different sets of Ί7 This paper size applies to Chinese National Standard (CNS) Α4 specifications (210 × 297 mm) Printed by A7 _ B7 of the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs The data pruning level of the invention description (/ 5) is made. Two-level pruning techniques involve comparing the probability of a given hypothesis with a set of probability thresholds. If the probability given to the channel is below the threshold, it is discarded. More specifically, pruning occurs at a regional level and at a wide area level. Regional level pruning involves discarding channels that represent low-probability matches at the text level and wide-area pruning involves discarding channels that represent low-probability matches from the beginning of the intonation to the last found text. Therefore. At the end of the spelled name, a roving backtracking is reached to extract the N-best (or M-best) name hypothesis. When backtracking operations are performed, regional and wide-area pruning has reduced the size of the memory space that needs to be analyzed. In addition to regional and wide-area pruning, the preferred system can also use an adaptive threshold so that the pruning threshold is dynamically adjusted during system execution. In the case of confusing words, the recognizer uses state links to help focus the distinguishing part of words and reduce the number of evaluation parameters. These links are (m, η), (1, r), Cp, t), and (b, d). In the preferred embodiment, all characters except the character W are represented in a six-group state HMM mode. The text W is expressed in twelve groups of states in the HMM mode and the mute mode is expressed in one group of states. Depending on the degree of text obfuscation, the text patterns have different numbers of Gaussian densities. " E-group " texts ... b, c, d, e, g, p, t, v, and z, and texts m, n, s, and f are all modeled with 6 sets of Gaussian density. The rest of the text is modeled with 3 sets of Gaussian density. Fig. 8 shows another preferred technique of the present invention for performing N-best (or M-best) hypothesis analysis. This is called the grid-type N group best technology. It is applicable to Chinese national standard (〇 阳) 8 4 size (210 乂 297 mm) on 18 paper sizes. I- ^ 丨 丨 ------- f-- ---- IT ------ Λ. (Please read the precautions on the back before filling out this page) Printed by the Consumers' Cooperative of the Central Government Bureau of the Ministry of Economic Affairs Α7 Β7 V. Description of Invention (/ Office) Each frame This step calculates the probability of each grammar node and stores the best response to reach that node. It then stores the probability and the number of frames that are supposed to work. This technique therefore preserves the N-group best (or M-group best) hypothesis and the best group is transmitted by the node, all other groups are included by the most likely channel.
Viterbi傳送演算法計算各狀態的機率。這是以框接著 框方式在输入緩衝器中所有的資料進行,而機率被儲存在 狀態資料結構中。本較佳格子式N組最佳技術是產生N組最 佳(或者Μ組最佳)候選者的一種修改Viterbi演算法,但是在 下一模式中僅傳輸最大可能性。因此在各音框中,該常式 計算各語法節點之可能性並且接著儲存進入該節點的最佳 反應。 參看至第8圖,一組網絡節點η被示出。三組假設W,、 "3進入節點η。在這些假設中僅有最大可能性(最高的 機率)被往前傅送。因此節點η依據從節點η的最大可能性產 '生下一字語假設Wj、Wk、以及W,。Viterbi傳送演算法儲存 機率、持續(目前假設有作用之框數目)以及產生特定網絡節 點的各假設之一組指示器。當在狀態資料結構中的機率資 料被分析時,這資訊被往回追蹤演算法使用。 往回追蹤演算法可參考第9圖而了解。本較佳往回追蹤 演算法將所有N組最佳結束節點排優先順序而進入一組優先 順序佇列,展示於第9圚中的垂直行280»如果,例如,在 第一回合的最後框(N= 10)中有10組假設被傳輸時,將會有 1〇組結束節點(η〆n2、…、η,。)在優先順序佇列280中》結 Ί9 本紙張尺度適用中國國家標準(CNS ) Α4規格(210X297公釐) (請先閲讀背面之注意事項再填寫本頁) 訂 Λ. A7 __B7 五、發明説明(,7 ) 束節點被以下降順序分類,使得在佇列中的第一結束節點 代表具有最高機率計分的一組。 爲了展不起見,在第9圖中假設節點1^是具有最髙計分 的節點。該節點被以往回追蹤一步驟擴大以置放產生所給 予的節點'的假設(W,、w2或者w3)。當確認這假設時可允 許往回追蹤常式確認產生辨識假設之節點。在第9圖中,如 果假設_W2負責產生節點η,,則節點^是經由往回追蹤而被 辨識。節點nb接著在優先順序佇列280中的某些地方取代節 點η,。在替代之後,優先順序佇列將以下降順序被分類保 持。如果被取代的新節點nb具有最高的計分,它將會佔有 先前被節點η,佔據的位置。當然,在優先順序佇列中的一 組其他節點也同時可能具有比被取代新節點、較高的計 分。在此情況中具有最高計分的節點,並且不是被取代之 新節點nb,將被使用在下次的往回追蹤操作中。 往回追蹤以上述方式前進直至在往回追蹤時遇到一組 開始節點(對應至語音第一音框的一組節點)爲止。當抵達一 組開始節點時,一組假設被發現。往回追蹤演算法儲存在 往回追蹤進行時遇到的各組符號。這些符號可被儲存作爲 以逆向讀取的符號串以產生最可能候選者名稱。 經濟部中央梯準局員工消費合作社印製 第9圖展示具有多重N組最佳假設(N = 10)的一般情況。 爲了進一步地展示優先順序佇列往回追蹤步驟,可參看在 附錄B中的範例。該範例展示名稱"JONES"的往回追蹤步 驟。 從上面說明可了解本發明之呼叫路由元件極適用於幾 20 本紙張尺度適用中國國家標準(CNS ) A4規格(210X 297公釐) A7 B7 經濟部中央標準局貝工消費合作社印製 五、發明説明(/5 ) 乎任何辦公室電話網絡或者PBX系統之隨插式匹配連接。 該路由元件採用一種熟練的說話者·無關、連續語音辨識技 術,其允許進入的呼叫者拼出所需的接受者名稱並且該系 統將自動地決定接受者的適當電話分機並且使現有的電話 網絡或者PBX系統將進入的呼叫者連接至接受者之分機。 本發明不需要經由繁複按鍵命令而與電話網絡通訊,因而 使本系統極適用於視障者。該語音辨識技術具有高度彈 性:進入的呼叫者可以他們獨有的自然拼音速率拼出並且 在呼叫者退出拼音之後系統將自動地提示使用者確認名稱 選擇。如果進入的呼叫者明顯地暫停並且接著繼續拼音, 該系統將自動地確定語音辨識》即使經由雜訊電話通道, 該多回合語音辨識步驟也可良好地進行》該步驟在各回合 之間傅輸N組最佳假設並且延緩最耗計算步驟直至最後回 合,在該時間,可能名稱之候選者列表被大量地減小。N組 最佳,多回合辨識程序之結果,本發明可使用低成本DSP電 路被製,作。 雖然本發明以其較佳實施例被說明,將可了解本發明 可有某些修改而不脫離所附加申請專利範圍之精神。因 此,例如|不同的組態可與不同的型式目前和未來的電話 系統連接,包含類比系統和數位系統》 :-------裝-- --· (請先聞讀背面之注意事項再填寫本頁) 、tr 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) 五、發明説明(/y) A7 B7 元件標號對照表 10 呼叫者電話話機 12 辨識系統 13 名稱取出系統 14 語音分析模組 26a,28a 辨識塊 26b,28b 辨識塊 38a,38b 動態程式規劃對齊麵 39 名稱字典 40 決定策略模組 42 動態語法建立模組 44 辨識器 210 PBX交換機 212 電話網絡下纖造 214 電話線 216 話機 218 呼叫路由器 220 線路 222 線路 224 數位信號處理器 226 類比-至-數你 數位-至扁比轉換電路 228 主處理器 230 資料儲存器 232 呼叫切換邏輯 Μ ^— (請先閲讀背面之注意事項再填寫本頁) 訂 Λ 經濟部中央標準局員工消費合作社印製 22 本紙張尺度適用中國國家標準(CNS ) Α4規格(210X297公釐) 五、發明説明(Μ;The Viterbi transmission algorithm calculates the probability of each state. This is done in a box-by-box manner with all the data in the input buffer, and the probability is stored in the state data structure. The preferred lattice-type N-group optimal technique is a modified Viterbi algorithm that produces N-group best (or M-group best) candidates, but only transmits the maximum probability in the next mode. So in each frame, the routine calculates the probability of each grammatical node and then stores the best response into that node. Referring to Figure 8, a group of network nodes n are shown. Three sets of hypotheses W ,, " 3 enter node η. Of these assumptions, only the highest probability (highest probability) is sent forward. Therefore, the node η is assumed to produce the word hypothesis Wj, Wk, and W, from the maximum probability of the node η. The Viterbi transmission algorithm stores a set of indicators of probability, persistence (currently assumed number of active boxes), and hypotheses that generate specific network nodes. When the probabilistic data in the state data structure is analyzed, this information is used by the backtracking algorithm. Refer to Figure 9 for the backtracking algorithm. This preferred backtracking algorithm ranks all N best ending nodes into a set of priority queues, shown in the vertical row in line 9 280 »If, for example, in the last box of the first round When 10 sets of hypotheses are transmitted in (N = 10), there will be 10 sets of end nodes (η〆n2, ..., η ,.) in the priority queue 280 "Conclusion 9 This paper standard applies Chinese national standards (CNS) Α4 specifications (210X297 mm) (Please read the notes on the back before filling this page) Order Λ. A7 __B7 V. Description of the invention (, 7) Bundle nodes are sorted in descending order, making the The first ending node represents the group with the highest probability score. For the sake of brevity, it is assumed in Fig. 9 that node 1 ^ is the node with the highest score. This node was traced back in the past and expanded to place the hypothesis (W, w2 or w3) that gave the given node '. When confirming this hypothesis, it is allowed to trace back the routine to confirm the node that generated the identification hypothesis. In Figure 9, if it is assumed that _W2 is responsible for generating node η, then node ^ is identified by backtracking. Node nb then replaces node η, somewhere in priority queue 280. After replacement, the priority queues will be maintained by categories in descending order. If the replaced new node nb has the highest score, it will occupy the position previously occupied by node n ,. Of course, a group of other nodes in the priority queue may also have a higher score than the new node being replaced. The node with the highest score in this case, and not replaced by the new node nb, will be used in the next backtracking operation. Backtracking proceeds in the above manner until a set of start nodes (a set of nodes corresponding to the first frame of speech) are encountered during the backtracking. When a set of starting nodes is reached, a set of hypotheses is discovered. The backtracking algorithm stores sets of symbols encountered during backtracking. These symbols can be stored as a string of symbols read in reverse to produce the most likely candidate names. Printed by the Consumer Cooperatives of the Central Government Bureau of the Ministry of Economic Affairs. Figure 9 shows the general situation with multiple N-group optimal assumptions (N = 10). To further illustrate the priority queue backtracking steps, see the example in Appendix B. This example shows the backtracking steps for the name " JONES ". From the description above, we can understand that the call routing element of the present invention is extremely suitable for several 20 paper sizes. It is applicable to the Chinese National Standard (CNS) A4 specification (210X 297 mm) A7 B7 Printed by the Shell Standard Consumer Cooperative of the Central Standards Bureau of the Ministry of Economic Affairs Note (/ 5) Plug-and-match connection for almost any office telephone network or PBX system. The routing element employs a skilled speaker-irrelevant, continuous speech recognition technology that allows incoming callers to spell the required recipient name and the system will automatically determine the appropriate telephone extension for the recipient and enable the existing telephone network Or the PBX system connects the incoming caller to the recipient's extension. The invention does not need to communicate with the telephone network via complicated key commands, thus making the system extremely suitable for the visually impaired. This speech recognition technology is highly resilient: incoming callers can spell it out at their unique natural pinyin rate and the system will automatically prompt the user to confirm the name selection after the caller exits pinyin. If the incoming caller pauses significantly and then continues to pinyin, the system will automatically determine the speech recognition. "This multi-round speech recognition step works well even through a noisy phone channel." This step is lost between rounds. N sets of best hypotheses and delay the most expensive calculation steps until the final round, at which time the list of candidates for possible names is greatly reduced. As a result of the best N-group, multi-round identification procedure, the present invention can be manufactured using low-cost DSP circuits. Although the invention has been described in its preferred embodiments, it will be understood that the invention may be modified without departing from the spirit of the scope of the appended patents. Therefore, for example | different configurations can be connected with different types of current and future telephone systems, including analog systems and digital systems '': ------------------- (Please read the note on the back first Please fill in this page for further information), tr This paper size applies Chinese National Standard (CNS) A4 specification (210X297 mm) V. Description of the invention (/ y) A7 B7 Component reference table 10 Caller phone 12 Identification system 13 Name removal System 14 Speech analysis module 26a, 28a Identification block 26b, 28b Identification block 38a, 38b Dynamic programming planning alignment surface 39 Name dictionary 40 Decision strategy module 42 Dynamic syntax establishment module 44 Identifier 210 PBX switch 212 Fiber under the telephone network 214 telephone line 216 telephone 218 call router 220 line 222 line 224 digital signal processor 226 analog-to-digital digital-to-ratio conversion circuit 228 main processor 230 data storage 232 call switching logic M ^ — (Please read first Note on the back, please fill out this page again) Order Λ Printed by the Central Consumers Bureau of the Ministry of Economic Affairs Printed by the Consumer Cooperatives 22 This paper size applies to Chinese National Standards (CNS) Α4 specifications (210X297 mm) 5. Description of the invention (Μ;
FIRST PASS G1 grammar: A7 B7FIRST PASS G1 grammar: A7 B7
APPENDIX AAPPENDIX A
Hypothesis 1: ocfeylson Hypothesis 2: onseylson G2 grammar: Letters spotted at frame 104Hypothesis 1: ocfeylson Hypothesis 2: onseylson G2 grammar: Letters spotted at frame 104
Hypothesis 1: wylson (請先閲讀背面之注意事項再填寫本頁)Hypothesis 1: wylson (Please read the notes on the back before filling out this page)
DICTIONARY ALIGNMENT PASS N-best candidates from Gl:DICTIONARY ALIGNMENT PASS N-best candidates from Gl:
,1T of of of of of of of of 8: 8: 8: 8: 8: 8: 8: 8:, 1T of of of of of of 8: 8: 8: 8: 8: 8: 8: 8:
CandidateCandidate
CandidateCandidate
CandidateCandidate
CandidateCandidate
CandidateCandidate
CandidateCandidate
CandidateCandidate
Candidate neilson masterson nielson andersson carlson nelson anderson patterson 經濟部中央標準局員工消費合作社印製 M-best candidates from G2:Candidate neilson masterson nielson andersson carlson nelson anderson patterson M-best candidates from G2:
Candidate 1 Candidate 2 Candidate 3 Candidate 4 Candidate 5 Candidate 6 Candidate 7 Candidate 8 Candidate 9 ooooooooo lllllxxxl fffffffff oooooooooCandidate 1 Candidate 2 Candidate 3 Candidate 4 Candidate 5 Candidate 6 Candidate 7 Candidate 8 Candidate 9 ooooooooo lllllxxxl fffffffff ooooooooo
Wilson walton wasson watson nelson folsom urmson bylsma olson 23 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐) A7 B7 五、發明説明(2/ )Wilson walton wasson watson nelson folsom urmson bylsma olson 23 This paper size applies to Chinese National Standard (CNS) A4 specification (210X297 mm) A7 B7 V. Description of the invention (2 /)
Candidate 10 of 10: sissonCandidate 10 of 10: sisson
COSTLY CONSTRAINED PASSCOSTLY CONSTRAINED PASS
Hypothesis 1: wilson 310 frames (3.1 seconds) in signal. n I I i I 裝 I I 訂— ^ |旅 ' . : (請先閱讀背面之注意事項再填寫本頁) 經濟部中央標準局員工消費合作社印製 24 本紙張尺度適用中國國家樣準(CNS ) A4規格(210X297公釐) A7 B7 五、發明説明(n )Hypothesis 1: wilson 310 frames (3.1 seconds) in signal. N II i I Binding II Binding — ^ | 旅 '.: (Please read the precautions on the back before filling out this page) Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs 24 This paper size applies to China National Standard (CNS) A4 specification (210X297 mm) A7 B7 V. Description of invention (n)
Appendix BAppendix B
Example of Priority*Queue Backtrack Procedure for "JONES" • Last Frame Number is 10; Three Hypothesis Nodes Saved 1. node 6 (s) 2. node 6 (r) 3. node 4 (d) prob 0.9 prob 0.8 prob 0.6 duration 18 frames duration 20 frames duration 12 frames prev. node 3 prev. node 3 prev. node 2 frame #100 frame #100 frame #100 • Build Priority Queue: (Order of Decreasing Probability): (請先M讀背面之注意事項再填寫本頁) 訂Example of Priority * Queue Backtrack Procedure for " JONES " • Last Frame Number is 10; Three Hypothesis Nodes Saved 1. node 6 (s) 2. node 6 (r) 3. node 4 (d) prob 0.9 prob 0.8 prob 0.6 duration 18 frames duration 20 frames duration 12 frames prev. node 3 prev. node 3 prev. node 2 frame # 100 frame # 100 frame # 100 • Build Priority Queue: (Order of Decreasing Probability): (Please read the note on the back first (Please fill in this page again)
經濟部中央標準局員工消費合作社印製 node 3 (e) prob. 0.9 (prob. of parent) duration 10 prev. node 1 frame: 100-18=82 node 2 (a) prob. 0.7 prob. of parent (0.9) duration 10 prev. node 8 frame: 100-18=82Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs node 3 (e) prob. 0.9 (prob. Of parent) duration 10 prev. Node 1 frame: 100-18 = 82 node 2 (a) prob. 0.7 prob. Of parent ( 0.9) duration 10 prev. Node 8 frame: 100-18 = 82
Insert New Hypothesis Node in Priority Queue:Insert New Hypothesis Node in Priority Queue:
(Children Replace Parent Node in Queue) 25(Children Replace Parent Node in Queue) 25
Expand Maximum Likelihood Node, Extending Sequence Backward: 本紙張尺度適用中國國家標準(CNS ) A4規格(21〇X297公釐) 經濟部中央標準局員工消费合作社印装 A7 B7 五、發明説明(θ )Expand Maximum Likelihood Node, Extending Sequence Backward: This paper size applies to China National Standard (CNS) A4 (21 × 297 mm) Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs A7 B7 V. Description of the invention (θ)
Appendix B (cont.)Appendix B (cont.)
Begin Procedure BackTrack { Initialize backtrack priority-queue Q }Begin Procedure BackTrack {Initialize backtrack priority-queue Q}
For each grammar-terminal state S BeginFor each grammar-terminal state S Begin
If S has active list hl,h2.....hn of hypotheses in final frame TIf S has active list hl, h2 ..... hn of hypotheses in final frame T
BeginBegin
For each active hypothesis h Begin generate node N FLscore <- h.score N.sequence <- h.symbol N.duration <- h.duration N.predecessor <- h.predecessor N.time <- T enqueue N in Q End ForFor each active hypothesis h Begin generate node N FLscore <-h.score N.sequence <-h.symbol N.duration <-h.duration N.predecessor <-h.predecessor N.time <-T enqueue N in Q End For
End If End For { Process priority-queue Q, generating n-best sequences. }End If End For {Process priority-queue Q, generating n-best sequences.}
NumSequences <- 0NumSequences <-0
While Queue nonempty and NumSequences < n BeginWhile Queue nonempty and NumSequences < n Begin
Dequeue first (top-scoring) node N from QDequeue first (top-scoring) node N from Q
If N's predecessor is grammar-initial state Begin send N.sequence to output NumSequences <- NumSequences + 1If N's predecessor is grammar-initial state Begin send N.sequence to output NumSequences <-NumSequences + 1
End If { Expand N to generate child nodes.End If {Expand N to generate child nodes.
Add child nodes to priority queue Q } T <- N.time - N.duration S <- N.predecessorAdd child nodes to priority queue Q} T <-N.time-N.duration S <-N.predecessor
For each active hypothesis h for state S in frame T BeginFor each active hypothesis h for state S in frame T Begin
generate node C C.score <- N.score - (best score for S in frame T - h.score) Appendix B (cont.) C.sequence <- concatenation of h.symbol and N.sequence C.duration <- h.duration 二 C.predecessor <- h.predecessor ‘ C.time <- T enqueue C in Q End For 26 本紙張尺度適用中國國家標準(CNS ) A4規格(2丨〇X297公釐) ---1-------裝-- -' (請先閲讀背面之注意事項再填寫本頁) ,-ιτgenerate node C C.score <-N.score-(best score for S in frame T-h.score) Appendix B (cont.) C.sequence <-concatenation of h.symbol and N.sequence C.duration <-h.duration II C.predecessor <-h.predecessor 'C.time <-T enqueue C in Q End For 26 This paper size applies Chinese National Standard (CNS) A4 specification (2 丨 〇X297 mm ) --- 1 ------- install --- '(Please read the precautions on the back before filling in this page), -ιτ
A A7 B7 五、發明説明U4)A A7 B7 V. Invention Description U4)
End WhileEnd Procedure BackTrack (請先閲讀背面之注意事項再填寫本頁) ,草End WhileEnd Procedure BackTrack (Please read the precautions on the back before filling out this page)
.1T 經濟部中央標準局員工消費合作社印製 27 本紙張尺度適用中國國家標準(CNS ) A4規格(210X297公釐).1T Printed by the Consumer Cooperatives of the Central Bureau of Standards of the Ministry of Economics 27 This paper size applies to China National Standard (CNS) A4 (210X297 mm)
Claims (1)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/834,358 US5991720A (en) | 1996-05-06 | 1997-04-16 | Speech recognition system employing multiple grammar networks |
Publications (1)
Publication Number | Publication Date |
---|---|
TW394926B true TW394926B (en) | 2000-06-21 |
Family
ID=25266744
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW87116807A TW394926B (en) | 1997-04-16 | 1998-10-09 | Speech recognition system employing multiple grammar networks |
Country Status (1)
Country | Link |
---|---|
TW (1) | TW394926B (en) |
-
1998
- 1998-10-09 TW TW87116807A patent/TW394926B/en not_active IP Right Cessation
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5991720A (en) | Speech recognition system employing multiple grammar networks | |
US5799065A (en) | Call routing device employing continuous speech | |
EP0769184B1 (en) | Speech recognition methods and apparatus on the basis of the modelling of new words | |
Odell | The use of context in large vocabulary speech recognition | |
US6182039B1 (en) | Method and apparatus using probabilistic language model based on confusable sets for speech recognition | |
CA2202656C (en) | Speech recognition | |
US4618984A (en) | Adaptive automatic discrete utterance recognition | |
US5729656A (en) | Reduction of search space in speech recognition using phone boundaries and phone ranking | |
JP3434838B2 (en) | Word spotting method | |
KR100312920B1 (en) | Method and apparatus for connected speech recognition | |
US5732187A (en) | Speaker-dependent speech recognition using speaker independent models | |
US5983177A (en) | Method and apparatus for obtaining transcriptions from multiple training utterances | |
US7286989B1 (en) | Speech-processing system and method | |
US6671668B2 (en) | Speech recognition system including manner discrimination | |
US5664058A (en) | Method of training a speaker-dependent speech recognizer with automated supervision of training sufficiency | |
KR19980070329A (en) | Method and system for speaker independent recognition of user defined phrases | |
JPH0583918B2 (en) | ||
JPH0422276B2 (en) | ||
JPH10508392A (en) | Method and system for pattern recognition based on tree composition probability density | |
EP1022725A1 (en) | Selection of acoustic models using speaker verification | |
JP3124277B2 (en) | Speech recognition system | |
KR100379994B1 (en) | Verbal utterance rejection using a labeller with grammatical constraints | |
Lee et al. | Acoustic modeling of subword units for speech recognition | |
TW394926B (en) | Speech recognition system employing multiple grammar networks | |
Barnard et al. | Real-world speech recognition with neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GD4A | Issue of patent certificate for granted invention patent | ||
MM4A | Annulment or lapse of patent due to non-payment of fees |