JPH0698319A

JPH0698319A - Voice tracing type camera focusing system

Info

Publication number: JPH0698319A
Application number: JP4243865A
Authority: JP
Inventors: Hiroshi Morimoto; 洋森本; Mikio Aoyama; 幹雄青山
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-09-14
Filing date: 1992-09-14
Publication date: 1994-04-08

Abstract

(57) [Abstract]

[Purpose] To provide a voice-tracking camera aiming system for a video conference system that allows a discussion among multiple attendees to be displayed faithfully in another room.

[Constitution] The system comprises: voice detection means 200 that detects the speech of each attendee 100; voice source position information storage means 300 that stores position information for each voice source detected by the voice detection means; voice source identification means 400 that identifies one or more simultaneous speakers from the speech detected by the voice detection means (for example, by applying a stop delay of a predetermined time); and aiming information creation means 500 that, based on the identification result of the voice source identification means and the position information already stored in the voice source position information storage means, creates the aiming information required for a TV camera 600 installed at a predetermined position to aim so as to include all speakers, and transmits it to the TV camera.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a voice-tracking camera aiming system in a video conference system. In recent years, video conference systems have been coming into practical use in which the audio and video of conferences held in a plurality of conference rooms are transferred to and displayed in one another's rooms, achieving an effect equivalent to a conference attended by all participants together.

[0002] In this type of video conference system, it is desirable that a discussion among multiple speakers in each conference room be displayed in the other rooms as faithfully as possible.

[0003]

[Prior Art] In early video conference systems, having a dedicated operator control the TV camera was considered as a way to capture the discussion in the conference room as faithfully as possible. In recent years, however, voice-tracking video conference systems have been put into practical use with the aim of eliminating the operator.

[0004] In a conventional voice-tracking video conference system, the speech of the conference attendees is detected with microphones or the like, the camera is aimed at the one attendee whose speech was detected, and the video of that speaker is transferred to and displayed in the other room. (In this text, controlling the camera so as to capture video of a required range that includes a predetermined target is referred to as "aiming".)

[0005] Consequently, when multiple attendees spoke, the system had no choice but to compare the volume of each detected utterance and aim at the single attendee with the highest volume.

[0006]

[Problem to Be Solved by the Invention] As is clear from the above description, in a conventional voice-tracking video conference system, even when multiple attendees speak at the same time in a discussion, only the video of the single loudest attendee is transferred to and displayed in the other room. The discussion among multiple attendees therefore could not be displayed faithfully in the other room.

[0007] An object of the present invention is to allow a discussion among multiple attendees to be displayed faithfully in another room.

[0008]

[Means for Solving the Problem] FIG. 1 is a diagram showing the principle of the present invention. In FIG. 1, 100 denotes an attendee of the conference and 600 denotes a TV camera installed at a predetermined position in the conference room.

[0009] 200 denotes the voice detection means provided by the present invention. 300 denotes the voice source position information storage means provided by the present invention. 400 denotes the voice source identification means provided by the present invention. 500 denotes the aiming information creation means provided by the present invention.

[0010]

[Operation] The voice detection means 200 detects the speech of each attendee 100. The voice source position information storage means 300 stores the position information of each voice source detected by the voice detection means 200.

[0011] The voice source identification means 400 identifies one or more simultaneous speakers from the speech detected by the voice detection means 200. The aiming information creation means 500, based on the identification result of the voice source identification means 400 and the position information already stored in the voice source position information storage means 300, creates the aiming information required for the TV camera 600 to aim so as to include all speakers, and transmits it to the TV camera 600.

[0012] It is also contemplated that the voice source identification means 400 regards an attendee 100 as a speaker until a predetermined period elapses after the speech of that attendee 100, as detected by the voice detection means 200, has stopped.
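The hold behavior described for the voice source identification means 400 can be illustrated with a minimal Python sketch. The class name `SpeakerHold`, the parameter `hold_s`, and the use of seconds as the time unit are assumptions made for illustration; the text only states that a predetermined period is used.

```python
from dataclasses import dataclass, field


@dataclass
class SpeakerHold:
    """Keep counting an attendee as a speaker for hold_s seconds after speech stops."""

    hold_s: float  # the "predetermined period"; unit and value are assumptions
    last_heard: dict = field(default_factory=dict)  # attendee id -> last time voiced

    def update(self, now: float, voiced: set) -> set:
        """Record who is voiced at time `now`; return everyone still treated as a speaker."""
        for attendee in voiced:
            self.last_heard[attendee] = now
        # An attendee stays a speaker until the hold period has fully elapsed.
        return {a for a, t in self.last_heard.items() if now - t < self.hold_s}
```

With such a hold, attendees who speak alternately would both remain in the speaker set, so the camera would not need to re-aim on every pause.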

[0013] Accordingly, a discussion in which multiple attendees are speaking can be displayed in the other rooms as well, and the convenience of the video conference system is greatly improved.

[0014]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 2 is a diagram showing a video conference system according to an embodiment of the present invention, FIG. 3 is a diagram explaining the aiming information according to an embodiment of the present invention, and FIG. 4 is a diagram showing an example of the video in FIG. 2. The same reference numerals denote the same objects throughout the drawings.

[0015] In FIGS. 2 to 4, attendees 2 are shown as the attendees 100 of FIG. 1. Microphones (M) 3 and a voice receiving unit 51 in the central control unit 5 are provided as the voice detection means 200 of FIG. 1. A basic information section 61 and an individual information section 62 are provided in the main storage device 6 as the voice source position information storage means 300 of FIG. 1. A voice source detection unit 52 is provided in the central control unit 5 as the voice source identification means 400 of FIG. 1. A focus calculation unit 54, an aiming control unit 55, and an attitude control unit 56 are provided in the central control unit 5 as the aiming information creation means 500 of FIG. 1. Finally, a camera 4 and a camera drive unit 41 are provided as the TV camera 600 of FIG. 1.

[0016] In FIG. 2, video conference systems 10 installed at n remote locations (the individual video conference systems are referred to as 10₁ to 10_n) are interconnected by an information communication network 20. Only the configuration of video conference system 10₁ is shown in detail; the configurations of the other video conference systems 10₂ to 10_n are omitted.

[0017] In the conference room in which video conference system 10₁ is installed, nine attendees 2 (the individual attendees are referred to as 2₁ to 2₉; the same convention applies below) are seated around a U-shaped conference desk 1 and hold a conference.

[0018] A dedicated microphone (M) 3 is installed on the conference desk 1 for each attendee 2, and a camera 4 is installed in front of the conference desk 1.

[0019] The positions of the microphones (M) 3 and the camera 4 are fixed. As shown in FIG. 3, the position of each object in the conference room is expressed in three-dimensional coordinates (x, y, z), with the camera 4 (its lens) as the origin, the x axis and y axis parallel to the floor, and the z axis perpendicular to the floor.

[0020] The main storage device 6 contains a basic information section 61, an individual information section 62, and an aiming record section 63. The basic information section 61 stores basic information about the conference room and the conference to be held, for example: the conference room dimensions (x_max), (y_max), and (z_max), which represent the size of the conference room covered by the video conference system; the x-axis correction distance (Δx) and the z-axis correction distance (Δz), which represent the range around the installation position of each microphone (M) 3 that encompasses the corresponding attendee 2; and the maximum volume value (V_max) and the minimum volume value (V_min), which indicate the range of volume that the video conference system recognizes as a voice signal. The individual information section 62 stores the position vectors (F₁[x₁, y₁, z₁]) to (F₉[x₉, y₉, z₉]), which indicate the installation positions of the microphones (M) 3₁ to 3₉.
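As an illustration only, the contents of the basic information section 61 and the individual information section 62 could be laid out as the following Python records. The type and field names (`Position`, `BasicInfo`, `IndividualInfo`, `mic_positions`, and so on) are hypothetical; the text specifies which quantities are stored, not how they are represented.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class Position(NamedTuple):
    """A position vector F[x, y, z] with the camera lens as the origin."""
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class BasicInfo:
    """Basic information section 61 (field names are illustrative)."""
    x_max: float  # conference room dimension along x
    y_max: float  # conference room dimension along y
    z_max: float  # conference room dimension along z
    dx: float     # x-axis correction distance (delta x)
    dz: float     # z-axis correction distance (delta z)
    v_max: float  # maximum volume recognized as a voice signal
    v_min: float  # minimum volume recognized as a voice signal


@dataclass(frozen=True)
class IndividualInfo:
    """Individual information section 62: one position vector per microphone."""
    mic_positions: Dict[int, Position]  # microphone index (1 to 9) -> F_i
```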

[0021] The aiming record section 63 stores the aiming information most recently created by the video conference system, that is, the aiming information indicating the current aim of the camera 4. Each microphone (M) 3 receives the voice signal (V) uttered by its corresponding attendee 2 and transmits it independently to the voice receiving unit 51 in the central control unit 5.

[0022] The voice receiving unit 51 transmits the voice signals (V) from the microphones (M) 3 independently to the voice source detection unit 52, and also combines the voice signals and transmits the combined signal to the input/output control unit 53.

[0023] Suppose that attendees 2₁ and 2₇ speak at the same time. Microphones (M) 3₁ and 3₇ receive the voice signals (V₁) and (V₇) and transmit them independently to the voice receiving unit 51. The voice receiving unit 51 transmits the voice signals (V₁) and (V₇) independently to the voice source detection unit 52, and also combines the voice signals (V₁) and (V₇) and transmits the combined signal to the input/output control unit 53.

[0024] The voice source detection unit 52 compares the voice signals (V₁) and (V₇) transmitted from the voice receiving unit 51 with the maximum volume value (V_max) and the minimum volume value (V_min) stored in the basic information section 61. On recognizing that both fall within the range of a normal voice signal, it recognizes that attendees 2₁ and 2₇ spoke at the same time, extracts the position vectors (F₁[x₁, y₁, z₁]) and (F₇[x₇, y₇, z₇]) of microphones (M) 3₁ and 3₇ stored in the individual information section 62, and transmits them to the focus calculation unit 54.
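The volume-range check performed by the voice source detection unit 52 might be sketched as follows. The function name `detect_speakers`, the dictionary-based inputs, and the inclusive treatment of the bounds V_min and V_max are assumptions; the text only says that signals within the normal voice range are recognized as speech.

```python
def detect_speakers(levels, v_min, v_max, mic_positions):
    """Return {mic_id: position} for each microphone whose level is a valid voice signal.

    levels        -- {mic_id: measured volume} for the current instant
    v_min, v_max  -- volume range recognized as voice (bounds assumed inclusive)
    mic_positions -- {mic_id: (x, y, z)} as held in the individual information section
    """
    return {
        mic: mic_positions[mic]
        for mic, level in levels.items()
        if v_min <= level <= v_max
    }
```

Every microphone that passes the check contributes its stored position vector, so two or more simultaneous speakers yield two or more vectors for the focus calculation.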

[0025] First, the focus calculation unit 54 calculates the focal length (L_f) of the camera 4 needed to fit attendees 2₁ and 2₇, who have been recognized as speaking, in the same picture. To begin, the focus calculation unit 54 takes the two position vectors (F₁[x₁, y₁, z₁]) and (F₇[x₇, y₇, z₇]) transmitted from the voice source detection unit 52 and finds the average values (x_f), (y_f), and (z_f) of their x-axis components (x₁) and (x₇), y-axis components (y₁) and (y₇), and z-axis components (z₁) and (z₇), respectively. The point (P[x_f, y_f, z_f]) determined by the position vector (F_f[x_f, y_f, z_f]), whose x-axis, y-axis, and z-axis components are the average values (x_f), (y_f), and (z_f), is defined as the focal point (P_f).

[0026] Next, the focus calculation unit 54 calculates the focal length (L_f) from the camera 4 (the coordinate origin) to the focal point (P_f), that is, the length of the position vector (F_f[x_f, y_f, z_f]), using its components (x_f), (y_f), and (z_f).
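The focal point and focal length calculation of the focus calculation unit 54 amounts to a component-wise mean followed by a Euclidean norm. The sketch below uses hypothetical function names and accepts any number of position vectors, although the text works through the two-speaker case only.

```python
import math


def focal_point(positions):
    """Component-wise mean of the speakers' position vectors: (x_f, y_f, z_f)."""
    n = len(positions)
    return tuple(sum(p[i] for p in positions) / n for i in range(3))


def focal_length(point):
    """Length of the position vector F_f, i.e. distance from the camera (origin) to P_f."""
    return math.sqrt(sum(c * c for c in point))
```

For example, microphone positions F₁ = (−1, 2, 1) and F₇ = (7, 6, −1) give the focal point (3, 4, 0) and a focal length of 5.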

[0027] Next, the focus calculation unit 54 calculates the x-axis angle (θ_x) and the z-axis angle (θ_z), which indicate the aiming direction of the camera 4, and the x-axis range angle (Δθ_x) and the z-axis range angle (Δθ_z), which indicate the aiming range of the camera 4, needed to fit attendees 2₁ and 2₇, who have been recognized as speaking, in the same picture.

[0028] First, the focus calculation unit 54 extracts the x-axis correction distance (Δx) from the basic information section 61 and uses it to correct the x-axis components of the position vectors (F₁) and (F₇) of microphones (M) 3₁ and 3₇, obtaining the position vectors (F_1DX[x₁−Δx, y₁, z₁]) and (F_7DX[x₇+Δx, y₇, z₇]), which encompass attendees 2₁ and 2₇. It then finds the x-axis angles (θ_x1) and (θ_x7) and the z-axis angles (θ_z1) and (θ_z7) of the position vectors (F_1DX) and (F_7DX). It further finds the average value (θ_x) of the two x-axis angles (θ_x1) and (θ_x7) and the average value (θ_z) of the two z-axis angles (θ_z1) and (θ_z7), and takes these average values (θ_x) and (θ_z) as the x-axis angle (θ_x) and the z-axis angle (θ_z) indicating the aiming direction.

[0029] Next, the focus calculation unit 54 finds the difference angle (θ_x−θ_x1) or (θ_x7−θ_x) between the x-axis angle (θ_x) indicating the aiming direction and the x-axis angle (θ_x1) of the position vector (F_1DX) or the x-axis angle (θ_x7) of (F_7DX), and takes the resulting difference angle (θ_x−θ_x1) = (θ_x7−θ_x) as the x-axis range angle (Δθ_x). Likewise, it finds the difference angle (θ_z−θ_z1) or (θ_z7−θ_z) between the z-axis angle (θ_z) indicating the aiming direction and the z-axis angle (θ_z1) of the position vector (F_1Dz) or the z-axis angle (θ_z7) of (F_7Dz), and takes the resulting difference angle (θ_z1−θ_z) = (θ_z−θ_z7) as the z-axis range angle (Δθ_z).
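The aiming direction and range calculation can be sketched in Python under some loudly stated assumptions. The text does not define the x-axis and z-axis angles numerically (that is left to FIG. 3), so this sketch assumes the x-axis angle is measured from the x axis within the floor plane and the z-axis angle is measured from the z axis. It applies only the Δx correction that is spelled out in the text, assumes microphone 1 lies on the smaller-x side, and reports range angles as magnitudes. All function names are hypothetical.

```python
import math


def axis_angles(p):
    """Return (theta_x, theta_z) of a position vector, in radians.

    Assumed definitions (not given in the text):
    theta_x is the angle from the x axis within the x-y (floor) plane,
    theta_z is the angle from the z axis.
    """
    x, y, z = p
    return math.atan2(y, x), math.atan2(math.hypot(x, y), z)


def aiming_angles(f1, f7, dx):
    """Return (theta_x, theta_z, d_theta_x, d_theta_z) for two speakers' microphones."""
    # Widen the pair outward along x so the attendees, not just the microphones, are covered.
    p1 = (f1[0] - dx, f1[1], f1[2])
    p7 = (f7[0] + dx, f7[1], f7[2])
    tx1, tz1 = axis_angles(p1)
    tx7, tz7 = axis_angles(p7)
    theta_x = (tx1 + tx7) / 2  # aiming direction: mean of the two x-axis angles
    theta_z = (tz1 + tz7) / 2  # aiming direction: mean of the two z-axis angles
    # Range angle: offset from the aiming direction to either corrected vector.
    # Because the direction is the mean, both offsets have the same magnitude.
    return theta_x, theta_z, abs(tx7 - theta_x), abs(tz7 - theta_z)
```

Taking the magnitude sidesteps the sign convention of the difference angles, which depends on how the angles are oriented in FIG. 3.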

[0030] With the above, the focus calculation unit 54 has finished calculating the focal length (L_f), the x-axis angle (θ_x), the z-axis angle (θ_z), the x-axis range angle (Δθ_x), and the z-axis range angle (Δθ_z) as the aiming information required to fit attendees 2₁ and 2₇, who have been determined to be speaking, in the same picture.

[0031] The focus calculation unit 54 transmits the calculated focal length (L_f), x-axis range angle (Δθ_x), and z-axis range angle (Δθ_z) to the aiming control unit 55, and transmits the x-axis angle (θ_x) and z-axis angle (θ_z) to the attitude control unit 56.

[0032] The aiming control unit 55 extracts the focal length (L_f), x-axis range angle (Δθ_x), and z-axis range angle (Δθ_z) from the aiming information stored in the aiming record section 63, that is, the aiming information indicating the current aim of the camera 4. It calculates the differences between these and the focal length (L_f), x-axis range angle (Δθ_x), and z-axis range angle (Δθ_z) transmitted from the focus calculation unit 54, and transmits the differences to the camera drive unit 41. It then updates the focal length (L_f), x-axis range angle (Δθ_x), and z-axis range angle (Δθ_z) stored in the aiming record section 63 with the values transmitted from the focus calculation unit 54. Similarly, the attitude control unit 56 extracts the x-axis angle (θ_x) and z-axis angle (θ_z) from the aiming information stored in the aiming record section 63, calculates the differences between these and the x-axis angle (θ_x) and z-axis angle (θ_z) transmitted from the focus calculation unit 54, and transmits the differences to the camera drive unit 41. It then updates the x-axis angle (θ_x) and z-axis angle (θ_z) stored in the aiming record section 63 with the values transmitted from the focus calculation unit 54.
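The difference-then-update step carried out by the aiming control unit 55 and the attitude control unit 56 can be sketched as one helper applied to two groups of fields. The dictionary keys, the helper name `diff_and_update`, and the sign convention (new value minus stored value) are assumptions for illustration.

```python
# Fields handled by the aiming control unit 55 and the attitude control unit 56
# (key names are illustrative).
AIM_FIELDS = ("focal_length", "d_theta_x", "d_theta_z")
POSE_FIELDS = ("theta_x", "theta_z")


def diff_and_update(record, target, fields):
    """Return the per-field differences to send to the camera drive unit, then
    overwrite the stored aiming record with the newly calculated values.

    record -- mutable dict standing in for the aiming record section 63
    target -- dict of values just calculated by the focus calculation unit 54
    Differences are computed as (new - stored); the sign is an assumption.
    """
    delta = {k: target[k] - record[k] for k in fields}
    for k in fields:
        record[k] = target[k]
    return delta
```

Sending only differences means the camera drive unit does not need to know the absolute aim; the aiming record section holds that state between updates.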

[0033] The camera drive unit 41 updates the focal length (L_f) and aiming range of the camera 4 using the differences in focal length (L_f), x-axis range angle (Δθ_x), and z-axis range angle (Δθ_z) transmitted from the aiming control unit 55, and updates the aiming direction of the camera 4 using the differences in x-axis angle (θ_x) and z-axis angle (θ_z) transmitted from the attitude control unit 56.

[0034] With its aim updated, the camera 4 produces a picture that includes attendees 2₁ and 2₇, as shown in FIG. 4, and transmits it as a video signal (I) to the video receiving unit 57 in the central control unit 5.

[0035] The video receiving unit 57 transmits the video signal (I) received from the camera 4 to the input/output control unit 53. The input/output control unit 53 combines the voice signals (V₁) and (V₇) transmitted from the voice receiving unit 51 with the video signal (I) transmitted from the video receiving unit 57, and transmits the result to the other video conference systems 10₂ to 10_n via the information communication network 20.

[0036] When the input/output control unit 53 receives a (combined) voice signal (V) and video signal (I) transmitted from the other video conference systems 10₂ to 10_n via the information communication network 20, it separates them into the voice signal (V) and the video signal (I). It outputs the voice signal (V) to the speaker 31 so that each attendee 2 in the conference room can hear it, and outputs the video signal (I) to the monitor 42 to display it to each attendee 2.

[0037] As is clear from the above description, in this embodiment the video conference system 10₁ detects speakers 2₁ and 2₇, calculates aiming information such that a picture including both is captured, and aims the camera 4 accordingly. The discussion between speakers 2₁ and 2₇ is therefore transferred to and displayed faithfully on a single screen at the other video conference systems 10₂ to 10_n.

[0038] FIG. 2 is merely one embodiment of the present invention. For example, the voice source detection unit 52 is not limited to one that identifies the attendees 2 who are speaking solely by the presence or absence of a voice signal (V). Many other variations are conceivable, such as treating each attendee 2 as a speaker for a predetermined elapsed time after that attendee's voice signal stops, so that a picture including all speakers is maintained even while multiple attendees 2 speak alternately. The effect of the present invention is the same in every such case. It also goes without saying that the video conference system to which the present invention applies is not limited to the one illustrated.

[0039]

[Effect of the Invention] As described above, according to the present invention, a discussion in which multiple attendees are speaking can be displayed in the other rooms of the video conference system as well, and the convenience of the video conference system is greatly improved.

[Brief description of drawings]

FIG. 1 is a diagram showing the principle of the present invention.

FIG. 2 is a diagram showing a video conference system according to an embodiment of the present invention.

FIG. 3 is a diagram explaining the aiming information according to an embodiment of the present invention.

FIG. 4 is a diagram showing an example of the video in FIG. 2.

[Explanation of symbols]

1: conference desk
2, 100: attendee
3: microphone (M)
4: camera
5: central control unit
6: main storage device
10: video conference system
20: information communication network
31: speaker
41: camera drive unit
42: monitor
51: voice receiving unit
52: voice source detection unit
53: input/output control unit
54: focus calculation unit
55: aiming control unit
56: attitude control unit
57: video receiving unit
61: basic information section
62: individual information section
63: aiming record section
200: voice detection means
300: voice source position information storage means
400: voice source identification means
500: aiming information creation means
600: TV camera

Claims

[Claims]

1. A voice-tracking camera aiming system in a video conference system that transfers video of a conference room to another room and displays it there, characterized by comprising: voice detection means (200) that detects the speech of each attendee (100); voice source position information storage means (300) that stores position information of each voice source detected by the voice detection means (200); voice source identification means (400) that identifies a plurality of simultaneous speakers from the speech detected by the voice detection means (200); and aiming information creation means (500) that, based on the identification result of the voice source identification means (400) and the position information already stored in the voice source position information storage means (300), creates the aiming information required for a TV camera (600) installed at a predetermined position to aim so as to include all speakers, and transmits the aiming information to the TV camera (600).

2. The voice-tracking camera aiming system according to claim 1, wherein the voice source identification means (400) regards an attendee (100) as a speaker until a predetermined period elapses after the speech of that attendee (100), as detected by the voice detection means (200), has stopped.