JP7568371B2

System and method for activity target selection for robotic process automation

Info

Publication number: JP7568371B2
Application number: JP2022521760A
Authority: JP
Inventors: Cosmin V. Voicu; Dragos H. Bobolea; Ion Miron; Bogdan Ripa; Ilie C. Paunel
Original assignee: UiPath Inc
Current assignee: UiPath Inc
Priority date: 2019-10-14
Filing date: 2020-08-12
Publication date: 2024-10-16
Anticipated expiration: 2040-08-12
Also published as: WO2021076204A1; CN113015956B; CN113015956A; JP2022551933A; EP4046011A1; WO2021076205A1

Description

The present invention relates to robotic process automation (RPA), and in particular to systems and methods for automatically identifying the user interface elements targeted by activities such as a mouse click or a text entry.

RPA is an emerging field of information technology aimed at improving productivity by automating repetitive computing tasks, thus freeing human operators to perform intellectually sophisticated and/or creative activities. Notable tasks targeted for automation include, among others, extracting structured data from documents and interacting with user interfaces, for instance to fill in forms.

A distinct direction in RPA development is directed at simplifying the programming and management of software robots, with the ultimate goal of extending the reach of RPA technology to users who lack advanced programming skills or training. One way of making RPA more accessible is the development of RPA-oriented integrated development environments (IDEs), which allow robots to be programmed via graphical user interface (GUI) tools instead of by coding per se.

However, automating the interaction with a user interface poses substantial technical challenges, for instance unambiguously identifying a target element such as a button or a form field. Furthermore, RPA applications may fail due to changes in the appearance of the interface (e.g., positioning of various elements, color scheme, fonts, etc.) occurring between the design and the runtime of the respective software robot. Therefore, there is a continuing interest in developing robust and scalable software robots that are insensitive to such changes.

According to one aspect, a method comprises employing at least one hardware processor of a computer system, in response to receiving an RPA script comprising a set of target features and a set of anchor features, to automatically identify a runtime instance of a target element within a runtime user interface (UI) exposed by the computer system, wherein the target features are characteristics of a target element of a target UI and the anchor features are characteristics of an anchor element of the target UI. The method further comprises automatically carrying out an operation that reproduces a result of an interaction of a human operator with the runtime instance of the target element, the operation determined according to the RPA script. The set of target features comprises a target ID indicative of a position of the target element within a tree representation of the target UI, a target image comprising an image of the target element within the target UI, and a target text comprising a sequence of characters displayed by the target element within the target UI. The set of anchor features comprises an anchor ID indicative of a position of the anchor element within the tree representation of the target UI, an anchor image comprising an image of the anchor element within the target UI, and an anchor text comprising a sequence of characters displayed by the anchor element within the target UI. The method comprises identifying the runtime instance of the target element according to the target ID, the target image, the target text, the anchor ID, the anchor image, and the anchor text.
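The identification step combines six recorded features: the target ID, image and text plus the anchor ID, image and text. As a minimal sketch, assuming a hypothetical `UIElement` record holding a tree-path ID, raw bytes standing in for an image, the displayed text and a screen position, one could locate the runtime anchor first, then score target candidates by how many recorded target features still match and break ties by proximity to the anchor. The 2-of-3 threshold, the exact-equality comparisons and the distance tie-break are illustrative assumptions, not the claimed method.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Features:
    """Design-time features of one element (hypothetical record)."""
    element_id: str  # position in the UI tree, e.g. "win/button[1]"
    image: bytes     # stand-in for the element's rendered image
    text: str        # characters displayed by the element


@dataclass(frozen=True)
class UIElement:
    """A runtime UI element: the same three features plus a screen position."""
    element_id: str
    image: bytes
    text: str
    x: float = 0.0
    y: float = 0.0


def feature_score(element: UIElement, features: Features) -> int:
    """Number of recorded features (0-3) that the runtime element still matches."""
    return (int(element.element_id == features.element_id)
            + int(element.image == features.image)
            + int(element.text == features.text))


def find_runtime_target(elements: Sequence[UIElement], target: Features,
                        anchor: Features, min_score: int = 2) -> Optional[UIElement]:
    """Pick the element best matching `target`, using the anchor to break ties."""
    # 1. Locate the runtime anchor: the element matching the most anchor features.
    anchors = [e for e in elements if feature_score(e, anchor) >= min_score]
    if not anchors:
        return None
    runtime_anchor = max(anchors, key=lambda e: feature_score(e, anchor))

    # 2. Score target candidates; among equal scores prefer the one nearest the anchor.
    candidates = [e for e in elements
                  if e is not runtime_anchor and feature_score(e, target) >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda e: (
        feature_score(e, target),
        -math.hypot(e.x - runtime_anchor.x, e.y - runtime_anchor.y)))
```

For example, if a layout change renumbered two identical "OK" buttons in the UI tree, both still match on image and text, and the one closest to the recorded "Name" label wins. This is where the anchor earns its keep: it disambiguates elements that look the same.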

According to another aspect, a computer system comprises at least one hardware processor configured to execute an automation target application and an RPA robot. The automation target application is configured to expose a runtime UI. The RPA robot is configured, in response to receiving an RPA script comprising a set of target features and a set of anchor features, to automatically identify a runtime instance of a target element within the runtime UI exposed by the computer system, wherein the target features are characteristics of a target element of a target UI and the anchor features are characteristics of an anchor element of the target UI. The RPA robot is further configured to automatically carry out an operation that reproduces a result of an interaction of a human operator with the runtime instance of the target element, the operation determined according to the RPA script. The set of target features comprises a target ID indicative of a position of the target element within a tree representation of the target UI, a target image comprising an image of the target element within the target UI, and a target text comprising a sequence of characters displayed by the target element within the target UI. The set of anchor features comprises an anchor ID indicative of a position of the anchor element within the tree representation of the target UI, an anchor image comprising an image of the anchor element within the target UI, and an anchor text comprising a sequence of characters displayed by the anchor element within the target UI. Automatically identifying the runtime instance of the target element comprises identifying it according to the target ID, the target image, the target text, the anchor ID, the anchor image, and the anchor text.

According to another aspect, a non-transitory computer-readable medium stores instructions which, when executed by at least one hardware processor of a computer system configured to expose a runtime UI, cause the computer system, in response to receiving an RPA script comprising a set of target features and a set of anchor features, to automatically identify a runtime instance of a target element within the runtime UI exposed by the computer system, wherein the target features are characteristics of a target element of a target UI and the anchor features are characteristics of an anchor element of the target UI. The instructions further cause the computer system to automatically carry out an operation that reproduces a result of an interaction of a human operator with the runtime instance of the target element, the operation determined according to the RPA script. The set of target features comprises a target ID indicative of a position of the target element within a tree representation of the target UI, a target image comprising an image of the target element within the target UI, and a target text comprising a sequence of characters displayed by the target element within the target UI. The set of anchor features comprises an anchor ID indicative of a position of the anchor element within the tree representation of the target UI, an anchor image comprising an image of the anchor element within the target UI, and an anchor text comprising a sequence of characters displayed by the anchor element within the target UI. Automatically identifying the runtime instance of the target element comprises identifying it according to the target ID, the target image, the target text, the anchor ID, the anchor image, and the anchor text.

The foregoing aspects and advantages of the present invention will become better understood upon reading the following detailed description and upon reference to the drawings.

FIG. 1 shows an exemplary robotic process automation (RPA) system according to some embodiments of the present invention.

FIG. 2 shows exemplary software executing on an RPA client according to some embodiments of the present invention.

FIG. 3 shows an exemplary user interface (UI) having a plurality of UI elements according to some embodiments of the present invention.

FIG. 4 shows an exemplary sequence of steps performed by a script authoring application according to some embodiments of the present invention.

FIG. 5 shows an exemplary user interface, a target element, and a plurality of candidate anchor elements according to some embodiments of the present invention.

FIG. 6 shows an exemplary sequence of steps carried out to automatically determine an anchor element associated with a target element according to some embodiments of the present invention.

FIG. 7 shows an exemplary user interface, a target element, and a plurality of candidate anchor locations according to some embodiments of the present invention.

FIG. 8 shows an alternative sequence of steps carried out by the script authoring application to automatically identify an anchor element associated with a target element according to some embodiments of the present invention.

FIG. 9 shows an exemplary UI tree and exemplary element IDs characterizing nodes of the UI tree according to some embodiments of the present invention.

FIG. 10 illustrates various types of data characterizing a UI element according to some embodiments of the present invention.

FIG. 11 shows an exemplary sequence of steps performed by an RPA robot according to some embodiments of the present invention.

FIG. 12 shows an exemplary sequence of steps carried out by the RPA robot to identify a runtime target UI element according to some embodiments of the present invention.

FIG. 13 shows a set of exemplary inter-element distances according to some embodiments of the present invention.

FIG. 14 shows another set of exemplary inter-element distances according to some embodiments of the present invention.

FIG. 15 shows exemplary inter-element angles according to some embodiments of the present invention.

FIG. 16 illustrates an exemplary degree of overlap between two UI elements according to some embodiments of the present invention.

FIG. 17 shows an exemplary embodiment of a computing device configured to carry out the methods described herein.

In the following description, it is understood that all recited connections between structures can be direct operative connections or indirect operative connections through intermediary structures. A set of elements includes one or more elements. Any recitation of an element is understood to refer to at least one element. A plurality of elements includes at least two elements. Any use of "or" is meant as a non-exclusive or. Unless otherwise required, the described method steps need not necessarily be performed in the particular illustrated order. A first element (e.g., data) derived from a second element encompasses a first element equal to the second element, as well as a first element generated by processing the second element and optionally other data. Making a determination or decision according to a parameter encompasses making the determination or decision according to the parameter and optionally according to other data. Unless otherwise specified, an indicator of some quantity/data may be the quantity/data itself, or an indicator different from the quantity/data itself. A computer program is a sequence of processor instructions carrying out a task. Computer programs described in some embodiments of the present invention may be stand-alone software entities or sub-entities (e.g., subroutines, libraries) of other computer programs. The term "database" is used herein to denote an organized, searchable collection of data. Computer-readable media encompass non-transitory media such as magnetic, optical, and semiconductor storage media (e.g., hard drives, optical disks, flash memory, DRAM), as well as communication links such as conductive cables and fiber optic links. According to some embodiments, the present invention provides, inter alia, computer systems comprising hardware (e.g., one or more processors) programmed to perform the methods described herein, as well as computer-readable media encoding instructions to perform the methods described herein.

The following description illustrates embodiments of the invention by way of example and not necessarily by way of limitation.

FIG. 1 shows an exemplary robotic process automation system according to some embodiments of the present invention. Each of a plurality of RPA clients 10a-e represents a computing device having at least a hardware processor, a memory unit, and a network adapter enabling the respective RPA client to connect to a computer network and/or to other computing devices. Exemplary RPA clients 10a-e include, among others, personal computers, laptop and tablet computers, and mobile telecommunication devices (e.g., smartphones). In an exemplary use-case scenario, RPA clients 10a-d represent desktop computers belonging to the accounting or human resources department of a company. The illustrated RPA clients 10a-d are interconnected by a local communication network 12, which may comprise a local area network (LAN). Clients 10a-d may further access an extended network 14, which may comprise a wide area network (WAN) and/or the Internet. In the exemplary configuration of FIG. 1, RPA client 10e is connected directly to extended network 14. Such a client may represent a mobile computer, such as a laptop, tablet computer, or mobile telephone, that connects to network 14 at various access points.

In a typical RPA scenario, an employee of a company uses business applications (e.g., word processor, spreadsheet editor, browser, email application) to perform a repetitive task, for instance to issue invoices to various business clients. To actually carry out the respective task, the employee performs a sequence of operations/actions, which is herein deemed a business process. Exemplary operations forming part of an invoice-issuing business process may include opening a Microsoft Excel® spreadsheet, looking up the company details of a client, copying the respective details into an invoice template, filling out invoice fields indicating the purchased items, switching over to an email application, composing an email message to the respective client, attaching the newly created invoice to the respective email message, and clicking a "Send" button. RPA software executing on the employee's computer may automate the respective business process by mimicking the set of operations performed by the respective human operator in the course of carrying out the respective task. Exemplary processes typically targeted for such automation include processing of payments, invoicing, communicating with business clients (e.g., distribution of newsletters and/or product offerings), internal communication (e.g., memos, scheduling of meetings and/or tasks), payroll processing, etc.

Mimicking a human operation/action is herein understood to encompass reproducing the sequence of computing events that occur when a human operator performs the respective operation/action on the computer, as well as reproducing a result of the respective operation being performed by the human operator on the computer. For instance, mimicking an action of clicking a button of a graphical user interface may comprise having the operating system move the mouse pointer to the respective button and generate a mouse click event, or may alternatively comprise toggling the respective GUI button itself to a clicked state.

FIG. 2 shows exemplary software executing on an RPA client 10 according to some embodiments of the present invention. RPA client 10 may represent any of RPA clients 10a-e in FIG. 1. RPA client 10 executes an operating system (OS) 40 and a set of business applications 42. OS 40 may comprise any widely available operating system, such as Microsoft Windows®, MacOS®, Linux®, iOS®, or Android®, among others, comprising a software layer that interfaces between applications 42 and the hardware of RPA client 10. Business applications 42 generically represent any computer program used by a human operator of RPA client 10 to carry out a task. Exemplary business applications 42 include, among others, a word processor, a spreadsheet application, a graphics application, a browser, a social media application, and an electronic communication application. At least one business application 42 is configured to expose a user interface (UI) that is targeted for automation, as detailed below.

In some embodiments, RPA client 10 further executes an RPA robot 44, which comprises a set of interconnected computer programs that collectively implement an automation of a business process. An exemplary RPA robot is constructed using the Windows Workflow Foundation application programming interface from Microsoft® Corporation. In some embodiments, RPA robot 44 executes within a separate, dedicated virtual machine instantiated on RPA client 10.

Components of RPA robot 44 include an RPA agent 43 and a set of robot executors 45. Robot executors 45 are configured to receive an RPA script 50 indicating a sequence of operations (also known in the art as activities) that mimic the actions of a human operator carrying out a business process, and to actually perform the respective sequence of operations on the respective client machine. RPA scripts 50 are typically process-specific, i.e., each distinct business process is described by a distinct set of RPA scripts. An RPA script 50 may be formulated according to any data specification known in the art. In a preferred embodiment, RPA script 50 is encoded in a version of an extensible markup language (XML), but script 50 may also be formulated in a programming language such as C#, Visual Basic, Java, etc. Alternatively, RPA script 50 may be specified in an RPA-specific version of bytecode, or even as a sequence of instructions formulated in a natural language such as English, Spanish, Japanese, etc. In some embodiments, script 50 is pre-compiled into a set of native processor instructions (e.g., machine code).
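Since the preferred encoding of RPA script 50 is XML, the following minimal sketch shows how an executor might read such a script into an ordered list of activities using Python's standard `xml.etree.ElementTree`. The element and attribute names (`Script`, `Activity`, `Target`, `Anchor`, `type`, `id`, `text`) are a hypothetical schema invented for illustration; the document does not specify the actual script format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical script layout: one <Activity> per scripted operation, with the
# recorded target (and optional anchor) features stored as attributes.
SCRIPT = """
<Script process="SendInvoice">
  <Activity type="Click">
    <Target id="mail/toolbar/button[3]" text="Send"/>
    <Anchor id="mail/toolbar/label[1]" text="To:"/>
  </Activity>
  <Activity type="TypeInto" value="ACME Corp">
    <Target id="mail/form/field[1]" text=""/>
  </Activity>
</Script>
"""


@dataclass
class Activity:
    kind: str                  # e.g. "Click", "TypeInto"
    params: Dict[str, str]     # remaining activity attributes
    target: Dict[str, str]     # recorded target features
    anchor: Dict[str, str] = field(default_factory=dict)  # empty if no anchor


def parse_script(xml_text: str) -> List[Activity]:
    """Turn the XML script into an ordered list of activities."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "Script":
        raise ValueError(f"unexpected root element <{root.tag}>")
    activities = []
    for node in root.findall("Activity"):
        target = node.find("Target")
        anchor = node.find("Anchor")
        activities.append(Activity(
            kind=node.get("type", ""),
            params={k: v for k, v in node.attrib.items() if k != "type"},
            target=dict(target.attrib) if target is not None else {},
            anchor=dict(anchor.attrib) if anchor is not None else {},
        ))
    return activities
```

Keeping target and anchor features as separate dictionaries mirrors the separation between the two feature sets described in the aspects above.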

In some embodiments, robot executor 45 comprises an interpreter (e.g., a just-in-time interpreter or compiler) configured to translate RPA script 50 into a runtime package comprising processor instructions for carrying out the operations described in the respective script. Executing script 50 may thus comprise executor 45 translating RPA script 50 and instructing a processor of RPA client 10 to load the resulting runtime package into memory and to launch the runtime package into execution.
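The translate-then-execute split can be illustrated in miniature. In the sketch below, "compiling" means resolving each scripted step to a ready-to-run Python closure, and the resulting list plays the role of the runtime package. The `(name, params)` step format, the `handlers` registry and the function names are hypothetical; a real executor would emit processor instructions rather than closures.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Dict[str, str]]  # (activity name, parameters) as read from a script
RuntimeStep = Callable[[], None]   # a ready-to-run operation


def compile_script(script: List[Step],
                   handlers: Dict[str, Callable[..., None]]) -> List[RuntimeStep]:
    """Resolve every step to its handler up front, failing early on unknown activities."""
    package: List[RuntimeStep] = []
    for name, params in script:
        if name not in handlers:
            raise ValueError(f"unknown activity: {name}")
        # Default arguments freeze the current handler/params; a plain closure
        # would see only the last loop values (late binding).
        package.append(lambda h=handlers[name], p=params: h(**p))
    return package


def run(package: List[RuntimeStep]) -> None:
    """Execute the compiled steps in script order."""
    for step in package:
        step()
```

Translating the whole script before running it means an unknown activity is reported before any operation touches the target UI, rather than halfway through a business process.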

RPA agent 43 may manage the operation of robot executors 45. For instance, RPA agent 43 may select tasks/scripts for execution by robot executors 45 according to an input from a human operator and/or according to a schedule. Agent 43 may further configure various operational parameters of executors 45. When robot 44 includes multiple executors 45, agent 43 may coordinate their activities and/or inter-process communication. RPA agent 43 may further manage communication between RPA robot 44 and other components of the RPA system illustrated in FIG. 1. Such components may execute on other RPA clients and/or on a set of robot administration servers 11a-b. In one such example, servers 11a-b may operate a robot orchestrator service that coordinates RPA activities across multiple client machines and enables complex scheduling and/or license management. Servers 11a-b may further receive data from individual RPA robots indicative of various intermediate values and/or results of executing RPA scripts. Such data may be used to generate activity reports, to enforce licensing agreements, and/or to mitigate malfunctions.
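Schedule-driven task selection and coordination across several executors can be sketched with a priority queue. In this minimal sketch, the hypothetical `Agent` class releases scripts whose scheduled start time has arrived, and a round-robin `assign` helper spreads them over executors. Both the numeric timestamps and the round-robin policy are assumptions made for illustration; the document does not say how agent 43 actually schedules or distributes work.

```python
import heapq
from typing import Dict, List, Tuple


class Agent:
    """Toy scheduler: hands out scripts whose scheduled start time has arrived."""

    def __init__(self) -> None:
        self._queue: List[Tuple[float, int, str]] = []
        self._seq = 0  # tie-breaker: equal start times keep submission order

    def schedule(self, when: float, script: str) -> None:
        heapq.heappush(self._queue, (when, self._seq, script))
        self._seq += 1

    def due(self, now: float) -> List[str]:
        """Pop and return every script scheduled at or before `now`."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(heapq.heappop(self._queue)[2])
        return ready


def assign(scripts: List[str], n_executors: int) -> Dict[int, List[str]]:
    """Spread ready scripts over executors round-robin (hypothetical policy)."""
    plan: Dict[int, List[str]] = {i: [] for i in range(n_executors)}
    for k, script in enumerate(scripts):
        plan[k % n_executors].append(script)
    return plan
```

An operator's request to run a script immediately fits the same model: schedule it at the current time and it is returned by the next `due` call.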

In some embodiments, RPA client 10 further executes a script authoring application 46 configured to enable a human operator of RPA client 10 to create RPA script 50, and thus effectively design a robot to perform a set of activities. Authoring application 46 may function like an integrated development environment (IDE), comprising a code editor and/or a user interface that enables the operator to interact with a set of tools for modeling a business process. An exemplary authoring application may allow a user to select a business application 42 and to indicate a desired manner of interacting with the respective application, for instance a sequence of operations to be performed by robot 44. Exemplary operations include, for instance, opening a specific Excel® spreadsheet, reading data from a specific row/column of a data table, processing the respective data in a specific manner, clicking on a specific button, composing and sending an email message, navigating to a specific uniform resource locator (URL), etc. In some embodiments, authoring application 46 outputs RPA scripts 50 in a format readable by RPA robot 44 (e.g., XML). RPA scripts 50 may be stored in a script repository 15 communicatively coupled to and accessible to RPA clients 10a-e via networks 12 and/or 14 (see FIG. 1). In a preferred embodiment, script repository 15 is directly linked to robot administration servers 11a-b. Script repository 15 may be organized as a database, e.g., any structured data collection allowing a selective retrieval of scripts 50 according to a set of criteria.
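Selective retrieval of scripts according to a set of criteria maps naturally onto a relational table. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table layout (`name`, `process`, `application`, `body`) and the helper names are hypothetical, chosen only to show criteria-based lookup; the document does not describe how repository 15 is actually organized.

```python
import sqlite3
from typing import List

SEARCHABLE = ("name", "process", "application")  # columns allowed in queries


def open_repository() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scripts (name TEXT PRIMARY KEY, process TEXT NOT NULL,"
        " application TEXT NOT NULL, body TEXT NOT NULL)")
    return conn


def store(conn: sqlite3.Connection, name: str, process: str,
          application: str, body: str) -> None:
    conn.execute("INSERT OR REPLACE INTO scripts VALUES (?, ?, ?, ?)",
                 (name, process, application, body))
    conn.commit()


def find(conn: sqlite3.Connection, **criteria: str) -> List[str]:
    """Names of scripts matching all given column=value criteria."""
    unknown = set(criteria) - set(SEARCHABLE)
    if unknown:
        raise ValueError(f"unsupported criteria: {sorted(unknown)}")
    # Column names come from the whitelist above; values are bound parameters.
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1"
    rows = conn.execute(f"SELECT name FROM scripts WHERE {where} ORDER BY name",
                        tuple(criteria.values())).fetchall()
    return [row[0] for row in rows]
```

Whitelisting the column names keeps the dynamically built `WHERE` clause safe from injection, while the values themselves always travel as bound parameters.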

A skilled artisan will appreciate that not all components illustrated in FIG. 2 need to execute on the same physical processor or machine. In typical RPA configurations, script development/robot design is carried out on one machine (commonly known in the art as "design side"). The resulting RPA script 50 is then distributed to multiple other users and machines for execution (commonly known as "runtime side" or simply "runtime").

FIG. 3 shows an exemplary user interface (UI) 58 according to some embodiments of the present invention. UI 58 may be exposed by any of business applications 42. A user interface is a computer interface that enables human-machine interaction, e.g., an interface configured to receive user input and to respond to the respective input. A common example of user interface is known as a graphical user interface (GUI), which enables human-machine interaction via a set of visual elements displayed to the user. Exemplary UI 58 has a set of exemplary windows 60a-b and a set of exemplary UI elements including a menu indicator 62a, an icon 62b, a button 62c, and a text box 62d. Other exemplary UI elements comprise, among others, a window, a label, a form, an individual form field, a toggle, and a link (e.g., a hyperlink, hypertext, or a uniform resource identifier). UI elements may display information, receive input (text, mouse events), and/or control a functionality of software and/or of the respective computer system.
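Windows and the elements they contain form a tree, and the target and anchor IDs introduced earlier indicate a position within such a tree representation. As a minimal sketch, assuming a hypothetical `UINode` type and an XPath-like path format (role name plus 1-based index among same-role siblings), element IDs could be derived as follows. The path syntax is our assumption, not the format used by the described embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class UINode:
    role: str                 # e.g. "window", "button", "textbox"
    name: str = ""            # displayed label, if any
    children: List["UINode"] = field(default_factory=list)


def element_ids(node: UINode, path: str = "") -> Iterator[Tuple[str, UINode]]:
    """Yield (element ID, node) pairs in depth-first order."""
    here = path or f"/{node.role}[1]"
    yield here, node
    seen: Dict[str, int] = {}  # per-role counters among this node's children
    for child in node.children:
        seen[child.role] = seen.get(child.role, 0) + 1
        yield from element_ids(child, f"{here}/{child.role}[{seen[child.role]}]")
```

For instance, in a window holding a menu, a "Save" button and a "Print" button, the "Print" button receives the ID `/window[1]/button[2]`. Such an ID breaks as soon as a sibling button is inserted before it, which is one reason to record image, text and anchor features alongside it.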

Some UI elements are interactive in the sense that acting on them (e.g., clicking button 62c) triggers a behavior/reaction. Such behaviors/reactions are typically specific to the respective element or group of elements. For example, clicking a save button produces a different effect than clicking a print button. The same keyboard shortcut (e.g., Ctrl-G) may have one effect when executed in one window/application and a completely different effect when executed in another. So although the operation/action (performing a click, pressing a combination of keyboard keys, writing a sequence of characters, etc.) is the same, its result may depend substantially on the operand of the respective operation. An operand is defined herein as the UI element that is acted upon by a current operation/action such as a click or a keyboard event, or that is otherwise selected to receive the respective user input. The terms "target" and "operand" are used interchangeably herein. Since UI element behavior is element-specific, successful RPA may require unambiguously and correctly identifying the operand of each scripted RPA activity.

FIG. 4 illustrates an exemplary sequence of steps performed by scripting application 46 according to some embodiments of the present invention. Step 101 exposes a design-side instance of the target UI, i.e., of the user interface of a business application 42 that is the subject of the current automation. Step 101 may comprise, for instance, invoking an instance of business application 42. In step 102, application 46 may expose a robot design interface (e.g., a GUI) enabling the user to indicate a desired activity to be performed by robot 44 on the exposed target UI. In some embodiments, activities may be reached via a hierarchy of activity menus. Activities may be grouped according to various criteria, for instance according to a type of business application (e.g., MS Excel activities, web activities, email activities) and/or according to a type of interaction (e.g., mouse activities, hotkey activities, data grabbing activities, form filling activities, etc.). Step 104 receives user input indicating the respective activity. For instance, step 104 may comprise intercepting a mouse click event and determining which menu item the user clicked to select the activity. In a further step 106, application 46 may expose an activity configuration interface enabling the user to configure various options and/or parameters of the respective activity. One exemplary activity parameter is the operand/target UI element of the respective activity. In one example where the activity comprises a mouse click, the target UI element may be a button, a menu item, a hyperlink, etc. In another example where the activity comprises filling out a form, the target UI element may be the specific form field that should receive the respective text input. Application 46 may enable the user to indicate the target UI element in various ways. For instance, it may invite the user to select the target element from a menu/list of candidate UI elements. In a preferred embodiment, application 46 may expose an instance of the target UI (i.e., the UI of the business application with which robot 44 is supposed to interact, for instance MS Excel, a browser, an email program, etc.), highlight a subset of UI elements within the respective UI, and invite the user to click on one of them to indicate a selection. In step 108, application 46 may receive and process the user input indicating the selected target element, for instance by calling certain OS functions to detect the mouse click and then identifying the clicked UI element.

Next, in step 110, some embodiments may automatically determine an anchor UI element associated with the selected target element. An anchor element (or simply "anchor") is defined herein as a UI element displayed concurrently with an associated target UI element, in the sense that the target and anchor are visible at the same time within the respective user interface. Furthermore, the anchor and target elements typically have a semantic connection; for instance, they both belong to the same group/container of UI elements, and/or they perform a function together. Exemplary anchor elements associated with an input field include, among others, a text label displayed in the vicinity of the respective input field and the title of a form that includes the respective input field. Exemplary anchor elements associated with a button include the text displayed on the respective button and another button of the same UI. FIG. 5 illustrates an exemplary UI 58 having a target element 64 and multiple potential anchor elements 66a-e according to some embodiments of the present invention.

Determining the anchor of a target element may comprise selecting the anchor from a set of candidate UI elements, for instance as shown in FIG. 6. In step 202, application 46 may generate a set of candidate anchor elements (see, e.g., items 66a-e in FIG. 5) selected from the set of UI elements displayed by UI 58. The candidate anchor elements may be selected according to the element type of the respective target element (e.g., button, text, input field, etc.). In some embodiments, candidate anchors may be selected according to whether they belong to the same group of elements/UI container as the respective target. For instance, when the target element is a form field, some embodiments will select anchor candidates only from among UI elements belonging to the same form field. In the case of an HTML document, some embodiments may select label candidates from the same <div> or <span> container as the target element.
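As a minimal sketch of candidate generation in step 202, the filter below keeps only elements that share the target's container and have an allowed type. The UIElement record, its kind and container fields, and the default set of allowed kinds are hypothetical illustrations, not structures defined by the source.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    name: str
    kind: str        # hypothetical element type, e.g. "text", "button", "input"
    container: str   # hypothetical identifier of the enclosing group/container
    bounds: tuple    # (left, top, right, bottom) in screen pixels

def candidate_anchors(target, elements, allowed_kinds=("text", "button")):
    """Keep elements that share the target's container and have an allowed type."""
    return [
        e for e in elements
        if e is not target
        and e.container == target.container
        and e.kind in allowed_kinds
    ]
```

For an input field inside a form, this returns the labels and buttons of that same form and ignores elements of other containers.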

Next, in step 204, application 46 may evaluate each candidate anchor element according to a set of criteria. In some embodiments, step 204 comprises determining an anchor fitness score, which may combine multiple sub-scores evaluated according to distinct criteria. An exemplary criterion is the relative position of the candidate anchor with respect to the target element. The relative position may be determined according to a set of distances, angles, and/or degrees of overlap between the respective target element and the candidate anchor element. Examples of such determinations are described in detail below in relation to FIGS. 13-16. Some embodiments consider UI elements located in the vicinity of the target element and/or substantially aligned with it to be relatively reliable anchors. In such embodiments, such UI elements may receive a higher fitness score than other UI elements that are farther from and/or not aligned with the selected target element.
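One way to turn "near and aligned" into a number is sketched below. It is only an assumption for illustration, not the scoring of FIGS. 13-16: proximity decays exponentially with the distance between element centers (the scale of 100 pixels is arbitrary), and alignment is 1.0 when the anchor lies exactly left, right, above, or below the target and falls to 0.0 on a 45-degree diagonal.

```python
import math

def center(r):
    """Center of a (left, top, right, bottom) rectangle."""
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def position_score(target, anchor, scale=100.0):
    """Higher for anchors that are close to and aligned with the target (range 0..1)."""
    (tx, ty), (ax, ay) = center(target), center(anchor)
    distance = math.hypot(ax - tx, ay - ty)
    proximity = math.exp(-distance / scale)            # 1.0 when the centers coincide
    angle = math.degrees(math.atan2(ay - ty, ax - tx)) % 90
    alignment = 1.0 - min(angle, 90 - angle) / 45.0    # 1.0 horizontal/vertical, 0.0 diagonal
    return proximity * alignment
```

Multiplying the two factors means a candidate must be both reasonably close and reasonably aligned to score well. A weighted sum would be an equally plausible design choice.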

Other exemplary anchor fitness criteria may include the image and/or text content of the respective UI element. Some embodiments prefer text labels as anchor elements, so UI elements that do not contain text may receive a relatively lower fitness score than UI elements displaying a fragment of text. Another exemplary criterion may be the length of the text displayed by the UI element; some embodiments may favor short text elements, since they are more likely to be labels. In such embodiments, relatively short text elements may receive a relatively higher fitness score than text elements containing a substantial amount of text.
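A text-content sub-score along these lines might look as follows. The 30-character cutoff for a "label-like" text and the inverse-length decay beyond it are illustrative assumptions only.

```python
def text_score(text, max_label_len=30):
    """0.0 for elements without text, 1.0 for short label-like text,
    decreasing with length for longer text (hypothetical cutoff)."""
    if not text:
        return 0.0
    length = len(text.strip())
    if length == 0:
        return 0.0
    return 1.0 if length <= max_label_len else max_label_len / length
```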

Yet another exemplary criterion may include the number of anchor candidates having a similar appearance, for instance the number of UI elements displaying the same text. In one exemplary scenario, target UI 58 includes a form designed to collect data about multiple persons and having multiple fields labeled "Last Name". In such situations, a "Last Name" label may be highly unreliable for identifying a specific form field. Therefore, some embodiments may determine whether each anchor candidate is unique (in the sense that no other UI element has a similar image or displays similar text), and if not, assign a relatively low anchor fitness score to the respective anchor candidate. Alternative embodiments may assign multiple anchors to the same target element, for instance a label located in the vicinity of the respective form field and the title of the respective input form or input block.
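A sketch of a uniqueness factor is shown below, assuming that "similar text" is approximated by a case-insensitive exact match (an assumption; the source does not fix a similarity measure). A text shown n times on the UI gets a multiplier of 1/n.

```python
from collections import Counter

def uniqueness_factor(candidate_texts):
    """Map each displayed text to a multiplier: 1.0 if unique, 1/n if shown n times."""
    counts = Counter(t.strip().lower() for t in candidate_texts if t)
    return {t: 1.0 / counts[t.strip().lower()] for t in candidate_texts if t}
```

With two "Last Name" labels on a form, each gets a factor of 0.5, so a unique label such as a form title would be preferred as an anchor.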

Step 206 may then compare the scores evaluated for the candidate anchors. When there is a clear winner, in step 210 scripting application 46 may select the candidate element with the highest fitness score as the anchor element associated with the target element determined in step 108 (FIG. 4). In case of a tie, i.e., when multiple candidates have the same fitness score, some embodiments may invite the user to explicitly indicate the UI element to be used as the anchor (step 208).
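Steps 206-210 can be sketched as follows. Returning None to signal "ask the user" (step 208) and the small tie tolerance are illustrative choices, not details taken from the source.

```python
def pick_anchor(scores, tie_tol=1e-9):
    """Return the candidate with the highest fitness score, or None when the
    top scores tie (the caller would then prompt the user, as in step 208)."""
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= tie_tol:
        return None
    return ranked[0][0]
```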

FIGS. 7-8 illustrate an alternative method of automatically selecting an anchor UI element. In the method described above, application 46 generates a set of candidate elements and then evaluates their fitness as anchors according to their position relative to the target element. In contrast, here step 222 may generate candidate placements within UI 58, for instance as pairs of screen coordinates {X,Y}. Such embodiments rely on the observation that reliable anchors such as text labels are typically found next to their associated target, for instance to its left or directly above or below it, depending on the default reading direction of the natural language of target UI 58. Therefore, some embodiments may explicitly look for potential anchor elements at such placements. FIG. 7 shows multiple candidate placements 65a-d. Such candidate placements may be determined according to the screen position of the target UI element (denoted as item 64 in FIG. 7) and/or according to the size of the target element. In some embodiments, candidate placements are generated randomly, for instance as a sum of a deterministic component and a random component.
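A minimal sketch of step 222 follows, assuming a left-to-right language (so anchors are searched to the left, above, and below the target). The deterministic component scales with the target's width and height. The random component is a uniform jitter of a few pixels. Both spacings are arbitrary illustrative choices.

```python
import random

def candidate_placements(target, n_per_side=2, jitter=5.0, rng=None):
    """Candidate {X, Y} points left of, above, and below the target rectangle:
    a deterministic offset plus a uniform random jitter."""
    if rng is None:
        rng = random.Random()
    left, top, right, bottom = target
    w, h = right - left, bottom - top
    cx, cy = (left + right) / 2, (top + bottom) / 2
    bases = []
    for k in range(1, n_per_side + 1):
        bases.append((left - k * w / 2, cy))   # to the left of the target
        bases.append((cx, top - k * h))        # above the target
        bases.append((cx, bottom + k * h))     # below the target
    return [(x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))
            for x, y in bases]
```

For right-to-left languages, the first offset would plausibly be mirrored to the right of the target.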

Step 224 may then identify all UI elements located approximately at the candidate placement. In some embodiments, an element is considered to be located at a specific placement when the respective placement lies within the screen bounds of the respective element. Another embodiment may consider an element to be located at a specific placement when the distance between the center/centroid of the respective element and the respective placement is smaller than a predetermined threshold. In the example of FIG. 7, UI element 66 may be considered to be located at candidate placement 65a. In some embodiments, step 224 comprises issuing a call to a native function of OS 40, the respective function configured to return a list of UI elements occupying a specific region of the screen. Other methods of determining which UI elements are located at the candidate placement include parsing the source code (e.g., HTML script, style sheet) underlying the respective UI.
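The two "located at" tests described above (point inside the element's bounds, or centroid within a threshold distance of the point) can be sketched as follows. The name-to-bounds dictionary stands in for whatever element list the OS or a source-code parser would return; it is a hypothetical simplification.

```python
import math

def located_at(bounds, point, max_dist=None):
    """True if point lies inside bounds; if max_dist is given, instead test
    whether the element's centroid is closer than max_dist to the point."""
    left, top, right, bottom = bounds
    x, y = point
    if max_dist is None:
        return left <= x <= right and top <= y <= bottom
    cx, cy = (left + right) / 2, (top + bottom) / 2
    return math.hypot(x - cx, y - cy) < max_dist

def elements_at(elements, point, max_dist=None):
    """Names of elements (name -> (left, top, right, bottom)) located at point."""
    return [name for name, b in elements.items() if located_at(b, point, max_dist)]
```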

When no UI element is located at the respective candidate placement, some embodiments return to step 222 to generate another candidate placement. Otherwise, in step 226, scripting application 46 may filter the identified set of UI elements according to a set of anchor fitness criteria. Such criteria may include, among others, visibility (e.g., only visible UI elements may be selected as anchors) and element type (e.g., text elements may be preferred over other types of UI elements). Other fitness criteria may be similar to those described above in relation to FIGS. 5-6. For instance, application 46 may evaluate a position score according to whether the respective UI element is aligned with the target element, whether the respective UI element substantially overlaps the target element, etc.

When none of the UI elements located at the candidate placement is deemed fit to serve as an anchor (e.g., none receives a fitness score exceeding a predetermined threshold), some embodiments may return to step 222 to generate another candidate placement. Otherwise, step 232 may select a qualifying UI element as the anchor associated with the respective target element.
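The loop formed by steps 222-232 can be sketched with the element lookup and the fitness function passed in as callables. Both callables and the 0.5 threshold are hypothetical stand-ins for the mechanisms described above.

```python
def find_anchor(placements, elements_at, fitness, threshold=0.5):
    """Try candidate placements in turn until one holds an element whose
    fitness exceeds the threshold; return that element, or None."""
    for point in placements:                       # step 222: next candidate placement
        found = elements_at(point)                 # step 224: elements at this placement
        if not found:
            continue                               # nothing here, try another placement
        scored = [(fitness(e), e) for e in found]  # step 226: apply fitness criteria
        best_score, best = max(scored, key=lambda se: se[0])
        if best_score > threshold:
            return best                            # step 232: qualifying anchor
    return None
```

A production version would also cap the number of placements tried, for instance with the timeout parameter mentioned later in the description.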

In response to identifying the target and/or anchor UI elements, in a sequence of steps 112-114 (FIG. 4), scripting application 46 may determine a set of element characteristic features of the respective target and anchor elements. Such element characteristic features according to some embodiments of the present invention are illustrated in FIG. 9 and include, among others, a set of element IDs 80a-b, a set of element texts 82a-b, and a set of element images 84a-b characterizing target element 64 and anchor element 66, respectively.

Element IDs 80a-b identify each UI element to the operating system and/or to the respective business application 42, for instance as a specific object within a hierarchy of objects that RPA client 10 uses to represent and/or render the respective user interface. In some embodiments, element IDs 80a-b are included in the source code of interface 58, for instance as a set of attribute-value pairs. The term "source code of a user interface" is herein understood to denote a programmatic representation of the content displayed by the respective user interface. Source code may encompass programs/scripts written in a programming language, as well as data structures residing in a memory of RPA client 10. Exemplary source code comprises an HTML document that is rendered as a webpage by a web browser application.

In modern computing platforms, the operating system typically represents each user interface as a hierarchical data structure commonly known as a UI tree. An exemplary UI tree comprises the document object model (DOM) underlying a webpage rendered by a browser application. FIG. 10 shows an exemplary UI tree 70 having a plurality of nodes 72a-e. In some embodiments, each node 72a-e comprises an object representing a part of UI 58. In an exemplary UI such as the one illustrated in FIG. 5, root node 72a may represent the entire UI window. Its child nodes may represent individual UI elements (e.g., text boxes, labels, form fields, buttons, etc.), groups of elements, distinct regions or blocks of the respective UI, etc. An intermediate node such as node 72b in FIG. 10 may represent a whole form, including all its input fields, labels, and buttons. For instance, node 72c may represent the contents of a <form> or <fieldset> container of an HTML document. Another example of an intermediate node may represent the contents of a <div> or <span> HTML container. Yet another example of an intermediate node comprises the contents of a header or footer of a document. End nodes such as 72b, 72d, and 72e (also known in the art as leaf nodes) are nodes that have no further child nodes, and may represent individual UI elements (e.g., a button, an individual label, an individual input field). In one example of a web browser UI, …

In some embodiments, each node 72a-e is specified using a set of attribute-value pairs, which may indicate, among others, the identity of the parent node of the respective node, the identity of the child nodes of the respective node, a name, and the type of UI element represented by the respective node.
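As a minimal sketch of such a UI tree, each node below holds a dictionary of attribute-value pairs and a list of children. A depth-first walk yields the end (leaf) nodes, which typically correspond to individual UI elements. The Node class and the "name" attribute are illustrative assumptions, not an OS or browser API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    attrs: dict                                    # attribute-value pairs, e.g. {"name": "form"}
    children: list = field(default_factory=list)   # child nodes, empty for leaf nodes

    def is_leaf(self):
        return not self.children

def leaves(node):
    """Yield end (leaf) nodes in depth-first, left-to-right order."""
    if node.is_leaf():
        yield node
    else:
        for child in node.children:
            yield from leaves(child)
```

A window containing a form (label, input field, button) and a footer thus has four leaves, while the window and the form are intermediate nodes.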

In some embodiments, an element ID characterizing a UI element comprises a set of node identifiers that collectively indicate the location, within UI tree 70, of the node representing the respective UI element. In one such example, element ID 80c indicates a subset of nodes of UI tree 70, herein referred to as a subtree (see exemplary subtrees 74a-d in FIG. 10). Element ID 80c therefore identifies a node/UI element as belonging to the respective subtree. For instance, node 72d belongs to subtree 74c. Exemplary element ID 80c includes a set of attribute-value pairs that identify the respective UI element as a "push button" called "Accept", visible within a window of an application called "uidouble.exe". The illustrated format of element ID 80c is provided only as an example; a skilled artisan will appreciate that, beyond a list of attribute-value pairs, there may be multiple other ways of representing the location of a specific node within a UI tree.
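One hypothetical way to serialize such an element ID is to emit, for each node on the path from the root to the element, its attribute-value pairs as a tag-like fragment. The output format here is invented for illustration and is not claimed to match any product's selector syntax.

```python
def element_id(path):
    """Serialize a root-to-node path (list of attribute dicts) as a string,
    one tag-like fragment per node, attributes sorted for determinism."""
    parts = []
    for attrs in path:
        pairs = " ".join(f"{k}='{v}'" for k, v in sorted(attrs.items()))
        parts.append(f"<{pairs} />")
    return "".join(parts)
```

Applied to the example above, a path of [{"app": "uidouble.exe"}, {"role": "push button", "name": "Accept"}] yields <app='uidouble.exe' /><name='Accept' role='push button' />.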

In some embodiments, determining element IDs 80a-b characterizing the target and anchor elements, respectively, comprises parsing the source code (e.g., an HTML document) of target user interface 58 and extracting the respective element IDs, for instance as a set of attribute-value pairs associated with each UI element.
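For HTML source code, attribute-value pairs can be collected with Python's standard html.parser module, as in the sketch below. It records only each start tag with its attributes. A fuller version would also track nesting so that each element's root-to-node path is available.

```python
from html.parser import HTMLParser

class ElementIdCollector(HTMLParser):
    """Record (tag, attribute-value dict) for every start tag encountered."""
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.elements.append((tag, dict(attrs)))

def extract_element_ids(html):
    parser = ElementIdCollector()
    parser.feed(html)
    parser.close()
    return parser.elements
```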

In some embodiments, each element text 82a-b (FIG. 9) comprises a computer encoding of a text (a sequence of alphanumeric characters) displayed within the screen bounds of the respective UI element. In the illustrated example, element text 82a has the value NULL, since target element 64 does not display any text. Element text 82b, on the other hand, consists of the text "Cash Deposit". The computer encoding of the text may comprise, for instance, a sequence of numeric character codes (e.g., Unicode), wherein each code corresponds to a distinct character of element texts 82a-b.

Embodiments of scripting application 46 may determine element texts 82a-b using various methods. When application 46 has access to the source code of UI 58, it may attempt to extract element texts 82a-b from the respective source code. For instance, the label displayed on a button of a webpage can be found by parsing the HTML document associated with the respective webpage. In the case of other business applications 42, scripting application 46 may parse data structures of OS 40 and/or business application 42 to determine whether element texts 82a-b are included in the source code of UI 58.

In an alternative embodiment, application 46 may employ an image analysis tool such as an optical character recognition (OCR) computer program to determine element texts 82a-b. In one such example, the OCR tool may receive as input an image of a region of the screen containing the respective target and/or anchor UI element, and return a set of text tokens (e.g., words) together with a bounding box determined for each text token. Exemplary bounding boxes include, among others, a polygon circumscribing the respective text token and the convex hull of the respective token. In FIG. 9, a bounding box is illustrated by the dashed rectangle surrounding the text "Cash Deposit". In response to receiving the text tokens and bounding boxes, application 46 may determine whether any bounding box substantially overlaps the respective UI element, and if so, select the text token located within the respective bounding box as element text 82 characterizing the respective target or anchor UI element. Substantial overlap may be established when a sufficient proportion of the respective bounding box (e.g., more than 50%, typically 80-100%) lies within the screen bounds of the respective UI element.
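The overlap test can be sketched with axis-aligned rectangles: compute the fraction of each token's bounding box that lies inside the element's bounds, and keep the tokens above a threshold (0.8 here, matching the "typically 80-100%" range). Representing OCR output as (word, box) pairs in reading order is an assumption about the OCR tool's output format.

```python
def overlap_fraction(box, element):
    """Fraction of the token box's area lying inside the element bounds.
    Both rectangles are (left, top, right, bottom)."""
    left, top = max(box[0], element[0]), max(box[1], element[1])
    right, bottom = min(box[2], element[2]), min(box[3], element[3])
    inter = max(0, right - left) * max(0, bottom - top)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0

def element_text(tokens, element, min_fraction=0.8):
    """Join the OCR tokens whose boxes substantially overlap the element;
    None when no token qualifies (cf. the NULL element text 82a)."""
    words = [w for w, box in tokens if overlap_fraction(box, element) >= min_fraction]
    return " ".join(words) or None
```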

In some embodiments, each element image 84a-b (FIG. 9) characterizing a UI element comprises a computer encoding of the image displayed on screen within the boundaries of the respective UI element. The computer encoding of the image may comprise an array of pixel values corresponding to the respective screen region, possibly over multiple channels (e.g., RGB), and/or a set of values computed according to the respective array of pixel values (e.g., a JPEG or wavelet representation of the respective array). Determining each element image 84a-b may comprise clipping UI 58, i.e., grabbing the content of a limited region of UI 58 that shows the respective UI element.
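Clipping reduces to slicing the screen's pixel array to the element's bounds. The sketch below assumes the screenshot is a row-major list of rows of pixel values (e.g., RGB tuples) and treats right/bottom as exclusive. Both conventions are assumptions made for illustration.

```python
def clip(image, bounds):
    """Crop a row-major pixel array (list of rows) to (left, top, right, bottom),
    with right and bottom exclusive."""
    left, top, right, bottom = bounds
    return [row[left:right] for row in image[top:bottom]]
```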

In a further step 116 (FIG. 4), scripting application 46 may formulate an RPA script corresponding to the selected RPA activity. In other words, in step 116 application 46 outputs the robot code to be used at runtime, for instance to a script file. RPA script 50 may be formulated in any computer-readable encoding known in the art, for instance in a version of XML, or may be compiled into a sequence of native processor instructions (e.g., machine code).

For each activity/automation step, authoring application 46 may output to RPA script 50 an indicator of the respective activity (e.g., click, type into, etc.), and may further output encodings of element IDs 80a-b, element texts 82a-b, and element images 84a-b characterizing the target and anchor UI elements determined in steps 108-110. Encodings of the characteristic features may include the characteristic data itself and/or other representations of such data, for instance an indicator of a network location (e.g., URL, network address) where the element characteristic data can be accessed remotely.

In some embodiments, application 46 may further output to RPA script 50 a set of parameter values for configuring the respective activity, for instance using a set of attribute-value pairs. One exemplary parameter is a matching accuracy indicating a threshold for comparing the design-time element image 84 saved in RPA script 50 with runtime images of candidate UI elements (see details below in relation to FIGS. 11-12). Another exemplary parameter is a timeout threshold indicating the maximum amount of time that robot 44 may spend at runtime attempting to identify the target UI element.
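Since the script may be XML, one activity entry could be serialized with Python's xml.etree.ElementTree as sketched below. Every tag and attribute name (Activity, Target, Anchor, ElementId, Text, ImageRef, accuracy, timeout) is hypothetical, chosen only to show the activity indicator, the per-element features, the remote image reference, and the configuration parameters side by side.

```python
import xml.etree.ElementTree as ET

def activity_to_xml(activity, target, anchor, params):
    """Serialize one scripted activity; all tag/attribute names are hypothetical."""
    root = ET.Element("Activity", {"type": activity})
    for key, value in params.items():
        root.set(key, str(value))                # e.g. matching accuracy, timeout
    for role, features in (("Target", target), ("Anchor", anchor)):
        node = ET.SubElement(root, role)
        ET.SubElement(node, "ElementId").text = features.get("id", "")
        ET.SubElement(node, "Text").text = features.get("text") or ""
        ET.SubElement(node, "ImageRef").text = features.get("image", "")  # e.g. a URL
    return ET.tostring(root, encoding="unicode")
```

Storing the image as a reference rather than inline pixels mirrors the option, mentioned above, of pointing to a network location where the characteristic data can be fetched.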

Once the design phase of the automation is complete, RPA script 50 may be transmitted to script repository 15 and/or distributed to other RPA clients for execution (see, e.g., FIG. 1). FIG. 11 shows an exemplary sequence of steps carried out by RPA robot 44 at runtime. In response to receiving RPA script 50, step 304 determines the type of activity to be performed according to the contents of RPA script 50. Step 304 may further determine the target UI and/or the runtime business application (e.g., MS Excel, Google Chrome, etc.) that the respective robot is configured to interact with according to RPA script 50. In step 306, RPA robot 44 may expose the respective target UI, for instance by invoking an instance of the respective business application on the local client machine. A further step 308 may automatically identify the runtime target UI element of the respective activity according to information stored in RPA script 50. The runtime target UI element comprises the operand of the respective activity, i.e., the UI element of the runtime target UI that robot 44 is configured to act upon (e.g., to click, to enter some text into, to grab the contents of, etc.). The execution of step 308 is described in detail below. In response to successfully identifying the runtime target UI element, step 310 may automatically carry out the scripted activity, i.e., interact with the respective UI element as indicated in RPA script 50.

FIG. 12 illustrates an exemplary sequence of steps performed by the robot 44 to automatically identify the runtime target UI element according to some embodiments of the present invention. In step 312, the robot 44 may detect UI elements that match the target type of the current activity. For example, when the respective activity comprises typing into a form field, step 312 may include identifying a set of form fields within the runtime UI. Step 312 may include parsing the source code underlying the runtime target UI and/or identifying UI elements according to an on-screen image of the runtime UI, for example using computer vision (e.g., a neural network trained to automatically recognize various UI elements such as buttons, text boxes, and input fields). When the intended target element and/or anchor element comprises text, some embodiments may further employ OCR technology to automatically detect text elements and construct a bounding box for each detected text element.

Next, searching within the set of UI elements returned by step 312, step 314 may attempt to identify the runtime target UI element according to its element ID (see the discussion above regarding FIGS. 9-10). In some embodiments, step 314 comprises determining the element ID of each UI element in the set returned by step 312 and comparing it with the element ID of the design-side target element (e.g., element ID 80a in FIG. 10), i.e., with the element ID specified by the RPA script 50 as characterizing the target. Step 316 may determine whether any element ID matches the element ID of the intended target of the current activity; if so, step 318 may select the matching UI element as the runtime target. In some embodiments, step 316 determines whether there is an exact match between the two element IDs. When element IDs are specified as sets of attribute-value pairs, an exact match occurs when all values of corresponding attributes are identical.
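Steps 314-318 with exact matching can be sketched as follows. This is a minimal illustration under an assumed representation: an element ID is modeled as a path of UI-tree nodes (root first), each node a dict of attribute-value pairs. The names `ElementID`, `ids_match_exactly`, and `select_by_element_id` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumption: an element ID is a path of UI-tree nodes, root first, and each
# node is described by a dict of attribute-value pairs.
ElementID = Tuple[Dict[str, str], ...]


def ids_match_exactly(a: ElementID, b: ElementID) -> bool:
    """Exact match: same number of nodes and identical attribute values everywhere."""
    return len(a) == len(b) and all(na == nb for na, nb in zip(a, b))


def select_by_element_id(elements: Dict[str, ElementID],
                         design_id: ElementID) -> Optional[str]:
    """Steps 314-318 (sketch): return the first runtime element whose ID matches."""
    for name, runtime_id in elements.items():
        if ids_match_exactly(runtime_id, design_id):
            return name
    return None
```

A single differing attribute value, such as a renamed `name`, is enough to make the exact comparison fail. That is the situation the fallback strategies below address.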

However, because of occasional changes to the target user interface between design time and runtime, it may happen that no UI element of the runtime target UI matches the design-time element ID of the intended target. For example, a form field may have been renamed. When no UI element matches the element ID indicated in the RPA script 50, the robot 44 may automatically infer the target/operand of the current activity from the available information. Some embodiments of the present invention use the element text 82 and the element image 84 as fallback data for identifying the runtime target when the element ID does not match.
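The two-phase logic reduces to a short cascade: try the element ID first, and call a fallback matcher only if that fails. In this sketch the fallback is a plain exact-text lookup standing in for the text/image scoring described below. `UIElement`, `identify_target`, and `text_fallback` are hypothetical names.

```python
from typing import Callable, List, Optional, TypedDict


class UIElement(TypedDict):
    # Hypothetical minimal description of a runtime UI element.
    element_id: str
    text: str


def identify_target(elements: List[UIElement], design_id: str,
                    fallback: Callable[[List[UIElement]], Optional[UIElement]]
                    ) -> Optional[UIElement]:
    """Try an exact element-ID match first; otherwise defer to a fallback matcher."""
    for element in elements:
        if element["element_id"] == design_id:
            return element
    # No exact ID match (e.g. the field was renamed): infer the target another way.
    return fallback(elements)


def text_fallback(design_text: str) -> Callable[[List[UIElement]], Optional[UIElement]]:
    """Illustrative fallback: pick the first element displaying the design-time text."""
    def pick(elements: List[UIElement]) -> Optional[UIElement]:
        return next((e for e in elements if e["text"] == design_text), None)
    return pick
```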

In one such example, a sequence of steps 322-324 may assemble a set of candidate runtime target elements and a set of candidate runtime anchor elements, according to the element IDs specified in the RPA script 50 for the design-side target element and anchor element, respectively. The term "candidate" is used herein to denote a UI element whose element ID is similar to that of the intended target element or anchor element, respectively. Similarity may be determined in various ways. In one exemplary embodiment, the robot 44 may use regular expressions to determine whether two element IDs partially match. In an exemplary regular-expression approach, two element IDs are deemed similar when a specific subset of their features is identical in both (e.g., the element type is the same but the element names differ). In an embodiment wherein the element ID indicates the position of the element within the UI tree, a partial-matching strategy using regular expressions may allow the robot 44 to search for candidates within a specific subtree, for example selecting only candidates whose element IDs specify the same root node (see, e.g., the discussion above in relation to FIG. 10). Restricting the search in this way may be useful, for example, when the RPA client 10 has multiple instances of a business application running simultaneously, only one of which contains the intended target element. By searching for candidate target elements under a fixed node, the robot 44 can explore the entire respective UI window for candidates while leaving out the windows of the other instances.
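A minimal sketch of the regular-expression approach follows. It assumes a made-up string encoding of element IDs in which nodes are separated by `/`, each node is a `;`-separated list of `key=value` pairs, and the root node comes first. This is not the selector syntax of any particular product. The pattern keeps the root node and the element type literal and lets the `name` attribute vary. `partial_match_pattern` and `candidate_ids` are hypothetical names.

```python
import re
from typing import Iterable, List, Sequence

# Assumed string form of an element ID: nodes separated by "/", each node a
# ";"-separated list of key=value pairs, root node first.


def partial_match_pattern(design_id: str,
                          wildcard_keys: Sequence[str] = ("name",)) -> "re.Pattern[str]":
    """Build a regex that keeps the root node and most attributes literal, but lets
    the attributes listed in `wildcard_keys` take any value below the root."""
    node_patterns = []
    for depth, node in enumerate(design_id.split("/")):
        attr_patterns = []
        for pair in node.split(";"):
            key, _, _value = pair.partition("=")
            if depth > 0 and key in wildcard_keys:
                attr_patterns.append(re.escape(key) + "=[^;/]*")  # any value
            else:
                attr_patterns.append(re.escape(pair))              # literal
        node_patterns.append(";".join(attr_patterns))
    return re.compile("/".join(node_patterns))


def candidate_ids(design_id: str, runtime_ids: Iterable[str]) -> List[str]:
    """Return runtime element IDs that partially match the design-time ID."""
    pattern = partial_match_pattern(design_id)
    return [rid for rid in runtime_ids if pattern.fullmatch(rid)]
```

With the design-time ID `app=chrome.exe;title=Invoices/tag=input;name=email`, a renamed field in the same window is accepted. The same field in a window titled `Orders` is rejected, because the root node is kept literal.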

Another exemplary candidate-selection strategy may determine whether two element IDs are similar according to a count of features that differ between the two element IDs. Such an approach may, for example, determine the Levenshtein distance between the two element IDs and compare it to a predetermined threshold. Element IDs separated by a distance smaller than the threshold may be deemed similar. In some embodiments, the threshold may be specified at design time and included in the RPA script 50. In contrast to partial-matching methods using regular expressions, methods using the Levenshtein distance may be insensitive to which features differ between the two compared element IDs.
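The Levenshtein-based test can be sketched with the standard dynamic-programming edit distance, keeping one row at a time. Element IDs are treated as plain strings here, which is an assumption. `ids_similar` and its `threshold` argument are hypothetical names for the design-time threshold mentioned above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def ids_similar(a: str, b: str, threshold: int) -> bool:
    """Element IDs closer than `threshold` edits are treated as similar."""
    return levenshtein(a, b) < threshold
```

Every edit costs the same wherever it occurs in the string. That is why this test, unlike the regular-expression approach, does not care which attribute changed.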

In response to selecting the set of candidate runtime target elements and the set of candidate runtime anchor elements, some embodiments of the robot 44 may evaluate the candidates in pairs (e.g., all combinations of a target candidate with an anchor candidate) to determine the most likely runtime target. In some embodiments, a sequence of steps 330-332 may evaluate each pair according to the relative on-screen positions of its elements and according to the content (element text and/or element image) of each member of the pair.

For each candidate pair, some embodiments may evaluate a position score of the respective target-anchor candidate pair (step 330). The score indicates the likelihood that the target candidate is the intended runtime target element. Stated otherwise, in step 330 some embodiments determine, according to the relative positions of the target candidate and the anchor candidate, the likelihood that the target candidate is the true intended runtime target and that the anchor candidate is the anchor element specified in the RPA script 50.

Exemplary position scores may be determined according to various criteria, for example according to the distance between the candidate anchor and the candidate target. FIG. 13 shows a set of exemplary distances separating a candidate target element 68 (in this example, an input field) and a candidate anchor element 69 (a label) according to some embodiments of the present invention. Distances d1 and d2 may be measured between the centers/centroids of the respective elements along the principal screen coordinates (e.g., horizontal and vertical). For text elements detected using OCR, the distance may be measured to the center or centroid of the bounding box circumscribing the respective text element. Other exemplary inter-element distances, such as the Manhattan distance or the Euclidean distance, may be evaluated according to d1 and d2. Some embodiments rely on the observation that an anchor element is typically located in the vicinity of its target element. Therefore, the larger the distance between the candidate anchor and the candidate target, the less likely it is that the respective pair represents the design-time target and anchor elements. In such embodiments, an exemplary position score may be determined according to 1/D or (1-D/Dmax), where D denotes the inter-element distance determined according to d1 and/or d2, and Dmax denotes a predetermined threshold beyond which two UI elements are considered unlikely to be a target-anchor pair.
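The (1-D/Dmax) score can be sketched as follows. The score is clipped at zero so that pairs farther apart than Dmax simply score 0; the clipping is an assumption. The `Box` type, the function names, and the choice between Euclidean and Manhattan distance via a `metric` string are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box in screen pixels (y grows downward)."""
    left: float
    top: float
    width: float
    height: float

    def center(self) -> Tuple[float, float]:
        return self.left + self.width / 2, self.top + self.height / 2


def center_offsets(target: Box, anchor: Box) -> Tuple[float, float]:
    """d1 (horizontal) and d2 (vertical) offsets between the two centers."""
    (tx, ty), (ax, ay) = target.center(), anchor.center()
    return abs(tx - ax), abs(ty - ay)


def distance_score(target: Box, anchor: Box, d_max: float,
                   metric: str = "euclidean") -> float:
    """Position score 1 - D/Dmax, clipped at 0 for pairs farther apart than Dmax."""
    d1, d2 = center_offsets(target, anchor)
    d = math.hypot(d1, d2) if metric == "euclidean" else d1 + d2  # else Manhattan
    return max(0.0, 1.0 - d / d_max)
```

Take a label at `Box(0, 0, 50, 20)` and an input field at `Box(100, 0, 50, 20)`. Their centers are 100 px apart, so with `d_max=200` the pair scores 0.5.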

Another exemplary position score may be determined according to the degree of alignment between the candidate anchor element and the candidate target element. Alignment may be determined according to another set of distances, for example as illustrated in FIG. 14. An exemplary distance d3 separates the left edge of the anchor candidate 69 from the left edge of the target candidate 68, while a distance d4 separates the top edge of the anchor candidate 69 from the top edge of the target candidate 68. Some embodiments rely on the observation that anchors are typically aligned with their target elements. Therefore, a relatively small d3 or d4 may be associated with a relatively high likelihood that the respective anchor candidate and target candidate are indeed a target-anchor pair. FIG. 14 only shows distances that may be used to test for left and/or top alignment. A skilled artisan will understand that the illustrated distance measurements may be adapted to test for right and/or bottom alignment. An exemplary fitness score may be calculated as follows:
where δ denotes an alignment distance determined according to d3 and/or d4, and δmax denotes a predetermined threshold beyond which two UI elements are considered misaligned.
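Below is a minimal sketch of an alignment-based score. It assumes a linear form analogous to the (1-D/Dmax) distance score, clipped to 0 once δ reaches δmax. It also takes δ = min(d3, d4), so that either left or top alignment counts. Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in screen pixels."""
    left: float
    top: float
    width: float
    height: float


def alignment_distance(target: Box, anchor: Box) -> float:
    """delta: the smaller of the left-edge gap (d3) and the top-edge gap (d4)."""
    d3 = abs(anchor.left - target.left)
    d4 = abs(anchor.top - target.top)
    return min(d3, d4)


def alignment_score(target: Box, anchor: Box, delta_max: float) -> float:
    """Assumed form: 1 - delta/delta_max, and 0 once the pair counts as misaligned."""
    delta = alignment_distance(target, anchor)
    return max(0.0, 1.0 - delta / delta_max)
```

A label placed directly above a field with the same left edge scores 1.0. Testing right or bottom alignment would add analogous gaps between the right and bottom edges.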

Another exemplary position score may be determined according to the angle between the candidate anchor and the candidate target. FIG. 15 shows an exemplary angle A between the anchor candidate 69 and the target candidate 68, determined as the angle of the straight line connecting the centers/centroids of the two respective elements. In some embodiments, angle A is determined according to distance measurements, e.g., A = d2/d1 using the notation of FIG. 13 (with d1 horizontal and d2 vertical, this ratio is the tangent of the angle). In some embodiments, the angle serves as a means of determining the degree of alignment between the target candidate and the anchor candidate. Some embodiments may further compute the position score by comparing two angles: the angle between the target candidate and the anchor candidate computed at runtime, and the angle between the actual anchor and target elements determined at design time. The design-time angle may be included in the RPA script 50. A relatively small difference between the design-time angle and the runtime angle may indicate that the current target-anchor candidate pair sits in approximately the same relative position as the design-time target and anchor elements. It may therefore indicate a relatively high likelihood that the candidates are the sought-after runtime target and anchor elements. An exemplary angle-based position score may be determined according to 1/|Ad-Ar|. Here Ad denotes the angle between the true anchor element and the target element determined at design time (e.g., as specified in the RPA script 50), and Ar denotes the angle between the candidate target and the candidate anchor determined at runtime.
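The 1/|Ad-Ar| score can be sketched as follows. Instead of the ratio d2/d1, this version uses `math.atan2` on the signed center offsets, which is a deliberate substitution. It yields a true angle, keeps the direction (anchor left of vs. right of the target), and avoids dividing by zero when d1 = 0. The small `eps` term and the wrap-around handling are also assumptions, and the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def pair_angle(target_center: Point, anchor_center: Point) -> float:
    """Angle, in degrees, of the line from the anchor's center to the target's center."""
    dx = target_center[0] - anchor_center[0]
    dy = target_center[1] - anchor_center[1]
    return math.degrees(math.atan2(dy, dx))


def angle_score(design_angle: float, runtime_angle: float, eps: float = 1e-6) -> float:
    """1/|Ad - Ar|, using the shortest angular difference; eps avoids division by zero."""
    diff = abs(design_angle - runtime_angle) % 360.0
    diff = min(diff, 360.0 - diff)
    return 1.0 / (diff + eps)
```

A label that sat directly left of its field at design time (angle 0) still scores well if the field drifts a few pixels at runtime. A candidate pair stacked vertically (angle near 90) scores far lower.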

Yet another exemplary position score may be determined according to the degree of overlap between the anchor candidate element and the target candidate element. FIG. 16 shows an exemplary degree of overlap 67 according to some embodiments of the present invention. It is determined as the proportion of one element that intersects the other, i.e., how much one element overlaps the other. In such embodiments, two elements that do not intersect have zero overlap, whereas two elements wherein one completely contains the other have 100% overlap. Some embodiments use a position score determined according to the degree of overlap to identify certain anchors, such as button labels. In one such example, wherein the robot 44 is looking for a button-type target element, the robot may eliminate all target-anchor candidate pairs that lack a substantial degree of overlap (e.g., more than 90%).
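One way to compute the degree of overlap is sketched below. The intersection area is normalized by the area of the smaller element, an assumption chosen so that full containment gives 1.0 and disjoint elements give 0.0, as described above. The rectangle layout `(left, top, width, height)`, the 0.9 cutoff default, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height)


def overlap_degree(a: Rect, b: Rect) -> float:
    """Fraction of the smaller rectangle covered by the other one, in [0, 1]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    smaller_area = min(aw * ah, bw * bh)
    return inter_w * inter_h / smaller_area if smaller_area > 0 else 0.0


def keep_button_pairs(pairs: Sequence[Tuple[Rect, Rect]],
                      min_overlap: float = 0.9) -> List[Tuple[Rect, Rect]]:
    """Drop (button, label) candidate pairs that lack a substantial overlap."""
    return [(t, a) for t, a in pairs if overlap_degree(t, a) > min_overlap]
```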

In a further step 332 (FIG. 12), some embodiments of the robot 44 determine a content score of the target-anchor candidate pair. The content score may be determined by comparing the on-screen content (image and/or text) of the target and anchor candidates with the respective content of the design-side target and anchor. In some embodiments, this comparison comprises evaluating two numerical measures of similarity. One is between the text displayed by the target candidate and the text displayed by the design-side target element; the other is between the text displayed by the anchor candidate and the text displayed by the design-side anchor element. The element texts of the design-side target and anchor elements are specified in the RPA script 50 (see, e.g., items 82a-b in FIG. 9 and the associated description). The similarity between two text fragments may be evaluated, for example, using the Levenshtein distance, wherein a relatively small distance may indicate a relatively high similarity between the compared fragments.
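A text-similarity measure for step 332 can be sketched by normalizing the Levenshtein distance by the longer string's length, so that 1.0 means identical text. The normalization, and the names `text_similarity` and `text_scores`, are assumptions made for illustration.

```python
from typing import Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def text_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def text_scores(target_text: str, anchor_text: str,
                design_target_text: str, design_anchor_text: str) -> Tuple[float, float]:
    """Similarity of the target and of the anchor to their design-time texts."""
    return (text_similarity(target_text, design_target_text),
            text_similarity(anchor_text, design_anchor_text))
```

For instance, an anchor label that changed from `Email:` to `E-mail:` differs by one edit over seven characters and still scores about 0.86.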

Step 332 may further comprise determining two more numerical measures of similarity: one between the image of the target candidate and the image of the design-side target element, and another between the image of the anchor candidate and the image of the design-side anchor element. The element images of the design-side target and anchor elements are specified in the RPA script 50 (see, e.g., items 84a-b in FIG. 9 and the associated description). Several measures of similarity between two images are known in the art.
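As one simple example of such a measure, the sketch below computes 1 minus the mean absolute pixel difference between two equally sized grayscale images. This metric was picked for brevity, not taken from the source. A practical system would likely also need resizing or registration and might prefer something like normalized cross-correlation (template matching). Images are plain lists of pixel rows here, which is an assumption.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel rows, values 0..255


def image_similarity(img_a: Image, img_b: Image) -> float:
    """1 - mean absolute pixel difference / 255, for equally sized images."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("images must have the same dimensions")
    n_pixels = sum(len(row) for row in img_a)
    if n_pixels == 0:
        return 1.0
    total = sum(abs(p - q) for ra, rb in zip(img_a, img_b) for p, q in zip(ra, rb))
    return 1.0 - total / (255.0 * n_pixels)
```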

Text similarity may be used independently of image similarity, or the two may be combined into an aggregate content score. When either the text or the image of the target or anchor element has changed between design time and runtime, aggregating the image and text aspects may provide a robust way of identifying the runtime target element. In such situations, the robot 44 may determine that two UI elements are similar according to their text content even if their image content does not match, or vice versa. Likewise, consider situations where only the target element has changed between design time and runtime while the anchor has remained approximately the same. There, combining the content score determined for the anchor with the content score determined for the target may yield a robust method, since the robot 44 may be able to identify the runtime target according to the content of the candidate anchor.
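One plausible aggregation that exhibits both kinds of robustness is sketched below; it is an assumed scheme, not the one prescribed by the source. Per element, the better of the text and image similarities is kept, so a change in only one cue does not sink the score. The target and anchor scores are then blended, so an unchanged anchor can compensate for a changed target. The equal default weighting and the function names are hypothetical.

```python
def element_content_score(text_sim: float, image_sim: float) -> float:
    """Either cue may vouch for the element, so take the better of the two."""
    return max(text_sim, image_sim)


def pair_content_score(target_text_sim: float, target_image_sim: float,
                       anchor_text_sim: float, anchor_image_sim: float,
                       target_weight: float = 0.5) -> float:
    """Blend target and anchor content so an unchanged anchor can compensate
    for a changed target."""
    target = element_content_score(target_text_sim, target_image_sim)
    anchor = element_content_score(anchor_text_sim, anchor_image_sim)
    return target_weight * target + (1.0 - target_weight) * anchor
```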

In an alternative embodiment, the robot 44 uses step 330 as a filter for target-anchor candidate pairs. In one such example, for each candidate pair the robot 44 may evaluate a set of indicators of the position of the target candidate relative to the anchor candidate, for example a set of distances as described above in relation to FIGS. 13-14. The evaluated distances may indicate that the target and anchor candidates are unlikely to be the sought-after runtime target-anchor pair, for example because they are too far apart and/or misaligned. In that case, the respective candidate pair is not further considered for content scoring (step 332). Since image analysis is typically resource-intensive, such optimizations may substantially reduce the computational cost of identifying the runtime target.
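The filtering idea can be sketched as a loop over all target-anchor combinations that calls the expensive content scorer only for pairs passing a cheap position check. The scorers are passed in as callables. `score_candidate_pairs` and its `min_position` cutoff are hypothetical names and values.

```python
from itertools import product
from typing import Any, Callable, List, Sequence, Tuple

ScoredPair = Tuple[Any, Any, float, float]  # (target, anchor, position, content)


def score_candidate_pairs(targets: Sequence[Any], anchors: Sequence[Any],
                          position_score: Callable[[Any, Any], float],
                          content_score: Callable[[Any, Any], float],
                          min_position: float = 0.2) -> List[ScoredPair]:
    """Step 330 as a cheap filter in front of the costlier step 332."""
    scored = []
    for target, anchor in product(targets, anchors):
        pos = position_score(target, anchor)
        if pos < min_position:
            continue  # too far apart / misaligned: skip content (image) comparison
        scored.append((target, anchor, pos, content_score(target, anchor)))
    return scored
```

With three target candidates and two anchor candidates there are six pairs. If only two are positioned plausibly, image and text comparison runs twice instead of six times.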

In step 336, the robot 44 may select a runtime target from the set of target candidates identified in step 322, according to the position scores and/or content scores determined for each target-anchor candidate pair. In some embodiments, step 336 may compute an aggregate score for each pair, the aggregate score combining the position score and the content score determined for the respective pair. The scores may be combined using various methods known in the art, for example as a weighted average wherein each score is multiplied by a predetermined numerical weight. The value of each weight may indicate a degree of confidence associated with the respective score (e.g., scores that are more likely to correctly identify the runtime target may receive relatively higher weights).

In some embodiments, the runtime target element is selected as the target candidate of the pair whose aggregate score indicates the highest similarity to the design-side target-anchor pair specified in the RPA script 50. The robot 44 may then proceed with the scripted activity (step 320), i.e., apply the current activity to the selected runtime target.
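Steps 336 and the final selection can be sketched as a weighted average followed by an arg-max over the scored pairs. The weights 0.4 and 0.6 are arbitrary placeholders for the predetermined weights mentioned above. The tuple layout and the function names are hypothetical.

```python
from typing import Any, Optional, Sequence, Tuple

ScoredPair = Tuple[Any, Any, float, float]  # (target, anchor, position, content)


def aggregate_score(pos: float, content: float,
                    w_pos: float = 0.4, w_content: float = 0.6) -> float:
    """Weighted average; a larger weight expresses more trust in that score."""
    return (w_pos * pos + w_content * content) / (w_pos + w_content)


def select_runtime_target(pairs: Sequence[ScoredPair]) -> Optional[Any]:
    """Step 336 (sketch): return the target of the best-scoring pair, if any."""
    best = max(pairs, key=lambda p: aggregate_score(p[2], p[3]), default=None)
    return None if best is None else best[0]
```

Here a pair with a slightly worse position score but much better content match (0.6 and 0.9, aggregate 0.78) beats a well-placed pair with poor content (0.9 and 0.4, aggregate 0.60).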

FIG. 17 shows an exemplary hardware configuration of a computing device programmed to execute some of the methods described herein. The respective computing device may represent any of the RPA clients 10a-e in FIG. 1, for example the personal computer illustrated in FIG. 18. Other computing devices, such as mobile telephones, tablet computers, and wearables, may have slightly different configurations. The processor 22 comprises a physical device (e.g., a microprocessor, or a multi-core integrated circuit formed on a semiconductor substrate) configured to execute computational and/or logical operations with a set of signals and/or data. Such signals or data may be encoded and delivered to the processor 22 in the form of processor instructions, e.g., machine code. The processor 22 may include an array of central processing units (CPUs) and/or graphics processing units (GPUs).

The memory unit 24 may comprise volatile computer-readable media (e.g., dynamic random-access memory, DRAM) storing data, signals, and instruction encodings accessed or generated by the processor 22 in the course of carrying out operations. The input devices 26 may include computer keyboards, mice, and microphones, among others, including the respective hardware interfaces and/or adapters that allow a user to introduce data and/or instructions into the RPA client 10. The output devices 28 may include display devices such as monitors, and speakers, among others, as well as hardware interfaces/adapters such as graphics cards, enabling the respective computing device to communicate data to a user. In some embodiments, the input devices 26 and the output devices 28 share a common piece of hardware (e.g., a touch screen). The storage devices 32 include computer-readable media enabling the non-volatile storage, reading, and writing of software instructions and/or data. Exemplary storage devices include magnetic disk devices, optical disk devices, and flash memory devices, as well as removable media such as CD and/or DVD disks and their drives. The network adapter 34 enables the respective computing device to connect to electronic communication networks (e.g., networks 12 and 14 in FIG. 1) and/or to other devices/computer systems.

The controller hub 30 generically represents the plurality of system, peripheral, and/or chipset buses, and/or all other circuitry enabling communication between the processor 22 and the remaining hardware components of the RPA client 10. For example, the controller hub 30 may comprise a memory controller, an input/output (I/O) controller, and an interrupt controller. Depending on the hardware manufacturer, some such controllers may be incorporated into a single integrated circuit and/or may be integrated with the processor 22. In another example, the controller hub 30 may comprise a northbridge connecting the processor 22 to the memory 24, and/or a southbridge connecting the processor 22 to the devices 26, 28, 32, and 34.

The exemplary systems and methods described above facilitate RPA by improving the automatic identification of activity targets, i.e., the user interface elements acted upon by robotic software. Target identification poses a substantial technical problem because, in typical RPA applications, the target user interface (e.g., an e-commerce web page, an accounting interface, etc.) is developed and maintained independently of the robots designed to interact with it. The functionality and/or appearance of the target UI may therefore change without the knowledge of RPA developers. Successful RPA may thus depend on a robust way of identifying activity targets, i.e., one that is relatively insensitive to variations in the design of the target user interface.

When designing robotic software (a stage of automation commonly known as design time), the RPA developer invokes an instance of the target UI and indicates a target element and an activity to be performed on the respective target element. For example, the developer may indicate a button of the target UI and configure the robot to click on the respective button. In another example, the developer may indicate an input field and configure the robot to type some text into the respective input field. In yet another example, the developer may indicate a text box of the user interface and configure the robot to grab the content of the respective text box. The resulting robot code may include an indicator of the target element and an indicator of the respective activity. The robot code may then be distributed to RPA clients.

In another stage of automation, commonly known as runtime, client machines may execute the respective robot, which may attempt to interact with another, client-side instance of the target UI. However, the client-side UI may not be identical to the design-side UI. When the target UI comprises a web interface, the respective user interface may change several times a day, especially when the robot is designed to interact with a complex website. Web developers of the respective website may tweak its appearance, for example by changing the position of a button, the composition of a menu, and/or the color scheme, fonts, and sizes of various elements. Robotic software may therefore have to successfully identify a target element even when the appearance of the interface has changed.

Some conventional RPA systems identify a target element according to its name or ID as specified in the source code or data structure underlying the respective user interface (e.g., the HTML code that specifies the appearance and content of a web page). However, such systems and methods may fail when the name of the respective element changes unexpectedly. Such changes may occur quite frequently, especially since a substantial proportion of web documents are currently generated dynamically, with various aspects of them controlled algorithmically.

In contrast to such conventional approaches, some embodiments of the present invention additionally identify the target element according to the image and text it displays at design time. The design-time image and text are saved in the robot's code and transmitted to RPA clients. At runtime, the robot may identify a plurality of candidate target elements and evaluate each of them according to its element ID, and further according to the image and text it displays. A candidate that at least partially matches the ID, image, and text of the design-time target element may be selected as the runtime target. The robot may then apply the scripted activity to the selected runtime target element.

Some embodiments may use an optimization strategy to save computational resources, and thereby improve RPA efficiency and user experience at run time. In a first phase, the robot may attempt to identify the runtime target according to the element ID; when such identification fails (e.g., because the name of the element has changed in the source code of the UI), it may fall back on text and/or image matching. Candidate UI elements may be selected so that they partially match the element ID of the design-time target element. Partially matching the element ID allows the robot to search for the target element within a relevant subgroup of candidates (e.g., candidates that belong to the same region of the UI as the design-side target element).
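
The two-phase strategy can be sketched in Python as follows. This is a minimal illustration under stated assumptions: element IDs are flattened into hypothetical attribute dictionaries, "partial match" is taken to mean the fraction of design-time attributes that a candidate reproduces, and the thresholds `min_id_overlap` and `min_text` are hypothetical. For brevity the fallback compares only text; image comparison could be added in the same place.

```python
from difflib import SequenceMatcher


def partial_id_match(design_id: dict, cand_id: dict) -> float:
    """Fraction of design-time ID attributes that the candidate reproduces exactly."""
    if not design_id:
        return 0.0
    same = sum(1 for key, value in design_id.items() if cand_id.get(key) == value)
    return same / len(design_id)


def locate(design: dict, candidates: list, min_id_overlap: float = 0.5, min_text: float = 0.8):
    # Phase 1: cheap exact comparison of element IDs.
    for cand in candidates:
        if cand["id"] == design["id"]:
            return cand
    # Phase 2: keep only the subgroup whose ID partially matches the design-time ID...
    subgroup = [c for c in candidates
                if partial_id_match(design["id"], c["id"]) >= min_id_overlap]
    # ...and fall back to text matching inside that subgroup only.
    scored = [(SequenceMatcher(None, design["text"], c["text"]).ratio(), c) for c in subgroup]
    scored = [(ratio, c) for ratio, c in scored if ratio >= min_text]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None


design = {"id": {"tag": "button", "form": "login", "name": "submit-4821"}, "text": "Sign in"}
candidates = [
    {"label": "help link", "id": {"tag": "a", "form": "footer", "name": "help"},
     "text": "Sign in help"},
    {"label": "login button", "id": {"tag": "button", "form": "login", "name": "submit-9377"},
     "text": "Sign in"},
]
print(locate(design, candidates)["label"])  # login button
```

The footer link shares no ID attributes with the design-time button, so it is never text-compared at all; restricting the expensive fallback to the partially matching subgroup is where the computational saving comes from.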

To further improve the robustness of the method, some embodiments employ characteristic data (e.g., element ID, image, and text data) of another UI element of the target interface, which is displayed concurrently with the target element and is deemed an anchor of the target element. At run time, some embodiments may identify multiple candidate anchor elements and attempt to match each candidate to the design-time anchor according to element ID, image, and/or text data. Using anchor element data in combination with target element data relies on the assumption that it is unlikely for both the target and the anchor to have changed between design time and run time, so that the target may be successfully identified based on data characterizing its anchor.
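
Anchors are most useful when several candidate targets look alike, for example two empty input fields distinguished only by their labels. The Python sketch below scores every (candidate target, candidate anchor) pair by anchor-text similarity and by agreement with the design-time geometry, using the angle of the line joining the two centers and the overlap of the two boxes, the relative-position measures named in the claims. The `Element` type, the box layout, and the multiplicative way the factors are combined are hypothetical choices made for illustration, not the claimed method itself.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Element:
    text: str
    box: tuple  # (left, top, width, height) in screen pixels; y grows downward


def center(box):
    left, top, width, height = box
    return (left + width / 2, top + height / 2)


def angle_deg(target_box, anchor_box):
    """Angle (degrees, counter-clockwise from +x) of the line from target center to anchor center."""
    (tx, ty), (ax, ay) = center(target_box), center(anchor_box)
    return math.degrees(math.atan2(ty - ay, ax - tx))  # screen y is flipped


def overlap(a, b):
    """Intersection area divided by the area of the smaller box (0 = disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    h = max(0, min(ay + ah, by + bh) - max(ay, by))
    smaller = min(aw * ah, bw * bh)
    return (w * h) / smaller if smaller else 0.0


def pair_score(design, target, anchor):
    """Anchor-text similarity scaled by agreement with the design-time geometry."""
    text = SequenceMatcher(None, design["anchor_text"], anchor.text).ratio()
    d = abs(angle_deg(target.box, anchor.box) - design["angle"]) % 360
    d = min(d, 360 - d)  # smallest angular difference, in [0, 180]
    ov = abs(overlap(target.box, anchor.box) - design["overlap"])
    return text * (1 - d / 180) * (1 - ov)


def pick_target(design, targets, anchors):
    """Return the candidate target from the best-scoring (target, anchor) pair."""
    pairs = [(pair_score(design, t, a), t) for t in targets for a in anchors]
    return max(pairs, key=lambda p: p[0])[1]


# Design time: an empty input field with its "Email" label to the left.
dt_target, dt_anchor = (120, 40, 180, 20), (30, 40, 70, 20)
design = {"anchor_text": "Email",
          "angle": angle_deg(dt_target, dt_anchor),
          "overlap": overlap(dt_target, dt_anchor)}

# Run time: two visually identical fields; only the anchors tell them apart.
name_field, email_field = Element("", (100, 50, 200, 20)), Element("", (100, 90, 200, 20))
name_label, email_label = Element("Name", (20, 50, 60, 20)), Element("Email", (20, 90, 60, 20))
print(pick_target(design, [name_field, email_field], [name_label, email_label]) is email_field)  # True
```

Although the layout shifted between design time and run time, the "Email" label still sits directly to the left of the correct field, so that pair keeps a perfect geometric score while every mismatched pairing is penalized on text, angle, or both.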

It will be apparent to those skilled in the art that the above embodiments can be modified in many ways without departing from the scope of the present invention. Therefore, the scope of the present invention should be determined by the following claims and their legal equivalents.

Claims

1. A method comprising employing at least one hardware processor of a computer system to:
in response to receiving a robotic process automation (RPA) script comprising a set of target features characteristic of a target element of a target user interface (UI) and a set of anchor features characteristic of an anchor element of the target UI, automatically identify a runtime instance of the target element within a runtime UI exposed by the computer system; and
automatically perform an operation determined according to the RPA script, the operation replicating a result of an interaction of a human operator with the runtime instance of the target element;
wherein the set of target features comprises:
a target ID indicative of a position of the target element within a tree representation of the target UI;
a target image comprising an image of the target element within the target UI; and
a target text comprising a sequence of characters displayed by the target element within the target UI;
wherein the set of anchor features comprises:
an anchor ID indicative of a position of the anchor element within the tree representation of the target UI;
an anchor image comprising an image of the anchor element within the target UI; and
an anchor text comprising a sequence of characters displayed by the anchor element within the target UI; and
wherein the method comprises identifying the runtime instance of the target element according to the target ID, target image, target text, anchor ID, anchor image, and anchor text.

2. The method of claim 1, wherein automatically identifying the runtime instance of the target element comprises:
for each candidate of a plurality of candidate UI elements of the runtime UI, determining whether an element ID of the respective candidate, indicative of a position of the respective candidate within a tree representation of the runtime UI, exactly matches the target ID;
in response, if the element ID of the respective candidate exactly matches the target ID, designating the respective candidate as the runtime instance of the target element; and
if none of the plurality of candidate UI elements has an element ID exactly matching the target ID, identifying the runtime instance of the target element further according to the target image and target text.

3. The method of claim 2, further comprising:
selecting a candidate target from the plurality of candidate UI elements according to whether an element ID of the candidate target partially matches the target ID;
selecting a candidate anchor from the plurality of candidate UI elements according to whether an element ID of the candidate anchor partially matches the anchor ID; and
in response to selecting the candidate target and the candidate anchor, determining whether to designate the candidate target as the runtime instance of the target element according to a result of comparing the target text with a text displayed by the candidate target, and further according to a result of comparing the anchor text with a text displayed by the candidate anchor.

4. The method of claim 3, further comprising, in response to selecting the candidate target and the candidate anchor, determining whether to designate the candidate target as the runtime instance of the target element further according to a result of comparing the target image with an on-screen image of the candidate target, and further according to a result of comparing the anchor image with an on-screen image of the candidate anchor.

5. The method of claim 3, further comprising, in response to selecting the candidate target and the candidate anchor, determining whether to designate the candidate target as the runtime instance of the target element further according to a relative on-screen position of the candidate target with respect to the candidate anchor.

6. The method of claim 5, wherein determining the relative on-screen position comprises determining an angle of a line connecting a center of the candidate target with a center of the candidate anchor.

7. The method of claim 5, wherein determining the relative on-screen position comprises determining an overlap between the candidate target and the candidate anchor.

8. The method of claim 1, wherein the interaction comprises an item selected from the group consisting of: performing a mouse click on the runtime instance of the target element; pressing a specific combination of keyboard keys; writing a sequence of characters into the runtime instance of the target element; grabbing an on-screen image of the runtime instance of the target element; and grabbing a text displayed by the runtime instance of the target element.

9. A computer system comprising at least one hardware processor configured to execute an automation target application and a robotic process automation (RPA) robot, wherein:
the automation target application is configured to expose a runtime user interface (UI); and
the RPA robot is configured to:
in response to receiving an RPA script comprising a set of target features characteristic of a target element of a target UI and a set of anchor features characteristic of an anchor element of the target UI, automatically identify a runtime instance of the target element within the runtime UI; and
automatically perform an operation determined according to the RPA script, the operation replicating a result of an interaction of a human operator with the runtime instance of the target element;
wherein the set of target features comprises:
a target ID indicative of a position of the target element within a tree representation of the target UI;
a target image comprising an image of the target element within the target UI; and
a target text comprising a sequence of characters displayed by the target element within the target UI;
wherein the set of anchor features comprises:
an anchor ID indicative of a position of the anchor element within the tree representation of the target UI;
an anchor image comprising an image of the anchor element within the target UI; and
an anchor text comprising a sequence of characters displayed by the anchor element within the target UI; and
wherein automatically identifying the runtime instance of the target element comprises identifying the runtime instance of the target element according to the target ID, target image, target text, anchor ID, anchor image, and anchor text.

10. The computer system of claim 9, wherein automatically identifying the runtime instance of the target element comprises:
for each candidate of a plurality of candidate UI elements of the runtime UI, determining whether an element ID of the respective candidate, indicative of a position of the respective candidate within a tree representation of the runtime UI, exactly matches the target ID;
in response, if the element ID of the respective candidate exactly matches the target ID, designating the respective candidate as the runtime instance of the target element; and
if none of the plurality of candidate UI elements has an element ID exactly matching the target ID, identifying the runtime instance of the target element further according to the target image and target text.

11. The computer system of claim 10, wherein the RPA robot is further configured to:
select a candidate target from the plurality of candidate UI elements according to whether an element ID of the candidate target partially matches the target ID;
select a candidate anchor from the plurality of candidate UI elements according to whether an element ID of the candidate anchor partially matches the anchor ID; and
in response to selecting the candidate target and the candidate anchor, determine whether to designate the candidate target as the runtime instance of the target element according to a result of comparing the target text with a text displayed by the candidate target, and further according to a result of comparing the anchor text with a text displayed by the candidate anchor.

12. The computer system of claim 11, wherein the RPA robot is further configured to, in response to selecting the candidate target and the candidate anchor, determine whether to designate the candidate target as the runtime instance of the target element further according to a result of comparing the target image with an on-screen image of the candidate target, and further according to a result of comparing the anchor image with an on-screen image of the candidate anchor.

13. The computer system of claim 11, wherein the RPA robot is further configured to, in response to selecting the candidate target and the candidate anchor, determine whether to designate the candidate target as the runtime instance of the target element further according to a relative on-screen position of the candidate target with respect to the candidate anchor.

14. The computer system of claim 13, wherein determining the relative on-screen position comprises determining an angle of a line connecting a center of the candidate target with a center of the candidate anchor.

15. The computer system of claim 13, wherein determining the relative on-screen position comprises determining an overlap between the candidate target and the candidate anchor.

16. The computer system of claim 9, wherein the interaction comprises an item selected from the group consisting of: performing a mouse click on the runtime instance of the target element; pressing a specific combination of keyboard keys; writing a sequence of characters into the runtime instance of the target element; grabbing an on-screen image of the runtime instance of the target element; and grabbing a text displayed by the runtime instance of the target element.

17. A non-transitory computer-readable medium storing instructions which, when executed by at least one hardware processor of a computer system configured to expose a runtime user interface (UI), cause the computer system to:
in response to receiving a robotic process automation (RPA) script comprising a set of target features characteristic of a target element of a target UI and a set of anchor features characteristic of an anchor element of the target UI, automatically identify a runtime instance of the target element within the runtime UI; and
automatically perform an operation determined according to the RPA script, the operation replicating a result of an interaction of a human operator with the runtime instance of the target element;
wherein the set of target features comprises:
a target ID indicative of a position of the target element within a tree representation of the target UI;
a target image comprising an image of the target element within the target UI; and
a target text comprising a sequence of characters displayed by the target element within the target UI;
wherein the set of anchor features comprises:
an anchor ID indicative of a position of the anchor element within the tree representation of the target UI;
an anchor image comprising an image of the anchor element within the target UI; and
an anchor text comprising a sequence of characters displayed by the anchor element within the target UI; and
wherein automatically identifying the runtime instance of the target element comprises identifying the runtime instance of the target element according to the target ID, target image, target text, anchor ID, anchor image, and anchor text.