RU2145115C1

RU2145115C1 - Group method for verification of computer codes with respect to respective original versions

Info

Publication number: RU2145115C1
Application number: RU98114579A
Authority: RU
Inventors: С.Г. Попов; В.В. Терещенко; Д.Е. Ян
Original assignee: Закрытое акционерное общество "Аби Программное обеспечение"
Priority date: 1998-08-10
Filing date: 1998-08-10
Publication date: 2000-01-27
Also published as: AU5310999A; WO2000008587A2; WO2000008587A3

Abstract

FIELD: computer engineering. SUBSTANCE: the method involves converting the source character information of an original document into a set of corresponding computer codes in the found and selected document fields, and matching the computer codes to the original. The goal of the invention is achieved by selecting character computer codes for verification, judging the reliability of recognition of each character from the results of recognizing that character by several known methods, from a comparison of those results with one another, and from the result of a dictionary check. Verification uses several graphic images displayed simultaneously on a display unit. EFFECT: increased speed and accuracy of verification. 2 cl

Description

The invention relates to the field of electronics and can be used, for example, as a group method for verifying computer codes against their corresponding originals.

A method is known for verifying computer codes against their corresponding originals. It includes converting the source character information of the original document into a set of computer codes adequate to it in the found and selected fields of the document, followed by an operator checking the computer codes against the original.

Also known is a method for verifying computer codes against their corresponding originals that includes converting the source character information of the original document into a set of computer codes adequate to it in the found and selected fields of the document, and bringing the computer codes into correspondence with the original. This method is the prototype.

A disadvantage of the known methods is their relatively low functional and technical characteristics, including the low verification speed and low average verification accuracy that they achieve.

The problem solved by the invention is to improve methods for verifying computer codes against their corresponding originals, with the technical result being an increase in verification speed and in average verification accuracy. Verification speed is defined as the number of characters verified per unit time.

For convenience and unambiguous understanding, the designations, symbols, and/or terms used below are expanded and defined here.

The original graphic image on a physical medium is an image that is to be entered into a computer for subsequent computer processing or for storage in machine-readable form.

A graphic image entered into a computer is a computer representation of some fragment of graphic information.

A character computer code is a computer representation of some fragment of character information.

Character computer codes are obtained through computer recognition of a graphic image, or of fragments of it, that has been entered into the computer, for example with a scanner.

The verification process is a comparison (a determination of adequacy) of character computer codes with the graphic image entered into the computer, performed by a person and/or a device replacing the person and/or a computer program.

The recognition process is the processing, by a recognition system, of the graphic image of some character entered into the computer, as a result of which the recognition system assigns the image the computer code of that character.

The accuracy of the recognition process is the average percentage of correctly recognized characters over a statistically representative, practically relevant set of texts.

Correctly recognized characters are characters whose computer code has been determined correctly by the recognition system.

Incorrectly recognized characters are characters whose computer code has been determined incorrectly by the recognition system.

Selected characters are characters singled out during filtering for subsequent verification. Ideally, the selected characters should include all incorrectly recognized characters.

The cost of an error is a parameter corresponding to the size of the loss caused when an incorrectly recognized character ends up in the final recognition result.

Designations:
$N_{\text{src}}$ - the total number of characters in the document,
$N_{\text{sel}}$ - the number of characters selected by the filtering algorithm,
$N_{\text{unsel}}$ - the number of characters not selected by the filtering algorithm,
$N_{\text{cor}}$ - the number of correctly recognized characters,
$N_{\text{inc}}$ - the total number of incorrectly recognized characters,
$N_{\text{sel.cor}}$ - the number of selected correctly recognized characters,
$N_{\text{sel.inc}}$ - the number of selected incorrectly recognized characters,
$N_{\text{unsel.cor}}$ - the number of unselected correctly recognized characters,
$N_{\text{unsel.inc}}$ - the number of unselected incorrectly recognized characters,
a superscript $C$ (as in $N^{C}$) denotes the number of characters that received computer code $C$ during recognition,
$A$ - the recognition accuracy of a given document,
$A = N_{\text{cor}} / N_{\text{src}}$,
$A_{\text{avg}}$ - the averaged recognition accuracy:

where $N$ is the total number of documents in the sample and $i$ is the index of a document in the sample,

where $C$ is the numerical value of the computer code of a recognized character (its ordinal number), chosen from the entire set of admissible values without exception,
$N_{\text{grp}}^{C}$ - the number of identical computer codes grouped for verification,
$N_{\text{scr}}$ - the number of graphic images displayed on the screen simultaneously (out of the total $N_{\text{grp}}^{C}$).
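The relationships among these counts can be illustrated with a short Python sketch. It is not part of the patent: the class and function names are hypothetical, the four-way split of characters into selected/unselected and correct/incorrect follows from the definitions above, and the averaged accuracy is computed as a plain arithmetic mean over the sample, which is an assumption because the formula is not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DocumentCounts:
    """Character counts for one recognized document (illustrative names)."""
    sel_cor: int    # selected by the filter, correctly recognized
    sel_inc: int    # selected by the filter, incorrectly recognized
    unsel_cor: int  # not selected, correctly recognized
    unsel_inc: int  # not selected, incorrectly recognized (errors that slip through)

    @property
    def n_sel(self) -> int:
        return self.sel_cor + self.sel_inc

    @property
    def n_unsel(self) -> int:
        return self.unsel_cor + self.unsel_inc

    @property
    def n_src(self) -> int:
        # Every character is either selected or not selected.
        return self.n_sel + self.n_unsel

    @property
    def n_cor(self) -> int:
        return self.sel_cor + self.unsel_cor

    @property
    def n_inc(self) -> int:
        return self.sel_inc + self.unsel_inc

    @property
    def accuracy(self) -> float:
        # A = N_cor / N_src, as given in the designations.
        return self.n_cor / self.n_src


def average_accuracy(docs: Sequence[DocumentCounts]) -> float:
    # ASSUMPTION: arithmetic mean of per-document accuracy over the sample.
    return sum(d.accuracy for d in docs) / len(docs)
```

For a document with 1000 characters of which 15 are misrecognized, `accuracy` evaluates to 0.985 regardless of how the filter splits them; only `unsel_inc` measures the errors that verification never sees.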

To summarize the essence of the invention: the technical result is achieved by the proposed ABBYY group method for verifying computer codes against their corresponding originals, which includes converting the source character information of the original document into a set of computer codes adequate to it in the found and selected fields of the document, and bringing the computer codes into correspondence with the original. The distinctive features of the claimed method are as follows. During filtering, character computer codes are selected by determining the reliability of recognition of each character from the results of recognizing the image of that character by various known methods and from a comparison of those results with one another and with the result of a dictionary check. The codes are selected from the original sequence of character computer codes, which totals $N_{\text{src}}$, in the amount $N_{\text{sel}} = F - a N_{\text{unsel.inc}}$, where $N_{\text{unsel.inc}}$ is the number of unselected incorrectly recognized characters, $a$ is an experimental coefficient chosen according to the cost of an error and the average accuracy of the recognition system within the limits $10^{-12} \le a \le 10^{15}$, and $F$ is an experimental parameter chosen according to the accuracy of the recognition system and the number of characters to be recognized in the document within the limits $1 \le F \le 10^{16}$.

After filtering, the identically recognized computer codes, which total $N_{\text{sel}}$, are grouped so that each group contains $N_{\text{sel}}^{C}$ identical character computer codes, where $C$ is the numerical value of the computer code being verified, chosen from the entire set of admissible values. The value of $N_{\text{sel}}^{C}$ is chosen within the limits $1 \le (N_{\text{sel}}^{C} + N_{\text{sel}})/N_{\text{sel}} \le 2$. Each group of $N_{\text{sel}}^{C}$ codes contains $N_{\text{sel.cor}}^{C}$ correctly recognized character computer codes and $N_{\text{sel.inc}}^{C}$ incorrectly recognized ones, and the relationship between $N_{\text{sel.cor}}^{C}$ and $N_{\text{sel.inc}}^{C}$ is chosen within the limits $-0.5 \le (N_{\text{sel.cor}}^{C} + N_{\text{sel.inc}}^{C} - b N_{\text{sel}}^{C})/N_{\text{sel.cor}}^{C} \le 1.5$, where $b$ is an experimental coefficient chosen according to the sharpness and contrast of the original graphic image within the limits $10^{-9} \le b \le 1$.

The number $N_{\text{grp}}$ of grouped identical computer codes chosen for verification is $N_{\text{grp}} = \beta\gamma N_{\text{sel}}^{C}$, where $\gamma$ is an experimental coefficient chosen within the limits $10^{-5} \le \gamma \le 10^{6}$ according to the number of selected computer codes being grouped and the number of additionally included reference and/or auxiliary and/or informational codes, and $\beta$ is an experimental probabilistic coefficient of confidence in the reliability of recognition, chosen on the basis of statistical processing and of an assessment of the quality of the original graphic images on the physical medium, within the limits $0.01 \le \beta \le 1$.

The groups of identically recognized computer codes are output for verification by a specialized device or by an operator, for example in random order or in descending order of the weight $W^{C}$ of the group of computer codes. The weight is determined experimentally from statistical processing of large data sets, depending on alphabetical order and/or the size of the group of computer codes and/or the importance of the given computer code for the content of the document, among other factors, on the basis of the practical significance of reliable verification of the computer codes, and is chosen within the limits $10^{-8} \le W^{C}/N_{\text{sel}}^{C} \le 10^{16}$. Verification is performed by comparing, for example, the image entered into the computer, as shown on the visual display device, with the image of the character's computer code. For this purpose $N_{\text{scr}}$ different graphic images are sent to the visual display device simultaneously, and a time interval $T_{\text{ver}}$ is allotted for verifying one image. In relation to $N_{\text{scr}}$, this interval is chosen within the experimentally found limits $-20 \le \log_{2}(\alpha T_{\text{ver}} N_{\text{scr}}) \le 37$, where $\alpha$ is an experimental coefficient chosen according to the kinetic characteristics of the device used to enter character information into the computer, within the limits $0.2\ \text{s}^{-1} \le \alpha \le 10\ \text{s}^{-1}$.

In setting out the information that confirms the invention can be carried out, it is appropriate to describe the proposed ABBYY group method for verifying computer codes against their corresponding originals in more detail. There is no need to dwell on those operations of the method whose execution is known from published data, in particular converting the source character information of the original document into a set of computer codes adequate to it in the found and selected fields of the document and bringing the computer codes into correspondence with the original.

Only the distinctive essential features of the operations of the proposed method need to be considered in detail. During filtering, character computer codes are selected by determining the reliability of recognition of each character from the results of recognizing the image of that character by various known methods and from a comparison of those results with one another and with the result of a dictionary check. The codes are selected from the original sequence of character computer codes, which totals $N_{\text{src}}$, in the amount $N_{\text{sel}} = F - a N_{\text{unsel.inc}}$, where $N_{\text{unsel.inc}}$ is the number of unselected incorrectly recognized characters, $a$ is an experimental coefficient chosen according to the cost of an error and the average accuracy of the recognition system within the limits $10^{-12} \le a \le 10^{15}$, and $F$ is an experimental parameter chosen according to the accuracy of the recognition system and the number of characters to be recognized in the document within the limits $1 \le F \le 10^{16}$. Typically $a$ is chosen in the range from 1 to $10^{5}$, and $F$ in the range $10 \le F \le 10^{6}$.
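The filtering step can be sketched in Python under stated assumptions. The description says only that reliability is judged by comparing the outputs of several known recognition methods with one another and with a dictionary check; the concrete rule used here (select a character unless all recognizers agree and the enclosing word passes the dictionary check) and the names `RecognizedChar`, `needs_verification`, and `filter_chars` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RecognizedChar:
    position: int              # index of the character in the document
    candidates: Sequence[str]  # one result per recognition method (hypothetical engines)
    word_in_dictionary: bool   # outcome of the dictionary check for the enclosing word

    @property
    def code(self) -> str:
        # ASSUMPTION: the first recognizer's answer is taken as the working code.
        return self.candidates[0]


def needs_verification(ch: RecognizedChar) -> bool:
    """Flag a character when recognizers disagree or the dictionary check fails."""
    recognizers_agree = len(set(ch.candidates)) == 1
    return not (recognizers_agree and ch.word_in_dictionary)


def filter_chars(chars: Sequence[RecognizedChar]) -> List[RecognizedChar]:
    """Return the selected characters, i.e. those passed on to verification."""
    return [ch for ch in chars if needs_verification(ch)]
```

A stricter or looser rule changes how many characters are selected and therefore trades verification effort against the number of errors that remain unselected.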

In some cases the dictionary check in particular substantially increases the reliability of recognition of individual characters: even when certain characters cannot be recognized at all, their value can be determined from the meaning of the word and the position of the unrecognized characters within it. If selecting the required quantities of computer codes according to the analytical relations given here yields fractional values, negative values, or any other values that cannot be used further, those values are excluded from consideration and/or deleted automatically.
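A minimal sketch of the selection-count relation and of the rule for discarding unusable values follows. It assumes that the number of unselected incorrectly recognized characters is supplied as an estimate or target, since the text does not say how it is obtained, and it treats a result larger than the document itself as unusable, which is also an assumption. The function name `selection_budget` is hypothetical.

```python
from typing import Optional

A_MIN, A_MAX = 1e-12, 1e15  # stated limits for the coefficient a
F_MIN, F_MAX = 1.0, 1e16    # stated limits for the parameter F


def selection_budget(F: float, a: float, n_unsel_inc: float, n_src: int) -> Optional[int]:
    """Evaluate N_sel = F - a * N_unsel.inc and discard unusable results.

    Returns None when the value is negative, fractional, or larger than the
    document (the last condition is an assumption about what "unusable" means).
    """
    if not (A_MIN <= a <= A_MAX and F_MIN <= F <= F_MAX):
        raise ValueError("a or F lies outside the stated limits")
    n_sel = F - a * n_unsel_inc
    if n_sel < 0 or n_sel != int(n_sel) or n_sel > n_src:
        return None
    return int(n_sel)
```

With `F = 100`, `a = 10`, and two tolerated missed errors, the relation gives 80 characters to select; with `F = 10` the same inputs give a negative value, which is discarded.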

After filtering, the identically recognized computer codes, which total $N_{\text{sel}}$, are grouped so that each group contains $N_{\text{sel}}^{C}$ identical character computer codes, where $C$ is the numerical value of the computer code being verified, taken from the entire set of admissible values. The value of $N_{\text{sel}}^{C}$ is chosen within the limits $1 \le (N_{\text{sel}}^{C} + N_{\text{sel}})/N_{\text{sel}} \le 2$. The numerical value of $C$ may be chosen arbitrarily or, for example, by stepping sequentially through the set of its admissible values. Each group of $N_{\text{sel}}^{C}$ codes contains $N_{\text{sel.cor}}^{C}$ correctly recognized character computer codes and $N_{\text{sel.inc}}^{C}$ incorrectly recognized ones, and the relationship between $N_{\text{sel.cor}}^{C}$ and $N_{\text{sel.inc}}^{C}$ is chosen within the limits $-0.5 \le (N_{\text{sel.cor}}^{C} + N_{\text{sel.inc}}^{C} - b N_{\text{sel}}^{C})/N_{\text{sel.cor}}^{C} \le 1.5$, where $b$ is an experimental coefficient chosen according to the sharpness and contrast of the original graphic image within the limits $10^{-9} \le b \le 1$.

The number $N_{\text{grp}}$ of grouped identical computer codes chosen for verification is $N_{\text{grp}} = \beta\gamma N_{\text{sel}}^{C}$, where $\gamma$ is an experimental coefficient chosen within the limits $10^{-5} \le \gamma \le 10^{6}$ according to the number of selected computer codes being grouped and the number of additionally included reference and/or auxiliary and/or informational codes, and $\beta$ is an experimental probabilistic coefficient of confidence in the reliability of recognition, chosen on the basis of statistical processing and of an assessment of the quality of the original graphic images on the physical medium, within the limits $0.01 \le \beta \le 1$. The quality of the original graphic images depends, in particular, on what is presented for recognition: for example, a photocopied image, a fax, or typewritten or handwritten text.
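The grouping step and its numerical limits can be sketched as follows. This is an illustration rather than the patented implementation: the helper names are hypothetical, selected characters are represented as `(code, image_id)` pairs, a group with no correctly recognized codes is treated as failing the composition check to avoid division by zero, and the product $\beta\gamma N^{C}$ is truncated to an integer; the last two choices are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_by_code(selected: Sequence[Tuple[str, Hashable]]) -> Dict[str, List[Hashable]]:
    """Collect the image ids of selected characters under their recognized code C."""
    groups: Dict[str, List[Hashable]] = defaultdict(list)
    for code, image_id in selected:
        groups[code].append(image_id)
    return dict(groups)


def group_size_ok(n_sel_c: int, n_sel: int) -> bool:
    # 1 <= (N_sel^C + N_sel) / N_sel <= 2, i.e. 0 <= N_sel^C <= N_sel.
    return 1 <= (n_sel_c + n_sel) / n_sel <= 2


def composition_ok(n_sel_cor_c: int, n_sel_inc_c: int, b: float) -> bool:
    """Check -0.5 <= (N_cor^C + N_inc^C - b * N^C) / N_cor^C <= 1.5 for one group."""
    if not 1e-9 <= b <= 1:
        raise ValueError("b lies outside the stated limits")
    if n_sel_cor_c == 0:
        return False  # ASSUMPTION: an all-incorrect group fails the check
    n_sel_c = n_sel_cor_c + n_sel_inc_c
    ratio = (n_sel_cor_c + n_sel_inc_c - b * n_sel_c) / n_sel_cor_c
    return -0.5 <= ratio <= 1.5


def codes_to_verify(beta: float, gamma: float, n_sel_c: int) -> int:
    """N_grp = beta * gamma * N_sel^C, truncated to an integer (an assumption)."""
    if not (0.01 <= beta <= 1 and 1e-5 <= gamma <= 1e6):
        raise ValueError("beta or gamma lies outside the stated limits")
    return int(beta * gamma * n_sel_c)
```

Grouping identical codes lets a verifier scan many images that should all show the same character, so a misrecognized image stands out against its neighbors.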

The groups of identically recognized computer codes are output for verification by a specialized device or by an operator, for example in random order or in descending order of the weight $W^{C}$ of the group of computer codes. The weight is determined experimentally from statistical processing of large data sets, depending on alphabetical order and/or the size of the group of computer codes and/or the importance of the given computer code for the content of the document, among other factors, on the basis of the practical significance of reliable verification of the computer codes, and is chosen within the limits $10^{-8} \le W^{C}/N_{\text{sel}}^{C} \le 10^{16}$. Verification is performed by comparing, for example, the image entered into the computer, as shown on the visual display device, with the image of the character's computer code. For this purpose $N_{\text{scr}}$ different graphic images are sent to the visual display device simultaneously, and a time interval $T_{\text{ver}}$ is allotted for verifying one image. In relation to $N_{\text{scr}}$, this interval is chosen within the experimentally found limits $-20 \le \log_{2}(\alpha T_{\text{ver}} N_{\text{scr}}) \le 37$, where $\alpha$ is an experimental coefficient chosen according to the kinetic characteristics of the device used to enter character information into the computer, within the limits $0.2\ \text{s}^{-1} \le \alpha \le 10\ \text{s}^{-1}$. As the relation shows, the coefficient $\alpha$ has the dimension of a reciprocal second.
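The output step can be sketched in Python as ordering the groups by weight, splitting a group's images into screens of a fixed size, and checking the timing relation. The function names and the dictionary-based representation of groups and weights are hypothetical, and the time per image is assumed to be expressed in seconds so that its product with the coefficient (in reciprocal seconds) is dimensionless.

```python
import math
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def order_groups(weights: Dict[str, float]) -> List[str]:
    """Return the group codes in descending order of their weight W^C."""
    return sorted(weights, key=lambda code: weights[code], reverse=True)


def weight_ok(w_c: float, n_sel_c: int) -> bool:
    # 1e-8 <= W^C / N_sel^C <= 1e16
    return 1e-8 <= w_c / n_sel_c <= 1e16


def screens(images: Sequence[T], n_scr: int) -> List[List[T]]:
    """Split one group's images into screens of at most n_scr images each."""
    return [list(images[i:i + n_scr]) for i in range(0, len(images), n_scr)]


def timing_ok(alpha: float, t_ver: float, n_scr: int) -> bool:
    """Check -20 <= log2(alpha * T_ver * N_scr) <= 37, with alpha in 1/s and T_ver in s."""
    if not 0.2 <= alpha <= 10:
        raise ValueError("alpha lies outside the stated limits")
    return -20 <= math.log2(alpha * t_ver * n_scr) <= 37
```

The logarithmic limits are equivalent to requiring $2^{-20} \le \alpha T_{\text{ver}} N_{\text{scr}} \le 2^{37}$, so they bound the product of the time per image and the number of images per screen rather than either quantity alone.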

As the experimental data have shown, the technical result can be achieved only by the interrelated combination of all the essential features of the claimed subject matter that are reflected in the claims. The distinctions stated there support the conclusion that the technical solution is novel, and the non-obviousness of the claimed combination supports the conclusion that it involves an inventive step, which is also demonstrated by the detailed description above. Compliance of the proposed method with the criterion of industrial applicability is demonstrated both by its implementation and by the absence from the claims of any features that would be difficult to implement on an industrial scale. The lower and upper values of the claimed limits were obtained from statistical processing of experimental results, from analysis and generalization of those results and of data known from published sources, and with the use of inventive intuition, subject to the condition that the stated technical result is achieved.

Besides the technical result stated above, practical implementation of the claimed subject matter substantially broadens its range of application, for example to various documents filled in with handwritten characters.

Claims

1. A group method for verifying computer codes against their corresponding originals, including converting the source character information of the original document into a set of computer codes adequate to it in the found and selected fields of the document and bringing the computer codes into correspondence with the original, characterized in that character computer codes are selected by determining the reliability of recognition of each character from the results of recognizing the image of that character by known methods and from a comparison of those results with one another and with the result of a dictionary check, the codes being selected from the original sequence of computer codes, which totals $N_{\text{src}}$, in the amount $N_{\text{sel}} = F - a N_{\text{unsel.inc}}$, where $a$ is an experimental coefficient chosen according to the cost of an error and the average accuracy of the recognition system within the limits $10^{-12} \le a \le 10^{15}$, $F$ is an experimental parameter chosen according to the accuracy of the recognition system and the number of characters to be recognized in the document within the limits $1 \le F \le 10^{16}$, and $N_{\text{unsel.inc}}$ is the number of unselected incorrectly recognized characters; the identically recognized computer codes, which total $N_{\text{sel}}$, are grouped so that each group contains $N_{\text{sel}}^{C}$ identical computer codes, where $C$ is the numerical value of the computer code being verified, taken from the entire set of admissible values, the value of $N_{\text{sel}}^{C}$ being chosen within the limits $1 \le (N_{\text{sel}}^{C} + N_{\text{sel}})/N_{\text{sel}} \le 2$; each group of $N_{\text{sel}}^{C}$ codes contains $N_{\text{sel.cor}}^{C}$ correctly recognized character computer codes and $N_{\text{sel.inc}}^{C}$ incorrectly recognized computer codes, the relationship between $N_{\text{sel.cor}}^{C}$ and $N_{\text{sel.inc}}^{C}$ being chosen within the limits $-0.5 \le (N_{\text{sel.cor}}^{C} + N_{\text{sel.inc}}^{C} - b N_{\text{sel}}^{C})/N_{\text{sel.cor}}^{C} \le 1.5$, where $b$ is an experimental coefficient chosen according to the sharpness and contrast of the original graphic image within the limits $10^{-9} \le b \le 1$; the number $N_{\text{grp}}$ of grouped identical computer codes chosen for verification is $N_{\text{grp}} = \beta\gamma N_{\text{sel}}^{C}$, where $\gamma$ is an experimental coefficient chosen within the limits $10^{-5} \le \gamma \le 10^{6}$ according to the number of selected computer codes being grouped and the number of additionally included reference and/or auxiliary and/or informational codes, and $\beta$ is an experimental probabilistic coefficient of confidence in the reliability of recognition, chosen on the basis of statistical processing and of an assessment of the quality of the original graphic images on the physical medium, within the limits $0.01 \le \beta \le 1$; the groups of identically recognized computer codes are output for verification by a specialized device or by an operator, for example in random order; and verification is performed by comparing, for example, the image entered into the computer, as shown on the visual display device, with the image of the character's computer code, for which purpose $N_{\text{scr}}$ different graphic images are sent to the visual display device simultaneously and a time interval $T_{\text{ver}}$ is allotted for verifying one image, this interval being chosen in relation to $N_{\text{scr}}$ within the experimentally found limits $-20 \le \log_{2}(\alpha T_{\text{ver}} N_{\text{scr}}) \le 37$, where $\alpha$ is an experimental coefficient chosen according to the kinetic characteristics of the device used to enter character information into the computer, within the limits $0.2\ \text{s}^{-1} \le \alpha \le 10\ \text{s}^{-1}$.
2. The method according to claim 1, characterized in that the groups of identically recognized computer codes are output for verification by a specialized device or by an operator in descending order of the weight $W^{C}$ of the group of computer codes, which is determined experimentally from statistical processing of large data sets, depending on alphabetical order and/or the size of the group of computer codes and/or the importance of the given computer code for the content of the document, on the basis of the practical significance of reliable verification of the computer codes, and is chosen within the limits $10^{-8} \le W^{C}/N_{\text{sel}}^{C} \le 10^{16}$.