PHP sample program for RPA using OCR APIs

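The previous article presented sample implementations of an image analysis program that uses two services: Google Cloud Vision and Microsoft Computer Vision.
As a follow-up, this article presents a sample RPA (Robotic Process Automation) program built on those OCR APIs.

The previous article is linked below.

Reference: PHP sample programs for the Google and Microsoft OCR APIs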

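Feature design
The sample loads images that share a format, such as quotations or invoices, either through an API call or from the cache, and lets you compare how well the text is read across different images and different OCR services.

・Gets the text at predefined coordinates (issue date, billing recipient, quoted amount)
・Lets you change the configured coordinates by dragging the mouse over the image
・Keeps the changed coordinates in the session so that they carry over across page transitions
・Provides the same API call and cache loading functions as the previous sample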
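
The third item, keeping the coordinates in the session, could look roughly like the sketch below. The helper name `updateCoords`, the item keys, and the idea of a `$_SESSION['coords']` array are all assumptions made for illustration; the sample's actual session handling is not shown in this article. A plain array stands in for the session so that the snippet runs on its own.

```php
<?php
/**
 * Hypothetical helper: stores the rectangle for one item in a coordinate store.
 * In a real page, $store would be a reference to something like $_SESSION['coords'].
 */
function updateCoords(array &$store, string $item, int $sx, int $sy, int $mx, int $my): void {
	// Normalize so that (sx, sy) is always the top-left corner,
	// whichever direction the mouse was dragged.
	$store[$item] = array(
		'sx' => min($sx, $mx),
		'sy' => min($sy, $my),
		'mx' => max($sx, $mx),
		'my' => max($sy, $my),
	);
}

// Stand-in for the session array (assumption)
$coords = array(
	'issue_date' => array('sx'=>400, 'sy'=>50, 'mx'=>560, 'my'=>80),
);

// Dragged from the bottom-right corner to the top-left corner
updateCoords($coords, 'amount', 300, 260, 120, 220);
```

Because only the selected item is overwritten, the other items keep their coordinates when a different image is loaded.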

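Screen design
The screen looks like this.
(The speech bubbles and red outlines are annotations for this article and are not part of the screen.)

The left side of the screen shows the image being analyzed, with debug information below it.
The yellow area at the top right shows the results read from the configured coordinates.
There are three configured regions; for each one, the text that was read and its coordinates (top-left, bottom-right) are displayed.
You can change a configured region by dragging the mouse over the image to specify the range to read.
First select the item you want to change with its radio button, then specify the coordinates on the image with the mouse.
The configured coordinates are kept in the session, so the same coordinates are read even when a different image is loaded.
The light-blue area below that shows a cropped image of the part selected with the mouse.
Below that are the same API call and cache loading functions as in the previous sample.
A sample quotation is cached in advance, with coordinates already set for the issue date, billing recipient, and quoted amount.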

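The APIs that can be called are the Google Cloud Vision API and the Microsoft Computer Vision API.
Microsoft Computer Vision offers two APIs: the OCR API and the Read API.
The OCR API is suited to small amounts of text, while the Read API gives more accurate results at the cost of a slightly longer response time.

A demo screen is available so that you can see how it behaves.
API calls are disabled in the demo, but the other features work.

Open the demo screen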

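Class design
For RPA, we implement a method that returns the text within a specified coordinate range.
The method is added to the interface implemented by the service classes created in the previous article.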

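<?php
/**
 * OCR service interface
 */

interface OCRService{

// ... omitted ...

	/**
	 * Extracts the words within the specified coordinate range.
	 *
	 * @param int $sx X coordinate of the top-left corner
	 * @param int $sy Y coordinate of the top-left corner
	 * @param int $mx X coordinate of the bottom-right corner
	 * @param int $my Y coordinate of the bottom-right corner
	 * @return array Concatenated text of the words in the range, plus a debug string ["text"=>"word1word2...", "debug"=>"debug string"]
	 */
	public function getPointedWords(int $sx, int $sy, int $mx, int $my): array;
}
?>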

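getPointedWords()
Given the top-left and bottom-right coordinates, reads and returns the words within that range.
If more than one word is found, they are simply concatenated into one string.
Since this is a sample, a debug string is returned as well.

Creating the service template
If the data type handled by this method is unified, the same logic can be shared across all services.
First, we define the data type.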

<?php
/**
 * Word coordinates entity
 */

class WordCoordinates{
	public $sx; // top-left X coordinate
	public $sy; // top-left Y coordinate
	public $mx; // bottom-right X coordinate
	public $my; // bottom-right Y coordinate
	public $text; // word

	// $text is explicitly nullable (implicit nullable defaults are deprecated in recent PHP)
	public function __construct(array $coords, ?string $text=null){
		$this->sx = $coords['sx'];
		$this->sy = $coords['sy'];
		$this->mx = $coords['mx'];
		$this->my = $coords['my'];
		$this->text = $text;
	}

	// Returns "sx,sy<br>mx,my" for display in HTML
	public function toString(): string{
		$str = $this->sx.','.$this->sy.'<br>'.$this->mx.','.$this->my;
		return $str;
	}
}
?>

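This data class (the word coordinates class) holds the top-left and bottom-right coordinates together with the text inside them.
Each service implements the part that converts the data it has read into a list of objects of this class.
The template uses that list to implement the logic for getting the words in a specified range.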

<?php
/**
 * OCR service template
 */

abstract class BaseService implements OCRService{

// ... omitted ...

	protected $aWordCoords = array(); // list of word coordinates [WordCoordinates1, WordCoordinates2, ...]

// ... omitted ...

	/**
	 * Gets the word coordinates from the image analysis result.
	 *
	 * @param string $result image analysis result (response body)
	 * @return array|null list of word coordinates [WordCoordinates1, WordCoordinates2, ...]
	 */
	abstract protected function getWordsWithCoords(string $result): ?array;

// ... omitted ...

	// @Implements - OCRService
	final public function callAPI(string $imgPath): void{
		$this->sResult = $this->api($imgPath);
		$this->parse();
		// added; fall back to an empty list when the service returns null
		$this->aWordCoords = $this->getWordsWithCoords($this->sResult) ?? array();
	}

	// @Implements - OCRService
	final public function setResult(string $result): void{
		$this->sResult = $result;
		$this->parse();
		// added; fall back to an empty list when the service returns null
		$this->aWordCoords = $this->getWordsWithCoords($this->sResult) ?? array();
	}

	// @Implements - OCRService
	final public function getPointedWords(int $sx, int $sy, int $mx, int $my): array{
		$text = '';
		$debugStr = 'pointed: '.$sx.','.$sy.' '.$mx.','.$my.'<br>';
		foreach($this->aWordCoords as $wc){
			// A word is a hit when its rectangle overlaps the specified rectangle
			if(($sx <= $wc->mx)&&($sy <= $wc->my)&&($mx >= $wc->sx)&&($my >= $wc->sy)){
				$text .= $wc->text;
				$debugStr .= 'words: '.$wc->sx.','.$wc->sy.' '.$wc->mx.','.$wc->my.' '.$wc->text.'<br>';
			}
		}
		$res = array('text'=>$text, 'debug'=>$debugStr);
		return $res;
	}

	final public function getCoords(): array{
		return $this->aWordCoords;
	}
}
?>

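getWordsWithCoords()
Converts the image analysis result into an array of word coordinates objects and returns it.
Only the declaration is given here; each service class provides the implementation.

callAPI()
setResult()
Both are extended to pass the response to getWordsWithCoords() and keep the resulting word coordinates list in an instance variable.

getPointedWords()
Uses the word coordinates list to get the words within the specified coordinate range.
A word counts as a hit when its rectangle overlaps the specified range, even partially.
If several words hit, they are concatenated.
The text and coordinates of each hit word are recorded as debug information.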
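
The hit test used by getPointedWords() is a plain rectangle overlap check. The following is a minimal standalone sketch of that check; the function name `rectsOverlap` and the sample word data are made up for this example, and plain arrays are used instead of the WordCoordinates class so that the snippet runs on its own.

```php
<?php
/**
 * Returns true when two axis-aligned rectangles overlap (touching edges count).
 * Each rectangle is ['sx'=>..., 'sy'=>..., 'mx'=>..., 'my'=>...].
 */
function rectsOverlap(array $a, array $b): bool {
	return $a['sx'] <= $b['mx'] && $a['sy'] <= $b['my']
		&& $a['mx'] >= $b['sx'] && $a['my'] >= $b['sy'];
}

// Range selected on the image (hypothetical values)
$range = array('sx'=>100, 'sy'=>100, 'mx'=>200, 'my'=>130);

// Hypothetical OCR words with their rectangles
$words = array(
	array('sx'=>90,  'sy'=>105, 'mx'=>140, 'my'=>125, 'text'=>'2021/'),   // partly inside
	array('sx'=>145, 'sy'=>105, 'mx'=>190, 'my'=>125, 'text'=>'04/01'),   // fully inside
	array('sx'=>300, 'sy'=>105, 'mx'=>360, 'my'=>125, 'text'=>'Invoice'), // outside
);

// Concatenate the text of every word that overlaps the range
$text = '';
foreach ($words as $w) {
	if (rectsOverlap($range, $w)) $text .= $w['text'];
}
```

Since partial overlap is enough, a range drawn slightly too small still picks up the word, but a range drawn too generously may also pick up neighbouring words.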

getCoords()
Returns the word coordinates list obtained from getWordsWithCoords().
This one is for debugging.

Implementing the service class (for the Google Cloud Vision API)
This is the service class for the Google Cloud Vision API.

<?php
/**
 * Google Cloud Vision API service
 */

class GoogleService extends BaseService{

// ... omitted ...

	// @Override - BaseService
	protected function getWordsWithCoords(string $result): ?array{
		$data = json_decode($result, true);
		// ?? avoids warnings when decoding failed or the response has no text annotation
		$blocks = $data['responses'][0]['fullTextAnnotation']['pages'][0]['blocks'] ?? null;
		if(!$blocks) return null;

		$wordcoords = array();
		foreach($blocks as $block){
			if(!array_key_exists('paragraphs', $block)) continue;
			$paragraphs = $block['paragraphs'];
			foreach($paragraphs as $paragraph){
				if(!array_key_exists('words', $paragraph)) continue;
				$words = $paragraph['words'];
				foreach($words as $word){
					if(!array_key_exists('symbols', $word)) continue;
					$coords = $this->_getCoordinates($word);
					// A word is made up of symbols (characters); join them to get the word text
					$text = '';
					foreach($word['symbols'] as $symbol){
						$text .= $symbol['text'];
					}
					$wordcoords[] = new WordCoordinates($coords, $text);
				}
			}
		}
		return $wordcoords;
	}

	/**
	 * Gets the corner coordinates of the boundingBox.
	 *
	 * @param array $items an element such as a word or a symbol
	 * @return array corner coordinates of the boundingBox ["sx"=>top-left X, "sy"=>top-left Y, "mx"=>bottom-right X, "my"=>bottom-right Y];
	 */
	private function _getCoordinates(array $items): array{
		$sx=10000;$sy=10000;$mx=0;$my=0;
		if(\PR\InKey($items,array('boundingBox', 'vertices'))){
			$vertices = $items['boundingBox']['vertices'];
			foreach($vertices as $vertice){
				// A vertex may lack 'x' or 'y' (for example when the value is 0), so default to 0
				$x = $vertice['x'] ?? 0;
				$y = $vertice['y'] ?? 0;
				if($x < $sx) $sx = $x;
				if($x > $mx) $mx = $x;
				if($y < $sy) $sy = $y;
				if($y > $my) $my = $y;
			}
		}
		$coords = array('sx'=>$sx, 'sy'=>$sy, 'mx'=>$mx, 'my'=>$my);
		return $coords;
	}
}
?>

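getWordsWithCoords()
Converts the image analysis result into an array of word coordinates objects and returns it.
This implements the abstract method declared in the template.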
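
As a rough illustration of how a bounding rectangle can be derived from a list of vertices, here is a minimal standalone sketch. The function name `boundsOfVertices` and the sample JSON are made up for this example, and treating a missing `x` or `y` key as 0 is an assumption of the sketch. It also assumes that at least one vertex is present.

```php
<?php
/**
 * Computes the axis-aligned bounding rectangle of a non-empty list of vertices.
 * A missing 'x' or 'y' key is treated as 0 (assumption of this sketch).
 */
function boundsOfVertices(array $vertices): array {
	$xs = array();
	$ys = array();
	foreach ($vertices as $v) {
		$xs[] = $v['x'] ?? 0;
		$ys[] = $v['y'] ?? 0;
	}
	return array('sx'=>min($xs), 'sy'=>min($ys), 'mx'=>max($xs), 'my'=>max($ys));
}

// Hypothetical word element; the first vertex has no "x" key
$json = '{"boundingBox":{"vertices":[{"y":12},{"x":85,"y":10},{"x":86,"y":40},{"x":1,"y":42}]}}';
$word = json_decode($json, true);
$box = boundsOfVertices($word['boundingBox']['vertices']);
```

Taking the minimum and maximum over all vertices gives a rectangle that still encloses the word when the text is slightly tilted.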

Implementing the service class (for the Microsoft Computer Vision OCR API)
This is the service class for the OCR API of Microsoft Computer Vision.

<?php
/**
 * Microsoft Computer Vision OCR API service
 */

class MSOCRService extends BaseService{

// ... omitted ...

	// @Override - BaseService
	protected function getWordsWithCoords(string $result): ?array{
		$data = json_decode($result, true);
		// ?? avoids warnings when decoding failed or 'regions' is missing
		$regions = $data['regions'] ?? null;
		if(!$regions) return null;

		$wordcoords = array();
		foreach($regions as $region){
			if(!array_key_exists('lines', $region)) continue;
			$lines = $region['lines'];
			foreach($lines as $line){
				if(!array_key_exists('words', $line)) continue;
				$words = $line['words'];
				foreach($words as $word){
					$coords = $this->_getCoordinates($word);
					$wordcoords[] = new WordCoordinates($coords, $word['text']);
				}
			}
		}
		return $wordcoords;
	}

	/**
	 * Gets the corner coordinates of the boundingBox.
	 *
	 * @param array $items an element such as a word
	 * @return array corner coordinates of the boundingBox ["sx"=>top-left X, "sy"=>top-left Y, "mx"=>bottom-right X, "my"=>bottom-right Y];
	 */
	private function _getCoordinates(array $items): array{
		$sx=10000;$sy=10000;$mx=0;$my=0;
		if(array_key_exists('boundingBox', $items)){
			// boundingBox is a comma-separated string: left, top, width, height.
			// Convert to integers so that the coordinates are not kept as strings.
			$vertices = array_map('intval', explode(',', $items['boundingBox']));
			$sx = $vertices[0];
			$mx = $sx + $vertices[2];
			$sy = $vertices[1];
			$my = $sy + $vertices[3];
		}
		$coords = array('sx'=>$sx, 'sy'=>$sy, 'mx'=>$mx, 'my'=>$my);
		return $coords;
	}
}
?>

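getWordsWithCoords()
Converts the image analysis result into an array of word coordinates objects and returns it.
This implements the abstract method declared in the template.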
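
The conversion from the comma-separated boundingBox string to corner coordinates can be tried on its own with a sketch like the one below. The function name `boxFromLtwh`, the sample string, and the choice to return null for malformed input are assumptions made for illustration, not part of the sample program.

```php
<?php
/**
 * Converts a "left,top,width,height" string into corner coordinates.
 * Returns null when the string does not contain exactly four numbers.
 */
function boxFromLtwh(string $boundingBox): ?array {
	$parts = explode(',', $boundingBox);
	if (count($parts) !== 4) return null;
	foreach ($parts as $p) {
		if (!is_numeric(trim($p))) return null;
	}
	list($left, $top, $width, $height) = array_map('intval', $parts);
	// The bottom-right corner is the top-left corner plus the size
	return array('sx'=>$left, 'sy'=>$top, 'mx'=>$left + $width, 'my'=>$top + $height);
}

$box = boxFromLtwh('462,379,497,258');
$bad = boxFromLtwh('462,379,497');
```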

Implementing the service class (for the Microsoft Computer Vision Read API)
This is the service class for the Read API of Microsoft Computer Vision.

<?php
/**
 * Microsoft Computer Vision Read API service
 */

class MSReadService extends BaseService{

// ... omitted ...

	// @Override - BaseService
	protected function getWordsWithCoords(string $result): ?array{
		$data = json_decode($result, true);
		// ?? avoids warnings when decoding failed or the result is missing
		$regions = $data['analyzeResult']['readResults'] ?? null;
		if(!$regions) return null;

		$wordcoords = array();
		foreach($regions as $region){
			if(!array_key_exists('lines', $region)) continue;
			$lines = $region['lines'];
			foreach($lines as $line){
				if(!array_key_exists('words', $line)) continue;
				$words = $line['words'];
				foreach($words as $word){
					$coords = $this->_getCoordinates($word);
					$wordcoords[] = new WordCoordinates($coords, $word['text']);
				}
			}
		}
		return $wordcoords;
	}

	/**
	 * Gets the corner coordinates of the boundingBox.
	 *
	 * @param array $items an element such as a word
	 * @return array corner coordinates of the boundingBox ["sx"=>top-left X, "sy"=>top-left Y, "mx"=>bottom-right X, "my"=>bottom-right Y];
	 */
	private function _getCoordinates(array $items): array{
		$sx=10000;$sy=10000;$mx=0;$my=0;
		if(array_key_exists('boundingBox', $items)){
			// boundingBox is a flat array of corner points [x1,y1,x2,y2,x3,y3,x4,y4];
			// indexes 0,1 are used as the top-left corner and 4,5 as the bottom-right corner
			$vertices = $items['boundingBox'];
			$sx = $vertices[0];
			$sy = $vertices[1];
			$mx = $vertices[4];
			$my = $vertices[5];
		}
		$coords = array('sx'=>$sx, 'sy'=>$sy, 'mx'=>$mx, 'my'=>$my);
		return $coords;
	}
}
?>

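getWordsWithCoords()
Converts the image analysis result into an array of word coordinates objects and returns it.
This implements the abstract method declared in the template.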
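
When the text is tilted, the first and third corner points of a four-point box do not necessarily give its outermost extent. One possible variation, shown here only as a sketch and not as what the sample does, is to take the minimum and maximum over all four points. The function name `boxFromQuad` and the sample values are made up for this example.

```php
<?php
/**
 * Computes the enclosing axis-aligned rectangle of a flat
 * [x1,y1,x2,y2,x3,y3,x4,y4] array of corner points.
 */
function boxFromQuad(array $quad): array {
	$xs = array();
	$ys = array();
	// Even indexes hold X values, odd indexes hold Y values
	foreach (array_values($quad) as $i => $v) {
		if ($i % 2 === 0) {
			$xs[] = $v;
		} else {
			$ys[] = $v;
		}
	}
	return array('sx'=>min($xs), 'sy'=>min($ys), 'mx'=>max($xs), 'my'=>max($ys));
}

// Slightly tilted word: the top-right corner is higher than the top-left one
$box = boxFromQuad(array(100, 52, 180, 48, 182, 70, 102, 74));
```

For this input, using only the first and third points would give 100,52 and 182,70, which cuts off a few pixels at the top and bottom of the word.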

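Word parser
The words themselves were mostly read correctly, but the coordinates were less accurate than expected, so we try plugging in a custom parser that rebuilds the pairs of coordinates and words (although the results did not change much...).
Parsers based on other algorithms may be added later, so the parser is defined as an interface.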

<?php
/**
 * Custom word parser
 */

interface OriginalParser{

	/**
	 * Rebuilds the word coordinates from the coordinate information.
	 *
	 * @param array $wordcoords list of words with coordinates [WordCoordinates1, WordCoordinates2, ...]
	 * @return array|null list of words with coordinates [WordCoordinates1, WordCoordinates2, ...]
	 */
	public function reparse(array $wordcoords): ?array;
}
?>

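reparse()
Takes a list of words with coordinates, rebuilds it, and returns it in the same list form.

The parser can be plugged in at the template layer.
The template is modified so that, if a parser was passed to the constructor, the rebuild step is run.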

<?php
/**
 * OCR service template
 */

abstract class BaseService implements OCRService{

// ... omitted ...

	private $oParser_ = null; // word refinement parser (OriginalParser)

// ... omitted ...

	/**
	 * Constructor.
	 *
	 * @param OriginalParser|null $parser word parser. Omit it if not needed.
	 */
	public function __construct(?OriginalParser $parser=null){
		$this->oParser_ = $parser;
	}

	// @Implements - OCRService
	final public function callAPI(string $imgPath): void{
		$this->sResult = $this->api($imgPath);
		$this->parse();
		$this->aWordCoords = $this->getWordsWithCoords($this->sResult) ?? array();
		// added; reparse() requires an array and may return null, hence the fallbacks
		if($this->oParser_) $this->aWordCoords = $this->oParser_->reparse($this->aWordCoords) ?? array();
	}

	// @Implements - OCRService
	final public function setResult(string $result): void{
		$this->sResult = $result;
		$this->parse();
		$this->aWordCoords = $this->getWordsWithCoords($this->sResult) ?? array();
		// added; reparse() requires an array and may return null, hence the fallbacks
		if($this->oParser_) $this->aWordCoords = $this->oParser_->reparse($this->aWordCoords) ?? array();
	}

// ... rest omitted ...

}
?>

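This parser treats words whose top Y coordinates are within a threshold of each other as being on the same line, and builds lines from them.
Then, within each line, it looks at the X coordinates: when the gap between the right edge of one word and the left edge of the next is within a threshold, the two are treated as one word and merged.
Finally, it converts the data back into a list of word coordinates objects and returns it.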
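
The two steps described above can be sketched in a compact, standalone form as follows. This is a simplified variation rather than the parser itself: it works on plain arrays, sorts the words by Y before grouping, and the function name `regroupWords` and the sample data are made up for this example.

```php
<?php
/**
 * Groups word boxes into lines by top Y, then merges neighbours whose
 * horizontal gap is small. $words is a list of
 * ['sx'=>, 'sy'=>, 'mx'=>, 'my'=>, 'text'=>]; thresholds are in pixels.
 */
function regroupWords(array $words, int $tolX = 10, int $tolY = 10): array {
	// 1. Sort by top Y so that words on the same line end up adjacent
	usort($words, function ($a, $b) { return $a['sy'] <=> $b['sy']; });
	$lines = array();
	foreach ($words as $w) {
		$last = count($lines) - 1;
		if ($last >= 0 && abs($w['sy'] - $lines[$last][0]['sy']) < $tolY) {
			$lines[$last][] = $w; // close to the first word of the line: same line
		} else {
			$lines[] = array($w); // otherwise start a new line
		}
	}
	// 2. Within each line, merge words left to right when the gap is below $tolX
	$result = array();
	foreach ($lines as $line) {
		usort($line, function ($a, $b) { return $a['sx'] <=> $b['sx']; });
		$current = null;
		foreach ($line as $w) {
			if ($current !== null && abs($w['sx'] - $current['mx']) < $tolX) {
				// Append the text and grow the box to cover both words
				$current['text'] .= $w['text'];
				$current['mx'] = max($current['mx'], $w['mx']);
				$current['sy'] = min($current['sy'], $w['sy']);
				$current['my'] = max($current['my'], $w['my']);
			} else {
				if ($current !== null) $result[] = $current;
				$current = $w;
			}
		}
		if ($current !== null) $result[] = $current;
	}
	return $result;
}

// Hypothetical OCR output: "12," and "000" are 5 px apart on the same line
$merged = regroupWords(array(
	array('sx'=>150, 'sy'=>102, 'mx'=>200, 'my'=>122, 'text'=>'000'),
	array('sx'=>100, 'sy'=>100, 'mx'=>145, 'my'=>120, 'text'=>'12,'),
	array('sx'=>400, 'sy'=>101, 'mx'=>440, 'my'=>121, 'text'=>'JPY'),
	array('sx'=>100, 'sy'=>160, 'mx'=>160, 'my'=>180, 'text'=>'Total'),
));
```

Here "12," and "000" are merged into one word, while "JPY" stays separate because it is far to the right, and "Total" forms its own line.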

<?php
/**
 * Line-aware word parser
 * Detects lines and refines the words within each line
 */

class LinerWordParser implements OriginalParser{

	const ABS_TOLERANCE_X = 10; // tolerance for X coordinate differences
	const ABS_TOLERANCE_Y = 10; // tolerance for Y coordinate differences

	private $aSrcWordCoords_ = array(); // source data (list of word coordinates) [WordCoordinates1, WordCoordinates2, ...]
	private $aWordCoords_ = array(); // list of word coordinates after parsing [WordCoordinates1, WordCoordinates2, ...]
	private $aLines_ = array(); // words with coordinates grouped by line [sy1=>[sx1=>WC, sx2=>WC, ...],sy2=>[sx1=>WC, ...]]

	// @Implements - OriginalParser
	public function reparse(array $wordcoords): ?array{
		$this->aSrcWordCoords_ = $wordcoords;
		$lines = $this->_lineup($wordcoords);
		$this->aLines_ = $this->_assortWords($lines);

		// Convert back to the original flat list form
		$this->aWordCoords_ = array();
		foreach($this->aLines_ as $words){
			foreach($words as $word){
				$this->aWordCoords_[] = $word;
			}
		}
		return $this->aWordCoords_;
	}

	/**
	 * Returns the word coordinates grouped by line.
	 *
	 * @return array words with coordinates grouped by line [sy1=>[sx1=>WC, sx2=>WC, ...],sy2=>[sx1=>WC, ...]]
	 */
	public function getLines(): array{
		return $this->aLines_;
	}

	/**
	 * Groups the words into lines based on their coordinates.
	 *
	 * @param array $vals list of words with coordinates [WordCoordinates1, WordCoordinates2, ...]
	 * @return array words with coordinates grouped by line
	 */
	private function _lineup(array $vals): array{
		$lines = array();
		$vals = array_values($vals);
		$cnt = count($vals);
		// Indexes of words already assigned to a line. A local array is used instead of
		// a flag on the word object, because WordCoordinates declares no such property.
		$used = array();
		for($i=0; $i<$cnt; $i++){
			if(isset($used[$i])) continue;
			$wc = $vals[$i];
			$sy = $wc->sy;
			$lines[$sy][$wc->sx] = $wc;
			$used[$i] = true;
			// Collect the remaining words whose top Y is close to that of this word
			for($j=$i+1; $j<$cnt; $j++){
				if(isset($used[$j])) continue;
				$crrt = $vals[$j];
				$abs = abs($wc->sy - $crrt->sy);
				if($abs < self::ABS_TOLERANCE_Y){
					$lines[$sy][$crrt->sx] = $crrt;
					$used[$j] = true;
				}
			}
			ksort($lines[$sy]); // order the words in the line from left to right
		}
		ksort($lines); // order the lines from top to bottom
		return $lines;
	}

	/**
	 * Merges adjacent words on the same line based on their coordinates.
	 *
	 * @param array $lines words with coordinates grouped by line
	 * @return array words with coordinates grouped by line
	 */
	private function _assortWords(array $lines): array{
		$res = array();
		foreach($lines as $y=>$words){
			$res[$y] = array();
			$head = null; // word currently being built
			$chk = 0;     // right X of the last word merged into it
			foreach($words as $x=>$wc){ // $x = $wc->sx
				if($head !== null && abs($x - $chk) < self::ABS_TOLERANCE_X){
					// Close enough to the previous word: append the text and
					// extend the box so that it covers the merged word as well
					$head->text .= $wc->text;
					$head->mx = max($head->mx, $wc->mx);
					$head->sy = min($head->sy, $wc->sy);
					$head->my = max($head->my, $wc->my);
				}else{
					// Start a new word
					$res[$y][$x] = $wc;
					$head = $wc;
				}
				$chk = $wc->mx;
			}
		}
		return $res;
	}
}
?>

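reparse()
Takes a list of words with coordinates, rebuilds it, and returns it in the same list form.
This implements the interface method.

getLines()
Returns the line structure built during parsing.
It is not strictly needed, but it is useful for debugging.

Complete sample program
It can be downloaded from the link below.
Extract the downloaded zip file under the document root and set the following define values in defines.inc, and it should work.
Pay attention to the access permissions of the cache directory.
The zip password is testocr.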

// API keys & custom domain
define('OCR_GOOGLE_APIKEY', '');
define('OCR_MICROSOFT_APIKEY', '');
define('OCR_MICROSOFT_CUSTOMDOMAIN', '');

// Directory where cache data is saved
define('OCR_CACHE_PATH', 'cache');