Text Localization, Detection and Recognition using Pytesseract
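Pytesseract, or Python-tesseract, is an optical character recognition (OCR) tool for Python. It reads and recognizes the text embedded in images, such as photos of license plates. Python-tesseract is a wrapper for Google's Tesseract-OCR Engine. It is also useful as a stand-alone invocation script for tesseract, because it can read every image type supported by the Pillow and Leptonica imaging libraries, including: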
- jpg
- png
- gif
- bmp
- tiff, and others
Additionally, when used as a script, Python-tesseract prints the recognized text instead of writing it to a file. Python-tesseract can be installed with pip as follows:
pip install pytesseract
If you are using Anaconda, Python-tesseract can be installed from the conda-forge channel as follows:
conda install -c conda-forge/label/cf202003 pytesseract
or
conda install -c conda-forge pytesseract
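Note: the Tesseract engine itself must be installed on the system before running the script below. Pytesseract only wraps the tesseract executable and does not include it.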
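One way to confirm this prerequisite is to look for the tesseract executable on the system PATH before running any OCR code. The sketch below uses only the Python standard library; the helper name find_tesseract is hypothetical and is not part of pytesseract.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH.

    find_tesseract is a hypothetical helper name used for illustration.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it before running the OCR script")
    else:
        print("tesseract found at:", path)
```

If the binary is installed somewhere that is not on PATH, pytesseract can be pointed at it by assigning the full path to pytesseract.pytesseract.tesseract_cmd.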
Below is the implementation.
Python3
import argparse

import cv2
import pytesseract
from pytesseract import Output

# Construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image",
                required=True,
                help="path to input image to be OCR'd")
ap.add_argument("-c", "--min-conf",
                type=int, default=0,
                help="minimum confidence value to filter weak text detection")
args = vars(ap.parse_args())

# Load the input image and convert it from BGR (OpenCV's channel
# order) to RGB. Then use Tesseract to localize each area of text
# in the input image
images = cv2.imread(args["image"])
if images is None:
    raise SystemExit("could not read image: {}".format(args["image"]))
rgb = cv2.cvtColor(images, cv2.COLOR_BGR2RGB)
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)

# Loop over each of the individual text localizations
for i in range(len(results["text"])):
    # Extract the bounding box coordinates of the text region
    # from the current result
    x = results["left"][i]
    y = results["top"][i]
    w = results["width"][i]
    h = results["height"][i]

    # Extract the OCR text itself along with the confidence of
    # the text localization (the value may arrive as a string or
    # a float, so it is normalized to an int)
    text = results["text"][i]
    conf = int(float(results["conf"][i]))

    # Filter out weak confidence text localizations
    if conf > args["min_conf"]:
        # Display the confidence and text in the terminal
        print("Confidence: {}".format(conf))
        print("Text: {}".format(text))
        print("")

        # Strip out non-ASCII characters so the text can be drawn
        # with OpenCV, then draw a bounding box around the text
        # along with the text itself
        text = "".join(c for c in text if ord(c) < 128).strip()
        cv2.rectangle(images,
                      (x, y),
                      (x + w, y + h),
                      (0, 0, 255), 2)
        cv2.putText(images,
                    text,
                    (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    1.2, (0, 255, 255), 3)

# Finally, show the output image and wait for a key press
cv2.imshow("Image", images)
cv2.waitKey(0)
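The confidence filter in the loop can also be tried without Tesseract or an image. The following is a minimal sketch, assuming a dictionary with the same keys the script reads from image_to_data (text, conf, left, top, width, height). The function name filter_detections is hypothetical, and the sample coordinates are invented for illustration; only the words and confidence values come from the example output in this article. Tesseract typically reports a confidence of -1 for rows that describe a block or line rather than a word, so such rows are dropped by the same check.

```python
def filter_detections(results, min_conf=0):
    """Return (text, conf, (x, y, w, h)) for every entry above min_conf.

    filter_detections is a hypothetical helper; `results` is assumed to be
    shaped like the dictionary returned with output_type=Output.DICT.
    """
    kept = []
    for i, text in enumerate(results["text"]):
        # Confidence may be an int, a float, or a numeric string
        conf = int(float(results["conf"][i]))
        if conf > min_conf and text.strip():
            box = (results["left"][i], results["top"][i],
                   results["width"][i], results["height"][i])
            kept.append((text.strip(), conf, box))
    return kept


# Hand-written sample in the same shape as the DICT output.
# The coordinates are illustrative, not real OCR results.
sample = {
    "text": ["", "I", "LOVE", "TESSERACT"],
    "conf": ["-1", "93", "93", "91"],
    "left": [0, 10, 40, 150],
    "top": [0, 20, 20, 20],
    "width": [300, 20, 100, 220],
    "height": [80, 40, 40, 40],
}

if __name__ == "__main__":
    for text, conf, box in filter_detections(sample, min_conf=92):
        print(text, conf, box)
```

Keeping the filter in a separate function makes the threshold easy to experiment with, which corresponds to passing a different --min-conf value to the script.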
Output:
Execute the following command to view the output:
python ocr.py --image ocr.png
In addition to the output image, the confidence and text of each detection are printed in the command prompt, as shown below:
Confidence: 93
Text: I
Confidence: 93
Text: LOVE
Confidence: 91
Text: TESSERACT
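The detections above arrive one word at a time. When a single string is more convenient, the words can be cleaned and joined. This is a minimal sketch rather than part of the original script: to_ascii and join_words are hypothetical helper names, to_ascii applies the same non-ASCII stripping used before drawing text with OpenCV, and the word list is copied from the output above.

```python
def to_ascii(text):
    """Drop non-ASCII characters and surrounding whitespace."""
    return "".join(c for c in text if ord(c) < 128).strip()


def join_words(words):
    """Join cleaned words with single spaces, skipping empty entries."""
    cleaned = (to_ascii(w) for w in words)
    return " ".join(w for w in cleaned if w)


if __name__ == "__main__":
    # Words taken from the terminal output shown in this article
    words = ["I", "LOVE", "TESSERACT"]
    print(join_words(words))  # prints: I LOVE TESSERACT
```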