[Question] Using pytesseract for OCR

Board: Python | Author: PHONm (USA~USA) | Time: 2016/08/08 17:32 | Votes: 2 (2 up, 0 down, 0 neutral)

2 comments, 2 participants, thread 1/1

I want to do character recognition, but some of the character images are broken, so some characters come out as garbage. I therefore preprocess the image with OpenCV first, save the result as a new file, and run OCR again. The second OCR call raises a UnicodeDecodeError, even though my code doesn't use any Chinese at all @@! I'm not sure whether the problem is in the OpenCV conversion step.

=====================Result=====================
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=397x112 at 0x3C0DF28>
24-D 1813f-ml 1-1 154?Dbb
<PIL.BmpImagePlugin.BmpImageFile image mode=RGB size=397x112 at 0x1131080>
Traceback (most recent call last):
  File "C:/Users/cash.chien/PycharmProjects/OCR/OCRv1.1.py", line 19, in <module>
    str2 = image_to_string(img2)
  File "C:\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 167, in image_to_string
    return f.read().strip()
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 11: illegal multibyte sequence

========================Source code follows=======================
from pytesseract import image_to_string
from PIL import Image
import time
import cv2
import numpy as np

img = Image.open('12.bmp')
print(img)
str = image_to_string(img)
print(str)

img1 = cv2.imread('12.bmp',1)
kernel = np.ones((3,3))
opening = cv2.morphologyEx(img1, cv2.MORPH_OPEN, kernel)
cv2.imwrite('opening.bmp',opening)

img2 = Image.open('opening.bmp')
print(img2)
str2 = image_to_string(img2)
print(str2)

Thanks!

--
※ Origin: PTT (ptt.cc), from: 59.124.131.189
※ Article URL: https://www.ptt.cc/bbs/Python/M.1470648770.A.C41.html
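
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the traceback fails inside `f.read()`, which suggests the installed pytesseract reads Tesseract's result file in text mode with the system default encoding (cp950 on a Traditional Chinese Windows), while Tesseract itself writes UTF-8. If the preprocessed image makes Tesseract emit a non-ASCII character such as a typographic quote, its UTF-8 bytes start with 0xe2 and cannot be decoded as cp950. The sketch below reproduces only that decoding step, without calling Tesseract; the sample text and the helper `decode_ocr_output` are hypothetical.

```python
# Hypothetical OCR output: ASCII text plus a typographic quote (U+2018),
# whose UTF-8 encoding is the byte sequence e2 80 98.
raw = "24-D 1813f-\u2018ml".encode("utf-8")


def decode_ocr_output(data: bytes) -> str:
    """Decode OCR result bytes as UTF-8, independent of the system locale."""
    return data.decode("utf-8").strip()


# Decoding the same bytes as cp950 mimics a text-mode read on a cp950 system.
try:
    raw.decode("cp950")
    cp950_failed = False
except UnicodeDecodeError as exc:
    cp950_failed = True
    print("cp950 decode failed:", exc)

# Decoding explicitly as UTF-8 recovers the text.
text = decode_ocr_output(raw)
print(repr(text))
```

If this is indeed the cause, the OpenCV step is not at fault; it only changed the glyphs enough for Tesseract to output a non-ASCII character. Upgrading pytesseract, or otherwise making sure the result file is read as UTF-8, would be the thing to try.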

Upvote  Sunal  08/09 02:16, 1F

Upvote  goldflower  08/11 00:22, 2F

Article ID (AID): #1Ng572n1 (Python)