[Question] Using pytesseract for OCR

Board: Python  Author: (USA~USA)  Time: 2016/08/08 17:32 (edited)  Score: 2 (2 push / 0 boo / 0 neutral)
2 comments, 2 participants, thread 1/1
I want to do character recognition, but some of the character images are broken, so some characters come out as garbage. I used OpenCV to do some preprocessing first, saved the result as a new file, and then ran OCR again. But I get a UnicodeDecodeError, even though my code doesn't use any Chinese at all @@! I can't tell whether the problem is in the OpenCV file-conversion step.

=====================Result=====================
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=397x112 at 0x3C0DF28>
24-D 1813f-ml 1-1 154?Dbb
<PIL.BmpImagePlugin.BmpImageFile image mode=RGB size=397x112 at 0x1131080>
Traceback (most recent call last):
  File "C:/Users/cash.chien/PycharmProjects/OCR/OCRv1.1.py", line 19, in <module>
    str2 = image_to_string(img2)
  File "C:\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 167, in image_to_string
    return f.read().strip()
UnicodeDecodeError: 'cp950' codec can't decode byte 0xe2 in position 11: illegal multibyte sequence

========================Source code below=======================
from pytesseract import image_to_string
from PIL import Image
import time
import cv2
import numpy as np

img = Image.open('12.bmp')
print(img)
str = image_to_string(img)
print(str)

img1 = cv2.imread('12.bmp',1)
kernel = np.ones((3,3))
opening = cv2.morphologyEx(img1, cv2.MORPH_OPEN, kernel)
cv2.imwrite('opening.bmp',opening)
img2 = Image.open('opening.bmp')
print(img2)
str2 = image_to_string(img2)
print(str2)

Thanks!

--
※ Posted from: PTT (ptt.cc), from: 59.124.131.189
※ Article URL: https://www.ptt.cc/bbs/Python/M.1470648770.A.C41.html
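For reference on the preprocessing step: an opening (cv2.MORPH_OPEN) is an erosion followed by a dilation with the same kernel. With dark text on a white background it removes bright features smaller than the kernel, which is why it can close thin white cracks inside broken strokes. The sketch below is a plain-NumPy grayscale opening written only for illustration (it is not OpenCV's implementation; the helper name grey_open is hypothetical, it handles a single 2-D channel, and it assumes NumPy 1.20+ for sliding_window_view), checked on a tiny synthetic "broken character".

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def grey_open(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Grayscale opening with a k x k square kernel (hypothetical helper).

    Erosion = minimum over each window, dilation = maximum over each window.
    Edge padding only repeats pixels already inside each window, so the
    image border does not change the result.
    """
    pad = k // 2
    eroded = sliding_window_view(np.pad(img, pad, mode="edge"), (k, k)).min(axis=(-2, -1))
    return sliding_window_view(np.pad(eroded, pad, mode="edge"), (k, k)).max(axis=(-2, -1))


# Synthetic "broken character": a dark 5x5 block on white, split by a 1-pixel white crack.
img = np.full((11, 11), 255, dtype=np.uint8)
img[3:8, 3:8] = 0
img[5, 3:8] = 255  # the crack

opened = grey_open(img)
print(opened[5, 3:8])  # crack row after opening -> [0 0 0 0 0]
```

The crack is narrower than the 3x3 kernel, so it disappears while the block keeps its original size. OpenCV applies the same operation to each channel of a BGR image like the one read with cv2.imread('12.bmp',1).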

08/09 02:16, 1F
It's probably a cmd output issue; try switching to UTF-8.
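A note on where the error comes from: the traceback stops at f.read() inside pytesseract's image_to_string, which suggests the failure happens while reading Tesseract's output file with the Windows locale encoding (cp950 on a Traditional Chinese system), before anything is printed to cmd. Tesseract's text output is UTF-8, and characters such as an em-dash or curly quote begin with the byte 0xe2 in UTF-8, which cp950 rejects. The sketch below reproduces this with a made-up OCR string and then reads the same file explicitly as UTF-8; the sample text and the helper read_ocr_text are assumptions for illustration, not part of pytesseract.

```python
import os
import tempfile

# Hypothetical OCR result containing an em-dash (U+2014), encoded as E2 80 94 in UTF-8.
sample = "1-1 154\u2014Dbb"


def read_ocr_text(path: str) -> str:
    """Read a Tesseract .txt result as UTF-8 instead of the locale default encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


# Simulate the UTF-8 output file Tesseract would write.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8") + b"\n")

error_byte = None
try:
    # Roughly what a plain open(path).read() does when the locale encoding is cp950.
    with open(path, encoding="cp950") as f:
        f.read()
except UnicodeDecodeError as exc:
    error_byte = exc.object[exc.start]
    print("cp950 failed:", exc.reason, "at byte", hex(error_byte))

text = read_ocr_text(path)
os.remove(path)
print(ascii(text))  # ascii() keeps the output printable on any console
```

Decoding as UTF-8 recovers the text unchanged, while cp950 stops at the first 0xe2 byte, matching the error in the post.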

08/11 00:22, 2F
Convert the encoding of str first. Though naming your variable str isn't a great idea XD
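On the naming point: assigning to str hides Python's built-in str type for the rest of that scope (the whole script, when done at module level as in the post), so any later str(...) call fails with a TypeError. A minimal sketch, using a function scope to keep the demo contained; text1 is a hypothetical replacement name for the poster's str variable.

```python
def shadowed() -> str:
    str = "24-D 1813f-ml"  # local name now hides the built-in str type
    try:
        return str(42)  # tries to call the string object, not the type
    except TypeError as exc:
        return f"TypeError: {exc}"


def renamed() -> str:
    text1 = "24-D 1813f-ml"  # a descriptive name leaves the built-in alone
    return text1 + " / " + str(42)


print(shadowed())  # TypeError: 'str' object is not callable
print(renamed())   # 24-D 1813f-ml / 42
```

Note that in the traceback above the exception is raised inside image_to_string itself, so no value is returned whose encoding could be converted afterwards; the decoding has to happen where the output file is read.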
Article ID (AID): #1Ng572n1 (Python)