[Question] About regex matching (post deleted)

Board: Python | Author: eryu (*+Red) | Time: 2021/07/05 23:57 | Pushes/boos: 1 (1 push, 0 boo, 0 →)

1 comment, 1 participant, thread 1/1

    import re

    def tokenlize(content):
        content = re.sub("<.*?>", " ", content)
        filters = ['/t', '/n', '/x97', '/x96', '$', '#', '&', '$', '"', '\"', '\'']
        content = re.sub("|".join(filters), " ", content)
        tokens = [i.strip() for i in content.split()]
        return tokens

I'd like to ask everyone a question about tokenization. Suppose I do simple tokenization on text read from a file. If single or double quotes appear many times in the file, how should I write my filters so that they all get replaced?

    str=' It is good 'to' hear about "you".'

--
※ Origin: PTT (ptt.cc), from: 36.224.181.213 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/Python/M.1625500626.A.474.html
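For anyone landing here later, one possible way to handle this is sketched below. It is only a sketch under a few assumptions, not necessarily what the poster had in mind: it assumes the '/t', '/n', '/x97', '/x96' entries were meant to be the escape sequences \t, \n, \x97, \x96, and the names `tokenize`, `FILTER_CHARS`, `FILTER_PATTERN` and `TAG_PATTERN` are made up for illustration. Two things are worth noting. First, an unescaped `$` in a regex is an end-of-string anchor rather than a literal dollar sign, so the characters are passed through `re.escape` and placed in a character class. Second, the sample string in the post nests single quotes inside a single-quoted literal, which is a syntax error before any regex runs; a triple-quoted literal avoids that.

```python
import re

# Characters to replace with a space. This set is an assumption about what
# the post intended: real control characters (\t, \n, \x96, \x97) instead of
# the literal two-character strings '/t', '/n', and so on.
FILTER_CHARS = "\t\n\x96\x97$#&\"'"

# re.escape neutralises regex metacharacters such as '$'; wrapping the result
# in [...] builds a character class that matches any single one of them.
FILTER_PATTERN = re.compile("[" + re.escape(FILTER_CHARS) + "]")

# Same non-greedy tag pattern as in the post.
TAG_PATTERN = re.compile(r"<.*?>")


def tokenize(content):
    """Strip <...> tags and filtered characters, then split on whitespace."""
    content = TAG_PATTERN.sub(" ", content)
    content = FILTER_PATTERN.sub(" ", content)
    # str.split() with no arguments already drops empty and padded tokens,
    # so no extra strip() is needed.
    return content.split()


# Triple quotes let the text contain both ' and " without escaping.
text = """It is good 'to' hear about "you"."""
print(tokenize(text))
```

Every occurrence of a filtered character is replaced, however many times it appears, because the substitution is applied to all non-overlapping matches in the string.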

Push | joseph0911 | 07/06 00:31, 1F


Article ID (AID): #1WuolIHq (Python)