Re: [Question] Is there any way to speed up this for loop?

Board: Python  Author: f496328mm (123)  Time: 2018/02/28 15:47  Score: 8 (8 push, 0 boo, 14 →)

22 comments, 8 participants, thread 2/2

※ Quoting CaptPlanet (ep):
: I have two lists, list_a and list_b.
: list_a has about 70,000 elements
: list_b has about 3 million elements
: The program is roughly as follows:
: res_li = []
: for x in list_b:
:     try:
:         res_li.append(list_a.index(x))
:     except:
:         res_li.append("")
: For each element in list_b,
: find the same element in list_a and append its index to a new list.
: It gets slower and slower as the iterations go on.
: Why does this happen, and is there any way to speed up this for loop?
: Thanks, everyone.

Although this is the Python board, let me use R to compare speeds.

Conclusion first.

Speed test on small data, list_a = 7,000 items, list_b = 300,000 items:
Python                    : 24.7 s
R, parallel (mclapply)    : 1.2 s
R, single core (sapply)   : 2.9 s

#==========================================
With the data size changed to match the original poster's,
list_a = 70,000 items, list_b = 3,000,000 items:
R, parallel (mclapply)    : 69 s

Code below.

#==========================================
# Python version
import random
import datetime

list_a = random.sample(range(0,10000),7000)
list_b = random.sample(range(0,500000),300000)

res_li = []
s = datetime.datetime.now()
for x in list_b:
    try:
        # list.index scans list_a from the front on every call
        res_li.append( list_a.index( x ) )
    except ValueError:
        # x is not in list_a
        res_li.append("")
t = datetime.datetime.now() - s
print(t)
# 0:00:24.748111
# took 24 s

#==========================================
# R version
library(parallel)

list_a = sample(c(0:10000),7000,replace = FALSE)# 7,000
list_b = sample(c(0:500000),300000,replace = FALSE)# 300,000

# case 1, using R's multi-core processing
res_li = c()
s = Sys.time()
res_li = mclapply(c(list_b),function(x){
    if( x %in% list_a ){
        map = which(list_a==x)
    }else{
        map = ''
    }
    return(map)
}, mc.cores=8, mc.preschedule = TRUE)
res_li = do.call(c,res_li)
t = Sys.time() - s
print(t)
# Time difference of 1.229357 secs

#===============================================
# case 2, using ordinary single-core processing
res_li = c()
s = Sys.time()
res_li = sapply(c(list_b),function(x){
    if( x %in% list_a ){
        map = which(list_a==x)
    }else{
        map = ''
    }
    return(map)
})
t = Sys.time() - s
print(t)
# Time difference of 2.913066 secs

#===========================================
# multi-core, same data size as the original poster's
list_a = sample(c(0:100000),70000,replace = FALSE)# 70,000
list_b = sample(c(0:5000000),3000000,replace = FALSE)# 3,000,000

res_li = c()
s = Sys.time()
res_li = mclapply(c(list_b),function(x){
    if( x %in% list_a ){
        map = which(list_a==x)
    }else{
        map = ''
    }
    return(map)
}, mc.cores=8, mc.preschedule = TRUE)
res_li = do.call(c,res_li)
t = Sys.time() - s
print(t)
# Time difference of 1.151484 mins

Just offering a different perspective for reference.

--
※ Origin: PTT (ptt.cc), From: 36.229.89.102
※ Article URL: https://www.ptt.cc/bbs/Python/M.1519804063.A.817.html
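A rough illustration of where the time goes in the Python loop: list_a.index(x) scans list_a from the front on every call, so a miss costs one comparison per element of list_a. The helper linear_index and its comparison counter below are hypothetical, written only to make that cost visible on a tiny input; they are not part of the original code.

```python
def linear_index(haystack, needle):
    """Mimic list.index by scanning from the front.

    Returns (index or None, number of comparisons made).
    Hypothetical helper, for illustration only.
    """
    comparisons = 0
    for i, value in enumerate(haystack):
        comparisons += 1
        if value == needle:
            return i, comparisons
    return None, comparisons


list_a = [5, 3, 8, 1]
list_b = [1, 9, 5]

total = 0
res_li = []
for x in list_b:
    idx, n = linear_index(list_a, x)
    total += n
    res_li.append(idx if idx is not None else "")

print(res_li)
print(total)
```

On this input the three lookups cost 4, 4, and 1 comparisons. Scaled to the original sizes, every value of list_b that is missing from list_a costs about 70,000 comparisons, so the worst case is on the order of 70,000 x 3,000,000 comparisons.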

→ celestialgod  02/28 16:55 (1F)
→ celestialgod  02/28 17:00 (2F)
→ celestialgod  02/28 17:00 (3F)

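This is just a simple comparison, using R and Python code that are as close to each other as possible. Thanks for sharing the more advanced approach. R's speed is actually no worse than Python's.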

Push vfgce  02/28 18:00 (4F)
Push Sunal  02/28 18:43 (5F)
Push vfgce  02/28 19:23 (6F)
→ vfgce  02/28 19:24 (7F)

Revised following the suggestions from celestialgod and vfgce.
R and Python code below.

#==================================
# Python
import random
import datetime

list_a = random.sample(range(0,100000),70000)
list_b = random.sample(range(0,5000000),3000000)

# Build a value -> index table so that get(x) returns the position of x
# in list_a; each search is then a dict lookup instead of a scan.
index_of = {v: i for i, v in enumerate(list_a)}

res_li = []
s = datetime.datetime.now()
for x in list_b:
    res_li.append( index_of.get(x,'') )
t = datetime.datetime.now() - s
print(t)
# 0:00:01.056265

#==================================
# R
install.packages('fastmatch')
library(fastmatch)

list_a = sample(c(0:100000),70000,replace = FALSE)
list_b = sample(c(0:5000000),3000000,replace = FALSE)

s = Sys.time()
# fmatch returns the position of each list_b value in list_a, or nomatch
res_li = fmatch(list_b,list_a, nomatch = -1)
res_li[res_li==-1]=''
t = Sys.time() - s
print(t)
# Time difference of 0.5497556 secs

PS: This is only a simple comparison. Both languages have their strengths and weaknesses, and it doesn't hurt to know a bit of both.

※ Edited: f496328mm (36.229.89.102), 02/28/2018 19:48:51
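One way to check that a dict-based lookup gives the same answers as the list.index loop is to run both on small data and compare. The sketch below assumes list_a may contain repeated values (the sampled data in this post does not), so it keeps the first index of each value, which is what list.index returns. The function names index_map, lookup_with_index, and lookup_with_dict are made up for this example.

```python
import random


def index_map(values):
    """Map each value to the index of its first occurrence, like list.index."""
    mapping = {}
    for i, v in enumerate(values):
        mapping.setdefault(v, i)  # keep the first index when a value repeats
    return mapping


def lookup_with_index(list_a, list_b):
    """Reference version: one linear scan of list_a per element of list_b."""
    out = []
    for x in list_b:
        try:
            out.append(list_a.index(x))
        except ValueError:
            out.append("")
    return out


def lookup_with_dict(list_a, list_b):
    """Build the table once, then do one dict lookup per element of list_b."""
    pos = index_map(list_a)
    return [pos.get(x, "") for x in list_b]


random.seed(0)
list_a = [random.randrange(50) for _ in range(200)]   # duplicates on purpose
list_b = [random.randrange(100) for _ in range(500)]  # about half are misses

same = lookup_with_index(list_a, list_b) == lookup_with_dict(list_a, list_b)
print(same)
```

Building the table costs one pass over list_a; after that each lookup takes roughly constant time on average, which is why the loop over 3,000,000 elements finishes in about a second.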

Push Sunal  02/28 22:43 (8F)
→ Sunal  02/28 22:44 (9F)
→ Sunal  02/28 22:48 (10F)