[Question] pandas conditional sum

Board: Python | Author: a5170040 (Andy) | Posted: 2019/01/11 21:09 | Score: 1 (1 push, 0 boos, 5 neutral)

6 comments, 2 participants | Thread 1/1

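I have a dataset that looks like this:

              Vendor  Document Date  Clearing Date  Invoice_Amount
    0         A       09/13/2016     11/04/2016       2,007,324.85
    1         A       04/18/2016     07/11/2016         631,714.68
    2         A       09/13/2016     09/16/2016       4,000,000.00
    3         A       07/11/2017     09/23/2017       5,000,000.00
    4         A       05/03/2016     06/17/2016       2,000,000.00
    ---------------------------------------------------------------
              Vendor  Document Date  Clearing Date  Invoice_Amount
    1158      H       2017-04-21     2017-06-28       3,000,000.00
    1159      H       2017-04-25     2017-05-19       1,000,000.00
    1160      H       2017-11-03     2017-12-11       4,500,000.00
    1161      H       2018-03-15     2018-05-27       3,500,000.00
    1162      H       2018-02-21     2018-05-03       1,500,000.00

I want to add a new column where each row holds the number of payments already made to the same Vendor within the past 6 months. For each row i, this means:

1. compare Document Date[i] against the 'Clearing Date' of every row in the data (only invoices cleared before Document Date[i] count);
2. keep only the rows whose Document Date falls within the six months before Document Date[i];
3. keep only the rows whose Vendor matches Vendor[i].

My current code, below, gives the correct answer, but the real data has over 100,000 rows and the computation takes a very long time. Is there a faster way? I thought about using df.apply(lambda ...), but I couldn't get it to work.

    import pandas as pd

    # Raw string: keeps "\d" in the Windows path from being read as an escape sequence
    df = pd.read_csv(r'E:\data.csv')
    df['Document Date'] = pd.to_datetime(df['Document Date'], format="%m/%d/%Y")
    df['Clearing Date'] = pd.to_datetime(df['Clearing Date'], format="%m/%d/%Y")

    df["Sum_Paid"] = 0
    # Each iteration filters the whole DataFrame, so the loop is O(n^2) overall
    for i in df.index:
        Vendor = df.loc[i, "Vendor"]
        Doc_Date = df.loc[i, "Document Date"]
        Six_Month = Doc_Date - pd.Timedelta(days=180)  # six months taken as 180 days
        # Same vendor, cleared before this document date, issued in the last 180 days
        df.loc[i, "Sum_Paid"] = df.loc[(df["Vendor"] == Vendor)
                                       & (df["Clearing Date"] < Doc_Date)
                                       & (df["Document Date"] >= Six_Month),
                                       "Invoice_Amount"].count()

--
※ Posted from: 批踢踢實業坊 (ptt.cc), IP: 36.226.91.237
※ Article URL: https://www.ptt.cc/bbs/Python/M.1547212189.A.464.html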
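One possible vectorized alternative, sketched here under stated assumptions (it is not taken from the thread): the function name add_sum_paid and its days/amount_col parameters are hypothetical; both date columns are assumed to be datetime64 already, as after the to_datetime calls; and Invoice_Amount is assumed numeric when summing (for example read with pd.read_csv(..., thousands=',')). Instead of filtering the full table once per row, it groups by Vendor once and, inside each vendor, builds a boolean matrix with NumPy broadcasting that encodes the same three conditions as the loop.

```python
from typing import Optional

import numpy as np
import pandas as pd


def add_sum_paid(
    df: pd.DataFrame, days: int = 180, amount_col: Optional[str] = None
) -> pd.DataFrame:
    """Hypothetical vectorized version of the per-row loop.

    For each row i, consider rows j of the same Vendor with
      Clearing Date[j] < Document Date[i]  and
      Document Date[j] >= Document Date[i] - `days` days.
    amount_col=None counts those rows (like .count() in the loop);
    otherwise the values of `amount_col` over those rows are summed.
    """
    out = df.copy()
    doc_all = out["Document Date"].to_numpy()  # datetime64[ns]
    clr_all = out["Clearing Date"].to_numpy()
    amt_all = None if amount_col is None else out[amount_col].to_numpy(dtype=float)
    result = np.zeros(len(out), dtype=float if amount_col is not None else np.int64)

    # .indices maps each Vendor value to the integer positions of its rows
    for _, pos in out.groupby("Vendor").indices.items():
        doc = doc_all[pos]
        clr = clr_all[pos]
        start = doc - np.timedelta64(days, "D")
        # mask[i, j] is True when row j of this vendor counts toward row i.
        # It needs O(k^2) memory for a vendor with k rows.
        mask = (clr[None, :] < doc[:, None]) & (doc[None, :] >= start[:, None])
        if amt_all is None:
            result[pos] = mask.sum(axis=1)
        else:
            result[pos] = mask.astype(float) @ amt_all[pos]

    out["Sum_Paid"] = result
    return out
```

With amount_col=None the result should match the loop's .count(); add_sum_paid(df, amount_col="Invoice_Amount") instead totals the amounts, which is the conditional sum the title refers to. The trade-off is memory: a vendor with k rows needs a k×k boolean matrix (about k² bytes), so a vendor with tens of thousands of invoices would have to be processed in row chunks.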

Push  Luluemiko  01/11 22:08  1F
→     Luluemiko  01/11 22:09  2F
→     Luluemiko  01/11 22:10  3F