[Question] frame-pointer and performance

Board: CompilerDev  Author: shane87123 (陽光大肥宅)  Time: 2021/11/08 22:11  Votes: 1 (1 push, 0 boo, 13 →)

14 comments, 2 participants, thread 1/1

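Hi everyone. I recently noticed that LLVM IR has a function attribute called frame-pointer, and that it affects performance. At O3 the default is none, whereas unoptimized LLVM IR obtained with clang -emit-llvm -Xclang -disable-O0-optnone has it set to all.

As far as I know, setting it to none eliminates the saving of the frame pointer, which "in theory" should make the program a bit faster, since it frees a register that would otherwise hold the frame pointer. In my tests, when both variants go through the O3 sequence, frame-pointer=none does perform better.

But! With my own optimization sequence:
frame-pointer=none gives a runtime of about 8 sec
frame-pointer=all gives a runtime of 3.8 sec
That is a huge difference!

I then compiled both down to assembly. The two really are different, but the none version is shorter and has far fewer memory accesses, yet it performs worse. I understand that instruction count alone does not determine performance, but as far as I know, not saving and not using the frame pointer should be faster, right? Some of my other IR files even get faster when changed from all to none; only a few IR files get slower.

The source code I am testing is insertion sort: https://imgur.com/nqexaZb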
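For anyone who wants to repeat the all-versus-none comparison on the same IR, here is a minimal sketch of flipping the attribute in a textual .ll file. It assumes the attribute appears in the usual string form "frame-pointer"="all" inside an attribute group; the helper name set_frame_pointer and the sample attribute group are hypothetical, and the list of accepted values covers only the three named in the LLVM docs at the time of the post (newer releases may accept more).

```python
import re

# Matches the string attribute as written in an LLVM IR attribute group, e.g.
#   attributes #0 = { nounwind uwtable "frame-pointer"="all" }
_FRAME_POINTER_ATTR = re.compile(r'"frame-pointer"="[^"]*"')

# Values assumed here; check the LLVM LangRef for your release.
VALID_MODES = ("all", "non-leaf", "none")


def set_frame_pointer(ir_text: str, mode: str) -> str:
    """Return ir_text with every "frame-pointer" attribute set to mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown frame-pointer mode: {mode!r}")
    return _FRAME_POINTER_ATTR.sub(f'"frame-pointer"="{mode}"', ir_text)


# Hypothetical attribute group, shortened for illustration.
SAMPLE_IR = 'attributes #0 = { nounwind uwtable "frame-pointer"="all" }\n'

print(set_frame_pointer(SAMPLE_IR, "none"), end="")
```

Rewriting the text this way keeps everything else in the IR identical, so any runtime difference between the two builds comes from code generation rather than from a different pass sequence.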

https://imgur.com/hKigtrh

https://imgur.com/FR3qETS

https://imgur.com/5119ek8

https://imgur.com/g3RHx98

These are the differences in the assembly code. They do not seem to be related to the logic of insertion sort itself.

Adding the perf results:

frame-pointer=all

 Performance counter stats for './190_all' (10 runs):

        142666      cache-misses      #  0.020 % of all cache refs  ( +- 5.01% )
     698701320      cache-references                                ( +- 0.71% )
        234781      branch-misses                                   ( +- 0.44% )
   13059296783      cycles                                          ( +- 0.16% )
   59991967735      instructions      #  4.59 insn per cycle        ( +- 0.05% )

   3.417880975 seconds time elapsed  ( +- 0.26% )

frame-pointer=none

 Performance counter stats for './190_none' (10 runs):

        352932      cache-misses      #  0.046 % of all cache refs  ( +- 2.58% )
     770977710      cache-references                                ( +- 0.81% )
        260282      branch-misses                                   ( +- 0.33% )
   30052057516      cycles                                          ( +- 0.05% )
   60037013675      instructions      #  2.00 insn per cycle        ( +- 0.05% )

   7.921856465 seconds time elapsed  ( +- 0.05% )

It looks like branch-misses are about 10% higher, and insn per cycle drops to less than half.

--
※ Origin: 批踢踢實業坊 (ptt.cc), from: 114.43.59.118 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/CompilerDev/M.1636380668.A.B00.html
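To make the comparison in the perf output easier to check, here is a small sketch that recomputes the derived figures from the raw counters. The numbers are copied from the perf output in the post; the dictionary layout and the helper names ipc and ratio are only for illustration.

```python
# Counters copied from the perf output in the post (means over 10 runs).
PERF = {
    "all": {
        "cycles": 13059296783,
        "instructions": 59991967735,
        "branch_misses": 234781,
        "seconds": 3.417880975,
    },
    "none": {
        "cycles": 30052057516,
        "instructions": 60037013675,
        "branch_misses": 260282,
        "seconds": 7.921856465,
    },
}


def ipc(run: dict) -> float:
    """Instructions retired per cycle for one run."""
    return run["instructions"] / run["cycles"]


def ratio(key: str) -> float:
    """How many times larger the 'none' value is than the 'all' value."""
    return PERF["none"][key] / PERF["all"][key]


for name, run in PERF.items():
    print(f"frame-pointer={name}: {ipc(run):.2f} insn per cycle")

for key in ("instructions", "cycles", "branch_misses", "seconds"):
    print(f"{key}: none/all = {ratio(key):.2f}")
```

The two builds retire almost the same number of instructions (within about 0.1%), while cycles and elapsed time both grow by roughly 2.3x. So the slowdown is not extra work; the same amount of work is being executed at a much lower rate per cycle.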

Push

sonicyang

11/09 00:50, 1F

Sorry, I got the seconds wrong.

→

Lipraxde

11/09 01:55, 2F

→

Lipraxde

11/09 01:55, 3F

→

Lipraxde

11/09 01:55, 4F

→

Lipraxde

11/09 01:55, 5F

→

Lipraxde

11/09 01:55, 6F

→

Lipraxde

11/09 02:04, 7F

→

Lipraxde

11/09 02:04, 8F

→

Lipraxde

11/09 02:04, 9F

I just used the Linux tool perf to analyze the difference between the two. Cache misses and cache references show no real difference, but instructions per cycle differ significantly: frame-pointer=all reaches 4.56 instructions per cycle, while frame-pointer=none reaches only 1.99.
※ Edited: shane87123 (114.43.59.118 Taiwan), 11/09/2021 02:17:11
※ Edited: shane87123 (114.43.59.118 Taiwan), 11/09/2021 02:32:19

→

Lipraxde

11/09 09:19, 10F

Added! Thank you.
※ Edited: shane87123 (101.12.89.21 Taiwan), 11/09/2021 13:28:27
※ Edited: shane87123 (101.12.89.21 Taiwan), 11/09/2021 13:28:51

→

Lipraxde

11/09 16:50, 11F

→

Lipraxde

11/09 16:50, 12F