Finding the median with 10,000 computers
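This is actually a Mapper/Reducer (MapReduce) problem.
Given many very large files, each holding 1 TB of numbers (far too much to fit in memory),
how can 10,000 Mappers + Reducers be used to find the median of all the numbers?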
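My own idea is to first have each machine run a selection method in its Mapper to split each file's numbers into two piles,
one of the larger numbers and one of the smaller numbers, possibly keeping the count of values equal to the pivot as a third pile.
But in the Reducer stage, how can this information be used to find the median?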
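One way to finish the three-pile idea is the usual iterative pivot (quickselect-style) scheme: the Reducer only needs the summed counts (smaller, equal, larger) from all Mappers to decide which pile holds the median, then narrows the value range and repeats. Below is a minimal single-process Python sketch under these assumptions: each `shard` list stands in for one file, the loops over `shards` simulate what the Mappers would do in parallel, the driver plays the Reducer, and the pivot is the median of one random candidate per shard. The names `sample_phase`, `map_phase`, and `distributed_median` are hypothetical, and for an even count it returns the lower median. This is a sketch, not a Hadoop implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_phase(shard: Sequence[int], lo: float, hi: float,
                 rng: random.Random) -> Optional[int]:
    """Mapper, pass 1: pick one uniformly random active value (lo < x < hi)
    from this shard via size-1 reservoir sampling, or None if there is none."""
    chosen, seen = None, 0
    for x in shard:
        if lo < x < hi:
            seen += 1
            if rng.randrange(seen) == 0:  # keep x with probability 1/seen
                chosen = x
    return chosen


def map_phase(shard: Sequence[int], lo: float, hi: float,
              pivot: int) -> Tuple[int, int, int]:
    """Mapper, pass 2: count active values below, equal to, and above pivot."""
    less = equal = greater = 0
    for x in shard:
        if lo < x < hi:
            if x < pivot:
                less += 1
            elif x == pivot:
                equal += 1
            else:
                greater += 1
    return less, equal, greater


def distributed_median(shards: List[Sequence[int]], seed: int = 0) -> int:
    """Reducer/driver: return the lower median of all values across shards."""
    total = sum(len(s) for s in shards)
    if total == 0:
        raise ValueError("no data")
    rng = random.Random(seed)
    lo, hi = float("-inf"), float("inf")  # active values satisfy lo < x < hi
    k = (total - 1) // 2  # 0-based rank of the target among active values
    while True:
        # Each mapper proposes one candidate; the reducer takes their median.
        candidates = sorted(c for c in (sample_phase(s, lo, hi, rng) for s in shards)
                            if c is not None)
        pivot = candidates[len(candidates) // 2]
        # Each mapper emits (less, equal, greater); the reducer sums the counts.
        n_less = n_equal = 0
        for s in shards:
            less, equal, _ = map_phase(s, lo, hi, pivot)
            n_less += less
            n_equal += equal
        if k < n_less:
            hi = pivot                # target is in the "smaller" pile
        elif k < n_less + n_equal:
            return pivot              # target equals the pivot
        else:
            k -= n_less + n_equal     # skip the smaller and equal piles
            lo = pivot                # target is in the "larger" pile


print(distributed_median([[5, 1, 9], [3, 7], [2, 8, 6, 4]]))  # 5
```

Because the pivot is always an active value, the equal pile is never empty, so the active range strictly shrinks every round and the loop terminates. Each round costs two passes over the data, but the Reducer only ever receives one candidate and three counts per Mapper, never the numbers themselves; with a random pivot the number of rounds should typically grow roughly logarithmically with the data size.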
--
※ Posted from: PTT (ptt.cc)
◆ From: 207.151.93.199
Full thread: this post is 1 of 2.