A question about pyspark
I'm wondering if anyone on this board has been playing with pyspark.
I'm currently going through the online docs and ran into a question; the URL is:
http://spark.apache.org/docs/latest/programming-guide.html#understanding-closures-a-nameclosureslinka
It has the following code example and explanation:
Consider the naive RDD element sum below, which may behave differently
depending on whether execution is happening within the same JVM.
A common example of this is when running Spark in local mode
(--master = local[n]) versus deploying a Spark application
to a cluster (e.g. via spark-submit to YARN):
The way I read it, if you run in local mode the value of counter can change,
but running on a cluster it cannot, is that right?
However, when I run the code below in local mode, the value of counter never changes at all.
Am I misreading it, or is there some setting I still need to configure?
counter = 0
rdd = sc.parallelize(data)  # assumes `sc` (SparkContext) and `data` are already defined

# Wrong: Don't do this!!
def increment_counter(x):
    global counter
    counter += x  # mutates the task's own copy of `counter`, not the driver's

rdd.foreach(increment_counter)
print("Counter value: ", counter)
--
※ Posted from: PTT (ptt.cc), From: 61.220.35.20
※ Article URL: https://www.ptt.cc/bbs/Python/M.1479355074.A.489.html