[North America] big data/data engineer referral disappeared
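I've spent the past six years working in web development,
mainly PHP and JavaScript.
For about half a year I've been teaching myself Spark/Scala, as well as Hadoop and Kafka.
I'd like to apply for big data / data engineer positions.
I plan to work in big data for two to three years,
and later move into machine learning or data science.
Since I have no big data work experience, the job search has been really discouraging.
If any seniors could give me a referral or some advice, I'd really appreciate it.
I'm in Texas and have legal work status.
Thanks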
--
※ Posted on: 批踢踢實業坊 (PTT, ptt.cc), from: 72.182.109.27
※ Article URL: https://www.ptt.cc/bbs/Oversea_Job/M.1517876051.A.F64.html
Hello~ Here is my understanding~
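1) Large datasets flowing through a data pipeline can be processed efficiently with distributed computing,
for example Spark or Hadoop, to compute and analyze the data (turning raw data into structured
data via MapReduce), and then stored in a better format, such as Parquet,
or in a distributed database, such as HBase or Cassandra.
If the data is stored in HDFS, the table schema can be kept in the Hive metastore.
When creating a Hive table, you can declare specific partition columns;
each partition is stored as its own directory (folder), and rows inside a partition can be
further split into a fixed number of bucket files, which improves query efficiency.
Overall, the main purpose of all of the above is to make later queries from SQL, BI tools, and the data warehouse efficient.
Queries can be run with Hive, Impala, Spark SQL, etc.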
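2) Processing large real-time streaming datasets, such as clickstreams, user/product ratings,
fraud alerts, etc.
The data comes from many different producers, and the downstream consumers can be a dashboard UI
or a data warehouse.
This may require integrating Kafka with a database or with Spark, mainly to make sure messages are not lost,
and using Kafka Streams with lambda functions (map/filter/reduce-style operations) to preprocess the data inside the stream.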
3) Many big data/data engineer jobs also involve machine learning, such as product
recommendation: using existing features and labels to build models, splitting the data
for training/testing/cross-validation, and making predictions with different
machine learning algorithms.
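To make point 1) a bit more concrete, here is a minimal Python sketch (standard library only, no Spark or Hadoop) of the raw-to-structured MapReduce step followed by a Hive-style partition/bucket layout. The input format, the table name `clicks`, and the `bucket_NNNNN` file naming are hypothetical; `crc32` is a stand-in hash, since Hive and Spark each use their own bucketing hash functions.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical raw input: one comma-separated click event per line
# (date, user_id, product_id). The format is an assumption for illustration.
RAW_LINES = [
    "2018-02-06,u1,p9",
    "2018-02-06,u2,p9",
    "2018-02-07,u1,p3",
    "garbage line",  # malformed records are dropped in the map step
]


def map_phase(line):
    """Map: parse one raw line into ((date, product_id), 1) pairs."""
    parts = line.split(",")
    if len(parts) != 3:
        return []  # skip lines that don't match the expected schema
    date, _user, product = parts
    return [((date, product), 1)]


def reduce_phase(pairs):
    """Reduce: sum the counts for each (date, product_id) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def partition_path(table, date, product, num_buckets=4):
    """Hive-style layout: one directory per partition value (dt=...),
    with rows hashed into a fixed number of bucket files.
    crc32 is a stand-in hash, not Hive's or Spark's bucketing hash."""
    bucket = crc32(product.encode()) % num_buckets
    return f"{table}/dt={date}/bucket_{bucket:05d}"


pairs = [kv for line in RAW_LINES for kv in map_phase(line)]
counts = reduce_phase(pairs)
for (date, product), n in sorted(counts.items()):
    print(partition_path("clicks", date, product), product, n)
```

The point of the directory-per-partition layout is that a query filtering on `dt` only has to read the matching directories. In Spark itself, the same layout is usually produced with `DataFrameWriter`'s `partitionBy`, `bucketBy` and `saveAsTable`, typically writing Parquet files.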
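For point 2), a minimal Python sketch of the "don't lose messages" idea, assuming at-least-once delivery. `InMemoryPartition`, `consume` and `handle` are hypothetical names, and the list only simulates one Kafka partition; it is not the Kafka client API. With a real Kafka consumer the same pattern roughly corresponds to setting `enable.auto.commit=false` and committing offsets (e.g. `commitSync()` in the Java client) only after processing succeeds.

```python
class InMemoryPartition:
    """Toy stand-in for one Kafka partition plus one consumer group's
    committed offset. This is NOT the Kafka client API."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message the group will read


def consume(partition, handle, crash_before_commit_at=None):
    """At-least-once loop: process a message first, commit its offset second.
    A crash between the two leaves the offset uncommitted, so the message is
    re-delivered on restart: nothing is lost, but duplicates are possible."""
    for offset in range(partition.committed, len(partition.messages)):
        handle(partition.messages[offset])
        if offset == crash_before_commit_at:
            raise RuntimeError(f"crashed before committing offset {offset}")
        partition.committed = offset + 1


sink = []  # stands in for the downstream dashboard or warehouse table


def handle(msg):
    # Kafka Streams-style stateless processing written with lambdas:
    # keep only "rating" events, then map each one to (product_id, score).
    fields = msg.split(":")
    ratings = filter(lambda f: f[0] == "rating", [fields])
    sink.extend(map(lambda f: (f[1], int(f[2])), ratings))


partition = InMemoryPartition(["rating:p9:5", "click:p9:0", "rating:p3:4"])
try:
    consume(partition, handle, crash_before_commit_at=2)
except RuntimeError:
    pass  # simulated process crash
consume(partition, handle)  # restart resumes from the last committed offset
print(sink)
```

The repeated `("p3", 4)` in `sink` is the trade-off of at-least-once delivery: no message is lost, but the downstream write should be idempotent (or use Kafka's exactly-once features) if duplicates matter.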
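For point 3), a minimal Python sketch of the train/test/cross-validation workflow, assuming a toy dataset of `(features, label)` rows. The "model" just predicts the mean label, a hypothetical placeholder for a real algorithm such as a recommender; `holdout_split`, `k_folds`, `fit_mean` and `mse` are illustrative names written here, not a library API. In Spark, `DataFrame.randomSplit` and `spark.ml`'s `CrossValidator` cover the same steps.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle once with a fixed seed, then hold out a slice as the test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = len(rows) - int(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]


def k_folds(rows, k):
    """Yield (train, validation) pairs; each row is in exactly one validation fold."""
    for i in range(k):
        validation = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, validation


def fit_mean(train):
    """Placeholder 'model': always predicts the mean label of its training rows."""
    return sum(label for _features, label in train) / len(train)


def mse(prediction, rows):
    """Mean squared error of a constant prediction over (features, label) rows."""
    return sum((label - prediction) ** 2 for _features, label in rows) / len(rows)


# Hypothetical rows: features = (user_id, product_id), label = rating.
data = [((f"u{i}", f"p{i % 3}"), float(i % 5 + 1)) for i in range(20)]

train, test = holdout_split(data)
cv_scores = [mse(fit_mean(tr), va) for tr, va in k_folds(train, k=4)]
model = fit_mean(train)  # refit on the full training set after cross-validation
print(len(train), len(test), [round(s, 2) for s in cv_scores], round(mse(model, test), 2))
```

The test set is only touched once, at the end; cross-validation scores on the training portion are what you would use to compare algorithms or hyperparameters.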
Could the seniors here please correct me? What do you think big data work actually involves?
※ Edited: Mayday6 (72.182.109.27), 02/07/2018 00:04:45