We started a Spark study group, so...
I could have just installed it on my laptop,
but I put it on the Pi so I can get at it anytime.
Fortunately Spark is Java-based, so installing and running it was no big deal.
For installing Java, see here.
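Before anything else, it's worth a quick check that the JVM is actually there (the spark-shell banner further down shows mine ended up being 1.8.0_65):

# sanity check: Spark just needs a working JVM on the PATH
java -version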
If you go to the Download menu on the Spark site,
it offers the latest version.
Since the study follows a book, I matched the book's version (1.3.0).
Looking at the download link, it seemed like just swapping the version number would work, and when I tested it, the download went through.
I did the actual walkthrough with the latest, 2.1.0.
In short, I downloaded both.
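For reference, the two URLs differ only in the version string (the 1.3.0 file name below is inferred from the install directory that shows up later in this post, so double-check it before relying on it):

# the book's version, pre-built for Hadoop 2.4 (file name inferred, verify first)
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz
# the latest at the time, pre-built for Hadoop 2.7
wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz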
Now to install it on the Pi.
pi@rasp2-dev:~ $ cd utils/
pi@rasp2-dev:~/utils $ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
--2017-04-11 01:29:19--  http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz
Resolving d3kbcqa49mib13.cloudfront.net (d3kbcqa49mib13.cloudfront.net)... 52.84.186.132, 52.84.186.207, 52.84.186.108, ...
Connecting to d3kbcqa49mib13.cloudfront.net (d3kbcqa49mib13.cloudfront.net)|52.84.186.132|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 195636829 (187M) [application/x-tar]
Saving to: ‘spark-2.1.0-bin-hadoop2.7.tgz.1’

spark-2.1.0-bin-had 100%[=====================>] 186.57M  11.0MB/s   in 49s

2017-04-11 01:30:08 (3.79 MB/s) - ‘spark-2.1.0-bin-hadoop2.7.tgz.1’ saved [195636829/195636829]

pi@rasp2-dev:~/utils $
As always, I did the work in the utils directory.
pi@rasp2-dev:~/utils $ tar zxvf spark-2.1.0-bin-hadoop2.7.tgz
spark-2.1.0-bin-hadoop2.7/
spark-2.1.0-bin-hadoop2.7/NOTICE
spark-2.1.0-bin-hadoop2.7/jars/
spark-2.1.0-bin-hadoop2.7/jars/bonecp-0.8.0.RELEASE.jar
spark-2.1.0-bin-hadoop2.7/jars/commons-net-2.2.jar
spark-2.1.0-bin-hadoop2.7/jars/javax.servlet-api-3.1.0.jar
spark-2.1.0-bin-hadoop2.7/jars/hadoop-annotations-2.7.3.jar
spark-2.1.0-bin-hadoop2.7/jars/hadoop-hdfs-2.7.3.jar
spark-2.1.0-bin-hadoop2.7/jars/oro-2.0.8.jar
spark-2.1.0-bin-hadoop2.7/jars/xercesImpl-2.9.1.jar
spark-2.1.0-bin-hadoop2.7/jars/antlr-runtime-3.4.jar
spark-2.1.0-bin-hadoop2.7/jars/parquet-jackson-1.8.1.jar
spark-2.1.0-bin-hadoop2.7/jars/spark-unsafe_2.11-2.1.0.jar
. . .
spark-2.1.0-bin-hadoop2.7/bin/load-spark-env.cmd
spark-2.1.0-bin-hadoop2.7/yarn/
spark-2.1.0-bin-hadoop2.7/yarn/spark-2.1.0-yarn-shuffle.jar
spark-2.1.0-bin-hadoop2.7/README.md
pi@rasp2-dev:~/utils $
With the archive unpacked,
let's try running one of the bundled samples.
pi@rasp2-dev:~/utils $ ls -al
total 257068
drwxr-xr-x  5 pi pi      4096 Apr 11 15:57 .
drwxr-xr-x 35 pi pi      4096 Apr 10 18:26 ..
drwxr-xr-x 12 pi pi      4096 Dec 16 11:18 spark-2.1.0-bin-hadoop2.7
-rw-r--r--  1 pi pi 195636829 Dec 29 09:49 spark-2.1.0-bin-hadoop2.7.tgz
pi@rasp2-dev:~/utils $ ./spark-2.1.0-bin-hadoop2.7/bin/run-example SparkPi 10
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/04/11 16:11:38 INFO SparkContext: Running Spark version 2.1.0
17/04/11 16:11:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/11 16:11:42 WARN Utils: Your hostname, rasp2-dev resolves to a loopback address: 127.0.1.1; using 192.168.0.25 instead (on interface eth0)
17/04/11 16:11:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/04/11 16:11:42 INFO SecurityManager: Changing view acls to: pi
17/04/11 16:11:42 INFO SecurityManager: Changing modify acls to: pi
17/04/11 16:11:42 INFO SecurityManager: Changing view acls groups to:
17/04/11 16:11:42 INFO SecurityManager: Changing modify acls groups to:
17/04/11 16:11:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pi); groups with view permissions: Set(); users with modify permissions: Set(pi); groups with modify permissions: Set()
. . .
OutputCommitCoordinator stopped!
17/04/11 16:12:02 INFO SparkContext: Successfully stopped SparkContext
17/04/11 16:12:02 INFO ShutdownHookManager: Shutdown hook called
17/04/11 16:12:02 INFO ShutdownHookManager: Deleting directory /tmp/spark-8f798ccc-9e48-43e8-a5fd-634b53628285
pi@rasp2-dev:~/utils $
It dumps out a ton of logs.
pi@rasp2-dev:~/utils $ ls -al
total 257072
drwxr-xr-x  6 pi pi      4096 Apr 11 16:11 .
drwxr-xr-x 35 pi pi      4096 Apr 10 18:26 ..
drwxr-xr-x 12 pi pi      4096 Dec 16 11:18 spark-2.1.0-bin-hadoop2.7
-rw-r--r--  1 pi pi 195636829 Dec 29 09:49 spark-2.1.0-bin-hadoop2.7.tgz
drwxr-xr-x  2 pi pi      4096 Apr 11 16:11 spark-warehouse
pi@rasp2-dev:~/utils $ cd spark-warehouse/
pi@rasp2-dev:~/utils/spark-warehouse $ ls -al
total 8
drwxr-xr-x 2 pi pi 4096 Apr 11 16:11 .
drwxr-xr-x 6 pi pi 4096 Apr 11 16:11 ..
pi@rasp2-dev:~/utils/spark-warehouse $
Running the example created a directory called spark-warehouse,
but I have no idea what it's for.
There's nothing inside it, either.
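From a bit of searching, it seems to be Spark SQL's default warehouse location, and it gets created in whatever directory you launch from. If that bothers you, something like this should move it (untested sketch; spark.sql.warehouse.dir is the 2.x setting and the target path here is just an example):

cd spark-2.1.0-bin-hadoop2.7
# spark-defaults.conf doesn't exist out of the box, so start from the template
cp conf/spark-defaults.conf.template conf/spark-defaults.conf
# park the Spark SQL warehouse somewhere out of the way
echo "spark.sql.warehouse.dir /tmp/spark-warehouse" >> conf/spark-defaults.conf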
For now I'm going to move everything from the utils directory to /usr/local/.
That's purely personal preference, so feel free to leave it where it is.
pi@rasp2-dev:~/utils $ sudo mv spark-2.1.0-bin-hadoop2.7 /usr/local/
pi@rasp2-dev:~/utils $ cd /usr/local/
pi@rasp2-dev:/usr/local $ ls -al | grep spark
drwxr-xr-x 10 pi pi 4096 Mar  6  2015 spark-1.3.0-bin-hadoop2.4
drwxr-xr-x 12 pi pi 4096 Dec 16 11:18 spark-2.1.0-bin-hadoop2.7
pi@rasp2-dev:/usr/local $
I moved both versions over.
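One optional convenience with two versions sitting side by side: a symlink you can repoint, plus SPARK_HOME (the names here are just my own habit, nothing Spark requires):

# point a version-agnostic path at whichever Spark should be active
sudo ln -s /usr/local/spark-2.1.0-bin-hadoop2.7 /usr/local/spark
# and e.g. in ~/.bashrc:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin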
Now let's also try running it from the command line.
Spark supports a range of languages (Java, Python, Scala, and more),
but Java means writing and building a program first, and R (which Spark has actually shipped since the 1.4 line) wouldn't run for me because R isn't installed on the Pi.
Since I only wanted a quick smoke test, I went with just Python and Scala.
As for SQL, I've never used Hadoop, so I can't say anything about it yet.
By the way, there's a conf/log4j.properties.template file; renaming it to log4j.properties and tweaking it should tame the log flood.
Which is to say: I haven't actually tried it.
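If you do feel like trying it, it would presumably look like this (untested; assumes the stock template still has the log4j.rootCategory=INFO, console line):

cd /usr/local/spark-2.1.0-bin-hadoop2.7
cp conf/log4j.properties.template conf/log4j.properties
# turn the console chatter down from INFO to WARN
sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' conf/log4j.properties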
pi@rasp2-dev:/usr/local $ cd spark-2.1.0-bin-hadoop2.7/
pi@rasp2-dev:/usr/local/spark-2.1.0-bin-hadoop2.7 $ bin/pyspark
Python 2.7.9 (default, Sep 17 2016, 20:26:04)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/12 00:15:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/12 00:15:32 WARN Utils: Your hostname, rasp2-dev resolves to a loopback address: 127.0.1.1; using 192.168.0.25 instead (on interface eth0)
17/06/12 00:15:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/06/12 00:17:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/06/12 00:17:14 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/06/12 00:17:35 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 2.7.9 (default, Sep 17 2016 20:26:04)
SparkSession available as 'spark'.
>>>
pi@rasp2-dev:/usr/local $ cd spark-2.1.0-bin-hadoop2.7/
pi@rasp2-dev:/usr/local/spark-2.1.0-bin-hadoop2.7 $ bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/12 00:26:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/12 00:26:03 WARN Utils: Your hostname, rasp2-dev resolves to a loopback address: 127.0.1.1; using 192.168.0.25 instead (on interface eth0)
17/06/12 00:26:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/06/12 00:27:18 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.0.25:4040
Spark context available as 'sc' (master = local[*], app id = local-1497194769703).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_65)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
Unlike 1.3.0, version 2.1.0 is very slow on its initial startup.
Maybe because the bundled Hadoop version went up..?
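If you'd rather have a scripted test than a REPL, the binary package also ships Python examples you can run through spark-submit (assuming the standard 2.1.0 layout; I haven't timed this on the Pi):

cd /usr/local/spark-2.1.0-bin-hadoop2.7
# run the bundled Pi-estimation example on the local master with 10 partitions
bin/spark-submit --master 'local[*]' examples/src/main/python/pi.py 10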