Raspberry Pi (Jessie) - Installing Apache Spark


by ZelKun · 2017-07-03 12:30


 

I ended up joining a Spark study group...

I could have just installed Spark on my laptop,

but I set it up on the Pi so I can connect to it any time.

Fortunately it runs on the JVM, so installing and running it was no real trouble.

For installing Java, see here.
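Before starting, it's worth confirming Java is actually present. A minimal check (this Pi reports Java 1.8.0_65, as the spark-shell banner further down confirms):

  • java -version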





Source: http://spark.apache.org

 

Go into the site's Download menu

and you'll find the latest release.

 

Since the study follows a book, I first matched the book's version (1.3.0).

Looking at the download link, it seemed like only the version strings needed to change, and when I tested it, the download worked.

The walkthrough below uses the latest release, 2.1.0.

In short, I downloaded both.
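For reference, the 1.3.0 download is the same mirror with the version and bundled-Hadoop strings swapped - a reconstruction based on the hadoop2.4 directory that shows up in /usr/local later, since I didn't keep the exact link:

  • wget http://d3kbcqa49mib13.cloudfront.net/spark-1.3.0-bin-hadoop2.4.tgz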

 

Let's install it on the Pi.

 

pi@rasp2-dev:~ $ cd utils/

pi@rasp2-dev:~/utils $ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz

--2017-04-11 01:29:19--  http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz

Resolving d3kbcqa49mib13.cloudfront.net (d3kbcqa49mib13.cloudfront.net)... 52.84.186.132, 52.84.186.207, 52.84.186.108, ...

Connecting to d3kbcqa49mib13.cloudfront.net (d3kbcqa49mib13.cloudfront.net)|52.84.186.132|:80... connected.

HTTP request sent, awaiting response... 200 OK

Length: 195636829 (187M) [application/x-tar]

Saving to: ‘spark-2.1.0-bin-hadoop2.7.tgz.1’

 

spark-2.1.0-bin-had 100%[=====================>] 186.57M  11.0MB/s   in 49s    

 

2017-04-11 01:30:08 (3.79 MB/s) - ‘spark-2.1.0-bin-hadoop2.7.tgz.1’ saved [195636829/195636829]

 

pi@rasp2-dev:~/utils $ 

As always, I did the work in the utils directory.

 

  • tar zxvf spark-2.1.0-bin-hadoop2.7.tgz 

pi@rasp2-dev:~/utils $ tar zxvf spark-2.1.0-bin-hadoop2.7.tgz 

spark-2.1.0-bin-hadoop2.7/

spark-2.1.0-bin-hadoop2.7/NOTICE

spark-2.1.0-bin-hadoop2.7/jars/

spark-2.1.0-bin-hadoop2.7/jars/bonecp-0.8.0.RELEASE.jar

spark-2.1.0-bin-hadoop2.7/jars/commons-net-2.2.jar

spark-2.1.0-bin-hadoop2.7/jars/javax.servlet-api-3.1.0.jar

spark-2.1.0-bin-hadoop2.7/jars/hadoop-annotations-2.7.3.jar

spark-2.1.0-bin-hadoop2.7/jars/hadoop-hdfs-2.7.3.jar

spark-2.1.0-bin-hadoop2.7/jars/oro-2.0.8.jar

spark-2.1.0-bin-hadoop2.7/jars/xercesImpl-2.9.1.jar

spark-2.1.0-bin-hadoop2.7/jars/antlr-runtime-3.4.jar

spark-2.1.0-bin-hadoop2.7/jars/parquet-jackson-1.8.1.jar

spark-2.1.0-bin-hadoop2.7/jars/spark-unsafe_2.11-2.1.0.jar

 

 . . .

 

spark-2.1.0-bin-hadoop2.7/bin/load-spark-env.cmd

spark-2.1.0-bin-hadoop2.7/yarn/

spark-2.1.0-bin-hadoop2.7/yarn/spark-2.1.0-yarn-shuffle.jar

spark-2.1.0-bin-hadoop2.7/README.md

pi@rasp2-dev:~/utils $ 

With the archive extracted,

let's try running a sample:

  • ./spark-2.1.0-bin-hadoop2.7/bin/run-example SparkPi 10

 

pi@rasp2-dev:~/utils $ ls -al

total 257068

drwxr-xr-x  5 pi pi      4096 Apr 11 15:57 .

drwxr-xr-x 35 pi pi      4096 Apr 10 18:26 ..

drwxr-xr-x 12 pi pi      4096 Dec 16 11:18 spark-2.1.0-bin-hadoop2.7

-rw-r--r--  1 pi pi 195636829 Dec 29 09:49 spark-2.1.0-bin-hadoop2.7.tgz

 

pi@rasp2-dev:~/utils $ ./spark-2.1.0-bin-hadoop2.7/bin/run-example SparkPi 10

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

17/04/11 16:11:38 INFO SparkContext: Running Spark version 2.1.0

17/04/11 16:11:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/04/11 16:11:42 WARN Utils: Your hostname, rasp2-dev resolves to a loopback address: 127.0.1.1; using 192.168.0.25 instead (on interface eth0)

17/04/11 16:11:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

17/04/11 16:11:42 INFO SecurityManager: Changing view acls to: pi

17/04/11 16:11:42 INFO SecurityManager: Changing modify acls to: pi

17/04/11 16:11:42 INFO SecurityManager: Changing view acls groups to: 

17/04/11 16:11:42 INFO SecurityManager: Changing modify acls groups to: 

17/04/11 16:11:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(pi); groups with view permissions: Set(); users  with modify permissions: Set(pi); groups with modify permissions: Set()

 

   . . .

 

OutputCommitCoordinator stopped!

17/04/11 16:12:02 INFO SparkContext: Successfully stopped SparkContext

17/04/11 16:12:02 INFO ShutdownHookManager: Shutdown hook called

17/04/11 16:12:02 INFO ShutdownHookManager: Deleting directory /tmp/spark-8f798ccc-9e48-43e8-a5fd-634b53628285

pi@rasp2-dev:~/utils $ 

That's quite a flood of log output.

 

pi@rasp2-dev:~/utils $ ls -al

total 257072

drwxr-xr-x  6 pi pi      4096 Apr 11 16:11 .

drwxr-xr-x 35 pi pi      4096 Apr 10 18:26 ..

drwxr-xr-x 12 pi pi      4096 Dec 16 11:18 spark-2.1.0-bin-hadoop2.7

-rw-r--r--  1 pi pi 195636829 Dec 29 09:49 spark-2.1.0-bin-hadoop2.7.tgz

drwxr-xr-x  2 pi pi      4096 Apr 11 16:11 spark-warehouse

 

pi@rasp2-dev:~/utils $ cd spark-warehouse/

pi@rasp2-dev:~/utils/spark-warehouse $ ls -al

total 8

drwxr-xr-x 2 pi pi 4096 Apr 11 16:11 .

drwxr-xr-x 6 pi pi 4096 Apr 11 16:11 ..

pi@rasp2-dev:~/utils/spark-warehouse $ 

 

Running the example

created a directory called spark-warehouse.

It's empty, and I wasn't sure what it's for at first - it turns out to be Spark SQL's default warehouse location, controlled by the spark.sql.warehouse.dir setting.
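If the stray directory bothers you, it can be pointed elsewhere. Untried here, but run-example passes options through to spark-submit, so something along these lines should work:

  • ./spark-2.1.0-bin-hadoop2.7/bin/run-example --conf spark.sql.warehouse.dir=/tmp/spark-warehouse SparkPi 10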

 

Next I'll move it out of the utils directory into /usr/local/.

That's purely personal preference, so feel free to skip it.

  • sudo mv spark-2.1.0-bin-hadoop2.7 /usr/local/

pi@rasp2-dev:~/utils $ sudo mv spark-2.1.0-bin-hadoop2.7 /usr/local/

pi@rasp2-dev:~/utils $ cd /usr/local/

pi@rasp2-dev:/usr/local $ ls -al | grep spark

drwxr-xr-x 10 pi   pi    4096 Mar  6  2015 spark-1.3.0-bin-hadoop2.4

drwxr-xr-x 12 pi   pi    4096 Dec 16 11:18 spark-2.1.0-bin-hadoop2.7

pi@rasp2-dev:/usr/local $ 

Both versions are set up there now.
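One optional convenience I skipped in this post: pointing your shell at one of the installs so you don't have to type the full path every time. A sketch for ~/.bashrc, using 2.1.0 as the default:

export SPARK_HOME=/usr/local/spark-2.1.0-bin-hadoop2.7

export PATH=$SPARK_HOME/bin:$PATH

After a source ~/.bashrc, pyspark and spark-shell run from any directory.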

 

Now let's try the interactive shells.

Spark seems to support quite a few languages - Java, Python, Scala, and more -

but the Java route needs a build step, and the R shell (added back in the 1.4 line, in fact) wouldn't run because R isn't installed on this Pi.

Since this is just a smoke test, I only went through Python and Scala.

As for Spark SQL, I haven't used Hadoop, so that's still unknown territory for me.

  • Note: Spark logs through log4j, so you can control the log level via conf/log4j.properties.

There's a conf/log4j.properties.template file, so renaming a copy of it to log4j.properties should do the trick (see the sketch after this note).

Which is to say, I haven't actually tried it.
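Untested on this box, but based on the template that ships with 2.1.0 it should come down to:

  • cp conf/log4j.properties.template conf/log4j.properties

and then changing the first setting in the copy, e.g. from log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console, to quiet the INFO flood seen above.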

 

  • Running the Python shell
  • spark-2.1.0-bin-hadoop2.7/bin/pyspark

pi@rasp2-dev:/usr/local $ cd spark-2.1.0-bin-hadoop2.7/

pi@rasp2-dev:/usr/local/spark-2.1.0-bin-hadoop2.7 $ bin/pyspark 

Python 2.7.9 (default, Sep 17 2016, 20:26:04) 

[GCC 4.9.2] on linux2

Type "help", "copyright", "credits" or "license" for more information.

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

17/06/12 00:15:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/06/12 00:15:32 WARN Utils: Your hostname, rasp2-dev resolves to a loopback address: 127.0.1.1; using 192.168.0.25 instead (on interface eth0)

17/06/12 00:15:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

17/06/12 00:17:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0

17/06/12 00:17:14 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

17/06/12 00:17:35 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0

      /_/

 

Using Python version 2.7.9 (default, Sep 17 2016 20:26:04)

SparkSession available as 'spark'.

>>> 
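The prompt is up. As a quick sanity check - my own addition, not part of the captured session - you can count a small RDD through the sc context the shell provides:

>>> sc.parallelize(range(100)).count()

100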

 

  • Running the Scala shell
  • spark-2.1.0-bin-hadoop2.7/bin/spark-shell

pi@rasp2-dev:/usr/local $ cd spark-2.1.0-bin-hadoop2.7/

pi@rasp2-dev:/usr/local/spark-2.1.0-bin-hadoop2.7 $ bin/spark-shell 

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

Setting default log level to "WARN".

To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

17/06/12 00:26:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/06/12 00:26:03 WARN Utils: Your hostname, rasp2-dev resolves to a loopback address: 127.0.1.1; using 192.168.0.25 instead (on interface eth0)

17/06/12 00:26:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

17/06/12 00:27:18 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

Spark context Web UI available at http://192.168.0.25:4040

Spark context available as 'sc' (master = local[*], app id = local-1497194769703).

Spark session available as 'spark'.

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0

      /_/

         

Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_65)

Type in expressions to have them evaluated.

Type :help for more information.

 

scala> 
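The same sanity check on the Scala side (again my own addition, not from the captured session):

scala> sc.parallelize(1 to 100).count()

res0: Long = 100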

Unlike 1.3.0, version 2.1.0 is very slow on initial startup.

Maybe it's because of the newer bundled Hadoop version..?

 

 

