Using a YARN Queue per Group for Hive and MR Jobs
0. YARN Queue
Create a queue for each LDAP user group.
Queue Mappings: g:abiz:abiz,g:adev:adev — the g:<group>:<queue> syntax routes jobs from users in group abiz to queue abiz, and from group adev to queue adev.
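In Ambari these settings land in the Capacity Scheduler configuration (capacity-scheduler.xml). A minimal sketch of the relevant properties, assuming the two group queues sit under root next to default (the capacity split below is illustrative, not taken from this cluster):

yarn.scheduler.capacity.root.queues=default,abiz,adev
yarn.scheduler.capacity.root.default.capacity=40
yarn.scheduler.capacity.root.abiz.capacity=30
yarn.scheduler.capacity.root.adev.capacity=30
yarn.scheduler.capacity.queue-mappings=g:abiz:abiz,g:adev:adev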
Restart YARN so the settings take effect.
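Once YARN is back up, a quick sanity check that the queues exist (using the stock yarn client CLI):

[root@node01 ~]# yarn queue -status abiz
[root@node01 ~]# yarn queue -status adev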
1. Hive
Installation
Installed via Ambari.
Run Hive and create some test data
sudo su - hive
[hive@node01 ~]$ hive
hive> create table table1(a int, b int);
hive> insert into table1 values(1,2);
hive> insert into table1 values(1,3);
hive> insert into table1 values(2,4);
Run Hive with an LDAP account
[hive@node01 ~]$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline> !connect jdbc:hive2://node02:10000/default john
Enter password for jdbc:hive2://node02:10000/default: **** (hive)
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node02:10000/default> select sum(a) from table1;
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select sum(a) from table1(Stage-1)
INFO  :
INFO  : Status: Running (Executing on YARN cluster with App id application_1484727707431_0001)
INFO  : Map 1: -/-  Reducer 2: 0/1
INFO  : Map 1: 0/1  Reducer 2: 0/1
INFO  : Map 1: 0/1  Reducer 2: 0/1
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
INFO  : Map 1: 0/1  Reducer 2: 0/1
INFO  : Map 1: 1/1  Reducer 2: 0(+1)/1
INFO  : Map 1: 1/1  Reducer 2: 1/1
+------+--+
| _c0  |
+------+--+
| 4    |
+------+--+
1 row selected (23.803 seconds)
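The query ran as john rather than hive because HiveServer2 impersonates the connected user when hive.server2.enable.doAs=true (which this per-user queue placement relies on), so the Tez application should land in john's group queue via the mapping, assuming no explicit tez.queue.name overrides it. A quick check using the application id from the log above:

# The Queue field of the application report should show john's group queue, not default
yarn application -status application_1484727707431_0001 | grep -i 'queue'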
2. MR Job (OS account)
Run WordCount
Download the WordCount example into the Hadoop home directory (/usr/hdp/2.5.3.0-37/hadoop) and unzip it.
[root@node01 hadoop]# wget http://salsahpc.indiana.edu/tutorial/source_code/Hadoop-WordCount.zip
[root@node01 hadoop]# unzip Hadoop-WordCount.zip
Archive:  Hadoop-WordCount.zip
   creating: Hadoop-WordCount/
   creating: Hadoop-WordCount/classes/
   creating: Hadoop-WordCount/input/
  inflating: Hadoop-WordCount/input/Word_Count_input.txt
  inflating: Hadoop-WordCount/WordCount.java
  inflating: Hadoop-WordCount/clean.sh
  inflating: Hadoop-WordCount/build.sh
  inflating: Hadoop-WordCount/classes/WordCount$Reduce.class
  inflating: Hadoop-WordCount/classes/WordCount.class
  inflating: Hadoop-WordCount/classes/WordCount$Map.class
  inflating: Hadoop-WordCount/wordcount.jar
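The archive ships a prebuilt wordcount.jar, so the build step is optional; if you want to rebuild from WordCount.java, the bundled build.sh boils down to something like this (a sketch using the HDP client classpath, not the script's verbatim contents):

cd Hadoop-WordCount
# compile against the cluster's Hadoop libraries, then repackage the classes
javac -classpath "$(hadoop classpath)" -d classes WordCount.java
jar -cf wordcount.jar -C classes .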
Run as jane (a member of adev). First create jane's HDFS home directory as hdfs, then upload the input as jane:
[root@node01 Hadoop-WordCount]# su - hdfs
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -mkdir /user/jane
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -chown jane:adev /user/jane
[hdfs@node01 Hadoop-WordCount]$ exit
[root@node01 Hadoop-WordCount]# su jane
[jane@node01 Hadoop-WordCount]$ hadoop fs -put input/ /user/jane/input
Run the WordCount jar
[jane@node01 Hadoop-WordCount]$ hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 02:28:04 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 02:28:04 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 02:28:04 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 02:28:05 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484790795688_0002
17/01/19 02:28:05 INFO impl.YarnClientImpl: Submitted application application_1484790795688_0002
17/01/19 02:28:05 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484790795688_0002/
17/01/19 02:28:05 INFO mapreduce.Job: Running job: job_1484790795688_0002
17/01/19 02:28:16 INFO mapreduce.Job: Job job_1484790795688_0002 running in uber mode : false
17/01/19 02:28:16 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 02:28:29 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 02:28:35 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 02:28:36 INFO mapreduce.Job: Job job_1484790795688_0002 completed successfully
17/01/19 02:28:36 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=167524
		FILE: Number of bytes written=616439
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=384328
		HDFS: Number of bytes written=120766
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=10159
		Total time spent by all reduces in occupied slots (ms)=8196
		Total time spent by all map tasks (ms)=10159
		Total time spent by all reduce tasks (ms)=4098
		Total vcore-milliseconds taken by all map tasks=10159
		Total vcore-milliseconds taken by all reduce tasks=4098
		Total megabyte-milliseconds taken by all map tasks=10402816
		Total megabyte-milliseconds taken by all reduce tasks=8392704
	Map-Reduce Framework
		Map input records=9488
		Map output records=67825
		Map output bytes=643386
		Map output materialized bytes=167524
		Input split bytes=121
		Combine input records=67825
		Combine output records=11900
		Reduce input groups=11900
		Reduce shuffle bytes=167524
		Reduce input records=11900
		Reduce output records=11900
		Spilled Records=23800
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=144
		CPU time spent (ms)=2950
		Physical memory (bytes) snapshot=1022894080
		Virtual memory (bytes) snapshot=6457335808
		Total committed heap usage (bytes)=858783744
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=384207
	File Output Format Counters
		Bytes Written=120766
WordCount results
[jane@node01 Hadoop-WordCount]$ hadoop fs -ls /user/jane/
Found 3 items
drwx------   - jane adev          0 2017-01-19 02:28 /user/jane/.staging
drwxr-xr-x   - jane adev          0 2017-01-19 02:17 /user/jane/input
drwxr-xr-x   - jane adev          0 2017-01-19 02:28 /user/jane/output
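Since jane belongs to adev, the queue-mapping rule should have scheduled the job into the adev queue. Two quick checks, reusing the application id from the log above (the part-* glob covers either MR output file naming convention):

[jane@node01 Hadoop-WordCount]$ yarn application -status application_1484790795688_0002 | grep -i 'queue'
[jane@node01 Hadoop-WordCount]$ hadoop fs -cat /user/jane/output/part-* | head -5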
3. MR Job (HADOOP_USER_NAME parameter)
Run WordCount
Create a directory for the user (lucy) on HDFS, then upload the input — sketched below.
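A sketch of that preparation, mirroring the jane steps above (the chown to lucy:adev assumes lucy is in the adev group, which the listing further down confirms):

[root@node01 Hadoop-WordCount]# su - hdfs
[hdfs@node01 ~]$ hadoop fs -mkdir /user/lucy
[hdfs@node01 ~]$ hadoop fs -chown lucy:adev /user/lucy
[hdfs@node01 ~]$ exit
[root@node01 Hadoop-WordCount]# HADOOP_USER_NAME=lucy hadoop fs -put input/ /user/lucy/input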
Run the WordCount jar (HADOOP_USER_NAME=lucy)
[root@node01 Hadoop-WordCount]# HADOOP_USER_NAME=lucy hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 04:58:54 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 04:58:54 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 04:58:54 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 04:58:55 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 04:58:55 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 04:58:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484800564385_0001
17/01/19 04:58:56 INFO impl.YarnClientImpl: Submitted application application_1484800564385_0001
17/01/19 04:58:56 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484800564385_0001/
17/01/19 04:58:56 INFO mapreduce.Job: Running job: job_1484800564385_0001
17/01/19 04:59:05 INFO mapreduce.Job: Job job_1484800564385_0001 running in uber mode : false
17/01/19 04:59:05 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 04:59:12 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 04:59:19 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 04:59:19 INFO mapreduce.Job: Job job_1484800564385_0001 completed successfully
17/01/19 04:59:19 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=167524
		FILE: Number of bytes written=616439
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=384328
		HDFS: Number of bytes written=120766
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=5114
		Total time spent by all reduces in occupied slots (ms)=7530
		Total time spent by all map tasks (ms)=5114
		Total time spent by all reduce tasks (ms)=3765
		Total vcore-milliseconds taken by all map tasks=5114
		Total vcore-milliseconds taken by all reduce tasks=3765
		Total megabyte-milliseconds taken by all map tasks=5236736
		Total megabyte-milliseconds taken by all reduce tasks=7710720
	Map-Reduce Framework
		Map input records=9488
		Map output records=67825
		Map output bytes=643386
		Map output materialized bytes=167524
		Input split bytes=121
		Combine input records=67825
		Combine output records=11900
		Reduce input groups=11900
		Reduce shuffle bytes=167524
		Reduce input records=11900
		Reduce output records=11900
		Spilled Records=23800
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=148
		CPU time spent (ms)=3220
		Physical memory (bytes) snapshot=1033814016
		Virtual memory (bytes) snapshot=6464356352
		Total committed heap usage (bytes)=833617920
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=384207
	File Output Format Counters
		Bytes Written=120766
WordCount results
[lucy@node01 Hadoop-WordCount]$ hadoop fs -ls /user/lucy
Found 3 items
drwx------   - lucy adev          0 2017-01-19 04:59 /user/lucy/.staging
drwxr-xr-x   - lucy adev          0 2017-01-19 04:48 /user/lucy/input
drwxr-xr-x   - lucy adev          0 2017-01-19 04:59 /user/lucy/output
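Note what made this work: under simple (non-Kerberos) authentication, the HADOOP_USER_NAME environment variable lets any client claim any identity, and queue placement then follows the claimed user's group — convenient here, but also exactly the kind of gap this series is about, since on a Kerberized cluster the identity would come from the ticket instead. The same queue check as before:

# Submitted by root, but attributed to lucy and mapped to lucy's group queue (adev)
yarn application -status application_1484800564385_0001 | grep -i 'queue'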