Making the world better with IT

0. YARN Queue

Create one queue per LDAP user group

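Queue Mappings: g:abiz:abiz,g:adev:adev (each entry has the form g:<group>:<queue>, so members of the abiz group go to the abiz queue and members of adev go to the adev queue)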

Restart so that the settings take effect.
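The Queue Mappings value uses the Capacity Scheduler's `[u|g]:name:queue` format (to our knowledge stored in the `yarn.scheduler.capacity.queue-mappings` property), and entries are evaluated left to right with the first match winning. The Java sketch below models that routing for the two group mappings above. It is a simplified illustration, not the scheduler's own code: the class and method names are hypothetical, placeholders such as `%user` are ignored, and falling back to the queue named at submission ("default" here) when nothing matches is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of group/user-to-queue routing; not the CapacityScheduler implementation. */
final class QueueMappingSketch {

    /** One "type:source:queue" entry; type is 'u' (user) or 'g' (group). */
    static final class Mapping {
        final char type;
        final String source;
        final String queue;

        Mapping(char type, String source, String queue) {
            this.type = type;
            this.source = source;
            this.queue = queue;
        }
    }

    /** Parses a spec such as "g:abiz:abiz,g:adev:adev" into an ordered list of mappings. */
    static List<Mapping> parse(String spec) {
        List<Mapping> result = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 3 || !(parts[0].equals("u") || parts[0].equals("g"))) {
                throw new IllegalArgumentException("Bad mapping entry: " + entry);
            }
            result.add(new Mapping(parts[0].charAt(0), parts[1], parts[2]));
        }
        return result;
    }

    /** Returns the queue of the first matching entry, or the fallback queue if none matches. */
    static String resolve(List<Mapping> mappings, String user, Set<String> groups, String fallback) {
        for (Mapping m : mappings) {
            boolean matches = (m.type == 'u' && m.source.equals(user))
                    || (m.type == 'g' && groups.contains(m.source));
            if (matches) {
                return m.queue;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Mapping> mappings = parse("g:abiz:abiz,g:adev:adev");
        Set<String> janeGroups = new HashSet<>(Arrays.asList("adev"));
        System.out.println(resolve(mappings, "jane", janeGroups, "default")); // adev
        System.out.println(resolve(mappings, "guest", new HashSet<>(), "default")); // default
    }
}
```

Because evaluation stops at the first match, a user who belongs to both abiz and adev would land in abiz with this ordering.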

1. Hive

Installation

Installed with Ambari.

Start Hive and create sample data

sudo su - hive
[hive@node01 ~]$ hive
hive> create table table1(a int, b int);
hive> insert into table1 values( 1,2);
hive> insert into table1 values( 1,3);
hive> insert into table1 values( 2,4);

Run a Hive query with an LDAP account

[hive@node01 ~]$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline> !connect jdbc:hive2://node02:10000/default john
Enter password for jdbc:hive2://node02:10000/default: **** (hive)
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node02:10000/default> select sum(a) from table1;
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select sum(a) from table1(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1484727707431_0001)
INFO : Map 1: -/- Reducer 2: 0/1 
INFO : Map 1: 0/1 Reducer 2: 0/1 
INFO : Map 1: 0/1 Reducer 2: 0/1 
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1 
INFO : Map 1: 0/1 Reducer 2: 0/1 
INFO : Map 1: 1/1 Reducer 2: 0(+1)/1 
INFO : Map 1: 1/1 Reducer 2: 1/1 
+------+--+
| _c0  |
+------+--+
| 4    |
+------+--+
1 row selected (23.803 seconds)
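For reference, the same query can be issued from Java over JDBC instead of beeline. This is a sketch under assumptions: the Hive JDBC driver jar must be on the classpath (it is not part of the JDK), HiveServer2 is assumed to check the user name and password against LDAP (hive.server2.authentication=LDAP), and `HIVE_PASSWORD` is a hypothetical environment variable used so that john's password is not hard-coded. The class name is likewise hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical JDBC equivalent of the beeline session; needs the Hive JDBC driver at runtime. */
final class HiveJdbcSketch {

    /** Builds a HiveServer2 URL such as jdbc:hive2://node02:10000/default. */
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws SQLException {
        String url = hiveUrl("node02", 10000, "default");
        // Assumption: the LDAP password is supplied via a (hypothetical) HIVE_PASSWORD variable.
        String password = System.getenv("HIVE_PASSWORD");
        // The user name/password pair is sent to HiveServer2, which is assumed to validate it against LDAP.
        try (Connection conn = DriverManager.getConnection(url, "john", password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT SUM(a) FROM table1")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}
```

Against the three rows inserted earlier (a = 1, 1, 2) it should print 4, matching the beeline result.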

2. MR Job (OS account)

Run WordCount

Download the WordCount example into the Hadoop home directory (/usr/hdp/2.5.3.0-37/hadoop) and unzip it

[root@node01 hadoop]# wget http://salsahpc.indiana.edu/tutorial/source_code/Hadoop-WordCount.zip
[root@node01 hadoop]# unzip Hadoop-WordCount.zip
Archive:  Hadoop-WordCount.zip
   creating: Hadoop-WordCount/
   creating: Hadoop-WordCount/classes/
   creating: Hadoop-WordCount/input/
  inflating: Hadoop-WordCount/input/Word_Count_input.txt  
  inflating: Hadoop-WordCount/WordCount.java  
  inflating: Hadoop-WordCount/clean.sh  
  inflating: Hadoop-WordCount/build.sh  
  inflating: Hadoop-WordCount/classes/WordCount$Reduce.class  
  inflating: Hadoop-WordCount/classes/WordCount.class  
  inflating: Hadoop-WordCount/classes/WordCount$Map.class  
  inflating: Hadoop-WordCount/wordcount.jar
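The archive's WordCount.java is not reproduced in this post, so the following is only a stdlib-only, single-process model of the classic WordCount data flow: the map step emits (word, 1) per token, a combiner pre-sums inside each map task, and the reducer sums per word. Splitting on whitespace with `StringTokenizer` is an assumption borrowed from the classic Hadoop example, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical in-memory model of WordCount's map, combine and reduce steps. */
final class WordCountSketch {

    /** Map + combine for one input split: each token is one map output record, pre-summed per word. */
    static Map<String, Integer> mapAndCombine(List<String> lines) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                partial.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return partial;
    }

    /** Reduce: sum the combined counts from every split, grouped by word. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> splits) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> split : splits) {
            split.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> split1 = mapAndCombine(Arrays.asList("to be or", "not to be"));
        System.out.println(reduce(Arrays.asList(split1))); // {be=2, not=1, or=1, to=2}
    }
}
```

With a single input split (number of splits:1 in the logs), the combiner already emits one record per distinct word, which is consistent with Combine output records, Reduce input records and Reduce output records all reading 11900 in the job counters.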

Run as jane (a member of the adev group)

[root@node01 Hadoop-WordCount]# su - hdfs
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -mkdir /user/jane
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -chown jane:adev /user/jane
[hdfs@node01 Hadoop-WordCount]$ exit
[root@node01 Hadoop-WordCount]# su jane
[jane@node01 Hadoop-WordCount]$ hadoop fs -put input/ /user/jane/input
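The chown step matters because HDFS uses a POSIX-style owner/group/other model: the owner bits apply to the owner, the group bits to other members of the file's group, and the remaining bits to everyone else, while the HDFS superuser (the hdfs account here) bypasses the check. The sketch below is a simplified, hypothetical model of that rule for a single path; it ignores ACLs and the execute checks HDFS also performs on parent directories.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of an HDFS permission check on one path. */
final class HdfsPermissionSketch {
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    /** mode is the octal permission, e.g. 0755 for drwxr-xr-x or 0700 for drwx------. */
    static boolean canAccess(String user, Set<String> userGroups, String owner, String group,
                             int mode, int requested, String superUser) {
        if (user.equals(superUser)) {
            return true; // the superuser skips permission checks
        }
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;      // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;      // group class
        } else {
            bits = mode & 7;             // other class
        }
        return (bits & requested) == requested;
    }

    public static void main(String[] args) {
        Set<String> adev = new HashSet<>(Arrays.asList("adev"));
        // /user/jane after "chown jane:adev", mode drwxr-xr-x (0755 is a Java octal literal)
        System.out.println(canAccess("jane", adev, "jane", "adev", 0755, WRITE, "hdfs")); // true
        System.out.println(canAccess("lucy", adev, "jane", "adev", 0755, WRITE, "hdfs")); // false
        System.out.println(canAccess("lucy", adev, "jane", "adev", 0755, READ, "hdfs"));  // true
    }
}
```

The same rule explains the .staging directory (drwx------): only its owner can read or write it.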

Run the WordCount jar

[jane@node01 Hadoop-WordCount]$ hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 02:28:04 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 02:28:04 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 02:28:04 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 02:28:05 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484790795688_0002
17/01/19 02:28:05 INFO impl.YarnClientImpl: Submitted application application_1484790795688_0002
17/01/19 02:28:05 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484790795688_0002/
17/01/19 02:28:05 INFO mapreduce.Job: Running job: job_1484790795688_0002
17/01/19 02:28:16 INFO mapreduce.Job: Job job_1484790795688_0002 running in uber mode : false
17/01/19 02:28:16 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 02:28:29 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 02:28:35 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 02:28:36 INFO mapreduce.Job: Job job_1484790795688_0002 completed successfully
17/01/19 02:28:36 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=167524
        FILE: Number of bytes written=616439
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=384328
        HDFS: Number of bytes written=120766
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=10159
        Total time spent by all reduces in occupied slots (ms)=8196
        Total time spent by all map tasks (ms)=10159
        Total time spent by all reduce tasks (ms)=4098
        Total vcore-milliseconds taken by all map tasks=10159
        Total vcore-milliseconds taken by all reduce tasks=4098
        Total megabyte-milliseconds taken by all map tasks=10402816
        Total megabyte-milliseconds taken by all reduce tasks=8392704
    Map-Reduce Framework
        Map input records=9488
        Map output records=67825
        Map output bytes=643386
        Map output materialized bytes=167524
        Input split bytes=121
        Combine input records=67825
        Combine output records=11900
        Reduce input groups=11900
        Reduce shuffle bytes=167524
        Reduce input records=11900
        Reduce output records=11900
        Spilled Records=23800
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=144
        CPU time spent (ms)=2950
        Physical memory (bytes) snapshot=1022894080
        Virtual memory (bytes) snapshot=6457335808
        Total committed heap usage (bytes)=858783744
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=384207
    File Output Format Counters 
        Bytes Written=120766
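The job counters also reveal the container sizes. Assuming each "megabyte-milliseconds" counter is task time multiplied by container memory, and that "occupied slots" counts container memory in multiples of the scheduler's minimum allocation (yarn.scheduler.minimum-allocation-mb), dividing the counters gives the values in this worked sketch. The class and method names are hypothetical; only the numbers come from the log.

```java
/** Worked arithmetic on the WordCount job counters; hypothetical helper, numbers from the log. */
final class CounterArithmetic {

    /** Container size in MB = megabyte-milliseconds / task milliseconds. */
    static long containerMb(long megabyteMillis, long taskMillis) {
        if (megabyteMillis % taskMillis != 0) {
            throw new IllegalArgumentException("counters are not an exact multiple");
        }
        return megabyteMillis / taskMillis;
    }

    /** Slots per container = occupied-slot milliseconds / task milliseconds. */
    static long slotsPerContainer(long slotMillis, long taskMillis) {
        return slotMillis / taskMillis;
    }

    public static void main(String[] args) {
        // jane's run
        System.out.println(containerMb(10402816L, 10159L));    // 1024 (map container, MB)
        System.out.println(containerMb(8392704L, 4098L));      // 2048 (reduce container, MB)
        System.out.println(slotsPerContainer(8196L, 4098L));   // 2 (reduce container, in slots)
    }
}
```

A 2048 MB reducer counting as 2 slots while a 1024 MB mapper counts as 1 is consistent with a 1024 MB minimum allocation; the lucy run in section 3 gives the same ratios (5236736 / 5114 = 1024, 7710720 / 3765 = 2048).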

WordCount result

[jane@node01 Hadoop-WordCount]$ hadoop fs -ls /user/jane/
Found 3 items
drwx------   - jane adev          0 2017-01-19 02:28 /user/jane/.staging
drwxr-xr-x   - jane adev          0 2017-01-19 02:17 /user/jane/input
drwxr-xr-x   - jane adev          0 2017-01-19 02:28 /user/jane/output

3. MR Job (HADOOP_USER_NAME environment variable)

Run WordCount

Create a home directory for the user in HDFS, then upload the input

Run the WordCount jar (HADOOP_USER_NAME=lucy)

[root@node01 Hadoop-WordCount]# HADOOP_USER_NAME=lucy hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 04:58:54 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 04:58:54 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 04:58:54 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 04:58:55 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 04:58:55 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 04:58:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484800564385_0001
17/01/19 04:58:56 INFO impl.YarnClientImpl: Submitted application application_1484800564385_0001
17/01/19 04:58:56 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484800564385_0001/
17/01/19 04:58:56 INFO mapreduce.Job: Running job: job_1484800564385_0001
17/01/19 04:59:05 INFO mapreduce.Job: Job job_1484800564385_0001 running in uber mode : false
17/01/19 04:59:05 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 04:59:12 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 04:59:19 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 04:59:19 INFO mapreduce.Job: Job job_1484800564385_0001 completed successfully
17/01/19 04:59:19 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=167524
        FILE: Number of bytes written=616439
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=384328
        HDFS: Number of bytes written=120766
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5114
        Total time spent by all reduces in occupied slots (ms)=7530
        Total time spent by all map tasks (ms)=5114
        Total time spent by all reduce tasks (ms)=3765
        Total vcore-milliseconds taken by all map tasks=5114
        Total vcore-milliseconds taken by all reduce tasks=3765
        Total megabyte-milliseconds taken by all map tasks=5236736
        Total megabyte-milliseconds taken by all reduce tasks=7710720
    Map-Reduce Framework
        Map input records=9488
        Map output records=67825
        Map output bytes=643386
        Map output materialized bytes=167524
        Input split bytes=121
        Combine input records=67825
        Combine output records=11900
        Reduce input groups=11900
        Reduce shuffle bytes=167524
        Reduce input records=11900
        Reduce output records=11900
        Spilled Records=23800
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=148
        CPU time spent (ms)=3220
        Physical memory (bytes) snapshot=1033814016
        Virtual memory (bytes) snapshot=6464356352
        Total committed heap usage (bytes)=833617920
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=384207
    File Output Format Counters 
        Bytes Written=120766

WordCount result

[lucy@node01 Hadoop-WordCount]$ hadoop fs -ls /user/lucy
Found 3 items
drwx------   - lucy adev          0 2017-01-19 04:59 /user/lucy/.staging
drwxr-xr-x   - lucy adev          0 2017-01-19 04:48 /user/lucy/input
drwxr-xr-x   - lucy adev          0 2017-01-19 04:59 /user/lucy/output
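The job above was launched from a root shell, yet its output is owned by lucy. With hadoop.security.authentication=simple (no Kerberos), our understanding is that the Hadoop client takes its user name from the HADOOP_USER_NAME environment variable, then from a Java system property of the same name, and only then from the OS login user, and that no password or ticket is checked along this path. The sketch below models that order; it is a hypothetical helper, not Hadoop's UserGroupInformation code, and with Kerberos enabled the variable is, as far as we know, ignored.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical model of how a client picks its user name under simple authentication. */
final class SimpleAuthUserSketch {

    static String resolveUser(Map<String, String> env, Properties sysProps, String osUser) {
        String claimed = env.get("HADOOP_USER_NAME");               // 1. environment variable
        if (claimed == null) {
            claimed = sysProps.getProperty("HADOOP_USER_NAME");     // 2. Java system property
        }
        return claimed != null ? claimed : osUser;                  // 3. OS login user
    }

    public static void main(String[] args) {
        // Prints the name this JVM would claim, e.g. "lucy" when started with HADOOP_USER_NAME=lucy.
        System.out.println(resolveUser(System.getenv(), System.getProperties(),
                System.getProperty("user.name")));
    }
}
```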
