IT로 세상을 이롭게 (Benefiting the world through IT)

0. YARN Queue

Create a queue for each LDAP user group

Queue Mappings : g:abiz:abiz,g:adev:adev

Restart so that the setting takes effect
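The mapping string is a comma-separated list of entries of the form g:<group>:<queue> (u:<user>:<queue> also exists for per-user mappings). In the Capacity Scheduler this value corresponds to the yarn.scheduler.capacity.queue-mappings property, and entries are evaluated left to right with the first valid match winning. The sketch below only illustrates that lookup rule in plain shell; queue_for_group is a hypothetical helper, not part of YARN.

```shell
# Hypothetical helper, for illustration only: look up the queue that a
# mapping string such as "g:abiz:abiz,g:adev:adev" assigns to a group.
queue_for_group() {
  # $1 = mapping string, $2 = group name
  # Split on commas, then on colons, and print the queue of the first "g"
  # entry whose group matches (left-to-right, first match wins).
  printf '%s\n' "$1" | tr ',' '\n' |
    awk -F: -v g="$2" '$1 == "g" && $2 == g { print $3; exit }'
}

queue_for_group "g:abiz:abiz,g:adev:adev" adev   # prints: adev
```

A group with no entry prints nothing here; what the scheduler does for an unmapped user depends on the rest of the queue configuration, which this post does not show.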

1. Hive

Installation

Install through Ambari

Run Hive and create data

sudo su - hive
[hive@node01 ~]$ hive
hive> create table table1(a int, b int);
hive> insert into table1 values( 1,2);
hive> insert into table1 values( 1,3);
hive> insert into table1 values( 2,4);

Run Hive with an LDAP account

[hive@node01 ~]$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline> !connect jdbc:hive2://node02:10000/default john
Enter password for jdbc:hive2://node02:10000/default: **** (hive)
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node02:10000/default> select sum(a) from table1;
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select sum(a) from table1(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1484727707431_0001)
INFO : Map 1: -/- Reducer 2: 0/1 
INFO : Map 1: 0/1 Reducer 2: 0/1 
INFO : Map 1: 0/1 Reducer 2: 0/1 
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1 
INFO : Map 1: 0/1 Reducer 2: 0/1 
INFO : Map 1: 1/1 Reducer 2: 0(+1)/1 
INFO : Map 1: 1/1 Reducer 2: 1/1 
+------+--+
| _c0  |
+------+--+
| 4    |
+------+--+
1 row selected (23.803 seconds)
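The same query can also be run without the interactive prompt. The following is a minimal sketch, assuming the HiveServer2 address used in the session above; hive_query_as and the HIVE_PASSWORD variable are hypothetical names introduced for illustration. It relies only on the standard beeline options -u (JDBC URL), -n (user name), -p (password) and -e (query).

```shell
# Hypothetical helper: run one query through beeline as a given LDAP user.
# The JDBC URL is the one used in the interactive session above.
hive_query_as() {
  # $1 = LDAP user, $2 = query text
  # HIVE_PASSWORD is an assumed variable name; the function aborts with a
  # message if it is not set.
  beeline -u "jdbc:hive2://node02:10000/default" \
    -n "$1" -p "${HIVE_PASSWORD:?set HIVE_PASSWORD first}" \
    -e "$2"
}

# Usage (not executed here):
#   HIVE_PASSWORD=... hive_query_as john "select sum(a) from table1"
```

Passing the password with -p makes it visible in the process list, so this form is better suited to test clusters like the one in this post.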

2. MR Job (OS account)

Run WordCount

Download the WordCount example into the Hadoop home (/usr/hdp/2.5.3.0-37/hadoop) and unzip it

[root@node01 hadoop]# wget http://salsahpc.indiana.edu/tutorial/source_code/Hadoop-WordCount.zip
[root@node01 hadoop]# unzip Hadoop-WordCount.zip
Archive:  Hadoop-WordCount.zip
   creating: Hadoop-WordCount/
   creating: Hadoop-WordCount/classes/
   creating: Hadoop-WordCount/input/
  inflating: Hadoop-WordCount/input/Word_Count_input.txt  
  inflating: Hadoop-WordCount/WordCount.java  
  inflating: Hadoop-WordCount/clean.sh  
  inflating: Hadoop-WordCount/build.sh  
  inflating: Hadoop-WordCount/classes/WordCount$Reduce.class  
  inflating: Hadoop-WordCount/classes/WordCount.class  
  inflating: Hadoop-WordCount/classes/WordCount$Map.class  
  inflating: Hadoop-WordCount/wordcount.jar

Run as jane, a member of the adev group

[root@node01 Hadoop-WordCount]# su - hdfs
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -mkdir /user/jane
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -chown jane:adev /user/jane
[hdfs@node01 Hadoop-WordCount]$ exit
[root@node01 Hadoop-WordCount]# su jane
[jane@node01 Hadoop-WordCount]$ hadoop fs -put input/ /user/jane/input

Run the WordCount jar

[jane@node01 Hadoop-WordCount]$ hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 02:28:04 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 02:28:04 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 02:28:04 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 02:28:05 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484790795688_0002
17/01/19 02:28:05 INFO impl.YarnClientImpl: Submitted application application_1484790795688_0002
17/01/19 02:28:05 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484790795688_0002/
17/01/19 02:28:05 INFO mapreduce.Job: Running job: job_1484790795688_0002
17/01/19 02:28:16 INFO mapreduce.Job: Job job_1484790795688_0002 running in uber mode : false
17/01/19 02:28:16 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 02:28:29 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 02:28:35 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 02:28:36 INFO mapreduce.Job: Job job_1484790795688_0002 completed successfully
17/01/19 02:28:36 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=167524
        FILE: Number of bytes written=616439
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=384328
        HDFS: Number of bytes written=120766
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=10159
        Total time spent by all reduces in occupied slots (ms)=8196
        Total time spent by all map tasks (ms)=10159
        Total time spent by all reduce tasks (ms)=4098
        Total vcore-milliseconds taken by all map tasks=10159
        Total vcore-milliseconds taken by all reduce tasks=4098
        Total megabyte-milliseconds taken by all map tasks=10402816
        Total megabyte-milliseconds taken by all reduce tasks=8392704
    Map-Reduce Framework
        Map input records=9488
        Map output records=67825
        Map output bytes=643386
        Map output materialized bytes=167524
        Input split bytes=121
        Combine input records=67825
        Combine output records=11900
        Reduce input groups=11900
        Reduce shuffle bytes=167524
        Reduce input records=11900
        Reduce output records=11900
        Spilled Records=23800
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=144
        CPU time spent (ms)=2950
        Physical memory (bytes) snapshot=1022894080
        Virtual memory (bytes) snapshot=6457335808
        Total committed heap usage (bytes)=858783744
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=384207
    File Output Format Counters 
        Bytes Written=120766

WordCount result

[jane@node01 Hadoop-WordCount]$ hadoop fs -ls /user/jane/
Found 3 items
drwx------   - jane adev          0 2017-01-19 02:28 /user/jane/.staging
drwxr-xr-x   - jane adev          0 2017-01-19 02:17 /user/jane/input
drwxr-xr-x   - jane adev          0 2017-01-19 02:28 /user/jane/output
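The listing confirms that the job ran as jane, but it does not show which YARN queue the job was placed in. One way to check is the application report from yarn application -status. The sketch below assumes the report contains a line of the form "Queue : <name>", which may vary between Hadoop versions; app_queue is a hypothetical helper.

```shell
# Hypothetical helper: print the queue an application was placed in.
app_queue() {
  # $1 = application id, e.g. application_1484790795688_0002
  # Assumes the report has a line like "Queue : adev".
  yarn application -status "$1" 2>/dev/null |
    awk -F' : ' '/^[[:space:]]*Queue[[:space:]]*:/ { print $2; exit }'
}

# Usage (not executed here):
#   app_queue application_1484790795688_0002
```

If the group mapping from section 0 is in effect, the queue reported for a job submitted by a member of adev should be adev.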

3. MR Job (HADOOP_USER_NAME parameter)

Run WordCount

Create a directory for the user in HDFS, then upload the input
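The post does not show the commands for this step. A minimal sketch, assuming the same steps as for jane in section 2 and that lucy belongs to the adev group (as the listing of /user/lucy suggests). prepare_hdfs_home is a hypothetical helper, and uploading with HADOOP_USER_NAME set is an assumption rather than something the post confirms.

```shell
# Hypothetical helper: create an HDFS home directory for a user and upload
# the WordCount input. Assumes it is run as root from the Hadoop-WordCount
# directory, as in the transcripts above.
prepare_hdfs_home() {
  # $1 = user, $2 = group
  # Create and chown the directory as the hdfs superuser, then upload the
  # input as the target user so the files are owned by that user.
  su - hdfs -c "hadoop fs -mkdir /user/$1 && hadoop fs -chown $1:$2 /user/$1" &&
    HADOOP_USER_NAME="$1" hadoop fs -put input/ "/user/$1/input"
}

# Usage (not executed here):
#   prepare_hdfs_home lucy adev
```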

Run the WordCount jar (HADOOP_USER_NAME=lucy)

[root@node01 Hadoop-WordCount]# HADOOP_USER_NAME=lucy hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 04:58:54 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 04:58:54 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 04:58:54 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 04:58:55 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 04:58:55 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 04:58:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484800564385_0001
17/01/19 04:58:56 INFO impl.YarnClientImpl: Submitted application application_1484800564385_0001
17/01/19 04:58:56 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484800564385_0001/
17/01/19 04:58:56 INFO mapreduce.Job: Running job: job_1484800564385_0001
17/01/19 04:59:05 INFO mapreduce.Job: Job job_1484800564385_0001 running in uber mode : false
17/01/19 04:59:05 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 04:59:12 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 04:59:19 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 04:59:19 INFO mapreduce.Job: Job job_1484800564385_0001 completed successfully
17/01/19 04:59:19 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=167524
        FILE: Number of bytes written=616439
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=384328
        HDFS: Number of bytes written=120766
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5114
        Total time spent by all reduces in occupied slots (ms)=7530
        Total time spent by all map tasks (ms)=5114
        Total time spent by all reduce tasks (ms)=3765
        Total vcore-milliseconds taken by all map tasks=5114
        Total vcore-milliseconds taken by all reduce tasks=3765
        Total megabyte-milliseconds taken by all map tasks=5236736
        Total megabyte-milliseconds taken by all reduce tasks=7710720
    Map-Reduce Framework
        Map input records=9488
        Map output records=67825
        Map output bytes=643386
        Map output materialized bytes=167524
        Input split bytes=121
        Combine input records=67825
        Combine output records=11900
        Reduce input groups=11900
        Reduce shuffle bytes=167524
        Reduce input records=11900
        Reduce output records=11900
        Spilled Records=23800
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=148
        CPU time spent (ms)=3220
        Physical memory (bytes) snapshot=1033814016
        Virtual memory (bytes) snapshot=6464356352
        Total committed heap usage (bytes)=833617920
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=384207
    File Output Format Counters 
        Bytes Written=120766

WordCount result

[lucy@node01 Hadoop-WordCount]$ hadoop fs -ls /user/lucy
Found 3 items
drwx------   - lucy adev          0 2017-01-19 04:59 /user/lucy/.staging
drwxr-xr-x   - lucy adev          0 2017-01-19 04:48 /user/lucy/input
drwxr-xr-x   - lucy adev          0 2017-01-19 04:59 /user/lucy/output
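HADOOP_USER_NAME lets any OS user claim an arbitrary Hadoop identity, and it is only honored when the cluster uses simple authentication (hadoop.security.authentication=simple). On a Kerberos-secured cluster it is ignored. To keep the variable from leaking into later commands in the same shell, it can be scoped to a single command. The wrapper below is a hypothetical convenience, not something used in the post.

```shell
# Hypothetical wrapper: run one command with HADOOP_USER_NAME set, without
# exporting the variable into the rest of the shell session.
as_hadoop_user() {
  user="$1"
  shift
  HADOOP_USER_NAME="$user" "$@"
}

# Example with a harmless command standing in for "hadoop jar ...":
as_hadoop_user lucy sh -c 'echo "submitting as $HADOOP_USER_NAME"'
# prints: submitting as lucy
```

With this in place, the job above could be submitted as as_hadoop_user lucy hadoop jar ... WordCount input output.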


<Overview>

<Details>

A. Spark and YARN configuration values, with reference to the figure above

B. A walk-through with an example

<Miscellaneous>

<References>


<Overview>

- Running a Transformation or Visual Query in Apache Kylo fails when it accesses a Hive ACID table.

<Details>

- A Hive ACID table is stored in ORC format as base files plus delta files.

(See http://icthuman.tistory.com/entry/Apache-hive-transaction)

- Spark tries to read the files, but initially no base file exists.

(Inserting into a newly created table produces only delta files.)

<Solution>

- Running a major compaction manually creates the base file, after which Spark can process the table normally.
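A major compaction is requested with the HiveQL statement ALTER TABLE ... COMPACT 'major'. The sketch below only prints that statement so it can be passed to beeline or the hive CLI; major_compaction_sql and the table name are hypothetical.

```shell
# Hypothetical helper: print the HiveQL that requests a major compaction.
major_compaction_sql() {
  # $1 = table name
  printf "ALTER TABLE %s COMPACT 'major';\n" "$1"
}

major_compaction_sql my_acid_table   # table name is a placeholder
# prints: ALTER TABLE my_acid_table COMPACT 'major';
```

The statement queues the compaction rather than waiting for it to finish, so the base file may appear only after a delay. SHOW COMPACTIONS; reports the state of queued and running compactions.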

<References>

https://issues.apache.org/jira/browse/HIVE-15189

https://issues.apache.org/jira/browse/SPARK-16996


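<Overview>

Apache Hive is an open-source project that lets you process file data stored in HDFS with SQL. (It does not support all of SQL, and because of the nature of the file system, UPDATE and DELETE are not recommended.)

However, demand for transaction processing in the data warehouse kept growing, so Hive gained features to support transactions.

This post briefly looks at the internal structure. (A translation of the original text plus my own understanding and experience.)

<ACID>

Atomicity

Consistency

Isolation

Durability

<Hive-ACID>

1. Limitations

- BEGIN, COMMIT, and ROLLBACK are not supported. Every operation is auto-committed.

- Only the ORC format is supported.

- The table must be bucketed. External tables cannot be made ACID tables, because the compactor cannot control them.

- A non-ACID session cannot read from or write to an ACID table.

- The dirty read, read committed, repeatable read, and serializable isolation levels are not supported.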
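As an illustration of these limitations, the sketch below prints DDL for a table that meets them: a managed (non-external) table, bucketed, stored as ORC, with the transactional table property set. acid_table_ddl, the table name and the bucket count are hypothetical; the columns are borrowed from table1 in the multi-tenant post.

```shell
# Hypothetical helper: print DDL for a table that satisfies the limitations
# above (managed table, bucketed, ORC storage, transactional property set).
acid_table_ddl() {
  # $1 = table name, $2 = number of buckets
  cat <<EOF
CREATE TABLE $1 (a INT, b INT)
CLUSTERED BY (a) INTO $2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
EOF
}

acid_table_ddl table1_acid 2
```

The session that uses such a table also needs the transaction settings enabled (for example hive.support.concurrency and hive.txn.manager). The exact values depend on the Hive version, so treat this as a starting point rather than a complete setup.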

2. Basic design

<Compactor>

3. Configuration

4. Table Properties


Integration with Oozie

1. Install Oozie with Ambari, then configure it as follows

- custom oozie-site

- Advanced oozie-site

2. Write the Oozie XML and upload it to the designated location

(hdfs://node01:8020/user/john/oozie-sample.xml)

3. Set the properties used by the XML

(/home/root/apps/config/example.properties)

4. Run Oozie

5. References


Spark impersonation using the open-source Livy project

1. Hadoop core-site.xml

2. Livy configuration

3. Rest API - Test



1. Install Flume

2. Install Kafka

3. Install Elasticsearch
