
Hadoop Security for Multi-tenant #2


Using a YARN Queue per Group for Hive and MR Jobs


0. YARN Queue

Create a queue for each LDAP user group


Queue Mappings: g:abiz:abiz,g:adev:adev

Restart YARN so that the settings take effect (or refresh the queues, as sketched below).
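
What this looks like underneath Ambari, as a minimal sketch: the mapping above is the capacity-scheduler queue-mappings property, where g:<group>:<queue> routes every member of an LDAP group to the named queue. Only the mapping line is taken from this post; the queue list and capacities below are illustrative assumptions.

# Assumed capacity-scheduler entries (illustrative, except the last line):
#   yarn.scheduler.capacity.root.queues=default,abiz,adev
#   yarn.scheduler.capacity.root.abiz.capacity=30
#   yarn.scheduler.capacity.root.adev.capacity=30
#   yarn.scheduler.capacity.root.default.capacity=40
#   yarn.scheduler.capacity.queue-mappings=g:abiz:abiz,g:adev:adev
# A full restart is not always required; the scheduler config can usually
# be reloaded in place with:
yarn rmadmin -refreshQueues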

1. Hive 

Installation

Install Hive through Ambari.

Run Hive and create some data

sudo su - hive
[hive@node01 ~]$ hive
hive> create table table1(a int, b int);
hive> insert into table1 values( 1,2);
hive> insert into table1 values( 1,3);
hive> insert into table1 values( 2,4);
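
A quick sanity check that the three rows landed, as a sketch (the Hive CLI prints OK followed by tab-separated rows):

hive> select * from table1;
OK
1	2
1	3
2	4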

 

Run Hive with an LDAP account


[hive@node01 ~]$ beeline
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
beeline> !connect jdbc:hive2://node02:10000/default john
Enter password for jdbc:hive2://node02:10000/default: **** (hive)
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node02:10000/default> select sum(a) from table1;
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: select sum(a) from table1(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1484727707431_0001)
INFO : Map 1: -/- Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 1/1 Reducer 2: 0(+1)/1
INFO : Map 1: 1/1 Reducer 2: 1/1
+------+--+
| _c0  |
+------+--+
| 4    |
+------+--+
1 row selected (23.803 seconds)
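
The point of the mapping is that this query should have run in john's group queue rather than in default. This assumes HiveServer2 is configured with hive.server2.enable.doAs=true, so the Tez session is submitted as john (assumed here to be a member of the abiz group) instead of as hive. A hypothetical check against the ResourceManager, using the application id from the log above:

yarn application -status application_1484727707431_0001 | grep -i queue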



2. MR Job (OS account)

Run WordCount

Download the WordCount example into the Hadoop home (/usr/hdp/2.5.3.0-37/hadoop) and unzip it

[root@node01 hadoop]# unzip Hadoop-WordCount.zip
Archive:  Hadoop-WordCount.zip
   creating: Hadoop-WordCount/
   creating: Hadoop-WordCount/classes/
   creating: Hadoop-WordCount/input/
  inflating: Hadoop-WordCount/input/Word_Count_input.txt 
  inflating: Hadoop-WordCount/WordCount.java 
  inflating: Hadoop-WordCount/clean.sh 
  inflating: Hadoop-WordCount/build.sh 
  inflating: Hadoop-WordCount/classes/WordCount$Reduce.class 
  inflating: Hadoop-WordCount/classes/WordCount.class 
  inflating: Hadoop-WordCount/classes/WordCount$Map.class 
  inflating: Hadoop-WordCount/wordcount.jar

 

Run as jane, a member of the adev group

[root@node01 Hadoop-WordCount]# su - hdfs
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -mkdir /user/jane
[hdfs@node01 Hadoop-WordCount]$ hadoop fs -chown jane:adev /user/jane
[hdfs@node01 Hadoop-WordCount]$ exit
[root@node01 Hadoop-WordCount]# su jane
[jane@node01 Hadoop-WordCount]$ hadoop fs -put input/ /user/jane/input

 

Run the WordCount jar

[jane@node01 Hadoop-WordCount]$ hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 02:28:04 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 02:28:04 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 02:28:04 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 02:28:05 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 02:28:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484790795688_0002
17/01/19 02:28:05 INFO impl.YarnClientImpl: Submitted application application_1484790795688_0002
17/01/19 02:28:05 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484790795688_0002/
17/01/19 02:28:05 INFO mapreduce.Job: Running job: job_1484790795688_0002
17/01/19 02:28:16 INFO mapreduce.Job: Job job_1484790795688_0002 running in uber mode : false
17/01/19 02:28:16 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 02:28:29 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 02:28:35 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 02:28:36 INFO mapreduce.Job: Job job_1484790795688_0002 completed successfully
17/01/19 02:28:36 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=167524
        FILE: Number of bytes written=616439
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=384328
        HDFS: Number of bytes written=120766
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=10159
        Total time spent by all reduces in occupied slots (ms)=8196
        Total time spent by all map tasks (ms)=10159
        Total time spent by all reduce tasks (ms)=4098
        Total vcore-milliseconds taken by all map tasks=10159
        Total vcore-milliseconds taken by all reduce tasks=4098
        Total megabyte-milliseconds taken by all map tasks=10402816
        Total megabyte-milliseconds taken by all reduce tasks=8392704
    Map-Reduce Framework
        Map input records=9488
        Map output records=67825
        Map output bytes=643386
        Map output materialized bytes=167524
        Input split bytes=121
        Combine input records=67825
        Combine output records=11900
        Reduce input groups=11900
        Reduce shuffle bytes=167524
        Reduce input records=11900
        Reduce output records=11900
        Spilled Records=23800
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=144
        CPU time spent (ms)=2950
        Physical memory (bytes) snapshot=1022894080
        Virtual memory (bytes) snapshot=6457335808
        Total committed heap usage (bytes)=858783744
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=384207
    File Output Format Counters
        Bytes Written=120766

 

WordCount result

[jane@node01 Hadoop-WordCount]$ hadoop fs -ls /user/jane/
Found 3 items
drwx------   - jane adev          0 2017-01-19 02:28 /user/jane/.staging
drwxr-xr-x   - jane adev          0 2017-01-19 02:17 /user/jane/input
drwxr-xr-x   - jane adev          0 2017-01-19 02:28 /user/jane/output
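
To peek at the counted words and confirm the job was routed to jane's group queue, something like the following should work; part-r-00000 is the default single-reducer output name, and the application id comes from the log above:

[jane@node01 Hadoop-WordCount]$ hadoop fs -cat /user/jane/output/part-r-00000 | head -5
[jane@node01 Hadoop-WordCount]$ yarn application -status application_1484790795688_0002 | grep -i queue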

 


3. MR Job (HADOOP_USER_NAME parameter)

Run WordCount

Create a folder for the user on HDFS and upload the input, as sketched below.
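
A minimal sketch of these steps, assuming lucy mirrors the jane setup (same adev group, same input directory):

[root@node01 Hadoop-WordCount]# su - hdfs -c "hadoop fs -mkdir /user/lucy"
[root@node01 Hadoop-WordCount]# su - hdfs -c "hadoop fs -chown lucy:adev /user/lucy"
[root@node01 Hadoop-WordCount]# HADOOP_USER_NAME=lucy hadoop fs -put input/ /user/lucy/input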


 

Run the WordCount jar as root, impersonating lucy via HADOOP_USER_NAME (this override is honored only under simple authentication; a kerberized cluster would ignore it)


[root@node01 Hadoop-WordCount]# HADOOP_USER_NAME=lucy hadoop jar /usr/hdp/2.5.3.0-37/hadoop/Hadoop-WordCount/wordcount.jar WordCount input output
17/01/19 04:58:54 INFO impl.TimelineClientImpl: Timeline service address: http://node02:8188/ws/v1/timeline/
17/01/19 04:58:54 INFO client.RMProxy: Connecting to ResourceManager at node02/172.31.1.255:8050
17/01/19 04:58:54 INFO client.AHSProxy: Connecting to Application History server at node02/172.31.1.255:10200
17/01/19 04:58:55 INFO input.FileInputFormat: Total input paths to process : 1
17/01/19 04:58:55 INFO mapreduce.JobSubmitter: number of splits:1
17/01/19 04:58:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484800564385_0001
17/01/19 04:58:56 INFO impl.YarnClientImpl: Submitted application application_1484800564385_0001
17/01/19 04:58:56 INFO mapreduce.Job: The url to track the job: http://node02:8088/proxy/application_1484800564385_0001/
17/01/19 04:58:56 INFO mapreduce.Job: Running job: job_1484800564385_0001
17/01/19 04:59:05 INFO mapreduce.Job: Job job_1484800564385_0001 running in uber mode : false
17/01/19 04:59:05 INFO mapreduce.Job:  map 0% reduce 0%
17/01/19 04:59:12 INFO mapreduce.Job:  map 100% reduce 0%
17/01/19 04:59:19 INFO mapreduce.Job:  map 100% reduce 100%
17/01/19 04:59:19 INFO mapreduce.Job: Job job_1484800564385_0001 completed successfully
17/01/19 04:59:19 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=167524
        FILE: Number of bytes written=616439
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=384328
        HDFS: Number of bytes written=120766
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5114
        Total time spent by all reduces in occupied slots (ms)=7530
        Total time spent by all map tasks (ms)=5114
        Total time spent by all reduce tasks (ms)=3765
        Total vcore-milliseconds taken by all map tasks=5114
        Total vcore-milliseconds taken by all reduce tasks=3765
        Total megabyte-milliseconds taken by all map tasks=5236736
        Total megabyte-milliseconds taken by all reduce tasks=7710720
    Map-Reduce Framework
        Map input records=9488
        Map output records=67825
        Map output bytes=643386
        Map output materialized bytes=167524
        Input split bytes=121
        Combine input records=67825
        Combine output records=11900
        Reduce input groups=11900
        Reduce shuffle bytes=167524
        Reduce input records=11900
        Reduce output records=11900
        Spilled Records=23800
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=148
        CPU time spent (ms)=3220
        Physical memory (bytes) snapshot=1033814016
        Virtual memory (bytes) snapshot=6464356352
        Total committed heap usage (bytes)=833617920
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=384207
    File Output Format Counters
        Bytes Written=120766

 

WordCount result

[lucy@node01 Hadoop-WordCount]$ hadoop fs -ls /user/lucy
Found 3 items
drwx------   - lucy adev          0 2017-01-19 04:59 /user/lucy/.staging
drwxr-xr-x   - lucy adev          0 2017-01-19 04:48 /user/lucy/input
drwxr-xr-x   - lucy adev          0 2017-01-19 04:59 /user/lucy/output
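
As a final check, the impersonated job should show up as owned by lucy and placed in the adev queue. A hypothetical query against the ResourceManager, using the application id from the log above:

yarn application -status application_1484800564385_0001 | grep -iE 'user|queue'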