This post is, once again, about VMware and Hadoop.
Previous posts on vSphere Big Data Extensions (BDE):
Setting CPU affinity with PowerCLI (ESXi 5.x)
This time, the topic is the Serengeti CLI.
BDE ships with a CLI called serengeti.
You can launch it by logging in to the BDE management server (the management-server VM) over SSH
and running "serengeti".
As text, the startup looks like this:
[serengeti@192 ~]$ serengeti
=================================================
* _____ _ _ *
* / ____| ___ _ __ ___ _ __ __ _ ___| |_(_) *
* \____ \ / _ \ '__/ _ \ '_ \ / _` |/ _ \ __| | *
* ____) | __/ | | __/ | | | (_| | __/ |_| | *
* |_____/ \___|_| \___|_| |_|\__, |\___|\__|_| *
* |___/ *
* *
=================================================
Version: 2.0.0
Welcome to Serengeti CLI
serengeti>
Connecting to the service on the BDE management server (localhost:8443) lets you
inspect and operate the Hadoop clusters in the vSphere BDE environment.
Note: this login uses vCenter SSO authentication, the same as the Web Client.
serengeti>connect --host localhost:8443
Enter the username: vmad\administrator    <- log in with vCenter SSO credentials
Enter the password: **********
Connected
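If you want to skip the interactive prompts, the help output further down also lists a loggedConnect command that takes the username and password as options (and records the login in the CLI history). The flag names below are assumptions, so check `help loggedConnect` in your environment:

```
serengeti>loggedConnect --host localhost:8443 --username "vmad\administrator" --password '**********'
```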
For example, let's display the Hadoop cluster created in the previous posts:
serengeti>cluster list --name hdp_cluster01
============================================================================
CLUSTER NAME : hdp_cluster01
AGENT VERSION : 2.0.0
DISTRO : apache
TOPOLOGY : HOST_AS_RACK
AUTO ELASTIC : N/A
MIN COMPUTE NODES NUM : N/A
MAX COMPUTE NODES NUM : N/A
IO SHARES : NORMAL
STATUS : RUNNING
GROUP NAME ROLES INSTANCE CPU MEM(MB) TYPE SIZE(GB)
------------------------------------------------------------------------------------------------
DataMaster [hadoop_namenode] 1 1 3748 SHARED 10
ComputeMaster [hadoop_jobtracker] 1 1 3748 SHARED 10
Worker [hadoop_datanode, hadoop_tasktracker] 2 1 3748 SHARED 20
Client [hadoop_client, pig, hive, hive_server] 1 1 3748 SHARED 20
============================================================================
In an actual terminal, the output is displayed as above.
This is the same cluster that was visible in the BDE Web Client plug-in.
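Beyond listing, the cluster can be managed from the same prompt with the cluster subcommands shown in the help output below. As a sketch, stopping, starting, and scaling out the Worker group would look roughly like this (option names follow the BDE CLI Guide; the target node count of 3 is just an illustration, so verify with `help cluster resize`):

```
serengeti>cluster stop --name hdp_cluster01
serengeti>cluster start --name hdp_cluster01
serengeti>cluster resize --name hdp_cluster01 --nodeGroup Worker --instanceNum 3
```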
You can also list the resources registered with BDE, such as datastores
and port groups:
serengeti>datastore list
NAME TYPE REG EX
--------------------------------
defaultDSLocal LOCAL FALSE
ds_nfs_hadoop_01 SHARED FALSE
serengeti>network list
NAME PORTGROUP TYPE IP_RANGES DNS1 DNS2 GATEWAY MASK
--------------------------------------------------------------------------
defaultNetwork pg-vlan-0005 dhcp
bde-vlan-0005 dvpg-vlan-0005 dhcp
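New resources can be registered from the CLI as well, with `datastore add` and `network add`. A minimal sketch (the names and the wildcard spec here are made up for illustration; the option names follow the BDE CLI Guide, so double-check them in your environment):

```
serengeti>datastore add --name ds_nfs_hadoop_02 --spec "ds_nfs_*" --type SHARED
serengeti>network add --name bde-vlan-0006 --portGroup dvpg-vlan-0006 --dhcp
```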
You can also target a cluster and operate on it directly.
As a test, let's browse the HDFS space of the hdp_cluster01 cluster built with BDE:
serengeti>cluster target --name hdp_cluster01
serengeti>cfg info
Hadoop [1.2.1 rev.1503152][fs=hdfs://192.168.5.128:8020][jt=192.168.5.125:8021]
serengeti>fs ls /tmp
Found 3 items
drwxrwxrwx - hdfs hadoop 0 2014-09-17 23:14 /tmp/hadoop-mapred
drwxrwxrwx - hdfs hadoop 0 2014-09-17 23:13 /tmp/hadoop-yarn
drwxrwxrwx - root hadoop 0 2014-09-21 23:38 /tmp/test
serengeti>fs ls /tmp/test
Found 2 items
drwxrwxrwx - root hadoop 0 2014-09-21 23:37 /tmp/test/input
drwxrwxrwx - root hadoop 0 2014-09-21 23:39 /tmp/test/output
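The fs subcommands mirror the familiar hadoop fs operations against the targeted cluster's HDFS. For example, creating a directory and listing it again would look like this (the directory name is an illustration, and I am assuming `fs mkdir` takes a positional path the same way `fs ls` does above):

```
serengeti>fs mkdir /tmp/test/demo
serengeti>fs ls /tmp/test
```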
The serengeti CLI help output looks like this:
serengeti>help
* ! - Allows execution of operating system (OS) commands
* // - Inline comment markers (start of line only)
* ; - Inline comment markers (start of line only)
* cfg fs - Sets the Hadoop namenode - can be 'local' or <namenode:port>
* cfg info - Returns basic info about the Hadoop configuration
* cfg jt - Sets the Hadoop job tracker - can be 'local' or <jobtracker:port>
* cfg load - Loads the Hadoop configuration from the given resource
* cfg props get - Returns the value of the given Hadoop property
* cfg props list - Returns (all) the Hadoop properties
* cfg props set - Sets the value for the given Hadoop property - <name=value>
* clear - Clears the console
* cls - Clears the console
* cluster config - Config an existing cluster
* cluster create - Create a hadoop cluster
* cluster delete - Delete a cluster
* cluster export - Export cluster specification
* cluster fix - Fix a cluster failure
* cluster list - Get cluster information
* cluster resetParam - reset cluster parameters
* cluster resize - Resize a cluster
* cluster setParam - set cluster parameters
* cluster start - Start a cluster
* cluster stop - Stop a cluster
* cluster target - Set or query target cluster to run commands
* cluster upgrade - Upgrade an old cluster
* connect - Connect a serengeti server
* datastore add - Add new datastore(s)
* datastore delete - Delete an unused datastore
* datastore list - Display datastore list.
* date - Displays the local date and time
* disconnect - Disconnect a serengeti server
* distro list - Get distro information
* exit - Exits the shell
* fs cat - Copy source paths to stdout
* fs chgrp - Change group association of files
* fs chmod - Change the permissions of files
* fs chown - Change the owner of files
* fs copyFromLocal - Copy single src, or multiple srcs from local file system to the destination file system. Same as put
* fs copyMergeToLocal - Takes a source directory and a destination file as input and concatenates files in src into the destination local file
* fs copyToLocal - Copy files to the local file system. Same as get
* fs count - Count the number of directories, files, bytes, quota, and remaining quota
* fs cp - Copy files from source to destination. This command allows multiple sources as well in which case the destination must be a directory
* fs du - Displays sizes of files and directories contained in the given directory or the length of a file in case its just a file
* fs expunge - Empty the trash
* fs get - Copy files to the local file system
* fs ls - List files in the directory
* fs mkdir - Create a new directory
* fs moveFromLocal - Similar to put command, except that the source localsrc is deleted after it's copied
* fs mv - Move source files to destination in the HDFS
* fs put - Copy single src, or multiple srcs from local file system to the destination file system
* fs rm - Remove files in the HDFS
* fs setrep - Change the replication factor of a file
* fs tail - Display last kilobyte of the file to stdout
* fs text - Take a source file and output the file in text format
* fs touchz - Create a file of zero length
* help - List all commands usage
* hive cfg - Configures Hive
* hive script - Executes a Hive script
* loggedConnect - Connect a serengeti server with username/password as options and get logged into cli history
* mr jar - Run Map Reduce job in the jar
* mr job counter - Print the counter value of the MR job
* mr job events - Print the events' detail received by jobtracker for the given range
* mr job history - Print job details, failed and killed job details
* mr job kill - Kill the Map Reduce job
* mr job list - List the Map Reduce jobs
* mr job set priority - Change the priority of the job
* mr job status - Query Map Reduce job status.
* mr job submit - Submit a Map Reduce job defined in the job file
* mr task fail - Fail the Map Reduce task
* mr task kill - Kill the Map Reduce task
* network add - Add a network to Serengeti
* network delete - Delete a network from Serengeti by name
* network list - Get network information from Serengeti
* network modify - Modify a network from Serengeti by name
* pig cfg - Configures Pig
* pig script - Executes a Pig script
* quit - Exits the shell
* resourcepool add - Add a new resource pool
* resourcepool delete - Delete an unused resource pool
* resourcepool list - Get resource pool information
* script - Parses the specified resource file and executes its commands
* system properties - Shows the shell's properties
* topology list - List a rack-->hosts mapping topology
* topology upload - Upload a rack-->hosts mapping topology file
* version - Displays shell version
When you need finer-grained configuration than the Web Client offers,
use this serengeti CLI.
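The Serengeti CLI is built on Spring Shell (the `!`, `script`, and `system properties` entries in the help output are its built-ins), so commands can also be run non-interactively from a file with Spring Shell's --cmdfile option, which is handy for scripting routine operations. A sketch (the file name is an illustration):

```
[serengeti@192 ~]$ cat cluster_list.txt
connect --host localhost:8443
cluster list
[serengeti@192 ~]$ serengeti --cmdfile cluster_list.txt
```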
There is also a manual (in English only, though):
VMware vSphere Big Data Extensions
Command-Line Interface Guide
vSphere Big Data Extensions 2.0
That's all for the BDE command-line tool.