Monday, May 25, 2015

Logs useful for HAWQ troubleshooting

This article will help you locate the logs that are required while troubleshooting HAWQ-related issues.

Let's quickly list the different logs that are helpful for troubleshooting. There are primarily 5 logs which can provide insight into a problem:

1. HAWQ Master Logs - Clients enter the HAWQ database through the HAWQ master. All activity related to HAWQ access, query processing, and related operations is logged in the HAWQ master logs.

2. HAWQ Segment Logs - HAWQ Master dispatches query execution task to its segments. Each segment is a database in itself and logs related to any operation on the segment database are tracked under HAWQ segment logs.
There is a correlation between HAWQ master logs and HAWQ segment logs. If a query is triggered on the HAWQ master and logging is enabled on the segments, both master and segments will store information about query processing. Often there are events specific to the segment executing the query; these are written only to the HAWQ segment logs, but they can be easily correlated using a unique identifier, the "session id".
Every session to the HAWQ database is assigned a unique session id, so any query run in that session carries the session id as its identifier, and master and segment logs can be correlated using it. Ex: con9866 can be used to identify related activity on the segments, as in the sample log entry and grep sketch below.
2014-07-15 09:40:22.899924 PDT,"gpadmin","gpadmin",p91220,th699058016,"[local]",,2014-07-15 09:38:35 PDT,33851,con9866,cmd16,seg-1,,,x33851,sx1,"LOG","00000","2014-07-15 09:40:22:898757 PDT,THD000,TRACE,""CAutoTrace.cpp:53: [OPT]: Search terminated at stage 1/1"",",,,,,,"select gp.hostname, pe.fselocation from pg_filespace_entry pe, gp_segment_configuration gp where pe.fsedbid = gp.dbid and fselocation !~ 'hdfs://';",0,,"COptTasks.cpp",463,
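Once you have the session id from the master log, a quick grep of the master and segment pg_log directories pulls together all the activity for that session. A minimal sketch, assuming the data directories used elsewhere in this article:

# On the HAWQ master (MASTER_DATA_DIRECTORY assumed to be /data1/master/gpseg-1):
grep con9866 /data1/master/gpseg-1/pg_log/gpdb-2014-07-15_*.log
# On a segment host (segment directory assumed to be /data1/primary/gpseg2):
grep con9866 /data1/primary/gpseg2/pg_log/gpdb-2014-07-15_*.log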

3. Namenode Logs - HAWQ is an MPP database engine that works on Hadoop, so the namenode is a critical entity in the HAWQ ecosystem. When an HDFS operation needs to be executed, HAWQ segments connect to the namenode to retrieve the location of the files on HDFS and use the retrieved information for further processing.

4. Datanode Logs - Datanodes are the daemons that actually store HAWQ user table data. If there is a failure during an HDFS operation, HAWQ queries will fail, so datanode logs help identify critical information when query execution fails.

5. HAWQ Administration logs - Any maintenance operation performed on the HAWQ database using the HAWQ management utilities (ex: start, stop, etc.) is tracked in these logs.

How to locate the logs:
HAWQ master logs:
1. Go to the HAWQ master. (ssh <hawq_master>)
2. Verify that the HAWQ master process is running.
[gpadmin@hawq-master ~]$ ps -ef | egrep silent | egrep master
gpadmin 480392 1 0 Jul04 ? 00:00:05 /usr/local/hawq-1.2.0.1/bin/postgres -D /data1/master/gpseg-1 -p 5432 -b 1 -z 6 --silent-mode=true -i -M master -C -1 -x 8 -E
3. The value of the -D option in the above example is "/data1/master/gpseg-1", which is known as the master data directory for HAWQ. The term MASTER_DATA_DIRECTORY is often used to reference it.
4. Based on the MASTER_DATA_DIRECTORY identified above, the HAWQ master logs can be located under:
/data1/master/gpseg-1/pg_log
Or you can log in to the database and run the command below:
[gpadmin@hdw3 pg_log]$ psql
gpadmin=# show data_directory ;
 data_directory
-----------------------
 /data1/master/gpseg-1
(1 row)
5. Based on the date the issue occurred, HAWQ log files can be easily located. HAWQ log files follow the format gpdb-YYYY-MM-DD_<timestamp>.log, which identifies the log file date, as in the sketch below.
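For instance, to find the most recently written master log and scan it for problems (a sketch, assuming the MASTER_DATA_DIRECTORY from the example above):

cd /data1/master/gpseg-1/pg_log
ls -t gpdb-*.log | head -1                            # newest log file
egrep -i "ERROR|FATAL|PANIC" $(ls -t gpdb-*.log | head -1)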

HAWQ segment logs:
1. Log in to the HAWQ database (psql <databasename>). The catalog tables used to locate HAWQ segment directories are common to all databases, so you can log in to any database.
2. The SQL below can be used to identify the hostname, segment status, content id (an identifier for the segment), and segment directory location:
gpadmin=# select gp.hostname,gp.status,gp.content, pe.fselocation from pg_filespace_entry pe, gp_segment_configuration gp where pe.fsedbid = gp.dbid and fselocation !~ 'dfs';
      hostname      | status | content |      fselocation
--------------------+--------+---------+-----------------------
 hdw3.phd.local     | u      |      -1 | /data1/master/gpseg-1
 hdw1.phd.local     | u      |       2 | /data1/primary/gpseg2
 pccadmin.phd.local | u      |      -1 | /data1/master/gpseg-1
(3 rows)
Translating the above into a statement:
Logs for content id 2 are located on host hdw1.phd.local under the /data1/primary/gpseg2/pg_log directory, and the segment is currently marked up.
Note: In the above example, content id = -1 appears twice; both rows represent the HAWQ master (active and standby). However, you can only log in to the database through the active master.
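Since segment logs live on the segment hosts, you can combine the query above with ssh to inspect them remotely. A minimal sketch, assuming passwordless ssh as gpadmin and the directory layout shown in the example output:

psql -Atc "select gp.hostname, pe.fselocation from pg_filespace_entry pe, gp_segment_configuration gp where pe.fsedbid = gp.dbid and fselocation !~ 'dfs' and gp.content <> -1;" |
while IFS='|' read host dir; do
    echo "== $host:$dir/pg_log =="
    ssh "$host" "ls -t $dir/pg_log | head -3"    # three newest segment log files
done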

Namenode
You can use the PCC UI -> Topology tab to identify the namenode and datanode hostnames. On the namenode and datanode hosts, the files under /etc/default hold the parameters that specify log locations for a PHD cluster. If your cluster is secured, use the location specified by HADOOP_SECURE_DN_LOG_DIR for the datanodes.
[gpadmin@hdm1 ~]$  egrep LOG /etc/default/hadoop-hdfs-namenode
export HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs
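With the log directory known, the namenode log itself typically follows the standard Hadoop naming convention (hadoop-hdfs-namenode-<hostname>.log); a quick scan for problems might look like this (a sketch, with the file name pattern assumed from that convention):

egrep -i "ERROR|FATAL" /var/log/gphd/hadoop-hdfs/hadoop-hdfs-namenode-*.log | tail -20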


Datanode
[gpadmin@hdw3 pg_log]$ egrep LOG /etc/default/hadoop-hdfs-datanode
export HADOOP_LOG_DIR=/var/log/gphd/hadoop-hdfs
# export HADOOP_SECURE_DN_LOG_DIR=$HADOOP_LOG_DIR/hdfs

Note: You can also run the ps command and grep for the Hadoop namenode/datanode daemon process; its command line also shows the location of the logs, as sketched below.
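Hadoop daemons carry the log directory as a -Dhadoop.log.dir JVM property on their command line, so it can be pulled out directly (a sketch; the bracketed pattern keeps grep from matching itself):

ps -ef | grep [n]amenode | tr ' ' '\n' | grep hadoop.log.dir
ps -ef | grep [d]atanode | tr ' ' '\n' | grep hadoop.log.dir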

HAWQ administration logs
These logs are available on the HAWQ master and segment hosts under the directory below, and the name of the log depends on the type of operation performed.
/home/gpadmin/gpAdminLogs
Example log names:
gpstart_YYYYMMDD.log
gpstop_YYYYMMDD.log
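To review the latest administration log for a given utility, something like this works (a sketch, assuming the default gpAdminLogs location above):

ls -t /home/gpadmin/gpAdminLogs | head -5
egrep -i "error|warn" /home/gpadmin/gpAdminLogs/gpstart_*.log | tail -20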
