Mudrod Log Ingestion

fgreg edited this page Mar 29, 2017 · 5 revisions

Introduction

Mudrod log ingestion is a parallel process optimized to run on a distributed processing engine such as Apache Spark. This document explains how to run it. Note that you should have already followed the Mudrod installation instructions before working through this page.

The Anatomy of the Mudrod Log Ingestion Process

Executing Log Ingestion

The primary mechanism for executing a log ingestion workflow is the mudrod-engine utility. This tool is available in core/target/appassembler/bin after completing the installation instructions. If you invoke it incorrectly (for example, without the mandatory logDir argument or without an ingest method), Mudrod prints the following help message.

usage: MudrodEngine: 'logDir' argument is mandatory. User must also
       provide an ingest method. [-a] [-esHost <host_name>] [-esPort
       <port_num>] [-esTCPPort <port_num>] [-f] [-h] [-l] -logDir
       </path/to/log/directory> [-p] [-s] [-v]
 -a,--addSimFromMetadataAndOnto                          begin adding
                                                         metadata and
                                                         ontology results
 -esHost,--elasticSearchHost <host_name>                 elasticsearch
                                                         cluster unicast
                                                         host
 -esPort,--elasticSearchHTTPPort <port_num>              elasticsearch
                                                         HTTP/REST port
 -esTCPPort,--elasticSearchTransportTCPPort <port_num>   elasticsearch
                                                         transport TCP
                                                         port
 -f,--fullIngest                                         begin full ingest
                                                         Mudrod workflow
 -h,--help                                               show this help
                                                         message
 -l,--logIngest                                          begin log ingest
                                                         without any
                                                         processing only
 -logDir,--logDirectory </path/to/log/directory>         the log directory
                                                         to be processed
                                                         by Mudrod
 -p,--processingWithPreResults                           begin processing
                                                         with
                                                         preprocessing
                                                         results
 -s,--sessionReconstruction                              begin session
                                                         reconstruction
 -v,--vocabSimFromLog                                    begin similarity
                                                         calulation from
                                                         web log Mudrod
                                                         workflow

A correct invocation of the full ingest workflow therefore looks as follows:

$ core/target/appassembler/bin/mudrod-engine -f -logDir /path/to/log/directory/
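The help output above implies two checks before a run: a workflow flag must be given and logDir is mandatory. The following is a minimal sketch of a wrapper that performs those checks and prints the resulting command instead of executing it. The function name `mudrod_cmd` and the `MUDROD_BIN` variable are hypothetical and not part of Mudrod; the accepted flags are simply those listed in the help message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mudrod): prints the mudrod-engine
# command line for a given workflow flag and log directory.
# MUDROD_BIN is an assumed variable; its default is the path named above.
MUDROD_BIN="${MUDROD_BIN:-core/target/appassembler/bin/mudrod-engine}"

mudrod_cmd() {
  mode="$1"
  log_dir="$2"

  # Accept only the workflow flags listed in the help output.
  case "$mode" in
    -f|-l|-p|-s|-v|-a) ;;
    *) echo "unknown workflow flag: $mode" >&2; return 2 ;;
  esac

  # The help output states that logDir is mandatory.
  if [ -z "$log_dir" ]; then
    echo "logDir is mandatory" >&2
    return 2
  fi

  printf '%s %s -logDir %s\n' "$MUDROD_BIN" "$mode" "$log_dir"
}

# Dry run: print the full-ingest command rather than executing it.
mudrod_cmd -f /path/to/log/directory/
```

Printing the command keeps the sketch safe to try on a machine without a running Mudrod instance; replacing the `printf` line with a direct call to `"$MUDROD_BIN"` would execute the workflow instead.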

Sample Data Ingestion

A set of testing data is available for download and can be ingested into a locally running instance of Mudrod. Two sample data sets are available.

Sample Data 1

Testing_Data_1_3dayLog+Meta+Onto is intended for testing the log ingestion function. Run it with the "-l" option. To use this sample data:

  1. Download Testing_Data_1_3dayLog+Meta+Onto.zip
  2. Unzip to /path/to/Testing_Data_1_3dayLog+Meta+Onto
  3. Run core/target/appassembler/bin/mudrod-engine -l -logDir /path/to/Testing_Data_1_3dayLog+Meta+Onto

Sample Data 2

Testing_Data_2_ProcessedLog.Meta.Onto is intended for testing the data mining functions, including the web services. Run it with the "-p" option. To use this sample data:

  1. Download Testing_Data_2_ProcessedLog.Meta.Onto.zip
  2. Unzip to /path/to/Testing_Data_2_ProcessedLog.Meta.Onto
  3. Run core/target/appassembler/bin/mudrod-engine -p -logDir /path/to/Testing_Data_2_ProcessedLog.Meta.Onto
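
Both sample data sets follow the same pattern: unzip the downloaded archive, then run mudrod-engine with the matching option. The sketch below covers steps 2 and 3 under a few assumptions: the archive has already been downloaded (this page does not give a download URL), the Info-ZIP `unzip` tool is installed, and the archive's contents should land directly in the destination directory. The function name `run_sample` and the `MUDROD_BIN` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mudrod) for steps 2 and 3 above.
# Usage: run_sample <archive.zip> <destination-dir> <workflow-flag>
run_sample() {
  archive="$1"
  dest="$2"
  flag="$3"
  # Assumed location of the tool, taken from the path used on this page.
  engine="${MUDROD_BIN:-core/target/appassembler/bin/mudrod-engine}"

  # Fail early if the downloaded archive is missing.
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # Fail early if mudrod-engine is missing or not executable.
  if [ ! -x "$engine" ]; then
    echo "mudrod-engine not found at: $engine" >&2
    return 1
  fi

  # Step 2: unzip quietly into the destination, overwriting old files.
  mkdir -p "$dest" || return 1
  unzip -q -o "$archive" -d "$dest" || return 1

  # Step 3: run the requested workflow against the unzipped directory.
  "$engine" "$flag" -logDir "$dest"
}
```

For Sample Data 1 this would be called as `run_sample Testing_Data_1_3dayLog+Meta+Onto.zip /path/to/Testing_Data_1_3dayLog+Meta+Onto -l`, and for Sample Data 2 with the second archive name, its matching path, and `-p`. If the archive unpacks into its own top-level folder, point -logDir at that folder instead.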