Mudrod Log Ingestion
Mudrod log ingestion is a parallel process that is optimized to run on a distributed processing engine such as Apache Spark. This page explains how to run it. You should already have completed the Mudrod installation instructions before following this page.
The primary way to run a log ingestion workflow is the mudrod-engine utility. It is available in core/target/appassembler/bin once you have completed the installation instructions. If the tool is invoked with missing or invalid arguments, it prints the following usage message:
usage: MudrodEngine: 'logDir' argument is mandatory. User must also
       provide an ingest method. [-a] [-esHost <host_name>] [-esPort
       <port_num>] [-esTCPPort <port_num>] [-f] [-h] [-l] -logDir
       </path/to/log/directory> [-p] [-s] [-v]
 -a,--addSimFromMetadataAndOnto                          begin adding metadata and ontology results
 -esHost,--elasticSearchHost <host_name>                 elasticsearch cluster unicast host
 -esPort,--elasticSearchHTTPPort <port_num>              elasticsearch HTTP/REST port
 -esTCPPort,--elasticSearchTransportTCPPort <port_num>   elasticsearch transport TCP port
 -f,--fullIngest                                         begin full ingest Mudrod workflow
 -h,--help                                               show this help message
 -l,--logIngest                                          begin log ingest without any processing only
 -logDir,--logDirectory </path/to/log/directory>         the log directory to be processed by Mudrod
 -p,--processingWithPreResults                           begin processing with preprocessing results
 -s,--sessionReconstruction                              begin session reconstruction
 -v,--vocabSimFromLog                                    begin similarity calculation from web log Mudrod workflow
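The usage message states two rules: `-logDir` is mandatory, and an ingest method must be given. The following is a minimal Python sketch that encodes those rules before a command is launched. The function name `build_mudrod_args` is hypothetical. Treating the six "begin ..." flags as the "ingest methods" the message refers to is an assumption of this sketch, not something the help text spells out.

```python
from typing import List, Optional

# Assumption: the six "begin ..." flags from the usage message are the
# "ingest methods" it refers to.
INGEST_METHODS = {"-a", "-f", "-l", "-p", "-s", "-v"}


def build_mudrod_args(
    log_dir: str,
    methods: List[str],
    es_host: Optional[str] = None,
    es_port: Optional[int] = None,
    es_tcp_port: Optional[int] = None,
) -> List[str]:
    """Build a mudrod-engine argument list, enforcing the usage rules."""
    if not log_dir:
        raise ValueError("'logDir' argument is mandatory")
    if not methods:
        raise ValueError("an ingest method must be provided")
    unknown = [m for m in methods if m not in INGEST_METHODS]
    if unknown:
        raise ValueError(f"unknown ingest method(s): {unknown}")

    args = list(methods)
    # Optional Elasticsearch connection settings, using the flags from the help text.
    if es_host is not None:
        args += ["-esHost", es_host]
    if es_port is not None:
        args += ["-esPort", str(es_port)]
    if es_tcp_port is not None:
        args += ["-esTCPPort", str(es_tcp_port)]
    # The log directory is passed as a separate argument after -logDir.
    args += ["-logDir", log_dir]
    return args
```

For example, `build_mudrod_args("/path/to/log/directory/", ["-f"])` yields `["-f", "-logDir", "/path/to/log/directory/"]`. These are the arguments of a full ingest run.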
A correct submission, for example a full ingest (-f), looks like this:
$ core/target/appassembler/bin/mudrod-engine -f -logDir /path/to/log/directory/
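To drive the same full-ingest command from a script, here is a minimal Python sketch. It assumes it is run from the root of the Mudrod checkout, so that the relative launcher path used above resolves. The function names are hypothetical, and the sketch only reports the process exit code; it does not interpret Mudrod's output.

```python
import subprocess
from pathlib import Path
from typing import List

# Launcher path from the instructions above, relative to the Mudrod checkout.
ENGINE = Path("core/target/appassembler/bin/mudrod-engine")


def full_ingest_command(log_dir: str, engine: Path = ENGINE) -> List[str]:
    """Build the same command line as the shell example: -f -logDir <dir>."""
    return [str(engine), "-f", "-logDir", log_dir]


def run_full_ingest(log_dir: str, engine: Path = ENGINE) -> int:
    """Run the full ingest workflow and return the process exit code."""
    if not Path(log_dir).is_dir():
        raise FileNotFoundError(f"log directory not found: {log_dir}")
    if not engine.exists():
        raise FileNotFoundError(f"mudrod-engine not found at {engine}; build Mudrod first")
    # Passing a list (no shell) keeps paths containing spaces or '+' intact.
    completed = subprocess.run(full_ingest_command(log_dir, engine))
    return completed.returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with directory names such as the sample data sets below, which contain `+` characters.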
Two sample data sets are available for download. You can use them to ingest sample data into a locally running Mudrod instance.
Testing_Data_1_3dayLog+Meta+Onto is intended for testing the log ingestion function. Run it with the "-l" option. To use this sample data:
- Download Testing_Data_1_3dayLog+Meta+Onto.zip
- Unzip it to /path/to/Testing_Data_1_3dayLog+Meta+Onto
- Run core/target/appassembler/bin/mudrod-engine -l -logDir /path/to/Testing_Data_1_3dayLog+Meta+Onto
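The unzip step can be scripted with Python's standard `zipfile` module. The following is a minimal sketch, assuming the archive has already been downloaded by hand. No download URL is given here, so the sketch does not fetch anything. The helper name `prepare_sample` is hypothetical. The internal layout of the archive is not documented on this page, so check that the extracted directory actually contains the data before passing it to `-logDir`.

```python
import zipfile
from pathlib import Path


def prepare_sample(zip_path: str, dest: str) -> Path:
    """Extract a downloaded sample-data archive into dest and return dest."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall writes every member of the archive beneath dest_dir.
        archive.extractall(dest_dir)
    # Fail early instead of handing Mudrod an empty directory.
    if not any(dest_dir.iterdir()):
        raise ValueError(f"{zip_path} extracted no files into {dest_dir}")
    return dest_dir
```

For example, `prepare_sample("Testing_Data_1_3dayLog+Meta+Onto.zip", "/path/to/Testing_Data_1_3dayLog+Meta+Onto")` mirrors the "Unzip" step above.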
Testing_Data_2_ProcessedLog.Meta.Onto is intended for testing the data mining functions, including the web services. Run it with the "-p" option. To use this sample data:
- Download Testing_Data_2_ProcessedLog.Meta.Onto.zip
- Unzip it to /path/to/Testing_Data_2_ProcessedLog.Meta.Onto
- Run core/target/appassembler/bin/mudrod-engine -p -logDir /path/to/Testing_Data_2_ProcessedLog.Meta.Onto
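Each sample data set is paired with one option on this page: -l for the first and -p for the second. A small lookup table keeps that pairing in one place. The following is a minimal Python sketch, assuming both archives are unzipped under a common parent directory such as /path/to and that it is run from the Mudrod checkout root. The names `SAMPLE_FLAGS`, `sample_command`, and `run_sample` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

ENGINE = "core/target/appassembler/bin/mudrod-engine"

# Which option each sample data set is meant to be run with, per this page.
SAMPLE_FLAGS: Dict[str, str] = {
    "Testing_Data_1_3dayLog+Meta+Onto": "-l",       # log ingestion only
    "Testing_Data_2_ProcessedLog.Meta.Onto": "-p",  # processing with preprocessing results
}


def sample_command(data_root: str, dataset: str) -> List[str]:
    """Build the mudrod-engine command for one sample data set."""
    flag = SAMPLE_FLAGS[dataset]  # raises KeyError for an unknown data set name
    return [ENGINE, flag, "-logDir", str(Path(data_root) / dataset)]


def run_sample(data_root: str, dataset: str) -> int:
    """Run the command for one sample data set and return its exit code."""
    cmd = sample_command(data_root, dataset)
    if not Path(cmd[-1]).is_dir():
        raise FileNotFoundError(f"unzip {dataset}.zip to {cmd[-1]} first")
    return subprocess.run(cmd).returncode
```

Keeping the option next to the data set name avoids accidentally running the preprocessed data set with -l, or the raw logs with -p.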