This repository was archived by the owner on Dec 15, 2021. It is now read-only.

Description
We're facing multiple issues while using the ELK stack, and we suspect they stem from our Logstash configuration. The issues are as follows:
- Logstash, connected to DynamoDB Streams, isn't showing real-time changes, even though we explicitly set perform_stream => true in our Logstash configuration. Note: we do get the latest data if we restart Logstash (which runs in a Docker container). Could this be a cross-region issue, given that DynamoDB is in us-east-1 while Logstash and Elasticsearch are in us-west-1?
- Upon restarting Logstash, the entire DynamoDB table appears to be duplicated in Elasticsearch: DynamoDB has an item count of 70K+, while Elasticsearch reports more than double that number of searchable documents. Could this be because we have perform_stream => true in the config?
- Intermittently, the latest data does show up, but it is sandwiched between older records, as if the data were fetched in random order. Could this be due to multiple workers writing at the same time?
- We need the JSON message contents from DynamoDB as is. However, when we run Logstash, the output shows the data as "Stream Records". With log_format => "json_binary_as_text", we see the JSON message in the form we need. Is this setting sufficient?
Our Logstash configuration is as follows:
input {
  dynamodb {
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_image"
    perform_scan => true
    perform_stream => true
    publish_metrics => true
    table_name => "here-we-have-dynamodb-table-name"
    log_format => "json_binary_as_text"
  }
}
output {
  elasticsearch {
    hosts => "here-we-have-our-elasticsearch-endpoint-which-is-in-us-west-1"
  }
}
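For the duplication issue in particular, one option that may be worth trying is to give the Elasticsearch output a deterministic document ID, so that re-reading the same item overwrites the existing document instead of indexing a new one. The following is only a sketch under assumptions, not a verified fix: document_id is a standard option of the elasticsearch output plugin, but the field name id is hypothetical and would need to be replaced with whichever event field actually carries the table's primary key.

```
output {
  elasticsearch {
    hosts => "here-we-have-our-elasticsearch-endpoint-which-is-in-us-west-1"
    # "id" is a hypothetical field name; substitute the event field
    # that holds the DynamoDB table's primary key.
    document_id => "%{id}"
  }
}
```

With the configuration above, no document_id is set, so Elasticsearch assigns a fresh ID to every event it receives, which would be consistent with the document count growing each time the table is scanned again.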
NOTE: There are no errors in the logs (docker logs --follow container-name).
Any help on these issues is really appreciated.