sriram-atlan commented Oct 26, 2025

Change description

MLH-1477: Add a DLQ replay service and an API endpoint to monitor its status.

Type of change

  • Bug fix (fixes an issue)
  • New feature (adds functionality)

Related issues

https://atlanhq.atlassian.net/browse/MLH-1477

Fix #1

Helm Config Changes for Running Tests (Staging PR)

Does this PR require Helm config changes for testing?

  • Tests are NOT required for this commit. (You can proceed with the PR.) ✅
  • No, Helm config changes are not needed. (You can proceed with the PR.) ✅
  • Yes, I have already updated the config-values on enpla9up36. (You can proceed with the PR.) ✅
  • Yes, but I have NOT updated the config-values. (Please update them before proceeding; or, tests will run with default values.)⚠️

Checklists

Development

  • Lint rules pass locally
  • Application changes have been tested thoroughly
  • Automated tests covering modified code pass

Security

  • Security impact of change has been considered
  • Code follows company security practices and guidelines

Code review

  • Pull request has a descriptive title and context useful to a reviewer. Screenshots or screencasts are attached as necessary
  • "Ready for review" label attached and reviewers assigned
  • Changes have been reviewed by at least one other contributor
  • Pull request linked to task tracker where applicable

Note

Adds a Kafka-based DLQ replay to Elasticsearch with a status API, accompanying tests, and minor build/CI/doc updates.

  • Backend (webapp):
    • DLQ replay service: Introduces DLQReplayService to consume Kafka DLQ (atlas.kafka.dlq.*), replay ES mutations with pause/resume, retries, exponential backoff, metrics, and health tracking.
    • API endpoint: Adds DLQAdminController exposing GET /dlq/replay/status returning replay status.
  • Tests:
    • Add DLQReplayServiceTest and DLQReplayServiceIntegrationTest covering replay success/failure, backoff, retry limits, tracker cleanup, metrics, health, and transaction cleanup.
  • Build/Dependencies:
    • Update Mockito in webapp/pom.xml to mockito-core and add mockito-junit-jupiter (Java 17 compatible).
    • Bump janusgraph.version in root pom.xml to 1.0.2-atlan (remove -SNAPSHOT).
  • CI:
    • Add branch trigger mlh-1477-staging-dlq to .github/workflows/maven.yml.
  • Docs:
    • LOCAL_SETUP.md: add repo/artifact download instructions and note on Colima resource tuning.
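For reviewers unfamiliar with the retry behaviour summarised above, here is a minimal sketch of capped exponential backoff with a retry limit. It is not the DLQReplayService code: `BackoffPolicy`, its parameters and the example values are hypothetical, and the sleeper is injected so the logic can be exercised without real waits.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

/** Hypothetical retry helper: capped exponential backoff plus a retry budget. */
final class BackoffPolicy {
    private final long baseDelayMs;
    private final long maxDelayMs;
    private final int maxRetries;

    BackoffPolicy(long baseDelayMs, long maxDelayMs, int maxRetries) {
        if (baseDelayMs <= 0 || maxDelayMs < baseDelayMs || maxRetries < 0) {
            throw new IllegalArgumentException("invalid backoff settings");
        }
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.maxRetries = maxRetries;
    }

    /** Delay before retry number {@code retry} (0-based): base * 2^retry, capped at maxDelayMs. */
    Duration delayFor(int retry) {
        long delay = baseDelayMs;
        // Double until the cap is reached; stopping early also avoids long overflow.
        for (int i = 0; i < retry && delay < maxDelayMs; i++) {
            delay *= 2;
        }
        return Duration.ofMillis(Math.min(delay, maxDelayMs));
    }

    /** Runs the operation, sleeping between failures, and rethrows once the budget is spent. */
    <T> T call(Callable<T> operation, Consumer<Duration> sleeper) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (retry >= maxRetries) {
                    throw e; // retry budget exhausted: surface the last failure
                }
                sleeper.accept(delayFor(retry));
            }
        }
    }
}
```

With a 100 ms base, a 1000 ms cap and 3 retries, the waits are 100, 200 and 400 ms before the fourth and final attempt; a production caller would pass a sleeper that actually blocks and would typically add jitter.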

Written by Cursor Bugbot for commit b10576d. This will update automatically on new commits.

sriram-atlan merged commit f1bd308 into staging Nov 11, 2025
27 of 28 checks passed
sriram-atlan added a commit that referenced this pull request Nov 14, 2025
…5566)

* MLH-1477 dummy commit

* MLH-1477 add branch to build

* MLH-1477 update to trigger a build

* MLH-1477 dummy commit

* MLH-1477 add dlq replay service and api endpoint to monitor status

* MLH-1477 dummy commit

* MLH-1477 wire in only elastic configuration

* MLH-1477 lazily connect to elasticsearch index

* MLH-1477 load indexProvider like janusgraph

* MLH-1477 initialise bootstrap servers

* MLH-1477 increase timeout

* MLH-1477 add more logs for debugging. update serializer

* MLH-1477 reduce poll time

* MLH-1477 use pause and resume

* MLH-1477 improve DLQ handling

* MLH-1477 break on errors

* MLH-1477 seek back when error

* MLH-1477 retry with exponential backoff

* MLH-1477 add tests

* MLH-1477 optimise imports

* MLH-1477 remove comment on latest

* MLH-1477 remove option to start the dlq manually

* MLH-1477 change to non daemon thread and improve destroy and cleanup
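The "break on errors" and "seek back when error" commits suggest an at-least-once replay loop. The sketch below models that idea in plain Java, assuming a single partition represented as a list whose index is the offset; `ReplayLoop`, `replay` and the status keys are hypothetical names, not the service's API. The `status()` map only illustrates the kind of counters GET /dlq/replay/status could report.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Hypothetical single-partition model of "break on errors" plus "seek back when error". */
final class ReplayLoop {
    private final AtomicLong processed = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();
    // Next offset to apply; only the replay thread writes it, volatile so a status reader sees updates.
    private volatile long position;

    ReplayLoop(long startOffset) {
        this.position = startOffset;
    }

    /**
     * Applies records in offset order starting at {@code position}. On the first failure it stops
     * and leaves {@code position} on the failed record, which is where a real consumer would seek
     * back to, so the record is retried rather than skipped. Returns true if it reached the end.
     */
    boolean replay(List<String> partition, Predicate<String> apply) {
        while (position < partition.size()) {
            String record = partition.get((int) position);
            if (!apply.test(record)) {
                failed.incrementAndGet();
                return false;
            }
            processed.incrementAndGet();
            position++; // advance (i.e. commit) only after a successful apply
        }
        return true;
    }

    /** Snapshot of counters a status endpoint could serialise. */
    Map<String, Object> status() {
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("position", position);
        snapshot.put("processed", processed.get());
        snapshot.put("failed", failed.get());
        return snapshot;
    }
}
```

Because the position is rewound rather than committed past, a record can be applied more than once across passes, so the apply step itself needs to be idempotent.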
sriram-atlan added a commit that referenced this pull request Nov 24, 2025
…stores (#5660)

* MLH-1477 add dlq replay service and api endpoint to monitor status (#5566)

* MLH-1477 add dependency (#5619)

* MLH-1477 add dependency

* MLH-1477 add dependency as test

* MLH-1477 refactor dlq to use repair flow to keep it idempotent (#5659)

* MLH-1477 refactor dlq to use repair flow to keep it idempotent

* MLH-1477 handle NPE in mutations map

* MLH-1477 set kafka property in atlas startup for DLQ

* MLH-1477 not use management system

* MLH-1477 remove unused method. also increase poll timeout seconds

* MLH-1477 remove cleanUpTransaction since it is not required

* MLH-1477 remove cleanUpTransaction since it is not required

* MLH-1477 handle NPQ in reindex method
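One reading of "refactor dlq to use repair flow to keep it idempotent" is that replay re-derives index documents from the graph instead of re-applying the recorded mutations. The sketch assumes that reading; `RepairReplayer` and the two maps standing in for the graph store and the search index are hypothetical. It also treats a null mutations map as empty, in the spirit of the NPE-handling commits.

```java
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical "repair" style replay: rather than re-applying the recorded mutation payloads,
 * it re-reads each referenced vertex from the source of truth and overwrites the index entry.
 */
final class RepairReplayer {
    private final Map<String, String> graph; // stand-in for the graph store: vertex id -> document
    private final Map<String, String> index; // stand-in for the search index

    RepairReplayer(Map<String, String> graph, Map<String, String> index) {
        this.graph = Objects.requireNonNull(graph);
        this.index = Objects.requireNonNull(index);
    }

    /** Returns the number of ids repaired; a null map means there is nothing to repair. */
    int repair(Map<String, ?> mutations) {
        if (mutations == null) {
            return 0; // guard instead of an NPE on a malformed DLQ message
        }
        int repaired = 0;
        for (String id : mutations.keySet()) {
            if (id == null) {
                continue;
            }
            String current = graph.get(id);
            if (current == null) {
                index.remove(id); // vertex no longer exists: drop it from the index
            } else {
                index.put(id, current); // full overwrite, so replaying twice gives the same result
            }
            repaired++;
        }
        return repaired;
    }
}
```

The mutation values are deliberately ignored here; only the ids matter, which is what makes a second delivery of the same DLQ message harmless.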