-
Notifications
You must be signed in to change notification settings - Fork 2.9k
NIFI-9770 Added nifi-cql-bundle to bring support for Cassandra 4.X an… #10006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@exceptionfactory @mattyb149 ScyllaDB aims for 100% compatibility at the interface level on Java drivers, and the basic testcontainers-powered testing I did seems to support it being a drop-in replacement for the Java DataStax driver. The two "codec" Java files that are in the Cassandra provider package were generated with ChatGPT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together @MikeThomsen.
This will take some time to review, but on a quick scan, I observed a number of issues related to Property Descriptor naming, logging, and use of legacy classes where more modern options are available.
The convention issues can be addressed, but one more fundamental concern is the use of the com.datadata.oss driver, which is over a year old. As mentioned on the DataStax Java Driver page, the driver is now maintained under the Apache Cassandra project.
| @@ -0,0 +1,51 @@ | |||
| <?xml version="1.0" encoding="UTF-8"?> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given 'nifi-cql-bundle' is the top level concept it really should say the word 'cassandra' fully written out.
so 'nifi-cassandra-bundle' at the top level at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bundle includes a driver NAR for ScyllaDB support, so I think calling it nifi-cql-bundle makes more sense here. That said, it's been a while AFAICT since Cassandra and ScyllaDB support were discussed, so I'm open to splitting this into two separate bundles because we can't predict where Cassandra and ScyllaDB will be in the future.
| Expert Group and released to the public domain, as explained at | ||
| http://creativecommons.org/publicdomain/zero/1.0/ | ||
|
|
||
| (ASLv2) The Netty Project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please look for other exmaples of how we reference netty. all this text should not be needed
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.nifi</groupId> | ||
| <artifactId>nifi-cql-nar</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not in favor of including cassandra by default in our convenenience binary. It makes great sense to build a great supportable cassandra interface again but it needs to be really well written and should not be bundled by default until we find out more about usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all external system integrations I'm in favor of a (perhaps set of) interface(s) that separate the external system's API/SDK from the NiFi components in order to BYO library so we can try to be future-proof for upcoming versions and such. This is often a lot more difficult and time-consuming than when we used to just accept all contributions that connected to these systems.
Having said that, I don't want to set a negative tone for PR contributions for external system integrations or else everyone will end up with their own private repos and NiFi's bread and butter of universal data distribution will suffer.
|
@exceptionfactory We should be good to go now on the dependency update. If you, @joewitt and @mattyb149 could weight in on keeping this as a single bundle vs two (one Cassandra, one ScyllaDB) I'd like to know your thoughts. I see more advantages in having a single, unified CQL bundle, but the disadvantage is that API divergence between the two databases could necessitate breaking compatibility later. |
|
I don't know what CQL is but i'm guessing Cassandra Query Language. Im not familiar with Cassandra vs ScyllaDB But I am very familiar with complex dependencies, maintenance, and vulnerability considerations. This one just makes me more attentive than others because it tends to tick those boxes of concern. To be clear...I do recognize Cassandra is very popular and an excellent tool. And the nifi community is better off when we give them a great integration. I get it. I just dont have a good enough sense of this contrib in doing that. I also noted the AI generated looking code and thought that was a potential flag. I'm not sure we can use that as is but maybe... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updating the driver references @MikeThomsen. I plan to take a closer at the dependency tree, as it appears to include a number of libraries.
As far as the code itself, there are several areas that need to be addressed to align with current conventions including the following:
- Remove use of
displayName()from Property Descriptors - Use human-readable Title Case names for the
name()in Property Descriptors - Define supported Property Descriptors statically using
List.of() - Defined Relationships statically using
Set.of() - Rewrite log messages to avoid use of
due to {}with anException, since theExceptionis not an argument for the log message itself - Use an
enumthat implementsDescribedValuefor defining allowable values and read theenumfrom the Context - Standardize naming using either
CQLorCqlin some class names. It looks like the driver usesCqlSession, although NiFi components useSQL, so usingCQLseems better in places - Align Java package names more closely with module names, such as
processors.cqlandservices.scylladbin applicable modules
If you can work through these items across the board, that would address a number of things and enable focus on the implementation itself.
TL;DR, yes CQL = Cassandra Query Language. ScyllaDB is an increasingly popular, C++ rewrite of Apache Cassandra that aims for (or seems to aim for) 100% driver API compatibility so it is a drop-in replacement as far as Cassandra users are concerned.
I understand the concern, but when I reviewed the output from ChatGPT the TypeCodec implementations looked like what any reasonable person would write by hand to implement those mappings so I'm not sure there is enough novelty here to raise red flags. That said, I can try to do some googling on Cassandra to see if OpenAI was taking liberties with someone else's code. |
Yeah. Understood. Thanks for this. FWIW, the implementation is directly inspired by how the graph bundle generally handles the interaction between its controller services and processors (since Cypher and Gremin are so radically different as protocols) |
| .identifiesControllerService(SSLContextService.class) | ||
| .build(); | ||
|
|
||
| PropertyDescriptor CLIENT_AUTH = new PropertyDescriptor.Builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This property should not be used for client operations, it is only relevant to services that implement listening sockets.
| import java.util.stream.Stream; | ||
|
|
||
| public interface CQLExecutionService extends ControllerService { | ||
| PropertyDescriptor CONTACT_POINTS = new PropertyDescriptor.Builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These Property Descriptors should not be included on the interface. The Controller Service interface defines contract behavior, but the Property Descriptors are specific to a given implementation, and should never be included on a Controller Service interface definition.
…d ScyllaDB.
Summary
NIFI-00000
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000NIFI-00000Pull Request Formatting
mainbranchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-checkLicensing
LICENSEandNOTICEfilesDocumentation