-
Notifications
You must be signed in to change notification settings - Fork 349
[SqlClient/EFCore] summary optimization phase 2 #3116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SqlClient/EFCore] summary optimization phase 2 #3116
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3116 +/- ##
==========================================
+ Coverage 70.04% 70.18% +0.13%
==========================================
Files 439 439
Lines 16889 17086 +197
==========================================
+ Hits 11830 11991 +161
- Misses 5059 5095 +36 Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
data.Add(new() | ||
{ | ||
Name = "alter_table", | ||
Input = new() | ||
{ | ||
DbSystemName = "other_sql", | ||
Query = "ALTER TABLE MyTable ADD Name varchar(255)", | ||
}, | ||
Expected = new() | ||
{ | ||
SanitizedQueryText = ["ALTER TABLE MyTable ADD Name varchar(255)"], | ||
Summary = "ALTER TABLE MyTable", | ||
}, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alanwest This specific test in the semantic conventions expects the 255 literal to be sanitized. My current implementation handles column definitions such that we retain the value. If you're happy with this, should I update this in the semantic conventions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent of the original implementation was to aggressively sanitize since I figured any number or string literal in the query added little value to db.query.text
. That said, I think this is ok.
The reason SanitizedQueryText
is an array is to allow alternate acceptable forms of sanitization. I don't feel strongly that we should keep the previous form as an option, but it's a question we can raise when we update the test suite upstream in case others have a preference.
We just want to be extra cautious not to expose sensitive data. One thing the test suite from the semantic conventions repo does not cover are pathological SQL statements. For example, SQL that is not valid. Something like ALTER TABLE MyTable SELECT * FROM Users WHERE (something sensitive here)
. I don't think pathological test cases belong in the test suite from the semantic conventions, but I think we may want to add some coverage separately. I guess ideally it would be nice to not set db.query.text
at all in case of invalid SQL, but I think it may be challenging to detect without implementing a full SQL parser which would be silly. Maybe we could consider unsetting db.query.text
in the case of an exception indicating invalid SQL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open a PR upstream for the test and we can discuss there. I think both forms are acceptable if its hard to achieve without overhead in other languages.
Regarding invalid SQL statements. I'm unsure if we want to avoid setting db.query.text
. Firstly, as you say, we end up building a full parser which is complex. Also, if an invalid statement is included, that will fail the DB command and in that case, the user likely wants to know what the bad SQL was. My only real concern would be SQL built from unrestricted/invalidated user input. In that case, there could be a small vector to make something nasty that hangs up our parsing. Realistically though, if the customer code accept unvalidated user input, they have bigger problems!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if we want to avoid setting db.query.text.
This is kind of a secondary concern to me. Primary concern is that the literal values are sanitized even if the SQL is malformed. I think we should err on the side of being extra aggressive.
So this test case does bring me a little bit of pause. Are you confident that any literal values that might come after the statement will be sanitized? Few examples
ALTER TABLE MyTable ADD Name varchar(255) 'mypassword' 1234
would becomeALTER TABLE MyTable ADD Name varchar(255) ? ?
ALTER TABLE MyTable 'mypassword' ADD 123 Name varchar(255)
would becomeALTER TABLE MyTable ? ADD ? Name varchar(255)
test/OpenTelemetry.Contrib.Shared.Tests/OpenTelemetry.Contrib.Shared.Tests.csproj
Outdated
Show resolved
Hide resolved
a6bfe75
to
f563d5e
Compare
A couple of the tests are failin gon net462 - I'll look into those tomorrow. |
@martincostello There seem to be a bunch of EFCore failures now. I'll take a look into those, hopefully tomorrow and see why this breaks those. |
@martincostello The EF Core tests pass on my machine. Are you able to rerun these please? |
@alanwest What are you thoughts on the revised test (see #3116 (comment)) and additional test cases? I'm on leave next week, so will pick up any changes when I'm back. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
}, | ||
"expected": { | ||
"db.query.text": [ | ||
"SELECT * FROM Vecnzotjejucwitzgrfifscuevittljlnrlpbruvkezeptqciyvbjhsytmbucbhwttidayecnthaztxbyppbyztcqccedeirgkxzrfezjxfwbtuqxeusroqgbvulgmsvnelovkxqsaqmlogomkhtjuirzhaocxlrmerihnmwaelullionarkmxwdamhduwrbooknqsnilurutgyerxphokeqnnoumbpcfjtmqrbpukjllofiwaltyoawkp o, OrderDetails od" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec for generating db.query.summary states that truncation should not occur within an operation or target.
Instrumentations that parse the query to set db.query.summary SHOULD truncate the summary to 255 characters (ensuring truncation does not occur within an operation name or target).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I must have scanned over that. I'll update to handle this. Do you think we should a) truncate at the end of the last token that falls before 255, or b) all completion of any tokens starting before 255 chars, so potentially going over the 255 recommended limit? The later is only likely an issue for unusually long target names, but I doubt that's a real issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think always staying <= 255 would be more true to the spec. So in this example db.query.summary
would just be SELECT
.
"db.system.name": "other_sql", | ||
"query": "CREATE UNIQUE CLUSTERED INDEX IX_Employee_Email ON Employees (Email);" | ||
}, | ||
"expected": { | ||
"db.query.text": [ | ||
"CREATE UNIQUE CLUSTERED INDEX IX_Employee_Email ON Employees (Email);" | ||
], | ||
"db.query.summary": "CREATE UNIQUE CLUSTERED INDEX IX_Employee_Email" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not a strict specification for this, but I think I'd lean towards trying to exclude qualifiers like UNIQUE
and CLUSTERED
when parsing the operation.
Also this appears to be a test that should specifically target MS SQL Server, so db.system.name
should equal microsoft.sql_server
. This is so that the test suite can be filtered down to tests only applicable to certain database systems if necessary. The list of known db.system.name
values are listed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was unsure whether we wanted those keywords capture in the summary. It adds context to what the statement does, directly in the span name, but perhaps there's little value in that. I doubt many instrumented apps create indices. I'm happy to update to exclude these for now. I can add the test upstream and perhaps a discussion can occur there on whether to include these (or not). My only concern would be if that discussion delays getting this instrumentation to a stable state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 007f300
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. We can always decide they should be included later if there is demand for it.
test/OpenTelemetry.Contrib.Shared.Tests/SqlProcessorAdditionalTestCases.json
Outdated
Show resolved
Hide resolved
test/OpenTelemetry.Contrib.Shared.Tests/SqlProcessorAdditionalTestCases.json
Outdated
Show resolved
Hide resolved
Co-authored-by: Martin Costello <[email protected]>
Co-authored-by: Martin Costello <[email protected]>
…Shared.Tests.csproj Co-authored-by: Martin Costello <[email protected]>
8e5aeab
to
007f300
Compare
@martincostello I have one more (hopefully) thing to tackle which is the truncation once we decide on option a) or b). |
Contributes to #2238
Changes
This is a rather large refactor of the
SqlProcessor
. The initial work focused on removing all allocation overhead. Further work optimised the non-cached execution time withSpan
-based code. Several iterations have led to the code in this PR which is a balance between readability and performance. In all cases, the execution time is reduced compared to the existing code.Further work has been introduced to support additional SQL keywords and less common statements. This is not exhaustive but builds upon those previously supported.
@alanwest I'll add some PR comments around some specific items that may need discussion. I've included
SqlProcessorAdditionalTestCases.json
with more statements. We could eventually contribute these into the semantic conventions. These will need a careful review to check we are happy the expected summaries align with the intent from the semantic conventions.Before
After
Merge requirement checklist
AppropriateCHANGELOG.md
files updated for non-trivial changesChanges in public API reviewed (if applicable)