Refactor SQL module to use PreparedStatement in SQLSpout and IndexerBolt #1611 #1766
base: main
…olt for improved readability and performance
@jnioche Could you please approve the CI? And could I get a review for this PR? Curious to know your thoughts. Thanks!
…olt for improved readability and performance
Thanks for the PR. I have triggered the CI. I didn't look into the code diff yet.
@rzo1 Could you please trigger the CI again?
rzo1
left a comment
Thanks for the PR! I have some questions / suggestions and haven't looked deeply into the diff yet:
- Is it possible to move the prepared statements to the class level? For the StatusUpdaterBolt and the spout, the statements shouldn’t change per tuple, and since Storm bolts are single-threaded per instance, this should help reduce overhead.
- I think this module is currently untested. It would be great to add some tests (maybe using Testcontainers). I know the old implementation didn’t have tests either, but having them would help ensure everything works as expected.
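To illustrate the first suggestion, here is a minimal sketch of a class-level prepared statement that is created once and reused for every tuple. It is not code from this PR: the class name ReusableInsert, the store method, and the url and status columns are hypothetical, and the table name is assumed to come from trusted configuration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Hypothetical helper (not code from this PR): prepares an INSERT once and
 * reuses it for every tuple instead of re-preparing it on each call.
 */
final class ReusableInsert implements AutoCloseable {

    // Class-level statement: created once and reused by the single thread
    // that owns this instance.
    private final PreparedStatement insert;

    ReusableInsert(Connection connection, String tableName) throws SQLException {
        // A table name cannot be a bind parameter, so it is concatenated here.
        // Assumption: it comes from trusted configuration, never from crawled data.
        String sql = "INSERT INTO " + tableName + " (url, status) VALUES (?, ?)";
        this.insert = connection.prepareStatement(sql);
    }

    /** Binds the per-tuple values and executes the already prepared statement. */
    void store(String url, String status) throws SQLException {
        insert.setString(1, url);
        insert.setString(2, status);
        insert.executeUpdate();
    }

    @Override
    public void close() throws SQLException {
        insert.close();
    }
}
```

In a bolt, an object like this would presumably be created when the component is prepared and released in cleanup(); only the two setString calls and the execute run per tuple.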
@rzo1 Could you please trigger the CI?
…ndexing functionality
@rzo1 I think the PR is in good shape for review now. Tests have been added and prepared statements have been moved back to class level. Let me know your thoughts.
rzo1
left a comment
Overall, lgtm.
external/sql/src/main/java/org/apache/stormcrawler/sql/IndexerBolt.java
Outdated
    }

    @Override
    public void cleanup() {
There is a slight risk that this method isn't called in a cluster setup:
The cleanup method is called when a Bolt is being shutdown and should cleanup any resources that were opened. There's no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there's no way to invoke the method. The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in a process), and you want to be able to run and kill many topologies without suffering any resource leaks.
Don't think it would be a huge issue for now.
I was thinking about this. We should look into making these interfaces extend AutoCloseable for better resource cleanup. This would also make it possible to use these components in a try-with-resources block: https://docs.oracle.com/javase/8/docs/api/java/lang/AutoCloseable.html
Would like to hear your thoughts on this.
I see your point about making these interfaces extend AutoCloseable for better resource cleanup and enabling usage with try-with-resources. That approach makes sense in general for components that are manually managed and need explicit cleanup.
However, in the case of a Storm bolt, it doesn’t quite fit. Bolts are executed by the Storm runtime, and users typically don’t instantiate or manage them directly in a try-with-resources block. Implementing AutoCloseable here wouldn’t provide any practical benefit, since resource management is handled by Storm itself rather than the user.
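The difference between the two ownership models can be sketched in plain Java. Nothing below uses the Storm API: TrackedResource, FrameworkManagedComponent, and LifecycleDemo are hypothetical names, and prepare() and cleanup() only mirror the lifecycle being discussed.

```java
/** Hypothetical resource standing in for a JDBC connection or statement. */
final class TrackedResource implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/**
 * Hypothetical stand-in for a bolt. The framework, not the caller, decides
 * when prepare() and cleanup() run.
 */
final class FrameworkManagedComponent {
    private TrackedResource resource;

    void prepare() {
        resource = new TrackedResource();
    }

    // Runs only if the framework gets the chance to call it on shutdown.
    void cleanup() {
        if (resource != null) {
            resource.close();
        }
    }

    TrackedResource resource() {
        return resource;
    }
}

final class LifecycleDemo {
    /** Caller-managed: try-with-resources closes the resource when the block exits. */
    static TrackedResource callerManaged() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource scoped = resource) {
            // Work with the scoped resource here.
        }
        return resource;
    }
}
```

In the caller-managed case the language guarantees the close() call. In the framework-managed case the resource stays open until cleanup() is invoked, so implementing AutoCloseable on the component itself would not change when, or whether, the release happens.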
@rzo1 The comments have been addressed. Let me know your thoughts.
@rzo1 Sorry for asking this stupid question, but are we waiting for more approvals to merge the PR?
It’s not a stupid question. Typically, we wait at least 72 hours so that people in different time zones have a chance to review it (the same applies to release votes sent to the dev@ mailing list - if you’re interested in SC, feel free to subscribe). This period can be shorter if necessary. Also, keep in mind that most committers contribute in their spare time and are not paid to work on SC full-time, so reviews and merges may take some additional time.
This PR refactors parts of the SQL module to use PreparedStatement instead of SQL string concatenation, in accordance with issue #1611.
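As a rough illustration of why this matters, the sketch below compares the SQL text produced by string concatenation with a parameterized query. The urls table, the url and status columns, and the SqlTextDemo class are hypothetical and are not taken from the PR's diff.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class SqlTextDemo {

    // The SQL text is fixed; the value travels separately as a bind parameter.
    static final String PARAMETERIZED = "SELECT status FROM urls WHERE url = ?";

    /** Concatenation: the value becomes part of the SQL text itself. */
    static String concatenated(String url) {
        return "SELECT status FROM urls WHERE url = '" + url + "'";
    }

    /** Parameterized: the driver passes the value as data, so quotes in it are harmless. */
    static PreparedStatement bind(Connection connection, String url) throws SQLException {
        PreparedStatement statement = connection.prepareStatement(PARAMETERIZED);
        statement.setString(1, url);
        return statement;
    }
}
```

With an input such as x' OR '1'='1, the concatenated form yields a query whose WHERE clause has been rewritten by the input, while the parameterized text stays the same no matter what the URL contains.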
Thank you for contributing to Apache StormCrawler.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes
- Is there an issue associated with this PR? Is it referenced in the commit message?
- Does your PR title start with #XXXX where XXXX is the issue number you are trying to resolve?
- Has your PR been rebased against the latest commit within the target branch (typically main)?
- Is your initial contribution a single, squashed commit?
- Is the code properly formatted with mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false?
For code changes
- mvn clean verify?
Note
Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.