@rajucomp rajucomp commented Dec 27, 2025

This PR refactors parts of the SQL module to use PreparedStatement instead of SQL string concatenation, in accordance with issue #1611.

Thank you for contributing to Apache StormCrawler.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes

  • Is there an issue associated with this PR? Is it referenced in the commit message?

  • Does your PR title start with #XXXX where XXXX is the issue number you are trying to resolve?

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

  • Is the code properly formatted with mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false?

For code changes

  • Have you ensured that the full suite of tests is executed via mvn clean verify?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file?

Note

Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

…olt for improved readability and performance
@rajucomp
Author

@jnioche Could you approve the CI, please? And can I get a review for this PR? Curious to know your thoughts. Thanks!

…olt for improved readability and performance
@rzo1 rzo1 requested review from jnioche and sigee December 27, 2025 19:55
@rzo1
Contributor

rzo1 commented Dec 27, 2025

Thanks for the PR. I have triggered the CI, which results in:

[INFO] Scanning classes for violations...
Error:  Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
Error:    in org.apache.stormcrawler.sql.StatusUpdaterBolt (StatusUpdaterBolt.java:107)
Error:  Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
Error:    in org.apache.stormcrawler.sql.StatusUpdaterBolt (StatusUpdaterBolt.java:114)
Error:  Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
Error:    in org.apache.stormcrawler.sql.SQLSpout (SQLSpout.java:125)
Error:  Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
Error:    in org.apache.stormcrawler.sql.SQLSpout (SQLSpout.java:234)
Error:  Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
Error:    in org.apache.stormcrawler.sql.IndexerBolt (IndexerBolt.java:175)
Error:  Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
Error:    in org.apache.stormcrawler.sql.IndexerBolt (IndexerBolt.java:178)
Error:  Scanned 6 class file(s) for forbidden API invocations (in 0.04s), 6 error(s).

I haven't looked into the code diff yet.
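
For context, the forbidden-apis check above rejects `String.format(String, Object...)` because it formats with the JVM's default locale. A common remedy is the overload that takes an explicit `Locale`. The sketch below is only an illustration: the class name, the `urls` table, and the query text are assumptions, not code from this PR.

```java
import java.util.Locale;

/** Hypothetical helper; names and SQL text are illustrative only. */
final class LocaleSafeFormat {

    /**
     * Builds a query string with an explicit locale so that the result does
     * not depend on the default locale of the machine running the topology.
     */
    public static String buildSelect(String tableName, int limit) {
        // Locale.ROOT is a neutral locale: %d is rendered with plain ASCII digits.
        return String.format(Locale.ROOT, "SELECT url FROM %s LIMIT %d", tableName, limit);
    }

    public static void main(String[] args) {
        System.out.println(buildSelect("urls", 1000));
    }
}
```

Only identifiers such as a table name need to be formatted into the SQL text this way; values are better bound through `PreparedStatement` parameters.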

@rajucomp
Author

@rzo1 Request to trigger the CI again.

Contributor

@rzo1 rzo1 left a comment

Thanks for the PR! I have some questions / suggestions and haven't looked deeply into the diff yet:

  1. Is it possible to move the prepared statements to the class level? For the StatusUpdaterBolt and the spout, the statements shouldn’t change per tuple, and since Storm bolts are single-threaded per instance, this should help reduce overhead.

  2. I think this module is currently untested. It would be great to add some tests (maybe using test containers). I know the old implementation didn’t have tests either, but having them would help ensure everything works as expected.
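
Point 1 could look roughly like the following sketch, which prepares the statement once and only rebinds parameters on each call. The class name, the `urls` table, and its columns are hypothetical and not taken from the PR; in an actual bolt the constructor logic would sit in `prepare(...)` and the per-tuple logic in `execute(...)`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical writer that holds one PreparedStatement at class level. */
final class StatusWriter {

    // Assumed schema: a table "urls" with columns "url" and "status".
    private static final String UPDATE_SQL = "UPDATE urls SET status = ? WHERE url = ?";

    // Prepared once and reused for every call; safe here because a single
    // instance is only ever used from one thread.
    private final PreparedStatement updateStatement;

    StatusWriter(Connection connection) throws SQLException {
        this.updateStatement = connection.prepareStatement(UPDATE_SQL);
    }

    /** Rebinds the parameters and executes; returns the affected row count. */
    int updateStatus(String url, String status) throws SQLException {
        updateStatement.setString(1, status);
        updateStatement.setString(2, url);
        return updateStatement.executeUpdate();
    }

    /** Releases the statement; the caller remains responsible for the connection. */
    void close() throws SQLException {
        updateStatement.close();
    }
}
```

Binding values through `setString` also removes the need to escape them by hand, which is the main motivation for replacing string concatenation.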

@rajucomp
Author

rajucomp commented Jan 1, 2026

@rzo1 Request to trigger the CI.

@rajucomp
Author

rajucomp commented Jan 2, 2026

@rzo1 I think the PR is in good shape for review now. Tests have been added and the prepared statements have been moved back to class level. Let me know your thoughts.

@rajucomp rajucomp requested a review from rzo1 January 3, 2026 22:21
Contributor

@rzo1 rzo1 left a comment

Overall, lgtm.

}

@Override
public void cleanup() {
Contributor

There is a slight risk that this method isn't called in a cluster setup:

The cleanup method is called when a Bolt is being shutdown and should cleanup any resources that were opened. There's no guarantee that this method will be called on the cluster: for example, if the machine the task is running on blows up, there's no way to invoke the method. The cleanup method is intended for when you run topologies in local mode (where a Storm cluster is simulated in a process), and you want to be able to run and kill many topologies without suffering any resource leaks.

Don't think it would be a huge issue for now.

Author

I was thinking about this. We should look into making these interfaces extend AutoCloseable for better resource cleanup. This would also make it possible to use these components in try-with-resources blocks (see https://docs.oracle.com/javase/8/docs/api/java/lang/AutoCloseable.html).
I would like to hear your thoughts on this.

Contributor

I see your point about making these interfaces extend AutoCloseable for better resource cleanup and enabling usage with try-with-resources. That approach makes sense in general for components that are manually managed and need explicit cleanup.

However, in the case of a Storm bolt, it doesn’t quite fit. Bolts are executed by the Storm runtime, and users typically don’t instantiate or manage them directly in a try-with-resources block. Implementing AutoCloseable here wouldn’t provide any practical benefit, since resource management is handled by Storm itself rather than the user.
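
The distinction can be illustrated with a small plain-Java sketch. It does not use the Storm API: `TrackedResource`, `LifecycleDemo`, and the `gracefulShutdown` flag are hypothetical stand-ins, with the flag modelling whether the runtime gets the chance to call `cleanup()`.

```java
/** Hypothetical resource holder that records whether close() ran. */
final class TrackedResource implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class LifecycleDemo {

    /** Caller-managed: try-with-resources calls close() when the block exits. */
    static TrackedResource callerManaged() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource scoped = resource) {
            // work with the resource here
        }
        return resource;
    }

    /**
     * Runtime-managed stand-in: the owning code never opens a try block, so
     * close() only runs if the runtime reaches its shutdown hook.
     */
    static TrackedResource runtimeManaged(boolean gracefulShutdown) {
        TrackedResource resource = new TrackedResource();
        if (gracefulShutdown) {
            resource.close(); // what a cleanup() hook would do
        }
        return resource;
    }

    public static void main(String[] args) {
        System.out.println(callerManaged().isClosed());       // true
        System.out.println(runtimeManaged(true).isClosed());  // true
        System.out.println(runtimeManaged(false).isClosed()); // false
    }
}
```

In the runtime-managed case, implementing `AutoCloseable` changes nothing about when the resource is released, which is the point made above.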

@rajucomp
Author

rajucomp commented Jan 7, 2026

@rzo1 The comments have been addressed. Let me know your thoughts.

@rajucomp rajucomp requested a review from rzo1 January 7, 2026 18:55
@rajucomp
Author

rajucomp commented Jan 9, 2026

@rzo1 Sorry for asking this stupid question, but are we waiting for more approvals to merge the PR?

@rzo1
Contributor

rzo1 commented Jan 9, 2026

It’s not a stupid question.

Typically, we wait at least 72 hours so that people in different time zones have a chance to review it (the same applies to release votes sent to the dev@ mailing list - if you’re interested in SC, feel free to subscribe).

This period can be shorter if necessary. Also, keep in mind that most committers contribute in their spare time and are not paid to work on SC full-time, so reviews and merges may take some additional time.
