Skip to content

Conversation

@wenyanshi-123
Copy link
Contributor

@wenyanshi-123 wenyanshi-123 commented Dec 9, 2025

When there is no User Cache on the DN side, the entire time series will be handed over to CN for authentication, which ultimately leads to CN OOM in some PIPE scenarios with a large number of time series. To solve this problem, it is necessary to prohibit passing all time series to CN during authentication, and instead pull the user and it's corresponding roles to dn when the user cache and role caches fail.

Copy link
Contributor

@CRZbulabula CRZbulabula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL.

Copy link
Contributor

@CRZbulabula CRZbulabula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jt2594838
Copy link
Contributor

And, how is this PR related to PIPE?

@wenyanshi-123 wenyanshi-123 changed the title Fix OOM problem for PIPE. Avoid CN OOM by Pulling User/Roles to DN When Cache Misses. Dec 9, 2025
Comment on lines +181 to 185
Role cachedRole = iAuthorCache.getRoleCache(rolename);
if (cachedRole == null) {
checkRoleFromConfigNode(username, rolename);
cachedRole = iAuthorCache.getRoleCache(rolename);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a chance that the role is evicted between line 183 and 184, therefore, cachedRold can still be null.
One possible solution is to make checkRoleFromConfigNode() return the Role it just fetched.
And may rename checkRoleFromConfigNode() to fetchRoleFromConfigNode() which is more precise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what if the role is deleted concurrently? You can still not fetch it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have resolved the possible NPE issue, but the concurrency problem requires a more systematic design.

@jt2594838
Copy link
Contributor

Avoid using force-push, which will mix your new changes with the old ones.
This forces me to go through the whole changes again, wasting some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants