[crdb] set index backfill batch size to 5000, max-sql-memory to 256MiB#9889
Conversation
Created using spr 1.3.6-beta.1
| // See https://github.com/oxidecomputer/omicron-9874-findings for | ||
| // why we set the max SQL memory to be 256MiB. | ||
| .arg("--max-sql-memory=256MiB") |
There was a problem hiding this comment.
This is more a note-to-self: I tweak this value in #9878 too, but only for testing to make reproductions more consistent. We should merge this carefully.
(I'm sure we can figure something out, just calling it out early)
There was a problem hiding this comment.
Oh good catch, I guess we should make the max sql memory an option on the cockroach command builder.
There was a problem hiding this comment.
Done -- also added this to cargo xtask db-dev run
| similar_asserts::assert_eq!( | ||
| observed_cluster_settings, | ||
| expected_cluster_settings, | ||
| "Cluster settings do not match between migrations and dbinit.sql. \ | ||
| If dbinit.sql contains a SET CLUSTER SETTING statement, there must \ | ||
| be a corresponding migration (and vice versa)." |
There was a problem hiding this comment.
Real smart to include this as a check, good call!
| } | ||
|
|
||
| /// Pragma comment that marks a migration step as non-transactional. | ||
| const NON_TRANSACTIONAL_PRAGMA: &str = "-- NON-TRANSACTIONAL"; |
There was a problem hiding this comment.
I hope no one complains that the --NON-TRANSACTIONAL flag isn't working, if they don't realize the -- is syntax for SQL comments and we're checking the space here 😅
(This is probably fine... but I am a little nervous that "almost correct" pragmas will silently do nothing)
There was a problem hiding this comment.
It sure looks like a comment in up.sql 😬
I assume this is not something we expect to come up often, given it's the first time we've needed it. Would it be unreasonable to build a static list of migration files that are nontransactional instead of embedding a special marker in the file itself? Maybe kinda janky, but I'm imagining something like
static NON_TRANSACTIONAL_UPGRADE_STEPS: &[&str] = &[
"index-backfill-batch-size/up.sql",
];There was a problem hiding this comment.
(This is probably fine... but I am a little nervous that "almost correct" pragmas will silently do nothing)
FWIW SET CLUSTER SETTING blows up quite loudly if in a (multi-statement) transaction.
I assume this is not something we expect to come up often, given it's the first time we've needed it. Would it be unreasonable to build a static list of migration files that are nontransactional instead of embedding a special marker in the file itself? Maybe kinda janky, but I'm imagining something like
I considered that but felt that was slightly worse than embedding this knowledge locally in the file. I don't have a very strong preference here though.
There was a problem hiding this comment.
I considered that but felt that was slightly worse than embedding this knowledge locally in the file. I don't have a very strong preference here though.
I don't love the "static list" suggestion I made, but I more strongly dislike these two bits about embedding the knowledge locally in the file:
- We now have to check every schema migration step for a special magic comment (not worried about runtime overhead or anything; more the cognitive/complxity overhead - today we just read the file, and that seems like a nice property to maintain as long as we can)
- It allows any schema migration check to opt into non-transactional mode
The latter one bugs me more, I think? Maybe this is paranoia, but I'm worried about someone having some problem that they can address by opting out of transactions (but ought to address some other way), and code review missing it because the opt-out "flag" is a comment in a migration step, which I think is pretty easy to overlook.
Static list directly addresses the first of these, and maybe makes the second less likely? Hopefully a big scary comment on the static list plus a comment-per-entry works as a wakeup in PR review, and anyone stumbling across that list after changes are merged should be able to easily tell which migration steps have this opt-out and why.
There was a problem hiding this comment.
One downside of static lists, IMO:
- Someone writes a migration with steps "01.sql, 02.sql, 03.sql"
- They mark "02.sql" as "the one needing to be non-transactional"
- They realize they need another change before that, so they add it, and rename all the files
- Now the steps are "01.sql, 02.sql, 03.sql (this is the one which should be non-txn), 04.sql"
- The static list still points to 02.sql as the non-transactional one
I guess this is sorta my argument about the danger of non-locality of labelling
There was a problem hiding this comment.
Maybe bad idea: what about putting it in the filename somehow? It's more local than a static list but doesn't involve sidebanding control information in a comment. (The filename is still a sideband, but it feels less surprising that the filename could be semantically meaningful than a comment.) On the flip side: it seems potentially less visible in code review.
FWIW this is the approach I'm taking with #9878 - I'm auto-generating NN.verify.sql queries to live next to NN.sql files
There was a problem hiding this comment.
Hmm I would rather not encode this in the file name because this isn't an ancillary file the way NN.verify.sql is.
The solution @jgallagher and I came up with is on the update watercooler today is:
- rename
-- NON-TRANSACTIONALto-- DANGER-NON-TRANSACTIONAL-- make it semiotically not a place of honor - remove the documentation from the README, move it to a comment, with a clear warning sign
Thoughts?
There was a problem hiding this comment.
Hmm I would rather not encode this in the file name because this isn't an ancillary file the way
NN.verify.sqlis.The solution @jgallagher and I came up with is on the update watercooler today is:
- rename
-- NON-TRANSACTIONALto-- DANGER-NON-TRANSACTIONAL-- make it semiotically not a place of honor- remove the documentation from the README, move it to a comment, with a clear warning sign
Thoughts?
I'd also be fine with this, especially given your note that you used this to update a cluster setting, and if we somehow beef this process and accidentally make the SQL statement happen in a txn, the migration would blow up in tests.
There was a problem hiding this comment.
Ok I just put up a change to encode this in the filename.
There was a problem hiding this comment.
17d18fb looks good to me; I also think it's reasonable to keep this out of the README given how sketchy non-transactional operations are.
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1
| // We have 2 or more `up*.sql`; they should be numbered | ||
| // exactly 1..=up_sqls.len(). | ||
| if i as u64 + 1 != *up_number { | ||
| if i as u64 + 1 != parsed.up_number { |
There was a problem hiding this comment.
This is where we'd catch "a transactional + non-transactional SQL change both using the same number", right?
There was a problem hiding this comment.
Yeah (and there's a test for this too)
Created using spr 1.3.6-beta.1
Per the analysis in https://github.com/oxidecomputer/omicron-9874-findings.
An interesting note is that we didn't used to set max-sql-memory at all, which meant we were relying on the CockroachDB default. On Linux that was 25% of system memory, but on illumos the detection fell back to a default of 128 MiB. So developers on Linux would effectively have very different SQL memory limits from those on illumos. With this change, the two OSes now have similar behavior.
Fixes #9874.