
Argos collector (map_v3)#185

Open
colby-nyce wants to merge 428 commits into main from colby-nyce/collector-v3

Conversation


colby-nyce commented May 19, 2026

Collaborator

ArgosArch.pptx

Since this is a very large PR, please review the files in the following order. The easy files come first: they are standalone utilities/changes that need little Argos context.

Timestamps.hpp

  • Simple wrapper around a uint64_t to add timestamps to collected data
  • Unit tests use a back-pointer to a local variable
  • Sparta uses a std::function wrapping Scheduler::getCurrentTime()
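
A minimal Python sketch of the idea, for reviewers who want the gist without the C++. The class and names below are hypothetical, not the Timestamps.hpp API: the wrapper only needs a zero-argument callable that returns the current tick, which a unit test can bind to a variable it advances by hand and Sparta can bind to something like the scheduler's current time.

```python
from typing import Callable, List

UINT64_MAX = (1 << 64) - 1


class TimestampSource:
    """Hypothetical stand-in for the uint64_t timestamp wrapper.

    The current time comes from a zero-argument callable, roughly how a
    std::function could wrap Scheduler::getCurrentTime() in Sparta.
    """

    def __init__(self, get_time: Callable[[], int]) -> None:
        self._get_time = get_time

    def now(self) -> int:
        t = self._get_time()
        # Enforce the range a uint64_t-backed wrapper would have.
        if not 0 <= t <= UINT64_MAX:
            raise ValueError(f"timestamp out of uint64 range: {t}")
        return t


# Unit-test style: the source reads a local the test advances by hand
# (the closest Python analog of a back-pointer to a local variable).
tick: List[int] = [0]
ts = TimestampSource(lambda: tick[0])
tick[0] = 42
print(ts.now())  # 42
```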

StreamBuffer.hpp

  • Wrapper around a byte buffer

CollectedData.hpp

  • Wrapper around a StreamBuffer
  • The byte buffer always begins with the Collectable ID (CID, a uint16_t)
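
A rough Python model of that layout. It assumes a little-endian 2-byte CID immediately followed by the payload bytes; the endianness and the helper names are assumptions, not taken from CollectedData.hpp.

```python
import struct
from typing import Tuple

# Assumption: the CID is a little-endian uint16_t at offset 0.
CID_FORMAT = "<H"
CID_SIZE = struct.calcsize(CID_FORMAT)  # 2 bytes


def pack_collected(cid: int, payload: bytes) -> bytes:
    """Prefix the payload with its collectable ID.

    struct.error is raised if cid does not fit in a uint16_t.
    """
    return struct.pack(CID_FORMAT, cid) + payload


def unpack_collected(blob: bytes) -> Tuple[int, bytes]:
    """Split a buffer back into (cid, payload)."""
    (cid,) = struct.unpack_from(CID_FORMAT, blob, 0)
    return cid, blob[CID_SIZE:]


buf = pack_collected(7, b"\x01\x02")
print(buf)                    # b'\x07\x00\x01\x02'
print(unpack_collected(buf))  # (7, b'\x01\x02')
```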

Dump.hpp

  • Utility to pretty print DB tables
  • Controlled by --simdb-verbose in Sparta
  • Used for debugging only
  • Related changes: App.hpp, AppManager.hpp

Compress.hpp

  • Fixes a zlib.decompress() bug for empty blobs
  • Never hit during real Argos use; the AI suggested it while bashing on edge cases
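
The failure mode is easy to reproduce from Python, because zlib rejects a zero-length input. Below is a minimal guard. Whether Compress.hpp handles the case exactly this way is an assumption, and `decompress_blob` is a hypothetical name.

```python
import zlib


def decompress_blob(blob: bytes) -> bytes:
    """Decompress a zlib blob, treating an empty blob as an empty payload.

    zlib.decompress(b"") raises zlib.error (incomplete or truncated stream),
    so the empty case is short-circuited before calling zlib.
    """
    if not blob:
        return b""
    return zlib.decompress(blob)


print(decompress_blob(b""))                      # b''
print(decompress_blob(zlib.compress(b"argos")))  # b'argos'
```

Compressing an empty payload still produces a non-empty zlib stream, so a zero-length blob can only mean that nothing was written at all.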

TypeTraits.hpp

  • Largely taken directly from Sparta's MetaStructs.hpp (Avi's code for pair collection)
  • New code I added near the bottom: pod_convertible
  • Note that has_ostream_operator / has_sparta_pair_definition_type will change once the rest of the Sparta collection code is moved to SimDB

ValidValue.hpp

  • This whole class could just be identical to Sparta's
  • Can we move sparta::utils::ValidValue to SimDB and just keep an alias in Sparta?

TinyStrings.hpp

  • Switched to an in-memory string map instead of requiring a DatabaseManager
  • This was needed for some Sparta unit tests that don't have a DB (such as collecting a Queue without SimDB at all)
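
A minimal in-memory version of such a string map in Python. The class and method names are hypothetical, and IDs starting at 0 is an assumption. The reverse lookup follows the same shape as the GetString(..., must_exist) snippet quoted later in this conversation.

```python
from typing import Dict, Optional


class InMemoryTinyStrings:
    """Hypothetical string <-> ID table that needs no DatabaseManager."""

    def __init__(self) -> None:
        self._ids_by_string: Dict[str, int] = {}
        self._strings_by_id: Dict[int, str] = {}

    def get_string_id(self, s: str) -> int:
        """Return the ID for s, assigning the next free ID on first use."""
        sid = self._ids_by_string.get(s)
        if sid is None:
            sid = len(self._ids_by_string)
            self._ids_by_string[s] = sid
            self._strings_by_id[sid] = s
        return sid

    def get_string(self, string_id: int, must_exist: bool = False) -> Optional[str]:
        """Reverse lookup; optionally raise if the ID was never assigned."""
        if string_id in self._strings_by_id:
            return self._strings_by_id[string_id]
        if must_exist:
            raise KeyError(f"String ID does not exist: {string_id}")
        return None


strings = InMemoryTinyStrings()
print(strings.get_string_id("alpha"))  # 0
print(strings.get_string_id("beta"))   # 1
print(strings.get_string_id("alpha"))  # 0 (already interned)
print(strings.get_string(1))           # beta
```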

Now getting into the core changes...

PipelineDataTypes.hpp

  • Data structures passed through the DB pipeline's ConcurrentQueues between async stages

PipelineStagerInterface.hpp

  • Interface class to receive collected data, notifications, and open/close state changes

EntryPoint.hpp

  • All EntryPoints talk to the same PipelineStagerInterface
  • Sparta's Collectable/IterableCollector each have their own EntryPoint
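
To make the fan-in concrete, here is a toy Python model. Several entry points share one stager, which hands everything to a single background stage through a thread-safe queue. It is only a sketch: `ToyStager` and `ToyEntryPoint` are hypothetical, and the real interface also carries notifications and open/close state changes.

```python
import queue
import threading
from typing import List, Tuple

Item = Tuple[int, bytes]  # (cid, payload)


class ToyStager:
    """Hypothetical stand-in for a PipelineStagerInterface implementation.

    Every entry point pushes into the same thread-safe queue; one background
    thread plays the role of the first async pipeline stage.
    """

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self) -> None:
        self._queue: "queue.Queue[object]" = queue.Queue()
        self.received: List[Item] = []
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def stage(self, cid: int, payload: bytes) -> None:
        self._queue.put((cid, payload))

    def close(self) -> None:
        """Drain everything already queued, then stop the worker."""
        self._queue.put(self._STOP)
        self._worker.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                return
            self.received.append(item)


class ToyEntryPoint:
    """Hypothetical per-collectable entry point bound to one CID."""

    def __init__(self, cid: int, stager: ToyStager) -> None:
        self._cid = cid
        self._stager = stager

    def collect(self, payload: bytes) -> None:
        self._stager.stage(self._cid, payload)


stager = ToyStager()
ep_a = ToyEntryPoint(1, stager)
ep_b = ToyEntryPoint(2, stager)
ep_a.collect(b"x")
ep_b.collect(b"y")
ep_a.collect(b"z")
stager.close()
print(stager.received)  # [(1, b'x'), (2, b'y'), (1, b'z')]
```

Because the collecting thread only enqueues, its cost stays small. The encoding work happens on the worker.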

ArgosCollector.hpp

  • Main entry point into the Argos collection system
  • Defines schema and DB pipeline
  • Writes metadata before/after simulation

Checkpointer.hpp
Checkpoint.hpp
CheckpointDeltas.hpp
CheckpointEncodings.hpp

  • Implements delta encoding for scalars and containers
  • Done on the 1st async pipeline stage
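
A simplified sketch of the two flavors, written in Python for brevity. It is not the actual format in CheckpointEncodings.hpp. Here, scalars are stored as a first value followed by differences, and containers as the new size plus only the slots that changed since the previous snapshot.

```python
from typing import Dict, List, Sequence, Tuple


def encode_scalar_deltas(values: Sequence[int]) -> List[int]:
    """First value verbatim, then the difference from the previous value."""
    out: List[int] = []
    prev = None
    for v in values:
        out.append(v if prev is None else v - prev)
        prev = v
    return out


def encode_container_delta(prev: Sequence, curr: Sequence) -> Tuple[int, Dict[int, object]]:
    """New size plus {index: value} for slots that are new or changed."""
    changed = {
        i: v for i, v in enumerate(curr)
        if i >= len(prev) or prev[i] != v
    }
    return len(curr), changed


print(encode_scalar_deltas([100, 101, 101, 105]))       # [100, 1, 0, 4]
print(encode_container_delta([1, 2, 3], [1, 5, 3, 7]))  # (4, {1: 5, 3: 7})
print(encode_container_delta([1, 2, 3], [1, 2]))        # (2, {})
```

Slowly changing data turns into mostly zeros and empty change sets, which is presumably what makes the downstream compression so effective.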

blob_handlers.py
blob_iterator.py

  • Implements single-pass blob iteration
  • Undoes delta encoding to get final data values at a specific time
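
The reverse direction, again as a hypothetical Python sketch rather than the blob_iterator.py code. It walks the time-ordered records once, resets on a full snapshot (assumed here to be what a heartbeat writes), applies container deltas in between, and stops at the requested time.

```python
from typing import List, Optional, Sequence, Tuple

# Each record is (time, kind, payload):
#   kind == "full":  payload is the complete container contents
#   kind == "delta": payload is (new_size, {index: value})
Record = Tuple[int, str, object]


def value_at(records: Sequence[Record], target_time: int) -> Optional[List]:
    """Single forward pass; returns None if nothing was recorded by target_time."""
    current: Optional[List] = None
    for time, kind, payload in records:
        if time > target_time:
            break  # records are time-ordered, so nothing later applies
        if kind == "full":
            current = list(payload)
        else:
            if current is None:
                raise ValueError("delta record before any full record")
            size, changes = payload
            # Pad generously, then cut to the new size (handles grow and shrink).
            current = (current + [None] * size)[:size]
            for idx, val in changes.items():
                current[idx] = val
    return current


records = [
    (0, "full", [1, 2, 3]),
    (5, "delta", (4, {1: 5, 3: 7})),
    (9, "delta", (2, {})),
]
print(value_at(records, 7))   # [1, 5, 3, 7]
print(value_at(records, 9))   # [1, 5]
print(value_at(records, -1))  # None
```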

The rest of the Python code doesn't really need to be reviewed.

test/argos/*

  • Integration-level testing
  • Largely written by AI
  • Runs 3 fully randomized simulations at heartbeats 1, 3, and 10
  • Invokes Python code to check that all deserialized values match exactly across the three databases
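
The cross-database check comes down to something like the following Python sketch. It assumes each database has already been deserialized into a dict keyed by (CID, time). That shape and the `find_mismatches` name are hypothetical, not the actual test harness. Presumably only the heartbeat differs between the three runs, so any key whose values disagree, or which is missing from one database, points at an encode/decode bug.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (cid, time): hypothetical key shape


def find_mismatches(dbs: Dict[str, Dict[Key, object]]) -> List[Key]:
    """Return sorted keys whose values are missing or differ in any database."""
    all_keys = set()
    for values in dbs.values():
        all_keys.update(values.keys())

    missing = object()  # sentinel distinct from any stored value, including None
    mismatches = []
    for key in sorted(all_keys):
        seen = [values.get(key, missing) for values in dbs.values()]
        if any(v is missing for v in seen) or any(v != seen[0] for v in seen[1:]):
            mismatches.append(key)
    return mismatches


dbs = {
    "heartbeat_1": {(1, 0): 5, (1, 1): 6, (2, 0): "a"},
    "heartbeat_3": {(1, 0): 5, (1, 1): 6, (2, 0): "a"},
    "heartbeat_10": {(1, 0): 5, (1, 1): 7},
}
print(find_mismatches(dbs))  # [(1, 1), (2, 0)]
```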

…d run_value_compare functions, removing redundant code and improving maintainability.
…. Added checks to prevent duplicate heartbeats for already emitted CIDs and introduced a new PendingFieldTraceSink class for better field event tracking.
…e writeBin methods to support tracing. Clear curr_pair_field_events on writePairs_ to ensure accurate event recording.
…ement fixed byte size calculation for DataTypeNode in Collectables. Enhance collectable interface with new trace methods for improved serialization and container handling.
@colbynyce-mips

Collaborator

Here are the latest performance numbers compared to the legacy collector:

  • 7.5% faster
  • 90% smaller databases / files

Further perf improvements probably have to wait until the pair collector code from Sparta is moved to SimDB, as nothing really stands out as worth the effort right now.

Copilot AI left a comment


Pull request overview

Copilot reviewed 76 out of 76 changed files in this pull request and generated 15 comments.

Comment thread include/simdb/utils/ValidValue.hpp
Comment thread include/simdb/apps/argos/ArgosResources.hpp Outdated
Comment thread include/simdb/apps/argos/PipelineStagerInterface.hpp
Comment thread include/simdb/sqlite/Dump.hpp
Comment thread include/simdb/sqlite/Iterator.hpp
Comment thread python/argos/viewer/model/tiny_strings.py Outdated
Comment on lines 78 to 90
 for j, field_name in enumerate(self.visible_field_names):
-    self.grid.SetCellValue(idx, j, str(row_data[field_name]))
+    self.grid.SetCellValue(idx, j, str(row_data[j][1]))
     if auto_colorize_col_idx is not None:
-        color = widget_renderer.GetAutoColor(row_data[self.visible_field_names[auto_colorize_col_idx]])
+        auto_colorize_col_name = self.visible_field_names[auto_colorize_col_idx]
+        color_keyval = None
+        for key, keyval in row_data:
+            if key == auto_colorize_col_name:
+                color_keyval = keyval
+                break
+
+        assert color_keyval is not None
+        color = widget_renderer.GetAutoColor(color_keyval)
         self.grid.SetCellBackgroundColour(idx, j, color)
Comment thread python/argos/viewer/gui/widgets/widget_creator.py
Comment thread include/simdb/utils/ValidValue.hpp
Comment thread include/simdb/apps/argos/ArgosResources.hpp Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 69 out of 69 changed files in this pull request and generated 5 comments.

Comment on lines +140 to +157
for row in raw_rows:
    nid, sid, name, type_name, special_formatter = row
    enum_back = None
    if type_name in SimpleDeserializer.CONVERTERS:
        kind = "pod"
    elif type_name == "string":
        kind = "pod"
    elif type_name != "dynamic":
        kind = "enum"
        if type_name not in enum_defns:
            # All this means is that we never ended up collecting
            # anything that uses this enum. All of the enum int:str
            # mappings are figured out only when first seen during
            # collection.
            enum_defns[type_name] = {}
        else:
            enum_back = enum_backings.get(type_name)

Comment on lines +13 to +18
def GetString(self, string_id, must_exist=False):
    if string_id in self._strings_by_id:
        return self._strings_by_id[string_id]
    if must_exist:
        raise Exception(f'String ID does not exist: {string_id}')
    return None
Comment on lines +42 to +45
void append(const bool val) { append(static_cast<uint8_t>(val)); }

void append(const std::string& s) { append(tiny_strings_->getStringID(s)); }

Comment on lines +305 to +308
except Exception as ex:
    print(f"Error loading user settings. Deleting settings file. Error: '{ex}'")
    os.remove(settings_file)
    self.__ResetDefaultViewSettings()
Comment thread python/argos/viewer/gui/widgets/widget_creator.py

colbynyce-mips commented Jun 17, 2026

Collaborator

After more delta encodings were added in commit aa4d0be, the database shrank again: it was previously 90% smaller than legacy and is now 92.6% smaller. The actual DB sizes:

  • Legacy: 530 MB
  • Pre-aa4d0be: 52 MB
  • Post-aa4d0be: 39 MB
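
The quoted percentages follow directly from those sizes. A quick Python check, with reduction = 1 - new/old:

```python
def reduction_pct(new_mb: float, old_mb: float) -> float:
    """Percent size reduction going from old_mb to new_mb."""
    return 100.0 * (1.0 - new_mb / old_mb)


legacy_mb, pre_mb, post_mb = 530, 52, 39
print(f"{reduction_pct(pre_mb, legacy_mb):.1f}%")   # 90.2%  (pre-aa4d0be vs legacy)
print(f"{reduction_pct(post_mb, legacy_mb):.1f}%")  # 92.6%  (post-aa4d0be vs legacy)
print(f"{reduction_pct(post_mb, pre_mb):.1f}%")     # 25.0%  (gain from aa4d0be itself)
```

So commit aa4d0be alone cut the database by about a quarter.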

Runtime was unaffected because delta encoding runs on the pipeline's background threads. The encoder thread still sleeps 67% of the time, so there is headroom to add more compression algorithms later.
