Master experiment #111
Open
IrenaRistova wants to merge 17 commits into
- .gitignore: add examples/batterymanager/Scripts/.device_state_capabilities/ (per-machine cache keyed by adb serial, written by the before_experiment device-state verifier in E0.T8).
- requirements-appium.txt: pin Appium-Python-Client>=5.0 alongside the existing AndroidRunner requirements.txt so a single venv satisfies both AndroidRunner and the black-box Appium harness in appium_android_tests/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e harness
- AndroidRunner/NativeExperiment.py: tweaks to support interaction_covers_duration=true (so per-app Appium scripts that block on a workload thread don't get double-slept by the runner) and the Master Experiment device list.
- AndroidRunner/Plugins/batterymanager/Batterymanager.py: small adjustments to play nicely with the rebuilt BatteryManager companion fork (com.example.batterymanager_utility) used post-2026-05-08.
- devices.json: register Pixel 3, Pixel 6, Pixel 9 entries for the thesis device matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E0.T4 (tracking matrix): after_experiment hook appends one row per (app, variant, device, run_id) to specs/tracking_matrix.csv, with dedup-by-run_id semantics so re-runs replace rather than append.
E0.T4b (APK provenance hook): before_run writes apk_meta.json into the per-run output dir; update_tracking_matrix.py populates apk_path / apk_sha256 / apk_storage columns from it (no manual --apk-path flag needed when the standard hook chain runs).
E0.T7 (three-window energy split): update_tracking_matrix.py post-hoc slices the BatteryManager per-sample CSV into pre_workload / workload_only / whole_window using profiler-start, first-Appium-scenario-ts, and last-Appium-scenario-ts as boundaries.
Files:
- Scripts/_lib_apk_meta.py: SHA-256 + apk_meta.json writer/reader.
- Scripts/update_tracking_matrix.py: full matrix updater (idempotent, stdlib-only, CSV dedup, 3-window slicer).
- Scripts/before_run.py: writes apk_meta.json before each run.
- Scripts/after_experiment.py: invokes update_tracking_matrix.py; wrapped per-step so failures don't crash the experiment.
- Scripts/after_launch.py: minor tweak to fit the chain.
- compute_energy_from_sysfs.py: standalone sysfs aggregator (fallback for devices where BatteryManager is gated; unused on the canonical Pixel 3 path but kept for noise-source experiments).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
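The three-window split described for E0.T7 can be sketched roughly as follows. This is a minimal illustration, not the code in update_tracking_matrix.py: the function names, the (timestamp_s, power_w) sample shape, the trapezoidal integration, and the choice to end whole_window at the last sample are all assumptions made for the example.

```python
def energy_joules(samples, start, end):
    """Trapezoidal integral of power (W) over samples with start <= t <= end.

    `samples` is a list of (timestamp_s, power_w) tuples sorted by time
    (an assumed shape; the real per-sample CSV columns are not shown here).
    """
    window = [(t, p) for t, p in samples if start <= t <= end]
    return sum(
        (t1 - t0) * (p0 + p1) / 2.0
        for (t0, p0), (t1, p1) in zip(window, window[1:])
    )


def three_window_split(samples, profiler_start, first_scenario_ts, last_scenario_ts):
    """Slice one run into the three windows named in the commit message."""
    # Assumption: whole_window runs from profiler start to the last sample.
    whole_end = samples[-1][0] if samples else profiler_start
    return {
        "pre_workload": energy_joules(samples, profiler_start, first_scenario_ts),
        "workload_only": energy_joules(samples, first_scenario_ts, last_scenario_ts),
        "whole_window": energy_joules(samples, profiler_start, whole_end),
    }
```

Keeping the three boundaries as plain arguments means the same samples can be re-sliced after the fact if the scenario timestamps are corrected.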
…averaged (E0.T5)
Scripts/detect_crash_anr.py classifies each run as one of: none / crash / anr / system_error_dialog / lost_foreground, parsing logcat (or BatteryManager-side logcat dumps when the adb_log persistency strategy is enabled) and force-stopping the package so no zombie process leaks into the next run's energy window. Writes crash_anr_status.json into the per-run output dir with the classification + matching log lines as evidence.
Scripts/after_run.py now invokes detect_crash_anr.py once per run.
Why this matters for the thesis: a single Bangcle-induced ANR run can compress the median of three reps toward "low energy" because the app died and the device idled. Without explicit isolation, that becomes a silent bias toward protected variants.
The matrix updater reads crash_anr_status.json and populates the crash_anr_status column; unknown values surface as notes=crash_anr_status_missing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
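As an illustration of the classification step, the sketch below covers only the crash and anr classes; how detect_crash_anr.py detects system_error_dialog and lost_foreground is not described here, so those are left out. The function name, the return shape, and the exact marker strings it matches ("FATAL EXCEPTION" followed by a "Process: <package>," line, and "ANR in <package>") are assumptions for the example.

```python
def classify_logcat(lines, package):
    """Return {"status": ..., "evidence": [...]} for one run's logcat lines.

    Hypothetical helper: handles only "crash", "anr", and "none".
    """
    crash_evidence = []
    for i, line in enumerate(lines):
        if "FATAL EXCEPTION" in line:
            # The crashing process is normally named on the following line.
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if f"Process: {package}," in following:
                crash_evidence += [line, following]
    if crash_evidence:
        return {"status": "crash", "evidence": crash_evidence}

    # Note: a plain substring test also matches packages that merely share
    # this prefix; a stricter pattern may be needed in practice.
    anr_evidence = [ln for ln in lines if f"ANR in {package}" in ln]
    if anr_evidence:
        return {"status": "anr", "evidence": anr_evidence}

    return {"status": "none", "evidence": []}
```

Returning the matching lines alongside the status mirrors the "classification + evidence" shape the commit describes for crash_anr_status.json.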
…0.T8)
Scripts/_lib_device_state.py: shared helpers for setting/reading device state: brightness lock, dumpsys battery unplug + status overrides, current_now sampling, per-device capability cache (auto-gitignored).
Scripts/before_experiment_apply_device_state.py: once per experiment. Sets screen_brightness_mode=0 + brightness=128 (persists across runs), runs the canonical "dumpsys battery unplug" sequence, sleeps 5 s, then reads /sys/class/power_supply/battery/current_now. Caches the verdict (verified_discharge / suspected_supplying / unknown) at .device_state_capabilities/<serial>.json.
Scripts/before_run_record_device_state.py: once per run. Captures battery level, current_now, voltage_now, brightness, airplane mode, third-party package count, BATTERY_STATS grant state.
The matrix updater turns abnormal fields into notes-column annotations: energy_invalid_usb_supplying, energy_validity_unknown, battery_stats_not_granted, brightness_drift, etc.
Strict mode (MASTEREXP_STRICT_DISCHARGE_CHECK=1) aborts the experiment when discharge is suspect, for the eventual thesis batch where we prefer fail-fast over silently-corrupted data. Default lenient mode lets dev iteration continue and tags the row instead.
Software charge-disable is provably ineffective on Pixel 3 + Pixel 9 (charge IC ignores dumpsys battery unplug); hub-ctrl (Epic 1.6) is the canonical mitigation. Until that lands, the discharge verdict tells us which rows are usable for cross-variant energy comparison.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
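The strict versus lenient behaviour could look something like the sketch below. The function name is hypothetical, and two details are assumptions rather than facts from the scripts: that strict mode treats every verdict other than verified_discharge as suspect, and that suspected_supplying and unknown map to the energy_invalid_usb_supplying and energy_validity_unknown notes respectively.

```python
import os


def enforce_discharge_verdict(verdict, env=None):
    """Return notes to attach to the matrix row, or abort in strict mode.

    Hypothetical helper; `verdict` is one of verified_discharge,
    suspected_supplying, unknown.
    """
    env = os.environ if env is None else env
    if verdict == "verified_discharge":
        return []

    if env.get("MASTEREXP_STRICT_DISCHARGE_CHECK") == "1":
        # Fail fast: SystemExit is meant to stop the whole experiment.
        raise SystemExit(f"discharge check failed: {verdict}")

    # Lenient mode: keep running and tag the row instead (assumed mapping).
    if verdict == "suspected_supplying":
        return ["energy_invalid_usb_supplying"]
    return ["energy_validity_unknown"]
```

Raising SystemExit rather than an ordinary exception lets an outer hook chain swallow routine failures while still letting this one stop the run.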
…ly (E0.T10)
Scripts/before_experiment_grant_battery_stats.py: idempotent pm grant com.example.batterymanager_utility BATTERY_STATS. Constraint 9 from the 2026-05-08 supervisor meeting. Warn-not-fail on non-zero exit so a broken companion install can't kill the run; broken state surfaces in the matrix as notes=battery_stats_not_granted.
Scripts/before_experiment.py chains, in order:
1. apply_device_state (E0.T8): physical-world controls first
2. grant_battery_stats (E0.T10): permission state second
3. (per-app uninstall, separate hook): subject state third
Each link is wrapped in try/except so failure in one doesn't skip the others; SystemExit from the strict-mode discharge check propagates.
NOTE: the BATTERY_STATS grant was originally believed to unmask BATTERY_PROPERTY_CURRENT_NOW. A controlled A/B test on the evening of 2026-05-08 and the AOSP API surface both show this is not the case: CURRENT_NOW is not gated by BATTERY_STATS. We keep the grant as consistency hygiene (so all three devices report the same permission state in device_state.json), but it does not fix float-charge masking; that is hub-ctrl territory (Epic 1.6). See docs/MEASUREMENT_NOISE_SOURCES.md §1 in the workspace repo for the retraction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
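The chaining behaviour described above (each link isolated, SystemExit still propagating) follows from a standard Python property: `except Exception` does not catch SystemExit, which derives from BaseException. A minimal sketch, with a hypothetical run_chain helper and step signature:

```python
def run_chain(steps, device):
    """Run (name, callable) steps in order; collect failures, keep going.

    Hypothetical helper. SystemExit is deliberately not caught, so a
    strict-mode abort raised inside a step still stops the experiment.
    """
    failures = []
    for name, step in steps:
        try:
            step(device)
        except Exception as exc:  # SystemExit is a BaseException: not caught
            failures.append((name, repr(exc)))
    return failures
```

A caller could log the returned failures as warnings, so a broken step is visible without skipping the steps after it.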
… linkhub
CONVENTIONS.md §10 contract: fresh uninstall between APK variants is required because AndroidRunner does not by default uninstall a previous variant; it overlays. That is wrong for cross-variant comparisons because the previous variant's data dir, ART caches, and JIT state can persist.
Each before_experiment_uninstall_<slug>.py hardcodes its app's manifest package and calls device.uninstall(PACKAGE) at experiment start. The metronome hook also chains apply_device_state + grant_battery_stats (both idempotent) so the full pre-experiment sequence runs regardless of which entry-point hook the config references.
Slug coverage matches the Epic 3 sub-tasks scaffolded as of 2026-05-09: metronome (Pilot A), tipuous, repertoire, linkhub (the three no-INTERNET parallel-subagent outputs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(E0.T1 / E0.T2)
Scripts/interaction_appium.py: generic dispatcher reading APPIUM_APP env
var; imports appium_android_tests.<APPIUM_APP> and calls its
run_workload(experiment, device). Writes appium_status.json with a
clean failure_reason string on every failure path (no Python tracebacks
into AndroidRunner). Documented at Scripts/README-appium-hooks.md.
Scripts/interaction_appium_TEMPLATE.py: copy-paste template for new
per-app wrappers. Each wrapper hardcodes APPIUM_APP=<slug> and
delegates to the dispatcher — useful when AndroidRunner configs reference
a script path rather than an env var.
Scripts/interaction_appium_metronome.py +
Scripts/interaction_appium_metronome_espresso_mirror.py: thin wrappers
for Pilot A (Kr0oked Metronome). Espresso-mirror variant sets
APPIUM_WORKLOAD=espresso_mirror to select the Set α scenario suite.
Both predate the generic dispatcher and remain the canonical path for
existing Metronome configs.
Scripts/interaction_appium_{tipuous,repertoire,linkhub}.py: per-app
wrappers for the three no-INTERNET apps scaffolded in Epic 3 wave 1
(2026-05-09 parallel subagents). Each delegates to its own per-app
module under appium_android_tests/<slug>/.
Scripts/interaction_monkey_only.py + Scripts/interaction.py changes:
monkey-only fallback retained for smoke-style runs that don't need
UiAutomator2 / strict-UI scoring. Documented for completeness only;
not part of the thesis-grade comparison.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
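The dispatcher pattern described in this commit (read APPIUM_APP, import appium_android_tests.<APPIUM_APP>, call run_workload, always write appium_status.json) might be sketched as below. The status schema, the failure_reason strings, and the injectable env/importer parameters are assumptions for illustration; the real vocabulary is documented in Scripts/README-appium-hooks.md.

```python
import importlib
import json
import os
from pathlib import Path


def dispatch(experiment, device, out_dir, env=None, importer=importlib.import_module):
    """Import the per-app workload module and run it, never raising.

    Hypothetical sketch: every path ends by writing appium_status.json,
    so the runner sees a short reason string instead of a traceback.
    """
    env = os.environ if env is None else env
    status = {"ok": False, "failure_reason": None}
    slug = env.get("APPIUM_APP")
    if not slug:
        status["failure_reason"] = "appium_app_env_missing"  # assumed string
    else:
        try:
            module = importer(f"appium_android_tests.{slug}")
            module.run_workload(experiment, device)
            status["ok"] = True
        except ImportError:
            status["failure_reason"] = "workload_module_not_found"  # assumed
        except Exception as exc:
            status["failure_reason"] = f"workload_error: {type(exc).__name__}"
    Path(out_dir, "appium_status.json").write_text(json.dumps(status))
    return status
```

A per-app wrapper in this style would only need to set APPIUM_APP to its slug and call the dispatcher.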
…ofiler (Epic 1.5)
Scripts/aux_postprocess.py reads the per-run csv emitted by AndroidRunner's built-in 'android' profiler (Plugins/android/Android.py, A-Mobile 2020; data_points: [cpu, mem]) and aggregates into aux/aux_summary.json with cpu_avg_pct, cpu_p95_pct, mem_pss_avg_mb, mem_pss_max_mb.
The matrix updater (E1.5.T5) reads aux_summary.json to populate the four aux columns in specs/tracking_matrix.csv.
Goal: attribute energy deltas to underlying workload deltas (so the thesis can say "Bangcle-packed runs use X% more CPU and Y% more PSS than baseline" alongside "they use Z mWh more"). Without aux, energy deltas have no causal context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
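The four aggregates could be computed along these lines. This is a sketch under assumptions: the helper name is hypothetical, the inputs are taken to be already-parsed lists of CPU percentages and PSS values in MB, and the 95th percentile uses the nearest-rank method, which may differ from what aux_postprocess.py does.

```python
import math


def summarize_aux(cpu_pct, mem_pss_mb):
    """Aggregate per-sample CPU (%) and PSS (MB) into the four aux fields.

    Hypothetical helper; both inputs must be non-empty lists of numbers.
    """
    if not cpu_pct or not mem_pss_mb:
        raise ValueError("need at least one CPU and one memory sample")

    ordered = sorted(cpu_pct)
    # Nearest-rank percentile: smallest value with >= 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(ordered)) - 1

    return {
        "cpu_avg_pct": sum(cpu_pct) / len(cpu_pct),
        "cpu_p95_pct": ordered[p95_index],
        "mem_pss_avg_mb": sum(mem_pss_mb) / len(mem_pss_mb),
        "mem_pss_max_mb": max(mem_pss_mb),
    }
```

The keys match the field names listed for aux_summary.json, so the dictionary could be serialized as-is.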
… JSONs (E0.T3)
_templates/ contains the four authoritative templates that all concrete
configs are materialized from:
- _templates/device_pixel{3,6,9}.json — per-device blocks (serial,
adb-path, BatteryManager grants needed, ABI gotchas as _comments).
- _templates/app_variant_2min.json — the locked-invariant 2-minute
per-(app, variant) experiment template with {{APP_ID}},
{{APPLICATION_ID}}, {{APK_PATH}}, {{DEVICE_NAME}},
{{INTERACTION_HOOK}} placeholders.
Concrete configs in two naming styles:
- New per-app convention <slug>_<variant>_<device>.json:
linkhub_baseline_pixel3.json, repertoire_baseline_pixel3.json,
tipuous_baseline_pixel3.json, metronome_bangcle_pixel9.json (the
v2.1.1 protected-build pointer added 2026-05-12).
- Legacy monkey_* family kept for the existing Metronome Pilot A flow
(Pixel 3 / Pixel 6 / Pixel 9 baseline + R8-obfuscated + Bangcle-packed
espresso_mirror configs, plus a couple of plain-monkey smokes).
espresso_compare_apk_paths.json: ad-hoc validator config used to
cross-check that two APK paths point at the same manifest package.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
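Materializing a concrete config from a template with {{PLACEHOLDER}} tokens can be sketched as follows. The helper name and the JSON keys in the usage example are hypothetical, and the sketch assumes every placeholder sits inside a JSON string literal; the actual substitution flow is described in README-templates.md.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def materialize(template_text, values):
    """Substitute {{NAME}} tokens and return the parsed config dict.

    Hypothetical helper. Raises KeyError if the template uses a
    placeholder that `values` does not provide.
    """
    missing = sorted(set(PLACEHOLDER.findall(template_text)) - set(values))
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")

    def fill(match):
        # json.dumps(...)[1:-1] escapes quotes and backslashes so the value
        # stays valid inside the surrounding JSON string literal.
        return json.dumps(values[match.group(1)])[1:-1]

    return json.loads(PLACEHOLDER.sub(fill, template_text))
```

For example, `materialize('{"app": "{{APP_ID}}"}', {"APP_ID": "linkhub"})` returns `{"app": "linkhub"}`. Failing loudly on an unfilled placeholder keeps a half-materialized config from reaching a device.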
examples/batterymanager/README-experiments.md: per-config catalog of what each .json runs, on which device, against which APK variant, with which interaction hook. Lookup table for "which config do I copy when adding a new (app, variant, device) cell?"
examples/batterymanager/README-templates.md: explains _templates/, the placeholder substitution flow, and the "locked invariants" in app_variant_2min.json that must NOT be changed when materializing concrete configs (duration ceiling, repetitions, interaction_covers_duration, time_between_run, BatteryManager profiler block).
examples/batterymanager/ESPRESSO_MIRROR_VALIDATION.md: thesis-defence validation matrix giving the per-scenario relationship between the upstream Metronome Espresso InstrumentedTest.kt and our Appium espresso_mirror suite (Close / Partial / Touch-only / Not implemented). Updated for the AndroT-snapshot scenario list; predates Set α (which dropped tempoMarkingsWalk and added two audio scenarios; needs a Set α addendum when time permits).
Scripts/README-appium-hooks.md: companion to appium_android_tests/CONVENTIONS.md, scoped to the AndroidRunner side: when to use the generic interaction_appium.py dispatcher vs a per-app wrapper, the APPIUM_APP env-var contract, the appium_status.json failure_reason vocabulary, and a 5-step "add a new app" checklist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…smoke
Companion to examples/batterymanager/metronome_bangcle_pixel9.json (committed 2026-05-12 in 420a34d). Same APK target, just devices block swapped to "Pixel 3".
Used 2026-05-12 to verify the v2.1.1 packed APK installs and runs the Set α scenarios on Pixel 3 in addition to Pixel 9, proving the cross-device install gate is finally cleared for the protected variant (v1.7.2 Bangcle had only armeabi-v7a; v2.1.1 ships all 4 ABIs). Pixel 3 + Pixel 9 both selected arm64-v8a at install time on this APK.
Run IDs proving end-to-end:
- Pixel 3 (Android 12): output/2026.05.12_172526/, 8/8 strict UI pass
- Pixel 9 (Android 16): output/2026.05.12_182313/, 8/8 strict UI pass
Both rows in specs/tracking_matrix.csv are tagged energy_invalid_usb_supplying as expected; the float-charge masking issue is hub-ctrl territory (Epic 1.6), not a harness issue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new experiment configs for the 2026-05-12 cohort smoke runs, plus per-app Documenter hooks (uninstall + interaction).
Pattern matches existing metronome_bangcle_pixel9.json:
- repetitions: 1 (smoke; bump to 3 for matrix sweep)
- duration: 120000 ms (profiler ceiling; workload is N-scenario bounded)
- interaction_covers_duration: true
- batterymanager + android (CPU/mem) profilers both enabled
- paths points at the signed packed APK at /home/irena/Documents/Master Thesis/APKs/<slug>_protected.signed.apk
- application_id matches each app's manifest package, with or without the .debug suffix: com.tips.tipuous.debug, com.viliussutkus89.documenter.debug, com.amrdeveloper.linkhub
Hooks:
- Scripts/before_experiment_uninstall_documenter.py: standard chain pattern (device-state + BATTERY_STATS grant + uninstall) with PACKAGE=com.viliussutkus89.documenter.debug.
- Scripts/interaction_appium_documenter.py: thin wrapper importing appium_android_tests.documenter and calling run_workload, the same pattern as the metronome / tipuous / linkhub wrappers.
Smoke results from these configs, all from the 2026-05-12 evening session:
- documenter_bangcle_pixel9.json (run 2026.05.12_210020): 5/7 strict
- linkhub_bangcle_pixel9.json (run 2026.05.12_210316): 6/7 strict
- tipuous_bangcle_pixel9.json (run 2026.05.12_204540): 6/7 strict
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onfigs + Android 16 dialog dismissal
New per-app uninstall + interaction hooks:
- Scripts/before_experiment_uninstall_{avnc,calculator,crcontainer,diaguard,gallerywall,horoscapp,iamspeed,pdfviewer}.py
- Scripts/interaction_appium_{avnc,calculator,crcontainer,diaguard,gallerywall,horoscapp,iamspeed,pdfviewer}.py
- interaction_appium_iamspeed.py pre-grants ACCESS_FINE/COARSE_LOCATION + POST_NOTIFICATIONS,
forces GPS off (settings put secure location_mode 0) to avoid the SecurityException crash
cycle from the upstream FOREGROUND_SERVICE_LOCATION manifest bug.
New experiment configs (Pixel 9 baseline + stub-bangcle for the 3 Option-D-unlocked apps):
- {avnc,calculator,crcontainer,diaguard,gallerywall,horoscapp,iamspeed,pdfviewer}_baseline_pixel9.json
- {calculator,gallerywall,iamspeed}_stub_bangcle_pixel9.json
- {crcontainer,horoscapp,pdfviewer}_bangcle_pixel9.json
- diaguard_bangcle_pixel{3,6}.json (legacy from morning's empirical 3-condition experiment)
after_launch.py: aggressive 10s polling for Android 16 PageSizeMismatchDialog with
"Don't Show Again" persistent flag tap. Required because every Bangcle pack triggers
the dialog (libSecShell.so isn't 16-KB-aligned) AND AndroidRunner's fresh-install-per-run
resets the per-package flag. Without this, multi-Activity Bangcle smokes hang for
12+ minutes waiting for foreground transitions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
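The 10 s polling loop for the page-size dialog might be structured as in the sketch below. It is deliberately generic: find_dialog and tap_dont_show_again are hypothetical callables standing in for whatever UI lookup and tap after_launch.py performs, and the 0.5 s interval is an assumption.

```python
import time


def poll_and_dismiss(find_dialog, tap_dont_show_again, timeout_s=10.0,
                     interval_s=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll until the dialog appears or the timeout elapses.

    Returns True if the dialog was seen and dismissed, False otherwise.
    `clock` and `sleep` are injectable so the loop can be tested without
    real waiting.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if find_dialog():
            tap_dont_show_again()
            return True
        sleep(interval_s)
    return False
```

Bounding the loop with a deadline keeps runs where the dialog never appears (for example, unpacked baselines) from paying more than the timeout.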
…ixel 9
- Introduced baseline configurations for the following applications: PdfViewer, PodAura, PoetsKingdom, Repertoire, Tipuous, YtAlarm.
- Created corresponding WiFi-mode copies for each application to ensure compatibility with USB power management.
- Added Bangcle-protected configurations for PodAura, PoetsKingdom, and Tipuous, including necessary scripts for pre- and post-experiment handling.
- Ensured all configurations include appropriate profiling settings for battery and memory usage monitoring.


No description provided.