Master experiment #111

Open
IrenaRistova wants to merge 17 commits into S2-group:master from IrenaRistova:master-experiment


@IrenaRistova

No description provided.

IrenaRistova and others added 17 commits May 12, 2026 16:18
- .gitignore: add examples/batterymanager/Scripts/.device_state_capabilities/
  (per-machine cache keyed by adb serial, written by before_experiment
  device-state verifier in E0.T8).
- requirements-appium.txt: pin Appium-Python-Client>=5.0 alongside the
  existing AndroidRunner requirements.txt so a single venv satisfies both
  AndroidRunner and the black-box Appium harness in appium_android_tests/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e harness

- AndroidRunner/NativeExperiment.py: tweaks to support
  interaction_covers_duration=true (so per-app Appium scripts that block
  on a workload thread don't get double-slept by the runner) and the
  Master Experiment device list.
- AndroidRunner/Plugins/batterymanager/Batterymanager.py: small
  adjustments to play nicely with the rebuilt BatteryManager companion
  fork (com.example.batterymanager_utility) used post-2026-05-08.
- devices.json: register Pixel 3, Pixel 6, Pixel 9 entries for the
  thesis device matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E0.T4 (tracking matrix): after_experiment hook appends one row per
(app, variant, device, run_id) to specs/tracking_matrix.csv, with
dedup-by-run_id semantics so re-runs replace rather than append.
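
The dedup-by-run_id behaviour could look roughly like the sketch below. It is not the code in update_tracking_matrix.py; the function name upsert_row and the column handling are assumptions made for illustration, and only the stdlib-only constraint and the replace-on-re-run semantics come from the description above.

```python
import csv
from pathlib import Path


def upsert_row(csv_path, row, key="run_id"):
    """Replace the row that shares `key` with `row`, or append if none exists.

    Values are compared as strings, because csv.DictReader returns strings.
    Returns the number of data rows written.
    """
    path = Path(csv_path)
    rows = []
    if path.exists():
        with path.open(newline="") as fh:
            rows = list(csv.DictReader(fh))
    # Drop any earlier row for the same run so a re-run replaces it.
    rows = [r for r in rows if r.get(key) != str(row[key])]
    rows.append(row)
    # Union of column names, preserving first-seen order.
    fields = list(dict.fromkeys(k for r in rows for k in r))
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Rewriting the whole file keeps the operation idempotent: calling it twice with the same row leaves the matrix unchanged.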

E0.T4b (APK provenance hook): before_run writes apk_meta.json into the
per-run output dir; update_tracking_matrix.py populates apk_path /
apk_sha256 / apk_storage columns from it (no manual --apk-path flag
needed when the standard hook chain runs).
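
A minimal sketch of the provenance step, assuming apk_meta.json holds only the path and the digest. The function name write_apk_meta is hypothetical, and the apk_storage field is left out because its semantics are not described here.

```python
import hashlib
import json
from pathlib import Path


def write_apk_meta(apk_path, out_dir):
    """Hash the APK with SHA-256 and write apk_meta.json into out_dir."""
    apk = Path(apk_path)
    digest = hashlib.sha256()
    with apk.open("rb") as fh:
        # Read in 1 MiB chunks so large APKs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    meta = {"apk_path": str(apk), "apk_sha256": digest.hexdigest()}
    (Path(out_dir) / "apk_meta.json").write_text(json.dumps(meta, indent=2))
    return meta
```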

E0.T7 (three-window energy split): update_tracking_matrix.py post-hoc
slices the BatteryManager per-sample CSV into pre_workload / workload_only
/ whole_window using profiler-start, first-Appium-scenario-ts, and
last-Appium-scenario-ts as boundaries.
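
The slicing can be sketched as follows. The sample shape (timestamp, value), the function name split_windows, and the boundary inclusivity (pre_workload ends just before the first scenario, workload_only includes both scenario timestamps) are assumptions; the real slicer may treat the edges differently.

```python
def split_windows(samples, profiler_start, first_scenario, last_scenario):
    """Assign (timestamp, value) samples to the three energy windows.

    All timestamps must share one unit (for example epoch milliseconds).
    """
    windows = {"pre_workload": [], "workload_only": [], "whole_window": []}
    for ts, value in samples:
        if ts < profiler_start:
            continue  # sample predates the profiler; not part of any window
        windows["whole_window"].append((ts, value))
        if ts < first_scenario:
            windows["pre_workload"].append((ts, value))
        elif ts <= last_scenario:
            windows["workload_only"].append((ts, value))
    return windows
```

Samples taken after the last scenario count only toward whole_window, which is what separates workload cost from teardown cost.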

Files:
- Scripts/_lib_apk_meta.py: SHA-256 + apk_meta.json writer/reader.
- Scripts/update_tracking_matrix.py: full matrix updater (idempotent,
  stdlib-only, CSV dedup, 3-window slicer).
- Scripts/before_run.py: writes apk_meta.json before each run.
- Scripts/after_experiment.py: invokes update_tracking_matrix.py;
  wrapped per-step so failures don't crash the experiment.
- Scripts/after_launch.py: minor tweak to fit the chain.
- compute_energy_from_sysfs.py: standalone sysfs aggregator (fallback
  for devices where BatteryManager is gated; unused on the canonical
  Pixel 3 path but kept for noise-source experiments).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…averaged (E0.T5)

Scripts/detect_crash_anr.py classifies each run as one of:
none / crash / anr / system_error_dialog / lost_foreground, parsing
logcat (or BatteryManager-side logcat dumps when the adb_log persistency
strategy is enabled) and force-stopping the package so no zombie process
leaks into the next run's energy window. Writes crash_anr_status.json
into the per-run output dir with the classification + matching log
lines as evidence.
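
A reduced sketch of the classification, covering only three of the five classes (none / crash / anr). It assumes the usual logcat shapes, "FATAL EXCEPTION" followed by "Process: <package>" for crashes and "ANR in <package>" for ANRs; the function name classify_logcat and the result keys are hypothetical, and detect_crash_anr.py may match different patterns.

```python
import re


def classify_logcat(lines, package):
    """Return {'status': ..., 'evidence': [...]} for one run's logcat lines."""
    # The lookahead stops com.foo from matching com.foo.debug.
    pkg = re.escape(package) + r"(?![\w.])"
    anr_re = re.compile(r"ANR in " + pkg)
    proc_re = re.compile(r"Process: " + pkg)
    fatal_line = None
    for line in lines:
        if anr_re.search(line):
            return {"status": "anr", "evidence": [line]}
        if "FATAL EXCEPTION" in line:
            fatal_line = line  # remember it; the package appears on a later line
        elif fatal_line is not None and proc_re.search(line):
            return {"status": "crash", "evidence": [fatal_line, line]}
    return {"status": "none", "evidence": []}
```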

Scripts/after_run.py now invokes detect_crash_anr.py once per run.

Why this matters for the thesis: a single Bangcle-induced ANR run can
compress the median of three reps toward "low energy" because the app
died and the device idled. Without explicit isolation, that becomes a
silent bias toward protected variants. The matrix updater reads
crash_anr_status.json and populates the crash_anr_status column;
unknown values surface as notes=crash_anr_status_missing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…0.T8)

Scripts/_lib_device_state.py: shared helpers for setting/reading device
state — brightness lock, dumpsys battery unplug + status overrides,
current_now sampling, per-device capability cache (auto-gitignored).

Scripts/before_experiment_apply_device_state.py: once per experiment.
Sets screen_brightness_mode=0 + brightness=128 (persists across runs),
runs the canonical "dumpsys battery unplug" sequence, sleeps 5 s, then
reads /sys/class/power_supply/battery/current_now. Caches the verdict
(verified_discharge / suspected_supplying / unknown) at
.device_state_capabilities/<serial>.json.
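
One way to turn a current_now reading into the three verdicts is sketched below. The sign convention differs between devices and kernels, so it is a parameter here; the 10 mA threshold, the parameter names, and both function names are assumptions, not values taken from _lib_device_state.py.

```python
def parse_current_now(raw):
    """Parse the text read from current_now (microamps); None if unreadable."""
    try:
        return int(raw.strip())
    except (AttributeError, ValueError):
        return None


def classify_discharge(current_now_ua, discharge_is_negative=True,
                       threshold_ua=10_000):
    """Map a current_now sample to a discharge verdict."""
    if current_now_ua is None:
        return "unknown"
    # Normalise so that negative always means "battery is discharging".
    signed = current_now_ua if discharge_is_negative else -current_now_ua
    if signed <= -threshold_ua:
        return "verified_discharge"
    if signed >= threshold_ua:
        return "suspected_supplying"
    return "unknown"  # too close to zero to call either way
```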

Scripts/before_run_record_device_state.py: once per run. Captures
battery level, current_now, voltage_now, brightness, airplane mode,
third-party package count, BATTERY_STATS grant state. The matrix
updater turns abnormal fields into notes-column annotations:
energy_invalid_usb_supplying, energy_validity_unknown,
battery_stats_not_granted, brightness_drift, etc.

Strict mode (MASTEREXP_STRICT_DISCHARGE_CHECK=1) aborts the experiment
when discharge is suspect — for the eventual thesis batch where we
prefer fail-fast over silently-corrupted data. Default lenient mode
lets dev iteration continue and tags the row instead.
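
The strict/lenient switch could be sketched like this. The environment variable name comes from the description above; the function name enforce_discharge and the mapping from verdict to notes tag are assumptions.

```python
import os


def enforce_discharge(verdict, env=None):
    """Abort in strict mode, otherwise return a notes tag (or None if fine)."""
    env = os.environ if env is None else env
    if verdict == "verified_discharge":
        return None
    if env.get("MASTEREXP_STRICT_DISCHARGE_CHECK") == "1":
        # Fail fast: SystemExit stops the whole experiment.
        raise SystemExit(f"discharge check failed: {verdict}")
    # Lenient mode: keep running and tag the row instead.
    if verdict == "suspected_supplying":
        return "energy_invalid_usb_supplying"
    return "energy_validity_unknown"
```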

Software charge-disable is provably ineffective on Pixel 3 + Pixel 9
(charge IC ignores dumpsys battery unplug); hub-ctrl (Epic 1.6) is the
canonical mitigation. Until that lands, the discharge verdict tells us
which rows are usable for cross-variant energy comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ly (E0.T10)

Scripts/before_experiment_grant_battery_stats.py: idempotent
pm grant com.example.batterymanager_utility BATTERY_STATS. This is
Constraint 9, set after the 2026-05-08 supervisor meeting. Warn-not-fail on non-zero exit so a
broken companion install can't kill the run; broken state surfaces in
the matrix as notes=battery_stats_not_granted.

Scripts/before_experiment.py chains, in order:
  1. apply_device_state (E0.T8)        — physical-world controls first
  2. grant_battery_stats (E0.T10)       — permission state second
  3. (per-app uninstall, separate hook) — subject state third

Each link is wrapped in try/except so failure in one doesn't skip the
others; SystemExit from strict-mode discharge check propagates.
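
A minimal sketch of such a chain, with a hypothetical run_chain helper. It relies on SystemExit deriving from BaseException rather than Exception, so a plain "except Exception" isolates ordinary failures while the strict-mode abort still propagates.

```python
def run_chain(steps, log=print):
    """Run (name, callable) steps in order; return the names that failed."""
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # SystemExit is a BaseException, so it is NOT caught here and
            # still aborts the experiment.
            failures.append(name)
            log(f"[before_experiment] {name} failed: {exc!r}")
    return failures
```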

NOTE: the BATTERY_STATS grant was originally believed to unmask
BATTERY_PROPERTY_CURRENT_NOW. A controlled A/B test on the evening of
2026-05-08 and the AOSP API surface both show this is not the case:
CURRENT_NOW is not gated by BATTERY_STATS. We keep the grant as consistency hygiene
(so all three devices report the same permission state in
device_state.json) but it does not fix float-charge masking — that is
hub-ctrl territory (Epic 1.6). See docs/MEASUREMENT_NOISE_SOURCES.md §1
in the workspace repo for the retraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… linkhub

CONVENTIONS.md §10 contract: fresh uninstall between APK variants is
required because AndroidRunner does not by default uninstall a previous
variant — it overlays. That is wrong for cross-variant comparisons
because the previous variant's data dir, ART caches, and JIT state can
persist.

Each before_experiment_uninstall_<slug>.py hardcodes its app's manifest
package and calls device.uninstall(PACKAGE) at experiment start. The
metronome hook also chains apply_device_state + grant_battery_stats
(both idempotent) so the full pre-experiment sequence runs regardless
of which entry-point hook the config references.

Slug coverage matches the Epic 3 sub-tasks scaffolded as of 2026-05-09:
metronome (Pilot A), tipuous, repertoire, linkhub (the three no-INTERNET
parallel-subagent outputs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(E0.T1 / E0.T2)

Scripts/interaction_appium.py: generic dispatcher reading APPIUM_APP env
var; imports appium_android_tests.<APPIUM_APP> and calls its
run_workload(experiment, device). Writes appium_status.json with a
clean failure_reason string on every failure path (no Python tracebacks
into AndroidRunner). Documented at Scripts/README-appium-hooks.md.
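
The dispatcher contract can be sketched as below. APPIUM_APP, run_workload(experiment, device), and appium_status.json come from the description above; the function name dispatch, the out_dir parameter, the "ok" key, and the failure_reason strings are hypothetical (the real vocabulary is documented in Scripts/README-appium-hooks.md).

```python
import importlib
import json
import os
from pathlib import Path


def dispatch(experiment, device, out_dir, env=None,
             package="appium_android_tests"):
    """Import <package>.<APPIUM_APP>, run its workload, record the outcome."""
    env = os.environ if env is None else env
    slug = env.get("APPIUM_APP")
    reason = None
    if not slug:
        reason = "appium_app_not_set"
    else:
        try:
            module = importlib.import_module(f"{package}.{slug}")
        except ImportError:
            reason = "workload_module_not_found"
        else:
            try:
                module.run_workload(experiment, device)
            except Exception as exc:
                # Record a short reason instead of letting a traceback escape.
                reason = f"workload_error:{type(exc).__name__}"
    status = {"ok": reason is None, "failure_reason": reason}
    Path(out_dir, "appium_status.json").write_text(json.dumps(status))
    return status
```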

Scripts/interaction_appium_TEMPLATE.py: copy-paste template for new
per-app wrappers. Each wrapper hardcodes APPIUM_APP=<slug> and
delegates to the dispatcher — useful when AndroidRunner configs reference
a script path rather than an env var.

Scripts/interaction_appium_metronome.py +
Scripts/interaction_appium_metronome_espresso_mirror.py: thin wrappers
for Pilot A (Kr0oked Metronome). Espresso-mirror variant sets
APPIUM_WORKLOAD=espresso_mirror to select the Set α scenario suite.
Both predate the generic dispatcher and remain the canonical path for
existing Metronome configs.

Scripts/interaction_appium_{tipuous,repertoire,linkhub}.py: per-app
wrappers for the three no-INTERNET apps scaffolded in Epic 3 wave 1
(2026-05-09 parallel subagents). Each delegates to its own per-app
module under appium_android_tests/<slug>/.

Scripts/interaction_monkey_only.py + Scripts/interaction.py changes:
monkey-only fallback retained for smoke-style runs that don't need
UiAutomator2 / strict-UI scoring. Documented for completeness only;
not part of the thesis-grade comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ofiler (Epic 1.5)

Scripts/aux_postprocess.py reads the per-run csv emitted by
AndroidRunner's built-in 'android' profiler (Plugins/android/Android.py,
A-Mobile 2020 — data_points: [cpu, mem]) and aggregates into
aux/aux_summary.json with cpu_avg_pct, cpu_p95_pct, mem_pss_avg_mb,
mem_pss_max_mb. The matrix updater (E1.5.T5) reads aux_summary.json to
populate the four aux columns in specs/tracking_matrix.csv.
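
The four aggregates could be computed as in the sketch below. It assumes CPU samples are already percentages and memory samples are PSS in kB, and it uses a nearest-rank percentile; the function names, the input units, and the percentile method are assumptions rather than what aux_postprocess.py necessarily does.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; `values` must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(cpu_pct, mem_pss_kb):
    """Aggregate per-sample CPU and memory readings (both lists non-empty)."""
    return {
        "cpu_avg_pct": sum(cpu_pct) / len(cpu_pct),
        "cpu_p95_pct": percentile(cpu_pct, 95),
        "mem_pss_avg_mb": sum(mem_pss_kb) / len(mem_pss_kb) / 1024,
        "mem_pss_max_mb": max(mem_pss_kb) / 1024,
    }
```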

Goal: attribute energy deltas to underlying workload deltas (so the
thesis can say "Bangcle-packed runs use X% more CPU and Y% more PSS
than baseline" alongside "they use Z mWh more"). Without aux, energy
deltas have no causal context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… JSONs (E0.T3)

_templates/ contains the four authoritative templates that all concrete
configs are materialized from:
- _templates/device_pixel{3,6,9}.json — per-device blocks (serial,
  adb-path, BatteryManager grants needed, ABI gotchas as _comments).
- _templates/app_variant_2min.json — the locked-invariant 2-minute
  per-(app, variant) experiment template with {{APP_ID}},
  {{APPLICATION_ID}}, {{APK_PATH}}, {{DEVICE_NAME}},
  {{INTERACTION_HOOK}} placeholders.
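
Materializing a concrete config from the template might look like the following sketch. It assumes every placeholder sits inside a JSON string literal and that all values are strings; the function name materialize and the template keys used in the usage note are hypothetical.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def materialize(template_text, values):
    """Fill {{NAME}} placeholders and return the parsed config."""
    missing = sorted(set(PLACEHOLDER.findall(template_text)) - set(values))
    if missing:
        # Refuse to emit a config that still contains placeholders.
        raise KeyError(f"unfilled placeholders: {missing}")

    def fill(match):
        # json.dumps escapes quotes and backslashes; strip its outer quotes
        # because the placeholder already sits inside a string literal.
        return json.dumps(values[match.group(1)])[1:-1]

    return json.loads(PLACEHOLDER.sub(fill, template_text))
```

For example, materialize('{"device": "{{DEVICE_NAME}}"}', {"DEVICE_NAME": "Pixel 3"}) yields a dict whose "device" entry is "Pixel 3", while a missing value raises KeyError instead of writing a half-filled config.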

Concrete configs in two naming styles:
- New per-app convention <slug>_<variant>_<device>.json:
  linkhub_baseline_pixel3.json, repertoire_baseline_pixel3.json,
  tipuous_baseline_pixel3.json, metronome_bangcle_pixel9.json (the
  v2.1.1 protected-build pointer added 2026-05-12).
- Legacy monkey_* family kept for the existing Metronome Pilot A flow
  (Pixel 3 / Pixel 6 / Pixel 9 baseline + R8-obfuscated + Bangcle-packed
  espresso_mirror configs, plus a couple of plain-monkey smokes).

espresso_compare_apk_paths.json: ad-hoc validator config used to
cross-check that two APK paths point at the same manifest package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
examples/batterymanager/README-experiments.md: per-config catalog —
what each .json runs, on which device, against which APK variant, with
which interaction hook. Lookup table for "which config do I copy when
adding a new (app, variant, device) cell?"

examples/batterymanager/README-templates.md: explains _templates/, the
placeholder substitution flow, and the "locked invariants" in
app_variant_2min.json that must NOT be changed when materializing
concrete configs (duration ceiling, repetitions, interaction_covers_duration,
time_between_run, BatteryManager profiler block).

examples/batterymanager/ESPRESSO_MIRROR_VALIDATION.md: thesis-defence
validation matrix — per-scenario relationship between the upstream
Metronome Espresso InstrumentedTest.kt and our Appium espresso_mirror
suite (Close / Partial / Touch-only / Not implemented). Updated for
the AndroT-snapshot scenario list; predates Set α (which dropped
tempoMarkingsWalk and added two audio scenarios — needs a Set α
addendum when time permits).

Scripts/README-appium-hooks.md: companion to
appium_android_tests/CONVENTIONS.md, scoped to the AndroidRunner side:
when to use the generic interaction_appium.py dispatcher vs a per-app
wrapper, the APPIUM_APP env-var contract, the appium_status.json
failure_reason vocabulary, and a 5-step "add a new app" checklist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…smoke

Companion to examples/batterymanager/metronome_bangcle_pixel9.json
(committed 2026-05-12 in 420a34d). Same APK target, just devices block
swapped to "Pixel 3". Used 2026-05-12 to verify the v2.1.1 packed APK
installs and runs the Set α scenarios on Pixel 3 in addition to Pixel 9,
proving the cross-device install gate is finally cleared for the
protected variant (v1.7.2 Bangcle had only armeabi-v7a; v2.1.1 ships
all 4 ABIs).

Pixel 3 + Pixel 9 both selected arm64-v8a at install time on this
APK. Run IDs proving end-to-end:
  - Pixel 3 (Android 12): output/2026.05.12_172526/ — 8/8 strict UI pass
  - Pixel 9 (Android 16): output/2026.05.12_182313/ — 8/8 strict UI pass

Both rows in specs/tracking_matrix.csv are tagged
energy_invalid_usb_supplying as expected — the float-charge masking
issue is hub-ctrl territory (Epic 1.6), not a harness issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new experiment configs for the 2026-05-12 cohort smoke runs,
plus per-app Documenter hooks (uninstall + interaction). Pattern
matches existing metronome_bangcle_pixel9.json:
- repetitions: 1 (smoke; bump to 3 for matrix sweep)
- duration: 120000 ms (profiler ceiling; workload is N-scenario bounded)
- interaction_covers_duration: true
- batterymanager + android (CPU/mem) profilers both enabled
- paths points at the signed packed APK at
  /home/irena/Documents/Master Thesis/APKs/<slug>_protected.signed.apk
- application_id matches each app's manifest package, with or without the .debug suffix:
  com.tips.tipuous.debug, com.viliussutkus89.documenter.debug,
  com.amrdeveloper.linkhub

Hooks:
- Scripts/before_experiment_uninstall_documenter.py: standard chain
  pattern (device-state + BATTERY_STATS grant + uninstall) with
  PACKAGE=com.viliussutkus89.documenter.debug.
- Scripts/interaction_appium_documenter.py: thin wrapper importing
  appium_android_tests.documenter and calling run_workload — same
  pattern as the metronome / tipuous / linkhub wrappers.

Smoke results from these configs, all from the 2026-05-12 evening session:
- documenter_bangcle_pixel9.json (run 2026.05.12_210020): 5/7 strict
- linkhub_bangcle_pixel9.json (run 2026.05.12_210316): 6/7 strict
- tipuous_bangcle_pixel9.json (run 2026.05.12_204540): 6/7 strict

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…onfigs + Android 16 dialog dismissal

New per-app uninstall + interaction hooks:
- Scripts/before_experiment_uninstall_{avnc,calculator,crcontainer,diaguard,gallerywall,horoscapp,iamspeed,pdfviewer}.py
- Scripts/interaction_appium_{avnc,calculator,crcontainer,diaguard,gallerywall,horoscapp,iamspeed,pdfviewer}.py
- interaction_appium_iamspeed.py pre-grants ACCESS_FINE/COARSE_LOCATION + POST_NOTIFICATIONS,
  forces GPS off (settings put secure location_mode 0) to avoid the SecurityException crash
  cycle from the upstream FOREGROUND_SERVICE_LOCATION manifest bug.

New experiment configs (Pixel 9 baseline + stub-bangcle for the 3 Option-D-unlocked apps):
- {avnc,calculator,crcontainer,diaguard,gallerywall,horoscapp,iamspeed,pdfviewer}_baseline_pixel9.json
- {calculator,gallerywall,iamspeed}_stub_bangcle_pixel9.json
- {crcontainer,horoscapp,pdfviewer}_bangcle_pixel9.json
- diaguard_bangcle_pixel{3,6}.json (legacy from morning's empirical 3-condition experiment)

after_launch.py: aggressive 10s polling for Android 16 PageSizeMismatchDialog with
"Don't Show Again" persistent flag tap. Required because every Bangcle pack triggers
the dialog (libSecShell.so isn't 16-KB-aligned) AND AndroidRunner's fresh-install-per-run
resets the per-package flag. Without this, multi-Activity Bangcle smokes hang for
12+ minutes waiting for foreground transitions.
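
The polling loop can be sketched generically as below. How the button is located and tapped (UiAutomator dump, Appium, adb input) is left to two injected callables, because after_launch.py's actual mechanism is not shown here; the function name dismiss_dialog, the callables, and the 0.5 s interval are all assumptions, and only the 10 s budget comes from the description above.

```python
import time


def dismiss_dialog(find_button, tap, timeout_s=10.0, interval_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until find_button() returns something, tap it, and report success.

    find_button: callable returning a handle for the "Don't Show Again"
                 control, or None while the dialog is absent.
    tap:         callable that taps the handle.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        button = find_button()
        if button is not None:
            tap(button)
            return True
        sleep(interval_s)
    return False  # dialog never appeared within the polling budget
```

Returning False rather than raising lets runs where the dialog never appears proceed normally.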

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ixel 9

- Introduced baseline configurations for the following applications:
  - PdfViewer
  - PodAura
  - PoetsKingdom
  - Repertoire
  - Tipuous
  - YtAlarm

- Created corresponding WiFi-mode copies for each application to ensure compatibility with USB power management.

- Added Bangcle-protected configurations for PodAura, PoetsKingdom, and Tipuous, including necessary scripts for pre- and post-experiment handling.

- Ensured all configurations include appropriate profiling settings for battery and memory usage monitoring.
@sonarqubecloud

Quality Gate failed

Failed conditions
2 Security Hotspots

See analysis details on SonarQube Cloud
