Feature/homegate mobile api arm64 fix #348

Closed
domisko wants to merge 24 commits into orangecoding:master from domisko:feature/homegate-mobile-api-arm64-fix

domisko commented Jun 19, 2026

No description provided.

domisko and others added 24 commits June 18, 2026 00:36
Add DeepL translation for listing descriptions
Adds a /api/listings/:id/commute endpoint that fetches walking, cycling
and driving directions from a listing to the user's commute destination
in parallel using Promise.allSettled. The ORS API key is stored in admin
settings. Commute times are displayed automatically on listing open.
Add commute times feature via OpenRouteService
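
A minimal sketch of the parallel lookup described in this commit, assuming a hypothetical `fetchRoute(mode, from, to)` helper that resolves to `{ durationSeconds }`. The real OpenRouteService call and the endpoint handler are not shown in this PR, so every name here is illustrative. The point is that `Promise.allSettled` lets one failed mode degrade gracefully instead of failing the whole response.

```javascript
// Hypothetical helper signature: fetchRoute(mode, from, to) -> Promise<{ durationSeconds }>.
// It stands in for the actual routing request, which this PR page does not show.
async function getCommuteTimes(fetchRoute, from, to) {
  const modes = ['walking', 'cycling', 'driving'];

  // Fire all three requests at once; allSettled never rejects,
  // so a single failing mode does not discard the other results.
  const settled = await Promise.allSettled(modes.map((mode) => fetchRoute(mode, from, to)));

  const result = {};
  settled.forEach((outcome, i) => {
    result[modes[i]] =
      outcome.status === 'fulfilled'
        ? { ok: true, minutes: Math.round(outcome.value.durationSeconds / 60) }
        : { ok: false, error: String(outcome.reason?.message ?? outcome.reason) };
  });
  return result;
}
```

With this shape, a caller can render the modes that succeeded and show a placeholder for the ones that did not.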
The Finish button appeared broken when the user's session had expired:
the backend save failed silently, so the in-memory state never updated
and the modal stayed open. On reload it would reappear because nothing
was persisted.

Fix: update store state optimistically before the API call so the modal
closes immediately regardless of backend result. Also write the dismissed
hash to localStorage as a fallback, so reloads don't re-show the modal
even when the backend save failed (e.g. due to session expiry).
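
The fix above can be sketched roughly as follows. The names `store`, `storage`, `saveDismissed` and the `dismissedNewsHash` key are hypothetical stand-ins, since the actual store and API client are not shown here; `storage` is anything with the `localStorage` `getItem`/`setItem` interface.

```javascript
// All parameter names are illustrative; they are not taken from the Fredy codebase.
async function dismissNewsModal({ store, storage, saveDismissed, hash }) {
  // Optimistic update: close the modal before the backend answers.
  store.newsModalOpen = false;
  // Local fallback so a reload does not re-show the modal if the save fails.
  storage.setItem('dismissedNewsHash', hash);
  try {
    await saveDismissed(hash);
    return true;
  } catch {
    // For example an expired session: the modal stays closed regardless.
    return false;
  }
}

// On load, show the modal only if neither the backend nor the local fallback
// has recorded the current hash as dismissed.
function shouldShowNewsModal({ storage, currentHash, serverDismissedHash }) {
  return serverDismissedHash !== currentHash && storage.getItem('dismissedNewsHash') !== currentHash;
}
```

Keying the fallback by hash means a newer news item still appears even though an older one was dismissed locally.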
Transitous (MOTIS v2) covers all four modes — walking, cycling, driving,
and public transit — for free with no API key required, making the ORS
key in admin settings unnecessary. The transit mode also shows transfer
count alongside the travel time.
- Walk: falls back to a haversine straight-line estimate (~5 km/h) when
  Transitous returns no route (the API caps walk distance at ~500 m);
  displayed as '~X min' to signal that it is an approximation
- Commute times cached in localStorage for 24h keyed by listing +
  destination coords, avoiding redundant API calls on re-visits
- User-Agent updated to identify as a fork with contact email per
  Transitous usage policy
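
As an illustration of the walking fallback and the 24h cache described in the list above, here is a sketch under stated assumptions: coordinates are `{ lat, lon }` objects in degrees, the cache key format and the `storedAt` field are invented for the example, and the walking speed is the 5 km/h mentioned in the commit message.

```javascript
const EARTH_RADIUS_KM = 6371;
const WALK_SPEED_KMH = 5; // speed stated in the commit message
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Great-circle distance between two { lat, lon } points, in kilometres.
function haversineKm(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Straight-line walking estimate, prefixed with "~" to mark it as approximate.
function estimateWalkLabel(from, to) {
  const minutes = Math.round((haversineKm(from, to) / WALK_SPEED_KMH) * 60);
  return `~${minutes} min`;
}

// Hypothetical cache key: listing id plus destination coordinates.
function commuteCacheKey(listingId, dest) {
  return `commute:${listingId}:${dest.lat},${dest.lon}`;
}

// A cached entry is usable while it is younger than 24 hours.
function isFresh(entry, now = Date.now()) {
  return Boolean(entry) && now - entry.storedAt < CACHE_TTL_MS;
}
```

A straight-line estimate undercounts real walking routes, which is why the label carries the tilde.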
Homegate's React SPA only mounts listing cards that are within the
viewport on initial load. Add an autoScroll option to puppeteerExtractor
that scrolls incrementally to the bottom before capturing HTML, giving
the page time to render all cards. Enable it in the Homegate provider.
The previous autoScroll took a single HTML snapshot after scrolling to
the bottom. Homegate's React virtual list unmounts cards that leave the
viewport, so the snapshot only contained the last few visible items.

New approach: scroll one viewport at a time, collect outerHTML of every
result-list-item at each position (deduped by content), then return a
synthetic wrapper containing all cards. This captures all listings
regardless of whether they get unmounted after scrolling past.

Also fixes a bug where autoScroll ran before waitForSelector, causing
scrolling during the DataDome JS challenge before any listings existed.
Similarity cache / pipeline:
- Soft-delete similarity-filtered listings instead of hard-deleting, so
  their hashes stay in DB and _findNew skips them on the next run without
  hitting the similarity filter again. This breaks the store→filter→delete→
  re-store loop that caused all listings to be silently discarded.
- Include manually_deleted rows in getKnownListingHashesForJobAndProvider
  so soft-deleted (similarity-filtered) listings are not re-processed.
- Add source tracking to the similarity cache: each duplicate log line now
  shows which provider/job first saw the listing.
- Hard delete via UI still clears everything (including soft-deleted rows),
  allowing a clean restart.
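
To make the loop concrete, here is a small in-memory sketch of why soft-deleting breaks it. The row shape (`hash`, `manually_deleted`) follows the column name mentioned above, but the functions are simplified stand-ins and not the actual `_findNew` or `getKnownListingHashesForJobAndProvider` implementations.

```javascript
// Known hashes include soft-deleted rows on purpose, so filtered listings stay "known".
function getKnownHashes(rows) {
  return new Set(rows.map((row) => row.hash));
}

// Only listings whose hash has never been stored count as new.
function findNew(listings, rows) {
  const known = getKnownHashes(rows);
  return listings.filter((listing) => !known.has(listing.hash));
}

// Soft delete: keep the row, flag it. A hard delete would drop the row instead.
function softDelete(rows, hash) {
  return rows.map((row) => (row.hash === hash ? { ...row, manually_deleted: true } : row));
}
```

With a hard delete the row disappears, so the same listing looks new on the next run and goes through store, filter and delete again. With the soft delete it is skipped before the similarity filter is reached.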

Homegate virtual-list scraping:
- Dedup scroll-collected cards by their stable listing href instead of
  outerHTML. React renders slightly different HTML for the same card at
  different scroll positions, causing the same listing to be collected
  multiple times and then caught by the similarity filter.
- Add autoScrollDedupeSelector option to puppeteerExtractor for providers
  to specify which anchor to use as the dedup key.
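
A rough sketch of the href-based dedup, assuming each scroll position yields an array of `{ href, html }` objects. That shape and the wrapper markup are hypothetical; the real `puppeteerExtractor` collects the cards inside the page context.

```javascript
// cards: [{ href, html }] gathered across all scroll positions (assumed shape).
// The first HTML seen for each href wins, so re-rendered variants are ignored.
function dedupeCardsByHref(cards) {
  const byHref = new Map();
  for (const card of cards) {
    if (card.href && !byHref.has(card.href)) {
      byHref.set(card.href, card.html);
    }
  }
  return [...byHref.values()];
}

// Build the synthetic wrapper that is handed to the HTML parser.
function wrapCards(htmlFragments) {
  return `<div data-synthetic-wrapper>${htmlFragments.join('')}</div>`;
}
```

Because a `Map` preserves insertion order, the cards come out in the order they were first seen while scrolling.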

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- .nvmrc: bump from 16.14.0 to 22 so nvm use / nvm install work without
  manual overrides (package.json already requires >=22)
- README: add fork notice and Custom Features section listing Homegate,
  Transitous commute times, and the news-modal fix
- README: replace upstream docker run command with local docker compose
  build instructions and a note on how to sync from upstream

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Implement getListings() using the reverse-engineered Homegate mobile API
  (POST api.homegate.ch/search/listings, HMAC-SHA256 X-App-Id, Basic Auth).
  Currently exported but not wired into config — DataDome blocks the endpoint
  without the mobile SDK device token.
- On ARM64/ARM (Raspberry Pi), launch Chromium with --use-angle=swiftshader
  so the GPU renderer reports "SwiftShader" instead of the distinctive ARM GPU
  string (e.g. "V3D 4.2") that DataDome uses to fingerprint ARM scrapers.
- Add offline test fixture (homegate_api.json) and Nominatim + Homegate API
  stubs in buildFetchMock() so the homegate test passes without live network.
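
The ARM-specific launch flag could be selected along these lines. This is only a sketch: `chromiumArgsFor` is a hypothetical helper, and how the resulting array reaches the browser launcher depends on code not shown in this PR.

```javascript
// Node reports 'arm64' or 'arm' in process.arch on Raspberry Pi class hardware.
function chromiumArgsFor(arch, baseArgs = []) {
  const isArm = arch === 'arm64' || arch === 'arm';
  // SwiftShader is a software renderer, so the reported GPU string no longer
  // names the ARM GPU.
  return isArm ? [...baseArgs, '--use-angle=swiftshader'] : [...baseArgs];
}

// Example: compute the arguments for the current machine.
const launchArgs = chromiumArgsFor(process.arch, ['--no-sandbox']);
```

The array would then typically be passed as the `args` option when launching the browser.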

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fetch the Homegate search page HTML through the configured residential
proxy using plain fetch() + undici ProxyAgent, then extract listing data
from the server-side-rendered window.__INITIAL_STATE__ JSON embedded in
the page. No browser or Chromium needed — avoids all ARM64 fingerprint
issues on Raspberry Pi.

DataDome challenges are client-side JS injected after the server response;
the SSR JSON is present in the raw HTML when the request arrives from a
residential Swiss IP with Chrome headers. The proxy is read from Fredy
settings (Settings → Execution → Proxy URL) at runtime.
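
A sketch of pulling the embedded state out of raw HTML, assuming the page assigns a plain JSON object literal to `window.__INITIAL_STATE__` (as the commit message implies). The function name is hypothetical, and the brace matching is a simplification that tracks only double-quoted JSON strings.

```javascript
// Returns the parsed state object, or null if the marker or valid JSON is missing.
function extractInitialState(html) {
  const markerAt = html.indexOf('window.__INITIAL_STATE__');
  if (markerAt === -1) return null;
  const start = html.indexOf('{', markerAt);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      // Inside a JSON string: braces do not count, honour backslash escapes.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}' && --depth === 0) {
      // Found the brace that closes the top-level object.
      try {
        return JSON.parse(html.slice(start, i + 1));
      } catch {
        return null;
      }
    }
  }
  return null;
}
```

Returning `null` rather than throwing makes it easy to tell a challenge page, which has no state, apart from a real result page.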

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iable extraction via `__INITIAL_STATE__` dynamic JavaScript evaluation, replacing `undici` proxy-based fetch with Puppeteer. Refactor listings extraction logic accordingly, preserving test compatibility using fixture-based browser mocks.
…taDome bypass

- Log proxy URL (masked) at launch so it is visible in Pi logs that the
  proxy setting is actually reaching CloakBrowser
- Add homepage warm-up visit before the search URL so DataDome sees an
  established session rather than a cold direct hit
- waitForFunction waits up to 25 s for __INITIAL_STATE__ after DataDome
  challenge resolves; falls back to page content preview on timeout
- Update offline test mock with url(), content(), waitForFunction() stubs
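
Masking the proxy URL before logging it might look like the sketch below. `maskProxyUrl` is a hypothetical name and the `***` placeholder is an arbitrary choice; it relies only on the standard WHATWG `URL` class available in Node.

```javascript
// Replace credentials in a proxy URL so it can be logged safely.
function maskProxyUrl(proxyUrl) {
  try {
    const url = new URL(proxyUrl);
    if (url.username) url.username = '***';
    if (url.password) url.password = '***';
    return url.toString();
  } catch {
    // Never echo an unparseable value: it may still contain a secret.
    return '<invalid proxy URL>';
  }
}
```

The host and port stay visible, which is enough to confirm in the logs that the setting was picked up.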

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Swiss ImmoScout24 app exposes a JSON API at api.immoscout24.ch that
returns full listing data (title, price, size, rooms, address, images,
description) without requiring a headless browser. This works on an ARM64
Raspberry Pi, unlike the Homegate browser-based approach, which is blocked
by DataDome's ARM64 canvas fingerprinting.

Flow:
  POST /search/listings → listing IDs
  GET  /listings/listings?ids=...&fieldset=srp-list → full details

DataDome protection requires a validated cookie obtained once via the
Android app + Charles Proxy (Max-Age=31536000, lasts ~1 year). Cookie
is stored in Settings → Execution → ImmoScout24 CH DataDome Cookie.
On 403, the provider attempts a fast-path retry with the cookie from
the Set-Cookie response header before surfacing the error.
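
The 403 fast-path retry could be sketched as follows. This is an assumption-laden illustration: `doFetch` is an injected fetch-like function, both helper names are hypothetical, and the cookie is assumed to be named `datadome`. The provider's real request headers and error handling are not shown in this PR.

```javascript
// Pull "datadome=<value>" out of a Set-Cookie header string, if present.
function extractDataDomeCookie(setCookieHeader) {
  if (!setCookieHeader) return null;
  const match = /(?:^|[,;]\s*)(datadome=[^;,\s]+)/i.exec(setCookieHeader);
  return match ? match[1] : null;
}

// Try once with the stored cookie; on 403, retry once with the cookie the
// server just issued. If there is none, return the 403 response to the caller.
async function fetchWithCookieRetry(doFetch, url, storedCookie) {
  const first = await doFetch(url, { headers: { Cookie: storedCookie } });
  if (first.status !== 403) return first;

  const fresh = extractDataDomeCookie(first.headers.get('set-cookie'));
  if (!fresh) return first;
  return doFetch(url, { headers: { Cookie: fresh } });
}
```

Retrying only once keeps a persistent block from turning into a request loop.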

URL format: https://www.immoscout24.ch/de/immobilien/mieten/ort-lausanne

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@domisko domisko closed this Jun 19, 2026