Direct mobile device control via Appium. iOS (XCUITest) and Android (UIAutomator2).
A thin, editable harness for putting LLM agents on real phones. The agent perceives the device via UI tree + screenshots, acts via low-level taps and swipes, and writes its own per-app skills as it learns.
Connect an LLM directly to a real phone with a thin, editable harness. The agent perceives the screen, reasons about what to do, and acts — no app-specific APIs needed.
agent: wants to send a text
│
ui_tree() → finds compose field, send button
│
tap(field) → type_text(...) → tap(send)
│
message sent — works on iPhone and Android
The shortest path from "phone in hand" to "LLM agent driving it" — on macOS, Linux, or Windows, over USB or Wi-Fi, one device or ten:
- One-command install that probes what's missing (`mobile-use bootstrap`) and a doctor that reads your actual config and tells the truth.
- Wireless that remembers: pair once (`android pair` survives reboots), `--persist` saves the device, `wifi reconnect` (or the session itself) re-establishes after host reboots and DHCP changes.
- Multi-device without port juggling: one shared Appium server, per-device driver ports auto-assigned collision-free, `DevicePool.from_remembered()`.
- Agent-native: built-in agent loop with multimodal grounding, a dependency-free MCP server (`mobile-use mcp`), curated action surface with a destructive-verb gate, and an interactive live viewer.
Honest feature matrix vs raw Appium, Maestro, mobile-mcp, DroidRun, AppAgent, and scrcpy — including where they win: docs/comparison.md.
Three commands, in order. Each one is idempotent — re-running is safe.
Install from git.
`pip install mobile-use` from PyPI is a DIFFERENT, unrelated project that happens to share the name, so install from this repo.
git clone https://github.com/jackulau/mobile_use.git && cd mobile_use
pip install -e . # installs the mobile-use / iphone-harness / android-harness CLIs
mobile-use bootstrap # installs Appium + xcuitest + uiautomator2 + brew/node deps
mobile-use init # auto-detects connected device, writes .env (prompts for Apple Team ID on iOS)
mobile-use quickstart # doctor + smoke test: prints "ready" or the first thing to fix
mobile-use bootstrap accepts --dry-run (preview only), --ios-only, --android-only.
mobile-use init accepts --yes (non-interactive — defaults for everything).
mobile-use quickstart auto-detects platform when one device is paired; pass --ios / --android to disambiguate.
If anything fails:
mobile-use --doctor # numbered checks with one-line remediations
iphone-harness --reload # nuke the daemon (rare but kills weird stale state)
mobile-use ios sign-wda # iOS: re-sign WebDriverAgent (the #1 setup blocker)
mobile-use ios build-wda # iOS: build the WDA test target (first-run setup)
mobile-use quickstart --autostart-appium # spawn Appium server in background
See SETUP.md for the manual / per-step appendix, including a
troubleshooting decision tree.
Android-on-Linux is a first-class target. mobile-use bootstrap auto-detects
your package manager (apt, dnf, pacman, zypper, apk) and installs adb, node,
and the Appium uiautomator2 driver natively — no Homebrew required.
# Linux host (any apt/dnf/pacman/zypper/apk distro):
pip install -e .
mobile-use bootstrap --android-only
mobile-use init --android-only
mobile-use quickstart --android
iOS on Linux requires a Mac somewhere in the loop (Xcode and Apple codesigning are macOS-only). Two patterns:
- Remote daemon (TCP): Linux runs no daemon locally and talks to a remote `iphone-harness` daemon on a Mac via TCP:
  # On the Mac (one shot):
  IPH_BIND=tcp://127.0.0.1:8763 iphone-harness -c 'pass'
  # On Linux (in another shell):
  ssh -L 8763:127.0.0.1:8763 <mac-host>
  mobile-use --ios --remote-daemon tcp://127.0.0.1:8763 -c 'print(active_app())'
- Remote Appium URL: `IPH_APPIUM_URL=http://<mac>:4723` lets a local `iphone-harness` on Linux talk to a Mac running just Appium+WDA.
See SETUP.md → "iOS from Windows / Linux"
for the full walkthrough.
Android-on-Windows is a first-class target. adb and Appium are
cross-platform — install the Android platform-tools (adb on PATH) plus
Node + Appium, then:
# Windows host (PowerShell):
pip install -e .
mobile-use bootstrap --android-only # winget steps for adb + node, npm appium install
mobile-use quickstart --android
The daemon transport auto-selects TCP loopback on Windows (the AF_UNIX
sockets used on macOS/Linux are Unix-only). Each named device gets a
deterministic loopback port, so multi-device routing, devices status/reload,
and the viewer all work exactly as on macOS/Linux — no configuration needed.
iOS on Windows needs a Mac in the loop (Xcode + Apple codesigning are
macOS-only) — use the same remote-Mac bridge as Linux above
(SETUP.md → "iOS from Windows / Linux").
mobile-use devices list # auto-detect every connected iOS + Android
mobile-use devices status # show which named daemons are running
mobile-use devices reload --all # restart every named daemon
Python API mirrors the CLI, with no manual UDID lookup and no port juggling:
from mobile_use import DevicePool
pool = DevicePool.from_connected(
    xcode_org_id="ABCDE12345", # iOS: set once for every iPhone in the pool
    wda_bundle_id="com.you.wda",
)
pool.ensure_all_ready() # parallel daemon spawn, isolated Appium ports
pool.broadcast(lambda d: d.tap_at_xy(200, 400))
pool.broadcast(lambda d: d.screenshot()) # → {name: {"result": png_bytes}}
Each device gets its own daemon socket (/tmp/iph-<name>.sock,
/tmp/anh-<name>.sock) and its own auto-allocated Appium port in
4724-4799 so multiple iPhones / Pixels can run side by side without
collisions. Override with appium_url= if you need a specific port or a
remote Appium server.
End-to-end example: docs/demos/multi-device-broadcast.py.
Watch every screen at once:
mobile-use devices view # open all connected devices in a grid (browser)
mobile-use devices view --port 8765 --no-browser
mobile-use devices view --devices iphone-A,pixel-1 # cherry-pick
┌─ multi-device live view ────────────────── 3/3 streams live ─┐
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ ios/iphone-A │ │ ios/iphone-B │ │ android/px-1 │ │
│ │ [screen] │ │ [screen] │ │ [screen] │ │
│ │ 4.0fps · #N │ │ 4.0fps · #N │ │ 4.0fps · #N │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└──────────────────────────────────────────────────────────────┘
One HTTP server, one auto-allocated port, N MJPEG streams under /stream/<name>.
Loopback-only, read-only mirror. Single-device view still works via --headed.
Example: docs/demos/multi-device-viewer.py.
from iphone_harness.helpers import wake_device, retry_on_disconnect, record_screen
from iphone_harness.helpers import tap, find, type_text # used inside run_script below
wake_device() # screen-off / locked? wake it.
@retry_on_disconnect(max_attempts=3) # USB blip / WDA crash → auto-restart + retry
def run_script():
    tap(find(label="Compose"))
    type_text("hello")
record_screen(duration=10) # save mp4 to /tmp (XCUITest + UIAutomator2)
# record/replay a tap sequence (dumb — literal replay):
from mobile_use import record_replay
import iphone_harness.helpers as h
record_replay.start_recording("flow.py", helpers=h)
# ... your taps/swipes/typing ...
record_replay.stop_recording() # writes runnable flow.py
record_replay.replay("flow.py") # play it back
# smart macro — annotate intent + LLM re-targets when the UI shifts:
with record_replay.recording("compose.py", helpers=h):
    with record_replay.annotate("open compose screen"):
        h.tap(h.find(label="Compose"))
    with record_replay.annotate("type message body"):
        h.type_text("hello")
# replay_smart re-finds buttons via your LLM when labels / layout move
record_replay.replay_smart("compose.py", helpers=h, llm=my_llm_callable)
CLI equivalent: mobile-use macro record <name> opens a REPL with helpers + recording active; mobile-use macro replay <name> --smart adapts steps when the UI shifts. See docs/macros.md for the full walkthrough.
brew install libimobiledevice ideviceinstaller android-platform-tools node
npm i -g appium
appium driver install xcuitest # iOS only
appium driver install uiautomator2 # Android only
pip install -e .
cp .env.example .env # fill in IPH_UDID / IPH_XCODE_ORG_ID / IPH_WDA_BUNDLE_ID and/or ANH_UDID
Plug in iPhone: Trust This Computer, Settings → Privacy & Security → Developer Mode → On, trust the WDA developer profile. Plug in Android: enable USB Debugging, tap Allow on this computer.
pip install -e . installs the CLI commands (mobile-use, iphone-harness, android-harness) into your Python scripts directory. If they're not on your PATH after install, either:
# Option 1: find and add the scripts directory to PATH
python3 -m site --user-base # shows e.g. /Users/you/.local
export PATH="$(python3 -m site --user-base)/bin:$PATH"
# Option 2: run via Python directly
python3 -m mobile_use.cli --version
python3 -m iphone_harness.run --doctor
python3 -m android_harness.run --doctor
python3 -c "import mobile_use; print(mobile_use.__version__)" # should print 0.1.0
iphone-harness --version # or: python3 -m iphone_harness.run --version
android-harness --version # or: python3 -m android_harness.run --version
Three CLI entry points, platform-specific or unified:
# Start Appium (shared server for both platforms):
appium --base-path /
# Platform-specific:
iphone-harness --doctor
iphone-harness -c 'print(active_app())'
android-harness --doctor
android-harness -c 'print(active_app())'
# Unified CLI (auto-detects platform when one device connected):
mobile-use --doctor
mobile-use -c 'print(active_app())'
mobile-use --ios -c 'print(active_app())'
mobile-use --android -c 'print(active_app())'
iphone-harness -c '
appium("mobile: launchApp", bundleId="com.apple.MobileSMS")
wait_for_app("com.apple.MobileSMS")
field = wait_for_element(name="messageBodyField", timeout=5.0)
tap(field)
type_text("hello from mobile-use")
tap(find(type="XCUIElementTypeButton", name="sendButton"))
'
android-harness -c '
appium("mobile: startActivity", package="com.google.android.apps.messaging", activity=".ui.ConversationListActivity")
wait_for_app("com.google.android.apps.messaging")
btn = wait_for_element(content_desc="Start chat", timeout=5.0)
tap(btn)
'
Persistent interactive REPL with session continuity (state persists between runs):
mobile-use agent --ios # iOS agent loop
mobile-use agent --android # Android agent loop
mobile-use agent # auto-detect platform
mobile-use agent --session mytest # named session
Inside the agent REPL, all helpers are pre-imported. Extra bindings: agent, session, perceive(), act().
The agent loop's hotspot is the VLM round-trip on every step. Three layers cut it down:
a perception/action cache (on by default) and two opt-in local detectors, each degrading cleanly to the next (yolo → template → tree → VLM):
# 1. Perception/action cache (ON by default): a repeated identical screen replays the
# last action and skips the LLM. Disable with MU_PERCEPTION_CACHE=0.
# 2. Template matcher — grounds tree-less screens (games/canvas/web views) from
# captured element crops. Needs the [detection] extra:
pip install 'mobile-use[detection]'
MU_LOCAL_DETECTOR=1 mobile-use agent --ios
# 3. Trained YOLO-nano detector — the primary local grounding path (one forward pass).
# Distill a detector from the self-labeling dataset (every grounded tap records a
# free training sample), then serve it:
pip install 'mobile-use[yolo]'
mobile-use train-detector --train --epochs 80 # -> runs/train/weights/best.pt
MU_YOLO_DETECTOR=1 MU_DETECTOR_WEIGHTS=runs/train/weights/best.pt mobile-use agent --ios
# Let a confident, task-named match TAP DIRECTLY and skip the VLM for that step (even
# when the tree exists). OFF by default — a wrong match is a real tap with no VLM gate:
MU_LOCAL_SHORTCIRCUIT=1 MU_YOLO_DETECTOR=1 MU_DETECTOR_WEIGHTS=best.pt mobile-use agent
Measure the win on real screenshots (modeled VLM latency, real local wall-clock):
mobile-use bench-perception # synthetic (modeled) baseline
mobile-use bench-perception --images ./shots --weights best.pt # REAL measured
No device or labels yet? Generate a synthetic seed dataset to exercise the whole
dataset → train → weights → ground pipeline (mobile_use.synthetic_ui.generate_seed_dataset).
Confidence gate for both detectors: MU_DETECTOR_MIN_CONF (default 0.78). See
SETUP.md for the full env-var reference (and the polars-lts-cpu note for
training on older CPUs).
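As a rough illustration of how the grounding rungs and the confidence gate could fit together, here is a minimal sketch. It is not the project's implementation: `ground`, `min_conf`, and the stand-in grounders are hypothetical names, and only `MU_DETECTOR_MIN_CONF` and its 0.78 default come from the text above.

```python
import os
from typing import Callable, Optional, Sequence, Tuple

# A grounder maps a task description to ((x, y), confidence), or None on a miss.
Match = Tuple[Tuple[int, int], float]
Grounder = Callable[[str], Optional[Match]]


def min_conf(env=os.environ) -> float:
    """Read the confidence gate; 0.78 is the documented default."""
    return float(env.get("MU_DETECTOR_MIN_CONF", "0.78"))


def ground(task: str, rungs: Sequence[Tuple[str, Grounder]], threshold: float):
    """Try each rung in order; return (rung_name, point) for the first confident match."""
    for name, grounder in rungs:
        match = grounder(task)
        if match is not None and match[1] >= threshold:
            return name, match[0]
    return None  # nothing confident: the caller would fall back to the VLM


# Hypothetical stand-ins for the local detectors.
rungs = [
    ("yolo", lambda task: ((120, 640), 0.55)),      # below the gate, so it is skipped
    ("template", lambda task: ((118, 642), 0.91)),  # confident, so it wins
]
print(ground("tap Send", rungs, min_conf({})))
```

Keeping the rungs as an ordered list means a missing optional dependency simply removes one entry, and the remaining rungs still degrade toward the VLM.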
Training is self-validating: train-detector --train only reports trained after the
produced checkpoint actually loads and runs one inference (else trained_unverified), aborts
early on an empty dataset, and resolves the bare yolov8n.pt base model to the committed
repo-root copy so an offline run never triggers an implicit download.
Beyond perception, the loop's per-step side-effect costs were profiled and cut.
Deterministic counts (asserted in tests/test_step_overhead.py; wall-clock is
reported, never asserted):
| Per-step cost | Before | After | How |
|---|---|---|---|
| Session JSON full-file writes | 2 | 1 | unchanged current_app no longer rewrites the file |
| Screenshot PNG copies (same screen) | 1/step | 1 total | content-addressed store — hash decides, copy only when new |
| Pre-act auto_dismiss device RPCs | 1/act | 0 | skipped when the fresh same-step snapshot showed no alert (MU_PREACT_DISMISS) |
| get_available_actions introspection | every LLM step | once per module | _ACTIONS_MEMO |
| YOLO checkpoint deserializes (startup) | 2 | 1 | verification load is kept |
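The content-addressed screenshot store in the table can be pictured with a short sketch. This is an assumption-laden illustration rather than the harness's code: `store_screenshot`, the SHA-256 choice, and the file layout are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def store_screenshot(png_bytes: bytes, store_dir: Path) -> Tuple[Path, bool]:
    """Write png_bytes under its SHA-256 name; return (path, was_written)."""
    digest = hashlib.sha256(png_bytes).hexdigest()
    target = store_dir / f"{digest}.png"
    if target.exists():
        return target, False  # same screen as before: no copy
    store_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(png_bytes)
    return target, True


store = Path(tempfile.mkdtemp()) / "shots"
first = store_screenshot(b"fake-png-bytes", store)
second = store_screenshot(b"fake-png-bytes", store)  # identical screen
third = store_screenshot(b"other-screen", store)
print(first[1], second[1], third[1])
```

Because the file name is derived from the content, an unchanged screen costs one hash and no write, which is the "1 total" behaviour the table describes.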
Also: idevice_id + adb detect probes run concurrently (bare mobile-use
cold start worst case ~1.5s, was ~3s), ensure_daemon trusts a verified probe
for IPH_ENSURE_TTL/ANH_ENSURE_TTL seconds (default 10), gesture settle
sleeps scale via IPH_GESTURE_SETTLE/ANH_GESTURE_SETTLE (default 1.0 = stock;
0 for emulators/CI), and the collector's per-row UI-tree dump is compact+capped
(MU_COLLECT_TREE=full restores raw). Crops are now named per-sample — the old
source-basename naming silently overwrote every crop into one file.
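The TTL idea behind IPH_ENSURE_TTL / ANH_ENSURE_TTL can be sketched as a small cache around a health probe. The class below is hypothetical (the real `ensure_daemon` is not shown in this README); it only illustrates trusting a verified probe for a fixed window and never caching a failure.

```python
import time
from typing import Callable


class TrustedProbe:
    """Cache a successful health probe for ttl seconds."""

    def __init__(self, probe: Callable[[], bool], ttl: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._verified_at = None  # time of the last successful probe

    def ensure(self) -> bool:
        now = self._clock()
        if self._verified_at is not None and now - self._verified_at < self._ttl:
            return True  # still inside the trust window: skip the probe
        ok = self._probe()
        self._verified_at = now if ok else None  # failures are never cached
        return ok


calls = []
fake_now = [0.0]  # injectable clock so the example runs instantly
probe = TrustedProbe(lambda: calls.append(1) or True, ttl=10.0, clock=lambda: fake_now[0])
probe.ensure()      # first call probes
fake_now[0] = 5.0
probe.ensure()      # inside the window, no probe
fake_now[0] = 15.0
probe.ensure()      # window expired, probes again
print(len(calls))
```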
Dev velocity: the suite is pytest-xdist-safe — pip install 'mobile-use[dev]'
then pytest -n auto tests -q (~30-40s on 8 workers vs ~2-3 min serial).
mobile-use selfcheck # dep-rung matrix + action surface + training smoke (device-free)
mobile-use selfcheck --train # also run a bounded 1-epoch real YOLO train (needs [yolo])
selfcheck reports which local-grounding rungs are live (and why not), confirms the action
verbs are consistent across platforms, and runs the synthetic dataset → build → ground smoke —
exit 0 iff the core invariants hold. (For device connectivity use mobile-use --doctor.)
Every action the agent dispatches is also argument-validated before it runs (unknown kwarg /
missing required arg / non-numeric coordinate → a clean error, never a blind call into the daemon).
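One way to picture this kind of pre-dispatch validation is with `inspect.signature`. The sketch below is an assumption about the general approach, not the harness's actual validator; `validate_and_call` and the stand-in `tap_at_xy` are hypothetical.

```python
import inspect
from numbers import Real


def tap_at_xy(x, y):
    """Hypothetical stand-in for a helper that would call into the daemon."""
    return f"tapped {x},{y}"


def validate_and_call(func, **kwargs):
    """Return {'ok': True, 'result': ...} or {'ok': False, 'error': ...} without raising."""
    try:
        inspect.signature(func).bind(**kwargs)  # catches unknown kwargs and missing required args
    except TypeError as exc:
        return {"ok": False, "error": str(exc)}
    for name in ("x", "y"):  # coordinate arguments must be numeric
        value = kwargs.get(name)
        if name in kwargs and (isinstance(value, bool) or not isinstance(value, Real)):
            return {"ok": False, "error": f"{name} must be a number, got {value!r}"}
    return {"ok": True, "result": func(**kwargs)}


print(validate_and_call(tap_at_xy, x=200, y=400))
print(validate_and_call(tap_at_xy, x=200))
print(validate_and_call(tap_at_xy, x=200, y=400, z=1))
print(validate_and_call(tap_at_xy, x="left", y=400))
```

Binding against the signature first means the function body never runs with a malformed call, so a bad LLM-produced action turns into a readable error instead of a device RPC.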
Drive multiple iOS and Android devices simultaneously:
from mobile_use import DevicePool
pool = DevicePool()
pool.add_ios("iphone1", udid="00008030-XXX", xcode_org_id="ABC", wda_bundle_id="com.me.wda")
pool.add_android("pixel", udid="SERIAL123")
pool.ensure_all_ready()
# Drive all devices
for dev in pool.devices:
    print(dev.name, dev.active_app())
# Drive a specific device
pool["iphone1"].tap_at_xy(200, 400)
pool["pixel"].press_home()
# Parallel execution across all devices
results = pool.broadcast(lambda d: d.screenshot())
# Platform-filtered broadcast
pool.broadcast_ios(lambda d: d.active_app())
pool.broadcast_android(lambda d: d.press_home())
Each device gets its own named daemon instance (IPH_NAME / ANH_NAME) with
separate sockets. All pool devices share ONE Appium server (4723, or your
IPH/ANH_APPIUM_URL) — simultaneous sessions are isolated by auto-assigned
per-device driver ports (appium:systemPort / appium:wdaLocalPort /
appium:mjpegServerPort), deterministic per name and collision-free under
concurrent pool builds. Your own caps always win. Pass appium_url= per
device for a dedicated server (e.g. a remote Mac).
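A name-derived port with collision probing might look like the sketch below. It is only an illustration under stated assumptions: `assign_port`, the 8200 base, and the 100-port span are hypothetical, and the real allocator may differ (for example, it would need a lock or similar guard to stay safe under concurrent pool builds).

```python
import hashlib


def assign_port(name: str, taken: set, base: int = 8200, span: int = 100) -> int:
    """Pick a port in [base, base + span) that is stable per name and not in taken."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % span  # preferred slot for this name
    for offset in range(span):
        port = base + (start + offset) % span  # linear probe past occupied slots
        if port not in taken:
            taken.add(port)
            return port
    raise RuntimeError("no free port left in range")


taken = set()
ports = {name: assign_port(name, taken) for name in ["iphone1", "pixel", "iphone2"]}
print(ports)
```

`hashlib` is used instead of the built-in `hash()` because Python randomizes string hashes per process, which would break the "same name, same port" property across runs. The port stays stable for a name as long as its preferred slot is free.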
Build pools without typing UDIDs:
pool = DevicePool.from_connected() # every USB/Wi-Fi device discovered now
pool = DevicePool.from_remembered() # every wireless device saved by --persist
pool.add_ios("wifi-iphone", udid="...", wda_url="http://iPhone.local:8100") # cable-free member
By default mobile-use is headless: scripts run, the daemon talks to the
device, you see no UI. Add --headed to spin up a local MJPEG viewer in
your browser and watch the live device screen mirror while the script runs:
mobile-use --ios --headed -c 'tap_at_xy(100, 200); time.sleep(2)'
# → opens http://127.0.0.1:<random-port>/ in your default browser
# → live mirror at ~6 fps, JPEG quality 60 (knobs in mobile_use/viewer/server.py)
The viewer is interactive: click the screen to tap that point on the
device, type into the send box (or straight onto the page), and use the
home button — with a visible control on/off toggle. Set
MOBILE_USE_VIEWER_READONLY=1 (or --read-only on devices view) for a
plain mirror. Use --headless (or omit the flag) to skip the viewer
entirely. Works on iOS and Android.
Quality knobs (via Python API, when running in agent mode):
from mobile_use.viewer.server import ViewerServer
v = ViewerServer(platform="ios", fps=12, quality=80, max_dim=1200)
v.start(); print(v.url)
# ...
v.stop()
Windows hosts can't build WebDriverAgent (no Xcode). Drive iOS via a Mac on the network running the daemon over TCP:
# On the Mac (one time): full Part A in SETUP.md
# On the Mac (each session):
IPH_BIND=tcp://127.0.0.1:8763 iphone-harness -c 'pass'
# On Windows / Linux:
ssh -L 8763:127.0.0.1:8763 user@mac.local # SSH tunnel (recommended)
mobile-use --ios --remote-daemon tcp://127.0.0.1:8763 -c 'print(active_app())'
# Add --headed to also see the live screen mirror in your local browser:
mobile-use --ios --remote-daemon tcp://127.0.0.1:8763 --headed -c '...'
Full walkthrough + security caveat: SETUP.md → "iOS from Windows / Linux (remote Mac bridge)".
mobile-use tracks the device-OS and Appium-toolchain versions it's verified against.
The matrix lives in mobile_use/versions.py and is printed by mobile-use --doctor:
| Component | Supported | Notes |
|---|---|---|
| iOS | 15 – 26 | iOS >= 17 needs the RemoteXPC tunnel (USB or Wi-Fi) |
| Android | 8 – 16 | UiAutomator2; Wi-Fi via mobile-use android wifi <ip> |
| Appium server | >= 2.0.0 | 3.x recommended |
| xcuitest-driver | >= 5.0.0 | >= 10.0.0 requires Appium 3 |
| uiautomator2-driver | >= 3.0.0 | Android driver |
A newer OS than the tested max usually works — --doctor flags it "untested-newer"
rather than blocking. The doctor compares your installed Appium + drivers to this matrix
and warns (never blocks) when something is out of range.
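The warn-but-never-block comparison can be sketched as a small classifier. The helper names and the "too-old" label are hypothetical; only "untested-newer" and the ranges come from the matrix above, and the actual logic in mobile_use/versions.py may differ.

```python
from typing import Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """'17.4' -> (17, 4, 0); non-numeric suffixes are not handled in this sketch."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def classify(version: str, minimum: str, tested_max: Optional[str] = None) -> str:
    """Return 'ok', 'too-old', or 'untested-newer' for a dotted version string."""
    v = parse(version)
    if v < parse(minimum):
        return "too-old"
    if tested_max is not None and v[0] > parse(tested_max)[0]:
        return "untested-newer"  # newer major than the tested range: warn only
    return "ok"


print(classify("18.2", "15", "26"))    # iOS inside the tested range
print(classify("27.0", "15", "26"))    # iOS newer than the tested max
print(classify("1.22.3", "2.0.0"))     # Appium server below the minimum
```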
iOS 17+ (incl. iOS 26): Apple replaced lockdownd with RemoteXPC, so Appium reaches
WebDriverAgent only through a tunnel — Appium's bundled appium-ios-remotexpc, or
sudo pymobiledevice3 remote tunneld. This applies over USB and Wi-Fi; without it,
session create fails with RSDRequired / InvalidServiceError.
Drive a phone over Wi-Fi — no cable tethered during the run.
iOS — attach to WebDriverAgent over Wi-Fi. WDA must be installed + running (USB once), and on iOS 17+ the RemoteXPC tunnel must be up. Then point Appium at the iPhone's Wi-Fi IP (WDA's default port is 8100):
# .env (or export):
IPH_WDA_URL=http://192.168.1.50:8100
mobile-use --ios -c 'print(active_app())'
mobile-use --doctor preflights IPH_WDA_URL reachability before connecting. Under the
hood this sets Appium's appium:webDriverAgentUrl; an IPH_CAPS override still wins.
Android — adb over Wi-Fi. One command switches a USB-connected device to TCP, connects, and prints the serial to use:
mobile-use android wifi 192.168.1.42 --persist # adb tcpip + connect; saves ANH_UDID
# -> .env updated AND device remembered (store: ~/.mobile_use/wifi_devices.json)
mobile-use --android -c 'print(active_app())'
mobile-use android wifi 192.168.1.42 --disconnect # drop the wireless link
No cable, ever (Android 11+): pair via Wireless debugging; pairing
survives device reboots, unlike plain adb tcpip:
mobile-use android pair 192.168.1.42:37123 123456 # ip:port + code from the pairing dialog
mobile-use android wifi 192.168.1.42 --persist
Remembered devices auto-reestablish. --persist (both platforms) writes
the remember-store; reconnect everything after a host reboot / network change
with one command — or let the session self-heal (the daemon ensure path
retries wifi devices automatically):
mobile-use devices remembered # what's saved (+ last_seen)
mobile-use wifi reconnect # android: adb connect; ios: mDNS re-resolve
mobile-use devices list shows a TRANSPORT column (usb / wifi) per device,
including Wi-Fi-only iPhones (idevice_id -n is merged into discovery).
Full walkthrough incl. the iOS tunnel: SETUP.md → "Wireless (Wi-Fi) control".
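To make the remember-store idea concrete, here is a minimal sketch of a JSON store with a last_seen stamp. The schema, the `remember` function, and the field names are assumptions made for illustration; the real layout of ~/.mobile_use/wifi_devices.json is not documented here. The example writes to a temporary directory, not to the real store.

```python
import json
import tempfile
import time
from pathlib import Path


def remember(store: Path, name: str, platform: str, address: str) -> dict:
    """Insert or update one device entry and stamp last_seen (hypothetical schema)."""
    devices = json.loads(store.read_text()) if store.exists() else {}
    devices[name] = {"platform": platform, "address": address, "last_seen": time.time()}
    store.parent.mkdir(parents=True, exist_ok=True)
    tmp = store.with_suffix(".tmp")
    tmp.write_text(json.dumps(devices, indent=2))
    tmp.replace(store)  # rename into place so readers never see a half-written file
    return devices


store = Path(tempfile.mkdtemp()) / "wifi_devices.json"
remember(store, "pixel", "android", "192.168.1.42:5555")
remember(store, "iphone-A", "ios", "http://192.168.1.50:8100")
print(sorted(json.loads(store.read_text())))
```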
| File | What |
|---|---|
| alerts.md | System vs. in-app alerts; accept/dismiss patterns |
| home-bar-tap-zone.md | Why taps in the bottom ~80px fail |
| native-screenshot.md | Saving images to Photos via AssistiveTouch |
| ocr-fallback.md | Apple Vision OCR when accessibility tree fails |
| picker-wheels.md | Driving date/time/value picker wheels |
| scroll-into-tappable-zone.md | Auto-scroll out of home-bar zone |
| wait-for-animations.md | Poll-for-element patterns |
| File | What |
|---|---|
| navigation-bar.md | Back/Home/Recents: the Android nav bar zone |
| permissions.md | Runtime permission dialogs and granting patterns |
| notifications.md | Notification shade interaction |
| toasts.md | Toast messages: transient, not in accessibility tree |
| webview.md | Switching between native and webview contexts |
Domain skills live in agent-workspace/domain-skills/<bundleId-or-package>/. Set IPH_DOMAIN_SKILLS=1 (iOS) or ANH_DOMAIN_SKILLS=1 (Android) and call domain_skills(id) after launching an app.
| Platform | App | Skill |
|---|---|---|
| iOS | Amazon | buy-now.md |
| iOS | Chess.com | play-a-bot.md |
| iOS | | navigation.md, post-photo.md |
| iOS | | post.md |
| iOS | Messages | send-text.md, tapback-reaction.md |
| iOS | Clock | create-alarm.md |
| iOS | Settings | auto-lock.md |
| iOS | X (Twitter) | post.md |
Bundled skills + helpers for the most common "the phone is full / messy" tasks
on both platforms. Capability matrix and gap analysis:
docs/cleanup-capability.md.
| Helper | What |
|---|---|
| list_installed_apps() | iOS: scrapes Settings → iPhone Storage. Android: pm list packages -3 with Settings fallback. |
| uninstall_app(id_or_label) | Dispatches to platform-specific uninstall. Returns {ok, action, reason}. |
| storage_summary() | Used / Free / Total. Display strings; parse if needed. |
| bulk_select(items, deletion_button="Delete") | Generic Select-mode → tap-each → Delete pattern. |
| confirm_destructive(label="Delete", timeout=4.0) | Waits for the confirmation alert and taps it. |
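Since storage_summary() returns display strings, a caller that needs numbers has to parse them. The sketch below shows one way to do that; the input format ("12.3 GB"), decimal units, and the `parse_size` helper are assumptions, because the exact strings depend on the device and locale.

```python
import re

_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a display string such as '12.3 GB' to bytes (decimal units assumed)."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([KMGT]?B)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size string: {text!r}")
    number = float(match.group(1).replace(",", ""))  # treats ',' as a thousands separator
    return int(round(number * _UNITS[match.group(2).upper()]))


# Hypothetical summary values, shaped like Used / Free / Total display strings.
summary = {"used": "41.2 GB", "free": "22.8 GB", "total": "64 GB"}
print({key: parse_size(value) for key, value in summary.items()})
```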
| Platform | App | Skill |
|---|---|---|
| iOS | SpringBoard | uninstall-app.md, organize-home-screen.md, app-library.md |
| iOS | Settings | iphone-storage.md, clear-safari-data.md, screen-time-limits.md |
| iOS | Photos | bulk-delete-photos.md, empty-recently-deleted.md, delete-by-album.md |
| iOS | Files | browse-and-delete.md, empty-downloads.md, empty-files-recently-deleted.md |
| Android | Settings | uninstall-app.md, storage-cleanup.md, clear-app-cache.md |
| Android | Pixel Launcher | long-press-uninstall.md, organize-home-screen.md, app-drawer.md |
| Android | Files by Google | cleanup.md |
| Android | Google Photos | bulk-delete.md, empty-bin.md |
# iOS — inventory + folder organize + uninstall a test app + empty Photos bin
python3 docs/demos/clean-and-organize-ios.py
# Preview only (no destructive ops)
DRY_RUN=1 python3 docs/demos/clean-and-organize-ios.py
# Android equivalent — opt in to uninstall by setting TEST_PACKAGE
python3 docs/demos/clean-and-organize-android.py
TEST_PACKAGE=com.example.junkapp python3 docs/demos/clean-and-organize-android.py
python3 -m pytest tests/test_cleanup_skills.py -x
No device required: tests read skill files and the helpers module from disk.
Out-of-scope (documented, not implemented): rooting/jailbreak, bypassing
Screen Time PIN, cloud-side deletes, OEM-launcher-specific recipes outside
Pixel/AOSP. See docs/cleanup-capability.md.
Two parallel harnesses sharing the same Appium server:
                       ┌──────────────────┐
iphone-harness -c  ──► │ iphone_harness   │ ──► Appium ──► XCUITest/WDA ──► iPhone
                       │ daemon (iph-*)   │     :4723
                       └──────────────────┘
                       ┌──────────────────┐
android-harness -c ──► │ android_harness  │ ──► Appium ──► UIAutomator2 ──► Android
                       │ daemon (anh-*)   │     :4723
                       └──────────────────┘
- `run.py`: `iphone-harness` CLI
- `helpers.py`: public action API (tap, swipe, find, screenshot, ocr, ...)
- `daemon.py`: long-lived process owning the Appium/XCUITest session
- `admin.py`: daemon lifecycle + doctor
- `_ipc.py`: AF_UNIX JSON-line RPC

- `run.py`: `android-harness` CLI
- `helpers.py`: public action API (tap, swipe, find, screenshot, ocr, ...)
- `daemon.py`: long-lived process owning the Appium/UIAutomator2 session
- `admin.py`: daemon lifecycle + doctor
- `_ipc.py`: AF_UNIX JSON-line RPC

- `cli.py`: unified `mobile-use` CLI with platform auto-detection
- `multibox.py`: multi-device support (`Device`, `DevicePool`)
- `agent_loop.py`: persistent agent loop (perceive → reason → act cycle)
- `session.py`: session continuity (state persists between agent runs)
- `skills.py`: auto skill authoring (writes `.md` files for discoveries)
- `agent-workspace/`: agent-editable helpers + domain skills
- `interaction-skills/`: iOS UI mechanics
- `android-interaction-skills/`: Android UI mechanics
Both harnesses expose the same core API. Platform-specific extras noted.
# Perception
screenshot(path=None) → str path on host
window_size() → {'width', 'height'}
ui_tree(visible_only=False) → list[dict]
find(...) → element or None
find_all(...) → list[element]
active_app() → dict
ocr(image_path=None) → (lines, (px_w, px_h))
find_text(query, ...) → line dict or None
annotated_screenshot(path=None) → (annotated_path, items)
page_source() → raw XML
# Input
tap_at_xy(x, y)
tap(element)
tap_safe(element, refind=callable)
double_tap(x, y)
long_press(x, y, duration=1.0)
swipe(x1, y1, x2, y2, duration=0.4)
scroll(direction='down')
scroll_by(dy=-400)
type_text(text)
click(selector/predicate, ...)
send_keys(selector/predicate, keys, ...)
set_value(selector/predicate, value, ...)
paste_text(text, ...)
# Device
unlock()
# Navigation (both platforms — Android native buttons, iOS gesture equivalents)
press_home() # both — go to home screen
press_back() # Android: back key; iOS: swipe-from-left edge
press_recents() # Android: recents; iOS: app switcher
swipe_back() # iOS: explicit edge-swipe (alias for press_back on iOS)
open_app_switcher() # iOS: swipe up + pause
# iOS-only
native_screenshot() # saves to iPhone Photos
set_assistive_touch(on=True)
open_control_center()
close_control_center()
ensure_cc_tile(label)
start_screen_recording()
stop_screen_recording()
# Android-only
open_notifications()
close_notifications()
grant_permission(package, permission)
# Waits
wait(seconds=1.0)
wait_for(predicate, timeout=10.0)
wait_for_element(...)
wait_for_app(bundle_id_or_package)
# Alerts
alert()
alert_accept()
alert_dismiss()
# Skill discovery
domain_skills(bundle_id_or_package)
# Escape hatch — anything the driver supports
appium('mobile: anything', **params)
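The wait helpers above follow the usual poll-until-true pattern. As a hedged sketch of that pattern (not the harness's own `wait_for`, whose interval and timeout behaviour are not documented here), a hypothetical `poll_until` could look like this:

```python
import time


def poll_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate until it returns something truthy; return that value, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None  # assumption: a timeout yields None rather than raising
        time.sleep(interval)


# Simulated screen that only shows the element on the third look.
looks = {"count": 0}


def compose_button():
    looks["count"] += 1
    return {"label": "Compose"} if looks["count"] >= 3 else None


print(poll_until(compose_button, timeout=2.0, interval=0.01))
```

Checking the predicate before the deadline test guarantees at least one attempt even with a zero timeout, which keeps short waits from silently skipping the check.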
| | iOS (iphone-harness) | Android (android-harness) |
|---|---|---|
| Element IDs | label, name (NSPredicate) | text, resource_id, content_desc |
| Element types | XCUIElementTypeButton, etc. | android.widget.Button, etc. |
| App identifier | bundleId | package + activity |
| find() params | label=, name=, type=, value= | text=, resource_id=, type=, content_desc= |
| click() selector | iOS NSPredicate string | UiSelector / XPath / accessibility_id / resource ID |
| Danger zone | Bottom ~80px (home bar gesture) | Bottom ~48dp (navigation bar) |
| Setup pain | Apple signing + WDA provisioning | USB debugging toggle |
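The "Danger zone" row can be turned into a quick pre-tap check. This is a sketch under assumptions: `in_danger_zone` is a hypothetical helper, and it treats the y coordinate, the screen height, and the margin as being in the same unit, which glosses over the px/dp difference between the two platforms.

```python
# Bottom margins taken from the comparison table; units are treated as interchangeable here.
DANGER_ZONE = {"ios": 80, "android": 48}


def in_danger_zone(y: float, screen_height: float, platform: str) -> bool:
    """True when y falls inside the bottom gesture/navigation strip."""
    return y >= screen_height - DANGER_ZONE[platform]


print(in_danger_zone(830, 844, "ios"))       # near the home bar
print(in_danger_zone(400, 844, "ios"))       # mid-screen
print(in_danger_zone(2380, 2400, "android")) # inside the navigation bar strip
```

A caller could use a check like this to decide when to scroll the target upward before tapping, in the spirit of scroll-into-tappable-zone.md.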
PRs welcome — fork the repo, use it for real tasks, push your improvements back.
The most valuable contributions are new skills:
- Domain skills (`agent-workspace/domain-skills/<id>/*.md`): per-app playbooks for apps on either platform
- Interaction skills (`interaction-skills/*.md` or `android-interaction-skills/*.md`): reusable UI mechanics
- Bug fixes and harness improvements
Don't write skills from memory. Use the harness for a real task, let the agent figure out the non-obvious parts, and PR the generated .md file. Hand-authored skills lie. Agent-generated skills reflect the actual UI tree.
- Pixel coordinates — use accessibility predicates instead
- Secrets or personal data — the directory is public
- Task narration — capture the map, not the diary
Released under the MIT License. See LICENSE.
Built by @jackulau.