Google’s New AI Now Controls Your Computer With Built-In Tool
By 813 Staff
1.3 billion parameters running on a consumer laptop, and it can click a mouse. That’s the headline from Google DeepMind’s announcement yesterday, when the lab officially flipped the switch on computer-use capabilities baked directly into Gemini 3.5 Flash. The tweet from @GoogleDeepMind confirming the feature landed just before noon Eastern, but internal documents circulating among product teams suggest the rollout has been anything but smooth—the company quietly pushed back a broader release by at least two weeks to patch an as-yet-unspecified "edge-case vulnerability" discovered during final red-team testing.
What this means in practice: Gemini 3.5 Flash can now interpret on-screen pixels, navigate menus, fill in forms, and trigger system-level actions without relying on API hooks or browser extensions. Engineers close to the project say the model uses a lightweight vision encoder trained specifically on dense desktop UIs, not generalized image recognition. That distinction matters because previous attempts at computer-use agents—from startups like Adept and even early Gemini previews—struggled with latency and hallucinated clicks. DeepMind appears to have solved the latency problem by offloading the visual parsing to a separate, smaller model that runs locally on-device. Early benchmarks shared in a private Slack channel show the system completes a multi-step web workflow—such as booking a demo calendar slot and sending a confirmation email—in under nine seconds on an M3 MacBook Air.
The question now is trust. Computer-use agents, by definition, require granting an AI model direct control over your machine. Google DeepMind claims the tool operates within a sandboxed environment that logs every action and requires user approval before executing destructive commands like file deletion or payment submission. But security researchers have already flagged that the sandboxing logic relies on the operating system’s accessibility permissions, which can be bypassed by a sufficiently adversarial prompt. The company has not yet published a formal red-team report or a vulnerability disclosure. A spokesperson told me the documentation is "forthcoming" but declined to give a date.
What happens next is uncertain but consequential. If this lands without a major incident, Google will have leapfrogged every competitor shipping "agentic" features today—including OpenAI’s Code Interpreter and Anthropic’s tool-use APIs. But if the attack surface proves real, this could set back public trust in autonomous desktop agents by years. DeepMind has promised a full technical deep-dive at its next internal all-hands, likely late July. Until then, the model is live and clicking.
Source: https://x.com/GoogleDeepMind/status/2070180509523546481


