Security researchers disclosed a serious vulnerability in Microsoft Copilot, dubbed SearchLeak, that allowed attackers to steal two-factor authentication codes directly from users' inboxes and documents. The attack worked by injecting malicious instructions into content that Copilot would retrieve and process — essentially hijacking the AI's behavior without the user ever knowing.
The mechanism is prompt injection: an attacker embeds hidden instructions inside a document, email, or web page that the AI assistant is likely to read. When Copilot ingests that content, it follows the attacker's commands rather than serving the user. In this case, the payload could direct Copilot to locate and forward 2FA tokens, effectively bypassing a core layer of account security.

This matters well beyond Microsoft's ecosystem. Prompt injection is a structural weakness in how large language models process untrusted input alongside trusted instructions. The model has no reliable way to distinguish between "data I should summarize" and "commands I should execute." Every AI assistant that reads external content — emails, files, web results — carries some version of this attack surface.
For builders integrating LLMs into workflows, the practical lesson is this: never grant an AI agent write or exfiltration capabilities without strict output filtering and human-in-the-loop confirmation for sensitive actions. Retrieval-augmented systems should treat all retrieved content as untrusted and sandbox it away from action-triggering logic wherever possible.
Microsoft has patched the specific SearchLeak vector, but the underlying pattern will resurface. Until the industry establishes robust, standardized defenses against prompt injection — something that remains an open research problem — every agentic AI deployment that touches external data is a potential pivot point for attackers.
