One of the most popular use cases for Copilot Vision is in-app tutorials. If you’re looking for a specific action or menu in complex software, Copilot Vision can quickly point you in the right direction. It is a floating toolbar that can follow you to any application or part of Windows and take in the context of what is on the screen, whether that is the entire desktop or just a specific window.
This means fewer prompts, and combined with voice control, it’s more like having an experienced friend looking over your shoulder. Microsoft calls this feature “Highlights,” and you can trigger it by asking Copilot to “show me how,” whether that’s while you’re editing a photo, checking your calendar, or writing a shopping list. In some applications, such as Word, Excel and PowerPoint, Copilot Vision can even “see” what is off-screen, such as slides in a presentation or pages in a Word document that are not currently in view.
Mehdi also talked about Gaming Copilot, which brings Copilot Vision to the world of gaming, whether on a PC or on a device like the new ROG Xbox Ally. I saw a demonstration of this in action, with the assistant giving the player instructions on the next missions in an open-world adventure.
It’s the combination of Copilot Vision and Copilot Actions that gets interesting. This is Microsoft’s approach to AI agents. Copilot Actions can perform tasks on your behalf locally, through your applications or the operating system, by simply following natural language instructions. So instead of Copilot Vision showing you where to find an obscure setting in Adobe Photoshop, Copilot Actions can simply enable it for you. Copilot Actions can also apply the same edit to a folder full of photos or extract information from a large PDF file.
