Guide to implementing image understanding with the Google Gemini API: captioning, classification, visual question answering, object detection, segmentation, and multi-image comparison. Use it when analyzing images, answering visual questions, detecting objects, or processing documents with vision.
Security
1 medium-severity finding. This skill can be installed, but you should review the finding before use.
The skill exposes the agent to untrusted, user-generated content from public third-party sources, creating a risk of indirect prompt injection. This includes browsing arbitrary URLs, reading social media posts or forum comments, and analyzing content from unknown websites.
Third-party content exposure detected (high risk: 0.90). The skill's scripts/analyze-image.py downloads arbitrary http/https URLs with requests.get and sends those untrusted third-party images to the Gemini model. The repo's examples and best practices also show model outputs being reused in follow-up prompts and parsed into actions, such as bounding-box parsing or further requests. Remote user-generated content can therefore materially influence subsequent prompts and tool-driven behavior.
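One way to narrow the exposure described in this finding is to restrict which URLs are fetched and to treat model output as untrusted data before acting on it. The following is a minimal sketch, not the skill's actual code: the names `ALLOWED_HOSTS`, `is_allowed_image_url`, and `parse_boxes` are hypothetical, the host `images.example.com` is a placeholder, and the response shape (a JSON list of objects with a four-number `box_2d` in the 0 to 1000 range and a string `label`) is an assumption about what the detection prompt returns.

```python
import ipaddress
import json
from urllib.parse import urlparse

# Hypothetical allowlist; replace with hosts you actually trust.
ALLOWED_HOSTS = {"images.example.com"}


def is_allowed_image_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parsed.scheme != "https" or host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # reject literal IP addresses outright
    except ValueError:
        pass  # not an IP literal, continue with the hostname check
    return host in ALLOWED_HOSTS


def parse_boxes(model_text: str, max_boxes: int = 50) -> list[dict]:
    """Parse model output as data only; drop anything not matching the assumed shape."""
    try:
        data = json.loads(model_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    boxes = []
    for item in data[:max_boxes]:
        if not isinstance(item, dict):
            continue
        box, label = item.get("box_2d"), item.get("label")
        if not (isinstance(box, list) and len(box) == 4):
            continue
        # Assumed coordinate range: 0..1000; booleans are excluded explicitly.
        if not all(
            isinstance(v, (int, float)) and not isinstance(v, bool) and 0 <= v <= 1000
            for v in box
        ):
            continue
        if not isinstance(label, str):
            continue
        # Truncate labels so free text from the model cannot grow unbounded.
        boxes.append({"box_2d": box, "label": label[:100]})
    return boxes
```

Calling `is_allowed_image_url` before any download, and passing only the output of `parse_boxes` to later steps, keeps free-form text from a remote image out of follow-up prompts. It does not remove the risk of injection inside the image itself.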
b1b2fe0