System Overview
VISION has three parts that work together: the smart glasses, the mobile app, and the cloud sync layer that connects them.
┌────────────────────┐        Cloud Sync        ┌────────────────────┐
│                    │ ◄──────────────────────► │                    │
│   SMART GLASSES    │                          │     MOBILE APP     │
│   (Worn by VIP)    │                          │     (Android)      │
│                    │                          │                    │
│  • Voice-controlled│                          │  • VIP interface   │
│  • AI vision       │                          │  • Caretaker view  │
│  • Navigation      │                          │  • Chat & sharing  │
│  • Haptic feedback │                          │  • Live monitoring │
│                    │                          │                    │
└────────────────────┘                          └────────────────────┘
The Smart Glasses
The glasses are the VIP’s eyes and ears. They combine cameras, sensors, and on-device AI to help the user understand and navigate their surroundings — all controlled by voice.
Hardware features
- Dual cameras — one for general vision (object detection, navigation, document scanning), one dedicated to face recognition
- Ultrasonic distance sensing — for obstacle detection with distance awareness
- GPS — for navigation and location tracking
- Head position sensor — gentle reminders to keep your head level
- Physical button — push-to-talk plus tap controls (skip, cancel)
- Vibration motors — haptic feedback for alerts
- Lights — with auto-brightness adjustment
- Bluetooth + WiFi — for pairing and cloud connectivity
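As an illustration of how ultrasonic distance sensing can feed haptic alerts, here is a minimal sketch. It is not the VISION firmware: the function names, the vibration levels, and the distance thresholds are all assumptions made for the example. Only the echo-time formula (distance = speed of sound × round-trip time / 2) is standard.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air at about 20 °C


def echo_to_distance_m(echo_seconds: float) -> float:
    """Convert a round-trip ultrasonic echo time to a one-way distance in metres."""
    if echo_seconds < 0:
        raise ValueError("echo time cannot be negative")
    # The pulse travels to the obstacle and back, so halve the path length.
    return SPEED_OF_SOUND_M_PER_S * echo_seconds / 2.0


def haptic_level(distance_m: float) -> str:
    """Map a distance to a vibration level. Thresholds are hypothetical."""
    if distance_m < 0.5:
        return "strong"
    if distance_m < 1.5:
        return "medium"
    if distance_m < 3.0:
        return "light"
    return "none"


if __name__ == "__main__":
    d = echo_to_distance_m(0.01)  # a 10 ms round trip
    print(f"{d:.3f} m -> {haptic_level(d)}")
```

A 10 ms echo corresponds to roughly 1.7 m, which this sketch would map to a light vibration.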
What runs on the glasses
- Voice command processing and voice response
- Real-time face recognition (multiple faces at once)
- Obstacle detection with priority-based warnings
- Object recognition for scene description
- GPS-based navigation and location logging
- Document scanning with AI-powered text extraction and summarization
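Priority-based warnings can be pictured as a priority queue in which the most urgent alert is always spoken first. The sketch below is one possible approach, not a description of the actual implementation: the alert kinds, their ranking, and the tie-break on distance are assumptions.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical ranking: a lower number is announced first.
PRIORITY = {"collision": 0, "obstacle": 1, "face": 2, "info": 3}


@dataclass(order=True)
class Alert:
    priority: int
    distance_m: float                    # tie-break: nearer alerts come first
    message: str = field(compare=False)  # not used for ordering


class AlertQueue:
    """Min-heap of alerts ordered by (priority, distance)."""

    def __init__(self) -> None:
        self._heap: List[Alert] = []

    def push(self, kind: str, distance_m: float, message: str) -> None:
        heapq.heappush(self._heap, Alert(PRIORITY[kind], distance_m, message))

    def pop(self) -> Optional[str]:
        """Return the most urgent message, or None when the queue is empty."""
        return heapq.heappop(self._heap).message if self._heap else None


if __name__ == "__main__":
    q = AlertQueue()
    q.push("info", 5.0, "bench ahead")
    q.push("obstacle", 2.0, "pole")
    q.push("collision", 0.3, "wall")
    while (msg := q.pop()) is not None:
        print(msg)
```

Ordering on distance within the same priority means that of two obstacles, the closer one is announced first.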
The Mobile App
A companion Android app for VIPs and their caretakers. Every action on the glasses is reflected here, and vice versa.
- VIP mode — manage contacts (faces), review history, customize your glasses, chat with caretakers
- Caretaker mode — monitor the VIP’s live navigation, review detected faces and locations, chat and share content
Cloud Connectivity
The glasses and mobile app don’t connect directly over Bluetooth for day-to-day use. Instead, they sync through the cloud in real time.
This means:
- Your caretaker can check in from anywhere in the world
- Adding a new face in the app makes it recognizable on the glasses instantly
- If the glasses briefly lose connection, data syncs back when they reconnect
- All data is encrypted in transit and at rest
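The "syncs back when they reconnect" behaviour is commonly built as a local buffer that holds events while offline and flushes them in order once the connection returns. The following is a minimal sketch under that assumption. `SyncBuffer` and the `send` callback are hypothetical names, and the real cloud API and encryption layer are not shown.

```python
from collections import deque
from typing import Callable, Deque


class SyncBuffer:
    """Buffers events locally and delivers them in order when sending succeeds."""

    def __init__(self, send: Callable[[dict], bool]) -> None:
        # `send` is a hypothetical uploader: returns True on success, False if offline.
        self._send = send
        self._pending: Deque[dict] = deque()

    def record(self, event: dict) -> None:
        """Queue an event and try to deliver everything that is pending."""
        self._pending.append(event)
        self.flush()

    def flush(self) -> int:
        """Send pending events oldest-first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still offline: keep the event for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)
```

An event is removed from the buffer only after `send` reports success, so a dropped connection cannot lose data or reorder it.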
Languages
All three system components — glasses, app, and cloud voices — support English, Chinese (Mandarin), and Malay.
Response Time
From button press to voice response, VISION aims for a total response time of under 10 seconds for most commands — often much faster for simple requests.
What Happens When You Speak a Command
- You press and hold the button — the glasses confirm with a small vibration
- You speak your command naturally
- You release — the glasses confirm with a second vibration
- The glasses understand what you said and pick the best-matching command
- The right module runs (object detection, navigation, messaging, etc.)
- The glasses speak the result — you can tap to skip or double-tap to stop
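The "pick the best-matching command" step above can be sketched as a simple keyword-overlap matcher. This is only an illustration: the command names and keyword sets are invented for the example, and the real system may use a quite different technique (for instance a language model).

```python
from typing import Optional

# Hypothetical command table: command name -> keywords that suggest it.
COMMANDS = {
    "describe_scene": {"what", "see", "describe", "around", "front"},
    "navigate": {"navigate", "take", "go", "directions", "route"},
    "read_document": {"read", "scan", "document", "text"},
    "send_message": {"send", "message", "tell", "chat"},
}


def match_command(transcript: str) -> Optional[str]:
    """Return the command whose keywords overlap most with the transcript.

    Returns None when no keyword matches at all, so the caller can ask
    the user to repeat the command.
    """
    words = set(transcript.lower().split())
    best, best_score = None, 0
    for name, keywords in COMMANDS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = name, score
    return best


if __name__ == "__main__":
    print(match_command("what do you see in front of me"))
    print(match_command("scan this document"))
```

Once a command name is chosen, it would be dispatched to the matching module, and the module's result passed to speech output.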
See User Flow for detailed walkthroughs of each interaction.