Future Plans
What's next for YAH Mei? Exciting features on the horizon!
Project Plan: Multilingual REALITY Livestreaming Bot
YAH Mei is evolving into a comprehensive livestreaming solution. Here's our detailed project plan:
Low-Cost Hybrid Model:
• PC (Controller + REALITY Environment)
• 2 Dedicated Android Phones (Processing)
• Cloud AI Services (Core Intelligence)
Components & Roles
PC (Controller + REALITY Environment)
• OS: User's primary OS (Windows/Linux/Mac)
• Core Script: Python 3.x with the LangChain framework for orchestration and AI interaction
• GUI Interface: Python-based dashboard using PyQt5 or Tkinter for:
  ◦ Real-time status monitoring
  ◦ Audio level visualization and control
  ◦ Conversation history display
  ◦ Manual override controls
  ◦ System configuration panel
• REALITY Environment: Android emulator running the REALITY App and ADBKeyboard
• Audio Routing: Voicemeeter for audio capture and playback management
• RAG Storage: Local ChromaDB for conversation history/memory
• AI API Client: OpenAI Python library for both cloud and local services
• ADB Control: Python subprocess module for emulator commands (see the sketch after this list)
• Viewer Comment Reading: WebSocket connection to the REALITY App
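As a concrete example of the ADB Control piece, here is a minimal sketch of how the Core Script could drive the emulator from Python's subprocess module. The device serial, the tap coordinates, and the use of ADBKeyboard's base64 broadcast intent are assumptions based on the stock adb CLI and the open-source ADBKeyBoard project; adjust them to the actual setup.

```python
import base64
import subprocess

ADB = "adb"               # assumes adb is on PATH
DEVICE = "emulator-5554"  # hypothetical emulator serial

def adb(*args: str) -> str:
    """Run an adb command against the emulator and return its stdout."""
    result = subprocess.run(
        [ADB, "-s", DEVICE, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def tap(x: int, y: int) -> None:
    """Simulate a tap, e.g. to focus the REALITY comment input field."""
    adb("shell", "input", "tap", str(x), str(y))

def type_text(text: str) -> None:
    """Send Unicode text through ADBKeyboard's base64 broadcast intent."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    adb("shell", "am", "broadcast",
        "-a", "ADB_INPUT_B64", "--es", "msg", payload)

if __name__ == "__main__":
    tap(540, 1700)  # hypothetical coordinates of the comment input box
    type_text("こんにちは、YAH Meiです！")
```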
Phone 1 (ASR Server)
• Hardware: Dedicated Android phone
• Environment: chroot Linux environment (Debian/Ubuntu ARM)
• ASR Engine: faster-whisper (using the tiny or base multilingual models)
• Server: Python web server exposing an OpenAI-compatible API endpoint
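A minimal sketch of the Phone 1 server, assuming faster-whisper plus FastAPI inside the chroot and mimicking OpenAI's /v1/audio/transcriptions route so the Core Script can call it with the standard OpenAI client. The model size, port, and compute type are placeholders.

```python
# asr_server.py - run inside the chroot on Phone 1, e.g.:
#   uvicorn asr_server:app --host 0.0.0.0 --port 8001
# requires: fastapi, uvicorn, python-multipart, faster-whisper
import tempfile

from fastapi import FastAPI, File, Form, UploadFile
from faster_whisper import WhisperModel

app = FastAPI()

# "base" multilingual model; int8 keeps memory use low on a phone CPU.
asr_model = WhisperModel("base", device="cpu", compute_type="int8")

@app.post("/v1/audio/transcriptions")
async def transcribe(file: UploadFile = File(...), model: str = Form("whisper-1")):
    # Persist the uploaded audio chunk so faster-whisper can read it from disk.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        path = tmp.name
    segments, info = asr_model.transcribe(path)
    text = "".join(segment.text for segment in segments)
    # Same shape as OpenAI's transcription response: {"text": "..."}
    return {"text": text.strip(), "language": info.language}
```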
Phone 2 (TTS Server)
• Hardware: Dedicated Android phone
• Environment: chroot Linux environment
• TTS Engine: MeloTTS with custom-trained EN/ZH/JA voice models
• Server: Python web server exposing an OpenAI-compatible API endpoint
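Correspondingly, a minimal sketch of the Phone 2 server, again assuming FastAPI in the chroot and mirroring OpenAI's /v1/audio/speech route. The MeloTTS calls follow the project's published Python API; the speaker selection, port, and language routing are assumptions, and the custom-trained voices would replace the stock checkpoints.

```python
# tts_server.py - run inside the chroot on Phone 2, e.g.:
#   uvicorn tts_server:app --host 0.0.0.0 --port 8002
# requires: fastapi, uvicorn, melotts
import tempfile

from fastapi import FastAPI, Response
from pydantic import BaseModel
from melo.api import TTS

app = FastAPI()

# One MeloTTS model per language; swap in the custom-trained checkpoints here.
models = {lang: TTS(language=lang, device="cpu") for lang in ("EN", "ZH", "JP")}

class SpeechRequest(BaseModel):
    model: str = "melo-tts"
    input: str
    voice: str = "EN"  # the Core Script passes the detected response language here

@app.post("/v1/audio/speech")
def speech(req: SpeechRequest) -> Response:
    tts = models.get(req.voice, models["EN"])
    speaker_id = next(iter(tts.hps.data.spk2id.values()))  # first available speaker
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    tts.tts_to_file(req.input, speaker_id, path, speed=1.0)
    with open(path, "rb") as f:
        audio = f.read()
    return Response(content=audio, media_type="audio/wav")
```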
Cloud AI Services
• AI models providing LLM and embedding capabilities, accessed via an OpenAI-compatible API endpoint (e.g., a Google Gemini backend configured for compatibility)
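Because the phones and the cloud backend all speak the same OpenAI-compatible protocol, the Core Script can reach every service through the one OpenAI Python library; only the base URL and model names change. Below is a sketch assuming Google's OpenAI-compatibility endpoint for Gemini; the exact URL, model names, and the phone addresses are placeholders to be checked against the live setup.

```python
from openai import OpenAI

# Cloud LLM + embeddings (Gemini exposed through its OpenAI-compatibility layer).
cloud = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",  # placeholder
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# Local servers on the two phones (hypothetical LAN addresses and ports).
asr = OpenAI(api_key="unused", base_url="http://phone1.local:8001/v1")
tts = OpenAI(api_key="unused", base_url="http://phone2.local:8002/v1")

# Example: embed a viewer comment, then ask the LLM for a reply.
embedding = cloud.embeddings.create(
    model="text-embedding-004",  # assumed embedding model name
    input="What song is this?",
).data[0].embedding

reply = cloud.chat.completions.create(
    model="gemini-2.0-flash",  # assumed chat model name
    messages=[
        {"role": "system", "content": "You are YAH Mei, a multilingual VTuber."},
        {"role": "user", "content": "What song is this?"},
    ],
).choices[0].message.content
print(reply)
```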
Workflow
1. Input:
   • Voice: Collaborators speak within the REALITY App
   • Text: Viewers post text comments in the REALITY App
2. Audio Capture: Emulator audio is captured by Voicemeeter
3. ASR Request: The Core Script sends the audio to the ASR server on Phone 1
4. ASR Processing: Phone 1 transcribes the audio and returns text
5. Viewer Comment Reading: The Core Script reads the latest viewer comments
6. LangChain Processing (PC):
   • Receives the ASR transcription and viewer comments
   • Calls the cloud embedding API to generate a query vector
   • Queries the local ChromaDB for relevant context
   • Constructs the complete prompt with all of this information
   • Calls the cloud LLM API with the complete prompt
   • Receives the text response and detects its language
7. TTS Request: The Core Script sends the response text to the TTS server on Phone 2
8. TTS Processing: Phone 2 generates audio from the text
9. Audio Output: The Core Script plays the audio through Voicemeeter
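To tie the steps together, here is a condensed sketch of one pass through this loop on the PC. It reuses the cloud/asr/tts clients from the sketch above, a chromadb collection for step 6, and hypothetical helper functions (capture_from_voicemeeter, read_latest_comments, play_through_voicemeeter) standing in for the audio and WebSocket plumbing.

```python
import chromadb

db = chromadb.PersistentClient(path="./yahmei_memory")
memory = db.get_or_create_collection("conversation_history")

def run_once() -> None:
    # Steps 2-4: capture emulator audio and transcribe it on Phone 1.
    wav_path = capture_from_voicemeeter(seconds=10)       # hypothetical helper
    with open(wav_path, "rb") as f:
        heard = asr.audio.transcriptions.create(model="whisper-1", file=f).text

    # Step 5: pull the newest viewer comments over the WebSocket bridge.
    comments = read_latest_comments(limit=5)               # hypothetical helper

    # Step 6: embed the query, fetch related memories, and ask the cloud LLM.
    query_vec = cloud.embeddings.create(
        model="text-embedding-004", input=heard).data[0].embedding
    context = memory.query(query_embeddings=[query_vec], n_results=3)
    prompt = (
        f"Heard in the stream: {heard}\n"
        f"Viewer comments: {comments}\n"
        f"Related memories: {context['documents']}"
    )
    answer = cloud.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "system", "content": "You are YAH Mei."},
                  {"role": "user", "content": prompt}],
    ).choices[0].message.content

    # Steps 7-9: synthesize the reply on Phone 2 and play it into the stream.
    speech = tts.audio.speech.create(
        model="melo-tts",
        voice="EN",  # in the real flow, set from the detected response language
        input=answer,
    )
    play_through_voicemeeter(speech.content)               # hypothetical helper

    # Store the exchange so future turns can retrieve it (RAG memory).
    memory.add(documents=[f"Q: {heard}\nA: {answer}"],
               embeddings=[query_vec], ids=[f"turn-{memory.count()}"])
```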