Future Plans

What's next for YAH Mei? Exciting features on the horizon!

Project Plan: Multilingual REALITY Livestreaming Bot

YAH Mei is evolving into a comprehensive livestreaming solution. Here's our detailed project plan:

Core Architecture

In Development

Low-Cost Hybrid Model:
• PC (Controller + REALITY Environment)
• 2 Dedicated Android Phones (Processing)
• Cloud AI Services (Core Intelligence)
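
To make the split concrete, here is a minimal sketch of a service map the PC-side Core Script could load at startup. Every host, port, and model name below is a placeholder for illustration, not a project decision.

```python
# config.py - hypothetical service map for the low-cost hybrid architecture.
# All hosts, ports, and model names are placeholders.
SERVICES = {
    # Phone 1: chroot Linux + faster-whisper behind an OpenAI-compatible API
    "asr": {"base_url": "http://192.168.1.21:8000/v1", "model": "whisper-base"},
    # Phone 2: chroot Linux + MeloTTS behind an OpenAI-compatible API
    "tts": {"base_url": "http://192.168.1.22:8001/v1", "model": "melo-tts"},
    # Cloud: LLM + embeddings via an OpenAI-compatible endpoint
    "llm": {"base_url": "https://cloud.example.com/v1", "model": "gemini-1.5-flash"},
}
```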

Components & Roles

PC (Primary Controller & REALITY Environment)

In Development

• OS: User's primary OS (Windows/Linux/Mac)
• Core Script: Python 3.x with the LangChain framework for orchestration and AI interaction
• GUI Interface: Python-based dashboard using PyQt5 or Tkinter for:
  ◦ Real-time status monitoring
  ◦ Audio level visualization and control
  ◦ Conversation history display
  ◦ Manual override controls
  ◦ System configuration panel
• REALITY Environment: Android emulator running the REALITY App and ADBKeyboard
• Audio Routing: Voicemeeter for audio capture and playback management
• RAG Storage: Local ChromaDB for conversation history/memory
• AI API Client: OpenAI Python library for cloud and local services
• ADB Control: Python subprocess module for emulator commands (see the sketch after this list)
• Viewer Comment Reading: WebSocket connection to the REALITY App
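
As referenced in the ADB Control item above, here is a minimal sketch of driving the emulator through the subprocess module. It assumes the stock adb CLI is on PATH, a placeholder emulator serial, and that ADBKeyboard is the active input method inside the emulator; the ADB_INPUT_TEXT broadcast shown is the way ADBKeyboard is commonly driven, stated here as an assumption rather than a project specific.

```python
# adb_control.py - minimal sketch of emulator control via subprocess.
# Assumes: `adb` is installed, the emulator serial is "emulator-5554",
# and ADBKeyboard is set as the active IME inside the emulator.
import subprocess

DEVICE = "emulator-5554"  # placeholder emulator serial

def adb(*args: str) -> str:
    """Run an adb command against the emulator and return its stdout."""
    result = subprocess.run(
        ["adb", "-s", DEVICE, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def tap(x: int, y: int) -> None:
    """Simulate a screen tap, e.g. to focus the REALITY comment box."""
    adb("shell", "input", "tap", str(x), str(y))

def type_text(message: str) -> None:
    """Send text through ADBKeyboard's broadcast intent (handles non-ASCII)."""
    adb("shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT", "--es", "msg", message)

if __name__ == "__main__":
    tap(540, 1700)           # placeholder coordinates for the input field
    type_text("こんにちは!")  # ADBKeyboard types the multilingual text
```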

Phone 1 (ASR Server)

In Development

• Hardware: Dedicated Android phone
• Environment: chroot Linux environment (Debian/Ubuntu ARM)
• ASR Engine: faster-whisper (using tiny or base multilingual models)
• Server: Python web server exposing an OpenAI-compatible API endpoint
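
A minimal sketch of such an ASR server appears below. faster-whisper is the engine named in the plan; FastAPI, the route, the port, and the response shape are assumptions patterned after the OpenAI transcription API, since the plan only specifies "a Python web server".

```python
# asr_server.py - sketch of an OpenAI-compatible transcription endpoint
# running in the phone's chroot. FastAPI/uvicorn and the port are assumptions.
import tempfile

from fastapi import FastAPI, File, UploadFile
from faster_whisper import WhisperModel

app = FastAPI()
# "base" multilingual model with int8 quantization to fit phone-class hardware
model = WhisperModel("base", device="cpu", compute_type="int8")

@app.post("/v1/audio/transcriptions")
async def transcribe(file: UploadFile = File(...)):
    # Write the upload to disk so faster-whisper can read it
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        path = tmp.name
    segments, info = model.transcribe(path, vad_filter=True)
    text = "".join(segment.text for segment in segments)
    # Mirror the shape of the OpenAI transcription response (plus language)
    return {"text": text.strip(), "language": info.language}

# Run with: uvicorn asr_server:app --host 0.0.0.0 --port 8000
```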

Phone 2 (TTS Server)

In Development

• Hardware: Dedicated Android phone
• Environment: chroot Linux environment
• TTS Engine: MeloTTS with custom-trained EN/ZH/JA voice models
• Server: Python web server exposing an OpenAI-compatible API endpoint
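
A matching sketch of the TTS side follows. MeloTTS is the engine the plan names, but the melo.api usage (taken from MeloTTS's published example), the FastAPI wrapper, the speaker selection, and the port are assumptions rather than confirmed details; the custom-trained EN/ZH/JA voices would be loaded in place of the stock checkpoints.

```python
# tts_server.py - sketch of an OpenAI-style speech endpoint wrapping MeloTTS.
# The melo.api calls, FastAPI, the language mapping, and the port are assumptions.
import tempfile

from fastapi import FastAPI
from fastapi.responses import FileResponse
from pydantic import BaseModel
from melo.api import TTS

app = FastAPI()
# One MeloTTS model per supported language (MeloTTS uses "JP" for Japanese)
MODELS = {lang: TTS(language=lang, device="cpu") for lang in ("EN", "ZH", "JP")}

class SpeechRequest(BaseModel):
    input: str            # text to speak
    language: str = "EN"  # "EN", "ZH", or "JP"
    speed: float = 1.0

@app.post("/v1/audio/speech")
def speech(req: SpeechRequest):
    model = MODELS[req.language]
    speaker_id = next(iter(model.hps.data.spk2id.values()))  # first available voice
    out = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    model.tts_to_file(req.input, speaker_id, out.name, speed=req.speed)
    return FileResponse(out.name, media_type="audio/wav")

# Run with: uvicorn tts_server:app --host 0.0.0.0 --port 8001
```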

Cloud Services

In Development

Cloud AI models provide the LLM and embedding capabilities, accessed through an OpenAI-compatible API endpoint (e.g., a Google Gemini backend configured for compatibility).
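
Because every tier speaks the same OpenAI-compatible protocol, the Core Script can use one client library for all of them. The sketch below shows the cloud side using the OpenAI Python library; the environment variable names and the model names are placeholders, not confirmed settings.

```python
# cloud_client.py - sketch of reaching the cloud LLM/embedding models through
# the OpenAI Python library. base_url, api_key source, and model names are
# placeholders, not confirmed project settings.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["CLOUD_OPENAI_BASE_URL"],  # OpenAI-compatible endpoint
    api_key=os.environ["CLOUD_API_KEY"],
)

def embed(text: str) -> list[float]:
    """Generate the query vector used for the ChromaDB lookup."""
    response = client.embeddings.create(model="text-embedding-004", input=text)
    return response.data[0].embedding

def chat(prompt: str) -> str:
    """Ask the cloud LLM for YAH Mei's next reply."""
    response = client.chat.completions.create(
        model="gemini-1.5-flash",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are YAH Mei, a multilingual streamer."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
```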

Workflow

System Workflow

In Development

1. Input:
   • Voice: Collaborators speak within the REALITY App
   • Text: Viewers post text comments in the REALITY App
2. Audio Capture: Emulator audio captured by Voicemeeter
3. ASR Request: Core Script sends audio to ASR Server on Phone 1
4. ASR Processing: Phone 1 transcribes audio and returns text
5. Viewer Comment Reading: Core Script reads latest viewer comments
6. LangChain Processing (PC):
   • Receives ASR transcription and viewer comments
   • Calls Cloud Embedding API to generate query vector
   • Queries local ChromaDB for relevant context
   • Constructs complete prompt with all information
   • Calls Cloud LLM API with the complete prompt
   • Receives text response and detects language
7. TTS Request: Core Script sends text to TTS Server on Phone 2
8. TTS Processing: Phone 2 generates audio from text
9. Audio Output: Core Script plays audio through Voicemeeter
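
Tying the steps together, here is a condensed sketch of one pass through steps 3-8 on the PC. It reuses the hypothetical ASR, TTS, and cloud endpoints from the earlier sketches, queries ChromaDB directly instead of going through LangChain, and leaves audio capture, comment reading, and playback (steps 1-2, 5, 9) as plain inputs and outputs, so it illustrates the data flow rather than the final implementation.

```python
# pipeline.py - condensed sketch of one loop iteration on the PC.
# ASR/TTS URLs, the ChromaDB collection name, and the cloud_client module
# all come from the earlier hypothetical sketches, not from the plan itself.
import requests
import chromadb
from cloud_client import embed, chat  # hypothetical module sketched above

ASR_URL = "http://192.168.1.21:8000/v1/audio/transcriptions"  # Phone 1
TTS_URL = "http://192.168.1.22:8001/v1/audio/speech"          # Phone 2

memory = chromadb.PersistentClient(path="./yah_mei_db").get_or_create_collection("memory")

def run_once(audio_bytes: bytes, comments: list[str]) -> bytes:
    """One pass: captured audio + comments in, reply audio out (steps 3-8)."""
    # Steps 3-4: transcribe collaborator speech on Phone 1
    asr = requests.post(ASR_URL, files={"file": ("chunk.wav", audio_bytes)}).json()
    heard = asr["text"]

    # Step 6: embed the query, pull related memories, build the prompt, call the LLM
    context = memory.query(query_embeddings=[embed(heard)], n_results=3)
    past = "\n".join(context["documents"][0]) if context["documents"] else ""
    prompt = (
        f"Earlier conversation:\n{past}\n\n"
        f"Collaborators just said: {heard}\n"
        f"Viewer comments: {'; '.join(comments)}\n"
        "Reply in the same language as the collaborators."
    )
    reply = chat(prompt)

    # Steps 7-8: synthesize the reply on Phone 2
    audio = requests.post(TTS_URL, json={"input": reply, "language": "EN"}).content
    return audio  # Step 9 would play this back through Voicemeeter
```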