Project Overview
Bisik is a desktop speech recognition application that sends recorded speech to OpenAI’s Whisper API for transcription. Built with Electron.js, it starts a transcription from a global keyboard shortcut (double backslash), which makes it well suited to note-taking, content creation, and accessibility needs. Custom prompt modes let users tailor the transcription output to their specific use cases.

Tech Stack
- Electron.js
- Node.js
- OpenAI Whisper API
- HTML/CSS/JavaScript
- Electron Builder (for cross-platform packaging)
 
Key Challenges
- System Integration: Creating a global keyboard shortcut that works across all applications while the app runs in the background
- Audio Capture: Implementing reliable microphone access and audio recording across different operating systems
- API Integration: Efficiently handling audio data transmission to OpenAI’s Whisper API while managing API costs
- User Experience: Designing an intuitive interface that stays out of the way but is instantly accessible when needed
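
The API integration challenge can be illustrated with a minimal sketch of a transcription request. It assumes Node.js 18 or later (for the global FormData, Blob, and fetch), a WebM recording, and the `whisper-1` model; the function names, the file name, and the 25 MB size guard are illustrative choices, not necessarily how Bisik does it.

```javascript
// Sketch only: builds a multipart request for OpenAI's transcription endpoint.
// Function names and the recording format are assumptions for illustration.
const WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions";
// Upload size limit documented by OpenAI at the time of writing (treat as an assumption).
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

function buildTranscriptionRequest(audioBuffer, apiKey, promptText = "") {
  if (audioBuffer.length === 0) {
    throw new Error("No audio recorded");
  }
  if (audioBuffer.length > MAX_UPLOAD_BYTES) {
    throw new Error("Recording exceeds the upload limit; compress or split it first");
  }
  const form = new FormData();
  form.append("file", new Blob([audioBuffer], { type: "audio/webm" }), "recording.webm");
  form.append("model", "whisper-1");
  if (promptText) {
    form.append("prompt", promptText); // optional context hint for the model
  }
  return {
    url: WHISPER_URL,
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Sends the request and returns the transcribed text.
async function transcribe(audioBuffer, apiKey, promptText) {
  const { url, options } = buildTranscriptionRequest(audioBuffer, apiKey, promptText);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Transcription failed: HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.text;
}
```

Rejecting empty or oversized recordings before the network call is one simple way to avoid paying for requests that cannot succeed.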
 
Solutions
Global Hotkey Implementation:
- Used Electron’s globalShortcut API to register system-wide keyboard shortcuts
- Implemented a double-backslash trigger mechanism for quick activation
- Added permission handling for microphone access across platforms
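
Electron's globalShortcut API reports individual key presses rather than double presses, so a double-backslash trigger needs some timing logic on top. The sketch below shows one possible approach; the helper name, the 400 ms window, and the wiring in the trailing comment (including the `"\\"` accelerator string and `startRecording`) are assumptions, not taken from Bisik's source.

```javascript
// Sketch only: fires a callback when two presses arrive within `windowMs`.
// `now` is injectable so the timing logic can be tested without real delays.
function createDoublePressDetector(onDoublePress, windowMs = 400, now = Date.now) {
  let lastPress = null;
  return function handlePress() {
    const t = now();
    if (lastPress !== null && t - lastPress <= windowMs) {
      lastPress = null; // reset so a third press starts a new pair
      onDoublePress();
      return true;
    }
    lastPress = t;
    return false;
  };
}

// Hypothetical wiring in the Electron main process (not executed here):
//   const { app, globalShortcut } = require("electron");
//   app.whenReady().then(() => {
//     globalShortcut.register("\\", createDoublePressDetector(startRecording));
//   });
```

One design consideration: a registered global shortcut is typically consumed by the app, so binding a plain backslash this way may stop the key from reaching other applications.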
 
Efficient Audio Processing:
- Developed an audio capture system that records and processes speech in real time
- Implemented audio compression to reduce file sizes before API transmission
- Created a buffering system to handle continuous speech without interruptions
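
As a rough illustration of the buffering idea, the sketch below collects audio chunks as they arrive and joins them into one buffer when recording stops. The class name, the byte cap, and the chunk source are hypothetical; the actual Bisik implementation may differ.

```javascript
// Sketch only: accumulates recorded chunks and enforces a size cap.
class AudioChunkBuffer {
  constructor(maxBytes = 25 * 1024 * 1024) {
    this.maxBytes = maxBytes;
    this.chunks = [];
    this.totalBytes = 0;
  }

  // Adds a chunk; returns false (and drops it) if the cap would be exceeded.
  append(chunk) {
    const buf = Buffer.from(chunk);
    if (this.totalBytes + buf.length > this.maxBytes) {
      return false;
    }
    this.chunks.push(buf);
    this.totalBytes += buf.length;
    return true;
  }

  // Returns everything recorded so far as one Buffer and resets the state.
  flush() {
    const combined = Buffer.concat(this.chunks, this.totalBytes);
    this.chunks = [];
    this.totalBytes = 0;
    return combined;
  }
}
```

In an Electron app the chunks would most likely come from the renderer process (for example from a MediaRecorder) and be forwarded to the main process, but that wiring is outside this sketch.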
 
Custom Prompt System:
- Built a flexible prompt management system allowing users to create custom transcription modes
- Implemented template-based prompts for different use cases (meeting notes, code dictation, etc.)
- Added persistent storage for user-created prompts and settings
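
A template-based prompt system with persistence could look roughly like the following sketch. The `{{placeholder}}` syntax, the default mode texts, the JSON file layout, and all function names are assumptions made for illustration.

```javascript
// Sketch only: prompt templates with {{placeholders}}, persisted as JSON.
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical built-in modes; user-created modes are merged on top of these.
const DEFAULT_MODES = {
  "meeting-notes": "Notes from a {{topic}} meeting, written in full sentences.",
  "code-dictation": "Dictated {{language}} source code with exact identifiers.",
};

// Replaces each {{key}} with its value; unknown keys are left untouched.
function renderPrompt(template, values = {}) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// Loads saved modes, falling back to the defaults if no file exists yet.
function loadModes(filePath) {
  try {
    const saved = JSON.parse(fs.readFileSync(filePath, "utf8"));
    return { ...DEFAULT_MODES, ...saved };
  } catch (err) {
    if (err.code === "ENOENT") {
      return { ...DEFAULT_MODES };
    }
    throw err;
  }
}

// Adds or updates one mode and writes the full set back to disk.
function saveMode(filePath, name, template) {
  const modes = loadModes(filePath);
  modes[name] = template;
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(modes, null, 2));
  return modes;
}
```

In an Electron app the file would typically live under `app.getPath("userData")`. The rendered text can be passed as the Whisper API's `prompt` parameter, which acts as a context and style hint rather than a strict instruction.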
 
Cross-Platform Compatibility:
- Targeted macOS first, with Windows and Linux support planned
- Used Electron Builder to create native installers
- Implemented platform-specific UI adjustments for a native feel
 
Features & Innovation
- Instant Activation: Double-backslash hotkey for immediate transcription start
- Custom Modes: Users can create personalized prompts for specific transcription scenarios
- API Key Management: Secure storage of OpenAI API keys with easy configuration
- Minimal UI: Clean, distraction-free interface that focuses on functionality
- Background Operation: Runs quietly in the system tray until needed
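
The write-up does not say how API keys are stored. One plausible approach in Electron is the built-in `safeStorage` module, which encrypts strings with OS-level facilities. The sketch below takes that module as a parameter so the logic can be exercised outside Electron; the function names and file location are hypothetical.

```javascript
// Sketch only: encrypts the API key before writing it to disk.
// `safeStorage` is expected to be Electron's module of that name, passed in
// by the caller (for example: const { safeStorage } = require("electron")).
const fs = require("node:fs");

function saveApiKey(safeStorage, filePath, apiKey) {
  if (!safeStorage.isEncryptionAvailable()) {
    throw new Error("OS-level encryption is not available");
  }
  const encrypted = safeStorage.encryptString(apiKey); // returns a Buffer
  fs.writeFileSync(filePath, encrypted, { mode: 0o600 });
}

// Returns the decrypted key, or null if no key has been saved yet.
function loadApiKey(safeStorage, filePath) {
  if (!fs.existsSync(filePath)) {
    return null;
  }
  return safeStorage.decryptString(fs.readFileSync(filePath));
}
```

In a real app these functions would be called from the main process after the `ready` event, since `safeStorage` is not usable before then.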
 
Lessons Learned
- Mastered Electron.js development, including main/renderer process communication and system tray integration
- Gained experience with audio APIs and real-time audio processing in Node.js
- Learned to handle cross-platform compatibility challenges in desktop applications
- Developed skills in creating user-friendly interfaces for technical tools
- Understood the importance of efficient API usage and cost optimization
 
Attachments
Main Recording Interface

Settings and API Configuration

Custom Prompt Modes

GitHub Repository: Bisik - Speech Recognition App