Project Overview
Bisik is a desktop speech recognition application that brings OpenAI’s Whisper API directly into your workflow. Built with Electron.js, it starts transcribing on a simple keyboard shortcut (a double press of the backslash key), which makes it well suited to note-taking, content creation, and accessibility needs. Custom prompt modes let users tailor the transcription output to their specific use cases.
Tech Stack
- Electron.js
- Node.js
- OpenAI Whisper API
- HTML/CSS/JavaScript
- Electron Builder (for cross-platform packaging)
Key Challenges
- System Integration: Creating a global keyboard shortcut that works across all applications while the app runs in the background
- Audio Capture: Implementing reliable microphone access and audio recording across different operating systems
- API Integration: Efficiently handling audio data transmission to OpenAI’s Whisper API while managing API costs
- User Experience: Designing an intuitive interface that stays out of the way but is instantly accessible when needed
Solutions
Global Hotkey Implementation:
- Utilized Electron’s globalShortcut API to register system-wide keyboard shortcuts
- Implemented a double-backslash trigger mechanism for quick activation
- Added proper permission handling for microphone access across platforms
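Electron's globalShortcut API reports individual key presses, so a double-backslash trigger needs some timing logic on top of it. The following is a minimal sketch of one way to do that, not Bisik's actual code: the helper name `createDoublePressDetector`, the 400 ms window, and the `startRecording` callback in the wiring comment are all assumptions made for illustration.

```javascript
// Hypothetical helper: reports a double press when the same key is pressed
// twice within `windowMs` milliseconds. The 400 ms default is an assumption.
function createDoublePressDetector(onDoublePress, windowMs = 400) {
  let lastPressAt = null;

  // Call this from the shortcut callback. `now` is injectable for testing.
  return function handlePress(now = Date.now()) {
    if (lastPressAt !== null && now - lastPressAt <= windowMs) {
      lastPressAt = null; // reset so a third press starts a new sequence
      onDoublePress();
      return true;
    }
    lastPressAt = now;
    return false;
  };
}

// Possible wiring in the Electron main process (not executed here;
// `startRecording` is a hypothetical function):
//   const { app, globalShortcut } = require("electron");
//   app.whenReady().then(() => {
//     const handlePress = createDoublePressDetector(startRecording);
//     globalShortcut.register("\\", () => handlePress());
//   });
```

Keeping the timing logic in a plain function, separate from the Electron wiring, makes it easy to unit test by passing explicit timestamps.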
Efficient Audio Processing:
- Developed an audio capture system that records and processes speech in real time
- Implemented audio compression to reduce file sizes before API transmission
- Created a buffering system to handle continuous speech without interruptions
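The write-up does not say how the buffering and compression steps are implemented. As one possible illustration, the sketch below collects recorder chunks, downsamples mono audio to 16 kHz, and converts it to 16-bit PCM. The function names, the 16 kHz target, and the averaging-based downsampler are assumptions, not details taken from Bisik.

```javascript
// Collects recorder chunks (Float32Array) and merges them on demand.
function createChunkBuffer() {
  const chunks = [];
  let total = 0;
  return {
    push(chunk) {
      chunks.push(chunk);
      total += chunk.length;
    },
    // Returns all buffered samples as one contiguous array and empties the buffer.
    flush() {
      const merged = new Float32Array(total);
      let offset = 0;
      for (const chunk of chunks) {
        merged.set(chunk, offset);
        offset += chunk.length;
      }
      chunks.length = 0;
      total = 0;
      return merged;
    },
  };
}

// Downsamples mono samples by averaging each group of input samples.
// The 16 kHz default is an assumed target rate for speech.
function downsample(samples, inputRate, targetRate = 16000) {
  if (targetRate >= inputRate) return Float32Array.from(samples);
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    out[i] = sum / (end - start);
  }
  return out;
}

// Converts Float32 samples in [-1, 1] to 16-bit signed PCM, clamping out-of-range values.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

Under these assumptions, going from 48 kHz 32-bit float to 16 kHz 16-bit PCM cuts the raw payload to one sixth of its size (3x from the sample rate, 2x from the bit depth) before any codec is applied.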
Custom Prompt System:
- Built a flexible prompt management system allowing users to create custom transcription modes
- Implemented template-based prompts for different use cases (meeting notes, code dictation, etc.)
- Added persistent storage for user-created prompts and settings
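A template-based prompt system can be sketched as a small placeholder renderer plus a function that assembles the request fields. The mode names, template wording, and `{{placeholder}}` syntax below are hypothetical. Passing the rendered text through the transcription endpoint's `prompt` field is also an assumption about how Bisik applies custom modes.

```javascript
// Hypothetical built-in modes. The names and wording are illustrative only.
const promptModes = {
  "meeting-notes": "Meeting transcript about {{topic}}. Attendees: {{attendees}}.",
  "code-dictation": "Dictated {{language}} source code with identifiers such as {{identifiers}}.",
};

// Replaces {{name}} placeholders with values. Unknown placeholders are left intact.
function renderPrompt(template, values = {}) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// Builds the text fields for a transcription request. The audio itself would be
// added as the `file` field of a multipart form posted to
// https://api.openai.com/v1/audio/transcriptions.
function buildTranscriptionFields(modeName, values, modes = promptModes) {
  if (!Object.prototype.hasOwnProperty.call(modes, modeName)) {
    throw new Error(`Unknown prompt mode: ${modeName}`);
  }
  return {
    model: "whisper-1",
    prompt: renderPrompt(modes[modeName], values),
    response_format: "text",
  };
}

// Example usage:
const fields = buildTranscriptionFields("meeting-notes", {
  topic: "roadmap",
  attendees: "Ana, Ben",
});
console.log(fields.prompt);
```

User-created modes would fit the same shape: storing them as name-to-template pairs means they can be saved to a settings file and merged with the built-in ones.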
Cross-Platform Compatibility:
- Targeted macOS first, with Windows and Linux support planned
- Used Electron Builder for creating native installers
- Implemented platform-specific UI adjustments for native feel
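Platform-specific UI adjustments are often expressed as a function that maps the platform name to window options. The sketch below uses real Electron BrowserWindow option names, but the specific values, the window size, and the `windowOptionsFor` helper are assumptions rather than Bisik's actual settings.

```javascript
// Returns BrowserWindow options adjusted for the given platform string
// (the same values Node exposes as process.platform).
function windowOptionsFor(platform) {
  // Assumed base settings for a small, unobtrusive recording window.
  const base = { width: 360, height: 200, show: false, resizable: false };
  switch (platform) {
    case "darwin":
      // macOS: inset traffic-light buttons and a translucent background.
      return { ...base, titleBarStyle: "hiddenInset", vibrancy: "under-window" };
    case "win32":
      // Windows: frameless window that stays out of the taskbar.
      return { ...base, frame: false, skipTaskbar: true };
    default:
      // Linux and others: frameless window only.
      return { ...base, frame: false };
  }
}

// Possible use in the Electron main process (not executed here):
//   const { BrowserWindow } = require("electron");
//   const win = new BrowserWindow(windowOptionsFor(process.platform));
```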
Features & Innovation
- Instant Activation: Double-backslash hotkey for immediate transcription start
- Custom Modes: Users can create personalized prompts for specific transcription scenarios
- API Key Management: Secure storage of OpenAI API keys with easy configuration
- Minimal UI: Clean, distraction-free interface that focuses on functionality
- Background Operation: Runs quietly in the system tray until needed
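The write-up does not describe how the API key is stored securely. One common option in Electron is the safeStorage module, which encrypts strings with OS-level facilities. The sketch below assumes that approach: `createApiKeyStore` and the `openaiApiKey` entry name are hypothetical, and the `safeStorage` object is passed in so the logic can be exercised outside Electron.

```javascript
// `safeStorage` is expected to expose isEncryptionAvailable, encryptString, and
// decryptString, as Electron's safeStorage module does. `store` is any Map-like
// object standing in for a persistent settings file (an assumption).
function createApiKeyStore(safeStorage, store = new Map()) {
  return {
    save(apiKey) {
      if (!safeStorage.isEncryptionAvailable()) {
        throw new Error("OS-level encryption is not available");
      }
      // encryptString returns a Buffer; keep it as base64 so it fits in JSON settings.
      const encrypted = safeStorage.encryptString(apiKey);
      store.set("openaiApiKey", encrypted.toString("base64"));
    },
    load() {
      const encoded = store.get("openaiApiKey");
      if (encoded === undefined) return null;
      return safeStorage.decryptString(Buffer.from(encoded, "base64"));
    },
  };
}

// Possible use in the Electron main process (not executed here):
//   const { safeStorage } = require("electron");
//   const keyStore = createApiKeyStore(safeStorage);
```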
Lessons Learned
- Mastered Electron.js development including main/renderer process communication and system tray integration
- Gained experience with audio APIs and real-time audio processing in Node.js
- Learned to handle cross-platform compatibility challenges in desktop applications
- Developed skills in creating user-friendly interfaces for technical tools
- Understood the importance of efficient API usage and cost optimization
Attachments
Main Recording Interface
Settings and API Configuration
Custom Prompt Modes
GitHub Repository: Bisik - Speech Recognition App