Bisik: Desktop Speech Recognition App with OpenAI Whisper

Tech Stack:
Electron.js, Node.js, OpenAI Whisper, Speech Recognition

A desktop application built with Electron.js that provides speech-to-text transcription through OpenAI's Whisper API.

Project Overview

Bisik is a desktop speech recognition application that sends recorded speech to OpenAI’s Whisper API and returns the transcription directly in your workflow. Built with Electron.js, it starts transcription from a simple keyboard shortcut (double backslash), which makes it well suited to note-taking, content creation, and accessibility needs. The app also features custom prompt modes that let users tailor the transcription output to their specific use cases.

[Screenshot: Bisik app interface]

Tech Stack

  • Electron.js
  • Node.js
  • OpenAI Whisper API
  • HTML/CSS/JavaScript
  • Electron Builder (for cross-platform packaging)

Key Challenges

  • System Integration: Creating a global keyboard shortcut that works across all applications while the app runs in the background
  • Audio Capture: Implementing reliable microphone access and audio recording across different operating systems
  • API Integration: Efficiently handling audio data transmission to OpenAI’s Whisper API while managing API costs
  • User Experience: Designing an intuitive interface that stays out of the way but is instantly accessible when needed

Solutions

Global Hotkey Implementation:

  • Utilized Electron’s globalShortcut API to register system-wide keyboard shortcuts
  • Implemented a double-backslash trigger mechanism for quick activation
  • Added proper permission handling for microphone access across platforms
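
A global shortcut only reports that a key combination fired, so a double-backslash trigger needs some timing logic on top. The following is a minimal sketch of that idea, not Bisik's actual code: `createDoublePressDetector`, the 400 ms window, and the injectable `now` clock are all hypothetical names and values chosen for illustration.

```javascript
// Sketch of a double-press detector (hypothetical helper, not from the Bisik source).
// onDoublePress: callback to run when two presses land within windowMs.
// windowMs: maximum gap between the two presses (400 ms is an assumed default).
// now: clock function, injectable so the logic can be tested without waiting.
function createDoublePressDetector(onDoublePress, windowMs = 400, now = Date.now) {
  let lastPress = null; // timestamp of the previous unmatched press

  // Call this every time the hotkey fires. Returns true on a double press.
  return function handlePress() {
    const t = now();
    if (lastPress !== null && t - lastPress <= windowMs) {
      lastPress = null; // reset so a third press starts a new pair
      onDoublePress();
      return true;
    }
    lastPress = t;
    return false;
  };
}

// In an Electron main process, the returned handler could be passed as the
// callback to globalShortcut.register(accelerator, handler) once the app is
// ready. The accelerator string for the backslash key is an assumption that
// should be checked against the Electron accelerator documentation.
```

Keeping the timing logic separate from Electron makes it easy to tune the window and to test it with a fake clock.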

Efficient Audio Processing:

  • Developed an audio capture system that records and processes speech in real time
  • Implemented audio compression to reduce file sizes before API transmission
  • Created a buffering system to handle continuous speech without interruptions
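
The buffering step above can be sketched as a small accumulator that collects audio chunks while the user speaks and joins them into one payload for upload. This is an illustrative sketch under assumptions: `AudioChunkBuffer` is a hypothetical class, chunks are assumed to arrive as `Uint8Array` values, and the 25 MB default cap mirrors the upload limit OpenAI documents for its audio endpoints at the time of writing, which should be verified before relying on it.

```javascript
// Sketch of a chunk buffer for recorded audio (hypothetical, not from the Bisik source).
class AudioChunkBuffer {
  // maxBytes: upper bound on the buffered payload (assumed 25 MB default).
  constructor(maxBytes = 25 * 1024 * 1024) {
    this.maxBytes = maxBytes;
    this.chunks = [];
    this.byteLength = 0;
  }

  // Append one chunk (a Uint8Array). Returns false, without storing the
  // chunk, if adding it would exceed the size cap.
  push(chunk) {
    if (this.byteLength + chunk.byteLength > this.maxBytes) return false;
    this.chunks.push(chunk);
    this.byteLength += chunk.byteLength;
    return true;
  }

  // Concatenate everything recorded so far into one Uint8Array and reset.
  flush() {
    const out = new Uint8Array(this.byteLength);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    this.chunks = [];
    this.byteLength = 0;
    return out;
  }
}
```

A caller would `push` each chunk as the recorder emits it and `flush` when recording stops; a `false` return from `push` is a signal to stop recording or to send what has been collected so far.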

Custom Prompt System:

  • Built a flexible prompt management system allowing users to create custom transcription modes
  • Implemented template-based prompts for different use cases (meeting notes, code dictation, etc.)
  • Added persistent storage for user-created prompts and settings
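
A template-based prompt system with persistence can be quite small. The sketch below is one possible shape, not the author's implementation: `renderPrompt`, `PromptModeStore`, the `{{placeholder}}` syntax, and the example mode are all hypothetical. The rendered string could then be supplied as the optional `prompt` parameter that the Whisper transcription endpoint accepts.

```javascript
// Sketch of template-based prompt modes (hypothetical, not from the Bisik source).

// Replace {{name}} placeholders with values; unknown placeholders are left intact.
function renderPrompt(template, values = {}) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : match
  );
}

// Holds named prompt templates and converts them to and from JSON so they
// can be written to disk (for example, a settings file in the app's data folder).
class PromptModeStore {
  constructor(modes = {}) {
    this.modes = { ...modes };
  }

  set(name, template) {
    this.modes[name] = template;
  }

  get(name) {
    return this.modes[name];
  }

  serialize() {
    return JSON.stringify(this.modes);
  }

  static deserialize(json) {
    return new PromptModeStore(JSON.parse(json));
  }
}

// Usage: define a mode, round-trip it through JSON, then render it.
const store = new PromptModeStore();
store.set("meeting", "Meeting notes for {{project}}. Attendees: {{attendees}}.");
const restored = PromptModeStore.deserialize(store.serialize());
console.log(renderPrompt(restored.get("meeting"), { project: "Bisik", attendees: "Ana, Budi" }));
```

Leaving unknown placeholders untouched makes a missing value visible in the output instead of silently dropping it.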

Cross-Platform Compatibility:

  • Designed the app to run on macOS first, with Windows and Linux support planned
  • Used Electron Builder for creating native installers
  • Implemented platform-specific UI adjustments for native feel
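
Platform-specific UI adjustments are often expressed as a function of `process.platform`. The sketch below shows the general pattern only: `windowOptionsFor` is a hypothetical helper, and the dimensions and per-platform choices are assumptions made for illustration. The option names (`titleBarStyle`, `autoHideMenuBar`, `show`) are standard Electron `BrowserWindow` options.

```javascript
// Sketch of per-platform window options (hypothetical, not from the Bisik source).
// platform: a Node.js platform string such as "darwin", "win32", or "linux".
function windowOptionsFor(platform) {
  // Shared defaults; the size values are assumed, not taken from the app.
  const base = { width: 360, height: 480, show: false };

  switch (platform) {
    case "darwin":
      // macOS: inset traffic-light buttons for a more native look.
      return { ...base, titleBarStyle: "hiddenInset" };
    case "win32":
    case "linux":
      // Windows/Linux: hide the menu bar unless Alt is pressed.
      return { ...base, autoHideMenuBar: true };
    default:
      return base;
  }
}

// In an Electron main process this could be used as:
//   new BrowserWindow(windowOptionsFor(process.platform))
```

Keeping the options in a pure function means the macOS behavior can ship now while the Windows and Linux branches are adjusted later without touching window creation code.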

Features & Innovation

  • Instant Activation: Double-backslash hotkey for immediate transcription start
  • Custom Modes: Users can create personalized prompts for specific transcription scenarios
  • API Key Management: Secure storage of OpenAI API keys with easy configuration
  • Minimal UI: Clean, distraction-free interface that focuses on functionality
  • Background Operation: Runs quietly in the system tray until needed

Lessons Learned

  • Mastered Electron.js development, including main/renderer process communication and system tray integration
  • Gained experience with audio APIs and real-time audio processing in Node.js
  • Learned to handle cross-platform compatibility challenges in desktop applications
  • Developed skills in creating user-friendly interfaces for technical tools
  • Understood the importance of efficient API usage and cost optimization

Attachments

Main Recording Interface

Bisik Recording Screen

Settings and API Configuration

Bisik Settings Page

Custom Prompt Modes

Custom Prompt Configuration

GitHub Repository: Bisik - Speech Recognition App