Tag
21 articles
Learn to build a basic voice assistant similar to Siri using Python's speech recognition and text-to-speech libraries. This beginner-friendly tutorial teaches you how to create a command-based assistant that listens, understands, and responds to voice commands.
This article explains NVIDIA's Nemotron 3.5 ASR, a 600M-parameter streaming speech recognition model that processes 40 languages in real time using cache-aware optimization techniques.
Learn to build a basic voice-controlled assistant app that recognizes spoken commands and responds with text-to-speech output, demonstrating the core technology behind modern voice assistants.
Learn to build a basic AI voice assistant that can understand spoken questions and respond with intelligent answers using Python and OpenAI's API.
Learn how voice AI works and why it's particularly challenging in India's diverse linguistic environment. Discover how companies like Wispr Flow are working to make voice technology more accessible.
Learn to build a basic speech-to-speech conversational AI system that processes voice input, generates intelligent responses, and speaks back to users.
This explainer explores the advanced AI technologies behind modern dictation apps, including transformer architectures, real-time processing, and multimodal learning techniques.
IBM has launched two new Granite Speech 4.1 2B models: one autoregressive, for high-accuracy speech recognition with translation, and one non-autoregressive, for fast inference.
OpenMOSS has released MOSS-Audio, an open-source foundation model that unifies speech, sound, music, and temporal audio reasoning, outperforming existing open-source models, including systems more than four times its size.
This article explains how the Deepgram Python SDK enables developers to integrate advanced voice AI capabilities like transcription, text-to-speech, and asynchronous audio processing into Python applications.
Learn to build a basic voice translation application using DeepL's API and Python. This beginner-friendly tutorial teaches you how to capture voice input, translate it in real time, and speak the results aloud.
Learn what Microsoft VibeVoice is, how it uses AI to understand and generate human speech, and why it's important for the future of voice technology.