Audio Transcription & Speaker Diarization

Dive into our latest project using AI to transcribe audio and identify speakers!


Project Overview

What It Does

This project leverages AI to transcribe audio files and perform speaker diarization, identifying who speaks when. Using Whisper for transcription and Pyannote for diarization, it processes audio into a JSON output with timestamps, speaker IDs, and text.
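The output described above can be pictured as one JSON record per diarized segment. This is a hedged sketch of that shape; the field names ("start", "end", "speaker", "text") are illustrative, not the project's exact schema.

```python
import json

# One illustrative output record (field names are assumptions,
# not the project's confirmed schema).
record = {
    "start": 0.0,                  # segment start time, seconds
    "end": 3.2,                    # segment end time, seconds
    "speaker": "SPEAKER_00",       # Pyannote-style speaker label
    "text": "Hello and welcome.",  # Whisper transcript for the segment
}

print(json.dumps(record, indent=2))
```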

Tech Stack

  • Python: Core programming language
  • Whisper (OpenAI): Speech-to-text transcription
  • Pyannote: Speaker diarization
  • Hugging Face: Model hosting and authentication
  • Torch & Torchaudio: Audio processing

Tools & Technologies

Whisper

OpenAI's Whisper model for high-accuracy audio transcription.


Pyannote

Advanced speaker diarization to identify speakers in audio.


Hugging Face

Hosts the models and provides the access token required by the gated Pyannote pipeline.


Code Highlight


import os

import torch
from transformers import pipeline
from pyannote.audio import Pipeline

# Load Whisper for transcription (GPU if available, else CPU)
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-tiny",
    device=0 if torch.cuda.is_available() else -1
)

# Load Pyannote for diarization (requires a Hugging Face token
# with access to the gated pyannote models)
diarizer = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.getenv("HF_TOKEN")
)
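Once both models have run, their outputs still have to be merged: Whisper yields timestamped text segments and Pyannote yields timestamped speaker turns. A minimal sketch of one common merging strategy, assigning each transcript segment the speaker whose turn overlaps it the most (the helper name `assign_speakers` and the sample timestamps are illustrative, not the project's actual code):

```python
def assign_speakers(transcript_segments, speaker_turns):
    """Attach a speaker label to each transcript segment by maximum overlap."""
    results = []
    for seg in transcript_segments:
        best_speaker, best_overlap = None, 0.0
        for turn in speaker_turns:
            # Length of the time interval shared by segment and turn
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        results.append({**seg, "speaker": best_speaker})
    return results

# Illustrative stand-ins for real model output
transcript = [
    {"start": 0.0, "end": 2.5, "text": "Hi, thanks for joining."},
    {"start": 2.5, "end": 5.0, "text": "Happy to be here."},
]
turns = [
    {"start": 0.0, "end": 2.4, "speaker": "SPEAKER_00"},
    {"start": 2.4, "end": 5.1, "speaker": "SPEAKER_01"},
]

merged = assign_speakers(transcript, turns)
print(merged)
```

This greedy overlap rule is simple but works well when turns and segments roughly align; segments spanning a speaker change inherit the dominant speaker.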