Transcription

Transcription converts a video's audio track to text. It auto-detects the spoken language and returns timestamped segments, with optional speaker diarization.

curl -X POST https://api.netraflow.com/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: sk_live_your_key_here" \
  -d '{
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "capabilities": ["transcription"]
  }'
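
For callers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_job_request` and `submit_job` are hypothetical, and only the endpoint, headers, and body fields are taken from the curl example above. Whether the job comes back already completed or has to be polled is not stated here, so the sketch simply returns whatever JSON the server sends.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api.netraflow.com/v1/jobs"


def build_job_request(video_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper: the name and signature are illustrative only.
    """
    payload = {"url": video_url, "capabilities": ["transcription"]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def submit_job(video_url: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_job_request(video_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the method, headers, and body without touching the network.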

{
  "data": {
    "job_id": "job_abc123",
    "status": "completed",
    "results": {
      "transcription": {
        "text": "We're no strangers to love. You know the rules and so do I...",
        "segments": [
          {
            "start": 0.0,
            "end": 4.8,
            "text": "We're no strangers to love",
            "speaker": 0
          },
          {
            "start": 4.8,
            "end": 8.2,
            "text": "You know the rules and so do I",
            "speaker": 0
          }
        ],
        "language": "en",
        "duration_seconds": 212.0,
        "word_count": 423,
        "speakers_detected": 1
      }
    }
  }
}
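
A response shaped like the one above can be turned into a readable, timestamped transcript with a few lines of Python. The sketch below assumes the exact nesting shown in the example (`data.results.transcription.segments`). The function names are hypothetical, and the fallback label for a missing `speaker` value is an assumption about what happens when diarization is not used.

```python
from __future__ import annotations


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.t (tenths of a second), e.g. 4.8 -> '00:04.8'."""
    total_tenths = round(seconds * 10)
    minutes, remainder = divmod(total_tenths, 600)
    return f"{minutes:02d}:{remainder // 10:02d}.{remainder % 10}"


def transcript_lines(response: dict) -> list[str]:
    """Return one '[start - end] Speaker N: text' line per segment."""
    transcription = response["data"]["results"]["transcription"]
    lines = []
    for segment in transcription["segments"]:
        # Assumption: "speaker" may be missing or null without diarization.
        speaker = segment.get("speaker")
        label = f"Speaker {speaker}" if speaker is not None else "Speaker ?"
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {label}: {segment['text']}")
    return lines
```

Applied to the two segments in the example response, this yields `[00:00.0 - 00:04.8] Speaker 0: We're no strangers to love` followed by `[00:04.8 - 00:08.2] Speaker 0: You know the rules and so do I`.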

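Response fields

Prop               Type
text               string
segments           array of segment objects
language           string
duration_seconds   number
word_count         number
speakers_detected  number

Segment fields

Prop     Type
start    number
end      number
text     string
speaker  number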
