Query Regarding Azure APIs Subscription

Question


Hiten Mehta 20

I am interested in subscribing to the Azure Video Indexer APIs and had a few queries before proceeding:

Do the APIs offer voice recognition capabilities, specifically the ability to identify individual speakers in a conversation?

Does the transcript output include timestamps for the spoken content?

Do the APIs support emotion or sentiment detection?

I would appreciate your clarification on the above points.

Looking forward to your response.

Warm regards, Hiten Mehta


Answer accepted by question author


Answer 1

Pavankumar Purilla 11,575 Microsoft External Staff Moderator

Hi Hiten Mehta,
Azure Video Indexer provides voice recognition capabilities, including speaker diarization, which detects and distinguishes individual speakers in multi-speaker audio or video recordings. Speakers are labeled generically (e.g., Speaker 1, Speaker 2) throughout the transcript and, with additional customization, can be mapped to known individuals.

The transcript output also includes timestamps, with both word-level and phrase-level time codes, so the text can be aligned accurately with audio/video playback. This data is available in formats such as JSON and VTT/SRT for integration and review.

Azure Video Indexer also supports emotion and sentiment detection. It analyzes facial expressions, vocal tone, and spoken words to infer emotional states such as happiness, anger, or neutrality, which helps in understanding audience engagement and speaker intent.
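As a rough illustration of how the speaker labels, timestamps, and sentiment data described above might be consumed, here is a minimal Python sketch that reads an index JSON and lists each transcript line with its speaker and time range. It assumes the index exposes `videos[0].insights` with `transcript`, `speakers`, and `sentiments` arrays, that transcript entries carry a `speakerId` and `instances` with `start`/`end` strings like `0:00:02.5`, and that sentiments carry `sentimentType` and `averageScore`. These field names are assumptions to verify against a real response, and the sample data is hypothetical.

```python
import json

# HYPOTHETICAL, heavily abbreviated index excerpt. The field names are
# assumptions about the index schema; check them against a real response.
SAMPLE_INDEX = json.loads("""
{
  "videos": [{
    "insights": {
      "transcript": [
        {"id": 1, "text": "Hello and welcome.", "speakerId": 1,
         "instances": [{"start": "0:00:00", "end": "0:00:02.5"}]},
        {"id": 2, "text": "Thanks for having me.", "speakerId": 2,
         "instances": [{"start": "0:00:02.5", "end": "0:00:04.75"}]}
      ],
      "speakers": [
        {"id": 1, "name": "Speaker #1"},
        {"id": 2, "name": "Speaker #2"}
      ],
      "sentiments": [
        {"sentimentType": "Positive", "averageScore": 0.9,
         "instances": [{"start": "0:00:00", "end": "0:00:04.75"}]}
      ]
    }
  }]
}
""")


def to_seconds(ts: str) -> float:
    """Convert an 'H:MM:SS(.fff)' timestamp string to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def speaker_lines(index: dict) -> list[tuple[str, float, float, str]]:
    """Return (speaker name, start s, end s, text) for each transcript instance."""
    insights = index["videos"][0]["insights"]
    names = {s["id"]: s["name"] for s in insights.get("speakers", [])}
    rows = []
    for line in insights.get("transcript", []):
        name = names.get(line.get("speakerId"), "Unknown speaker")
        for inst in line.get("instances", []):
            rows.append((name, to_seconds(inst["start"]),
                         to_seconds(inst["end"]), line["text"]))
    # Order by start time so the output reads like a conversation.
    return sorted(rows, key=lambda r: r[1])


def sentiment_summary(index: dict) -> dict[str, float]:
    """Map each reported sentiment type to its average score."""
    insights = index["videos"][0]["insights"]
    return {s["sentimentType"]: s["averageScore"]
            for s in insights.get("sentiments", [])}


if __name__ == "__main__":
    for name, start, end, text in speaker_lines(SAMPLE_INDEX):
        print(f"[{start:6.2f}-{end:6.2f}] {name}: {text}")
    print(sentiment_summary(SAMPLE_INDEX))
```

Mapping `speakerId` through the `speakers` list keeps the generic labels in one place, so if speakers are later renamed to known individuals, only that lookup changes.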
For detailed documentation, you can refer to the Azure Video Indexer API documentation and the API reference.
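For orientation in the API reference, the sketch below builds request URLs for two calls that return the data discussed here: the video index (JSON insights) and the captions (e.g. VTT or SRT). It assumes the host `https://api.videoindexer.ai`, the path pattern `/{location}/Accounts/{accountId}/Videos/{videoId}/Index` or `/Captions`, and the query parameters `format`, `includeSpeakers`, and `accessToken`; treat all of these as assumptions to confirm against the API reference. The location, account ID, video ID, and token values are hypothetical placeholders, and obtaining an access token is not shown.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base host of the Video Indexer REST API; confirm in the API reference.
API_BASE = "https://api.videoindexer.ai"


def _video_path(location: str, account_id: str, video_id: str) -> str:
    """Build the /{location}/Accounts/{accountId}/Videos/{videoId} prefix."""
    return (f"{API_BASE}/{quote(location)}/Accounts/{quote(account_id)}"
            f"/Videos/{quote(video_id)}")


def index_url(location: str, account_id: str, video_id: str, token: str) -> str:
    """URL for the video index call (insights JSON: transcript, speakers, ...)."""
    query = urlencode({"accessToken": token})
    return f"{_video_path(location, account_id, video_id)}/Index?{query}"


def captions_url(location: str, account_id: str, video_id: str, token: str,
                 fmt: str = "Vtt", include_speakers: bool = True) -> str:
    """URL for the captions call in a given format (e.g. 'Vtt' or 'Srt')."""
    query = urlencode({
        "format": fmt,
        # Assumed flag for prefixing caption lines with speaker labels.
        "includeSpeakers": str(include_speakers).lower(),
        "accessToken": token,
    })
    return f"{_video_path(location, account_id, video_id)}/Captions?{query}"


def fetch_index(location: str, account_id: str, video_id: str, token: str) -> dict:
    """Download and parse the index JSON (needs network access and a valid token)."""
    with urlopen(index_url(location, account_id, video_id, token)) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes it easy to check the requests against the documentation before spending processed minutes.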

Pavankumar Purilla 11,575 Reputation points Microsoft External Staff Moderator

2025-07-03T07:21:07.76+00:00

Hi Hiten Mehta,
Did you get a chance to check the response? Thank you!
Hiten Mehta 20 Reputation points

2025-07-05T05:55:40.8533333+00:00

Hi Pavan,

I checked it earlier. What I don't understand is that there are only two options: Video only and Audio only. I believe we will need both audio and video; please correct me if I am wrong. The resolution of our videos will be 720p or lower.

Answer 2

Hiten Mehta 20

Hi Pavan,

Can you please guide me on the pricing for the services I have listed?

Pavankumar Purilla 11,575 Reputation points Microsoft External Staff Moderator

2025-07-03T14:00:19.7266667+00:00

Hi Hiten Mehta,
Azure Video Indexer pricing is based on the type of media (audio or video), duration, resolution, and the features enabled (e.g., transcription, sentiment analysis, facial recognition). For video files, pricing depends on the resolution tier; for instance, low (up to 720p), standard HD (up to 1080p), and high (up to 4K) each have different per-minute rates. Audio-only content is typically billed at a lower rate. Features such as transcription, speaker indexing, sentiment detection, face detection, and content moderation are included in the processing cost, with no separate charge for each individual AI skill. You are billed per minute of media processed.

You can find the detailed pricing model on the Azure Video Indexer pricing page, under the Media Services > Video Analyzer section. Please note that actual costs may vary depending on region and usage volume. You can also estimate costs using the Azure Pricing Calculator.
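To make the per-minute model described in this answer concrete, here is a minimal Python sketch that picks a resolution tier from the frame height and multiplies the processed minutes by that tier's rate. The tier boundaries follow the description in this answer (up to 720p, 1080p, 4K); the rate values are placeholders in arbitrary units, not Azure prices, and the tier names are hypothetical. Real figures should come from the pricing page or the Azure Pricing Calculator.

```python
# Resolution tiers as described in this answer: (max frame height in px, tier).
RESOLUTION_TIERS = [(720, "low"), (1080, "standard_hd"), (2160, "high")]

# PLACEHOLDER per-minute rates in arbitrary units. These are NOT Azure prices;
# replace them with figures from the pricing page or Pricing Calculator.
EXAMPLE_RATES = {"audio": 1.0, "low": 2.0, "standard_hd": 3.0, "high": 4.0}


def resolution_tier(height_px: int) -> str:
    """Pick the lowest tier whose upper bound covers the given frame height."""
    for max_height, tier in RESOLUTION_TIERS:
        if height_px <= max_height:
            return tier
    raise ValueError(f"no tier defined for {height_px}p")


def estimate_cost(minutes: float, height_px: int, rates: dict[str, float],
                  audio_only: bool = False) -> float:
    """Per-minute billing: minutes processed times the rate for the media type/tier."""
    key = "audio" if audio_only else resolution_tier(height_px)
    return minutes * rates[key]


if __name__ == "__main__":
    # e.g. 100 minutes of 720p video, priced with the placeholder rates
    print(estimate_cost(100, 720, EXAMPLE_RATES))
```

Under this model, videos at 720p or lower would all fall into the lowest video tier, so only the total processed minutes would change the estimate.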
Hiten Mehta 20 Reputation points

2025-07-03T14:09:29.6366667+00:00

I checked it earlier. What I don't understand is that there are only two options: Video only and Audio only. I believe we will need both audio and video; please correct me if I am wrong. The resolution of our videos will be 720p or lower.
