POST /api/v3/minimax/speech-2.5-hd-preview
speech-2.5-hd-preview (speech 2.5 hd preview)
The MiniMax speech 2.5 HD preview API, in GPTProto request format.
curl -X POST "https://gptproto.com/api/v3/minimax/speech-2.5-hd-preview" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "emotion": "surprised",
  "enable_base64_output": false,
  "enable_sync_mode": false,
  "english_normalization": false,
  "pitch": 0,
  "speed": 0.9,
  "text": "Hi there! Minimax speech 2.5 now live on Gptproto! Let's try it!",
  "voice_id": "Deep_Voice_Man",
  "volume": 1,
  "language_boost": "Chinese"
}'
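
The same request can be built in Python. The following is a minimal sketch using only the standard library; the helper name `build_speech_request` is hypothetical, and the `Authorization` header is sent as the bare key simply because the curl example above does so. The request is built but not sent, so the snippet runs without a key or network access.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://gptproto.com/api/v3/minimax/speech-2.5-hd-preview"


def build_speech_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper, not part of any official SDK.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": api_key,  # bare key, mirroring the curl example
            "Content-Type": "application/json",
        },
    )


payload = {
    "text": "Hi there! Minimax speech 2.5 now live on Gptproto! Let's try it!",
    "voice_id": "Deep_Voice_Man",
    "emotion": "surprised",
    "speed": 0.9,
    "volume": 1,
    "pitch": 0,
    "english_normalization": False,
    "language_boost": "Chinese",
    "enable_base64_output": False,
    "enable_sync_mode": False,
}

request = build_speech_request("YOUR_API_KEY", payload)

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping construction separate from sending makes it easy to inspect the headers and body before spending tokens on a real call.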
Example error response:
{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}
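
A client will usually want to turn this error envelope into an exception. The sketch below assumes only the shape shown above (a top-level `error` object with `message` and `type`); the names `SpeechApiError` and `raise_for_error` are hypothetical, and the shape of a successful response is not documented here, so it is returned untouched.

```python
import json


class SpeechApiError(Exception):
    """Hypothetical exception wrapping the documented error envelope."""

    def __init__(self, message: str, error_type: str):
        super().__init__(f"{error_type}: {message}")
        self.message = message
        self.error_type = error_type


def raise_for_error(body: str) -> dict:
    """Parse a JSON response body; raise if it has a top-level "error" object."""
    data = json.loads(body)
    error = data.get("error") if isinstance(data, dict) else None
    if isinstance(error, dict):
        raise SpeechApiError(
            str(error.get("message", "unknown error")),
            str(error.get("type", "unknown")),
        )
    return data


sample = '{"error": {"message": "Invalid signature", "type": "401"}}'
try:
    raise_for_error(sample)
except SpeechApiError as exc:
    print(exc)  # prints "401: Invalid signature"
```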

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | string | Yes | `Hello world! This is a test of the text-to-speech system.` | Text to convert to speech. Every character counts as 1 token. Maximum 10000 characters. Use `<#x#>` between words to control pause duration, where `x` is the pause in seconds (0.01-99.99). |
| `voice_id` | string | Yes | `Wise_Woman` | Desired voice ID. Either one of the system voice IDs (`Wise_Woman`, `Friendly_Person`, `Inspirational_girl`, `Deep_Voice_Man`, `Calm_Woman`, `Casual_Guy`, `Lively_Girl`, `Patient_Man`, `Young_Knight`, `Determined_Man`, `Lovely_Girl`, `Decent_Boy`, `Imposing_Manner`, `Elegant_Man`, `Abbess`, `Sweet_Girl_2`, `Exuberant_Girl`) or a voice ID you have trained. |
| `speed` | number | No | `1` | Speech speed. Range: 0.5 to 2.0, where 1.0 is normal speed. |
| `volume` | number | No | `1` | Speech volume. Range: 0.1 to 10.0, where 1.0 is normal volume. |
| `pitch` | integer | No | `0` | Speech pitch. Range: -12 to 12, where 0 is normal pitch. |
| `emotion` | string | No | `happy` | Emotion of the generated speech. One of `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `neutral`. |
| `english_normalization` | boolean | No | `false` | Enables English text normalization, which improves how numbers are read aloud. |
| `sample_rate` | integer | No | `8000` | Sample rate of the generated audio. One of `8000`, `16000`, `22050`, `24000`, `32000`, `44100`. |
| `bitrate` | integer | No | `32000` | Bitrate of the generated audio. One of `32000`, `64000`, `128000`, `256000`. |
| `channel` | string | No | `1` | Number of channels in the generated audio. `1`: mono, `2`: stereo. |
| `format` | string | No | `mp3` | Format of the generated audio. One of `mp3`, `wav`, `pcm`, `flac`. |
| `language_boost` | string | No | `auto` | Improves recognition of the specified language or dialect. One of `Chinese`, `Chinese,Yue`, `English`, `Arabic`, `Russian`, `Spanish`, `French`, `Portuguese`, `German`, `Turkish`, `Dutch`, `Ukrainian`, `Vietnamese`, `Indonesian`, `Japanese`, `Italian`, `Korean`, `Thai`, `Polish`, `Romanian`, `Greek`, `Czech`, `Finnish`, `Hindi`, `auto`. |
| `enable_base64_output` | boolean | No | `false` | If enabled, the output is returned as a Base64-encoded string instead of a URL. Only available through the API. |
| `enable_sync_mode` | boolean | No | `false` | If `true`, the call waits until the result has been generated and uploaded, then returns it directly in the response. Only available through the API. |
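
The ranges and allowed values in the table can be checked on the client before a request is sent. The following is a rough sketch, not an official validator: `validate_payload` is a hypothetical helper, it covers only the constraints stated above, and it does not check `voice_id` against the system list because trained voice IDs are also accepted.

```python
# Allowed values and limits copied from the parameter table.
EMOTIONS = ("happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral")
SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100)
BITRATES = (32000, 64000, 128000, 256000)
FORMATS = ("mp3", "wav", "pcm", "flac")
MAX_TEXT_CHARS = 10000


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_payload(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    text = payload.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")

    voice_id = payload.get("voice_id")
    if not isinstance(voice_id, str) or not voice_id:
        problems.append("voice_id is required")

    # Optional numeric parameters with inclusive ranges.
    for name, low, high in (("speed", 0.5, 2.0), ("volume", 0.1, 10.0)):
        if name in payload:
            value = payload[name]
            if not (_is_number(value) and low <= value <= high):
                problems.append(f"{name} must be a number between {low} and {high}")

    if "pitch" in payload:
        pitch = payload["pitch"]
        is_int = isinstance(pitch, int) and not isinstance(pitch, bool)
        if not (is_int and -12 <= pitch <= 12):
            problems.append("pitch must be an integer between -12 and 12")

    # Optional parameters restricted to a fixed set of values.
    enums = (
        ("emotion", EMOTIONS),
        ("sample_rate", SAMPLE_RATES),
        ("bitrate", BITRATES),
        ("format", FORMATS),
    )
    for name, allowed in enums:
        if name in payload and payload[name] not in allowed:
            problems.append(f"{name} must be one of {list(allowed)}")

    return problems


ok = validate_payload({"text": "Hello", "voice_id": "Wise_Woman", "speed": 0.9})
bad = validate_payload({"text": "Hello", "speed": 3, "emotion": "bored"})
print(ok)        # an empty list: nothing to report
print(len(bad))  # three problems: missing voice_id, speed out of range, unknown emotion
```

Returning a list of problems rather than raising on the first one lets a caller report everything wrong with a payload in a single pass.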