speech-2.5-hd-preview-voice-clone (voice to voice) - GPTProto API Documentation

POST /api/v3/minimax/voice-clone

speech-2.5-hd-preview-voice-clone (voice to voice)

GPTProto request format for the Minimax voice-to-voice (voice clone) API.

curl -X POST "https://gptproto.com/api/v3/minimax/voice-clone" \
  -H "Authorization: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "accuracy": 0.7,
  "audio": "https://d1q70pf5vjeyhc.cloudfront.net/media/92d2d4ca66f84793adcb20742b15d262/audios/1752727142784562094_VRSOK53Y.mp3",
  "custom_voice_id": "Alice-jiuhao-2",
  "model": "speech-2.5-hd-preview-voice-clone",
  "need_noise_reduction": false,
  "need_volume_normalization": false,
  "text": "This is a preview of your cloned voice. I hope you enjoy it!"
}'

Example error response:

{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}
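
Failures come back as a JSON body with an `error` object, as in the sample above, so a caller may want to pull out the message before deciding what to do. The following is a minimal shell sketch, assuming the body has the shape shown above. The function name `extract_error_message` is hypothetical, and the `sed` pattern is a rough text match rather than a real JSON parser; a tool such as `jq` would be more robust where it is available.

```shell
# Hypothetical helper: prints the "message" field of an error body shaped like
# {"error": {"message": "...", "type": "..."}}; prints nothing if no such field is found.
# This is a rough text match, not a JSON parser: it does not handle escaped quotes.
extract_error_message() {
  printf '%s' "$1" |
    tr -d '\n' |
    sed -n 's/.*"error"[[:space:]]*:[[:space:]]*{.*"message"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample body copied from the error response above.
body='{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}'

msg=$(extract_error_message "$body")
if [ -n "$msg" ]; then
  echo "Request failed: $msg"
fi
```

In a real script, `body` would be the output captured from the `curl` call rather than a literal string.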

Parameters

Parameter	Type	Required	Default	Description
`audio`	string	❌ No	-	Audio file to clone the voice from. Supported formats include MP3, M4A, and WAV.
`custom_voice_id`	string	✅ Yes	-	User-defined voice ID. Must be at least 8 characters, contain both letters and numbers, and start with a letter (e.g., gptproto0001). A duplicate voice ID returns an error.
`model`	string	✅ Yes	`speech-2.5-hd-preview-voice-clone`	TTS model used to generate the preview. It applies only to the preview produced after cloning; once the cloned voice has been generated, any Minimax Turbo or HD voice model can be used for inference.
`need_noise_reduction`	boolean	❌ No	`false`	Whether to enable noise reduction. Defaults to false (no noise reduction).
`need_volume_normalization`	boolean	❌ No	`false`	Whether to enable volume normalization. Defaults to false if not provided.
`accuracy`	number	❌ No	`0.7`	Text validation accuracy threshold, in the range [0, 1]. Defaults to 0.7 if not provided.
`text`	string	❌ No	`This is a preview of your cloned voice. I hope you enjoy it!`	Text for the audio preview. Limited to 2000 characters.
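
Several of these constraints can be checked on the client before sending a request, which avoids a round trip for obviously invalid input. The sketch below is one possible way to do that in POSIX shell. The function names are hypothetical and not part of the API. The table does not say which characters beyond letters and numbers are allowed in `custom_voice_id` (the sample request uses hyphens), so only the three stated rules are checked.

```shell
# Hypothetical helpers that mirror the documented constraints; each returns 0 when
# the value passes and 1 otherwise. They do not call the API.

# custom_voice_id: at least 8 characters, starts with a letter, contains a digit.
is_valid_voice_id() {
  id=$1
  [ "${#id}" -ge 8 ] || return 1
  case $id in
    [A-Za-z]*) ;;          # first character is a letter
    *) return 1 ;;
  esac
  case $id in
    *[0-9]*) return 0 ;;   # a letter is guaranteed above, so only a digit is left to check
    *) return 1 ;;
  esac
}

# accuracy: a plain decimal number within [0, 1].
is_valid_accuracy() {
  awk -v a="$1" 'BEGIN { exit !(a ~ /^[0-9]+(\.[0-9]+)?$/ && a + 0 >= 0 && a + 0 <= 1) }'
}

# text: at most 2000 characters (${#...} may count bytes in some shells).
is_valid_text() {
  [ "${#1}" -le 2000 ]
}

is_valid_voice_id "gptproto0001" && echo "voice id ok"
is_valid_accuracy "0.7" && echo "accuracy ok"
```

These checks are only a convenience. The server remains the authority on what it accepts, for example on whether a voice ID is already taken.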

Response
