POST /api/v3/kwaivgi/kling-lipsync/text-to-video
Text to Speech Lip Sync
curl --request POST \
  --url https://gptproto.com/api/v3/kwaivgi/kling-lipsync/text-to-video \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '{
  "video": "<string>",
  "text": "<string>",
  "voice_id": "<string>",
  "voice_language": "<string>",
  "voice_speed": 123
}'
{
  "error": {
    "message": "Invalid signature",
    "type": "401"
  }
}

Overview

Kling TextToVideo by Kwaivgi generates videos whose lip movements are synchronized to speech synthesized from the input text, producing natural-looking speaking visuals. It is offered as a ready-to-use REST inference API with no cold starts and affordable pricing.

Authentication

This endpoint requires authentication using a Bearer token.
Authorization
string
default:"sk-***********"
required
Your API key, sent as the value of the Authorization header (shown as YOUR_API_KEY in the request example).

Request Body

video
string
default:"https://example.com/video.mp4"
required
The URL of the video file for generating synchronized lip movements. Requirements:
  • Supported formats: .mp4, .mov
  • File size: Max 100MB
  • Duration: 2s - 10s
  • Resolution: 720p or 1080p only
  • Dimensions: Both width and height must be between 720px and 1920px
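The video requirements above can be checked client-side before submitting a task. The following is a minimal sketch, not part of the API: `VideoInfo` and `check_video` are hypothetical names, the metadata is assumed to have been probed already (for example with a tool such as ffprobe), 1MB is assumed to mean 1024 * 1024 bytes, and "720p or 1080p" is assumed to refer to the shorter side of the frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoInfo:
    """Hypothetical container for metadata you have already probed."""
    filename: str
    size_bytes: int
    duration_s: float
    width: int
    height: int


def check_video(info: VideoInfo) -> List[str]:
    """Return a list of violated requirements (empty if the video looks acceptable)."""
    problems: List[str] = []
    if not info.filename.lower().endswith((".mp4", ".mov")):
        problems.append("format must be .mp4 or .mov")
    # Assumption: "100MB" means 100 * 1024 * 1024 bytes.
    if info.size_bytes > 100 * 1024 * 1024:
        problems.append("file size must not exceed 100MB")
    if not 2.0 <= info.duration_s <= 10.0:
        problems.append("duration must be 2s - 10s")
    # Assumption: 720p / 1080p is judged by the shorter side of the frame.
    if min(info.width, info.height) not in (720, 1080):
        problems.append("resolution must be 720p or 1080p")
    for name, value in (("width", info.width), ("height", info.height)):
        if not 720 <= value <= 1920:
            problems.append(f"{name} must be between 720px and 1920px")
    return problems
```

The server remains the authority on these limits; a local check like this only avoids submitting a task that is likely to be rejected.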
text
string
required
Text content for lip-sync video generation. This text is converted to speech and synchronized with the lip movements in the video. Limit: maximum 120 characters.
voice_id
string
default:"genshin_klee2"
required
Voice ID to use for speech synthesis. Different voice IDs provide different character voices and speaking styles. Available Voice IDs:
  • genshin_klee2 - Genshin Klee character voice
  • genshin_vindi2 - Genshin Vindi character voice
  • genshin_kirara - Genshin Kirara character voice
  • zhinen_xuesheng - Student voice
  • AOT - Attack on Titan style
  • ai_shatang - Sweet voice
  • ai_kaiya - Kaiya voice
  • oversea_male1 - Overseas male voice
  • ai_chenjiahao_712 - Chen Jiahao voice
  • girlfriend_4_speech02 - Girlfriend voice 4
  • chat1_female_new-3 - Female chat voice
  • chat_0407_5-1 - Chat voice variant
  • cartoon-boy-07 - Cartoon boy voice
  • uk_boy1 - UK boy voice
  • cartoon-girl-01 - Cartoon girl voice
  • PeppaPig_platform - Peppa Pig style voice
  • ai_huangzhong_712 - Huang Zhong voice
  • ai_huangyaoshi_712 - Huang Yaoshi voice
  • ai_laoguowang_712 - Lao Guo Wang voice
  • chengshu_jiejie - Mature sister voice
  • you_pingjing - Calm voice
  • calm_story1 - Calm storytelling voice
  • uk_man2 - UK man voice
  • laopopo_speech02 - Grandmother voice
  • heainainai_speech02 - Grandma voice
voice_language
string
default:"en"
The voice language corresponding to the Voice ID. Supported Languages:
  • zh - Chinese
  • en - English
voice_speed
number
default:"1"
Speech rate for text-to-video generation. Controls how fast the generated speech is. Range: 0.8 - 2.0
  • Default: 1.0 (normal speed)
  • Values > 1.0 increase speed
  • Values < 1.0 decrease speed
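The parameter limits listed in this section can be enforced while assembling the request body. The sketch below is illustrative only: `build_payload` is a hypothetical helper, and it assumes the 120-character limit counts Python string characters (how the service counts characters is not stated here).

```python
from typing import Any, Dict

SUPPORTED_LANGUAGES = {"zh", "en"}  # languages listed for voice_language


def build_payload(
    video: str,
    text: str,
    voice_id: str,
    voice_language: str = "en",
    voice_speed: float = 1.0,
) -> Dict[str, Any]:
    """Validate the documented limits and return the JSON request body as a dict."""
    if not video:
        raise ValueError("video URL is required")
    # Assumption: the limit is measured in Python characters (code points).
    if not text or len(text) > 120:
        raise ValueError("text must contain 1 to 120 characters")
    if not voice_id:
        raise ValueError("voice_id is required")
    if voice_language not in SUPPORTED_LANGUAGES:
        raise ValueError("voice_language must be 'zh' or 'en'")
    if not 0.8 <= voice_speed <= 2.0:
        raise ValueError("voice_speed must be between 0.8 and 2.0")
    return {
        "video": video,
        "text": text,
        "voice_id": voice_id,
        "voice_language": voice_language,
        "voice_speed": voice_speed,
    }
```

For example, `build_payload("https://example.com/video.mp4", "Hello there", "genshin_klee2", voice_speed=1.3)` returns a dict that can be serialized with `json.dumps` and sent as the request body.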

Request Example

curl --location 'https://gptproto.com/api/v3/kwaivgi/kling-lipsync/text-to-video' \
--header 'Authorization: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data '{
  "text": "Kling lipsync on GptprotoAI is an AI-powered model. Try it now!",
  "video": "https://d1q70pf5vjeyhc.cloudfront.net/predictions/b82002c0695a48ccb6e08d23602402ed/1.mp4",
  "voice_id": "genshin_klee2",
  "voice_language": "en",
  "voice_speed": 1.3
}'
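The same request can be issued from Python using only the standard library. This is a sketch mirroring the curl call above, with hypothetical helper names `make_request` and `submit`. It sends the API key as the raw Authorization header value, exactly as the curl example does; whether a `Bearer ` prefix is also accepted or required is not shown by that example, so adjust the header if your key needs it.

```python
import json
import urllib.request

API_URL = "https://gptproto.com/api/v3/kwaivgi/kling-lipsync/text-to-video"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the lip-sync endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the key is sent as-is, as in the curl example.
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = make_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without making a network call.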

Response

data.id
string
Unique identifier for the prediction (the task ID)
data.status
string
Status of the task: created, processing, completed, or failed
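A client typically needs to tell a successful submission apart from an error body. The sketch below assumes that `data.id` and `data.status` denote fields nested under a top-level `data` object, and that errors arrive as an `error` object with `message` and `type`, as in the 401 example on this page. `ApiError` and `parse_submit_response` are hypothetical names, and the task ID in the usage note is made up.

```python
from typing import Tuple


class ApiError(Exception):
    """Raised when the response body contains an 'error' object."""


def parse_submit_response(body: dict) -> Tuple[str, str]:
    """Return (task_id, status) or raise ApiError for an error body."""
    if "error" in body:
        error = body["error"]
        raise ApiError(f"{error.get('type')}: {error.get('message')}")
    # Assumption: data.id / data.status are nested under a "data" object.
    data = body["data"]
    return data["id"], data["status"]
```

For instance, `parse_submit_response({"data": {"id": "task-123", "status": "created"}})` returns `("task-123", "created")`, while the 401 body raises `ApiError`.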

Usage Notes

  • Face Requirements: The video should contain a clear, visible face for optimal lip synchronization. The model creates unique movement trajectories based on facial features
  • Natural Lip Movements: The AI generates naturally matched lip movements that synchronize precisely with the generated audio
  • Video Integrity: Areas outside the face remain consistent with the original video, ensuring visual integrity and continuity
  • Voice Models: Different voice_id values provide different character voices and speaking styles (e.g., “genshin_klee2” for anime-style voices)
  • Language Support: Use the voice_language parameter to specify the language for speech generation (e.g., “en” for English, “zh” for Chinese)
  • Speed Control: The voice_speed parameter allows you to control speech rate. Default is 1.0 (normal speed), values > 1.0 increase speed, values < 1.0 decrease speed
  • Processing Time: Processing duration varies based on text length, video length, and complexity
  • Query Results: Use the task ID returned in the response to query the generation status via the Query Task endpoint
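Because generation is asynchronous, a client usually polls until the task reaches a terminal status. The Query Task endpoint is not described on this page, so the sketch below takes a caller-supplied `fetch_status` function (hypothetical) that returns one of the documented statuses: created, processing, completed, or failed. The polling interval and attempt limit are arbitrary illustrative defaults.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], str],
    interval_s: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status(task_id) until the task is completed or failed.

    fetch_status is supplied by the caller and should wrap the Query Task
    endpoint; its URL and response shape are not assumed here.
    """
    for _ in range(max_attempts):
        status = fetch_status(task_id)
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub.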