Overview
Kling TextToVideo by Kwaivgi is a lip-sync model: it converts input text to speech and generates lifelike lip movements in the source video that sync precisely to that speech, for natural speaking visuals. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Authentication
This endpoint requires authentication using a Bearer token.
Your API key in the format:
YOUR_API_KEY
Request Body
The URL of the video file for generating synchronized lip movements.
Requirements:
- Supported formats: .mp4, .mov
- File size: Max 100MB
- Duration: 2s - 10s
- Resolution: 720p or 1080p only
- Dimensions: Both width and height must be between 720px and 1920px
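The requirements above can be checked locally before submitting a video. The following is a minimal sketch, assuming the file's metadata (size, duration, dimensions) has already been read by some other tool; the function name `check_video` is hypothetical, "100MB" is read as 100 * 1024 * 1024 bytes, and the 720p/1080p rule is only approximated through the width and height bounds.

```python
import os

ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MAX_SIZE_BYTES = 100 * 1024 * 1024  # assumption: "100MB" interpreted as 100 MiB
MIN_DURATION_S, MAX_DURATION_S = 2.0, 10.0
MIN_SIDE_PX, MAX_SIDE_PX = 720, 1920


def check_video(filename, size_bytes, duration_s, width, height):
    """Return a list of problems; an empty list means the metadata passes."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file larger than 100MB")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        problems.append("duration outside 2s - 10s")
    # Both sides must fall inside the documented pixel range.
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(f"{name} outside 720px - 1920px")
    return problems
```

For example, `check_video("clip.mp4", 5_000_000, 6.0, 1280, 720)` returns an empty list, while a `.avi` file is reported as an unsupported format.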
Text content for lip-sync video generation. This text will be converted to speech and synchronized with lip movements in the video.
Limit: Maximum 120 characters
Voice ID (voice_id) to use for speech synthesis. Different voice IDs provide different character voices and speaking styles.
Available Voice IDs:
- genshin_klee2: Genshin Klee character voice
- genshin_vindi2: Genshin Vindi character voice
- genshin_kirara: Genshin Kirara character voice
- zhinen_xuesheng: Student voice
- AOT: Attack on Titan style
- ai_shatang: Sweet voice
- ai_kaiya: Kaiya voice
- oversea_male1: Overseas male voice
- ai_chenjiahao_712: Chen Jiahao voice
- girlfriend_4_speech02: Girlfriend voice 4
- chat1_female_new-3: Female chat voice
- chat_0407_5-1: Chat voice variant
- cartoon-boy-07: Cartoon boy voice
- uk_boy1: UK boy voice
- cartoon-girl-01: Cartoon girl voice
- PeppaPig_platform: Peppa Pig style voice
- ai_huangzhong_712: Huang Zhong voice
- ai_huangyaoshi_712: Huang Yaoshi voice
- ai_laoguowang_712: Lao Guo Wang voice
- chengshu_jiejie: Mature sister voice
- you_pingjing: Calm voice
- calm_story1: Calm storytelling voice
- uk_man2: UK man voice
- laopopo_speech02: Grandmother voice
- heainainai_speech02: Grandma voice
The voice language (voice_language) corresponding to the Voice ID.
Supported Languages:
- zh: Chinese
- en: English
Speech rate (voice_speed) for text to video generation. Controls how fast the generated speech should be.
- Range: 0.8 - 2.0
- Default: 1.0 (normal speed)
- Values > 1.0 increase speed
- Values < 1.0 decrease speed
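The limits on the request body parameters can be enforced before a request is sent. This is a minimal sketch under stated assumptions: `build_payload` is a hypothetical helper, and the field names `video` and `text` are assumptions, since only `voice_id`, `voice_language`, and `voice_speed` are named on this page.

```python
MAX_TEXT_CHARS = 120
SUPPORTED_LANGUAGES = {"zh", "en"}
MIN_SPEED, MAX_SPEED = 0.8, 2.0


def build_payload(video_url, text, voice_id, voice_language, voice_speed=1.0):
    """Validate the documented limits and return a request body as a dict."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    if voice_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported voice_language: {voice_language!r}")
    if not MIN_SPEED <= voice_speed <= MAX_SPEED:
        raise ValueError(f"voice_speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "video": video_url,  # assumed field name, not confirmed by this page
        "text": text,        # assumed field name, not confirmed by this page
        "voice_id": voice_id,
        "voice_language": voice_language,
        "voice_speed": voice_speed,
    }
```

Omitting `voice_speed` falls back to the documented default of 1.0. The sketch does not check that the chosen `voice_id` matches the `voice_language`, because this page does not list which language each voice belongs to.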
Request Example
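A minimal sketch of such a request in Python, using only the standard library. The URL below is a placeholder and not the real endpoint, the body field names `video` and `text` are assumptions, and the key is assumed to be sent in the standard `Authorization: Bearer` header. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the endpoint URL from your provider.
API_URL = "https://api.example.com/text-to-video"


def build_request(api_key, payload, url=API_URL):
    """Build (but do not send) a JSON POST request with Bearer authentication."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_request(
    "YOUR_API_KEY",
    {
        "video": "https://example.com/clip.mp4",  # assumed field name
        "text": "Hello, nice to meet you.",       # assumed field name
        "voice_id": "uk_man2",
        "voice_language": "en",
        "voice_speed": 1.0,
    },
)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```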
Response
Unique identifier for the prediction (the task ID).
Status of the task: created, processing, completed, or failed.
Usage Notes
- Face Requirements: The video should contain a clear, visible face for optimal lip synchronization. The model creates unique movement trajectories based on facial features
- Natural Lip Movements: The AI generates naturally matched lip movements that synchronize precisely with the generated audio
- Video Integrity: Areas outside the face remain consistent with the original video, ensuring visual integrity and continuity
- Voice Models: Different voice_id values provide different character voices and speaking styles (e.g., "genshin_klee2" for anime-style voices)
- Language Support: Use the voice_language parameter to specify the language for speech generation (e.g., "en" for English, "zh" for Chinese)
- Speed Control: The voice_speed parameter allows you to control speech rate. Default is 1.0 (normal speed), values > 1.0 increase speed, values < 1.0 decrease speed
- Processing Time: Processing duration varies based on text length, video length, and complexity
- Query Results: Use the task ID returned in the response to query the generation status via the Query Task endpoint
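Querying for results usually means polling until the task reaches one of the final statuses listed under Response. The sketch below shows one way to do that. It assumes a caller-supplied `fetch_status` function (hypothetical) that calls the Query Task endpoint and returns the status string; that endpoint's URL and response shape are not described on this page, so they are left out.

```python
import time

# "completed" and "failed" are the final statuses; "created" and
# "processing" mean the task is still in progress.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(task_id, fetch_status, interval_s=2.0, max_attempts=60, sleep=time.sleep):
    """Poll fetch_status(task_id) until a terminal status or the attempt limit."""
    for attempt in range(max_attempts):
        status = fetch_status(task_id)
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next query
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} attempts")
```

Because processing time varies with text length, video length, and complexity, `interval_s` and `max_attempts` are arbitrary starting values and should be tuned to the workload.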

