AudioGPT

Task Name	Prompt	Inputs	Outputs
Text-To-Speech	Generate a speech with text "here we go".	/
Style Transfer	Speak using the voice of this audio. The text is "Here we go".
Speech Recognition	Transcribe this speech.		Here we go.
Speech Enhancement	Enhance the quality of the speech signal.
Speech Separation	Separate each speech from the speech mixture.
Mono-to-Binaural	Transfer this mono audio into binaural audio.

Task Name	Prompt	Inputs	Outputs
Text-To-Sing	Please generate a piece of singing voice. Text sequence is 小酒窝长睫毛AP是你最美的记号. Note sequence is C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | F4 | C#4/Db4. Note duration sequence is 0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | 0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 | 0.340550 | 0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340.	/
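
The Text-To-Sing prompt carries three parallel sequences: lyrics, notes, and note durations. The Python sketch below is not part of AudioGPT; it only illustrates one way to check that the three sequences line up before sending such a prompt. It rests on assumptions that the page does not state: each `|`-delimited group belongs to one lyric token, `AP` is a single non-lexical token (paired here with `rest`), and durations are in seconds. The helper names `split_groups` and `align` are hypothetical.

```python
import re

# Sequences copied from the Text-To-Sing prompt above.
LYRICS = "小酒窝长睫毛AP是你最美的记号"
NOTES = (
    "C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | "
    "C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | "
    "F4 | C#4/Db4"
)
DURATIONS = (
    "0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | 0.315400 0.235020 | "
    "0.361660 | 0.223070 | 0.377270 | 0.340550 | 0.299620 | 0.344510 | "
    "0.283770 | 0.323390 | 0.360340"
)


def split_groups(raw):
    """Split a '|'-delimited sequence into groups of whitespace-separated items."""
    # Tolerate backslash-escaped pipes ("\|"), which appear in some renderings.
    return [group.split() for group in raw.replace("\\", "").split("|")]


def align(lyrics, notes, durations):
    """Pair each lyric token with its note group and duration group."""
    # Assumption: "AP" is one token; every other CJK character is one syllable.
    tokens = re.findall(r"AP|[\u4e00-\u9fff]", lyrics)
    note_groups = split_groups(notes)
    dur_groups = [[float(d) for d in g] for g in split_groups(durations)]

    if not (len(tokens) == len(note_groups) == len(dur_groups)):
        raise ValueError("lyrics, notes and durations differ in group count")
    for token, n, d in zip(tokens, note_groups, dur_groups):
        if len(n) != len(d):
            raise ValueError(f"{token}: {len(n)} notes but {len(d)} durations")
    return list(zip(tokens, note_groups, dur_groups))


aligned = align(LYRICS, NOTES, DURATIONS)
# Total length of the phrase, assuming the durations are seconds.
total_seconds = sum(sum(d) for _, _, d in aligned)

for token, note_group, dur_group in aligned:
    print(token, note_group, dur_group)
print(f"total: {total_seconds:.3f}")
```

Under these assumptions, a token sung over several notes (the fourth and fifth groups in this prompt) simply gets a longer note group, and a mismatch between the sequences raises an error instead of being passed on silently.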

Task Name	Prompt	Inputs	Outputs
Text-To-Audio	Generate an audio of a piano playing.	/
Audio Inpainting	I want to inpaint this audio.
Image-To-Audio	Generate an audio of this image.
Sound Detection	What events does this audio include?
Target Sound Detection	Please help me detect the target sound in the audio based on description: "I want to detect thunder event".		The thunder happened in this audio from 0.0 to 9.984 seconds.
Sound Extraction	Extract the thunder event from this audio.
Audio-To-Text	Give me the description of this audio.		The audio is recording of a goat bleating nearby several times.

Task Name	Prompt	Inputs	Outputs
Talking Head Synthesis	Generate a talking human portrait video.