FunASR
FunASR
Android deployment of Alibaba's FunASR speech recognition
Deploying modelscope-funasr locally on Android (FunASR)
funasr-android
UniASR speech recognition - Min Nan (Hokkien) - general - 16k
sherpa-onnx
Releases/asr-models
Releases/tts-models
speech-enhancement-models
speaker-recongition-models
speaker-segmentation-models
Text-to-speech (TTS)
Automatic Speech Recognition
Generate subtitles for videos
Speaker diarization
ModelScope csukuangfj
VITS
facebook/mms-tts-nan
csukuangfj/vits-mms-nan
TSukiLen/whisper-medium-chinese-tw-minnan
TSukiLen/whisper-small-chinese-tw-minnan
Generate audio
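import torch
from transformers import VitsTokenizer, VitsModel, set_seed

# Load the Min Nan (nan) MMS-TTS tokenizer and VITS model
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-nan")
model = VitsModel.from_pretrained("facebook/mms-tts-nan")

inputs = tokenizer(text="wo shi shui", return_tensors="pt")

# VITS is non-deterministic; uncomment to get reproducible output
# set_seed(555)

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**inputs)

# outputs.waveform has shape (batch, samples); take the first item
waveform = outputs.waveform[0]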
Save
import torchaudio
# Save as a WAV file (the model's sampling rate, 16 kHz by default)
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
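If torchaudio is not available, the same waveform can probably be written with Python's standard-library `wave` module instead. The following is only a minimal sketch under stated assumptions: the helper names `write_wav_16bit` and `wav_info` are hypothetical (they are not part of transformers or torchaudio), the waveform is assumed to be mono float samples in the range [-1.0, 1.0], and the output is 16-bit PCM.

```python
import math
import struct
import wave


def write_wav_16bit(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of any library.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate
```

With the variables from the snippet above, a call might look like `write_wav_16bit("output.wav", waveform.tolist(), model.config.sampling_rate)`, and `wav_info("output.wav")` can then be used to check the sampling rate and duration of the saved file.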