CHATTERBOX DOES NOT WORK WITH RTX 50XX CARDS DUE TO ITS ARCHITECTURE.
CHIM comes with two high-quality local TTS services: CHIM XTTS and Chatterbox.
Both use the same port (8020), so only one can be enabled and running at a time.
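If you are not sure whether one of the two services is already running, a quick check of the shared port can help. The following is a minimal sketch, not part of CHIM; the function name `port_in_use` and the host `127.0.0.1` are assumptions chosen for illustration.

```python
import socket


def port_in_use(port: int = 8020, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "in use" if port_in_use() else "free"
    print(f"Port 8020 is {state}")
```

If the port is reported as in use, stop the running TTS service before starting the other one.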
Choosing Between XTTS and Chatterbox
| Feature | CHIM XTTS | Chatterbox |
|---|---|---|
| Voice Quality | Mid-High | High |
| VRAM Usage | ~4GB | ~4GB |
| Voice Generation | Yes | Yes |
| Auto Voice Generation | Yes | Yes |
| HuggingFace Account | No | Yes |
| Requirements | NVIDIA GPU + CUDA | NVIDIA GPU + CUDA |
Important: You need to set up a HuggingFace account to download the Chatterbox model. It's 100% free and can be done here: https://huggingface.co/settings/tokens
Chatterbox also requires an audio sample that is at least 5 seconds long in order to work.
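To check a sample before uploading it, you can measure its length with Python's standard `wave` module. This is only a sketch: the helper names are made up for illustration, and the `wave` module reads plain PCM .wav files only, so other encodings may raise an error.

```python
import wave

MIN_SECONDS = 5.0  # minimum sample length stated above


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path: str, minimum: float = MIN_SECONDS) -> bool:
    """Return True if the sample meets the minimum length."""
    return wav_duration_seconds(path) >= minimum
```

For example, `is_long_enough("my_voice.wav")` (a hypothetical file name) returns `False` for a 3-second clip.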
How It Works
Both XTTS and Chatterbox work the same way:
You can still override voices manually; the automatic in-game generation won't overwrite them.
If you set everything up correctly, you will be able to talk to ANY NPC (with an in-game voice) without any prior setup!
You can also manually upload .wav files for generation under Configuration → XTTS/Chatterbox Management.
Installing CHIM XTTS
Finetune XTTS (Advanced Settings) Credit: @ErikErix
Go to \\wsl.localhost\DwemerAI4Skyrim3\home\dwemer\xtts-api-server\xtts_api_server and open tts_funcs.py with Notepad++. You will find these default settings there:
default_tts_settings = {
    "temperature": 0.75,
    "length_penalty": 1.0,
    "repetition_penalty": 5.0,
    "top_k": 50,
    "top_p": 0.85,
    "speed": 1,
    "enable_text_splitting": True
}
After that, it is up to your creativity. 🙂 To apply your changes, you need to save the file and restart the server every time.
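Before editing tts_funcs.py, it can help to try out a set of values and catch typos in the key names. The sketch below is not part of xtts-api-server: `with_overrides` is a hypothetical helper, and the range checks are assumptions based on how sampling parameters usually behave, not limits taken from the server code.

```python
# Copy of the defaults shown above
DEFAULT_TTS_SETTINGS = {
    "temperature": 0.75,
    "length_penalty": 1.0,
    "repetition_penalty": 5.0,
    "top_k": 50,
    "top_p": 0.85,
    "speed": 1,
    "enable_text_splitting": True,
}


def with_overrides(overrides: dict) -> dict:
    """Return the defaults with the given overrides applied (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_TTS_SETTINGS)
    if unknown:
        raise KeyError(f"Unknown setting(s): {sorted(unknown)}")
    settings = {**DEFAULT_TTS_SETTINGS, **overrides}
    # Assumed sanity limits, not taken from the server itself
    if settings["temperature"] <= 0:
        raise ValueError("temperature must be greater than 0")
    if not 0 < settings["top_p"] <= 1:
        raise ValueError("top_p must be in the range (0, 1]")
    if settings["top_k"] < 1:
        raise ValueError("top_k must be at least 1")
    return settings


if __name__ == "__main__":
    print(with_overrides({"temperature": 0.6, "speed": 1.1}))
```

Once the values look right, copy them into default_tts_settings by hand and restart the server.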
Installing Chatterbox
Switching Between Services
To switch from XTTS to Chatterbox (or vice versa):
Here is a guide if you want to run CHIM XTTS on the cloud to save VRAM.
Chatterbox doesn't work on RTX 50xx GPUs due to PyTorch compatibility issues. Additionally, the tts_to_audio endpoint returns 500 errors due to a missing TorchCodec/FFmpeg dependency.
Step 1: Activate the Chatterbox venv
cd /home/dwemer/chatterbox
source venv/bin/activate
Step 2: Install Chatterbox dependencies
pip install -e .
(This will install chatterbox-tts and its pinned dependencies)
Step 3: Replace PyTorch with 50xx-compatible nightly build
pip uninstall -y torch torchvision torchaudio
pip install --no-cache-dir --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
You'll see dependency conflict warnings — ignore them, it works fine.
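To confirm that the nightly build actually sees your card, you can run a short check inside the activated venv. This is a sketch using standard PyTorch calls; the function name `torch_gpu_report` is made up for illustration. RTX 50xx cards are expected to report compute capability (12, 0), with sm_120 in the supported architecture list, but treat that as an assumption to verify on your own hardware.

```python
def torch_gpu_report() -> dict:
    """Collect basic facts about the installed torch build and the visible GPU."""
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        # Architectures this torch build was compiled for
        report["supported_archs"] = torch.cuda.get_arch_list()
    return report


if __name__ == "__main__":
    for key, value in torch_gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False, or your card's architecture is missing from `supported_archs`, the nightly install most likely did not replace the pinned torch version.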
Step 4: Install scipy
pip install scipy
Step 5: Fix the audio save error
The nightly torch breaks torchaudio's save function (requires TorchCodec/FFmpeg).
Fix by editing restapi.py:
nano /home/dwemer/chatterbox/restapi.py
Find this line in the tts_to_audio function:
ta.save(buffer, wav, model.sr, format="wav")
Replace it with:
import scipy.io.wavfile
scipy.io.wavfile.write(buffer, model.sr, wav.squeeze(0).cpu().numpy())
Save and exit (Ctrl+O, Ctrl+X).
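Or use sed (GNU sed; this version keeps the line's existing indentation for both new lines):
sed -i -E 's|^([[:space:]]*)ta\.save\(buffer, wav, model\.sr, format="wav"\)|\1import scipy.io.wavfile\n\1scipy.io.wavfile.write(buffer, model.sr, wav.squeeze(0).cpu().numpy())|' /home/dwemer/chatterbox/restapi.py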
Step 6: Log in to HuggingFace (first time only)
huggingface-cli login
Paste your token from https://huggingface.co/settings/tokens
Step 7: Test
python restapi.py
Server should start on http://0.0.0.0:8020. Model weights download on first run.
Notes:
- The pip install -e . step will downgrade torch. That's why the torch nightly install comes AFTER — it overwrites the downgrade.
- Dependency conflict warnings about torch==2.6.0 vs nightly are expected and harmless.
- The scipy fix bypasses torchaudio entirely for WAV saving. Same audio output, no FFmpeg dependency.
- Tested on RTX 5080, DwemerDistro, CHIM 2.3.3.
- HuggingFace token only needed for first run to download model weights. Gets cached after that.
We also provide support for Mantella XTTS. It requires a few configuration changes but is rather simple to set up.
You may also need to configure your firewall to allow apps through to WSL2:
https://superuser.com/questions/1714002/wsl2-connect-to-host-without-disabling-the-windows-firewall
Any voices of people who do not wish to be AI generated that you see in videos were made by users without our permission.
WE CANNOT CONTROL THIS!
WE DO NOT CONDONE THE USE OF THE TOOLS PROVIDED TO GENERATE AI VOICES OF THOSE WHO DO NOT WISH TO BE AI GENERATED.
We will not, now or in the future, provide training data or voice files within our mod files for anyone who does not wish to be AI generated.
Zonos TTS is one of the most powerful TTS services supported by CHIM.
It has rather lifelike voices and emotion, but at the cost of a 6GB VRAM requirement.
This makes it very hard to run both Skyrim and Zonos on the same machine unless you have a supercomputer!
There are 3 ways you can run Zonos:
Here is a guide if you want to run Zonos on the cloud to save VRAM.
Zonos works quite simply.
Whatever voices are in your voice cache will be used to generate an AI voice every time it makes a TTS request. You do not need to sync any voices on startup.
If you are playing normally, most voices for NPCs should already be in your cache.
You can manually upload new voices using the CHIM XTTS Management page, which places them in your voice cache.
There is not much for us to say about the xVASynth implementation. It's a decent TTS service that has been around for a few years now and is simple to install. However, it lacks some voices compared to MeloTTS or CHIM XTTS.
You may also need to configure your firewall to allow apps through to WSL2:
https://superuser.com/questions/1714002/wsl2-connect-to-host-without-disabling-the-windows-firewall
It’s pretty easy to set up and install:
MeloTTS is one of our recommended TTS services. It is free, runs locally, and has low hardware requirements. Currently we have all the default approved Skyrim voices trained for it. The quality is not as good as CHIM XTTS, but it gives all players easy access to a comprehensive TTS service. There is currently no easy way to train more voices using the Distro.
It is rather easy to set up. It can be installed using the main installation script or as an optional component folder in the Distro. You can run it on CPU (required for AMD users) or GPU (which is faster). After that, just select it as the TTS service in the default profile and speak to any vanilla NPC. They will automatically be allocated an appropriate voice whenever they are activated.
We do recommend that you use CHIM XTTS if you have the hardware to do so, as its quality is much better.
Here is a guide to run it on the cloud (down below in this document): CHIM Manual
We support MeloTTS for users who cannot run the other, more powerful TTS services, while still covering all Skyrim-like voices.
All our current MeloTTS voices are listed at the bottom of this document.
More info on MeloTTS can be found here: https://github.com/myshell-ai/MeloTTS
Cartesia TTS is a cloud-based TTS service similar to ElevenLabs, but at around ¼ of the price!
We have a video of it in action here: https://www.youtube.com/watch?v=TXATmooLwLQ
Key Features:
How It Works:
Setup:
Inworld TTS is a cloud-based TTS service that provides high-quality voice generation similar to ElevenLabs, but at around ¼ of the price!
Video demo: https://www.youtube.com/watch?v=QEntIyHfX5g
Key Features:
How It Works:
Setup:
You can look in the TTS Studio page to see voices that are currently generated.
Piper is another TTS service that runs efficiently on low-powered machines. Its quality is better than MeloTTS but worse than XTTS.
It also uses the same voiceid logic as MeloTTS. The list can be found below.
However, you will need to manually download voices for it to work with CHIM.
How to install Piper Voices:
Piper Voice Links:
Mantella (Nexus, main mod), file size 4.1GB
https://www.nexusmods.com/skyrimspecialedition/mods/98631?tab=files&file_id=632328
Mantella - Expanded Piper Models List (Nexus, optional), file size 1.3GB
https://www.nexusmods.com/skyrimspecialedition/mods/98631?tab=files&file_id=591368
Mantella Missing Voice Files - Male and Female Child (.onnx files), (Nexus):
https://www.nexusmods.com/skyrimspecialedition/mods/139736?tab=files&file_id=586124
Mantella Piper ZoraFairChild (Nexus)
https://www.nexusmods.com/skyrimspecialedition/mods/143687?tab=files&file_id=602216
Other Voices
https://huggingface.co/rhasspy/piper-voices
https://brycebeattie.com/files/tts/
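After downloading voices, it can be worth checking that each model file has its config next to it. Piper voices from the rhasspy repository ship as a `.onnx` model plus a matching `.onnx.json` config; the sketch below assumes your downloaded voices follow that layout. The function name and the folder you pass in are placeholders, not CHIM paths.

```python
from pathlib import Path


def voices_missing_config(voice_dir: str) -> list[str]:
    """List .onnx files in voice_dir that have no matching .onnx.json next to them."""
    missing = []
    for model in sorted(Path(voice_dir).glob("*.onnx")):
        # e.g. voice.onnx -> voice.onnx.json
        config = model.with_name(model.name + ".json")
        if not config.is_file():
            missing.append(model.name)
    return missing
```

An empty list means every model in the folder has a config file beside it.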