Voice settings control how your agent sounds. Choose from multiple voice providers, adjust speaking style, and fine-tune the audio experience.

Voice Providers

Retone supports two premium TTS providers.

Basic Settings

Voice Selection

To choose a voice:
  1. Click the Voice dropdown
  2. Browse available voices by provider
  3. Preview voices before selecting
  4. Click to apply

Speed

Control speaking rate:
Value   Effect
0.5     Half speed (very slow)
0.8     Slightly slower
1.0     Normal speed
1.2     Slightly faster
2.0     Double speed (very fast)

For phone calls, 0.9-1.1 typically sounds most natural.

Volume

Output volume level (0-100):
Volume: 80  # Recommended for clear audio

Responsiveness

How quickly the agent starts speaking (1-10):
Value   Effect
1-3     Slower, more deliberate responses
4-6     Balanced (recommended)
7-10    Faster, may interrupt caller
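
The ranges above can be checked before a configuration is saved. The following is a minimal sketch, not part of Retone: the `BasicVoiceSettings` class and its `problems` method are hypothetical names, and treating 0.5 and 2.0 (the extremes of the Speed table) as hard limits is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BasicVoiceSettings:
    """Hypothetical container for the basic settings described above."""
    speed: float = 1.0        # speaking-rate multiplier
    volume: int = 80          # output volume, 0-100
    responsiveness: int = 5   # how quickly the agent starts speaking, 1-10

    def problems(self) -> List[str]:
        """Return hard errors and soft warnings for the current values."""
        issues = []
        # Assumption: the Speed table's extremes are the allowed bounds.
        if not 0.5 <= self.speed <= 2.0:
            issues.append("error: speed must be between 0.5 and 2.0")
        elif not 0.9 <= self.speed <= 1.1:
            issues.append("warning: speed outside 0.9-1.1 may sound unnatural on phone calls")
        if not 0 <= self.volume <= 100:
            issues.append("error: volume must be between 0 and 100")
        if not 1 <= self.responsiveness <= 10:
            issues.append("error: responsiveness must be between 1 and 10")
        elif self.responsiveness >= 7:
            issues.append("warning: responsiveness 7-10 may interrupt the caller")
        return issues
```

With the defaults shown, `BasicVoiceSettings().problems()` returns an empty list, while `BasicVoiceSettings(speed=1.2, responsiveness=8).problems()` returns two warnings and no errors.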

Emotional Expression (Cartesia)

Cartesia Sonic 3 supports emotional speech:

Emotion Types

Emotion       When to Use
neutral       Professional, business contexts
friendly      Customer service, general conversations
excited       Sales, promotions, positive news
sympathetic   Handling complaints, showing empathy
curious       Asking questions, gathering information
confident     Assertions, recommendations

Intensity

Level    Effect
low      Subtle emotional undertone
medium   Noticeable but not exaggerated
high     Strong emotional expression

Emotion:
  Type: friendly
  Intensity: medium
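
Because both fields accept only the values listed in the tables above, a typo such as "freindly" is easy to catch early. The helper below is an illustrative sketch: `make_emotion` is a hypothetical function, not a Retone API, and the returned dictionary simply mirrors the Emotion block shown above.

```python
# Allowed values, copied from the Emotion Types and Intensity tables.
EMOTION_TYPES = {"neutral", "friendly", "excited", "sympathetic", "curious", "confident"}
INTENSITY_LEVELS = ("low", "medium", "high")


def make_emotion(emotion_type: str, intensity: str = "medium") -> dict:
    """Build an Emotion block, rejecting values the tables do not list."""
    if emotion_type not in EMOTION_TYPES:
        raise ValueError(f"unknown emotion type: {emotion_type!r}")
    if intensity not in INTENSITY_LEVELS:
        raise ValueError(f"unknown intensity: {intensity!r}")
    return {"Type": emotion_type, "Intensity": intensity}
```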

Back-channeling

Natural acknowledgment sounds during conversation:
Back-channeling:
  Enabled: true
  Frequency: 5        # How often (0-10)
  Words:
    - "mm-hmm"
    - "I see"
    - "right"
    - "okay"
    - "got it"

When Back-channeling Triggers

  • During caller’s longer statements
  • At natural pauses
  • When caller shares information
  • To indicate active listening
A higher frequency (7-10) sounds very engaged but may feel excessive; a lower one (1-3) is more subtle.
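
One way to build intuition for the Frequency value is to model it as the chance of acknowledging the caller at each natural pause. This is purely an illustrative assumption: how Retone actually schedules back-channels is not documented here, and `maybe_backchannel` is a hypothetical function.

```python
import random
from typing import List, Optional


def maybe_backchannel(frequency: int, words: List[str],
                      rng: random.Random) -> Optional[str]:
    """Return an acknowledgment word, or None to stay silent.

    Assumption: frequency maps linearly to a probability,
    so 0 never speaks and 10 speaks at every pause.
    """
    if not 0 <= frequency <= 10:
        raise ValueError("frequency must be between 0 and 10")
    # rng.random() is in [0.0, 1.0), so frequency 10 always passes this check.
    if not words or rng.random() >= frequency / 10:
        return None
    return rng.choice(words)
```

Under this model, a Frequency of 5 acknowledges roughly half of the pauses it is offered, which is one way to see why 7-10 can start to feel excessive on long caller statements.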

Pre-response Phrases

Filler phrases the agent speaks while the AI generates a response:
Pre-response:
  Enabled: true
  Phrases:
    - "Let me check that for you..."
    - "One moment please..."
    - "Good question..."
    - "I can help with that..."

Benefits

  • Reduces perceived latency
  • Sounds more natural
  • Gives AI processing time

Conditional Pre-responses

Trigger different phrases based on context:
Conditions:
  - trigger: "complex_query"
    phrase: "Let me look into that..."
  - trigger: "simple_query"
    phrase: "Sure..."
  - trigger: "emotional"
    phrase: "I understand..."
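
Conceptually, the Conditions list is a lookup from trigger to phrase. The sketch below shows that lookup under two assumptions: the first matching entry wins, and an unrecognized trigger falls back to a general phrase. `pick_pre_response` and `DEFAULT_PHRASE` are hypothetical names, and the fallback behavior is not confirmed by this page.

```python
from typing import Dict, List

# Same entries as the Conditions block above.
CONDITIONS: List[Dict[str, str]] = [
    {"trigger": "complex_query", "phrase": "Let me look into that..."},
    {"trigger": "simple_query", "phrase": "Sure..."},
    {"trigger": "emotional", "phrase": "I understand..."},
]

# Assumed fallback, borrowed from the general Pre-response phrase list.
DEFAULT_PHRASE = "One moment please..."


def pick_pre_response(trigger: str,
                      conditions: List[Dict[str, str]] = CONDITIONS,
                      default: str = DEFAULT_PHRASE) -> str:
    """Return the phrase for the first condition whose trigger matches."""
    for condition in conditions:
        if condition["trigger"] == trigger:
            return condition["phrase"]
    return default
```

For example, `pick_pre_response("emotional")` returns "I understand...", and an unlisted trigger returns the default phrase.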

Laughter Injection

Add natural laughter at appropriate moments:
Laughter:
  Enabled: true
When enabled, the AI may add subtle laughter in response to humorous comments from callers.

Pronunciation Overrides

Custom pronunciations for specific words:
Pronunciations:
  "Acme": "ACK-mee"
  "API": "A P I"
  "SQL": "sequel"
  "GIF": "jiff"
  "nginx": "engine-x"

Adding Pronunciations

  1. Go to Voice Settings
  2. Click Pronunciation Overrides
  3. Add word and phonetic spelling
  4. Test with voice preview
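
To preview what an override list does to a script before testing it by ear, the substitution can be approximated as a whole-word text replacement applied before synthesis. This is a sketch of that idea only: `apply_pronunciations` is a hypothetical helper, and case-sensitive whole-word matching is an assumption about how the overrides behave.

```python
import re
from typing import Dict


def apply_pronunciations(text: str, overrides: Dict[str, str]) -> str:
    """Replace each whole-word occurrence of a key with its phonetic spelling."""
    if not overrides:
        return text
    # Try longer words first so a long entry wins over a shorter one it contains.
    words = sorted(overrides, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: overrides[match.group(1)], text)


overrides = {"Acme": "ACK-mee", "API": "A P I", "nginx": "engine-x"}
print(apply_pronunciations("Acme exposes an API over nginx.", overrides))
# prints: ACK-mee exposes an A P I over engine-x.
```

Because matching is on whole words, "APIs" is left unchanged in this sketch; a plural form would need its own entry.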

Auto-enabled Features

These settings are enabled by default:
Feature                       Description
Allow interruptions           Caller can interrupt agent
Speech normalization          Numbers read naturally
Noise filtering               Background noise reduced
Background speech detection   Handles side conversations

Configuration Example

Complete voice configuration:
Voice Settings:
  Provider: Cartesia
  Voice: "Sonic 3 - Professional Male"
  Speed: 1.0
  Volume: 80
  Responsiveness: 5

  Emotion:
    Type: friendly
    Intensity: medium

  Back-channeling:
    Enabled: true
    Frequency: 5
    Words: ["mm-hmm", "I see", "right"]

  Pre-response:
    Enabled: true
    Phrases:
      - "Let me check..."
      - "One moment..."

  Pronunciations:
    "TechFlow": "TECH-flow"
    "API": "A P I"

Testing Voice Settings

  1. Open the Test Call panel
  2. Have a conversation with your agent
  3. Listen for:
    • Natural pacing
    • Clear pronunciation
    • Appropriate emotion
    • Smooth back-channeling
  4. Adjust settings and retest

Best Practices

Match Brand Voice

Choose a voice that fits your company’s personality.

Test on Phone

Audio sounds different over a phone line than through computer speakers, so test on both.

Moderate Back-channeling

Too much feels robotic; too little feels distant.

Add Key Pronunciations

Ensure company names, products, and jargon sound correct.