Build a Voice Survey Bot: Automate Customer Feedback Calls with AI

You just shipped a new feature. Your product manager wants feedback. Your marketing team schedules a survey email. Three days later: 2% open rate, 0.3% completion.

Email surveys are dead. But what if instead of hoping customers find your survey, your survey called them?

In this post, we'll build a voice survey bot that calls customers, reads questions aloud, captures their spoken answers, and stores structured responses — all with about 100 lines of Python and no telephony expertise required.

Why Voice Surveys Work

Response rates for voice surveys hover around 30-50% compared to <5% for email. The reason is simple: a ringing phone demands attention in a way an email never will.

And with AI-powered speech recognition, you don't have to force users into clunky keypad menus ("Press 1 for satisfied..."). You can ask open-ended questions and let them answer naturally — then analyze the transcripts programmatically.

The catch: building this traditionally means wrangling SIP trunks, RTP streams, DTMF signaling, and TTS pipelines. That's weeks of work before you write your first survey question.

We'll skip all of that.

What We're Building

A Python service that:

Initiates outbound calls to a list of phone numbers
Plays a TTS greeting and survey questions
Records and transcribes spoken answers in real time
Stores structured survey responses

The telephony infrastructure — RTP handling, STT/TTS, codec negotiation — is handled by VoIPBin. Your code never touches audio.

Prerequisites

Python 3.9+
A VoIPBin account (free tier available)
pip install requests flask openai (openai is only needed for the optional analysis in Step 4)

Get your API token by signing up:

curl -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{ "username": "yourname", "password": "yourpassword", "email": "you@example.com" }'

The response includes accesskey.token — copy it. No email verification, no waiting.
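If you script the signup instead of copying the token by hand, a small helper can pull it out of the parsed response. This is a minimal sketch that assumes the response body is a JSON object with a nested accesskey object containing a token field, which is how "accesskey.token" reads. The helper name extract_token is hypothetical and not part of any VoIPBin SDK.

```python
def extract_token(signup_response: dict) -> str:
    """Return accesskey.token from a parsed signup response.

    Assumes the nested shape {"accesskey": {"token": "..."}}; adjust if the
    real response differs.
    """
    try:
        token = signup_response["accesskey"]["token"]
    except (KeyError, TypeError) as exc:
        # Missing key, or "accesskey" is not a dict (for example None).
        raise ValueError("signup response has no accesskey.token") from exc
    if not token:
        raise ValueError("accesskey.token is empty")
    return token
```

In practice you would call it as extract_token(response.json()) and export the result as an environment variable rather than hardcoding it in source.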

Step 1: Design Your Survey Flow

VoIPBin uses a Flow — a JSON-defined sequence of actions. For a survey, we want:

Greet the caller
Ask a question
Listen for a spoken response (and transcribe it)
Repeat for each question
Thank and hang up

Here's a minimal three-question survey flow:

import requests

VOIPBIN_TOKEN = "your_api_token"
WEBHOOK_URL = "https://your-server.com/survey-webhook"

headers = {
    "Authorization": f"Bearer {VOIPBIN_TOKEN}",
    "Content-Type": "application/json"
}

survey_flow = {
    "name": "customer-survey-v1",
    "actions": [
        {"type": "talk", "text": "Hi! This is a quick 3-question survey. Let's begin."},
        {"type": "talk", "text": "Q1: On a scale of 1-10, how satisfied are you with our service?"},
        {"type": "listen", "timeout": 5, "webhook": f"{WEBHOOK_URL}?question=1"},
        {"type": "talk", "text": "Q2: What is one thing we could do better?"},
        {"type": "listen", "timeout": 15, "webhook": f"{WEBHOOK_URL}?question=2"},
        {"type": "talk", "text": "Q3: Would you recommend us to a friend? Say yes or no."},
        {"type": "listen", "timeout": 5, "webhook": f"{WEBHOOK_URL}?question=3"},
        {"type": "talk", "text": "Thank you for your feedback. Have a great day!"},
        {"type": "hangup"}
    ]
}

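response = requests.post(
    "https://api.voipbin.net/v1.0/flows",
    headers=headers,
    json=survey_flow,
    timeout=10  # avoid hanging forever on a stalled connection
)
response.raise_for_status()  # fail loudly if the flow was rejected
flow_id = response.json()["id"]
print(f"Survey flow created: {flow_id}")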

The listen action is the key here. VoIPBin records the caller's speech, runs STT, and POSTs the transcript to your webhook. Your server receives clean text — zero audio processing on your end.
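Writing the talk/listen pairs by hand gets repetitive once a survey grows past a few questions. The sketch below generates the same flow structure from a list of (question text, timeout) pairs. It only reuses the action fields shown in this post (type, text, timeout, webhook); the function name build_survey_flow and its parameters are illustrative assumptions.

```python
def build_survey_flow(name, questions, webhook_url,
                      greeting="Hi! This is a quick survey. Let's begin.",
                      closing="Thank you for your feedback. Have a great day!"):
    """Build a flow dict from (question_text, listen_timeout_seconds) pairs."""
    actions = [{"type": "talk", "text": greeting}]
    for number, (text, timeout) in enumerate(questions, start=1):
        # Each question is a talk action followed by a listen action whose
        # webhook URL carries the question number as a query parameter.
        actions.append({"type": "talk", "text": f"Q{number}: {text}"})
        actions.append({
            "type": "listen",
            "timeout": timeout,
            "webhook": f"{webhook_url}?question={number}",
        })
    actions.append({"type": "talk", "text": closing})
    actions.append({"type": "hangup"})
    return {"name": name, "actions": actions}
```

Calling build_survey_flow("customer-survey-v1", [("How satisfied are you?", 5), ("What could we do better?", 15)], WEBHOOK_URL) returns a dict you can pass as the json argument of the flow-creation request.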

Step 2: Receive Transcripts via Webhook

Set up a Flask endpoint to receive transcribed answers:

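from flask import Flask, request, jsonify
from datetime import datetime, timezone

app = Flask(__name__)
survey_responses = {}  # use a real DB in production

@app.route("/survey-webhook", methods=["POST"])
def handle_transcript():
    question_num = request.args.get("question")
    # silent=True returns None instead of raising on a missing or malformed JSON body
    payload = request.get_json(silent=True) or {}

    call_id = payload.get("call_id")
    transcript = (payload.get("transcript") or "").strip()
    if not call_id or not question_num:
        return jsonify({"status": "error", "reason": "missing call_id or question"}), 400
    # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")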

    if call_id not in survey_responses:
        survey_responses[call_id] = {"started_at": timestamp, "answers": {}}

    survey_responses[call_id]["answers"][f"q{question_num}"] = {
        "transcript": transcript,
        "recorded_at": timestamp
    }

    print(f"[Call {call_id}] Q{question_num}: {transcript}")
    return jsonify({"status": "ok"})

@app.route("/responses", methods=["GET"])
def get_responses():
    return jsonify(survey_responses)

if __name__ == "__main__":
    app.run(port=5000)
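The in-memory dict above loses everything on restart. As one possible replacement, here is a minimal sketch using SQLite from the standard library. The table layout and the helper names (open_store, save_answer, load_answers) are assumptions made for illustration, not something the post or VoIPBin prescribes.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the answers table; pass a file path to persist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS answers (
               call_id     TEXT NOT NULL,
               question    TEXT NOT NULL,
               transcript  TEXT NOT NULL,
               recorded_at TEXT NOT NULL,
               PRIMARY KEY (call_id, question)
           )"""
    )
    return conn

def save_answer(conn, call_id, question, transcript, recorded_at):
    # INSERT OR REPLACE keeps one row per (call_id, question), so a repeated
    # webhook delivery overwrites the earlier row instead of duplicating it.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO answers VALUES (?, ?, ?, ?)",
            (call_id, question, transcript, recorded_at),
        )

def load_answers(conn, call_id):
    rows = conn.execute(
        "SELECT question, transcript, recorded_at FROM answers "
        "WHERE call_id = ? ORDER BY question",
        (call_id,),
    ).fetchall()
    return {q: {"transcript": t, "recorded_at": r} for q, t, r in rows}
```

Inside the Flask handler you would open a connection per request (SQLite connections are not shared across threads by default) and call save_answer(conn, call_id, f"q{question_num}", transcript, timestamp).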

Step 3: Launch the Survey Campaign

Trigger outbound calls to your customer list:

import time

def call_customer(phone_number, flow_id):
    payload = {
        "flow_id": flow_id,
        "source": "+12025550100",  # Your VoIPBin number
        "destination": phone_number
    }
    response = requests.post(
        "https://api.voipbin.net/v1.0/calls",
        headers=headers,
        json=payload
    )
    if response.ok:  # any 2xx status counts as success, not only 200
        data = response.json()
        print(f"Called {phone_number} — ID: {data['id']}")
        return data["id"]
    print(f"Failed: {response.text}")
    return None

customers = ["+14155550101", "+14155550102", "+14155550103"]

for number in customers:
    call_customer(number, flow_id)
    time.sleep(2)  # pace your calls

For production, replace the time.sleep loop with a task queue (for example, Celery with a Redis broker) and add retry logic for unanswered calls.
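Before reaching for a full task queue, the pacing and retry behavior can be sketched in plain Python. The version below retries only when the call request itself fails (the placing function returns None, as call_customer does above). Detecting an unanswered call would require call status information that this post does not cover. The function run_campaign and its parameters are hypothetical.

```python
import time
from collections import deque

def run_campaign(numbers, place_call, pause_seconds=2.0, max_attempts=2,
                 sleep=time.sleep):
    """Call each number, re-queueing failed requests up to max_attempts.

    place_call(number) must return a call ID, or None on failure.
    Returns {number: call_id or None}.
    """
    results = {}
    pending = deque((number, 1) for number in numbers)
    while pending:
        number, attempt = pending.popleft()
        call_id = place_call(number)
        if call_id is not None:
            results[number] = call_id
        elif attempt < max_attempts:
            # Put the number at the back so other customers go first.
            pending.append((number, attempt + 1))
        else:
            results[number] = None  # gave up after max_attempts
        if pending:
            sleep(pause_seconds)  # pace outbound calls
    return results
```

With the code from this step, usage would look like run_campaign(customers, lambda n: call_customer(n, flow_id)). The sleep parameter is injectable so the loop can be tested without waiting.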

Step 4: Analyze the Results

Once calls complete, your /responses endpoint returns structured data:

{
  "call_abc123": {
    "started_at": "2026-04-21T10:00:00",
    "answers": {
      "q1": { "transcript": "eight", "recorded_at": "2026-04-21T10:00:15" },
      "q2": { "transcript": "I wish the checkout was faster", "recorded_at": "2026-04-21T10:00:40" },
      "q3": { "transcript": "yes definitely", "recorded_at": "2026-04-21T10:01:00" }
    }
  }
}
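Transcripts like "eight" and "yes definitely" still need normalizing before you can chart them. Here is a rough rule-based sketch for the Q1 score and the Q3 yes/no answer. The word lists are assumptions and deliberately small, and real transcripts will be messier (a phrase such as "one thing I noticed" would be read as a score of 1), so treat None as "needs a human or an LLM".

```python
import re

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
YES_WORDS = {"yes", "yeah", "yep", "sure"}
NO_WORDS = {"no", "nope", "not", "never"}

def parse_score(transcript):
    """Return the first 1-10 score found as digits or a number word, else None."""
    text = transcript.lower()
    match = re.search(r"\b(10|[1-9])\b", text)
    if match:
        return int(match.group(1))
    for word in re.findall(r"[a-z]+", text):
        if word in NUMBER_WORDS:
            return NUMBER_WORDS[word]
    return None

def parse_yes_no(transcript):
    """Return True for yes, False for no, None if unclear or contradictory."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    said_yes = bool(words & YES_WORDS)
    said_no = bool(words & NO_WORDS)
    if said_yes == said_no:
        return None
    return said_yes
```

For the sample data above, parse_score("eight") gives 8 and parse_yes_no("yes definitely") gives True.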

From here, pipe the transcripts into an LLM for instant sentiment analysis:

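import json

import openai  # reads OPENAI_API_KEY from the environment

def analyze_response(answers):
    # One "Qn: transcript" line per answer
    transcript = "\n".join(
        f"Q{k[1:]}: {v['transcript']}" for k, v in answers.items()
    )
    result = openai.chat.completions.create(
        model="gpt-4o",
        # JSON mode: without it the model may wrap the JSON in prose or
        # code fences, and json.loads below would fail
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Analyze this customer survey and return JSON with: "
                "satisfaction_score (1-10), sentiment (positive/neutral/negative), "
                f"key_feedback (one sentence).\n\n{transcript}"
            )
        }]
    )
    return json.loads(result.choices[0].message.content)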
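Model output is not guaranteed to match the fields the prompt asks for, so it is worth checking before you store it. The sketch below validates the three fields named in the prompt (satisfaction_score, sentiment, key_feedback). The function name validate_analysis is hypothetical, and the strictness, such as rejecting non-integer scores, is a design choice you may want to relax.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def validate_analysis(raw_content):
    """Parse the model's reply and check the three expected fields.

    Raises ValueError on malformed JSON (json.JSONDecodeError is a
    ValueError subclass) or on missing or out-of-range fields.
    """
    data = json.loads(raw_content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    score = data.get("satisfaction_score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError(f"satisfaction_score out of range: {score!r}")

    sentiment = data.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")

    feedback = data.get("key_feedback")
    if not isinstance(feedback, str) or not feedback.strip():
        raise ValueError("key_feedback is missing or empty")

    return {
        "satisfaction_score": score,
        "sentiment": sentiment,
        "key_feedback": feedback.strip(),
    }
```

You could have analyze_response return validate_analysis(result.choices[0].message.content) and retry or flag the call for manual review when it raises.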

What You Get

With this setup, each survey call:

Takes ~2 minutes end-to-end
Costs a fraction of a call center
Returns structured, searchable transcripts
Scales to hundreds of calls per hour — no infrastructure changes needed

And critically — you wrote zero telephony code. No SIP, no RTP, no audio codecs. VoIPBin handles all of that. Your code is just Python sending HTTP requests and receiving webhooks.

Going Further

A few ways to extend this:

Conditional branching: If Q1 score < 5, route to a human agent for service recovery
Retry logic: If no answer, call again in 4 hours (once)
Multilingual surveys: Pass language in the talk action for automatic localization
DTMF fallback: If STT confidence is low, fall back to keypad input

The Bigger Picture

Voice feedback is one of the richest signals you can collect from customers. People speak more naturally than they type — you get tone, hesitation, and nuance that multiple-choice forms will never capture.

The only reason most products don't collect voice feedback is that it's been hard. That excuse is gone.

VoIPBin is an AI-native CPaaS that lets developers add voice and telephony to their applications without managing telecom infrastructure. Get started at voipbin.net — API access is instant, no verification required.