You just shipped a new feature. Your product manager wants feedback. Your marketing team schedules a survey email. Three days later: 2% open rate, 0.3% completion.
Email surveys are dead. But what if, instead of hoping customers find your survey, your survey called them?
In this post, we'll build a voice survey bot that calls customers, reads questions aloud, captures their spoken answers, and stores structured responses — all with about 100 lines of Python and no telephony expertise required.
Why Voice Surveys Work
Response rates for voice surveys hover around 30-50% compared to <5% for email. The reason is simple: a ringing phone demands attention in a way an email never will.
And with AI-powered speech recognition, you don't have to force users into clunky keypad menus ("Press 1 for satisfied..."). You can ask open-ended questions and let them answer naturally — then analyze the transcripts programmatically.
The catch: building this traditionally means wrangling SIP trunks, RTP streams, DTMF signaling, and TTS pipelines. That's weeks of work before you write your first survey question.
We'll skip all of that.
What We're Building
A Python service that:
- Initiates outbound calls to a list of phone numbers
- Plays a TTS greeting and survey questions
- Records and transcribes spoken answers in real time
- Stores structured survey responses
The telephony infrastructure — RTP handling, STT/TTS, codec negotiation — is handled by VoIPBin. Your code never touches audio.
Prerequisites
- Python 3.9+
- A VoIPBin account (free tier available)
```bash
pip install requests flask openai  # openai is only needed for Step 4
```
Get your API token by signing up:
```bash
curl -X POST https://api.voipbin.net/v1.0/auth/signup \
  -H "Content-Type: application/json" \
  -d '{ "username": "yourname", "password": "yourpassword", "email": "you@example.com" }'
```
The response includes accesskey.token — copy it. No email verification, no waiting.
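The snippets in this post hard-code the token for brevity, which makes it easy to leak into version control. Below is a minimal sketch that reads the token either from the signup response or from an environment variable. It assumes the response is a nested object shaped like {"accesskey": {"token": ...}}, based on the field name above. The VOIPBIN_TOKEN environment variable name is our own choice.

```python
import os


def token_from_signup(response_json: dict) -> str:
    """Pull the API token out of a signup response.

    Assumes the shape {"accesskey": {"token": "..."}}, inferred from the
    accesskey.token field name; adjust if the real response differs.
    """
    accesskey = response_json.get("accesskey") or {}
    token = accesskey.get("token")
    if not token:
        raise KeyError("signup response has no accesskey.token")
    return token


def load_token(env_var: str = "VOIPBIN_TOKEN") -> str:
    """Read the token from an environment variable instead of hard-coding it."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"set {env_var} before running the survey scripts")
    return token


# Example with a made-up response body:
sample = {"accesskey": {"token": "abc123"}}
print(token_from_signup(sample))  # abc123
```

With this in place, `VOIPBIN_TOKEN = load_token()` can replace the literal string in the scripts that follow.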
Step 1: Design Your Survey Flow
VoIPBin uses a Flow — a JSON-defined sequence of actions. For a survey, we want:
- Greet the caller
- Ask a question
- Listen for a spoken response (and transcribe it)
- Repeat for each question
- Thank and hang up
Here's a minimal three-question survey flow:
```python
import requests

VOIPBIN_TOKEN = "your_api_token"
WEBHOOK_URL = "https://your-server.com/survey-webhook"  # must be publicly reachable

headers = {
    "Authorization": f"Bearer {VOIPBIN_TOKEN}",
    "Content-Type": "application/json",
}

survey_flow = {
    "name": "customer-survey-v1",
    "actions": [
        {"type": "talk", "text": "Hi! This is a quick 3-question survey. Let's begin."},
        {"type": "talk", "text": "Q1: On a scale of 1-10, how satisfied are you with our service?"},
        {"type": "listen", "timeout": 5, "webhook": f"{WEBHOOK_URL}?question=1"},
        {"type": "talk", "text": "Q2: What is one thing we could do better?"},
        {"type": "listen", "timeout": 15, "webhook": f"{WEBHOOK_URL}?question=2"},
        {"type": "talk", "text": "Q3: Would you recommend us to a friend? Say yes or no."},
        {"type": "listen", "timeout": 5, "webhook": f"{WEBHOOK_URL}?question=3"},
        {"type": "talk", "text": "Thank you for your feedback. Have a great day!"},
        {"type": "hangup"},
    ],
}

response = requests.post(
    "https://api.voipbin.net/v1.0/flows",
    headers=headers,
    json=survey_flow,
    timeout=10,
)
response.raise_for_status()  # fail loudly on an API error instead of a KeyError below
flow_id = response.json()["id"]
print(f"Survey flow created: {flow_id}")
```
The listen action is the key here. VoIPBin records the caller's speech, runs STT, and POSTs the transcript to your webhook. Your server receives clean text — zero audio processing on your end.
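Writing talk/listen pairs by hand gets repetitive as a survey grows. Because a flow is just a list of dictionaries, a small helper can generate it from a list of questions. This is a sketch: build_survey_flow is a hypothetical helper. It emits only the action shapes already used in the example above and assumes no other VoIPBin fields.

```python
def build_survey_flow(name, greeting, questions, closing, webhook_url):
    """Build a flow dict from (prompt, timeout_seconds) pairs.

    Uses the same talk/listen/hangup action shapes as the hand-written flow.
    """
    actions = [{"type": "talk", "text": greeting}]
    for number, (prompt, timeout) in enumerate(questions, start=1):
        actions.append({"type": "talk", "text": f"Q{number}: {prompt}"})
        actions.append({
            "type": "listen",
            "timeout": timeout,
            # The question number travels in the query string so the
            # webhook can tell which answer it is receiving.
            "webhook": f"{webhook_url}?question={number}",
        })
    actions.append({"type": "talk", "text": closing})
    actions.append({"type": "hangup"})
    return {"name": name, "actions": actions}


flow = build_survey_flow(
    "customer-survey-v1",
    "Hi! This is a quick 3-question survey. Let's begin.",
    [
        ("On a scale of 1-10, how satisfied are you with our service?", 5),
        ("What is one thing we could do better?", 15),
        ("Would you recommend us to a friend? Say yes or no.", 5),
    ],
    "Thank you for your feedback. Have a great day!",
    "https://your-server.com/survey-webhook",
)
print(len(flow["actions"]))  # 9
```

Called this way, the helper reproduces the hand-written survey_flow exactly. Adding a fourth question then takes one more tuple instead of two hand-edited actions.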
Step 2: Receive Transcripts via Webhook
Set up a Flask endpoint to receive transcribed answers:
```python
from datetime import datetime, timezone

from flask import Flask, jsonify, request

app = Flask(__name__)
survey_responses = {}  # use a real DB in production

@app.route("/survey-webhook", methods=["POST"])
def handle_transcript():
    # The question number comes from the ?question=N set on each listen action.
    question_num = request.args.get("question")
    payload = request.get_json(silent=True) or {}  # tolerate a missing/invalid body
    call_id = payload.get("call_id")
    transcript = (payload.get("transcript") or "").strip()
    if not call_id or not question_num:
        return jsonify({"status": "error", "reason": "missing call_id or question"}), 400

    timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = survey_responses.setdefault(call_id, {"started_at": timestamp, "answers": {}})
    entry["answers"][f"q{question_num}"] = {
        "transcript": transcript,
        "recorded_at": timestamp,
    }
    print(f"[Call {call_id}] Q{question_num}: {transcript}")
    return jsonify({"status": "ok"})

@app.route("/responses", methods=["GET"])
def get_responses():
    return jsonify(survey_responses)

if __name__ == "__main__":
    app.run(port=5000)
```
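The in-memory dictionary is lost whenever the server restarts. Here is a minimal sketch of the "real DB" the comment asks for, using the standard-library sqlite3 module. The table layout and the helper names open_store, save_answer, and answers_for are illustrative. Keying rows on (call_id, question) means a webhook delivered twice overwrites the earlier row instead of duplicating it.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path="survey.db"):
    """Open (or create) the answers table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS answers (
               call_id     TEXT NOT NULL,
               question    TEXT NOT NULL,
               transcript  TEXT NOT NULL,
               recorded_at TEXT NOT NULL,
               PRIMARY KEY (call_id, question)
           )"""
    )
    return conn


def save_answer(conn, call_id, question, transcript):
    """Insert one answer, replacing any earlier row for the same call/question."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO answers VALUES (?, ?, ?, ?)",
            (call_id, question, transcript,
             datetime.now(timezone.utc).isoformat(timespec="seconds")),
        )


def answers_for(conn, call_id):
    """Return {question: transcript} for one call."""
    rows = conn.execute(
        "SELECT question, transcript FROM answers WHERE call_id = ? ORDER BY question",
        (call_id,),
    )
    return dict(rows.fetchall())


conn = open_store(":memory:")
save_answer(conn, "call_abc123", "q1", "eight")
save_answer(conn, "call_abc123", "q1", "nine")  # a retried webhook overwrites
print(answers_for(conn, "call_abc123"))  # {'q1': 'nine'}
```

In the Flask app, open the connection inside the request handler (or once per thread). By default, a sqlite3 connection may only be used by the thread that created it.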
Step 3: Launch the Survey Campaign
Trigger outbound calls to your customer list:
```python
import time

import requests

# Reuses `headers` and `flow_id` from Step 1.

def call_customer(phone_number, flow_id):
    payload = {
        "flow_id": flow_id,
        "source": "+12025550100",  # Your VoIPBin number
        "destination": phone_number,
    }
    response = requests.post(
        "https://api.voipbin.net/v1.0/calls",
        headers=headers,
        json=payload,
        timeout=10,
    )
    if response.ok:  # accept any 2xx status, not only 200
        data = response.json()
        print(f"Called {phone_number}: call ID {data['id']}")
        return data["id"]
    print(f"Failed to call {phone_number}: {response.status_code} {response.text}")
    return None

customers = ["+14155550101", "+14155550102", "+14155550103"]
for number in customers:
    call_customer(number, flow_id)
    time.sleep(2)  # pace your calls
```
For production, replace time.sleep with a proper task queue (for example, Celery with Redis as the broker) and add retry logic for unanswered calls.
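Here is a rough in-process illustration of the retry policy suggested under Going Further (one retry, 4 hours later). The hypothetical RetryScheduler below tracks which numbers are due for another attempt. It does not detect unanswered calls itself: how you learn that a call went unanswered depends on VoIPBin's call status reporting, which this post does not cover. In production this state would live in Redis or your task queue rather than in memory.

```python
import heapq
import time


class RetryScheduler:
    """Hypothetical helper: tracks when unanswered numbers should be redialed."""

    def __init__(self, delay_seconds=4 * 3600, max_retries=1):
        self.delay_seconds = delay_seconds
        self.max_retries = max_retries
        self._attempts = {}  # phone number -> retries already scheduled
        self._queue = []     # min-heap of (due_timestamp, phone number)

    def no_answer(self, number, now=None):
        """Schedule a retry unless the number has used up its retries.

        Returns True if a retry was scheduled, False otherwise.
        """
        now = time.time() if now is None else now
        used = self._attempts.get(number, 0)
        if used >= self.max_retries:
            return False
        self._attempts[number] = used + 1
        heapq.heappush(self._queue, (now + self.delay_seconds, number))
        return True

    def due(self, now=None):
        """Pop and return every number whose retry time has arrived."""
        now = time.time() if now is None else now
        ready = []
        while self._queue and self._queue[0][0] <= now:
            ready.append(heapq.heappop(self._queue)[1])
        return ready


sched = RetryScheduler()
sched.no_answer("+14155550101", now=0)
print(sched.no_answer("+14155550101", now=10))  # False: only one retry allowed
print(sched.due(now=3600))                      # []
print(sched.due(now=4 * 3600))                  # ['+14155550101']
```

A periodic worker would call due() and pass each returned number to call_customer.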
Step 4: Analyze the Results
Once calls complete, your /responses endpoint returns structured data:
```json
{
  "call_abc123": {
    "started_at": "2026-04-21T10:00:00",
    "answers": {
      "q1": { "transcript": "eight", "recorded_at": "2026-04-21T10:00:15" },
      "q2": { "transcript": "I wish the checkout was faster", "recorded_at": "2026-04-21T10:00:40" },
      "q3": { "transcript": "yes definitely", "recorded_at": "2026-04-21T10:01:00" }
    }
  }
}
```
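Closed questions like Q1 and Q3 can often be scored without an LLM. The sketch below maps spoken numbers and yes/no phrasing to values. The word lists are illustrative and will need extending for real STT output, such as other languages or unexpected phrasings. Answers it cannot classify come back as None rather than as a guess.

```python
import re

# Spoken forms an STT engine might return for 1-10 (illustrative, extend as needed).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
YES_WORDS = {"yes", "yeah", "yep", "yup"}
NO_WORDS = {"no", "nope", "never"}


def parse_score(transcript):
    """Return the first 1-10 score found in a transcript, or None."""
    for token in re.findall(r"[a-z]+|\d+", transcript.lower()):
        value = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
        if value is not None and 1 <= value <= 10:
            return value
    return None


def parse_yes_no(transcript):
    """Return True/False for a clear yes/no answer, None if unclear or contradictory."""
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    said_yes = bool(words & YES_WORDS)
    said_no = bool(words & NO_WORDS)
    if said_yes == said_no:  # neither word, or both
        return None
    return said_yes


print(parse_score("eight"))            # 8
print(parse_yes_no("yes definitely"))  # True
```

Scoring these two questions deterministically keeps the LLM call for the open-ended Q2, where it adds the most value.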
From here, pipe the transcripts into an LLM for instant sentiment analysis:
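```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_response(answers):
    transcript = "\n".join(
        f"Q{key[1:]}: {value['transcript']}" for key, value in answers.items()
    )
    result = client.chat.completions.create(
        model="gpt-4o",
        # JSON mode: constrains the reply to a single valid JSON object,
        # so json.loads below does not choke on prose or markdown fences.
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Analyze this customer survey and return JSON with: "
                "satisfaction_score (1-10), sentiment (positive/neutral/negative), "
                f"key_feedback (one sentence).\n\n{transcript}"
            ),
        }],
    )
    return json.loads(result.choices[0].message.content)
```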
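An LLM's output is not guaranteed to match the schema you asked for, so check the result before storing it. Here is a minimal validator for the three fields requested in the prompt above. validate_analysis is a hypothetical helper that rejects anything it cannot trust rather than guessing.

```python
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def validate_analysis(result):
    """Check an analysis dict before storing it; raise ValueError if unusable."""
    if not isinstance(result, dict):
        raise ValueError("analysis must be a JSON object")

    score = result.get("satisfaction_score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 1 <= score <= 10:
        raise ValueError(f"bad satisfaction_score: {score!r}")

    sentiment = str(result.get("sentiment", "")).lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"bad sentiment: {sentiment!r}")

    feedback = result.get("key_feedback")
    if not isinstance(feedback, str) or not feedback.strip():
        raise ValueError("missing key_feedback")

    # Normalize: whole-number score, lowercase sentiment, trimmed text.
    return {
        "satisfaction_score": int(round(score)),
        "sentiment": sentiment,
        "key_feedback": feedback.strip(),
    }
```

Wrapping the call as `validate_analysis(analyze_response(answers))` lets you log and skip bad analyses instead of writing them to your results.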
What You Get
With this setup, each survey call:
- Takes ~2 minutes end-to-end
- Costs a fraction of what a staffed call center would
- Returns structured, searchable transcripts
- Scales to hundreds of calls per hour with no infrastructure changes on your side
And critically — you wrote zero telephony code. No SIP, no RTP, no audio codecs. VoIPBin handles all of that. Your code is just Python sending HTTP requests and receiving webhooks.
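You can sanity-check the scaling claim with Little's law, which relates throughput to concurrency: the average number of calls in progress equals the launch rate times the average call length. Below is a quick sketch of that arithmetic. The function names are our own, and the concurrency limits of any particular VoIPBin account are not covered here.

```python
def concurrent_calls(calls_per_hour, minutes_per_call):
    """Little's law: average calls in progress = launch rate x average duration."""
    return calls_per_hour * minutes_per_call / 60


def min_dial_interval(calls_per_hour):
    """Seconds between call launches needed to reach a target hourly rate."""
    return 3600 / calls_per_hour


print(concurrent_calls(300, 2))   # 10.0 calls in progress at 300 calls/hour
print(min_dial_interval(1800))    # 2.0: the pacing used in Step 3
print(concurrent_calls(1800, 2))  # 60.0 calls in progress at that pace
```

So "hundreds of calls per hour" at about 2 minutes each means only a few dozen simultaneous calls. The 2-second pacing in Step 3 tops out at 1,800 launches per hour, ignoring request latency.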
Going Further
A few ways to extend this:
- Conditional branching: If Q1 score < 5, route to a human agent for service recovery
- Retry logic: If no answer, call again in 4 hours (once)
- Multilingual surveys: Pass language in the talk action for automatic localization
- DTMF fallback: If STT confidence is low, fall back to keypad input
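Branching inside a live call depends on flow features not shown in this post. A simpler variant of the first idea needs nothing beyond the data you already collect: flag low scorers after the fact for a human callback. The sketch below assumes per-call results shaped like the analyze_response output, keyed by call ID. triage is a hypothetical helper.

```python
def triage(analyses, threshold=5):
    """Return call IDs scoring below the threshold, lowest score first."""
    flagged = [
        (result["satisfaction_score"], call_id)
        for call_id, result in analyses.items()
        if result.get("satisfaction_score") is not None
        and result["satisfaction_score"] < threshold
    ]
    return [call_id for _, call_id in sorted(flagged)]


results = {
    "call_abc123": {"satisfaction_score": 8, "sentiment": "positive"},
    "call_def456": {"satisfaction_score": 3, "sentiment": "negative"},
}
print(triage(results))  # ['call_def456']
```

Sorting by score puts the unhappiest customers at the top of the callback list.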
The Bigger Picture
Voice feedback is one of the richest signals you can collect from customers. People speak more naturally than they type — you get tone, hesitation, and nuance that multiple-choice forms will never capture.
The only reason most products don't collect voice feedback is that it's been hard. That excuse is gone.
VoIPBin is an AI-native CPaaS that lets developers add voice and telephony to their applications without managing telecom infrastructure. Get started at voipbin.net — API access is instant, no verification required.