The Bug You Don't Know You Have
You record a clean 9-second clip on your iPhone. You try to set it as your Telegram video avatar. Telegram accepts the file and... nothing happens. No error. No confirmation. The avatar just stays the same.
This happened to me, and then I saw it happening to friends. The problem isn't your clip. It's the codec.
iPhone shoots in HEVC (H.265) by default, wrapped in a .mov container. Telegram's video avatar system silently rejects it. No error message. No "invalid format." It just ignores the upload.
What Telegram Actually Requires
Telegram's video avatar spec is strict, and almost nobody documents it clearly. After digging through the Bot API docs and testing around 40 different input files, here's what actually matters:
- Codec: H.264 (libx264), not HEVC/H.265
- Resolution: exactly 800x800 pixels, square crop
- Duration: 10 seconds max
- File size: 2 MB max
- Audio: must be removed (Telegram rejects some audio tracks outright)
- Pixel format: yuv420p (not yuv420p10le, which 10-bit HEVC and HDR video use)
- Faststart flag: moov atom must be at the front for streaming
Every one of these can trip you up independently. A typical iPhone clip fails on several of them at once: wrong codec, non-square resolution, an audio track, and often the pixel format and file size as well.
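To see which items a given file fails, the checklist can be turned into a small ffprobe-based check. This is a rough sketch, not what the bot runs: the function names are made up for illustration, "2 MB" is assumed to mean 2 * 1024 * 1024 bytes, and the faststart requirement is not covered because ffprobe's JSON output does not report where the moov atom sits.

```python
import json
import subprocess

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB
MAX_SECONDS = 10.0


def probe(path: str) -> dict:
    """Return ffprobe's JSON description of a media file."""
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-show_format', '-show_streams',
         '-of', 'json', path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def spec_violations(info: dict) -> list:
    """Compare ffprobe output against the checklist; return the failed items."""
    streams = info.get('streams', [])
    video = next((s for s in streams if s.get('codec_type') == 'video'), None)
    if video is None:
        return ['no video stream']

    problems = []
    if video.get('codec_name') != 'h264':
        problems.append(f"codec is {video.get('codec_name')}, not h264")
    if (video.get('width'), video.get('height')) != (800, 800):
        problems.append(
            f"resolution is {video.get('width')}x{video.get('height')}, not 800x800")
    if video.get('pix_fmt') != 'yuv420p':
        problems.append(f"pixel format is {video.get('pix_fmt')}, not yuv420p")
    if any(s.get('codec_type') == 'audio' for s in streams):
        problems.append('audio track present')

    fmt = info.get('format', {})
    # ffprobe reports duration and size as strings
    if float(fmt.get('duration', 0)) > MAX_SECONDS:
        problems.append('longer than 10 seconds')
    if int(fmt.get('size', 0)) > MAX_BYTES:
        problems.append('larger than 2 MB')
    return problems
```

Calling spec_violations(probe('input.mov')) on a fresh iPhone clip should list most of the checklist, which is a quick way to confirm the diagnosis before re-encoding.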
The ffmpeg Pipeline
The fix is a specific ffmpeg command. Here's what I landed on:
ffmpeg -i input.mov \
  -vf "cropdetect=24:16:0,crop='min(iw,ih)':'min(iw,ih)',scale=800:800:flags=lanczos,format=yuv420p" \
  -c:v libx264 \
  -crf 28 \
  -preset fast \
  -t 10 \
  -an \
  -movflags +faststart \
  output.mp4
Breaking it down:
- cropdetect=24:16:0 inspects the frames and logs suggested crop boundaries for letterboxing or pillarboxing. Inside a single filter chain it only reports; it does not crop anything by itself.
- crop='min(iw,ih)':'min(iw,ih)' cuts a centered square from the shorter side. A plain crop=iw:iw only works for portrait input; on a landscape clip the requested height is larger than the frame and ffmpeg aborts.
- scale=800:800:flags=lanczos scales to exactly 800x800 with a quality-preserving filter
- format=yuv420p forces the pixel format Telegram requires
- libx264 with crf 28 hits a good quality/size balance for under 2 MB
- -t 10 trims to 10 seconds
- -an strips audio
- +faststart moves the moov atom to the front
For most iPhone clips (30fps, 1080p or 4K), this runs in 2-4 seconds on a small VPS.
The tricky part is the cropdetect pass. Without it, portrait shots or screen recordings with vertical bars look wrong at 800x800. Running a short detection pass first and feeding its output back into the filter chain gives a cleaner square crop.
In Python, I call ffmpeg in two passes: one quick probe to detect crop params, then the actual encode:
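import re
import subprocess
from typing import Optional, Tuple

def detect_crop(input_path: str) -> Optional[Tuple[int, int, int, int]]:
    """Probe the first 5 seconds and return cropdetect's (w, h, x, y), if any."""
    result = subprocess.run(
        [
            'ffmpeg', '-t', '5', '-i', input_path,
            '-vf', 'cropdetect=24:16:0',
            '-f', 'null', '-'
        ],
        capture_output=True, text=True
    )
    # cropdetect writes to stderr
    crops = re.findall(r'crop=(\d+):(\d+):(\d+):(\d+)', result.stderr)
    if not crops:
        return None
    w, h, x, y = (int(v) for v in crops[-1])
    return w, h, x, y

def encode_avatar(input_path: str, output_path: str) -> None:
    crop = detect_crop(input_path)
    if crop:
        # Center the largest square inside the detected picture area,
        # keeping the x/y offsets so the bars really are cut off
        w, h, x, y = crop
        side = min(w, h)
        crop_filter = f'crop={side}:{side}:{x + (w - side) // 2}:{y + (h - side) // 2}'
    else:
        # Nothing detected: take a centered square from the full frame
        crop_filter = "crop='min(iw,ih)':'min(iw,ih)'"
    subprocess.run(
        [
            'ffmpeg', '-y', '-i', input_path,
            '-vf', f'{crop_filter},scale=800:800:flags=lanczos,format=yuv420p',
            '-c:v', 'libx264',
            '-crf', '28',
            '-preset', 'fast',
            '-t', '10',
            '-an',
            '-movflags', '+faststart',
            output_path
        ],
        check=True
    )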
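The probe above keeps only the last crop= value that cropdetect printed. One alternative, sketched here as an assumption rather than something the bot does, is to take the value reported most often, which is less sensitive to a single odd frame. The helper name is hypothetical, and the sample log line in the comment is illustrative of cropdetect's usual output rather than copied from a real run.

```python
import re
from collections import Counter

CROP_RE = re.compile(r'crop=(\d+):(\d+):(\d+):(\d+)')


def most_common_crop(stderr: str):
    """Return the (w, h, x, y) that cropdetect reported most often, or None.

    Expects ffmpeg's stderr, where cropdetect lines end in something like
    "... w:1920 h:800 x:0 y:140 ... crop=1920:800:0:140".
    """
    matches = CROP_RE.findall(stderr)
    if not matches:
        return None
    # most_common(1) yields [(value, count)]; ties go to the first value seen
    winner, _count = Counter(matches).most_common(1)[0]
    return tuple(int(v) for v in winner)
```

The result can be dropped into the same crop=w:h:x:y filter string, so swapping strategies only touches the parsing step.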
The aiogram 3 Handler
I use aiogram 3.x for the Telegram bot side. Video uploads come in as a Message with either video or document. Some clients send .mov as an uncompressed document rather than a video, so you need to handle both. A minimal handler:
import asyncio
import os
import tempfile

from aiogram import Router
from aiogram.types import Message, BufferedInputFile

router = Router()

@router.message()
async def handle_video(message: Message):
    # Accept both compressed videos and video files sent as documents
    file_id = None
    if message.video:
        file_id = message.video.file_id
    elif message.document and message.document.mime_type in (
        'video/mp4', 'video/quicktime', 'video/x-matroska'
    ):
        file_id = message.document.file_id
    if not file_id:
        return
    await message.answer('Processing, hold on...')
    with tempfile.TemporaryDirectory() as tmpdir:
        input_path = os.path.join(tmpdir, 'input.mov')
        output_path = os.path.join(tmpdir, 'output.mp4')
        bot_file = await message.bot.get_file(file_id)
        await message.bot.download_file(bot_file.file_path, input_path)
        # ffmpeg blocks, so run it in a worker thread to keep the event loop free
        await asyncio.to_thread(encode_avatar, input_path, output_path)
        with open(output_path, 'rb') as f:
            result_bytes = f.read()
        await message.answer_video(
            BufferedInputFile(result_bytes, filename='avatar.mp4'),
            caption='Ready. Download and set as your Telegram video avatar.'
        )
A few things worth noting:
- iPhone .mov files often arrive as document type when sent without compression. The video/quicktime mime check catches them.
- tempfile.TemporaryDirectory() cleans up automatically, which matters when handling dozens of uploads per day.
- The real bot also validates file size before downloading (rejects anything over 50 MB) and sends a ChatAction.UPLOAD_VIDEO chat action while processing.
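The pre-download validation mentioned in the last note could look roughly like this. It is a sketch with a hypothetical helper name, kept free of aiogram imports so it can be tested on its own; it relies only on the mime_type and file_size fields that Telegram attaches to video and document objects. One caveat worth knowing: the hosted Bot API only lets bots download files up to 20 MB, so a 50 MB cutoff presumably assumes a self-hosted Bot API server.

```python
from typing import Optional

ACCEPTED_MIME_TYPES = {'video/mp4', 'video/quicktime', 'video/x-matroska'}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # the 50 MB cutoff, read here as 50 MiB


def rejection_reason(mime_type: Optional[str],
                     file_size: Optional[int]) -> Optional[str]:
    """Return a user-facing message if the upload should be refused, else None."""
    if mime_type not in ACCEPTED_MIME_TYPES:
        return 'Unsupported file type. Please send an MP4, MOV or MKV video.'
    # file_size is optional in Telegram's objects; only enforce it when present
    if file_size is not None and file_size > MAX_UPLOAD_BYTES:
        return 'That file is over 50 MB. Please send a shorter clip.'
    return None
```

In the handler this would run before get_file, for example with message.document.mime_type and message.document.file_size, replying with the returned text and returning early when it is not None.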
Packaging It as @LiveAvaBot
I wrapped this into @LiveAvaBot, a public Telegram bot anyone can use. Send it a video or GIF, it re-encodes and returns an 800x800 H.264 MP4 ready to set as a video avatar.
The bot runs on a Hetzner CX22, aiogram 3 with long-polling, ffmpeg 6. About 155 users so far, with a handful of conversions per day.
GIF support came for free. ffmpeg handles animated GIFs the same way as video, without audio to strip. Input format doesn't matter much as long as ffmpeg can read it.
One thing I didn't expect: some users send .mp4 files that look perfectly fine but still fail as Telegram avatars. Usually it's the pixel format (yuv420p10le from a phone that shot in HDR mode) or a non-square aspect ratio with no obvious letterboxing. The pipeline handles both cases without any special-casing.
Edge Cases and What's Next
4K input. Encoding 4K source is slow on a small VPS. I added a quick ffprobe check and reject anything above 3840px with a friendly message. Most users don't need 4K source for an 800x800 output anyway.
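A dimension check along these lines is one way to do it. This is a sketch under assumptions: the function names are illustrative, and "above 3840px" is interpreted as the longer side exceeding 3840, so portrait and landscape clips are treated the same.

```python
import json
import subprocess
from typing import Tuple

MAX_DIMENSION = 3840  # cutoff from the text, applied to the longer side


def parse_dimensions(ffprobe_json: str) -> Tuple[int, int]:
    """Extract (width, height) of the first stream from ffprobe's JSON output."""
    stream = json.loads(ffprobe_json)['streams'][0]
    return int(stream['width']), int(stream['height'])


def video_dimensions(path: str) -> Tuple[int, int]:
    """Ask ffprobe for the coded size of the first video stream."""
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'stream=width,height', '-of', 'json', path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dimensions(out)


def is_too_large(width: int, height: int, limit: int = MAX_DIMENSION) -> bool:
    """True when the longer side exceeds the limit."""
    return max(width, height) > limit
```

Because ffprobe reports the stored frame size, a rotated portrait clip may come back as width > height; comparing the longer side avoids having to read the rotation metadata.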
Corrupt files. ffmpeg exits non-zero on truly corrupt input. The subprocess call is wrapped in try/except and sends an error message back to the user rather than crashing the handler.
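The wrapping can be as small as the sketch below. The helper name, the messages and the 120-second timeout are assumptions for illustration; the only behavior taken from the text is that a non-zero ffmpeg exit becomes a reply to the user instead of an unhandled exception.

```python
import subprocess
from typing import List, Optional


def run_encode(cmd: List[str], timeout: float = 120.0) -> Optional[str]:
    """Run an encode command; return None on success or a user-facing message."""
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True,
                       timeout=timeout)
    except subprocess.CalledProcessError:
        # Non-zero exit: ffmpeg could not decode or encode the input
        return 'Could not read that file. It may be corrupt or not a video.'
    except subprocess.TimeoutExpired:
        return 'That took too long to process. Try a shorter clip.'
    except FileNotFoundError:
        # The ffmpeg binary itself is missing on this machine
        return 'The converter is not available right now.'
    return None
```

The handler can then send the returned string with message.answer when it is not None, and only read the output file on success.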
Duration over 10s. The -t 10 flag handles trimming silently. I added a note in the reply caption when the source was longer than 10 seconds so users know the clip got cut.
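Building that caption is a one-branch function. In this sketch the wording of the extra note is invented, and the source duration is assumed to come from either the duration field Telegram attaches to video messages or an ffprobe call.

```python
MAX_SECONDS = 10.0
BASE_CAPTION = 'Ready. Download and set as your Telegram video avatar.'


def build_caption(source_seconds: float) -> str:
    """Append a trimming note when the source ran past the 10-second limit."""
    if source_seconds > MAX_SECONDS:
        return (f'{BASE_CAPTION} Your clip was {source_seconds:.1f}s long, '
                'so only the first 10 seconds were kept.')
    return BASE_CAPTION
```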
Output file size. CRF 28 lands under 2 MB for 10s at 800x800 in the vast majority of cases. For high-motion clips I added a fallback second encode at CRF 32. Haven't seen a failure after that.
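The fallback can be written as a loop over CRF values that stops at the first output under the limit. This is a sketch, not the bot's code: the helper name is hypothetical, "2 MB" is taken as 2 * 1024 * 1024 bytes, and the encode callable is assumed to accept a CRF and overwrite the output file (the encode_avatar shown earlier hardcodes CRF 28, so it would need a crf parameter to be used this way).

```python
import os
from typing import Callable, Iterable

MAX_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" read as 2 MiB


def encode_under_limit(encode: Callable[[int], None], output_path: str,
                       crfs: Iterable[int] = (28, 32),
                       max_bytes: int = MAX_BYTES) -> int:
    """Try each CRF in order and return the first whose output fits the limit.

    Raises ValueError if every attempt is still too large.
    """
    for crf in crfs:
        encode(crf)  # expected to (over)write output_path
        if os.path.getsize(output_path) <= max_bytes:
            return crf
    raise ValueError('no CRF value produced a file under the size limit')
```

Higher CRF means stronger compression, so the values should be listed from best quality to smallest file; the returned CRF is handy for logging how often the fallback actually fires.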
What's coming next: a trimming UI so users can pick which 10-second window to use instead of always taking the beginning of the clip.
Built by me. @LiveAvaBot is open to use.













