Hello Team!
Today I want to share an important lesson we discovered in production:
Default timeouts matter! Especially when you’re working with RabbitMQ and long‑running consumers.
What’s the default timeout in RabbitMQ?
✅ By default, RabbitMQ expects the consumer to acknowledge a delivered message within 30 minutes.
✅ If no ACK is received in that window, RabbitMQ assumes:
“This consumer might have failed!”
…closes the consumer’s channel with a PRECONDITION_FAILED error, and re‑queues the message so it can be delivered to another consumer.
⚡ Why did this become a problem for us?
Our consumers trigger shell scripts that sometimes run for 2–3 hours.
🛠️ In development, our test scripts were short (20–25 minutes), so we never hit the 30‑minute limit.
But in production:
- After 30 minutes, RabbitMQ re‑delivered the same message.
- That led to multiple consumers processing the same job.
- A single 1‑hour job started running 5 to 10 times in parallel! 😱
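To see how one job can fan out like this, here is a rough Python sketch of the redelivery timeline. It is a simplified model, not measured behavior: it assumes the message is redelivered every time the timeout expires and that a free consumer is always available to pick it up. The function name and parameters are made up for illustration.

```python
def redelivery_start_times(job_minutes, timeout_minutes, window_minutes):
    """Return the minute marks at which a new copy of the job starts.

    Simplified model (an assumption): the broker redelivers the message
    each time the ack timeout expires, and a free consumer always picks
    it up right away.
    """
    if job_minutes <= timeout_minutes:
        # The job finishes and acks within the timeout, so it runs once.
        return [0]
    # Every attempt outlives the timeout, so no attempt acks in time and
    # a new copy starts at each timeout boundary inside the window.
    return list(range(0, window_minutes, timeout_minutes))


# A 1-hour job, 30-minute timeout, observed for 3 hours.
print(redelivery_start_times(60, 30, 180))  # [0, 30, 60, 90, 120, 150]
```

Under these assumptions, a job that is longer than the timeout never gets acknowledged in time, so copies keep starting until someone steps in. A 25‑minute dev script, by contrast, only ever starts once.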
Root Cause Analysis (RCA)
✔️ RabbitMQ was working as designed, but our jobs simply ran longer than its default acknowledgment timeout.
✅ The Fix
We raised the message acknowledgment timeout from 30 minutes to 24 hours.
✅ This comfortably covers our actual processing time of 2–3 hours.
✅ Now, each job runs exactly once, with no more duplicate executions.
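For anyone who wants to try a similar change: one place this timeout can be set is the `consumer_timeout` key in `rabbitmq.conf`, which takes a value in milliseconds. The small Python sketch below only does the unit conversion and prints the config line. The helper name is hypothetical, and it is an assumption that the broker-wide setting is the right place for your setup.

```python
MS_PER_MINUTE = 60 * 1000


def consumer_timeout_line(minutes):
    """Render a rabbitmq.conf line for the ack timeout (value in milliseconds)."""
    return f"consumer_timeout = {minutes * MS_PER_MINUTE}"


# The default of 30 minutes, and the 24-hour value from this post.
print(consumer_timeout_line(30))       # consumer_timeout = 1800000
print(consumer_timeout_line(24 * 60))  # consumer_timeout = 86400000
```

Getting the unit wrong is an easy mistake here, so it is worth double-checking that the number in the config really is milliseconds before restarting the broker.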
✨ Key Takeaway
Always review and configure default timeouts when integrating systems, especially if you have long‑running consumers or processes.
A small setting can make a huge difference in production stability.