Buildbot performance and scaling
It seems like it was ages ago when I posted about profiling buildbot. One of the hot spots identified there was the dataReceived call. This has been sped up a little bit in recent versions of twisted, but our buildbot masters were still severely overloaded. It turns out that the buildbot slaves make a lot of RPC calls when sending log data, which results in tens of thousands of dataReceived calls. Multiply that by several dozen build slaves sending log data peaking at a combined throughput of 10 megabits/s and you've got an awful lot of data to handle. By adding a small slave-side buffer, the number of RPC calls to send log data is drastically reduced by an order of magnitude in some tests, resulting in a much better load situation on the master. This is good for us, because it means the masters are much more responsive, and it's good for everybody else because it means we have fewer failures and wasted time due to the master being too busy to handle everything. It also means we can throw more build slaves onto the masters! The new code was deployed towards the end of the day on the 26th, or the end of the 12th week.
Comments