ProblemNetflix has a number of high throughput, low latency mid tier services. In one of these services, it was observed that in case there is a huge surge in traffic in a very short span of time, the machines became cpu starved and would become unresponsive. This would lead...
الثلاثاء، 28 يوليو 2015
الجمعة، 24 يوليو 2015
Java in Flames

Java mixed-mode flame graphs provide a complete visualization of CPU usage and have just been made possible by a new JDK option: -XX:+PreserveFramePointer. We've been developing these at Netflix for everyday Java performance analysis as they can identify all CPU consumers and...
الثلاثاء، 14 يوليو 2015
Tracking down the Villains: Outlier Detection at Netflix
It’s 2 a.m. and half of our reliability team is online searching for the root cause of why Netflix streaming isn’t working. None of our systems are obviously broken, but something is amiss and we’re not seeing it. After an hour of searching we realize there is one rogue server...