27 October 2011

While testing a public AMI, I tracked down a serious problem along this route:

  1. After a period of intense traffic, Varnish began serving 503 errors.
  2. The 503s turned out to be a backend error that could be resolved by restarting Apache.
  3. The Apache log showed no errors.
  4. Grepping for apache in /var/log/messages turned up "apache2 invoked oom-killer".
  5. The oom-killer kills processes when the system runs out of memory. It was apparently choosing Apache in some cases.
  6. Further research indicated that the public AMI was built without swap space. Seriously. No swap space.
  7. Rather than adding a swap partition, I simply added a swap file so I could resume my testing.
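
Steps 4 through 6 can be scripted for the next time this comes up. What follows is only a rough sketch in Python, not what I ran at the time: the function names `find_oom_invokers` and `swap_total_kb` are made up for illustration, and the sample log line is invented apart from the "apache2 invoked oom-killer" text quoted above. It assumes the kernel log uses that "invoked oom-killer" wording and that swap is reported on the `SwapTotal:` line of /proc/meminfo.

```python
import re

# Matches kernel log text such as "apache2 invoked oom-killer".
# Assumption: the process name is the token right before "invoked".
OOM_PATTERN = re.compile(r"(\S+) invoked oom-killer")


def find_oom_invokers(log_lines):
    """Return the names of processes that invoked the oom-killer, in log order.

    The invoker is the process whose allocation triggered the oom-killer,
    which is not necessarily the process that ended up being killed.
    """
    invokers = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            invokers.append(match.group(1))
    return invokers


def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from /proc/meminfo-style text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like "SwapTotal:       0 kB"; the number is field 1.
            return int(line.split()[1])
    return None


# Hypothetical sample input, for illustration only.
sample_log = [
    "kernel: apache2 invoked oom-killer",
    "kernel: some unrelated message",
]
sample_meminfo = "MemTotal: 1000 kB\nSwapTotal: 0 kB\n"

invokers = find_oom_invokers(sample_log)
swap_kb = swap_total_kb(sample_meminfo)
if invokers and swap_kb == 0:
    print("oom-killer invoked by", invokers, "and no swap is configured")
```

On a real machine the same functions could be fed the lines of /var/log/messages and the contents of /proc/meminfo. The sketch only covers the diagnosis; the swap file itself is normally created with the standard `dd`, `mkswap`, and `swapon` tools.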
