The Linux OOM Killer Mechanism

Source: 技術員聯盟 ┆ Published: 2018-07-05 18:18

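Recently one VPS customer complained that MySQL kept dying for no apparent reason, and another complained that their VPS frequently hung. After logging in to a terminal, both turned out to be the familiar out-of-memory problem. It usually happens when applications request a large amount of memory at some moment and the system runs short, which triggers the Out of Memory (OOM) killer in the Linux kernel. The OOM killer kills a process to free memory for the system, so that the system does not crash immediately. If you check the relevant log file (/var/log/messages), you will see Out of memory: Kill process messages like the following: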

    ...
    Out of memory: Kill process 9682 (mysqld) score 9 or sacrifice child
    Killed process 9682, UID 27, (mysqld) total-vm:47388kB, anon-rss:3744kB, file-rss:80kB
    httpd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
    httpd cpuset=/ mems_allowed=0
    Pid: 8911, comm: httpd Not tainted 2.6.32-279.1.1.el6.i686 #1
    ...
    21556 total pagecache pages
    21049 pages in swap cache
    Swap cache stats: add 12819103, delete 12798054, find 3188096/4634617
    Free swap = 0kB
    Total swap = 524280kB
    131071 pages RAM
    0 pages HighMem
    3673 pages reserved
    67960 pages shared
    124940 pages non-shared
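The memory figures in this log can be turned into sizes with a little arithmetic. The sketch below assumes a 4 kB page size (the usual value on i686, but not stated in the log itself), and the function names are made up for illustration.

```c
#include <stdio.h>

/* Assumption: 4 kB pages, the usual size on i686; the log does not state it. */
#define PAGE_SIZE_KB 4UL

/* Convert a page count from the OOM report into kB. */
unsigned long pages_to_kb(unsigned long pages)
{
    return pages * PAGE_SIZE_KB;
}

/* Percentage of swap still free; returns 0 when there is no swap at all. */
unsigned long swap_free_percent(unsigned long free_kb, unsigned long total_kb)
{
    if (total_kb == 0)
        return 0;
    return free_kb * 100UL / total_kb;
}

/* Print a short summary of the figures quoted in the log above. */
void print_log_summary(void)
{
    unsigned long ram_kb = pages_to_kb(131071UL);  /* "131071 pages RAM" */
    unsigned long swap_total_kb = 524280UL;        /* "Total swap = 524280kB" */
    unsigned long swap_free_kb = 0UL;              /* "Free swap = 0kB" */

    printf("RAM: %lu kB, swap: %lu kB, swap free: %lu%%\n",
           ram_kb, swap_total_kb,
           swap_free_percent(swap_free_kb, swap_total_kb));
}
```

Under that assumption the machine has 524284 kB of RAM (about 512 MB) and its swap is completely used up, which fits a system that has nothing left to hand out.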

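The Linux kernel allocates memory at the request of applications. Applications typically allocate memory without actually using all of it, and for the sake of performance that unused portion could be put to other use. Because this memory belongs to individual processes, reclaiming it directly would be awkward, so the kernel instead over-commits memory to make indirect use of this "idle" memory and improve overall memory utilization. This normally works fine, but trouble starts when most applications actually use up the memory they were promised: their combined demand exceeds the capacity of physical memory plus swap, and the kernel (the OOM killer) must kill some processes to free space and keep the system running. A bank analogy may make this easier to understand. When only some customers withdraw money, the bank is not worried, because it has enough cash on hand. When the whole country (or the vast majority) withdraws at once, and everyone wants to take out everything, the bank is in trouble, because it does not actually hold that much money.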
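The bank analogy can be put in numbers. The following is a minimal sketch, not kernel code: the helper names and the per-process demand figures are hypothetical, and it only checks whether the promised memory could still be backed if every process used all of it.

```c
#include <stddef.h>

/*
 * Hypothetical helper: sum what each process would need, in kB,
 * if it touched every page it was promised.
 */
unsigned long total_demand_kb(const unsigned long *demand_kb, size_t n)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += demand_kb[i];
    return sum;
}

/* Returns 1 if the combined demand can no longer be backed by RAM plus swap. */
int is_overcommitted(const unsigned long *demand_kb, size_t n,
                     unsigned long ram_kb, unsigned long swap_kb)
{
    return total_demand_kb(demand_kb, n) > ram_kb + swap_kb;
}
```

For example, three processes promised 400000 kB, 400000 kB and 300000 kB cannot all be satisfied by 524284 kB of RAM and 524280 kB of swap, while the first two alone can.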

How the kernel detects a memory shortage, then picks and kills a process, can be followed in the kernel source file linux/mm/oom_kill.c. When the system runs out of memory, out_of_memory() is triggered and calls select_bad_process() to choose a "bad" process to kill. How is a "bad" process judged and chosen? Surely not at random. The choice is made by oom_badness(), and both the algorithm and the idea behind it are simple and plain: the "baddest" process is the one that uses the most memory.

    /**
     * oom_badness - heuristic function to determine which candidate task to kill
     * @p: task struct of which task we should calculate
     * @totalpages: total present RAM allowed for page allocation
     *
     * The heuristic for determining which task to kill is made to be as simple and
     * predictable as possible. The goal is to return the highest value for the
     * task consuming the most memory to avoid subsequent oom failures.
     */
    unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
                              const nodemask_t *nodemask, unsigned long totalpages)
    {
        long points;
        long adj;

        if (oom_unkillable_task(p, memcg, nodemask))
            return 0;

        p = find_lock_task_mm(p);
        if (!p)
            return 0;

        adj = (long)p->signal->oom_score_adj;
        if (adj == OOM_SCORE_ADJ_MIN) {
            task_unlock(p);
            return 0;
        }

        /*
         * The baseline for the badness score is the proportion of RAM that each
         * task's rss, pagetable and swap space use.
         */
        points = get_mm_rss(p->mm) + p->mm->nr_ptes +
                 get_mm_counter(p->mm, MM_SWAPENTS);
        task_unlock(p);

        /*
         * Root processes get 3% bonus, just like the __vm_enough_memory()
         * implementation used by LSMs.
         */
        if (has_capability_noaudit(p, CAP_SYS_ADMIN))
            adj -= 30;

        /* Normalize to oom_score_adj units */
        adj *= totalpages / 1000;
        points += adj;

        /*
         * Never return 0 for an eligible task regardless of the root bonus and
         * oom_score_adj (oom_score_adj can't be OOM_SCORE_ADJ_MIN here).
         */
        return points > 0 ? points : 1;
    }
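To experiment with the scoring outside the kernel, the same arithmetic can be mirrored in userspace. This is a simplified sketch under stated assumptions: struct task_sample, badness_sketch() and pick_victim() are hypothetical names, the fields stand in for values the kernel reads from the task, and the selection loop ignores the extra checks that the real select_bad_process() performs.

```c
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical userspace snapshot of the fields oom_badness() reads. */
struct task_sample {
    const char *comm;       /* process name, for display only */
    long rss_pages;         /* resident pages */
    long pagetable_pages;   /* page-table pages */
    long swap_pages;        /* swap entries */
    long oom_score_adj;     /* -1000..1000 */
    int has_sys_admin;      /* stands in for the CAP_SYS_ADMIN check */
};

/* Mirrors the arithmetic of oom_badness() shown above. */
unsigned long badness_sketch(const struct task_sample *t,
                             unsigned long totalpages)
{
    long adj = t->oom_score_adj;
    long points;

    if (adj == OOM_SCORE_ADJ_MIN)
        return 0;                       /* exempt from OOM killing */

    points = t->rss_pages + t->pagetable_pages + t->swap_pages;

    if (t->has_sys_admin)
        adj -= 30;                      /* 3% bonus for root processes */

    adj *= (long)(totalpages / 1000);   /* normalize to oom_score_adj units */
    points += adj;

    return points > 0 ? (unsigned long)points : 1;
}

/* Return the sample with the highest score, or NULL if none is eligible. */
const struct task_sample *pick_victim(const struct task_sample *tasks,
                                      size_t n, unsigned long totalpages)
{
    const struct task_sample *victim = NULL;
    unsigned long best = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        unsigned long points = badness_sketch(&tasks[i], totalpages);

        if (points > best) {
            best = points;
            victim = &tasks[i];
        }
    }
    return victim;
}
```

With made-up figures in which mysqld holds far more resident and swapped pages than httpd, pick_victim() returns the mysqld entry. Setting that entry's oom_score_adj to -1000 makes its score 0, so the next largest task is chosen instead.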

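The comments in the code above are clear. Once you understand this algorithm, you understand why MySQL gets hit even though it did nothing wrong: it is usually the largest process (it generally uses the most memory on the system), so when an Out of Memory (OOM) condition occurs it tends to be the first one killed. The simplest fix is to add memory, or to tune MySQL so that it uses less. Besides tuning MySQL, you can also tune the system (optimizing Debian 5, optimizing CentOS 5.x) so that it uses as little memory as possible and leaves more for applications such as MySQL. A temporary workaround is to adjust kernel parameters so that the MySQL process is less likely to be picked by the OOM killer.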
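One way to apply the last workaround is to lower the oom_score_adj value that the oom_badness() code above reads. The sketch below assumes a kernel that exposes /proc/<pid>/oom_score_adj (older kernels only offer /proc/<pid>/oom_adj, with a different range), and the helper names are hypothetical. Lowering the value normally requires root privileges.

```c
#include <stdio.h>

#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Build "/proc/<pid>/oom_score_adj" in buf; 0 on success, -1 if buf is too small. */
int oom_score_adj_path(char *buf, size_t len, long pid)
{
    int n = snprintf(buf, len, "/proc/%ld/oom_score_adj", pid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write value to the process's oom_score_adj; 0 on success, -1 on failure. */
int set_oom_score_adj(long pid, int value)
{
    char path[64];
    FILE *f;
    int ok;

    /* Reject values outside the range the kernel accepts. */
    if (value < OOM_SCORE_ADJ_MIN || value > OOM_SCORE_ADJ_MAX)
        return -1;
    if (oom_score_adj_path(path, sizeof path, pid) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;                      /* no such process or no permission */

    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;                         /* the write may fail at flush time */
    return ok ? 0 : -1;
}
```

Calling set_oom_score_adj(pid, -1000) with the PID of mysqld would exempt it from OOM killing entirely, while a milder negative value only lowers its score. The setting belongs to the running process, so it has to be applied again after MySQL restarts.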
