Server Hung in Blocked Checkpoint

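Problem description:
When a checkpoint request is triggered on the server and the server detects an abnormal condition, it blocks in the CKPT REQ state to keep the data from becoming inconsistent.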

onstat -
IBM Informix Dynamic Server Version 9.40.UC3 -- On-line (CKPT REQ) -- Up 00:03:37 -- 141312 Kbytes
Blocked:CKPT
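
As a rough illustration, the check on the `onstat -` banner can be automated. The sketch below is only an assumption about how one might scan that output: the function name `checkpoint_blocked` is made up for this example, and the two sample strings are the status lines quoted in this post.

```python
import re

def checkpoint_blocked(onstat_text: str) -> bool:
    """Return True if the text contains the (CKPT REQ) or Blocked:CKPT markers."""
    return bool(re.search(r"\(CKPT REQ\)|Blocked:\s*CKPT", onstat_text))

# Status lines copied from this post, not captured live.
blocked_sample = (
    "IBM Informix Dynamic Server Version 9.40.UC3 -- On-line (CKPT REQ) "
    "-- Up 00:03:37 -- 141312 Kbytes\n"
    "Blocked:CKPT\n"
)
healthy_sample = (
    "Informix Dynamic Server 2000 Version 9.21.UC2     -- On-Line "
    "-- Up 03:58:34 -- 27084 Kbytes\n"
)

print(checkpoint_blocked(blocked_sample))  # True
print(checkpoint_blocked(healthy_sample))  # False
```

In practice the text would come from running `onstat -` and capturing its standard output.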

Solution:
Many conditions can cause the server to block on a checkpoint. The most common causes and their fixes are:

1) The logical logs are full
Run 'onstat -l' to check the status of the logical logs.
Check online.log for a "logical log full" message.
Fix:
Back up the logical logs with ontape or onbar.

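2) A dbspace or chunk is marked down:

Run 'onstat -d' and check the flags column for each dbspace and chunk to see whether any is flagged PD.
Also check the value of the ONDBSPACEDOWN parameter in the onconfig file and the messages in online.log.
Fix:
Check that the affected chunks have the correct paths and permissions, and look for errors reported by the underlying storage.
If ONDBSPACEDOWN is set to 2, run 'onmode -O'.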

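3) A transaction is in a critical section:
Run 'onstat -u' and check the flags column for a session that is in a critical section: the fifth flag position is X.
Fix:
Use the 'onstat -g ses ' output to confirm which SQL statement is running.
You can kill the session with onmode -z sid.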
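
The flag test in cause 3 is easy to get wrong by eye, so here is a minimal sketch of it. It assumes the flags value from 'onstat -u' has already been extracted as a string. The helper name `in_critical_section` and the session ids and flag strings in `threads` are hypothetical values made up for illustration.

```python
def in_critical_section(flags: str) -> bool:
    """True if the fifth flag position (index 4) is X."""
    return len(flags) >= 5 and flags[4] == "X"

# Hypothetical (session id, flags) pairs, not real onstat -u output.
threads = [(23, "Y--P---"), (41, "---PX--"), (57, "L--PR--")]

critical = [sid for sid, flags in threads if in_critical_section(flags)]
print(critical)  # [41]
```

Only the fifth position is tested, so an X in any other position does not count as a critical section here.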

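In the end I found the fix on a foreign website (my case was the first one: the logical logs were full).

I carried out steps 4, 5 and 7 of the procedure quoted further down. Afterwards bar_act.log read: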

2009-12-26 16:34:52 8174  8172 /home/informix/bin/onbar_d -b -l
2009-12-26 16:34:52 8174  8172 Logical Logs will not be backed up because LTAPEDEV value is /dev/null
2009-12-26 16:34:52 8174  8172 A log backup is already running. Can't start another.
2009-12-26 16:34:52 8174  8172 /home/informix/bin/onbar_d complete, returning 152 (0x98)

online.log

16:35:39  Logical Log 98 - Backup Completed
16:35:39  Logical Log 99 - Backup Started
16:35:39  Logical Log 99 - Backup Completed
16:35:39  Logical Log 100 - Backup Started
16:35:40  Logical Log 100 - Backup Completed
16:36:16  Logical Log 101 - Backup Started
16:36:16  Logical Log 101 - Backup Completed
16:36:17  Logical Log 101 Complete.
16:36:19  Process exited with return code 152: /bin/sh /bin/sh -c /home/informix/etc/log_full.sh 2 23 "Logical Log 101 Complete." "Logical Log 101 Complete."

The database status also changed to:

$ onstat -
Informix Dynamic Server 2000 Version 9.21.UC2     -- On-Line -- Up 03:58:34 -- 27084 Kbytes
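
To confirm which logical logs were actually released, the online.log excerpt above can be scanned for "Backup Completed" messages. This is a small sketch, assuming the message format shown in that excerpt. `backed_up_logs` is a name invented here, and `ONLINE_LOG` holds the lines copied from this post.

```python
import re

# Lines copied from the online.log excerpt in this post.
ONLINE_LOG = """\
16:35:39  Logical Log 98 - Backup Completed
16:35:39  Logical Log 99 - Backup Started
16:35:39  Logical Log 99 - Backup Completed
16:35:39  Logical Log 100 - Backup Started
16:35:40  Logical Log 100 - Backup Completed
16:36:16  Logical Log 101 - Backup Started
16:36:16  Logical Log 101 - Backup Completed
16:36:17  Logical Log 101 Complete.
"""

def backed_up_logs(text: str) -> list[int]:
    """Return the log numbers that have a 'Backup Completed' message, in order."""
    return [int(n) for n in re.findall(r"Logical Log (\d+) - Backup Completed", text)]

print(backed_up_logs(ONLINE_LOG))  # [98, 99, 100, 101]
```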

Dev : Informix's Logical Log Files are Full -- Backup is Needed

Under certain conditions, such as a large amount of data being rolled back, the Informix database server tends to use up all the logical log space assigned to it.

If one day you find that your Informix server is not functioning properly, why not check the logical log files first? I spent a whole day standing in the server room learning this lesson. This is how I managed to resolve it:

1. If the logical log files are full, you'll see this error in Informix's log file: Logical Log Files are Full -- Backup is Needed. You will not be able to query certain tables and you may also run into table locks often.

2. Type onstat -l to list the logical log files you have. If this error is the cause, you will see most, if not all, of the logical logs at 100% used.

3. Type onmode -l to switch to the next logical log file, so that the current one can be backed up and freed.

4. If the logical log space is completely full, onmode -l cannot be used; it will not succeed. Go to %informix%/etc/onconfig.std (the %ONCONFIG file) and change the LTAPEDEV value to /dev/null.

5. Run ontape -a to back up the logical logs, and follow the prompts.

6. Then run onmode -l again to advance through the logical log files one by one.

7. Change LTAPEDEV back to its original value (/dev/tapedev).

8. Restart the database.
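
Steps 4 and 7 both come down to editing one line of the onconfig file. The sketch below shows one way that edit could be scripted. It assumes the file uses plain "NAME value" lines. The function `set_onconfig_param` and the two-line `sample` fragment are made up for illustration and are not part of Informix. Keep in mind that, as the bar_act.log above says, logical logs are not backed up while LTAPEDEV is /dev/null.

```python
import re

def set_onconfig_param(config_text: str, name: str, value: str) -> str:
    """Return config_text with the first active 'name value' line set to value."""
    # Match "NAME <value>" at the start of a line; commented-out lines are skipped.
    pattern = re.compile(rf"^([ \t]*{re.escape(name)}[ \t]+)\S+", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text, count=1)
    if count == 0:
        raise KeyError(f"{name} not found in config text")
    return new_text

# Hypothetical onconfig fragment; /dev/tapedev is the value named in step 7.
sample = "# logical log backup device\nLTAPEDEV  /dev/tapedev\n"

patched = set_onconfig_param(sample, "LTAPEDEV", "/dev/null")
restored = set_onconfig_param(patched, "LTAPEDEV", "/dev/tapedev")
print(patched)
print(restored == sample)  # True
```

Working on the text rather than the file keeps the sketch side-effect free. A real script would read the file named by ONCONFIG, keep a backup copy, and write the result back.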

Posted by flyinweb on 2009-12-26 13:05:16; last edited by flyinweb on 2009-12-26 16:51:54.