dbenv->txn_checkpoint never finishes on the HA master

Dear BDB gurus,

I am testing my HA setup with code based on the bdb 4.6.21 API. I start a pthread dedicated to checkpoints, which takes one every minute. The checkpoint thread function is as below:

    for (;; sleep(WRITE_CHECKPOINT_INTERVAL)) {
        /* Force a checkpoint even if nothing was written since the last one. */
        if ((ret = dbenv->txn_checkpoint(dbenv, 0, 0, DB_FORCE)) != 0) {
            trace(TRACE_LEVEL_INFO, "%s: write checkpoint failed for %s.\n", __FUNCTION__, db_strerror(ret));
        }
        trace(TRACE_LEVEL_INFO, "%s: write checkpoint %s.\n", __FUNCTION__, db_strerror(ret));

        /* Remove log files that are no longer needed. */
        ret = dbenv->log_archive(dbenv, NULL, DB_ARCH_REMOVE);
        if (ret) {
            trace(TRACE_LEVEL_INFO, "%s: log_archive failed to remove logs %s.\n", __FUNCTION__, db_strerror(ret));
        }
    }

When I run my code without the DB_INIT_REP flag on the env, the checkpoint and the automatic log removal both work, and the trace reports "write checkpoint Successful return: 0".

But when I run my code with DB_INIT_REP set, with a single master and a single replica, the txn_checkpoint call seems never to return. The gdb backtrace of the checkpoint thread is below:

#0  0x00007fa59f92d033 in select () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fa59ff128c5 in __os_sleep () from /usr/local/BerkeleyDB.4.6/lib/libdb-4.6.so
#2  0x00007fa59ff1bd94 in __txn_checkpoint () from /usr/local/BerkeleyDB.4.6/lib/libdb-4.6.so
#3  0x00007fa59ff1c0f0 in __txn_checkpoint_pp () from /usr/local/BerkeleyDB.4.6/lib/libdb-4.6.so
#4  0x0000000000404288 in xxx (arg=0xc0d010) at xxx.c:1776
#5  0x00007fa59fc06e9a in start_thread (arg=0x7fa59bda4700) at pthread_create.c:308
#6  0x00007fa59f933ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

How can I proceed?

Thanks, Min

I assume your checkpoint thread is running on your master.

The call to __os_sleep() is expected during a checkpoint on the master. Checkpoints can take a long time. When a master performs a checkpoint, the log record for that checkpoint is sent to the client, and the client must also perform a checkpoint. The client may have different hardware or I/O characteristics that make its checkpoint take much longer than the master's. So we wait for some extra time after a master checkpoint to allow the client's checkpoint to complete. This improves the client's ability to keep up with the master.

You can control how much extra time we wait after the master checkpoint with the DB_REP_CHECKPOINT_DELAY timeout, which you can set with a call to DB_ENV->rep_set_timeout(). The default value for DB_REP_CHECKPOINT_DELAY is quite long: 30 seconds. If you are taking checkpoints every minute, you probably want to reduce this timeout value.
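As a rough sketch of what that call could look like: rep_set_timeout() takes its timeout in microseconds, so a small conversion helper is useful. The helper names secs_to_usecs and set_checkpoint_delay and the USE_BDB build flag are made up for this illustration and are not part of Berkeley DB. The 5 seconds in the usage note is an arbitrary example, not a recommendation.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_BDB          /* hypothetical build flag: define it when db.h and libdb are available */
#include <db.h>
#endif

/* Convert whole seconds to the microsecond units rep_set_timeout() expects. */
static uint32_t
secs_to_usecs(uint32_t secs)
{
    /* Guard against overflowing the 32-bit timeout argument. */
    assert(secs <= UINT32_MAX / 1000000u);
    return secs * 1000000u;
}

#ifdef USE_BDB
/*
 * Shorten the extra wait the master adds after its own checkpoint.
 * Returns 0 on success or a Berkeley DB error code.
 */
static int
set_checkpoint_delay(DB_ENV *dbenv, uint32_t secs)
{
    return dbenv->rep_set_timeout(dbenv, DB_REP_CHECKPOINT_DELAY,
        secs_to_usecs(secs));
}
#endif
```

For example, set_checkpoint_delay(dbenv, 5) on the master would cut the wait to 5 seconds. Check the return value and report it with db_strerror() as you do for your other calls. A value that is too short gives a slow client less time to finish its own checkpoint, so pick it with your client's I/O speed in mind.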

Paula Bingham

Oracle
