Backup of client failing


Backup of client failing

by shoaib r

Hi
 
Does anyone have any idea why my client box doesn't get backed up? amcheck comes back fine, but the backups do not complete.
 
The tape server communicates OK with the client via inetd.
 
If I specify just the filesystems on the tape server, the backups complete successfully within 7 minutes.
 
But when I include the filesystems of the client (client01), the estimates time out. I've played around with all the timeout settings in amanda.conf with no progress at all.
 
planner: FAILED client01 /u01 20051018 0 [disk /u01, all estimate timed out]
planner: FAILED client01 /opt 20051018 0 [disk /opt, all estimate timed out]
planner: FAILED client01 /var 20051018 0 [disk /var, all estimate timed out]
planner: FAILED client01 /usr 20051018 0 [disk /usr, all estimate timed out]
planner: FAILED client01 / 20051018 0 [disk /, all estimate timed out]
Any ideas?
 
Thanks
 
Shabs



Re: Backup of client failing

by Paul Bijnens

shoaib r wrote:
> planner: FAILED client01 / 20051018 0 [disk /, all estimate timed out]
> Any ideas?

There is probably a hint in the debug files on the client in
/tmp/amanda/*.debug .
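
As a rough illustration of how one might look for such a hint, the sketch below scans the debug files for lines containing "ERROR". The function name find_error_lines is made up for this example, and the "ERROR" marker is an assumption based on the log excerpts later in this thread; real debug files may report problems in other ways too.

```python
import glob


def find_error_lines(pattern="/tmp/amanda/*.debug"):
    """Return (path, line_number, text) for every line containing 'ERROR'.

    The default pattern is the debug location mentioned above; the helper
    itself is hypothetical and not part of Amanda.
    """
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with open(path, errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if "ERROR" in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits
```

Calling find_error_lines() with no argument scans the default location; each hit can then be read in context in the file it came from.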



--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens@...
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, ^^, *
* F6, quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* init 0, kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************



Re: Backup of client failing

by Paul Bijnens

Please keep the discussion on the list: there may be other
people with tips for a solution, and it may help other
people with the same problem.


shoaib r wrote:

> The debug logs gave me this as the only error:
>  
> /tmp/amanda/amandad.20051021010122.debug
> ------------------------------------------------
> Amanda 2.4 NAK HANDLE 000-000344C0 SEQ 1129852801
> ERROR amandad busy
> ----
> amandad: time 3615.713: got packet:
> ----
> Amanda 2.4 REQ HANDLE 000-00034708 SEQ 1129852802
> ------------------------------------------------

The "amandad busy" means that the server sent a request while
the client was still busy processing another request.

That could happen when you have more than one Amanda server and
the amandad on the client is already busy with another server.
Make sure a client has all its disklist entries on only one server.

Or you could have a stuck amandad client, still handling some request
from the previous test.  Kill all the amanda processes on the client,
and try again.
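
To check whether the client is still refusing requests after such a restart, one could count the "amandad busy" replies in a debug file. The following is only a sketch: the helper name busy_naks is hypothetical, and the pattern assumes the NAK packets look like the excerpt quoted above (a "NAK HANDLE ... SEQ ..." line followed by "ERROR amandad busy").

```python
import re

# Matches a NAK header line followed by the "amandad busy" error line,
# as seen in the quoted debug excerpt. Other layouts are not handled.
NAK_BUSY_RE = re.compile(
    r"Amanda \S+ NAK HANDLE (\S+) SEQ (\d+)\s+ERROR amandad busy"
)


def busy_naks(debug_text):
    """Return a list of (handle, sequence_number) for each busy NAK found."""
    return [(m.group(1), int(m.group(2))) for m in NAK_BUSY_RE.finditer(debug_text)]
```

An empty list for a fresh run suggests that no request was refused as busy in that file; a growing list points back at the situations described above.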




Re: Backup of client failing

by shoaib r

Apologies for not keeping my last posting on the list (unintentional: I just hit reply rather than selecting reply-all from the drop-down list).
 
The client does not have a disklist file and I'm pretty sure it's not a server for anything else.
 
Your second assumption, a stuck amandad process on the client, was correct. I killed the amanda processes and tried again, this time including only one filesystem from the tape server and one from the client; that completed fine, with no problems. But when I include all the filesystems on the client machine, I get the same error all over again:
 
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
amandad: time 4199.964: received other packet, NAKing it
  addr: peer 51.108.14.49 dup 51.108.14.49, port: peer 824 dup 841
amandad: time 4199.964: sending nack:
----
Amanda 2.4 NAK HANDLE 000-000344C0 SEQ 1129883130
ERROR amandad busy
----
amandad: time 4215.041: got packet:
----
Amanda 2.4 REQ HANDLE 000-00034708 SEQ 1129883131
SECURITY USER amanda
SERVICE sendbackup
OPTIONS features=fffffeff9ffe7f;hostname=ptpreq01;
GNUTAR /export  0 1970:1:1:0:0:0 OPTIONS |;bsd-auth;no-record;index;exclude-list=/usr/local/lib/amanda/exclude.gtar;
----
amandad: time 4215.041: received other packet, NAKing it
  addr: peer 51.108.14.49 dup 51.108.14.49, port: peer 824 dup 841
amandad: time 4215.041: sending nack:
----
Amanda 2.4 NAK HANDLE 000-00034708 SEQ 1129883131
ERROR amandad busy
----
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------





Re: Backup of client failing

by Paul Bijnens

shoaib r wrote:

>  
> Your 2nd assumption of a stuck amandad process on the client was
> correct. I've killed the amanda processes and tried again except this
> time I've included only one filesystem from the tape server and one from
> the client server - that completed fine. No problems. But when I include
> all the filesystems on the client machine, I get the same error all
> over again:
>  
> ----------------------------------------------------------------------------------
> ----------------------------------------------------------------------------------
> amandad: time 4199.964: received other packet, NAKing it
>   addr: peer 51.108.14.49 dup 51.108.14.49, port: peer 824 dup 841
> amandad: time 4199.964: sending nack:
> ----
> Amanda 2.4 NAK HANDLE 000-000344C0 SEQ 1129883130
> ERROR amandad busy


Why does the server send a request while the client believes it is
still handling a previous one?  There are also duplicate packets.
Is there an ACK that got lost?
Is there a large number of disks for that host?

Maybe the reply UDP packet somehow did not reach the server completely,
perhaps because you hit a limit or a bug in large UDP packet handling.
The packet that is sent to the server is printed in the same amandad
debug file.
Using tcpdump/snoop/ethereal you can watch the traffic on port 10080 on
both the client and the server, and compare the traces to verify that
what is sent by the client is also received by the server.

What OS is the client? The server?  I've even heard about network devices
in between that mangled large UDP packets beyond recognition.

You may compare an amandad.debug file from a successful run with one
from a failed run and see where it breaks.
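
One way such a comparison could be sketched is shown below. Timestamps, handles and sequence numbers differ on every run, so this example replaces them with placeholders before diffing. The helper names (normalize, diff_debug) and the list of "volatile" fields are assumptions drawn from the excerpts in this thread, not anything defined by Amanda.

```python
import difflib
import re

# Fields that change from run to run in the quoted excerpts (assumed list).
_VOLATILE = [
    (re.compile(r"time \d+\.\d+"), "time T"),
    (re.compile(r"HANDLE \S+"), "HANDLE H"),
    (re.compile(r"SEQ \d+"), "SEQ N"),
]


def normalize(line):
    """Replace run-specific values with fixed placeholders."""
    for pattern, placeholder in _VOLATILE:
        line = pattern.sub(placeholder, line)
    return line


def diff_debug(good_lines, bad_lines):
    """Return a unified diff (list of strings) of two normalized debug logs."""
    good = [normalize(line.rstrip("\n")) for line in good_lines]
    bad = [normalize(line.rstrip("\n")) for line in bad_lines]
    return list(difflib.unified_diff(good, bad, "good", "bad", lineterm=""))
```

The arguments can be any iterables of lines, for example open file objects for the two debug files; the first differing hunk is usually the place to start reading.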

