Topic: What's the right way to recover from a failed client connection?

tastewar

  • Jr. Member
  • **
  • Posts: 50
I have a client application that makes many, many HTTP connections. It works fine, except when it doesn't :-)

My code looks something like:

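  // Open a TCP connection to the web server on port 80 (HTTP);
  // this code treats a return value of 1 as success.
  if ( wifi.connect ( ServerName, 80 ) == 1 )
  {
    // blah, blah, blah... (send the request, read and parse the response)
  }
  else
  {
    Serial.println ( "Failed to connect :-(" );
  }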
And of course, that's in a function that gets called from the main loop. It seems that once I see this message, it's game over: that's the only output I see from that point on. And while my code should keep re-attempting to connect, after a dozen or so of those attempts, even that stops.

It might initially run for 10 minutes, or half an hour, or two hours. But it does seem to eventually get into that mode.

Is there something I need to do, code-wise, to recover from such a failure?

meldrath

  • Newbie
  • *
  • Posts: 21
« Reply #1 on: March 03, 2014, 12:29:27 am »
If you put a timer in there, you can flag when you get stuck, and have it skip that client, retry it, or something else. Basically a watchdog timer.
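A watchdog in this sense can be a few lines of software. The sketch below is only an illustration, assuming the caller passes in the current time from millis(); `SoftWatchdog` and its methods are hypothetical names, not part of the DigiFi library. It only flags a stall; it does not reset the board the way a hardware watchdog would.

```cpp
#include <cstdint>

// Software watchdog sketch (hypothetical helper, not part of DigiFi).
// The caller supplies the current time in ms; on the board that would be
// millis(). It only reports a stall; it does not reset the MCU.
class SoftWatchdog {
 public:
  explicit SoftWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

  // Call when the watched operation starts or makes progress
  // (e.g. connect() returned, or a byte was read).
  void feed(uint32_t nowMs) {
    lastFedMs_ = nowMs;
    armed_ = true;
  }

  // Call once the operation has finished or been abandoned.
  void disarm() { armed_ = false; }

  // True if armed and there has been no progress for at least timeoutMs.
  // Unsigned subtraction stays correct when millis() wraps around.
  bool expired(uint32_t nowMs) const {
    return armed_ && static_cast<uint32_t>(nowMs - lastFedMs_) >= timeoutMs_;
  }

 private:
  uint32_t timeoutMs_;
  uint32_t lastFedMs_ = 0;
  bool armed_ = false;
};
```

In loop(), you would call feed() whenever a byte arrives and check expired(millis()) on each pass; when it fires, close the connection (for example with stop()) and skip or retry that client.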

tastewar

« Reply #2 on: March 03, 2014, 05:01:10 am »
There are just two sites it reads from, with a handful of XML pages to digest. It ought to just periodically retry, and it does for a while. There's nothing in my code to tell it to stop retrying, but the output stops, suggesting the program is stuck. On what, I'm not sure (the joys of troubleshooting Arduino in general...).

When it fails, I can bring up the URL on my computer just fine, so there oughtn't to be a problem. It's something on the DigiX that's failing.

tastewar

« Reply #3 on: March 03, 2014, 09:24:49 am »
So, I had read that power from USB might not be sufficient, so I plugged in an external power supply, and it still got into the bad state after about 10 minutes. So it appears that it's not a power issue in this case. I also made one other potentially interesting observation: it seems to have had exactly 10 failed connections before apparently hanging on the 11th attempt. That almost suggests there's some resource that needs to be freed after a failed connect attempt, and I'm not freeing it...
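The "10 failures, then a hang" pattern is what you would expect if each connect attempt claimed an entry in a fixed-size table that a failed attempt never gives back. The toy model below only illustrates that hypothesis; the 10-slot limit and the `SlotTable` class are assumptions chosen for illustration and say nothing about how the WiFi module or DigiFi actually manage connections.

```cpp
#include <array>

// Hypothetical toy model, NOT the DigiFi implementation: a fixed table of
// connection slots. Each connect attempt grabs a slot; if the caller never
// releases it after a failure, the table eventually runs out.
class SlotTable {
 public:
  static constexpr int kSlots = 10;  // assumed limit, for illustration only

  // Returns a slot index, or -1 if every slot is in use.
  int acquire() {
    for (int i = 0; i < kSlots; ++i) {
      if (!used_[i]) {
        used_[i] = true;
        return i;
      }
    }
    return -1;
  }

  // Frees a slot; invalid indices (such as -1) are ignored.
  void release(int slot) {
    if (slot >= 0 && slot < kSlots) used_[slot] = false;
  }

 private:
  std::array<bool, kSlots> used_{};  // all false initially
};

// Simulates `failures` failed connects, then one more attempt.
// Returns true if that final attempt could still obtain a slot.
bool nextAttemptGetsSlot(int failures, bool releaseOnFailure) {
  SlotTable table;
  for (int i = 0; i < failures; ++i) {
    int slot = table.acquire();                 // the attempt takes a slot...
    if (releaseOnFailure) table.release(slot);  // ...and maybe gives it back
  }
  return table.acquire() != -1;
}
```

If something like this were going on, the fix would be whatever library call returns the resource after a failed connect, rather than anything in the retry logic itself.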

meldrath

« Reply #4 on: March 03, 2014, 10:20:26 am »
Are you creating a new DigiFi object when you retry, or anything like that?

tastewar

« Reply #5 on: March 03, 2014, 10:22:21 am »
No; currently just retrying. If there's something that needs to be done when the "connect" method fails, I'd love to know!

meldrath

« Reply #6 on: March 03, 2014, 10:36:22 am »
So what I think is happening is... you're right. Resources aren't getting recycled when it hangs in that particular part of the client connect process.

This connect method has two places where it checks for timeouts. You might want to add a timeout before those, inside any "if" block that evaluates to true. It's probably getting stuck in there, or having a hard time getting out.
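As a sketch of what "add a timeout" can look like, the helper below bounds any polling loop. It assumes `now` returns milliseconds (millis() on the board) and that `ready` is whatever condition the library loop is waiting for, such as the module having replied; both are placeholders, since the actual loops in DigiFi.cpp aren't shown here.

```cpp
#include <cstdint>

// Bounded wait sketch: polls ready() until it returns true or timeoutMs
// elapses, instead of looping forever. now() must return milliseconds
// (millis() on the board). Both callables are placeholders for whatever
// the library loop is actually waiting on.
template <typename ReadyFn, typename NowFn>
bool waitFor(ReadyFn ready, NowFn now, uint32_t timeoutMs) {
  const uint32_t start = now();
  while (!ready()) {
    // Wraparound-safe elapsed-time check.
    if (static_cast<uint32_t>(now() - start) >= timeoutMs) {
      return false;  // timed out: let the caller clean up and report failure
    }
    // On the board, yield() or delay(1) could go here.
  }
  return true;  // condition met before the deadline
}
```

A bare `while (!something) {}` inside connect() could then become a call to waitFor() that makes connect() return its failure value, so a silent module produces a failed connect instead of a hang.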

I don't know enough about how the WiFi works to guess where to implement this.

You can also add Serial prints at various places inside the method to see exactly where in that code it's hanging. (You'll have to edit DigiFi.cpp for this.)

tastewar

« Reply #7 on: March 03, 2014, 12:14:11 pm »
So, just to update, the next time it got stuck, there had been 11 timeouts. So it appears not to be a *simple* issue.

I certainly could add debug output to the DigiFi code; I was hoping someone from Digistump might chime in with some knowledge...

digistump

  • Administrator
  • Hero Member
  • *****
  • Posts: 1465
« Reply #8 on: March 12, 2014, 03:00:30 pm »
Can you try this with the chunked transfer mode example? It might help ensure the connections get closed, since my bet is that the maximum connection limit is being hit.

tastewar

« Reply #9 on: March 12, 2014, 07:01:33 pm »
Which one is that? None of the examples I see say "Chunked" in their name.

If you're asking me to rewrite my client application, that would be challenging, as the XML parsing library I rely on heavily is designed to process one byte at a time.

Is there some other call I can include when I'm done with a connection to ensure that it gets cleaned up?

BTW, I haven't seen the timeout issue much lately. The more recent problem has been lock-ups. I thought I had a handle on those with the watchdog timer, but now I seem to be having a problem with that :-(

digistump

« Reply #10 on: March 15, 2014, 12:12:11 am »
My mistake: the git repo hadn't been pushed to master. Now it is on the GitHub repo.

I'm not asking you to rewrite your application; my concern is finding solutions for the hardware and libraries. I can't guarantee it will work as-is either, since I've never used or worked with your application. But if chunked mode makes it work, then we'll know it is breaking because too many connections are left unclosed on the client side, which would help narrow it down. While the DigiX is 100% Due compatible, the Due doesn't have a WiFi module, and the WiFi module, like every other WiFi module out there, has its own quirks and is still relatively new.

tastewar

« Reply #11 on: March 15, 2014, 07:35:29 pm »
I think we might be talking past each other a bit. I looked at the chunked example, and it's a server-side application. My device is acting as a client: when I speak of a "failed client connection", I mean the outgoing connection the device is making. But things are in a much happier place right now. I had been treating some cases as "errors" that probably weren't. Since the documents I'm retrieving are XML and thus have a structure, I can tell when I've reached the end, or at least the end of the data I care about. Since adding that check, I reset a lot less frequently.
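Because the parser consumes one byte at a time, an end-of-document check can run on the same byte stream. The sketch below watches for a marker string, assuming the pages end with a known closing tag; `</feed>` in the test is only an example, since the actual documents aren't named in the thread, and `StreamMatcher` is a hypothetical helper. It uses KMP matching so the result stays correct even for markers whose prefix repeats, where a naive reset-on-mismatch matcher could miss a match.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Streaming marker detector (sketch): feed it the same bytes the XML parser
// sees, one at a time; feed() returns true when the marker has just been
// completed. KMP means a failed partial match never hides a real one.
class StreamMatcher {
 public:
  explicit StreamMatcher(std::string marker)
      : pat_(std::move(marker)), fail_(pat_.size(), 0) {
    // fail_[i] = length of the longest proper prefix of pat_[0..i]
    // that is also a suffix of it (the KMP failure table).
    for (std::size_t i = 1, k = 0; i < pat_.size(); ++i) {
      while (k > 0 && pat_[i] != pat_[k]) k = fail_[k - 1];
      if (pat_[i] == pat_[k]) ++k;
      fail_[i] = k;
    }
  }

  bool feed(char c) {
    if (pat_.empty()) return false;  // nothing to look for
    while (matched_ > 0 && c != pat_[matched_]) matched_ = fail_[matched_ - 1];
    if (c == pat_[matched_]) ++matched_;
    if (matched_ == pat_.size()) {
      matched_ = fail_[matched_ - 1];  // keep going in case it appears again
      return true;
    }
    return false;
  }

  void reset() { matched_ = 0; }  // call before each new document

 private:
  std::string pat_;
  std::vector<std::size_t> fail_;
  std::size_t matched_ = 0;
};
```

In the read loop, each byte handed to the parser would also go to feed(); when it returns true, the document is complete and the connection can be closed cleanly instead of waiting for a timeout.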

digistump

« Reply #12 on: March 16, 2014, 02:20:40 am »
That was a major disconnect on my part: most issues have been on the server side, which is why I assumed that.

With the DigiX acting as a client, you could try the TCP OFF command followed by TCP ON. Sorry, I'm not at a computer at the moment to look up the function in the library, but it is easy to find in the source. This would reset the TCP connection, in case the problem is related to open connections or similar.
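The TCP OFF / TCP ON reset fits naturally into an escalation policy: ordinary retries first, and a TCP reset only after several consecutive failures. The sketch assumes a `resetHook` that performs TCP OFF and then TCP ON; the library function that does that isn't named in the thread, so it is left as a callback. `ConnectionRecovery` is a hypothetical name and the threshold is arbitrary.

```cpp
#include <functional>
#include <utility>

// Escalating recovery sketch: plain retries first, then a TCP reset after
// resetAfter consecutive failed connects. resetHook is a placeholder for
// whatever library call issues TCP OFF followed by TCP ON.
class ConnectionRecovery {
 public:
  ConnectionRecovery(int resetAfter, std::function<void()> resetHook)
      : resetAfter_(resetAfter), resetHook_(std::move(resetHook)) {}

  // Report the outcome of each connect attempt.
  void onResult(bool connected) {
    if (connected) {
      consecutiveFailures_ = 0;
      return;
    }
    if (++consecutiveFailures_ >= resetAfter_) {
      resetHook_();  // e.g. TCP OFF, then TCP ON
      ++resets_;
      consecutiveFailures_ = 0;  // start counting again after the reset
    }
  }

  int resets() const { return resets_; }

 private:
  int resetAfter_;
  std::function<void()> resetHook_;
  int consecutiveFailures_ = 0;
  int resets_ = 0;
};
```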

If you haven't already, you could also try enabling flow control and bumping up the baud rate. Make sure to change it both in your wifi.begin statement and on the module through the web admin. That will ensure the problem isn't due to data lost between the module and the MCU.


tastewar

« Reply #13 on: March 16, 2014, 10:05:33 am »
I would love to switch to a faster speed and flow control. Is it true that all that's required is changing the call to begin and making the corresponding change on the WiFi module via the web interface?

I am currently using the "stop" function when I encounter anything that appears to indicate the connection is no longer healthy, and that calls TCP OFF, although it looks like "connect" does this at the beginning anyway.
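Calling stop() on every unhealthy path is easy to get wrong once a function has several early exits. In C++ a small guard object can make it automatic. This is a sketch: `StopOnExit` and `readDocument` are hypothetical helpers, and the guard assumes the client object has a stop() member like the one described in this post.

```cpp
// Hypothetical RAII guard: whichever way the scope is left (success,
// parse error, early return), stop() is called on the client exactly once.
template <typename Client>
class StopOnExit {
 public:
  explicit StopOnExit(Client& client) : client_(client) {}
  ~StopOnExit() { client_.stop(); }
  StopOnExit(const StopOnExit&) = delete;
  StopOnExit& operator=(const StopOnExit&) = delete;

 private:
  Client& client_;
};

// Illustrative use: parseOk stands in for "the response looked healthy".
template <typename Client>
bool readDocument(Client& client, bool parseOk) {
  StopOnExit<Client> guard(client);  // created right after a successful connect
  if (!parseOk) return false;        // early exit still stops the connection
  return true;
}
```

On the board, the guard would be created right after connect() succeeds, with the wifi object as its argument.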

I'm going to add some statistics to my project, so I can later query how many times I've had to reset WiFi, etc.

tastewar

« Reply #14 on: March 16, 2014, 12:48:58 pm »
I'm now using 57600 baud with flow control. Is there a suggested, or tested, limit on how fast one should set this interface?
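The thread doesn't give a tested upper limit, but the arithmetic for what a given baud rate can carry is simple. Assuming 8N1 framing (1 start bit, 8 data bits, 1 stop bit), each byte costs 10 bit times, so 57600 baud tops out around 5760 bytes per second before any protocol overhead. The helpers below are just that arithmetic; the function names are hypothetical.

```cpp
#include <cstdint>

// With 8N1 framing each byte costs 10 bit times, so the raw ceiling is
// baud / 10 bytes per second. Real throughput will be lower once module
// overhead and flow-control pauses are included. baud must be > 0.
constexpr uint32_t maxBytesPerSecond8N1(uint32_t baud) { return baud / 10; }

// Minimum time in ms to move `bytes` over the link, rounded up.
constexpr uint32_t transferTimeMs8N1(uint32_t bytes, uint32_t baud) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(bytes) * 10u * 1000u + baud - 1) / baud);
}
```

For a hypothetical 20 KB page, that works out to at least roughly 3.6 s at 57600 baud versus roughly 1.8 s at 115200, which is why a faster link with flow control can noticeably shorten each fetch.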