Topic: Hardware deteriorating? No -- wifi reset wasn't working. (Read 7908 times)

tastewar

  • Jr. Member
  • Posts: 50
Hardware deteriorating? No -- wifi reset wasn't working.
« on: March 13, 2014, 05:36:38 am »
I don't understand what's going on. My DigiX used to connect to WiFi with no problem, but now I have to reset it or power cycle it up to a dozen times or more before it does. It's sitting in the same location, and the AP hasn't moved either. I can't think of anything in the environment that has changed. And in the past day, even if I am lucky enough to have it connect, it seems to lock up within a few connections.

The new info I received from JeffRand will definitely help -- I suspect I won't be manually resetting it so much, but it's just not as stable as it was earlier. Seriously frustrated at this point!
« Last Edit: March 14, 2014, 06:19:29 pm by tastewar »

tastewar

  • Jr. Member
  • Posts: 50
Re: Hardware deteriorating? Or what?!
« Reply #1 on: March 13, 2014, 08:31:26 pm »
I am so glad to have gotten a reply from JeffRand on the Watchdog Timer -- it was such a relief to understand what was going on with that particular issue. I've now updated my project so that the WDT is enabled, and it is working as expected! The sad thing is that, given my current experience with the DigiX, what I expected was for it to do very little useful work, get hung up, and spend most of its time rebooting and attempting to connect to wifi. I just don't get it. It's great to have enthusiastic customers here, but I am really surprised that there aren't more helpful posts by the creators (is it just Erik, or are there more?). I love the idea of the DigiX, and I continue to believe in its potential, but I think there has to be more of an effort on the support front for it to catch on.

gogol

  • Sr. Member
  • Posts: 398
Re: Hardware deteriorating? Or what?!
« Reply #2 on: March 14, 2014, 04:20:26 am »
What do you expect?
You open several threads about the same issue, but never explain your whole setup and what exactly you are trying to get done.

I can't believe that a watchdog reset can help you with wifi problems. I told you in one of your other threads that you can reset the WiFi module itself using pin 106.
A watchdog only makes sense if the whole application gets stuck.

Is that the case? If yes, please post the source and some of your debugging results showing where you believe the program hangs.

For connection problems it would also be helpful to understand your application and connections. There are many reasons why things can get stuck. One example is that you open too many parallel connections.

The DigiX is an open-source product; it lives from its community. Erik is the creator and coordinator and does an outstanding job! If you wish to use products with an included support plan, you need to go elsewhere (and pay 100 times more).

Opening new threads will not help. Describe your problem in one thread, describe what you are trying and why you decided to try it. Report results!
Look at this thread for example: http://digistump.com/board/index.php/topic,1277.0.html


tastewar

  • Jr. Member
  • Posts: 50
Re: Hardware deteriorating? Or what?!
« Reply #3 on: March 14, 2014, 05:27:19 am »
Thanks for taking the time to respond, gogol.

Different people have different ideas about how to use a forum. Sometimes people complain when a thread talks about too many different issues. I've looked back at the threads I've started, and those I've contributed to, and I still feel that they are more valuable organized that way:

Is there a way to do a "hard" reset in software? -- seems like something someone might also be curious about
What's the right way to recover from a failed client connection? -- again, this seems like a potentially interesting topic to others, but different from the above
Help with WatchDog? -- this thread uncovered a useful nugget, at least from my perspective. Maybe it's common knowledge elsewhere that you can only set the WDT mode once, but it was news to me.

Certainly from *my* perspective, they are all part of the longer story of the development of my particular application, but I personally don't think that a single long thread per person, documenting all the issues they ran into while building their particular device, is better from the perspective of someone browsing the forum.

I have coded the Pin 106 wifi reset. I made the solder jumper mod. My code calls my reset function when it detects a network error. My problem is, the code can't get to the wifi reset function when it is hung. I was certainly hoping that the wifi reset would be a viable solution, but I am no longer getting the timeouts I was. I now get hung, and often it's just very early in setup when waiting for wifi -- it appears to get hung in wifi.ready. Here's my setup function:

Code:
#define WiFiResetPin 106
#define LEDPin 13

void setup ( )
{
#if defined DEBUG
  WDT_Disable(WDT);
#endif
  unsigned long wdp_ms = 2048;  // WDT ticks, not ms: the counter runs at 32768/128 = 256 Hz, so 2048 is about 8 s
  unsigned long    wst;
  pinMode ( WiFiResetPin, OUTPUT );
  pinMode ( LEDPin, OUTPUT );
  digitalWrite ( WiFiResetPin, HIGH );
  digitalWrite ( LEDPin, LOW );
  ResetWiFi ( );
  DebugOutLn ( "Starting setup..." );
  // 0x2000 = WDRSTEN (reset on timeout); WDV goes in bits 0-11, WDD in bits 16-27
  WDT_Enable( WDT, 0x2000 | wdp_ms | ( wdp_ms << 16 ));
  wst = millis ( );
  do
  {
    if ( millis ( ) - wst > 5000 )
    {
      DebugOutLn ( "Error connecting to WiFi network" );
      ResetWiFi ( );
      ImStillAlive ( );
      wst = millis ( );
    }
    delay ( 10 );
  } while ( wifi.ready ( ) != 1 );
 
  DebugOutLn ( "Connected to wifi!" );
}

And here's ResetWiFi:
Code:
void ResetWiFi ( void )
{
  int i;
  // requires SJ4 to be soldered closed (non-default) and SJ3 to be cut open (default)
  DebugOutLn ( "* * * * Resetting WiFi module! * * * *" );
  digitalWrite ( WiFiResetPin, LOW );
  delay ( 15 );
  digitalWrite ( WiFiResetPin, HIGH );
  wifi.begin ( 9600 );
  for ( i=0; i<100; i++ )
  {
    ImStillAlive ( );
    delay ( 20 );
  }
}
And ImStillAlive:
Code:
void ImStillAlive ( )
{
  static bool ledon=false;
  WDT_Restart ( WDT );
  ledon = !ledon;
  digitalWrite ( LEDPin, ledon ? HIGH : LOW );
}
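A note on the WDT_Enable() call in setup() above: despite its name, wdp_ms is not a millisecond count. The host-side sketch below shows how that mode word is put together, assuming the SAM3X watchdog layout (counter clocked at 32768/128 = 256 Hz, WDV in bits 0-11, WDRSTEN at bit 13, WDD in bits 16-27). The helper names are hypothetical; on the board the resulting value is what gets passed to WDT_Enable(), which can only be done once after reset.

```cpp
#include <cassert>
#include <cstdint>

// Assumed SAM3X watchdog parameters (see lead-in): SLCK / 128 = 256 ticks per second.
constexpr uint32_t kWdtTicksPerSecond = 32768 / 128;  // 256
constexpr uint32_t kWdtMaxTicks = 0xFFF;             // WDV/WDD are 12-bit fields (~16 s)
constexpr uint32_t kWdtRstEn = 0x2000;               // bit 13: reset the chip on underflow

// Convert a timeout in whole seconds to watchdog ticks, clamped to the field width.
uint32_t wdtTicksForSeconds(uint32_t seconds) {
  uint32_t ticks = seconds * kWdtTicksPerSecond;
  return ticks > kWdtMaxTicks ? kWdtMaxTicks : ticks;
}

// Build the mode word in the same shape as the setup() code:
// RSTEN | WDV | (WDD << 16). Setting WDD equal to WDV disables the restart
// window, so WDT_Restart() is accepted at any time.
uint32_t wdtModeWord(uint32_t seconds) {
  uint32_t ticks = wdtTicksForSeconds(seconds);
  return kWdtRstEn | ticks | (ticks << 16);
}
```

With 2048 ticks this works out to 8 s, which matches the 8-second watchdog mentioned later in the thread.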

I could explain what my code is doing later on in the loop function, but I rarely get past setup. I will see the "Starting Setup" string, but only occasionally "Connected to wifi" and never "Error connecting to WiFi network" so I'm inclined to believe that it's hung in a call to wifi.ready. In general terms, it's a client application, retrieving a web page. There are 2 different sites, and 3 different URLs altogether, but I only retrieve one at a time. There should only be a single connection open, and if I get an error on a connection I would call the reset function.

And my real concern here is that this part used to work pretty reliably. It only occasionally failed to connect. And as I noted elsewhere, moving to an external power supply didn't help (it is always on external power now, and I've tried a couple of different adapters).

So I would consider trying Erik's suggestion of chunking, if it can fit with the library I'm using, but nobody has followed up to give me a pointer to what sample that is. I follow the github repo, but none of the examples use the word "chunked" in their name or description. And more importantly, I'm not getting past setup 90% of the time or more.

JeffRand

  • Newbie
  • Posts: 44
Re: Hardware deteriorating? Or what?!
« Reply #4 on: March 14, 2014, 08:18:57 am »
For what it is worth, I agree that the call to wifi.ready() is getting hung. During the testing I was doing on the watchdog timer, it would occasionally hang when my WiFi router was turned off and I was in a loop polling wifi.ready(). I noticed it would mostly happen when I had the router off and then turned it on. So, I could see this happening if the WiFi signal was weak or had some interference.

I haven't looked at the library for DigiFi yet, but I suspect there is something that is either getting caught in an infinite loop or has a very long time out. If I get a chance this weekend, I'll take a look and see if I can see anything obvious. 

gogol

  • Sr. Member
  • Posts: 398
Re: Hardware deteriorating? Or what?!
« Reply #5 on: March 14, 2014, 09:45:55 am »
Unfortunately the example is still incomplete and can't be compiled.

Includes are missing, the function DebugOutLn is missing, and the wifi initialization is missing.

The following is a stripped down example of one of my programs, which I adapted to your flow, but still without any reset/watchdog etc.

That is working without any problems!

Start from there:

Code:
#include <DigiFi.h>

DigiFi wifi;

void setup ( )
{
  wifi.setDebug(true);
  wifi.begin(9600,false);
 
  Serial.begin(9600);
  while(!Serial.available()){
    Serial.println("Enter any key to begin");
    delay(1000);
  }
  Serial.println("Welcome to the digiX!");
  unsigned long wdp_ms = 2048;
  unsigned long    wst;

  Serial.println ( "Starting setup..." );
  wst = millis ();
  do { 
    Serial.println ( "wait for wifi-ready ..." );
    if ( (millis () - wst) > 5000 ) {
      Serial.println ( "Error connecting to WiFi network" );
      wst = millis ();
    }
    delay ( 500 );
  } while ( wifi.ready() != 1 );
 
  Serial.println ( "Connected to wifi!" );
}

void loop() {
  Serial.println ( "inside LOOP" );
  delay(1000);

}


It's good practice to strip your code down to something that at least compiles without errors before asking for support.

« Last Edit: March 14, 2014, 09:47:28 am by gogol »

tastewar

  • Jr. Member
  • Posts: 50
Re: Hardware deteriorating? Or what?!
« Reply #6 on: March 14, 2014, 12:51:14 pm »
OK, this has all been helpful.

I've been taking gogol's example code and mine and whittling down the differences. Mine was failing, and after adding the wifi debug output, I was seeing:

Code:
Enter any key to begin
Enter any key to begin
* * * * Resetting WiFi module! * * * *
Starting setup...
start at mode
next
wait for a

In gogol's, the next output was "clear buffer"

Turns out, it's the wifi reset function that's causing a problem. When it was first mentioned as a potential solution, I asked for some model code, but none was ever supplied.

The code I'm using is posted in this thread -- does anyone know what important step I'm missing? I can say empirically that putting a 50 ms delay after setting the pin back HIGH didn't help, but adding a 500 ms delay has worked twice now.

Is there model code somewhere for a proper wifi reset?
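For reference, here is one way such a reset could be structured, written as a host-testable C++ sketch rather than as part of the DigiFi library. ResetHooks and resetModule are hypothetical: the hooks stand in for digitalWrite(106, ...), delay(), and some check that the module is talking again. The 15 ms pulse and the 500 ms settle window are simply the values reported in this thread, not datasheet figures.

```cpp
#include <cassert>
#include <functional>

// Hypothetical hardware hooks so the sequence can be exercised off-board.
struct ResetHooks {
  std::function<void(bool high)> setResetPin;  // e.g. digitalWrite(106, high ? HIGH : LOW)
  std::function<void(unsigned ms)> delayMs;     // e.g. delay(ms)
  std::function<bool()> moduleResponds;        // placeholder readiness probe
};

// Pulse the (active-low) reset line, release it, then poll for up to
// maxSettleMs before anything else talks to the module. Returns false if
// the module never answered, so the caller can decide what to do next.
bool resetModule(const ResetHooks& hw, unsigned pulseMs = 15,
                 unsigned maxSettleMs = 500, unsigned pollMs = 10) {
  hw.setResetPin(false);  // assert reset
  hw.delayMs(pulseMs);
  hw.setResetPin(true);   // release reset
  for (unsigned waited = 0; waited < maxSettleMs; waited += pollMs) {
    hw.delayMs(pollMs);
    if (hw.moduleResponds()) return true;
  }
  return false;
}
```

If there is no reliable probe, the simplest fallback is the fixed 500 ms wait that worked here; polling only helps if the module gives some sign of life before that.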

JeffRand

  • Newbie
  • Posts: 44
Re: Hardware deteriorating? Or what?!
« Reply #7 on: March 14, 2014, 05:56:17 pm »
Now I can see why the hangs are occurring. The "wait for a" message is right when the startATMode routine sends +++ to the modem. It then goes into a loop waiting for the response. If it doesn't get a response from the modem, it will loop forever. Here's the code from the DigiFi library:
Code:
    Serial1.write("+++");
    debug("wait for a");
    while(!Serial1.available()){delay(1);}

The while loop should have some kind of time limit. If it hits the limit, the +++ should be sent again. If the modem still doesn't respond, it should fail and pass the failure back to the ready() routine so that it can report that the wifi is not ready. I'll see if I can come up with some code to fix the hang condition.
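The bounded retry described above can be sketched like this. Port and FakeModule are hypothetical stand-ins for Serial1 and delay() so the logic can be exercised on a PC; the 250 ms per-attempt window and the attempt count are illustrative, not values taken from the DigiFi library.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the parts of Serial1 / delay() that startATMode uses.
struct Port {
  virtual bool available() = 0;
  virtual void write(const char* s) = 0;
  virtual void delayMs(unsigned ms) = 0;
  virtual ~Port() = default;
};

// Send "+++", wait up to timeoutMs for any reply, resend, and give up after
// maxAttempts so the caller can report "not ready" instead of hanging.
bool sendEscapeWithRetry(Port& port, unsigned timeoutMs, unsigned maxAttempts) {
  for (unsigned attempt = 0; attempt < maxAttempts; ++attempt) {
    port.write("+++");
    for (unsigned waited = 0; waited < timeoutMs; ++waited) {
      if (port.available()) return true;  // module answered
      port.delayMs(1);
    }
  }
  return false;  // no reply at all
}

// Mock module that ignores the first `answerAfter` escape sequences.
struct FakeModule : Port {
  unsigned answerAfter;
  unsigned escapes = 0;
  unsigned elapsedMs = 0;
  explicit FakeModule(unsigned n) : answerAfter(n) {}
  bool available() override { return escapes > answerAfter; }
  void write(const char* s) override { if (std::string(s) == "+++") ++escapes; }
  void delayMs(unsigned ms) override { elapsedMs += ms; }
};
```

Returning false gives ready() something to report, rather than leaving the program spinning inside startATMode.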
« Last Edit: March 14, 2014, 05:58:30 pm by JeffRand »

tastewar

  • Jr. Member
  • Posts: 50
Re: Hardware deteriorating? Or what?!
« Reply #8 on: March 14, 2014, 06:06:18 pm »
Brilliant! Thank you, Jeff. Empirically, waiting appears to be a workable solution, but I know when I'm looking at other people's code (I am a "regular" programmer by day), whenever I see code that requires "Sleep(x)" to make it work, someone doesn't *really* understand the problem. Another guy I worked with had the habit of doubling buffer sizes if he ran into problems that were clearly buffer overflow problems. Didn't fix the problem; usually reduced the duty cycle pretty dramatically though.

digistump

  • Administrator
  • Hero Member
  • Posts: 1465
Re: Hardware deteriorating? No -- wifi reset wasn't working.
« Reply #9 on: March 15, 2014, 04:04:27 am »
Quote
I just don't get it. It's great to have enthusiastic customers here, but I am really surprised that there aren't more helpful posts by the creators (is it just Erik, or are there more?).

Just me officially - and lots of great people who support me and Digistump. I apologize that my posts have been more sparse lately - generally I respond to about 99% of posts that someone else doesn't get to first - that's about 800 (hopefully helpful) responses in the last year. I try to respond daily and usually do but sometimes life, my day job, or the next product gets in the way of that - I usually then catch up on weekends which I am doing now.


Quote
I love the idea of the DigiX, and I continue to believe in its potential, but I think there has to be more of an effort on the support front for it to catch on.

I'm glad you like it!

Your newest round of questions are all under 4 days old - so it was simply a matter of not getting to them, or anyone else's yet in the second half of the week. This is pretty on par with the other OSHW companies - as gogol points out - we are all driven by community, and even though the big OSHW companies have large support staffs compared to us, last I checked several days is par for general help on new products even for them. And just like them our libraries aren't perfect from the start - we need people like you to find bugs like you have, and then fix them, help us fix them, or have others fix them and contribute the fixes (via github) so that everyone can benefit. I'm happy to help - and sorry you got frustrated - but also want to make sure expectations are clear. I don't think any OSHW electronics company is advertising turn key solutions - if there is any doubt about that check out how long the average issue is open on the Arduino issue trackers. If you have something that needs professional attention and immediate turnaround I offer contracted support as well.

We also have a support@digistump.com email - granted if you email us there with a general troubleshooting question we'll ask you to go to the forums, and then watch for your post so we can help and have the community help too - but especially when you felt the device was malfunctioning that would have been the quickest way to contact us, we would have troubleshooted with you, and then if it was a hardware issue, replaced the board. We don't feature it prominently though - because then we'd never have time to participate here.

Quote
Empirically, waiting appears to be a workable solution, but I know when I'm looking at other people's code (I am a "regular" programmer by day), whenever I see code that requires "Sleep(x)" to make it work, someone doesn't *really* understand the problem.

I'm a web developer (well CTO but always a developer still too) by day and I'd agree with that statement in that context - but in hardware it certainly doesn't mean the problem isn't understood. A GPS unit has a fixed hot or cold start time - you might have to delay (or do a non-blocking delay) for that time - until it is ready. WiFi is similar. The endless loop is bad programming, yes it should time out - but there are plenty of reasons to have delays in embedded code - often the datasheets call for it directly.
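The non-blocking delay mentioned here can be illustrated with a small deadline helper, in the same millis() - start > limit style as the setup() loop earlier in the thread. Timeout is a hypothetical name; on the board you would pass millis() as nowMs and keep servicing other work in loop() until expired() returns true, instead of sitting in delay().

```cpp
#include <cassert>
#include <cstdint>

// Minimal non-blocking wait. The caller polls expired() instead of blocking.
// Unsigned subtraction keeps the check correct when the 32-bit millisecond
// counter wraps around (roughly every 49.7 days).
struct Timeout {
  uint32_t start;       // timestamp when the wait began, e.g. millis()
  uint32_t intervalMs;  // how long to wait
  bool expired(uint32_t nowMs) const {
    return static_cast<uint32_t>(nowMs - start) >= intervalMs;
  }
};
```

The wraparound case is the reason to write the test as now - start rather than now >= start + interval: the latter can overflow and report expiry far too early.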

I don't mind if you open a bunch of threads as long as they are distinct items (which they arguably were) and you follow up on all of them (you have - thank you!) - but I will say that I tend to answer one question for each customer before I answer a second question for the same person - I imagine many in the community do that too. I do like having little threads with solutions that people can find easily - so in that sense it is a good thing.

Now in another thread I've requested you test the chunked code and I've updated the github with it. That might get towards the root of the issue for it crashing in the first place.

Regarding the "wait for a" issue - please try replacing
Code:
Serial1.write("+++");
with

Code:
Serial1.write("+");
delay(1);
Serial1.write("+");
delay(1);
Serial1.write("+");
delay(1);

I felt that was more reliable in my original testing but my early beta testers found it unnecessary - yep there it is - the delay again - check out some of the really popular libraries for Arduino - you'll find tons of delays.
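The spaced-out escape sequence above can be expressed as a small helper that writes a sequence one character at a time with a pause after each. This is a sketch: writeSpaced and RecordingPort are hypothetical, and on the DigiX the port type would wrap Serial1.write() and delay(). Keeping the gap as a parameter makes it easy to compare a 1 ms gap against no gap when testing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Write `seq` one character at a time, pausing `gapMs` after each character.
// Port is any type providing write(char) and delayMs(unsigned).
template <typename Port>
void writeSpaced(Port& port, const std::string& seq, unsigned gapMs) {
  for (char c : seq) {
    port.write(c);
    port.delayMs(gapMs);
  }
}

// Recording mock so the resulting write/delay sequence can be checked on a PC.
struct RecordingPort {
  std::vector<std::string> log;
  void write(char c) { log.push_back(std::string(1, c)); }
  void delayMs(unsigned ms) { log.push_back("delay " + std::to_string(ms)); }
};
```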

I hope this all helps - both to understand what to expect, and to help with the actual issues at hand.

Thanks for joining our community - I look forward to knocking this issue out with your help (or vice versa).
-Erik

Thanks a million JeffRand and gogol for all the help you give around here!
 


tastewar

  • Jr. Member
  • Posts: 50
Re: Hardware deteriorating? No -- wifi reset wasn't working.
« Reply #10 on: March 15, 2014, 06:42:12 am »
Erik- thank you for the thoughtful reply. As my rant did state, I was quite frustrated at the time. I apologize for the tone.

Quote
The endless loop is bad programming, yes it should time out - but there are plenty of reasons to have delays in embedded code - often the datasheets call for it directly.

That was all I meant -- the fact that currently, I have to put a delay in my code otherwise I am subject to the library hanging. I've got nothing against those kinds of delays; I would only state that the purpose of a hardware support library is to hide all those technical details from the higher-level programmer. It's not *always* 100% achievable, but it's a goal. And I don't mind spinning in a loop either! Something again very different from "computer" programming, but perfectly appropriate here.

So having worked around the bug with resetting that causes my device to lock up, it hasn't locked up since. However, it's definitely not reliably connecting and retrieving data. I will look at the "chunked" example and see if it can be adapted to my code.

Thanks again, Erik, and I wholeheartedly agree that JeffRand and gogol are to be commended for their contributions here!

JeffRand

  • Newbie
  • Posts: 44
Re: Hardware deteriorating? No -- wifi reset wasn't working.
« Reply #11 on: March 15, 2014, 10:05:41 pm »
Okay, I have been able to fix the problem with the program hanging and getting stuck waiting for the module/modem to respond. In the startATMode function, within the DigiFi.cpp file, I changed this code:
Code:
Serial1.write("+++");
debug("wait for a");
while(!Serial1.available()){delay(1);}
to this:
Code:
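int cnt=0;
while(!Serial1.available()){
  // cnt starts at 0, so "+++" goes out on the second pass (after ~1 ms);
  // if no reply arrives within ~250 ms, it is sent again.
  // Note: this retries indefinitely; it no longer stalls silently, but it never gives up.
  if (cnt--<0) {
    Serial1.write("+++");
    debug("wait for a");
    cnt=250;
  }
  delay(1);
}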
 
I'm no longer getting hangs. Now when I test it against WiFi interference, I see this in the debug output:
Code:
start at mode
next
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
wait for a
clear buffer
+ok

Edit: I may have spoken too soon. I'm finding another spot that seems to cause other hangs. I'm pretty sure it is occurring in the readResponse function. I'm going to dig into it a little more and see if I can eliminate the issue there as well.
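Since the readResponse code isn't shown here, the following is only a generic sketch of one way to keep a response reader from hanging: collect bytes until a terminator arrives or a deadline passes, and report which of the two happened. ByteSource and readUntil are hypothetical names; read() follows the Arduino convention of returning -1 when no byte is waiting, and nowMs() stands in for millis().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical byte source: read() returns -1 when nothing is waiting.
struct ByteSource {
  std::function<int()> read;        // e.g. Serial1.read()
  std::function<uint32_t()> nowMs;  // e.g. millis()
};

// Accumulate bytes until `terminator` appears or timeoutMs elapses.
// Returns true only if the terminator was seen; `out` keeps whatever arrived,
// so a partial reply can still be logged before resetting the module.
bool readUntil(const ByteSource& src, const std::string& terminator,
               uint32_t timeoutMs, std::string& out) {
  out.clear();
  const uint32_t start = src.nowMs();
  while (static_cast<uint32_t>(src.nowMs() - start) < timeoutMs) {
    int c = src.read();
    if (c < 0) continue;  // nothing yet; on hardware a delay(1) could go here
    out.push_back(static_cast<char>(c));
    if (out.size() >= terminator.size() &&
        out.compare(out.size() - terminator.size(), terminator.size(), terminator) == 0)
      return true;
  }
  return false;
}
```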
« Last Edit: March 15, 2014, 11:36:12 pm by JeffRand »

tastewar

  • Jr. Member
  • Posts: 50
Re: Hardware deteriorating? No -- wifi reset wasn't working.
« Reply #12 on: March 20, 2014, 04:31:02 pm »
Jeff and/or Erik- has there been any further progress on this? I am using a watchdog timer set for 8 seconds, and I now keep track of uptime and last reset reason in my device for display on demand, and I rarely see uptime of over an hour. So my code is pretty clearly getting stuck in some wifi function. It seems like Jeff is close to an understanding. Will his fixes get pulled into the library on Github? Does Erik agree with the recommended changes?
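Tracking the last reset reason can be done by decoding the reset controller's status register at startup. This sketch assumes the SAM3X layout, where bits 8-10 of RSTC_SR (the RSTTYP field) hold the cause of the last reset; on the board the raw value would be read from RSTC->RSTC_SR, while here it is a plain parameter so the decoding can be checked on a PC. resetReason is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decode RSTTYP (bits 8-10) of the SAM3X reset controller status register.
std::string resetReason(uint32_t rstcSr) {
  switch ((rstcSr >> 8) & 0x7) {
    case 0: return "general (power-up)";
    case 1: return "backup";
    case 2: return "watchdog";
    case 3: return "software";
    case 4: return "user (NRST pin)";
    default: return "reserved";
  }
}
```

Counting "watchdog" resets against uptime in this way would show how often the wifi code is getting stuck, as opposed to the board losing power.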

JeffRand

  • Newbie
  • Posts: 44
Re: Hardware deteriorating? No -- wifi reset wasn't working.
« Reply #13 on: March 20, 2014, 05:27:11 pm »
Sorry for the delay. I've been too busy with my day job lately. I should be able to look at the readResponse function this weekend. I'm not too familiar with Github, but I'll see if I can figure out how to submit a request to change the code to incorporate these fixes.

tastewar

  • Jr. Member
  • Posts: 50
Re: Hardware deteriorating? No -- wifi reset wasn't working.
« Reply #14 on: March 20, 2014, 06:29:53 pm »
No need to apologize, Jeff. Thanks for the update!