Topic: Repeated failures to do first update

lawrie

  • Newbie
  • *
  • Posts: 6
Repeated failures to do first update
« on: March 22, 2016, 08:57:05 am »
I am trying to update my two Oaks. I have had them a while, but waited until the software appeared to be more stable.

After repeated failures, I looked at the troubleshooting guide and downloaded the Windows software for local updates. The Oak connects successfully and starts the process, but it then fails the firmware update after a random number of bytes have been written. It varies from about 200,000 to 700,000 bytes. The same thing happens on both my devices. Here is the log of the 21st attempt:

2016-03-22 15:21:36 New connection from: 192.168.0.34
2016-03-22 15:21:36 Starting firmware transfer to: 192.168.0.34
2016-03-22 15:22:08 Connection lost to: 192.168.0.34
2016-03-22 15:22:08 Firmware request finished for 192.168.0.34 (Reason: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.
])
2016-03-22 15:22:08 Early termination to: 192.168.0.34 (321536 bytes written, fail count = 21)
2016-03-22 15:22:08 Finishing firmware transfer to: 192.168.0.34 (22 transfers done)
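
For comparing one attempt with the next, here is a rough sketch (not part of the Oak update server) that pulls the byte count and elapsed time out of server log lines like the ones above. The regular expressions are inferred only from this one log excerpt, and the names `summarize` and `LOG` are made up for illustration.

```python
import re
from datetime import datetime

# Two lines copied from the log excerpt above.
LOG = """\
2016-03-22 15:21:36 Starting firmware transfer to: 192.168.0.34
2016-03-22 15:22:08 Early termination to: 192.168.0.34 (321536 bytes written, fail count = 21)
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
# Patterns are an assumption based on the excerpt, not on the server source.
START = re.compile(r"^(\S+ \S+) Starting firmware transfer to: (\S+)")
EARLY = re.compile(
    r"^(\S+ \S+) Early termination to: (\S+) "
    r"\((\d+) bytes written, fail count = (\d+)\)"
)


def summarize(log_text):
    """Return one dict per failed transfer found in the log text."""
    started = {}   # host -> time its current transfer started
    failures = []
    for line in log_text.splitlines():
        m = START.match(line)
        if m:
            started[m.group(2)] = datetime.strptime(m.group(1), TS_FORMAT)
            continue
        m = EARLY.match(line)
        if m and m.group(2) in started:
            host = m.group(2)
            end = datetime.strptime(m.group(1), TS_FORMAT)
            seconds = (end - started.pop(host)).total_seconds()
            failures.append({
                "host": host,
                "bytes": int(m.group(3)),
                "seconds": seconds,
                "fail_count": int(m.group(4)),
            })
    return failures


for f in summarize(LOG):
    rate = f["bytes"] / f["seconds"] if f["seconds"] else float("nan")
    print(f"{f['host']}: {f['bytes']} bytes in {f['seconds']:.0f} s ({rate:.0f} B/s)")
```

For the 21st attempt this works out to 321536 bytes in 32 seconds, roughly 10 kB/s before the connection dropped.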

I also tried the second solution from the troubleshooting guide - setting up an unencrypted guest access point on my router. That failed the same way.

Is there anything else I can try to get these devices updated?
« Last Edit: March 22, 2016, 11:36:31 am by lawrie »

emardee

  • Full Member
  • ***
  • Posts: 135
Re: Repeated failures to do first update
« Reply #1 on: March 22, 2016, 12:53:32 pm »
Faster wifi seems to have problems for this initial update.

If you can force your wifi back to "g" or even "b only" it will have a better chance of success.

Mine worked best on an old router which I have hung off the inside of my network (e.g. WAN port of the old router on a LAN port of my fancy router). That worked without changing any settings.

defragster

  • Sr. Member
  • ****
  • Posts: 467
Re: Repeated failures to do first update
« Reply #2 on: March 22, 2016, 12:56:04 pm »
I had perfect success with my Local server on 7 oaks - with a little extra grief on two of the units. All registered and updated (one was already registered BETA)

Is your computer running the server on a WIRED connection?  If it is on wireless, the server and the Oak may be fighting each other for radio airtime.  I used Windows 10 from a CMD PROMPT run as ADMIN with my laptop on wire to wireless router.

Not sure if my steps show I did it differently: http://digistump.com/board/index.php/topic,2103.msg9700.html#msg9700

I changed no router settings - including required login or default/optimal speed.

emardee

  • Full Member
  • ***
  • Posts: 135
Re: Repeated failures to do first update
« Reply #3 on: March 22, 2016, 02:00:41 pm »
I want to make clear that these slower wifi speeds are only needed to get the initial firmware loaded. Once that is achieved the normal wifi settings can be returned, and the device has no problems with "n" speeds etc.

lawrie

  • Newbie
  • *
  • Posts: 6
Re: Repeated failures to do first update
« Reply #4 on: March 22, 2016, 02:05:23 pm »
I reduced the speed on my router to the lowest it supported, but I still had no luck.

lawrie

  • Newbie
  • *
  • Posts: 6
Re: Repeated failures to do first update
« Reply #5 on: March 22, 2016, 02:25:08 pm »
defragster: I could try a wired connection as I have tried most other things, but if too much speed is the problem, that would only make things worse. I did not need to run the local server as admin - it worked running it normally by just double-clicking on the exe file. I don't see how changing to admin would help, but again, I will probably try it. I have been running the local server on Windows 10 and connecting to it from Linux (Ubuntu 14.04).

I expect I am going to have to use a USB to serial connection to get the version of the setup firmware with diagnostics. I have used several different ESP8266 boards successfully, but several of them have problems updating firmware.

emardee

  • Full Member
  • ***
  • Posts: 135
Re: Repeated failures to do first update
« Reply #6 on: March 22, 2016, 04:32:51 pm »
As I understand it (and admittedly I might have understood it wrong!), the issue with wifi isn't speed as such, but collisions on the wifi. A slower wifi connection (b or g) will help with fewer collisions, as will having no other devices on the wifi at the same time.

The speed issue is a separate problem, and this can be fixed by serving the data from a specially tuned slower server (either the slower settings offered by the config page, or a local server). However, if your problem is wifi collisions, then a local server may or may not help.

I'm sure someone who understands it better will be able to explain better than me.

Suffice to say, the closer you can get to the "ideal" upload environment as suggested in the troubleshooting, the more chance you have of it working.

If you can find or borrow an old router with an Ethernet WAN port, I would suggest trying this by connecting its WAN port to one of your main router's internal LAN ports. It means you can try an older wifi chipset without disturbing your internet connection or your local network. If you also make sure the Oak is the only device connected to the old router's wifi, then there should be no wifi collisions.


defragster

  • Sr. Member
  • ****
  • Posts: 467
Re: Repeated failures to do first update
« Reply #7 on: March 22, 2016, 07:50:53 pm »
Quote from: lawrie

I reduced the speed on my router to the lowest it supported, but I still had no luck.

Any reduction in radio traffic can only help, as that traffic adds undesirable waits.  In my case I'm far enough from my router that WiFi power is 75% or less, but I did not purposefully do anything to slow my WiFi speeds.

As noted I had 6. Two problem children failed after the first succeeded, and then I noticed my LAN cable had dropped out, meaning my laptop in the same room was fighting for radio time.  I plugged the cable back in and in the end I got all 6 (plus the redo of my first BETA unit). { Restart the server after that change, as the IP will change when WiFi drops } { Also open Task Manager (or reboot) to be sure no remnant of the server app is holding ports open if you see a failure on starting the server }

I was closer when I tried the original updates (direct to the WEB) and two went 3blink and 4 made all signs of starting the download - non-registered.

If you can arrange the Local server update with the server wired to router I think you'll have the best chance.

I noted 'start CMD Prompt as Admin' - doing that removes any chance of issues you won't see.  May not be required - but one less unknown is worth the trivial extra step.
« Last Edit: March 22, 2016, 07:52:56 pm by defragster »

driffster

  • Newbie
  • *
  • Posts: 42
Re: Repeated failures to do first update
« Reply #8 on: March 22, 2016, 08:47:19 pm »
If they fail after more or less random times, you can just keep trying. (In my opinion, going to the slower mode just reduces the chance of getting the update to work, so I would avoid that mode.)

I can't offer more help on getting better wifi, but if you have a TTL or FTDI cable it is possible to flash them directly to the latest version; at that point the Oak only needs to connect to the Particle server to register. Since it looks like your Oaks are connecting properly (enough to start the download), you should not have problems after that.

More info can be found here (update is mentioned at the end of the page):

https://github.com/digistump/OakRestore


emardee

  • Full Member
  • ***
  • Posts: 135
Re: Repeated failures to do first update
« Reply #9 on: March 22, 2016, 09:41:44 pm »
I just had a quick hunt in the github issue that discussed this, (as to be honest the actual reasons were vague in my memory). This is the thread that discussed all the issues with the factory-loaded pre-firmware and the problems that has caused with getting the first proper firmware loaded. These two extracts are probably most pertinent, but there is HEAPS of bedtime reading in that thread if you want to know all the ins and outs. I had nothing to do with these discussions or investigations, but just happened to read some of it:

Quote from: jldeon

I've been digging into a dozen or so packet captures to try and figure out the SOCKET READ TIMEOUT issue. It seems like the TCP stack on the ESP is not particularly great at handling packet loss. Every time it fails, I see a pattern of packets dropped and multiple retransmissions of old packets and ACK packets going back and forth. Eventually there hasn't been valid data in long enough that the Oak gives up on the connection.

<snip>

You'll want to do everything you can to ensure a good connection to the internet, as packet loss tends to be fatal. I suggest dropping your router to B only (on the 2.4GHz band) if you can.

<snip>

I went from constant SOCKET TIMEOUT errors (nearly 100%) to only very occasional errors with this setup.
Extract from this post on github


Quote from: jldeon

Error Analysis

<snip>

SOCKET TIMEOUT

This appears to be some sort of issue in the TCP stack. When packet loss occurs, the ESP8266 doesn't recover well. It seems to cause a lot of extra retransmitted ACK packets, which confuse most standards-compliant TCP servers. They exhaust their retransmissions and sort of give up on trying to figure out what's up.

I can't put my finger on precisely where the bug is, but given the other bugs (the stack dump that I talked about in a previous post and that fri-sch posted about) this might be down in the Espressif WiFi driver. It's also possible it's in the software TCP stack, but that's lwIP and should be pretty stable. I tried modifying the firmware with a ridiculously long timeout (100 seconds) and upping the retransmit count on the server side (to something like 20) and that made the problem a bit better but didn't fix it.

<snip>

Worked around this one in my server code by transmitting more slowly and modifying the server's socket and TCP parameters to try to give the Oak the best chance of surviving the download.

On the client side, anything you can do to reduce or eliminate possible causes of packet loss that would start retransmission helps. You can help the local network side by doing things like change your wifi to B only, get a stronger signal, go to a different, less crowded channel, turn off other wifi devices, etc. On the server side, hosting your own LAN-based server may help avoid internet-related packet loss.
<snip>
Extract of this post on github.

Certainly b only or g only seems to make a difference, but so does not having other devices connected to wifi whilst loading first firmware.

Mine were run at g only speed (as the sole device connected to that wifi access point), and there was plenty of time for the firmware to download and install without issue (took about 45 secs on my network and setup), but when running at n speeds, it repeatedly failed to flash.
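
To make the "transmitting more slowly" workaround from the quote a bit more concrete, here is a minimal sketch of a throttled sender. It is not jldeon's server code or the oakupsrv implementation; the function name `send_throttled`, the chunk size, and the delay are all assumptions picked for illustration, and a real server would also adjust the socket and TCP parameters mentioned in the quote.

```python
import socket
import time


def send_throttled(sock, data, chunk_size=512, delay=0.05):
    """Send data in small chunks with a pause after each one.

    Returns the number of chunks sent. chunk_size and delay are
    illustrative guesses, not values taken from any Oak tool.
    """
    chunks = 0
    for offset in range(0, len(data), chunk_size):
        # sendall blocks until this whole chunk is handed to the kernel.
        sock.sendall(data[offset:offset + chunk_size])
        chunks += 1
        # Pause so a slow or lossy receiver has time to catch up.
        time.sleep(delay)
    return chunks


if __name__ == "__main__":
    # Local demonstration over a connected socket pair (no network needed).
    sender, receiver = socket.socketpair()
    payload = b"\x00" * 2048
    sent_chunks = send_throttled(sender, payload, chunk_size=500, delay=0.01)
    sender.close()
    received = b""
    while True:
        part = receiver.recv(4096)
        if not part:
            break
        received += part
    receiver.close()
    print(sent_chunks, len(received))
```

The trade-off is the one discussed in this thread: a slower sender gives the Oak's TCP stack less to recover from at once, but the transfer takes longer overall.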
« Last Edit: March 22, 2016, 09:48:06 pm by emardee »

defragster

  • Sr. Member
  • ****
  • Posts: 467
Re: Repeated failures to do first update
« Reply #10 on: March 22, 2016, 11:38:12 pm »
For general 'sketch' OTA installs - I've done many dozens to my Generic_ESP's with no problem - even having multiples active.  Once they are running stable code they have the hardware needed to make it work - especially these newer models with a good antenna like the OAK.  That is direct from the local Arduino host however - and having the luxury of a broader test set of time and users to get where it is.

OAK factory software transition to the updated particle based - with added overhead - is more specific and probably more prone to abort/discard in case of trouble rather than any chance of bricking?

lawrie

  • Newbie
  • *
  • Posts: 6
Re: Repeated failures to do first update
« Reply #11 on: March 23, 2016, 03:15:45 am »
I flashed the latest setup firmware with a USB to TTL connector and these are the diagnostics I get:

Code:
OakBoot v1 - N,BP,2

START UPDATE ROM
WIFI
WIFI CONNECT
GO TO UPDATE
START UPDATE
HOST LOOKUP OK
PARSING HTTP HEADER
HTTP/1.1 200 OK

FILE LENGTH: 778096

START WRITING UPDATE - NO OUTPUT SHOULD BE EXPECTED FOR UP TO 120 SECONDS
./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x40100000, len 3632, room 16
tail 0
chksum 0xc0
load 0x3ffe8000, len 352, room 8
tail 8
chksum 0x82
csum 0x82

OakBoot v1 - H,BU,0

Anyone know what the rst cause 4 means?
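
As a hedged pointer on the question above: the "wdt reset" line printed right after the banner suggests that cause 4 is a watchdog reset. The small sketch below decodes the banner line using the reset-cause values commonly reported for the ESP8266 boot ROM. The table is deliberately partial and is an assumption based on community documentation, not something taken from the Oak firmware; `decode_banner` is a made-up helper name.

```python
import re

# Commonly reported ESP8266 boot ROM reset causes (partial, assumed table).
RST_CAUSES = {
    1: "power-on",
    2: "external reset",
    4: "hardware watchdog reset",
}

BANNER = re.compile(r"rst cause:\s*(\d+),\s*boot mode:\s*\((\d+),\s*(\d+)\)")


def decode_banner(line):
    """Return (cause_number, description) for a boot banner line, else None."""
    m = BANNER.search(line)
    if m is None:
        return None
    cause = int(m.group(1))
    return cause, RST_CAUSES.get(cause, "unknown")


# Banner line copied from the diagnostics above.
print(decode_banner(" ets Jan  8 2013,rst cause:4, boot mode:(3,6)"))
```

If that reading is right, the watchdog fired while the update was being written, which would fit the download dying partway through.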

lawrie

  • Newbie
  • *
  • Posts: 6
Re: Repeated failures to do first update
« Reply #12 on: March 23, 2016, 04:03:08 am »
@defragster: I tried the wired connection with the local server, but that made no difference.

@driffster: I tried about 40 times, but it was taking so long, I gave up. I took your advice and flashed the v1 update firmware with the USB to TTL connector, and I now have my first Oak connected to the particle cloud.

mspohr

  • Newbie
  • *
  • Posts: 15
Re: Repeated failures to do first update
« Reply #13 on: March 23, 2016, 09:35:13 am »
I'm stuck here also.
I've tried the standard method (with retries and the slow server) about 25 times with no success.
I've tried the oakupsrv method, but for some reason it won't run on my OSX system (odd "command not found" message).
I've tried setting up a separate WiFi "B" access point with speed limited to 1 Meg and no encryption and this also has repeated failures.
No success.
Any suggestions?

dougal

  • Sr. Member
  • ****
  • Posts: 289
Re: Repeated failures to do first update
« Reply #14 on: March 23, 2016, 11:48:40 am »
@mspohr: Try
Code:
sudo ./oakupsrv
If you get errors, try downloading the latest OakUpdateTool source from GitHub. If you still get errors, try checking out the pull request I posted there.