Digistump Forums

The Oak by Digistump => Oak Support => Topic started by: lawrie on March 22, 2016, 08:57:05 am

Title: Repeated failures to do first update
Post by: lawrie on March 22, 2016, 08:57:05 am
I am trying to update my two Oaks. I have had them a while, but waited until the software appeared to be more stable.

After repeated failures, I looked at the troubleshooting guide and downloaded the Windows software for local updates. The Oak connects successfully and starts the process, but it then fails the firmware update after a random number of bytes have been written. It varies from about 200,000 to 700,000 bytes. The same thing happens on both my devices. Here is the log of the 21st attempt:

2016-03-22 15:21:36 New connection from: 192.168.0.34
2016-03-22 15:21:36 Starting firmware transfer to: 192.168.0.34
2016-03-22 15:22:08 Connection lost to: 192.168.0.34
2016-03-22 15:22:08 Firmware request finished for 192.168.0.34 (Reason: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.
])
2016-03-22 15:22:08 Early termination to: 192.168.0.34 (321536 bytes written, fail count = 21)
2016-03-22 15:22:08 Finishing firmware transfer to: 192.168.0.34 (22 transfers done)
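
To compare attempts, it can help to pull the numbers out of the server log rather than eyeball them. The following is only a sketch: it assumes every "Starting firmware transfer" and "Early termination" line has exactly the form shown in the log above, and the names `summarize`, `START`, and `EARLY` are made up for illustration, not part of the update tool.

```python
import re
from datetime import datetime

# Two lines copied from the log above; a real run would read the whole log file.
LOG = """\
2016-03-22 15:21:36 Starting firmware transfer to: 192.168.0.34
2016-03-22 15:22:08 Early termination to: 192.168.0.34 (321536 bytes written, fail count = 21)
"""

STAMP = "%Y-%m-%d %H:%M:%S"
# Patterns are assumptions based only on the pasted log lines.
START = re.compile(r"^(\S+ \S+) Starting firmware transfer to: (\S+)")
EARLY = re.compile(
    r"^(\S+ \S+) Early termination to: (\S+) "
    r"\((\d+) bytes written, fail count = (\d+)\)"
)

def summarize(log_text):
    """Return one dict per failed transfer: bytes sent, duration, average rate."""
    started = {}   # client IP -> time its current transfer started
    results = []
    for line in log_text.splitlines():
        m = START.match(line)
        if m:
            started[m.group(2)] = datetime.strptime(m.group(1), STAMP)
            continue
        m = EARLY.match(line)
        if m and m.group(2) in started:
            ended = datetime.strptime(m.group(1), STAMP)
            seconds = (ended - started[m.group(2)]).total_seconds()
            sent = int(m.group(3))
            results.append({
                "ip": m.group(2),
                "bytes": sent,
                "fail_count": int(m.group(4)),
                "seconds": seconds,
                # Guard against a zero-length interval in the log.
                "bytes_per_second": sent / seconds if seconds else None,
            })
    return results

if __name__ == "__main__":
    for row in summarize(LOG):
        print(row)
```

For the attempt logged above this gives 321536 bytes in 32 seconds, roughly 10 KB/s before the connection dropped.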

I also tried the second solution from the troubleshooting guide - setting up an unencrypted guest access point on my router. That failed the same way.

Is there anything else I can try to get these devices updated?
Title: Re: Repeated failures to do first update
Post by: emardee on March 22, 2016, 12:53:32 pm
Faster wifi seems to have problems for this initial update.

If you can force your wifi back to "g" or even "b only" it will have a better chance of success.

Mine worked best on an old router which I have hung off the inside of my network (eg WAN port of old router on a LAN port of my fancy router). That worked without any changing of settings.
Title: Re: Repeated failures to do first update
Post by: defragster on March 22, 2016, 12:56:04 pm
I had perfect success with my Local server on 7 oaks - with a little extra grief on two of the units. All registered and updated (one was already registered BETA)

Is your computer running the server on a WIRED connection?  If wireless there may be unnecessary Radio air traffic fights.  I used Windows 10 from a CMD PROMPT run as ADMIN with my laptop on wire to wireless router.

Not sure if my steps show I did it differently: http://digistump.com/board/index.php/topic,2103.msg9700.html#msg9700

I changed no router settings - including required login or default/optimal speed.
Title: Re: Repeated failures to do first update
Post by: emardee on March 22, 2016, 02:00:41 pm
I want to make clear that these slower wifi speeds are only needed to get the initial firmware loaded. Once that is achieved the normal wifi settings can be returned, and the device has no problems with "n" speeds etc.
Title: Re: Repeated failures to do first update
Post by: lawrie on March 22, 2016, 02:05:23 pm
I reduced the speed on my router to the lowest it supported, but I still had no luck.
Title: Re: Repeated failures to do first update
Post by: lawrie on March 22, 2016, 02:25:08 pm
defragster: I could try a wired connection as I have tried most other things, but if too much speed is the problem, that would only make things worse. I did not need to run the local server as admin - it worked running it normally by just double-clicking on the exe file. I don't see how changing to admin would help, but again, I will probably try it. I have been running the local server on Windows 10 and connecting to it from Linux (Ubuntu 14.04).

I expect I am going to have to use a USB to serial connection to get the version of the setup firmware with diagnostics. I have used several different ESP8266 boards successfully, but several of them have problems updating firmware.
Title: Re: Repeated failures to do first update
Post by: emardee on March 22, 2016, 04:32:51 pm
As I understand it (and admittedly I might have understood it wrong!), the issue with wifi isn't speed as such, but collisions on the wifi. A slower wifi connection (b or g) will help with fewer collisions, as will having no other devices on the wifi at the same time.

The speed issue is a separate problem, and this can be fixed by serving the data from a specially tuned slower server.... (either the slower settings offered by the config page, or a local server). However, if your problem is wifi collisions, then a local server may or may not help.

I'm sure someone who understands it better will be able to explain better than me.

Suffice to say, the closer you can get to the "ideal" upload environment as suggested in the troubleshooting, the more chance you have of it working.

If you can find or borrow an old router with an Ethernet WAN port, I would suggest trying this by connecting its WAN port to one of your main router's internal LAN ports. It means you can try an older wifi chipset without disturbing your internet connection or your local network. If you also keep the old router's wifi connected only to the Oak, then there should be no wifi collisions.

Title: Re: Repeated failures to do first update
Post by: defragster on March 22, 2016, 07:50:53 pm
Quote from: lawrie

I reduced the speed on my router to the lowest it supported, but I still had no luck.

Any reduction in radio traffic can only help as that will add undesirable waits.  In my case I'm far enough from my router that WiFi power is 75% or less, but I did not purposefully do anything to slow my WiFi speeds.

As noted I had 6 - two problem children failed after the first succeeded when I noted I had dropped my LAN cable - meaning my laptop in the same room was fighting for radio time.  I plugged the cable and in the end I got all 6 (plus the redo of my first BETA unit). { restart server on that change as the IP will change when WiFi drops } { Also open Task Manager (or reboot) to be sure there is no remnant of the server app holding ports open if you see a failure on starting server }

I was closer when I tried the original updates (direct to the WEB) and two went 3blink and 4 made all signs of starting the download - non-registered.

If you can arrange the Local server update with the server wired to router I think you'll have the best chance.

I noted 'start CMD Prompt as Admin' - doing that removes any chance of issues you won't see.  May not be required - but one less unknown is worth the trivial extra step.
Title: Re: Repeated failures to do first update
Post by: driffster on March 22, 2016, 08:47:19 pm
If they fail after more or less random times you can just keep trying, (in my opinion going to slower mode just slows the chance to get the update working, so I would avoid that mode).

I can't offer more help on getting a better wifi, but if you have a TTL or FTDI cable it is possible to flash them directly to the latest version; at that point the Oak only needs to connect to the Particle server to register. Since it looks like your Oaks are connecting properly (enough to start the download) you should not have problems after that.

More info can be found here (update is mentioned at the end of the page):

https://github.com/digistump/OakRestore

Title: Re: Repeated failures to do first update
Post by: emardee on March 22, 2016, 09:41:44 pm
I just had a quick hunt in the github issue that discussed this (as, to be honest, the actual reasons were vague in my memory). This is the thread (https://github.com/digistump/OakCore/issues/54) that discussed all the issues with the factory-loaded pre-firmware and the problems it has caused with getting the first proper firmware loaded. These two extracts are probably most pertinent, but there is HEAPS of bedtime reading in that thread if you want to know all the ins and outs. I had nothing to do with these discussions or investigations, but just happened to read some of it:

Quote from: jldeon

I've been digging into a dozen or so packet captures to try and figure out the SOCKET READ TIMEOUT issue. It seems like the TCP stack on the ESP is not particularly great at handling packet loss. Every time it fails, I see a pattern of packets dropped and multiple retransmissions of old packets and ACK packets going back and forth. Eventually there hasn't been valid data in long enough that the Oak gives up on the connection.

<snip>

You'll want to do everything you can to ensure a good connection to the internet, as packet loss tends to be fatal. I suggest dropping your router to B only (on the 2.4GHz band) if you can.

<snip>

I went from constant SOCKET TIMEOUT errors (nearly 100%) to only very occasional errors with this setup.
Extract from this post on github (https://github.com/digistump/OakCore/issues/54#issuecomment-191968505)


Quote from: jldeon

Error Analysis

<snip>

SOCKET TIMEOUT

This appears to be some sort of issue in the TCP stack. When packet loss occurs, the ESP8266 doesn't recover well. It seems to cause a lot of extra retransmitted ACK packets, which confuse most standards-compliant TCP servers. They exhaust their retransmissions and sort of give up on trying to figure out what's up.

I can't put my finger on precisely where the bug is, but given the other bugs (the stack dump that I talked about in a previous post and that fri-sch posted about) this might be down in the Espressif WiFi driver. It's also possible it's in the software TCP stack, but that's lwIP and should be pretty stable. I tried modifying the firmware with a ridiculously long timeout (100 seconds) and upping the retransmit count on the server side (to something like 20) and that made the problem a bit better but didn't fix it.

<snip>

Worked around this one in my server code by transmitting more slowly and modifying the server's socket and TCP parameters to try to give the Oak the best chance of surviving the download.

On the client side, anything you can do to reduce or eliminate possible causes of packet loss that would start retransmission helps. You can help the local network side by doing things like change your wifi to B only, get a stronger signal, go to a different, less crowded channel, turn off other wifi devices, etc. On the server side, hosting your own LAN-based server may help avoid internet-related packet loss.
<snip>
Extract of this post on github (https://github.com/digistump/OakCore/issues/54#issuecomment-193259142).

Certainly b only or g only seems to make a difference, but so does not having other devices connected to wifi whilst loading first firmware.

Mine were run at g only speed (as the sole device connected to that wifi access point), and there was plenty of time for the firmware to download and install without issue (took about 45 secs on my network and setup), but when running at n speeds, it repeatedly failed to flash.
Title: Re: Repeated failures to do first update
Post by: defragster on March 22, 2016, 11:38:12 pm
For general 'sketch' OTA installs - I've done many dozens to my Generic_ESP's with no problem - even having multiples active.  Once they are running stable code they have the hardware needed to make it work - especially these newer models with a good antenna like the OAK.  That is direct from the local Arduino host however - and having the luxury of a broader test set of time and users to get where it is.

OAK factory software transition to the updated particle based - with added overhead - is more specific and probably more prone to abort/discard in case of trouble rather than any chance of bricking?
Title: Re: Repeated failures to do first update
Post by: lawrie on March 23, 2016, 03:15:45 am
I flashed the latest setup firmware with a USB to TTL connector and this is the diagnostics I get:

Code:
OakBoot v1 - N,BP,2

START UPDATE ROM
WIFI
WIFI CONNECT
GO TO UPDATE
START UPDATE
HOST LOOKUP OK
PARSING HTTP HEADER
HTTP/1.1 200 OK

FILE LENGTH: 778096

START WRITING UPDATE - NO OUTPUT SHOULD BE EXPECTED FOR UP TO 120 SECONDS
./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+./+
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x40100000, len 3632, room 16
tail 0
chksum 0xc0
load 0x3ffe8000, len 352, room 8
tail 8
chksum 0x82
csum 0x82

OakBoot v1 - H,BU,0

Anyone know what the rst cause 4 means?
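
On the rst cause question: the boot ROM line can be decoded with a small lookup. This is a sketch under assumptions: the cause table below follows the values commonly documented for the ESP8266 (for example in the ESP8266 Arduino core documentation), not anything stated in this thread, and `describe_reset` is a made-up helper name. The "wdt reset" line printed right after the boot line in the log above is consistent with cause 4 being a watchdog reset.

```python
import re

# Assumed mapping, taken from commonly published ESP8266 boot documentation.
RST_CAUSES = {
    0: "unknown",
    1: "normal boot (power on)",
    2: "reset pin",
    3: "software reset",
    4: "watchdog reset",
}

# Matches e.g. " ets Jan  8 2013,rst cause:4, boot mode:(3,6)"
BOOT_LINE = re.compile(r"rst cause:\s*(\d+),\s*boot mode:\((\d+),(\d+)\)")

def describe_reset(line):
    """Return (cause_number, description) for a boot ROM line, or None."""
    m = BOOT_LINE.search(line)
    if not m:
        return None
    cause = int(m.group(1))
    return cause, RST_CAUSES.get(cause, "not in this table")

if __name__ == "__main__":
    print(describe_reset(" ets Jan  8 2013,rst cause:4, boot mode:(3,6)"))
```

If that reading is right, the Oak was reset by its watchdog partway through writing the update, rather than by a power glitch or the reset pin.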
Title: Re: Repeated failures to do first update
Post by: lawrie on March 23, 2016, 04:03:08 am
@defragster: I tried the wired connection with the local server, but that made no difference.

@driffster: I tried about 40 times, but it was taking so long, I gave up. I took your advice and flashed the v1 update firmware with the USB to TTL connector, and I now have my first Oak connected to the particle cloud.
Title: Re: Repeated failures to do first update
Post by: mspohr on March 23, 2016, 09:35:13 am
I'm stuck here also.
I've tried the standard method (with retries and the slow server) about 25 times with no success.
I've tried the oakupsrv method but for some reason it won't run on my OSX system (odd command not found message).
I've tried setting up a separate WiFi "B" access point with speed limited to 1 Meg and no encryption and this also has repeated failures.
No success.
Any suggestions?
Title: Re: Repeated failures to do first update
Post by: dougal on March 23, 2016, 11:48:40 am
@mspohr: Try
Code:
sudo ./oakupsrv
If you get errors, try downloading the latest OakUpdateTool source from GitHub. If you still get errors, try checking out the pull request I posted there.
Title: Re: Repeated failures to do first update
Post by: nelsonsilvafilho on March 23, 2016, 12:28:06 pm
try: sudo ./oakupsrv
Title: Re: Repeated failures to do first update
Post by: defragster on March 23, 2016, 06:05:59 pm
Quote from: lawrie

@defragster: I tried the wired connection with the local server, but that made no difference.

That is unfortunate - I was hoping to see it be the easy way forward.  For me it was much better LOCAL than WWW would have been after they all failed once.

My BETA and 4 units went easy.  There were two I had to restart the server and try again - I was ready to give up on them and send back to Digistump - even sent the email - then setup and did the same thing after moving my USB cable holding the OAK to have a bit better line to my router.  I think it took the pin 1 to GND to actually wake the one unit.

I have Generic ESP-12E and some Tindie custom ESP units I have programmed by wire - in fact I mount them through a Teensy_3.2 and made a Proxy sketch that feeds the USB out serially to the attached units as needed to get them up to speed on Arduino_OTA, which works really well.  The OTA from Arduino on a 310KB sketch is under 10 seconds versus 36 seconds for the serial wired upload.
Title: Re: Repeated failures to do first update
Post by: gspadari on March 24, 2016, 05:23:28 pm
I've noticed that the Oak can only connect in Legacy wireless mode and not in N mode. Also, if you use a 64-digit hex string as the key, it doesn't work. All the tests I've done used WPA2-PSK on OpenWrt routers (D-Link DIR-600 and TP-Link TL-WA850RE v1).

Also keep an eye on https://dashboard.particle.io/ because the "Last Connection" column shows whether the Oak has connected at some point. If the ID dot is blue, the device is connected right now; grey means it's disconnected.
Title: Re: Repeated failures to do first update
Post by: ScottM on March 26, 2016, 05:14:06 pm
I bought a 10 pack of Oaks on Kickstarter and so far I have tried to get three of them up and running with no success. This is so very disappointing. They seem to fail at the point after I have selected the wifi connection I want them to connect to and the message on screen says "Saving settings to your device". I am in the basement and my Bell Fibe (Sagemcom) router is 2 floors above me. In case that isn't good enough, I also have a D-Link wifi extender in the basement next to me as an alternate access point. I've tried using both to no avail.

 
Title: Re: Repeated failures to do first update
Post by: defragster on March 26, 2016, 05:23:21 pm
@ScottM - I got my six - all failed initial update and registration - I resorted to the LOCAL server update method and had very little grief on those 6 (except 2) and also my 7th BETA unit that was on old code.  Since you have 10 to do I can only hope you can try this and have the luck - and faster updates - that I did. 

http://digistump.com/board/index.php/topic,2103.msg9700.html#msg9700
Title: Re: Repeated failures to do first update
Post by: emardee on March 26, 2016, 06:13:29 pm
I think there are two separate places where the connection problems can be: the internet connection, or the wifi connection.

The local server will mainly solve issues with the internet connection. However, if the problem is with the wifi part of your connection, then the local server probably won't help you.

Therefore, if you definitely have a fringe wifi setup, you might be best to optimise the wifi before investigating the local server, as this is more likely to solve your particular problem.

However, if you've already got the best wifi setup you can achieve, it can't harm to try a local server next anyway. I'm sure that in many cases both elements (wifi and internet) are contributing to the problem, so a local server might allow you to get the best out of a fringe wifi setup.
Title: Re: Repeated failures to do first update
Post by: defragster on March 26, 2016, 06:35:37 pm
Indeed - this isn't understood to be a single point of failure.  Perhaps I was lucky the LOCAL worked for me - I did 7 OAKS - but that is only one case.

I didn't have to edit or modify my WiFi behavior (though my initial WWW failures had the OAKS much closer to the same router/SSID).  But going local (if it works as it did for me) - as long as the server is wired - should eliminate one whole source of server supply problems without adding any new ones. It should also run more deterministically with local server bits uploading - and that might allow the WiFi to be better fed.

Due to my location - I know my BroadBand is suspect (won't run an AT&T microcell) - it is a WiFi/microwave 13 mile link for 4Mbit/sec service - that is oddly throttled. But after the OAK initial Update I had no problems doing Particle sketch uploads from Arduino - except one unit I had to re-flash the Server Update before it worked - and then it worked right. I also know my house is WiFi noisy & opaque and I have a box of routers that failed to be usable in 6-12 months.
Title: Re: Repeated failures to do first update
Post by: emardee on March 26, 2016, 06:55:34 pm
yeah, yours sounds like it was more of an internet problem, so local server was definitely the right choice for you. ScottM sounds like he has a less than ideal wifi setup, so I'd recommend starting there with making changes.

Scott, can you take the oak upstairs to update it? or even moving to other locations in the basement might help you get a stronger signal?

Or could you run the firmware update at a friend's house?

Or some people found that updating via a mobile phone acting as a hotspot for their mobile data worked for them.

Basically there are plenty of tricks you can try yet, so don't give up; I'm sure you'll get there. And once the firmware is loaded, typically you'll have no more wifi issues... so just get it in by whatever means works best.

Good luck.
Title: Re: Repeated failures to do first update
Post by: ScottM on April 01, 2016, 11:06:44 am
I have tried a number of tricks but still no success. I tried the local method, tried using my phone as a wifi hotspot and tried bringing the OAKS to work to use my wifi network there. Still no success. I think I will try loading the update over serial. Am I correct that I can use my 3.3V FTDI cable? If so, I will try it this weekend.
Title: Re: Repeated failures to do first update
Post by: driffster on April 02, 2016, 10:30:05 am
Quote from: ScottM

I have tried a number of tricks but still no success. I tried the local method, tried using my phone as a wifi hotspot and tried bringing the OAKS to work to use my wifi network there. Still no success. I think I will try loading the update over serial. Am I correct that I can use my 3.3V FTDI cable? If so, I will try it this weekend.

Yes that will work, good luck!
Title: Re: Repeated failures to do first update
Post by: ScottM on April 03, 2016, 01:31:40 pm
I installed Python but can't seem to get PySerial to install. I downloaded the Windows installer (I have Windows 7) and I get the error message "The program can't start because api-ms-win-crt-string-l1-1-0.dll is missing from your computer. Try reinstalling the program to fix this problem".  I tried installing as administrator but that didn't work either.
Title: Re: Repeated failures to do first update
Post by: PeterF on April 03, 2016, 09:14:02 pm
Try (re)installing the Microsoft Visual C++ Redistributable 2015 (https://www.microsoft.com/en-us/download/details.aspx?id=48145) - it seems that file may be related to that.
Title: Re: Repeated failures to do first update
Post by: defragster on April 05, 2016, 03:40:56 pm
I put Python 2.7 on my Windows10 machine and didn't have serial and couldn't find an installer to work.

I found this and put a note in the Wiki somewhere, but I can't find it now - just notes for doing it on Linux.

Looking again at the web I think this works from the command line:
Quote
C:\Users\defragster>python
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec  5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import serial
>>>
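
The same check can be run as a script instead of at the interactive prompt. A minimal sketch, assuming PySerial is importable under the module name `serial` as in the session above; `has_module` is a made-up helper name, and the code is written to run on both Python 2.7 and Python 3.

```python
import importlib

def has_module(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True

if __name__ == "__main__":
    if has_module("serial"):
        print("PySerial is importable")
    else:
        print("PySerial is missing; install it before running the serial flashing scripts")
```

A silent return to the `>>>` prompt after `import serial`, as shown in the quoted session, corresponds to `has_module("serial")` returning True.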