DappNode frequently takes down LAN

Foxglove · 3 March 2022 18:04

For the past two weeks, the node has been taking down my local network every 60 minutes or so.

What I’ve tried:

Disconnecting the DappNode network cable prevents the issue, so it must be caused by the node
Disabling DappNode Wifi and connecting through Wireguard’s VPN instead
Resetting the node
Resetting IPFS by pausing it, restarting the node, then resuming IPFS
Changing IPFS from local to remote
Changing IPFS back from remote to local

Please advise.

Core DAppNode Packages versions

bind.dnp.dappnode.eth: 0.2.6
core.dnp.dappnode.eth: 0.2.50
dappmanager.dnp.dappnode.eth: 0.2.45, commit: 7b9ad9ad
https.dnp.dappnode.eth: 0.1.2
ipfs.dnp.dappnode.eth: 0.2.15
vpn.dnp.dappnode.eth:
wifi.dnp.dappnode.eth: 0.2.8
wireguard.dnp.dappnode.eth: 0.1.1

System info

dockerComposeVersion: 1.25.5
dockerServerVersion: 20.10.6
dockerCliVersion: 20.10.6
os: debian
versionCodename: bullseye
architecture: amd64
kernel: 5.10.0-8-amd64
Disk usage: 24%

Foxglove · 7 March 2022 10:38

I would love to get some ideas. Having an unreliable LAN is a dealbreaker for me.

Context that might be helpful:

I’m running a validator node with Geth, Prysm and Metric Tools. Optional arguments on Geth:

--http.api eth,net,web3,txpool --cache 2048 --maxpeers 10

sebaslogen · 8 March 2022 13:15

I also experienced this problem intermittently, but for the past couple of days I just can’t use the network if the DappNode is connected: every hour or so it makes the router restart.
What I tried in the past was completely disabling IPFS (although I wouldn’t get updates on the node), changing Geth to remote, and limiting the peers to 7.

In the current state, the box is unusable.

I’ll try to use a light Geth, disable IPFS completely, and see if it runs without crashing my router.

I’ve also found this known IPFS problem that crashes low-end ISP routers: https://github.com/ipfs/go-ipfs/issues/3320

A few hours after I disabled IPFS, the router is still working fine and the rest of my Dappnode box is also doing just fine.

Foxglove · 9 March 2022 12:26

Disabling IPFS did not solve the issue for me unfortunately! Thanks for the suggestion though.

edit: I noticed IPFS doesn’t respect the high water setting. It’s set to 200, yet IPFS still maintains connections with 500+ peers.
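
For anyone who wants to check the same thing, here is a minimal Python sketch that counts the lines of `ipfs swarm peers` output and compares the total with the configured high water mark. It assumes one peer address per line. How you capture that output on a DappNode (for example from inside the IPFS container) is not covered, and the function names are made up for illustration. As far as I understand the IPFS connection manager, the high water mark triggers trimming rather than acting as a hard cap, so some overshoot is expected; 500+ against 200 still looks excessive.

```python
def count_peers(swarm_peers_output: str) -> int:
    """Count non-empty lines; `ipfs swarm peers` prints one address per line."""
    return sum(1 for line in swarm_peers_output.splitlines() if line.strip())


def over_high_water(swarm_peers_output: str, high_water: int) -> bool:
    """Return True when the live peer count exceeds the configured mark."""
    return count_peers(swarm_peers_output) > high_water


if __name__ == "__main__":
    # Made-up sample output, not taken from a real node.
    sample = (
        "/ip4/1.2.3.4/tcp/4001/p2p/QmPeerA\n"
        "/ip4/5.6.7.8/tcp/4001/p2p/QmPeerB\n"
        "/ip4/9.9.9.9/tcp/4001/p2p/QmPeerC\n"
    )
    print(count_peers(sample), over_high_water(sample, high_water=2))
```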

annandale · 9 March 2022 18:22

We’ve been operating a dappnode ethereum beacon chain validator (geth / prysm) for over a year and it was very stable until the end of last month. beginning on feb 28, it began to throw network adapter reset errors and go offline. a few days later it happened again. since last friday, the adapter reset errors have taken the host offline at least once a day. the machine doesn’t crash, but the network connection drops and doesn’t reconnect.

concurrently, we’re running a gnosis beacon chain validator on dappnode os. it had been running nicely from late december until mid february, when it began to crash every week, then almost every day, and now every few hours.

I’ve tried implementing fixes suggested on the debian forums (disabling TSO, GSO and GRO; disabling TCP checksum offload; disabling ASPM; etc.), but have yet to find a solution that works.
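
For reference, a minimal Python sketch of the offload part of those fixes. It only builds (and optionally runs) an `ethtool -K` command that switches TSO, GSO and GRO off. The interface name `eth0` and the function names are assumptions for illustration, the command needs root, and the setting does not survive a reboot unless it is made persistent some other way.

```python
import subprocess


def build_offload_cmd(iface, features=("tso", "gso", "gro")):
    """Build the argv for `ethtool -K <iface> <feature> off ...`."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offloads(iface):
    """Run the command; raises CalledProcessError if ethtool fails."""
    subprocess.run(build_offload_cmd(iface), check=True)


if __name__ == "__main__":
    # "eth0" is a placeholder; check `ip link` for the real interface name.
    print(" ".join(build_offload_cmd("eth0")))
```

Printing the command first and running it by hand is the safer route on a remote machine, since a bad NIC setting can cut off the connection you are working over.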

annandale · 10 March 2022 13:52

all went down again within an hour of me leaving work yesterday. an ubuntu-based node running on the same class c network also lost connectivity. this morning, i powered down all dappnode validator host machines & the ubuntu machine, unplugged them and the hub, then reconnected everything & powered it back on. i’m “pausing” the ipfs process on the dappnodes & will see if that helps. if that doesn’t work, i may be down to checking the hardware to verify that everything is seated properly (to “check the box” on the basic troubleshooting list). running low on better ideas.

sebaslogen · 10 March 2022 15:08

Well, this time the IPFS trick did not last more than a day and the router is back to rebooting, so I simply disconnected the Dappnode until I can spend more time debugging the problem.
I hope someone can find a solution. Until this awful problem came up, exiting the validator was not an option I had considered.

annandale · 10 March 2022 19:55

i’ve noticed that, whenever these NIC reset errors occur on our gnosis beacon chain validator host, the os crashes and the xdai nethermind execution client data repository is corrupted. not sure whether all of this is causation or correlation…

i started to consider the scenario of exiting the validator set also. we have a fairly robust perimeter firewall, yet these problems began only a day or two before the Ukraine situation blew up, & just when the cybersecurity forums began warning of an upswing in threat-actor activity. but maybe i’m just paranoid…

Foxglove · 14 March 2022 19:43

You’re describing different symptoms. I think it’s a good idea to open a thread dedicated to the issue of resetting adapters on your node!

Foxglove · 16 March 2022 09:50

A combination of three settings solved the issue:

Set IPFS to Remote
Disable IPFS locally
Throttle Geth by setting --maxpeers=7
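
For the Geth part, a minimal Python sketch of rewriting an extra-options string so that it carries `--maxpeers=7`. It handles both the `--maxpeers 10` and `--maxpeers=10` spellings; the function name is made up for illustration, and where that string is entered in the DappNode UI is not shown here.

```python
import shlex


def set_maxpeers(extra_opts: str, maxpeers: int) -> str:
    """Return extra_opts with any existing --maxpeers replaced by the new value."""
    kept = []
    skip_next = False
    for token in shlex.split(extra_opts):
        if skip_next:
            # This token is the value that followed a bare "--maxpeers".
            skip_next = False
            continue
        if token == "--maxpeers":
            skip_next = True
            continue
        if token.startswith("--maxpeers="):
            continue
        kept.append(token)
    kept.append(f"--maxpeers={maxpeers}")
    # Plain join: assumes no option value contains spaces or quotes.
    return " ".join(kept)


if __name__ == "__main__":
    print(set_maxpeers("--http.api eth,net,web3,txpool --cache 2048 --maxpeers 10", 7))
```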

In the chart, each vertical line is a router reset. I resolved the issue on 03/15 at 23:00.

Generally I’m very happy with the DappNode product. This resource set me on the right track. To further improve the product I’d like to make some suggestions.

Setting IPFS to remote should automatically disable IPFS on the node
I don’t think running IPFS locally should be the default setting
The recommended monitoring tooling does not offer any insight into network bandwidth consumption. The Grafana dashboard supports it, but no metrics are collected out of the box.
No moderator/support agent responded to this thread over the course of two weeks, even though it is apparently a well-known issue that IPFS can hog bandwidth and take down cheaper routers.
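
On the bandwidth point in the list above: until such metrics are collected out of the box, a rough per-interface rate can be derived from two readings of `/proc/net/dev` on the Linux host. The sketch below is a minimal Python example with made-up function names; it assumes the usual layout of that file (two header lines, then one line per interface with received bytes in the first column and transmitted bytes in the ninth).

```python
import time


def parse_proc_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_bytes_per_sec(before, after, seconds):
    """Per-interface (rx, tx) rates between two parsed readings."""
    return {
        iface: ((after[iface][0] - rx) / seconds, (after[iface][1] - tx) / seconds)
        for iface, (rx, tx) in before.items()
        if iface in after
    }


if __name__ == "__main__":
    interval = 10  # seconds between the two readings
    with open("/proc/net/dev") as f:
        first = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        second = parse_proc_net_dev(f.read())
    for iface, (rx, tx) in rates_bytes_per_sec(first, second, interval).items():
        print(f"{iface}: rx {rx:.0f} B/s, tx {tx:.0f} B/s")
```

This counts traffic per host interface, not per package, so it shows that the box is saturating the link but not which container is responsible.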

Hope this helps.
