I’ve been running a home server based on Debian for ~10 years and recently decided to replace it with an HP EliteDesk 705 65W G4 Desktop Mini PC, but the new machine keeps crashing.
The machine will run fine for a few hours, then suddenly begins:
- Rejecting SSH connections immediately
- Returning “Command not found” for any command run in an existing SSH connection (e.g., ls)
- Giving I/O errors in response to running processes like
- Not showing any display output to connected monitors
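Because external commands such as ls fail in the sessions that are already open, any check run at that point has to rely on shell builtins alone. The following is a minimal sketch of one such check, assuming a POSIX shell is still alive in the session and /proc is still readable. The function names are hypothetical, and nothing here has been confirmed as the cause of the crash; it only shows how the root filesystem's mount state could be inspected without loading a binary from disk.

```shell
# Print the mount options of the root filesystem using only shell builtins
# (read, printf, [), so no binary has to be loaded from disk.
# The optional file argument exists only so the function can be tried on a
# saved copy; it defaults to /proc/mounts.
root_mount_opts() {
    mounts_file="${1:-/proc/mounts}"
    found=""
    while read -r dev mnt fstype opts rest; do
        # Keep the last entry for "/", which is the mount currently on top.
        if [ "$mnt" = "/" ]; then
            found="$opts"
        fi
    done < "$mounts_file"
    [ -n "$found" ] && printf '%s\n' "$found"
}

# Succeed if the root filesystem is mounted read-only.
root_is_readonly() {
    case ",$(root_mount_opts "$@")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

The definitions have to be pasted into a session that was opened before the symptoms started. After that, `root_is_readonly && echo "root is read-only"` reports the read-only case, and `root_mount_opts` prints the raw option string for /.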
I typically run a few home services in Docker containers and initially thought an oddity of my config might be causing the crashes, so I decided to pick an unrelated existing GitHub repo with a few containers and run it from scratch. I chose the HTPC download box repo (sebgl/htpc-download-box), which uses a few linuxserver.io containers and should be a reasonable approximation of the lower bound of the workload my own services would put on the machine.
Steps I have followed to reproduce the crash:
- Install headless Debian (netinstall image); configure the OS by following the steps below:
- Set hostname:
- Set domain:
- Add a new user, add the user to sudoers, and configure SSH so that only key-based logins are allowed and only nick can log in (including adding my desktop’s public key to ~/.ssh/authorized_keys):
usermod -aG sudo nick
sudo nano /etc/ssh/sshd_config; the specific settings you want are:
- Restart ssh:
sudo service sshd restart
- Install necessary services:
sudo apt-get install docker docker-compose git
- Add your user to the docker group:
sudo usermod -aG docker nick
- Generate a new ssh key and add it to your GitHub account:
ssh-keygen -t ed25519, then copy the public key to GitHub
- Set your global git vars:
git config --global user.name 'Nick';
git config --global user.email email@example.com
Wait 2 days, verify no crash occurs
- Run the following commands:
sudo mkdir htpc-download-box
sudo chown -R nick:nick htpc-download-box
git clone git@github.com:sebgl/htpc-download-box.git
docker-compose up -d
(Note: I do no configuration whatsoever of the containers in the docker-compose file, I just start them running and then confirm I can access them via browser. I use the exact .env.example as the .env for the project.)
Wait a few hours, observe that the server has crashed: SSH logins fail and the other symptoms listed above appear. Interestingly, I can still view the web UI of some containers (e.g., sonarr), but when I try to browse the filesystem through that web UI, no folders are shown, and manually typing a path produces a message that the user has no permission to view that folder.
Since I observe crashes when running either my actual suite of services or the example repo detailed here, I must conclude the issue is with the machine itself. I have tested the NVMe drive with smartmontools, and both the short and long self-tests report no errors.
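A passing self-test may not reveal a drive that only misbehaves intermittently, so the health counters in the full smartctl report can be worth a look as well. A small sketch, assuming smartmontools prints its usual NVMe health fields; the helper name nvme_health_summary and the device path /dev/nvme0 are assumptions for illustration:

```shell
# Filter `smartctl -a` output (read from stdin) down to the NVMe health
# fields that most directly point at media or controller trouble.
nvme_health_summary() {
    grep -E '^(Critical Warning|Temperature|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):'
}

# Usage (the device path is an assumption; adjust it to the actual drive):
#   sudo smartctl -a /dev/nvme0 | nvme_health_summary
```

Comparing these values before and after a crash would show whether any of the counters moved, which the pass/fail result of a self-test does not.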
I am not familiar enough with Linux to know how to proceed from here (maybe give it another 10 years!) – what logs can I examine to determine what might cause the crash? Should I be setting up additional logging of some sort to try to ascertain the cause?
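As a starting point for the log question, the kernel messages from the boot that crashed are one place to look. The sketch below assumes the systemd journal is persistent (the directory /var/log/journal exists) and that the root filesystem is ext4, the Debian default. The helper name storage_errors and the choice of patterns are illustrative, not an exhaustive list.

```shell
# Keep only kernel log lines (read from stdin) that mention the NVMe driver,
# block-layer I/O errors, or an ext4 filesystem being forced read-only.
storage_errors() {
    grep -Ei 'nvme|I/O error|EXT4-fs error|Remounting filesystem read-only'
}

# Usage after rebooting from a crash:
#   sudo journalctl --list-boots                  # confirm a previous boot was kept
#   sudo journalctl -k -b -1 | storage_errors     # kernel messages from that boot
```

If the disk becomes unwritable during the crash, the final messages may never reach the journal, so an empty result here does not rule out a storage problem.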
All of the symptoms are so general (I/O errors, SSH refusal, etc.) that a week of Googling has not gotten me anywhere. I was sure the clean reinstall with a different repo would not crash, which would have let me add my actual Docker containers one at a time until a crash occurred and so find the problematic container by trial and error. Now I am at a complete loss for how to proceed.