The /var/log/spark/apps/ folder was deleted on our EMR cluster. I created a new HDFS folder with the same name and changed its permissions, and each Spark application now successfully writes its event logs to this HDFS folder.
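For completeness, the recreation looked roughly like this (the spark:spark ownership and 1777 mode are my assumptions about the EMR defaults, not values I have verified):

```shell
# Recreate the event-log directory in HDFS and open up its permissions.
# Assumptions: spark:spark ownership and 1777 mode match EMR defaults.
APPS_DIR=/var/log/spark/apps
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$APPS_DIR"
  hdfs dfs -chmod 1777 "$APPS_DIR"
  hdfs dfs -chown spark:spark "$APPS_DIR"
else
  echo "hdfs not found; run this on the EMR master node"
fi
```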
However, something else in that folder previously allowed the Spark History Server reached through an SSH tunnel to display the list of application logs. It worked fine before the folder was deleted, but now it shows no Spark application logs at all, complete or incomplete, even though
hdfs dfs -ls /var/log/spark/apps/ shows that the folder is full of logs.
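To compare what the History Server sees against what the applications write, here is the check I can run on the recreated directory (the -stat format specifiers are standard HDFS shell options; the path is the one above):

```shell
# Show owner, group, permission bits, and name of the event-log directory,
# plus a sample of the files inside it (run on the EMR master node).
APPS_DIR=/var/log/spark/apps
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -stat "%u %g %a %n" "$APPS_DIR"
  hdfs dfs -ls "$APPS_DIR" | head -n 5
else
  echo "hdfs not found; run this on the EMR master node"
fi
```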
The Spark History Server accessed through the EMR AWS Console still works, but it is less ideal because it lags significantly behind the Spark History Server accessed through an SSH tunnel.
What else do I need to restore to this folder so that the Spark History Server opened through the SSH tunnel shows these logs?
On a Windows computer, the following PowerShell code still opens the Spark History Server UI correctly, but the UI does not show any logs:
Start-Process powershell "-noexit", "`$host.ui.RawUI.WindowTitle = 'Spark HistoryServer'; Start-Process chrome.exe http://localhost:8158; ssh -N -L 8158:ip-10-226-66-190.us-east-2.compute.internal:18080 email@example.com"
I have also stopped and restarted the Spark History Server:
sudo stop spark-history-server
sudo start spark-history-server
sudo -s $SPARK_HOME/sbin/start-history-server.sh
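In case it is relevant, I can also confirm which directory the History Server is actually configured to read (spark.history.fs.logDirectory is the standard Spark property; the config path below is where EMR places spark-defaults.conf, as far as I know):

```shell
# Print the event-log related properties the History Server is configured with.
# Assumption: /etc/spark/conf/spark-defaults.conf is the EMR config location.
CONF=/etc/spark/conf/spark-defaults.conf
if [ -f "$CONF" ]; then
  grep -E 'spark\.history\.fs\.logDirectory|spark\.eventLog\.dir' "$CONF"
else
  echo "config not found at $CONF"
fi
```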