After three years of running HedgeDoc without any issues, it has stopped responding. (At one point I got the message "I'm busy right now. Please try again later." in the browser, but now I only get timeouts.)
The node process is consuming 100% CPU.
Could you give me some hints on where to look for troubleshooting?
I already tried:
Restarting the service
Additional Info:
Two days ago (before the restart) there were some messages like these in the log (not sure whether they are connected to the issue):
Jun 01 01:09:36 yarn[3318013]: 2024-05-31T23:09:36.139Z error: Operation timeout
Jun 01 01:09:36 yarn[3318013]: 2024-05-31T23:09:36.139Z error: read history failed: SequelizeConnectionAcquireTimeoutError: Operation timeout
After the last restart, the log file shows many entries like this, which seem to be normal:
yarn[3478]: 2024-06-03T08:32:29.568Z info: deserializeUser: 16907837-22fe-4447-b148-86f55565ae81
But now these messages also show up from time to time:
MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 disconnect listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 operation listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 get_operations listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 selection listeners added to [Socket]. Use emitter.setMaxListeners() to increase limit
50 open notes usually shouldn't be a problem; the demo instance, for example, regularly has about 200 notes open at the same time. Of course the load also depends on your hardware, but HedgeDoc generally has quite low resource usage. Maybe the number of open file descriptors or some other (u)limit on the system was reached, and that caused the malfunction?
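To check that, you could look at the file-descriptor usage of the process while the problem is occurring. The PID 3478 below is just the one from your journal prefix; if yarn only wraps the actual node process, pick the node child PID instead (e.g. via ps --ppid 3478):

ls /proc/3478/fd | wc -l                # file descriptors currently held by the process
grep "open files" /proc/3478/limits     # the soft/hard limit that applies to that process

If the first number is close to the limit, raising the limit for the service (or finding out why so many descriptors stay open) would be the next step.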
Regarding your question about the HedgeDoc configuration: the variable CMD_TOOBUSY_LAG defines the maximum CPU time in milliseconds for one tick of the internal processing loop. It defaults to 70 ms, which is usually fine. You can experiment with the value a bit; higher values may reduce the frequency of "I'm busy right now" messages, but they increase the overall lag of the application. If this was a one-off situation and your instance otherwise runs fine, I wouldn't change too much.
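If you do want to try it, the usual way is to set the environment variable for the HedgeDoc process and restart it. Since your journal shows yarn[...] prefixes, I assume a systemd service; the unit name hedgedoc, the drop-in path and the value 100 below are only placeholders for your actual setup:

# /etc/systemd/system/hedgedoc.service.d/override.conf  (assumed unit name and path)
[Service]
Environment="CMD_TOOBUSY_LAG=100"

# afterwards:
systemctl daemon-reload
systemctl restart hedgedoc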