libera/#infobot/ Thursday, 2018-12-06

TimRikerhmm. cpu spike is visible now.... happens every day, eats up all the ram on the box.03:31
TimRikeradded timeout script with memory limit. still not sure what causes the cpu/mem to spike.04:07
DocScrutinizer05hi tim!04:50
DocScrutinizer05I just thought I'm dreaming, my keep-alive script made infobot reply "OK" to "noreplyplease" :-)04:51
DocScrutinizer05TimRiker: I have an estimated 30 logs of infobot ssh restart04:52
DocScrutinizer05some weird stuff, though not much useful. Most informative: it seems the infobot process doesn't get restarted, it keeps same process number04:53
DocScrutinizer05~factinfo noreplyplease04:53
infobotnoreplyplease -- created by TimRiker <~TimRiker@bzflag/projectlead/TimRiker> 11m 54s ago; it has been requested 7 times, last by DocScrutinizer05, 27s ago.04:53
DocScrutinizer05hmmm, what's the purpose?04:54
DocScrutinizer05I gonna stop the /ping-ibbot script for now, until it's clear what's the deal with this ok reply04:55
TimRikerDocScrutinizer05, your query makes the bot query four other infobot instances every time. avoiding that overhead.04:55
DocScrutinizer05oh, ok04:55
DocScrutinizer05hmmm, so I have to live with 3 always highhlighted query tabs04:56
TimRikercheck out the mrtg logs here: every day the box eats all 8G of ram on the server.04:57
DocScrutinizer05yeag, the ibot process eats ram, that's also obvious from my logs here04:57
TimRikerI installed "timeout" and limited the ram. Not found the cause, but it should stop it from slowing down the whole machine every day.04:58
DocScrutinizer05I noticed the downtime moved ahead from locat time 2:19 to meanwhile 2:2805:02
DocScrutinizer05well, once yesterday05:03
DocScrutinizer05I still glare at cron.daily and logrotate (and systemd who ate both) for that daily ibot madness05:04
DocScrutinizer05the convatenated logs I pm'ed you the pastebin URL are maybe most interestimn at very beginning and last 200somelines05:06
DocScrutinizer05seems the restart doesn'T do anything at all anyway05:07
DocScrutinizer05btw my ssh session to infobot hangs when I run ssh under timeout05:07
TimRikeribot does not have a cron job. it uses the built in log rotation. which might be buggy.05:08
DocScrutinizer05quite possible05:09
DocScrutinizer05during the last weeks I've seen the bot come back within minutes after 2:19 "regular" offline05:09
DocScrutinizer05two or three times.  usually it takes 90 minutes though05:10
DocScrutinizer05the restart job anyway doesn't restart infobot. there's some bug in it05:12
DocScrutinizer05maybe it only restarts some wrapper, or whatever05:12
TimRikerI don't remember what calls the log to html code... looking.05:13
TimRikerah. processLog() calls scripts/processlog05:16
DocScrutinizer05I mean syslogs. There once been some /etc/cron.daily/logrotate which became obsolete with advent of systemd05:19
DocScrutinizer05no idea which "smart" automatism systemd introduced instead05:20
DocScrutinizer05quite easily could conflict with internal logrotates. Or got messed up during migration sysvinit->systemd05:21
DocScrutinizer05also see /etc/logrotate.d/*05:22
DocScrutinizer05or /etc/logrotate.d/*apt*05:22
TimRikeribot. yeah, no entries in logrotate.d either. there's just the loopibot script that calls/restart infobot, and infobot calls processlogs when the date changes and it closes and opens a log file.05:24
TimRikerno cron, logrotate, etc involved other that to start the loop in reboot.05:24
infobotTimRiker meant: no cron, logrotate, etc involved other than to start the loop in reboot.05:24
DocScrutinizer05you *could* use a `at` (or cron) batchjob timed to attach ibot process to strace at 00:17 UTC and detach at 00:30 maybe ;-)05:26
DocScrutinizer05somehow that sneaky critter that causes this must get shot05:27
DocScrutinizer05IOW there must be a timed trigger causing this05:27
TimRikeryep. today's change was to use timeout. We'll see if that helps. I should get on when the logs rotate and watch what happens.05:29
DocScrutinizer05is there a NOOP command to ibot?05:30
DocScrutinizer05something I could (try to) send and not get any reply?05:31
DocScrutinizer05HMMM a (IRC) /ping maybe?05:34
TimRikerwhat's the point in sending if you don't want a reply? How do you know if it's not listening?05:35
DocScrutinizer05IRC server tells me05:35
TimRikerif you're depending on the irc server, can't you just add it to your notify list?05:36
DocScrutinizer05notify is not at all any reliable05:37
DocScrutinizer05[6 Dec 2018 05:35:22] [CTCP] Sending CTCP-PING request to infobot.05:37
DocScrutinizer05[6 Dec 2018 05:35:22] [CTCP] Received CTCP-1544070922 reply from infobot: .05:37
DocScrutinizer05might work, it will not open a query at least05:37
DocScrutinizer05but won't notice stale bot?05:38
TimRikerlooks like it does not reply to ctcp clientinfo05:41
DocScrutinizer05let me know when I should test the restart ssh05:41
TimRikerwhenever you like. is it not restarting?05:42
DocScrutinizer05nag, as I said above and shows in the logs I pmed you, the ibot process keeps same PID over several restart runs05:42
DocScrutinizer05ok I shoot05:43
TimRikerit does a killall infobot which should work.05:43
TimRikerremember that when you kill the bot, it tries to do a graceful shutdown and restart, so it's not instant unless something is broken.05:45
TimRikerrunning killall infobot on the command line works almost immediately.05:46
DocScrutinizer05maybe not when bot hangs ;-) might need SIGKILL not SIGTERM05:47
TimRikerssh and running the script works fine. hmm. perhaps I should wait a second and then do a -9 ?05:48
DocScrutinizer05actually I'm pretty sure a plain killall (without brute force signal) won't help when process in swap hell or the like05:48
DocScrutinizer05make that 10 s05:48
TimRikerthe loop sleeps 10 seconds, so it has to be less than that or we'll kill it twice.05:49
DocScrutinizer05aaah wait05:49
TimRikersetting it to 3 seconds. it should be able to shut down that quickly.05:49
DocScrutinizer05it restarts immediately when killed?05:49
TimRikerit waits 10 seconds and then restarts.05:49
DocScrutinizer05sorry, I fail to envision it. What I know is: sometimes it takes quite a while to flush all buffers etc, before a process ends gracefully on plain kill05:51
DocScrutinizer05kill infobot; sleep 10; kill -s0 infobot&&kill -s9 infobot05:56
DocScrutinizer05ooh, doesn't work with killall05:57
DocScrutinizer05killall knows --older-than   though ;-)05:58
DocScrutinizer05killall infobot; sleep 20; killall -s9  --older-than 30s infobot05:59
DocScrutinizer05gotta say thank you to Werner Almesberger for it, once more when I meet him next time06:01
DocScrutinizer05TimRiker: I'm puzzled by the ~/timeout/timeout -m 50000006:06
DocScrutinizer05what does it do?06:07
DocScrutinizer05oooh -m emeory?06:07
* DocScrutinizer05 clearly needs a smarter timeout06:08
DocScrutinizer05we'll see06:14
DocScrutinizer05haha yeah!  >>Sometimes waitpid call from inside SIGALRM handler returns -1 for no apparent reason. This return value is ignored, and the appropriate warning is printed, but the cause of such a behavior is still unknown.<<06:20
infobotSince Thu Dec  6 04:47:41 2018, there have been 0 modifications, 0 questions, 0 dunnos, 0 morons and 0 commands.  I have been awake for 56m 7s this session, and currently reference 119780 factoids.  I'm using about 80496 kB of memory. With 0 active forks. Process time user/system 2.81/0.18 child 0/006:43
infobot- Uptime for infobot -06:43
infobotNow: 56m 18s running infobot 1.5.4 (SVN) -- linux06:43
infobot1: 59d 8h 41m 19s running infobot 1.5.4 (SVN) -- linux, ended Sun Nov 14 18:39:57 201006:44
infobot2: 57d 3h 9m 23s running infobot 1.5.4 (SVN) -- linux, ended Fri Jun 26 20:39:27 200906:44
infobot3: 53d 8h 24m 4s running infobot 1.5.4 (SVN) -- linux, ended Tue Nov  7 05:34:20 201706:44

Generated by 2.17.0 by Marius Gedminas - find it at!