Forum : The BROKeN BuBBLe

Networked message handling/thread going to 100% cpu & no imports

From Khelair@VERT/TINFOIL to All on Saturday, December 06, 2014 21:36:44

I know I've mentioned this before, but the bug in Synchronet that a few people have talked about, the one that pegs a thread at 100% CPU usage after one of the networks tries to pull messages (I believe, though this is a glorified assumption at this point), has been bugging me a lot more often recently. At least once a day now I find that no messages have been imported to any of the networked subs for a prolonged period, and inevitably, when I check the CPU stats, the sbbs process is pegged at 100%. A kill -15 won't kill it; after a while I kill -9 it, restart it, and things seem to be working again. This time around I haven't noticed any particular sub-boards being corrupted in the process, but I've trimmed down the number of sub-boards that I'm reading lately for lack of time, and FIDONet is not posting anything for me, due to an error that my RC doesn't seem to be able to help me fix.
So I'm not sure exactly which networked function it may be, but when it happens it shuts down importing of all networked messages across 5 networks. It's a real hindrance, and I'd rather not have to fall back on a shell script that runs every hour, checks whether usage has been pegged for too long, and then kills sbbs off and restarts it. That just can't be good for anything.
Can anybody give me some more information on how to get around this, since I still can't get a more recent version compiled on OBSD? I guess I could just default to disabling networked bases, one at a time (my preliminary suspect is FIDO), until it doesn't seem to happen any more, but that seems like it'd be unreliable and a really time-consuming way to get to the bottom of this.
Any input appreciated.

---
■ Synchronet ■ Tinfoil Tetrahedron BBS telnet://tinfoil.synchro.net

From Access Denied@VERT/PHARCYDE to Khelair on Sunday, December 07, 2014 09:14:42

Hello Khelair,

On 06 Dec 14 21:36, Khelair wrote to All:

Can anybody give me some more information on how to get
around this, since I still can't get a more recent version compiled on OBSD? I guess I could just default to disabling networked bases, one
at a time (my preliminary suspect is FIDO), until it doesn't seem to happen any more, but that seems like it'd be unreliable and a really time-consuming way to get to the bottom of this. Any input
appreciated.

When this is occurring, take a look in your /sbbs/data/ directory for *.now files. If one exists (usually fidoin.now or fidoout.now or something similar), no other events will run until that one is done. So if no other events are running and one of those .now files exists, *that* is the one keeping the others from running.

With that, you can narrow down exactly which event is doing this. After you know that, and if it's fidoin.now, you can check /sbbs/data/sbbsecho.log for any errors importing messages during that timeframe. If it's fidoout.now check the same log for exporting errors.
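
A minimal sketch of that check in Python, for anyone who would rather script it. It assumes the /sbbs/data path mentioned above; the function name `find_now_files` and the idea of sorting by age are illustrative choices, not anything Synchronet provides.

```python
import time
from pathlib import Path


def find_now_files(data_dir):
    """Return (name, age_in_seconds) for each *.now file in data_dir, oldest first."""
    now = time.time()
    found = []
    for path in Path(data_dir).glob("*.now"):
        try:
            age = now - path.stat().st_mtime
        except FileNotFoundError:
            # The event finished and its file vanished between glob and stat.
            continue
        found.append((path.name, age))
    # Oldest first: the file that has lingered longest is the likeliest culprit.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `find_now_files("/sbbs/data")` while the CPU is pegged should show which .now file has been sitting there the longest; an empty list means no event is currently pending.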

Sometimes it's a DOS event while processing door games for interBBS. If one game hangs during processing, it will stay locked up, and your DOS emulator will continue to run, pegging your CPU at 100%. Then again, you're running OpenBSD, so you may not have any DOS games being processed, unless you're using something like DOSCMD or maybe got DOSEMU to compile for it.

Regards,
Nick

--- GoldED+/LNX 1.1.5-b20130910
* Origin: thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin) (723:1/701)
■ Synchronet ■ thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin)

From Digital Man@VERT to Khelair on Sunday, December 07, 2014 18:29:19

Re: Networked message handling/thread going to 100% cpu & no imports
By: Khelair to All on Sat Dec 06 2014 09:36 pm

I know I've mentioned this before, but the bug in Synchronet that a few people have talked about, the one that pegs a thread at 100% CPU usage after one of the networks tries to pull messages (I believe, though this is a glorified assumption at this point), has been bugging me a lot more often recently. At least once a day now I find that no messages have been imported to any of the networked subs for a prolonged period, and inevitably, when I check the CPU stats, the sbbs process is pegged at 100%. A kill -15 won't kill it; after a while I kill -9 it, restart it, and things seem to be working again. This time around I haven't noticed any particular sub-boards being corrupted in the process, but I've trimmed down the number of sub-boards that I'm reading lately for lack of time, and FIDONet is not posting anything for me, due to an error that my RC doesn't seem to be able to help me fix.
So I'm not sure exactly which networked function it may be, but when it happens it shuts down importing of all networked messages across 5 networks. It's a real hindrance, and I'd rather not have to fall back on a shell script that runs every hour, checks whether usage has been pegged for too long, and then kills sbbs off and restarts it. That just can't be good for anything.
Can anybody give me some more information on how to get around this, since I still can't get a more recent version compiled on OBSD? I guess I could just default to disabling networked bases, one at a time (my preliminary suspect is FIDO), until it doesn't seem to happen any more, but that seems like it'd be unreliable and a really time-consuming way to get to the bottom of this.
Any input appreciated.

Are all of these "networks" FidoNet technology networks (FTNs)? If so, then the process that handles importing and exporting would be SBBSecho, not sbbs. Which process exactly do you see with 100% CPU utilization? What is the log output at the time that is occurring? What versions of SBBS and SBBSecho are you using? Without more details, it's really hard to help.
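
To answer the "which process" question, something like the following Python sketch can pick the busy entries out of `ps` output. It is only a sketch: the function names and the 90% threshold are made up for illustration, and it assumes a `ps` that accepts `-axo pid,pcpu,args` and prints a header row, so check ps(1) on your system first.

```python
import subprocess


def busy_processes(ps_output, threshold=90.0):
    """Parse 'PID %CPU COMMAND' rows and return those at or above threshold."""
    busy = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            pid = int(parts[0])
            cpu_pct = float(parts[1])
        except ValueError:
            continue  # not a data row we understand
        if cpu_pct >= threshold:
            busy.append((pid, cpu_pct, parts[2].strip()))
    return busy


def read_ps_output():
    """Run ps and return its text output (flag support is an assumption)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,pcpu,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running `busy_processes(read_ps_output())` while the problem is happening would list the PID, CPU percentage, and command line of whatever is pegged, which is enough to tell sbbs apart from SBBSecho.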

digital man

Synchronet "Real Fact" #19:
Michael Swindell was directly responsible for Synchronet's commercial success.
Norco, CA WX: 67.0°F, 54.0% humidity, 0 mph WSW wind, 0.00 inches rain/24hrs

---
■ Synchronet ■ Vertrauen ■ Home of Synchronet ■ telnet://vert.synchro.net

From Khelair@VERT/TINFOIL to Digital Man on Monday, December 08, 2014 14:40:17

Re: Networked message handling/thread going to 100% cpu & no imports
By: Digital Man to Khelair on Sun Dec 07 2014 18:29:19

Are all of these "networks" FidoNet technology networks (FTNs)? If so, then the process that handles importing and exporting would be SBBSecho, not sbbs. Which process exactly do you see with 100% CPU utilization? What is the log output at the time that is occurring? What versions of SBBS and SBBSecho are you using? Without more details, it's really hard to help.

Okay, so just wanted to give you an update on this... True to normal form for me, now that I've mentioned it I haven't had a crash for a bit. I understand about the FTNs using SBBSecho as the subsystem. I guess my thoughts on it being down to one of the Fido conferences could be bogus. Now that I think about it a little more, I think it's corrupted DOVE-Net conferences at times, too.
Anyway, I've cut 'n pasted your bit above here so I have it ready for my next crash. I'll get at everything and post it when it happens here.
As far as which process: I don't have thread display on, or haven't for previous crashes, so when I run /sbbs/exec/sbbs I see one 'zombie' (I think that's the right term) process, or else it's shelled out or something, because it's in parentheses, unkillable, and never uses CPU. The other sbbs process stays active, and when it skyrockets to 100% I can still use an existing login, but no messages are imported at all until I kill it off and restart. Again, no go on signal 15; only signal 9 cuts through.
SBBS version is 3.16. I've tried upgrading to current, but I can't get it to compile on OpenBSD and I haven't had the time to track that all the way down yet. SBBSecho is v2.26-OpenBSD (rev 1.234).

---
■ Synchronet ■ Tinfoil Tetrahedron BBS telnet://tinfoil.synchro.net

From Khelair@VERT/TINFOIL to Access Denied on Tuesday, December 09, 2014 20:36:43

Re: Re: Networked message handling/thread going to 100% cpu & no imports
By: Access Denied to Khelair on Sun Dec 07 2014 09:14:42

When this is occurring, take a look in your /sbbs/data/ directory for *.now files. If one exists (usually fidoin.now or fidoout.now or something similar), no other events will run until that one is done. So if no other events are running and one of those .now files exists, *that* is the one keeping the others from running.

With that, you can narrow down exactly which event is doing this. After you know that, and if it's fidoin.now, you can check /sbbs/data/sbbsecho.log for any errors importing messages during that timeframe. If it's fidoout.now check the same log for exporting errors.

Well, I caught a couple of atypical ones now. Straight-up crashes, where I've got an open session and I come back a while later and the connection is terminated. These appear to be happening right around the time that qnet-qwk.now is being created, though there doesn't appear to be anything in the associated .lo? file.

---
■ Synchronet ■ Tinfoil Tetrahedron BBS telnet://tinfoil.synchro.net

From Khelair@VERT/TINFOIL to Digital Man on Tuesday, December 09, 2014 22:00:13

Re: Networked message handling/thread going to 100% cpu & no imports
By: Digital Man to Khelair on Sun Dec 07 2014 18:29:19

Are all of these "networks" FidoNet technology networks (FTNs)? If so, then the process that handles importing and exporting would be SBBSecho, not sbbs. Which process exactly do you see with 100% CPU utilization? What is the log output at the time that is occurring? What versions of SBBS and SBBSecho are you using? Without more details, it's really hard to help.

Damn, I'm not sure if my last reply got out on this or not, sorry if I'm stuttering here... Anyway, no: it appears that the existence of /sbbs/data/qnet-qwk.now coincides with the last 2 crashes, but they've been atypical. I was talking about the pegged-core thread ones, and these were straight-up crashes that caused a socket disconnect from logged-in clients and everything, not just a zombie-ish process.
Found nothing in the logs corresponding to the time.
SBBS is 3.16 OBSD

---
■ Synchronet ■ Tinfoil Tetrahedron BBS telnet://tinfoil.synchro.net

From Access Denied@VERT/PHARCYDE to Khelair on Wednesday, December 10, 2014 17:15:14

Hello Khelair,

On 09 Dec 14 20:36, Khelair wrote to Access Denied:

Well, I caught a couple of atypical ones now. Straight-up crashes, where I've got an open session and I come back a while later and the connection is terminated. These appear to be happening right around the time that qnet-qwk.now is being created, though there doesn't appear to be anything in the associated .lo? file.

For one, you don't ever have to associate QWK messages with .?lo files whatsoever. Two completely different transfer protocols. My question for you would be, are you hosting a QWK network? Or maybe it's when you're polling VERT for Dovenet?

Maybe check your system log and see if there's any odd things going on right around the time it crashes.

Regards,
Nick

--- GoldED+/LNX 1.1.5-b20130910
* Origin: thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin) (723:1/701)
■ Synchronet ■ thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin)

From Khelair@VERT/TINFOIL to Access Denied on Wednesday, December 10, 2014 21:41:22

Re: Re: Networked message handling/thread going to 100% cpu & no imports
By: Access Denied to Khelair on Wed Dec 10 2014 17:15:14

don't appear to have anything in the associated .lo? file.

For one, you don't ever have to associate QWK messages with .?lo files whatsoever. Two completely different transfer protocols. My question for you would be, are you hosting a QWK network? Or maybe it's when you're polling VERT for Dovenet?

I meant what I said about .lo? files, as in the ones that accumulate in /sbbs/data/logs/*.lo? (.log & .lol).

Maybe check your system log and see if there's any odd things going on right around the time it crashes.

Yep, that's what I referenced doing in the above file extensions. ;)

---
■ Synchronet ■ Tinfoil Tetrahedron BBS telnet://tinfoil.synchro.net

From Access Denied@VERT/PHARCYDE to Khelair on Thursday, December 11, 2014 17:12:44

Hello Khelair,

On 10 Dec 14 21:41, Khelair wrote to Access Denied:

I meant what I said about .lo? files, as in the ones that accumulate
in /sbbs/data/logs/*.lo? (.log & .lol).

Maybe check your system log and see if there's any odd things
going on right around the time it crashes.

Yep, that's what I referenced doing in the above file extensions.
;)

I don't think those logs give you all information about your system, do they? Maybe you compiled it that way for your OS?

Otherwise, check your system log. I use syslog-ng on Gentoo here, and it logs to /var/log/messages (aside from the stuff in the /sbbs/data/logs directory).
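
If the system log is large, a rough Python sketch like this can narrow it to the minutes around a crash. It assumes the traditional syslog timestamp at the start of each line ("Mon DD HH:MM:SS", which carries no year, hence the `year` parameter); the function name and the 5-minute default are illustrative, and a syslog-ng setup configured for a different timestamp format would need a different parse.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, when, minutes=5, year=2014):
    """Return syslog-style lines stamped within +/- minutes of 'when'."""
    window = timedelta(minutes=minutes)
    hits = []
    for line in log_lines:
        stamp = line[:15]  # the "Mon DD HH:MM:SS" prefix
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no traditional syslog timestamp on this line; skip it
        if abs(ts - when) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

Usage would be along the lines of opening /var/log/messages (with `errors="replace"` in case of stray bytes), passing the file object as `log_lines`, and giving `when` as a `datetime` for the moment the session dropped.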

Regards,
Nick

--- GoldED+/LNX 1.1.5-b20130910
* Origin: thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin) (723:1/701)
■ Synchronet ■ thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin)

From mark lewis@VERT to Khelair on Thursday, December 11, 2014 22:15:51

On Wed, 10 Dec 2014, Khelair wrote to Access Denied:

don't appear to have anything in the associated .lo? file.

For one, you don't ever have to associate QWK messages with .?lo files

someone confused .lo? files with .?lo files... the latter are binkley style mailer files ;)
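
The difference between the two patterns is easy to see with Python's `fnmatch`. The file names below are hypothetical, picked only to show which pattern catches what; `.flo` and `.clo` are two of the Binkley-style flow file extensions.

```python
from fnmatch import fnmatch

# Hypothetical file names, chosen only to illustrate the two patterns.
names = ["120914.log", "120914.lol", "00010002.flo", "00010002.clo"]

# "*.lo?" : extension starts with "lo" plus one more character (.log, .lol)
log_style = [n for n in names if fnmatch(n, "*.lo?")]

# "*.?lo" : extension is one character followed by "lo" (.flo, .clo, ...)
flow_style = [n for n in names if fnmatch(n, "*.?lo")]

print("matches *.lo? :", log_style)
print("matches *.?lo :", flow_style)
```

Neither pattern matches the other's files, so a `*.lo?` search under /sbbs/data/logs will never pick up mailer flow files, and vice versa.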

)\/(ark

* Origin: (1:3634/12)

---
■ Synchronet ■ Vertrauen ■ Home of Synchronet ■ telnet://vert.synchro.net

From Nicholas Boel@VERT to mark lewis on Thursday, December 11, 2014 22:57:06

Hello mark,

On 11 Dec 14 22:15, mark lewis wrote to Khelair:

For one, you don't ever have to associate QWK messages with .?lo
files

someone confused .lo? files with .?lo files... the latter are binkley style mailer files ;)

I did. But then again, I originally wasn't referring to anything in /sbbs/data/logs, either. I was referring to the system log (i.e. /var/log/messages in some Linux distros, journalctl on Arch Linux, etc.). In other words: your SYSTEM log, not your BBS logs. If installed normally, Synchronet will automatically log to your system logs unless you tell it not to or aren't running it as a daemon.

Regards,
Nick

--- GoldED+/LNX 1.1.5-b20130910
* Origin: thePharcyde_ telnet://bbs.pharcyde.org (Wisconsin) (1:154/701)
■ Synchronet ■ Vertrauen ■ Home of Synchronet ■ telnet://vert.synchro.net
