[cobalt-users] FIX - can't su to root, email stopped working, gui stopped working, postgres database is down, virtual sites disappeared



Forgive the long subject line; it's there so this will show up in a 
variety of searches.
Yesterday my RaQ went nucking futs. I'm gonna post the symptoms, the 
fix (found via the archives), some bitching, and a little secret that 
I never previously wanted to divulge.

First, the symptoms:
Suddenly I couldn't su to root from admin. Checked the group file, 
everything was fine, admin was still in the wheel group. Changing the 
password from the GUI didn't help - it was still telling me incorrect 
password when I tried to su, even though I knew I was using the 
correct password.
No one could get their email via POP. It was telling them incorrect 
username/password. They could, however, login to FTP and get their 
mail from the Neomail interface with that same username and password.
Then the GUI started having serious problems. Hitting it with a new 
browser, it would look fine. Change pages and I'd get an "internal 
server error" page, with logs showing premature end of script 
headers. Hit it with a new browser, and again it would look fine 
until I changed pages.
And the RaQ kept sending me emails about the postgres database being 
down, which of course I never saw until I got the damn thing fixed 
because I couldn't get my mail from POP and was too busy to screw 
with webmail.

Sounds like a hacker's work, right? Wrong.
The RaQ has been just about falling over each morning while 
processing the logs. It's not like it's a horrendous job, either - 
there are only 82 sites on this box and none of them are what you'd 
really call high-traffic.
But, given how the RaQ handles things, it got done processing the 
logs and had no memory left for anything else (it's got 256MB in it), 
and the postgres db got corrupted.
Which caused all hell to break loose.

The way I figured it out was that people could still get in through 
ftp. I know ftp works off of the /etc/passwd file. Since the Cobalt 
"handles" the entire POP process with its pop-before-relay, POP was 
probably depending on the postgresql database for usernames and 
passwords rather than using what everything else does... the passwd 
file.
The GUI, of course, runs completely off of the postgres db 
(proprietary software, don't ya know), which would explain why it 
(when it *did* give me a page) was showing no virtual sites and no 
users listed.
The entire time, the websites were serving up fine, the machine was 
still *receiving* email fine (showing that it wasn't a problem with 
sendmail), and ftp was working fine.

Which all led me back to the postgres database. No matter how many 
times I restarted it, even rebooted the machine, it would not stay 
up. That sent me digging through the archives for a way to "fix" 
the postgres db. I found an exact step-by-step, posted back in 
January by Andrew. If you need it, you can find it here:
http://list.cobalt.com/pipermail/cobalt-users/2002-January/059762.html

The thing is, you have to be root to do these things, and I could not 
su to root from admin.

Here's the little secret I never wanted to divulge before. 
I've known for quite a long time now that there is a backdoor built 
into the Cobalt SSH2 package. I found it out of pure curiosity one 
day. Perhaps other people have too, but just didn't want to say 
anything because it *is* such a glaring hole but makes a nice saving 
grace if you ever find yourself in the situation I was in yesterday.
If you've got the Cobalt SSH2 package installed on your machine, you 
can shell to your machine and login AS ROOT. When the login prompt 
comes up, where you would normally type "admin", just type "root" (no 
quotes). Give it your password and you're in.
For those of you who like the thought of having this in place should 
you ever find yourself without root access, enjoy.
For those of you who are security minded, check your sshd config 
file. You'll find "PermitRootLogin yes" in there somewhere. Just 
change this to no and restart sshd.
There's the secret. If everyone knew about it and no one was saying 
anything for whatever reasons... sorry.
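
For anyone who'd rather check than eyeball it, here's a rough Python 
sketch that pulls the PermitRootLogin value out of the text of an sshd 
config file. Assumptions, none of them from the Cobalt docs: the 
function name is made up, the file uses "keyword value" lines with # 
comments, and the first occurrence of a keyword wins (that's how 
OpenSSH reads its config; the Cobalt SSH2 package may differ).

```python
import re

def permit_root_login(config_text):
    """Return the PermitRootLogin value found in config_text, or None.

    Assumes OpenSSH-style parsing: '#' starts a comment line and the
    first occurrence of the keyword is the one that counts.
    """
    pattern = re.compile(r"permitrootlogin[\s=]+(\S+)", re.IGNORECASE)
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and commented-out settings
        match = pattern.match(line)
        if match:
            return match.group(1).lower()
    return None  # keyword not set; the daemon's built-in default applies

# Made-up sample text standing in for the real config file
sample = """# sample sshd config
Port 22
PermitRootLogin yes
"""
print("PermitRootLogin:", permit_root_login(sample))
```

To run it against the real thing, read your config file into a string 
and pass that in. Where the file lives depends on the SSH package, so 
find it on your own box first. A result of None just means the setting 
isn't in the file, not that root logins are off.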

So using this method I was able to get in as root, wipe out the 
postgres files, rebuild them and run meta-verify on them to restore 
the sites and users to the database. Everything immediately started 
working perfectly again.

That's the method, the fix, and the secret. Now the bitch.
I'm sick of getting emails every day about how "var is very close to 
being full". It's nowhere near close to being full. There's hardly 
anything in there. There are, however, symlinks to all of the mail, 
which is kept in the /home partition.
The stupid swatch program counts all symlinked data when it's 
adding up totals - resulting in an annoying daily email, a false 
alarm red flashing light whenever I go into the GUI, and probably a 
lot of scared webmasters out there wondering what the hell they've 
missed when they do a df and don't get anything that would be cause 
to worry.
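
To see how much difference following symlinks makes, here's a small 
Python sketch. It is not what swatch actually runs (no idea what 
command that is); it just builds a throwaway tree shaped like the 
setup described above, with a tiny "var" holding a symlink to mail 
stored under "home". The directory names, file sizes and function 
names are all made up for the demo.

```python
import os
import tempfile

def tree_bytes(root, follow_symlinks):
    """Sum file sizes under root, optionally following symlinks.

    Note: following links can loop forever if a link points back at
    one of its own parent directories; the demo tree has no such loop.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not follow_symlinks:
                continue  # the data lives elsewhere, so don't count it here
            try:
                total += os.stat(path).st_size
            except OSError:
                pass  # dangling link or file removed mid-walk
    return total

def build_demo_tree(base):
    """Create base/home/spool (1000 bytes of 'mail') and base/var
    (10 bytes of its own data plus a symlink to the spool)."""
    home_spool = os.path.join(base, "home", "spool")
    var = os.path.join(base, "var")
    os.makedirs(home_spool)
    os.makedirs(var)
    with open(os.path.join(home_spool, "mbox"), "wb") as f:
        f.write(b"x" * 1000)
    with open(os.path.join(var, "messages"), "wb") as f:
        f.write(b"y" * 10)
    os.symlink(home_spool, os.path.join(var, "mail"))
    return var

with tempfile.TemporaryDirectory() as base:
    var = build_demo_tree(base)
    print("links followed:", tree_bytes(var, True), "bytes")
    print("links ignored: ", tree_bytes(var, False), "bytes")
```

With links followed, the mail under "home" gets billed to "var"; with 
links ignored, only the 10 bytes that really live in "var" are counted, 
which is the kind of number df would back up.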

The SSH2 backdoor either needs to be publicized, or fixed. It's a 
security hole. I just publicized it, y'all can decide whether you 
want to fix it or not.
The RaQ choking on the logs with 256MB of RAM on 82 sites that only 
received 24,463 total "web hits" last week (according to Jens' 
TrafficLight program) is just ridiculous.
And the swatch not knowing how to correctly judge disk usage (not 
having the correct command passed to it to ignore symlinked data), 
when Cobalt put both the symlinks and the swatch there - the 
oversight just blows my mind.

Anyway, thought I would put this out for anyone else who was having 
the same problems listed in the subject line - hopefully there's a 
fix in this email for you.
--
CarrieB