3004-010 Failed setting terminal ownership and mode

Hopefully you or someone else is only encountering this while trying to log in to an AIX system. If you already have a login session open on the system, it may be an easy fix. If you don’t, and NO ONE else does either, it’s a bit more complicated.

Backstory

This is based on a real-world customer experience I had on Halloween night 2019. I did find some online info to help, but it took a while. I’m writing this up in the hope that it becomes another search hit that helps others, with more detail.

The customer’s users started getting this error when trying to log in, though everything had been working fine all day. The short story is that the /etc/group file had been overwritten by mistake with another file.

Before reading on: IF you have encountered the 3004-010 error AND you still have an active login somewhere on the system, you can simply open the /etc/group file and paste in the contents of /etc/group from another system, or restore it from a mksysb, and save a lot of time and grief. Remember that while the group file is broken you can’t start any new scp or other remote session to the box.
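Before pasting or restoring anything, it helps to confirm which copy of the file is actually sane. Here is a minimal sketch in POSIX shell, assuming the usual layout of four colon-separated fields per line (name, password placeholder, numeric GID, member list). The function name looks_like_group_file is my own illustration, not an AIX utility.

```shell
# Hypothetical helper: returns 0 if every non-blank line of the file has
# four colon-separated fields with a numeric GID in the third field.
looks_like_group_file() {
    [ -s "$1" ] || return 1    # a missing or empty file is not usable
    awk -F: '
        NF == 0 { next }                          # ignore blank lines
        NF != 4 || $3 !~ /^[0-9]+$/ { bad = 1 }   # wrong shape or non-numeric GID
        END { exit bad + 0 }
    ' "$1"
}
```

Something like `looks_like_group_file /etc/group || echo "group file looks damaged"` would flag a file that has sudoers or hosts content in it, since those lines don't have the four-field shape.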

The long story: after the problem was encountered, it was decided to reboot the system, for reasons I am unaware of. It was hard rebooted via the HMC. During bootup, many 0513-012 and 0481-002 messages about failures to start daemons/processes appeared, because the user IDs had no recognized group in /etc/group. This is the point at which I got engaged via an SOS call. Specifics shown below:

Recovery

In this case, recovery required another bootable AIX image and a boot into SMS mode to use it. The typical options are to boot from a mksysb or an AIX install ISO of the same level via a NIM server, VIOS virtual optical, or physical DVD. In this exact case the machines were full system LPARs w/o a NIM server or VIOS, so we actually resorted to physical media.

Via Media

We did this by activating the profile via the HMC, then choosing advanced, then SMS, then ok (twice). After a few minutes we got a prompt to press 1 for console and hit enter, then 1 for English, then on the next menu we chose 3 to enter single user mode.

Choose the rootvg disk (or what you think it is) and it will show the lvs that belong to that vg. If it looks like rootvg, choose that one. If not, repeat until you find it. Then choose #1 to access the root volume group. This mounted the filesystems for us.

We ran /etc/methods/cfg64 so our commands would work right. We copied /etc/group to /etc/group.orig. We found that /etc/group actually had /etc/sudoers contents in it. It would appear the sudoers file was copied over the group file somehow. But what it was wasn’t as important as what it wasn’t. In this case they had some previous copies of /etc/group, so we copied one over from a couple of days back simply to get the system bootable again.

Once booted, we were able to log in as root, then scp a prod version of the group file over, and all was well again.
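The save-then-restore step can be sketched as a small POSIX shell function. This is only an illustration of the order of operations (keep the damaged file first, then copy a known-good one in). The name restore_group_file and its arguments are hypothetical, and it assumes the older copy you pass in already has sensible ownership and permissions, since cp -p carries them over.

```shell
# Hypothetical helper: keep the damaged file, then put a known-good copy in place.
# Usage: restore_group_file <known_good_copy> <target>
restore_group_file() {
    good=$1
    target=$2
    [ -s "$good" ] || { echo "no usable copy at $good" >&2; return 1; }
    # Preserve the damaged file for later inspection; stop if that fails.
    if [ -e "$target" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    # cp -p keeps the mode and timestamps of the known-good copy.
    cp -p "$good" "$target"
}
```

In our situation the call would have looked roughly like `restore_group_file <older copy> /etc/group`, with the older copy being whichever previous version of the file the customer still had on disk.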

Via Clone

However, another potential option, if available, is booting from an existing clone of rootvg. In my case the customer did have a clone, but that didn’t come to light until AFTER we were able to successfully boot and log in. That got me thinking we could’ve booted into SMS mode, chosen the clone rootvg disk instead, and finished a normal boot. Then we could wake up the original rootvg, copy the clone’s /etc/group file over, put the original rootvg back to sleep, set the bootlist, and reboot from the original rootvg again. So I tested this theory and proved it successful, as follows.

In this test I intentionally clobbered the /etc/group file and rebooted the system to reproduce the errors and problem described above. I then SMS booted from the cloned disk. I woke up the “old_rootvg”, which is the real/original/current rootvg that the bad group file is on, via:

alt_rootvg_op -W -d hdisk0

This mounts the original rootvg’s filesystems with /alt_inst in front of their paths. I checked that both versions of the group file existed. They did, and they differed greatly; I actually saw /etc/hosts info in the original version of the group file. I simply copied the group file from my clone over the original version, which is now available under /alt_inst/etc, as shown below. Even though the damaged group file is trash, I still make a copy of it first.

cp -p /alt_inst/etc/group /alt_inst/tmp/group

cp -p /etc/group /alt_inst/etc/group
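For anyone who would rather script the compare-then-copy step, here is a rough POSIX shell sketch of the two cp commands above with a cmp check in front. The function name repair_group_from_clone is hypothetical. The /alt_inst default matches the mount prefix described above, and the second argument exists only so the function can be tried against a scratch directory first.

```shell
# Hypothetical helper mirroring the two cp commands above.
# Usage: repair_group_from_clone [alt_root] [clone_root]
#   alt_root    where the woken-up original rootvg is mounted (default /alt_inst)
#   clone_root  root of the running clone (default empty, meaning /)
repair_group_from_clone() {
    alt=${1:-/alt_inst}
    clone=${2:-}
    good="$clone/etc/group"
    bad="$alt/etc/group"
    [ -s "$good" ] && [ -e "$bad" ] || { echo "missing $good or $bad" >&2; return 1; }
    if cmp -s "$good" "$bad"; then
        echo "group files are identical; nothing to repair"
        return 0
    fi
    cp -p "$bad" "$alt/tmp/group" || return 1   # keep the damaged file first
    cp -p "$good" "$bad"                        # then overwrite it with the clone's copy
}
```

Run with no arguments while booted from the clone, it does the same thing as the manual commands, and it declines to touch anything if the two files already match.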

Then I put the original rootvg back to sleep (VERY IMPORTANT STEP, as you can corrupt it if you don’t do this prior to rebooting) and changed the bootlist to boot from it, as shown below. After a successful reboot, log in as normal and enjoy the rest of your day.

alt_rootvg_op -S

bootlist -m normal hdisk0 hdisk1

shutdown -Fr