Cisco Voice Servers Version 11.5 Could Not Load modules.dep

About 6 months ago we updated 3/4 of our Cisco Telephony environment from 8.5 to 11.5. The only reason we didn’t do it all is that UCCX 11.5 wasn’t out yet, so that piece went to 11. While there were a few bumps in the road (resizing VMs, some COP files, etc.), the update went well. Unfortunately, once it was done we started having a glorious issue where after a reboot the servers sometimes failed to boot, presenting “FATAL: Could not load /lib/modules/2.6.32-573.18.1.el6.x86_64/modules.dep: No such file or directory”. Any way you put it, this sucked.

The first time this happened I called TAC, and while they had seen it, they had no good answer except to rebuild the VM and restore from backup. Finally, after the 3rd time (approximately 3 months after install), the bug had been officially documented and (yay) it included a workaround. The good news is that the underlying issue has since been fixed in 11.5(1.11900.5) and forward, so if you are already there, no problems.
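If you aren't sure whether a given server is already at or past the fixed build, comparing version strings by eye is easy to get wrong. Here is a minimal sketch in POSIX shell. The function names and the way I normalise the string are my own assumptions (nothing Cisco ships), and it simply treats every run of digits as one numeric field.

```shell
#!/bin/sh
# Sketch only: compare a build string against the build this post names as fixed.
# FIXED_BUILD is taken from the post; everything else here is a hypothetical helper.
FIXED_BUILD="11.5(1.11900.5)"

# Turn every run of non-digits into one space: "11.5(1.11900.5)" -> "11 5 1 11900 5"
normalize_build() {
  printf '%s' "$1" | tr -cs '0-9' ' ' | sed 's/^ *//; s/ *$//'
}

# Succeeds (exit 0) when build $1 is at or beyond build $2.
# Fields are compared left to right as integers; missing fields count as 0.
build_at_least() {
  have=$(normalize_build "$1")
  want=$(normalize_build "$2")
  while [ -n "$have" ] || [ -n "$want" ]; do
    h=${have%% *}
    w=${want%% *}
    [ "${h:-0}" -gt "${w:-0}" ] && return 0
    [ "${h:-0}" -lt "${w:-0}" ] && return 1
    # Drop the field just compared; an exhausted list becomes empty.
    case $have in *' '*) have=${have#* } ;; *) have= ;; esac
    case $want in *' '*) want=${want#* } ;; *) want= ;; esac
  done
  return 0
}
```

You would call it with whatever version string your server reports, for example build_at_least "11.5.1.10000-6" "$FIXED_BUILD" (that first string is just an example value), and act on the exit status. Dots, dashes and parentheses are all treated as separators, so the exact punctuation of the version string doesn't matter.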

The issue lies with the fact that the locked-down build of RHEL 6 that all of the Cisco Voice server platforms are built on doesn’t handle VMware Tools updates well. It’s all good when you perform a manual update from their CLI and use their “utils vmtools refresh” utility, but many organizations, mine included, choose to make life easier and enable vCenter Update Manager to automatically upgrade VMware Tools each time a new version is available and the VM is rebooted.

So how do you fix it? While the bug ID has the fix in it, if you aren’t a VMware regular they’ve left out a few steps and it may not be the easiest thing to follow. So here I’m going to run down the entire process and get you (and chances are, myself when this happens in the future) back up and running.

0. Go out to the cisco.com site and download the recovery CD for 11.5. You should be able to find that here, but if not, or if you need a different version, browse through the downloads to Downloads Home > Products > Unified Communications > Call Control > Unified Communications Manager (CallManager) > Unified Communications Manager Version 11.5 > Recovery Software. Once done, upload the ISO to a datastore available to the host your failing VM resides on.
1. If you’ve still got the VM running, shut it down by right-clicking the VM > Power > Power Off in the vCenter Web UI or the ESXi embedded host client.
2. Now we need to make a couple of modifications to the VM’s settings: 1) attach the downloaded ISO file and check the “Connected at boot” box, and 2) under VM Options > Boot Options, set “Force BIOS setup” for the next boot. The second change is needed because by default VMs do not look at attached ISOs as the first boot device. Once both of these are done it’s time to boot the VM.
3. I personally like to launch the VMware Remote Console first and then boot from there; that way I’ve already got the screen up. After you power on, you’ll find the BIOS in a VM is the same old Phoenix BIOS we all know and love. Simply tell the VM to boot to CD before hard drive, move to Exit and “Save and Exit”, and your VM will reboot directly into the recovery ISO.
4. Once you get up to the Recovery Disk menu screen, we need to get out to a command prompt. To do this hit Alt-F2 and you’ll be presented with a standard bash prompt.
5. The root cause of this whole issue is that the initramfs file is improperly sized after an automatic upgrade of VMware Tools has been processed. So now that we have our prompt, we first need to verify that we are actually seeing the issue we expect. To do this run the command “find / -name 'initramfs*'” (the quotes around the pattern keep the shell from expanding the * itself). This should produce the full path and filename of the file. To get the size of this file, run ls -lh against it. In my example the full command would be “ls -lh /mnt/part1/boot/initramfs-2.6.32-573.18.1.el6.x86_64.img”. If you aren’t particularly used to the Linux CLI, once you get past …initr you should be able to hit Tab to autocomplete. The output should show that the file is incorrectly sized, somewhere between 11 and 15 MB.
6. Now we need to perform a chroot on the directory that contains the boot objects. In most cases this should simply be “chroot /mnt/part1”.

7. Finally we need to manually re-run the VMware Tools installer to get the file properly sized. This is included locally on the Recovery Disk, so just run the command “/usr/bin/vmware-config-tools.pl -d”. There are various steps throughout the process where it is going to ask for input. Unless you know you have a reason to differ, just hit Enter at each one until it completes.

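Once the VMware Tools installation is done, check the size of the initramfs…img file again. Keep in mind that while you are still inside the chroot the path no longer starts with /mnt/part1, so the file is at /boot/initramfs-2.6.32-573.18.1.el6.x86_64.img; alternatively, type exit to leave the chroot and rerun the ls -lh command from step 5 as is. You should now see the file size changed to 24 MB or so.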

8. Now we just need to do a little cleanup before we reboot. Go into Settings for your VM and tell it not to connect the ISO at boot. Once you make that change, flip back over to your console and simply type reboot or shutdown -r 0 to reboot back to full functionality.
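The before-and-after size check from steps 5 and 7 can be sketched as a small shell function, so you aren't squinting at ls output at 2 AM. This is a sketch under stated assumptions: the 20 MB cutoff is a number I picked because it sits between the 11-15 MB broken size and the roughly 24 MB repaired size described above, the function name is made up, and I'm assuming wc is available at the recovery prompt.

```shell
#!/bin/sh
# Sketch only: classify an initramfs image by size.
# Assumption: anything under 20 MB is "suspect". The post only gives
# 11-15 MB (broken) and about 24 MB (repaired); the cutoff is my own pick.
SUSPECT_BELOW_BYTES=$((20 * 1024 * 1024))

# Prints "suspect" or "ok" for the file given as $1; returns 2 if it is missing.
initramfs_state() {
  [ -f "$1" ] || return 2
  size=$(wc -c < "$1")
  size=$((size + 0))   # normalise away any padding wc adds around the number
  if [ "$size" -lt "$SUSPECT_BELOW_BYTES" ]; then
    echo "suspect"
  else
    echo "ok"
  fi
}

# Example call before the chroot, using the path from this post:
#   initramfs_state /mnt/part1/boot/initramfs-2.6.32-573.18.1.el6.x86_64.img
```

Run it once before step 7 and once after. If it still says "suspect" after the VMware Tools installer has finished, look at the actual size with ls -lh before rebooting.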

 

Quieting the LogPartitionLowWaterMarkExceeded Beast in Cisco IPT 9.0.x Products

As a SysAdmin I’m used to waking up, grabbing my phone and seeing the 20 or so e-mails that the various systems have sent me overnight; it gives me an idea of how the day will go and what I need to start with. Every so often, though, you get that morning where the 20 becomes 200 and you just want to roll over and go back to bed. This morning I had about 200, the vast majority of which were from my Cisco Unified Contact Center Express server with the subject “LogPartitionLowWaterMarkExceeded.” Luckily I’ve had this before and know what to do with it, but on the chance you are getting it too, here’s what it means and how to deal with it in an efficient manner.

WTF Is This?!?

Or at least that was my response the first time I ran into this. If you are a good little voice administrator, one of the first things you do when installing your phone system, or taking one over due to a job change, is set up the automatic alerting capability in the Cisco Unified Real Time Monitoring Tool (or RTMT, you did install that, right?) so that when things go awry you know, in theory, before the users do. One of the downsides to this system is that it is an all-or-nothing alerting system, meaning whatever log events are saved within the system are automatically e-mailed at the same frequency.

This particular error message is the by-product of a bug (CSCul18667) in the 9.0.x releases of all the Cisco IP Telephony products, in which the JMX logs produced by the then-new Unified Intelligence Center didn’t get automatically deleted to maintain space on the log partition. While this has long since been fixed, phone systems are one of those things that don’t get updated as regularly as they should, and as such it is still an issue. The resulting effect is that when you reach the “warning” level of partition usage (Low Water Mark) it starts logging every 5 minutes that the level has been reached.
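To put that 5 minute interval in perspective, here is a bit of back-of-the-envelope arithmetic as a shell sketch. The function names are made up for illustration; the only number taken from this post is the interval itself, and it assumes one e-mail per logged event.

```shell
#!/bin/sh
# Sketch only: how fast do the alert e-mails pile up at one every 5 minutes?
ALERT_INTERVAL_MINUTES=5

# Number of alerts generated over the given number of hours.
alerts_in_hours() {
  echo $(( $1 * 60 / ALERT_INTERVAL_MINUTES ))
}

# Hours (rounded up) needed to accumulate the given number of alerts.
hours_for_alerts() {
  echo $(( ($1 * ALERT_INTERVAL_MINUTES + 59) / 60 ))
}
```

By this math an inbox of about 200 alerts means the partition crossed the mark roughly 17 hours earlier (hours_for_alerts 200), and a full day left alone would add 288 more (alerts_in_hours 24).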

Just Make the Screaming Stop

Now that we know what the issue is how do we fix it?

Go back to the RTMT application and connect to the affected component server. Once there, navigate to the Trace & Log Central tool, then double-click on the Remote Browse option.
Once in the Remote Browse dialog box, choose “Trace Files”. We really only need one of the services selected, Cisco Unified Intelligence Center Serviceability Service; select it and then click Next, Next, Finish.
Once it is done gathering all of the log files it will tell you your browse is ready. You then need to drill all the way down through the menu on each node until you reach “jmx.” Once you double-click on jmx you will see the bonanza of logs. It is best to just click one, press Ctrl+A to select all, and then hit the Delete button.
After you hit Delete it will probably take quite a while to process. When it finishes, click on the node name and hit refresh to check; you should be left with just the currently active log file. Afterwards, if you have multiple nodes of the application, you will need to repeat this process for the others.

And that’s it really. Once done the e-mail bleeding will stop and you can go about the other 20 things you need to get done this day. If you are experiencing this, I would recommend being smarter than me if possible and just updating your CIPT components to a version newer than 9.0 (11.5 is the current release), something I am hoping to begin the process of in the next month or so.

3 steps to really reset a Cisco 7900 Phone

I recently had some issues with one of our phones at the office and you know how it goes: reboot it. What you may not know is that there are different levels of “reboot” for the 7900 series phones, each of which is a little more invasive than the last. In this post I’ll outline how to perform these 3 ways to reset your desk phone to cure what may or may not be ailing you.

I. The Simple Reset

Sure, you could go into ccmadmin and hit the reset button, but that doesn’t work as well if you are standing right in front of the phone. A quick reset can be performed by doing the following directly from the device:

  1. Hit the settings button on the device
  2. Key in **#** on the keypad (that is star, star, pound, star, star)
  3. You should then see the screen display the “Resetting…” message followed by a reboot

II. Configuration Erase

When you boot your 7900 series IP phone, as part of the boot sequence it reaches out to your Publisher’s TFTP server to grab a copy of either its specific configuration file or, if none exists, the default configuration file. Once this occurs it is stored locally to allow for quicker subsequent reboots. From time to time this locally cached copy will get gummed up and it is necessary to erase it and have the phone download a fresh copy. To do this the steps are:

  1. Hit the settings button on the device
  2. Key in **# (star, star, pound) in order; afterwards you will see “Settings Unlocked!” displayed on the screen and a “More” soft button will appear
  3. Hit the “More” soft button followed by the “Erase” soft button.
  4. You should then see the screen display the “Resetting…” message followed by a reboot

III. Factory Reset

This is the big daddy. If neither of the previous fixes worked, then this process will erase not only the configuration but any firmware updates you have pushed to the phone as well, resulting in a phone as fresh as when it left the factory from a software perspective. To perform this process do the following steps:

  1. Unplug the power cable and/or the switch cable if using PoE
  2. Plug the device back in, pressing and holding the “#” key before the Speaker button flashes on and off
  3. Continue to hold the # button until each line button flashes on and off in sequence (amber).
  4. Next release the # and in order hit 123456789*0#
  5. After the sequence is done correctly the line buttons will flash red and then the phone will reboot.
  6. The phone will go through multiple reboot processes as various firmware loads and configuration files are downloaded.
  7. Do not remove power in any way until the reset process is completed in its entirety. You will know that this is done when the phone either correctly registers to CUCM or displays the “Registering…” message on the screen.

That’s it. If you’ve made it this far without fixing your issue then you either need to get back into CUCM and check your configuration of the device or contact TAC for a replacement device.

Allowing Supervisors to Modify Skill Levels in UCCX 9

Since we installed Cisco’s Unified Contact Center Express (UCCX) call center system a couple of years ago I could set my watch by the requests from our group of supervisors to modify the skill level of our various agents for the various Contact Service Queues (CSQs). Generally at the same time they will request access to do this themselves. Imagine my excitement when UCCX 9 was released and one of the features was a mobile browser application called, creatively, Mobile Skill Manager to do just that. Further, you can imagine my chagrin when after upgrading we quickly realized that the app doesn’t work particularly well at this point, either in a mobile browser or in any of the major standard browsers.

So to Twitter I went trying to find a way to make this happen, and lo and behold I found the answer within the System Parameters of UCCX. Start by logging into the web interface and look in System > System Parameters. Then under the Application Parameters section you will find an option called “Supervisor Access.” By default this will be set to No Access to Teams, and if you want to provide access you will need to choose one of the other two options depending on your need: Access to All Teams or Access to Supervisor’s Teams only. We chose the former because we are a relatively small call center where all the Supervisors cross train.

So what does this do? Changing this setting allows Supervisors access to a subset of the menu when they log in with their own credentials at the /appadmin web link; specifically, it allows them access to the RmCm Subsystem, which controls the various settings related to CSQs, Resources (Agents), and Skills. You may want to provide this access with a little guidance, because with it they will be able to create and delete CSQs, Skills and Resources as well, and most likely you won’t want them to do this.

While I am happy to have this option, I believe we can do it better. In a perfect world this base functionality would be built into the Supervisor Desktop application or the new Finesse web interface, with a capability to turn access on and off. Further, I’ve heard tale of an IP service application being developed by CTI Logic to allow desktop phone access to perform this task. Both of those would be extremely nice to have, as fewer interfaces for the user to know is always a good thing.