Category Archives: linux

too smart for its own good, can’t format a USB drive, device is busy error

One of those simple things you assume will just work, until you hit a snag at the last minute.

Last week, we were in the process of exporting a few virtual machine images and database data for a proof-of-concept/benchmarking activity with a partner. The data was ready; all that was left was to copy it over to a USB HDD.

The disk was already partitioned using parted; the next step was to format it.

$ mkfs.ext4 /dev/sdb1
mke2fs 1.41.12 (17-May-2010)
/dev/sdb1 is mounted; will not make a filesystem here!

We were dumbfounded as to why this failed. We hadn’t mounted the filesystem!

After a few minutes of googling around and figuring out what /dev/dm-X devices are, we stumbled upon this article.

These simple steps were all we needed:
$ multipath -ll
$ multipath -f XXXXXXXXXXXXX
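
The culprit, for the record: multipathd had claimed the disk and created a device-mapper node (one of those /dev/dm-X devices) on top of it, and mkfs treats that as "in use". A quick way to see which dm nodes sit on which disks, no root needed (a sketch; on a box without multipath or LVM it simply prints nothing):

```shell
# List device-mapper nodes and the underlying disks ("slaves") they sit on.
for dm in /sys/block/dm-*; do
  [ -e "$dm" ] || continue            # glob did not match: no dm devices at all
  printf '%s: name=%s slaves=%s\n' \
    "${dm##*/}" \
    "$(cat "$dm/dm/name")" \
    "$(ls "$dm/slaves" | tr '\n' ' ')"
done
```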

Thanks for nothing multipath!

Can’t locate Perl module File/ in Opsview

I’ve been seeing these errors in our Opsview installation:

Can't locate File/ in @INC (@INC contains: /usr/local/nagios/bin/../lib /usr/local/nagios/bin/../etc /usr/local/nagios/bin/../perl/lib /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/local/nagios/bin/../lib/Opsview/ line 23.

These errors were generated by one particular crontab job:


I was able to install File::Slurp module through yum…

yum install -y perl-File-Slurp.noarch

Then, it’s asking for another module… “IPC::Run”

yum install -y perl-IPC-Run.noarch

Then, it’s asking for another module… “IPC::Run::SafeHandles”. But since there are no RPM packages for this one, I was forced to do it the Perl way… CPAN.

yum install -y perl-CPAN.x86_64

And then installed cpanm right after (as recommended here)

cpan App::cpanminus

Retrying the installation of “IPC::Run::SafeHandles”

cpanm IPC::Run::SafeHandles

Then, I got stuck… The module IPC::Run::SafeHandles won’t install because it requires “List::MoreUtils::XS”, which I could not install because of this unhelpful error message:
[Screenshot: the unhelpful error message]

I did a few Google searches about the module and stumbled upon this interesting discussion.

Hmmmm…. PUREPERL… maybe I need to invoke this via cpanm. Browsing through cpanm’s manual, it has a --pp option for pure Perl.

cpanm --pp IPC::Run::SafeHandles

It installed without a hitch! 🙂

But as for the Opsview script, it still required 2 more modules:

yum install perl-Proc-Simple
yum install perl-Log-Log4perl

Now I have to find out what caused these Perl modules to go missing in the first place…

[Puppet] Hardening AWS Linux 2014.09 based on CIS benchmark

Update [2015-07-07]: Puppet module is practically done for hardening AWS Linux 2014.09, you can check it out here:

It’s been almost a year since I posted here. Work is very challenging nowadays…

The latest project that I’m part of is now dealing with financial services. Yup, this means a lot of security exercises that need to be done to comply with PCI-DSS (Payment Card Industry Data Security Standard). I find these exercises challenging: a new lens that lets you understand a lot of things and even makes you paranoid sometimes. IT security is core, and I’ve learned a lot in this area over the past few months.

Anyway, right now I’m working with OS hardening based on the benchmark provided by Center for Internet Security. They provide guidelines on how to do this. Just download the document for your OS here:

I’m working mostly in AWS nowadays – It’s a good thing that CIS released a benchmark for AWS Linux 2014.09 version.

We’re a Puppet shop, so the first thing I did was to check if there were modules for AWS Linux. The closest one that I’ve found is for RHEL:

Close but not close enough… but definitely better than nothing 🙂

The beauty of OSS is you can always fork a project and Github is a wonder-tool! So fork I went… I’m already done with CIS Scored guidelines 1.x.x to 3.x.x — a few more to go. Once done, I’m hoping that I can merge this back to master if the original author will allow 🙂

If you’re interested in this project, just drop me a message here:

rsyncd won’t bind problem, determine what pid uses a port

We had a problem with one of our servers. Its rsyncd was not responding anymore: it was listening on the port but not accepting requests.

Here’s what the log says:

[root@SERVER ~]# tail /var/log/messages
Jun 21 13:19:46 SERVER xinetd[28270]: Swapping defaults
Jun 21 13:19:46 SERVER xinetd[28270]: readjusting service amanda
Jun 21 13:19:46 SERVER xinetd[28270]: bind failed (Address already in use (errno = 98)). service = rsync
Jun 21 13:19:46 SERVER xinetd[28270]: Service rsync failed to start and is deactivated.
Jun 21 13:19:46 SERVER xinetd[28270]: Reconfigured: new=0 old=1 dropped=0 (services)
Jun 21 13:21:34 SERVER xinetd[28270]: Exiting...
Jun 21 13:22:09 SERVER xinetd[32476]: bind failed (Address already in use (errno = 98)). service = rsync
Jun 21 13:22:09 SERVER xinetd[32476]: Service rsync failed to start and is deactivated.
Jun 21 13:22:09 SERVER xinetd[32476]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Jun 21 13:22:09 SERVER xinetd[32476]: Started working: 1 available service

We tried stopping xinetd, but there was still a process bound to port 873:

[root@SERVER ~]# service xinetd stop
Stopping xinetd: [ OK ]
[root@SERVER ~]# telnet localhost 873
Connected to localhost.localdomain (
Escape character is '^]'.
telnet> quit
Connection closed.

If only we could determine what process was still bound to port 873…

Well, there’s an app for that: lsof -i tcp:<port>

[root@SERVER ~]# lsof -i tcp:873
rpc.statd 1963 rpcuser 7u IPv4 4798 TCP *:rsync (LISTEN)
[root@SERVER ~]# kill 1963
[root@SERVER ~]# kill 1963
-bash: kill: (1963) - No such process
[root@SERVER ~]# telnet localhost 873
telnet: connect to address Connection refused
telnet: Unable to connect to remote host: Connection refused
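
lsof did the job here, but a couple of other tools can also map a listening port to a PID; shown for port 873 (which of ss, netstat, and fuser are available depends on the distro):

```shell
ss -ltnp 'sport = :873'        # iproute2's replacement for netstat
netstat -tlnp | grep ':873 '   # classic net-tools (absent on some newer distros)
fuser -v -n tcp 873            # psmisc; prints the owning PID directly
```

Run these as root, otherwise the PID column may be blank for sockets owned by other users.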

Now that the process is dead, we restarted xinetd…

[root@SERVER ~]# service xinetd start
Starting xinetd: [ OK ]
[root@SERVER ~]# tail /var/log/messages
Jun 21 13:21:34 SERVER xinetd[28270]: Exiting...
Jun 21 13:22:09 SERVER xinetd[32476]: bind failed (Address already in use (errno = 98)). service = rsync
Jun 21 13:22:09 SERVER xinetd[32476]: Service rsync failed to start and is deactivated.
Jun 21 13:22:09 SERVER xinetd[32476]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Jun 21 13:22:09 SERVER xinetd[32476]: Started working: 1 available service
Jun 21 13:23:06 SERVER xinetd[32476]: Exiting...
Jun 21 13:25:18 SERVER rpc.statd[1963]: Caught signal 15, un-registering and exiting.
Jun 21 13:25:18 SERVER portmap[3556]: connect from to unset(status): request from unprivileged port
Jun 21 13:25:31 SERVER xinetd[3912]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Jun 21 13:25:31 SERVER xinetd[3912]: Started working: 2 available services

… and that solves the problem. 🙂


How to setup large partitions (>2TB RAID arrays) in CentOS 6.2 with a Supermicro Blade SBI-7125W-S6

We’re in the process of retiring our non-blade servers to free up space and reduce power usage. This move affects our 1U backup servers, so we have to migrate them to blades as well.

I was setting-up a blade server as a replacement for one of our backup servers when I encountered a problem…

But before I get into that, here are the specs of the blade:

  • Supermicro Blade SBI-7125W-S6 (circa 2008)
  • Intel Xeon E5405
  • 8 GB DDR2
  • LSI RAID 1078
  • 6 x 750 GB Seagate Momentus XT (ST750LX003)

The original plan was to set up these drives as a RAID 5 array, about 3.5+ TB in size. The RAID controller can handle that size, so Rich, my colleague who did the initial setup of the blade and the hard drives, did not encounter a problem.

I was cruising through the remote installation process until I hit a snag at the disk partitioning stage. The installer would not use the entire space of the RAID array; it would only create partitions up to a total size of 2 TB.

I found it unusual because I’ve created bigger arrays before using software RAID and this problem did not manifest. After a little googling, I found out that it has something to do with the limitations of the Master Boot Record (MBR). The solution is to use a GUID Partition Table (GPT), as advised by this discussion.
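
The 2 TB ceiling falls out of simple arithmetic: MBR records partition start and size as 32-bit sector counts, and with the usual 512-byte sectors that tops out at:

```shell
# 2^32 addressable sectors * 512 bytes per sector, expressed in TiB:
echo "$(( (2 ** 32) * 512 / 1024 ** 4 )) TiB"   # prints "2 TiB"
```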

I have two options at this point,

  1. go as originally planned, use GPT, and hope that the SBI-7125W-S6 can boot with it, or…
  2. create 2 arrays, one small (that will use MBR so the server can boot) and one large (that will use GPT so that the disk space can be used in its entirety)

I tried option #1; it failed. The blade won’t boot at all, primarily because the server has a BIOS, not EFI firmware.

And so I’m left with option #2…

The server has six drives. To implement option #2, my plan was to create this setup:

  • 2 drives at RAID 1 – will become /dev/sda, MBR, 750GB, main system drive (/)
  • 4 drives at RAID 5 – will become /dev/sdb, GPT, 2.x+TB, will be mounted later

The LSI RAID 1078 can support this kind of setup, so I’m in luck. I decided to use RAID 1 & RAID 5 because redundancy is the primary concern, size is secondary.

This is where IPMI shines, I can reconfigure the RAID array remotely using the KVM console of IPMIView like I’m physically there at the data center 🙂 With the KVM access, I created 2 disk groups using the Web BIOS of the RAID controller.

Now that the arrays are up, I went through the CentOS 6 installation process again. The installer detected the 2 arrays, so no problem there. I configured /dev/sda with 3 partitions and left /dev/sdb unconfigured (it can easily be configured later once CentOS is up).

In case you’re wondering, I added a 3.8GB LVM PV since this server will become a node of our ganeti cluster, to store VM snapshots.

The CentOS installation booted successfully this time. Now that the system’s working, it’s time to configure /dev/sdb.

I installed the EPEL repo first, then parted:

$ wget -c 
$ wget -c 
$ rpm -Uvh epel-release-6-5.noarch.rpm 
$ rpm --import 0608B895.txt 
$ yum install parted

Then, I configured /dev/sdb to use GPT, then formatted the whole partition as ext4:

$ parted /dev/sdb mklabel gpt 
$ parted /dev/sdb 
(parted) mkpart primary ext4 1 -1 
(parted) quit 
$ mkfs.ext4 -L data /dev/sdb

To mount /dev/sdb, I needed to find out its UUID first:

$ ls -lh /dev/disk/by-uuid/ | grep sdb 
lrwxrwxrwx. 1 root root 9 May 12 15:07 858844c3-6fd8-47e9-90a4-0d10c0914eb5 -> ../../sdb
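
An alternative, assuming util-linux’s blkid is available, is to print the UUID directly:

```shell
# Print just the filesystem UUID of /dev/sdb (run as root):
blkid -s UUID -o value /dev/sdb
```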

Once I had the right UUID, I added this line to /etc/fstab so that /dev/sdb will be mounted at /home/backup/log-dump:

UUID=858844c3-6fd8-47e9-90a4-0d10c0914eb5 /home/backup/log-dump ext4 noatime,defaults 1 1

The partition is now ready to be mounted and used:

$ useradd backup
$ mkdir -p /home/backup/log-dump
$ mount /home/backup/log-dump
$ chown backup.backup -R /home/backup/log-dump

There, another problem solved. Thanks to the internet and the Linux community 🙂

After a few days of copying files to this new array, this is what it looks like now:

/dev/sdb is almost used up already 🙂


Ubuntu 10.04 amd64 on Lenovo Thinkpad E125, making the LAN, Wifi, Video and Sound work

UPDATE: I gave the official release of Ubuntu 12.04 LTS another try and everything worked out of the box! Nice!!!

So I guess I’ll have to give Unity another chance… (so far, I find the HUD useful)

After almost 2 years since my laptop died on me, I decided to buy a replacement. I proposed the idea to my wife, June, and she approved (maybe because I’ve been using her laptop for the past 2 years 🙂 ).

I’m targeting a >= 12″ netbook since I’ve learned in the last 2 years that I don’t need that much processing power. My usage pattern can settle for an Atom or Brazos CPU since I use laptops mostly as a terminal; the grunt work is done on servers. Besides, I don’t want to haul a 2kg+ brick.

There’s a plethora of netbooks from different OEMs nowadays so there is a lot to choose from. I narrowed my list to these two: Lenovo Thinkpad Edge E125 or HP DM1. It was a tough choice to make. But after scouring a few stores (in Cyberzone MegaMall) and weighing my options, I settled with the E125.

I chose E125 because of these reasons:

  • Keyboard is better, IMHO
  • 2 DIMM slots (I’m planning to upgrade it to 8GB in the future)
  • no OS pre-installed

I tried installing Ubuntu 11.10 and Ubuntu 12.04 beta, but neither was stable enough for my needs when I tested them. For one, my SBW Huawei dongle was experiencing intermittent connections, and I’m not convinced to switch to Unity yet.

This is the rundown of how I found device drivers for the Lenovo Thinkpad Edge E125.

core packages:

sudo apt-get install build-essential linux-image-generic linux-headers-generic cdbs fakeroot dh-make debhelper debconf libstdc++6 dkms libqtgui4 wget execstack libelfg0 ia32-libs


lan:

Download the driver from the Qualcomm website (direct link).

mkdir -p ~/drivers/lan-atheros && cd ~/drivers/lan-atheros
mv ~/Downloads/alx-linux-v2.0.0.6.rar ./
unrar x alx-linux-v2.0.0.6.rar && cd alx-linux*/   # extract first; the directory name may vary
sudo make && sudo make install
sudo modprobe alx


wifi:

sudo add-apt-repository ppa:lexical/hwe-wireless
sudo apt-get update
sudo apt-get install rtl8192ce-dkms
sudo modprobe r8192ce_pci


sound:

I encountered a problem with the sound configuration: sound does not switch to the headphones when you plug them in; it just keeps playing through the laptop speakers. I was able to fix it by upgrading ALSA to version 1.0.25; just use this guide on how to do it (replacing 1.0.23 with 1.0.25).


video:

I was able to install the latest ATI Catalyst drivers by following this guide. The installation was successful when I installed the driver manually.

card reader:

Download the driver from the Realtek website. Make sure you switch to the superuser (via su, not sudo) when running make; it will fail if you don’t.

mkdir ~/drivers/cardreader-realtek/ && cd ~/drivers/cardreader-realtek/
mv ~/Downloads/rts_pstor.tar.bz2 ./
tar -xjvf rts_pstor.tar.bz2
cd rts_pstor
sudo su
make install

additional packages:

sudo apt-get install vim-gtk ubuntu-restricted-extras pidgin-otr pidgin-libnotify openssh-server subversion rapidsvn


How to configure a virtualized Munin server to monitor 100+ servers in CentOS/RHEL

We use Munin primarily to gather historical data. The data in turn is used for capacity planning (e.g. server upgrades). The graphs are also a good tool for spotting unusual server behavior (e.g. spikes in memory, CPU usage, etc.). We also use them as indicators or pointers to what caused a server crash.

Since we consolidated our servers and migrated them to virtualized ones, our Munin server was also affected. When we virtualized our Munin server, the first few days were a disaster. It simply couldn’t handle the load because the disk I/O required was too great!

To determine what we can tweak to improve performance, it’s important to first take a look at how Munin generates those lovely graphs. The Munin server process has four steps:

  1. munin-update -> updates the RRD files, if you have a lot of nodes, the disk I/O will be hammered!
  2. munin-limits
  3. munin-graph -> generates graphs out of the RRD files, multiple CPU cores is a must!
  4. munin-html

We only need to tweak steps #1 and #3 to increase performance. But before I go into the details, here are the specs of our Munin server:

  • OS: CentOS 6.2 x86_64
  • CPU: 4 cores
  • RAM: 3.5GB
  • HDD: 10GB
  • Munin: version 1.4.6

Note: Add the EPEL repository to install Munin 1.4.6 using yum.

Yup, I need that much RAM to address #1. Since it’s way cheaper to buy more memory than an SSD or an array of 10k/15k RPM drives, I used tmpfs to solve the disk I/O problem. This makes all RRD updates happen in memory. It’s not a new idea; this approach has been used for years.

I added these lines in /etc/fstab:

# tmpfs for munin files
/var/lib/munin /var/lib/munin tmpfs size=1280M,nr_inodes=1m,mode=775,uid=munin,gid=munin,noatime 0 0
/var/www/munin /var/www/munin tmpfs size=768M,nr_inodes=1m,mode=775,uid=munin,gid=munin,noatime 0 0

And this is how it looks in production once mounted and in use:

[root@munin ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 9.6G 6.3G 3.0G 69% /
tmpfs 1.8G 0 1.8G 0% /dev/shm
/var/lib/munin 1.3G 937M 344M 74% /var/lib/munin
/var/www/munin 768M 510M 259M 67% /var/www/munin

Since all RRD files are now stored in RAM, they will simply disappear into oblivion if the server is rebooted for any reason. To compensate, I added these maintenance scripts to root’s cron:

[root@munin ~]# crontab -l
# create RRD files backup
*/15 * * * * mkdir -p $HOME/munin-files/munin-lib/ &&  rsync --archive /var/lib/munin/* $HOME/munin-files/munin-lib/ > /dev/null 2>&1

# restore RRD files at reboot
@reboot mkdir -p /var/www/munin/ /var/lib/munin/ && chown -R munin.munin /var/www/munin/ /var/lib/munin/ && cp -a -r $HOME/munin-files/munin-lib/* /var/lib/munin/

# cleanup: remove inactive rrd and png files
@daily find /var/lib/munin/ -type f -mtime +7 -name '*.rrd' | xargs rm -f
@daily find $HOME/munin-files/munin-lib/ -type f -mtime +7 -name '*.rrd' | xargs rm -f
@daily find /var/www/munin/ -type f -mtime +7 -name '*.png' | xargs rm -f

What these do:

  1. creates a backup of the RRD files every 15 minutes
  2. restores the RRD files from #1 in case the server was rebooted/crashed
  3. deletes inactive RRD and PNG (graphs) files to reduce tmpfs usage
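
The daily cleanup rule can be demonstrated on a toy directory (the paths here are throwaway examples):

```shell
# Files untouched for more than 7 days count as inactive and get deleted.
mkdir -p /tmp/rrd-demo
touch /tmp/rrd-demo/active.rrd
touch -d '10 days ago' /tmp/rrd-demo/stale.rrd
find /tmp/rrd-demo -type f -mtime +7 -name '*.rrd' | xargs -r rm -f
ls /tmp/rrd-demo   # only active.rrd is left
```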

As of this writing, our Munin server is monitoring 131 servers, which equates to 18,000+ RRD files, and disk I/O is not an issue during munin-update, thanks to tmpfs.

[root@munin ~]# pcregrep '^\s*\[' /etc/munin/munin.conf | wc -l
[root@munin ~]# find /var/lib/munin/ -type f -name '*.rrd' | wc -l
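
pcregrep is not installed everywhere; plain grep gives the same node count. A demonstration on a toy munin.conf-style file:

```shell
# Node definitions in munin.conf are lines that start with '['.
cat > /tmp/munin-demo.conf <<'EOF'
[server1.example.com]
    address 10.0.0.1
[server2.example.com]
    address 10.0.0.2
EOF
grep -c '^[[:space:]]*\[' /tmp/munin-demo.conf   # prints 2
```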

This is the typical CPU usage of our Munin server for a day; iowait is negligible.

As for #3, the munin-graph step: this simply requires brute CPU power, multiple cores, and some configuration tweaks. As reflected in the CPU graph above, I allotted 4 cores to our Munin server and about 75% of that is constantly in use. The KVM hypervisor of our Munin server has a Xeon E5504; not the best there is, but it gets the job done.

Since I allotted 4 cores for the Munin server VM, I set max_graph_jobs to 4:

[root@munin ~]# grep max_graph_jobs /etc/munin/munin.conf
# max_graph_jobs.
max_graph_jobs 4

Note: munin-graph was one process only in older versions of Munin. I recommend you use the 1.4.6 version.

Test your configuration and see how it behaves. You have to calibrate this value depending on what your CPU is and how many cores it has (e.g. if you have a Xeon X56xx, 4 cores may be overkill).

This graph contains enough information to check what steps of the munin server you need to tweak…

As reflected in the graph above, munin-graph took about 200 seconds at most to finish. If this value goes beyond 300 (Munin’s master process runs every 5 minutes), I may have to add a core and change max_graph_jobs to 5, or move the VM to a better hypervisor; otherwise the graphs will be 5+ minutes late or filled with gaps.

That’s it. This is how I manage our Munin server monitoring 100+ servers. Of course, this only applies to Munin 1.4.x; I read that Munin 2.0 will be a lot different. Hopefully Munin 2.0 can support hundreds of nodes out of the box, no tweaking needed… I guess we’ll see… 🙂