Category Archives: linux

[Puppet] Hardening AWS Linux 2014.09 based on CIS benchmark

Update [2015-07-07]: Puppet module is practically done for hardening AWS Linux 2014.09, you can check it out here: https://github.com/proletaryo/cis-puppet

It’s been almost a year since I posted here. Work is very challenging nowadays…

The latest project that I’m part of deals with financial services. Yup, this means a lot of security exercises to comply with PCI-DSS (Payment Card Industry Data Security Standard). I find these exercises challenging, a new lens that lets you understand a lot of things and even makes you paranoid sometimes. IT security is core – I learned a lot in this area over the past few months.

Anyway, right now I’m working on OS hardening based on the benchmark provided by the Center for Internet Security. They provide guidelines on how to do this; just download the document for your OS here: https://benchmarks.cisecurity.org/downloads/multiform/index.cfm
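
To give a flavor of what’s inside: each scored item comes with an audit step and a remediation step. Here’s a minimal sketch of one common item, disabling IP forwarding (the exact item numbering varies per benchmark, so treat this as illustrative):

# audit: expect the value to be 0
sysctl net.ipv4.ip_forward

# remediate: persist the setting and apply it immediately
echo "net.ipv4.ip_forward = 0" >> /etc/sysctl.conf
sysctl -w net.ipv4.ip_forward=0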

I’m working mostly in AWS nowadays – it’s a good thing that CIS released a benchmark for the AWS Linux 2014.09 version.

We’re a Puppet shop so the first thing I did was to check if there are modules for AWS Linux. The closest one that I’ve found is for RHEL: https://github.com/arildjensen/cis-puppet

Close but not close enough… but definitely better than nothing 🙂

The beauty of OSS is you can always fork a project, and GitHub is a wonder-tool! So fork I went… I’m already done with CIS Scored guidelines 1.x.x to 3.x.x — a few more to go. Once done, I’m hoping I can merge this back to master if the original author allows 🙂

If you’re interested in this project, just drop me a message here: https://github.com/proletaryo/cis-puppet

rsyncd won’t bind problem, determine what pid uses a port

We had a problem with one of our servers: its rsyncd stopped responding. It was listening on the port but no longer accepting requests.

Here’s what the log says:

[root@SERVER ~]# tail /var/log/messages
Jun 21 13:19:46 SERVER xinetd[28270]: Swapping defaults
Jun 21 13:19:46 SERVER xinetd[28270]: readjusting service amanda
Jun 21 13:19:46 SERVER xinetd[28270]: bind failed (Address already in use (errno = 98)). service = rsync
Jun 21 13:19:46 SERVER xinetd[28270]: Service rsync failed to start and is deactivated.
Jun 21 13:19:46 SERVER xinetd[28270]: Reconfigured: new=0 old=1 dropped=0 (services)
Jun 21 13:21:34 SERVER xinetd[28270]: Exiting...
Jun 21 13:22:09 SERVER xinetd[32476]: bind failed (Address already in use (errno = 98)). service = rsync
Jun 21 13:22:09 SERVER xinetd[32476]: Service rsync failed to start and is deactivated.
Jun 21 13:22:09 SERVER xinetd[32476]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Jun 21 13:22:09 SERVER xinetd[32476]: Started working: 1 available service

We tried stopping xinetd, but there was still a process bound to port 873:

[root@SERVER ~]# service xinetd stop
Stopping xinetd: [ OK ]
[root@SERVER ~]# telnet localhost 873
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
^]
telnet> quit
Connection closed.

If only we could determine which process was still bound to port 873…

Well, there’s an app for that: lsof -i tcp:<port>

[root@SERVER ~]# lsof -i tcp:873
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
rpc.statd 1963 rpcuser 7u IPv4 4798 TCP *:rsync (LISTEN)
[root@SERVER ~]# kill 1963
[root@SERVER ~]# kill 1963
-bash: kill: (1963) - No such process
[root@SERVER ~]# telnet localhost 873
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host: Connection refused
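
As an aside, if lsof isn’t installed, netstat can answer the same question (the -p flag needs root to show process names):

[root@SERVER ~]# netstat -tlnp | grep :873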

Now that the stray process was dead (rpc.statd binds to an arbitrary port at startup, which is how it landed on rsync’s port in the first place), we restarted xinetd…

[root@SERVER ~]# service xinetd start
Starting xinetd: [ OK ]
[root@SERVER ~]# tail /var/log/messages
Jun 21 13:21:34 SERVER xinetd[28270]: Exiting...
Jun 21 13:22:09 SERVER xinetd[32476]: bind failed (Address already in use (errno = 98)). service = rsync
Jun 21 13:22:09 SERVER xinetd[32476]: Service rsync failed to start and is deactivated.
Jun 21 13:22:09 SERVER xinetd[32476]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Jun 21 13:22:09 SERVER xinetd[32476]: Started working: 1 available service
Jun 21 13:23:06 SERVER xinetd[32476]: Exiting...
Jun 21 13:25:18 SERVER rpc.statd[1963]: Caught signal 15, un-registering and exiting.
Jun 21 13:25:18 SERVER portmap[3556]: connect from 127.0.0.1 to unset(status): request from unprivileged port
Jun 21 13:25:31 SERVER xinetd[3912]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Jun 21 13:25:31 SERVER xinetd[3912]: Started working: 2 available services

… and that solves the problem. 🙂
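
To keep this from recurring, rpc.statd can be pinned to a fixed port so it can never again land on one that another service needs. On RHEL/CentOS this is a one-liner in /etc/sysconfig/nfs (662 below is just an example, pick any unused port):

# /etc/sysconfig/nfs
STATD_PORT=662

Then restart the lock service so statd picks up the new port:

[root@SERVER ~]# service nfslock restart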


How to setup large partitions (>2TB RAID arrays) in CentOS 6.2 with a Supermicro Blade SBI-7125W-S6

We’re in the process of retiring our non-blade servers to free up space and reduce power usage. This move affects our 1U backup servers, so we have to migrate them to blades as well.

I was setting-up a blade server as a replacement for one of our backup servers when I encountered a problem…

But before I get into that, here’s the specs of the blade:

  • Supermicro Blade SBI-7125W-S6 (circa 2008)
  • Intel Xeon E5405
  • 8 GB DDR2
  • LSI RAID 1078
  • 6 x 750 GB Seagate Momentus XT (ST750LX003)

The original plan was to set up these drives as a RAID 5 array, about 3.5+ TB in size. The RAID controller can handle the size, so Rich, my colleague who did the initial setup of the blade & the hard drives, did not encounter a problem.

I was cruising through the remote installation process until I hit a snag at the disk partitioning stage. The installer wouldn’t use the entire space of the RAID array; it would only create partitions up to a total of 2TB.

I found it unusual because I’ve created bigger arrays before using software RAID and this problem did not manifest. After a little googling, I found out that it’s a limitation of the Master Boot Record (MBR): MBR partition entries use 32-bit sector addresses, so with 512-byte sectors the largest addressable size is 2^32 × 512 bytes = 2TiB. The solution is to use the GUID Partition Table (GPT), as advised by this discussion.

I have two options at this point,

  1. go as originally planned, use GPT, and hope that the SBI-7125W-S6 can boot with it, or…
  2. create 2 arrays, one small (that will use MBR so the server can boot) and one large (that will use GPT so the disk space can be used in its entirety)

I tried option #1; it failed. The blade wouldn’t boot at all, primarily because the server has a BIOS, not (U)EFI.

And so I’m left with option #2…

The server has six drives. To implement option #2, my plan was to create this setup:

  • 2 drives in RAID 1 – will become /dev/sda, MBR, 750GB, main system drive (/)
  • 4 drives in RAID 5 – will become /dev/sdb, GPT, 2.x+TB, will be mounted later

The LSI RAID 1078 can support this kind of setup, so I’m in luck. I decided to use RAID 1 & RAID 5 because redundancy is the primary concern, size is secondary.

This is where IPMI shines: I can reconfigure the RAID array remotely using the KVM console of IPMIView as if I were physically there at the data center 🙂 With KVM access, I created 2 disk groups using the WebBIOS of the RAID controller.

Now that the arrays are up, I went through the CentOS 6 installation process again. The installer detected the 2 arrays, so no problem there. I configured /dev/sda with 3 partitions and left /dev/sdb unconfigured (it can easily be configured later once CentOS is up).

In case you’re wondering, I added a 3.8GB LVM PV since this server will become a node of our ganeti cluster, to store VM snapshots.

The CentOS installation booted successfully this time. Now that the system’s working, it’s time to configure /dev/sdb.

I installed the EPEL repo first, then parted:

$ wget -c http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-6.noarch.rpm 
$ wget -c https://fedoraproject.org/static/0608B895.txt 
$ rpm -Uvh epel-release-6-6.noarch.rpm
$ rpm --import 0608B895.txt 
$ yum install parted

Then I configured /dev/sdb to use GPT and formatted it as ext4:

$ parted /dev/sdb mklabel gpt 
$ parted /dev/sdb 
(parted) mkpart primary ext4 1 -1 
(parted) quit 
$ mkfs.ext4 -L data /dev/sdb
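
To sanity-check the label, parted’s print sub-command is handy:

$ parted /dev/sdb print   # shows the partition table type (gpt) and layout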

To mount /dev/sdb, I need to find out its UUID first (note that mkfs.ext4 ran against the whole device, so the filesystem lives directly on /dev/sdb, which is why the symlink below points to ../../sdb):

$ ls -lh /dev/disk/by-uuid/ | grep sdb 
lrwxrwxrwx. 1 root root 9 May 12 15:07 858844c3-6fd8-47e9-90a4-0d10c0914eb5 -> ../../sdb
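
blkid is another way to get the same UUID, along with the label and filesystem type:

$ blkid /dev/sdb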

Once I had the right UUID, I added this line to /etc/fstab; /dev/sdb will be mounted at /home/backup/log-dump/:

UUID=858844c3-6fd8-47e9-90a4-0d10c0914eb5 /home/backup/log-dump ext4 noatime,defaults 1 1

The partition is now ready to be mounted and used:

$ useradd backup
$ mkdir -p /home/backup/log-dump
$ mount /home/backup/log-dump
$ chown backup.backup -R /home/backup/log-dump
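
A quick df confirms that the mount is up and the full array size is available:

$ df -h /home/backup/log-dump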

There, another problem solved. Thanks to the internet and the Linux community 🙂

After a few days of copying files to this new array, /dev/sdb is almost used up already 🙂


Ubuntu 10.04 amd64 on Lenovo Thinkpad E125, making the LAN, Wifi, Video and Sound work

UPDATE: I gave the official release of Ubuntu 12.04 LTS another try and everything worked out of the box! Nice!!!

So I guess I’ll have to give Unity another chance… (so far, I find the HUD useful)

After almost 2 years since my laptop died on me, I decided to buy a replacement. I proposed the idea to my wife, June, and she approved (maybe because I’ve been using her laptop for the past 2 years 🙂 ).

I’m targeting a >= 12″ netbook since I’ve learned in the last 2 years that I don’t need that much processing power. My usage pattern can settle for an Atom or Brazos CPU since I use laptops mostly as a terminal; the grunt work is done on servers. Besides, I don’t want to haul a 2kg+ brick.

There’s a plethora of netbooks from different OEMs nowadays so there is a lot to choose from. I narrowed my list to these two: Lenovo Thinkpad Edge E125 or HP DM1. It was a tough choice to make. But after scouring a few stores (in Cyberzone MegaMall) and weighing my options, I settled with the E125.

I chose E125 because of these reasons:

  • Keyboard is better, IMHO
  • 2 DIMM slots (I’m planning to upgrade it to 8GB in the future)
  • no OS pre-installed

I tried installing Ubuntu 11.10 and Ubuntu 12.04 beta, but neither was stable enough for my needs when I tested them. For one, my SBW Huawei dongle experienced intermittent connections, and I’m not convinced to switch to Unity yet.

This is the rundown of how I found device drivers for the Lenovo Thinkpad Edge E125.

core packages:

sudo apt-get install build-essential linux-image-generic linux-headers-generic cdbs fakeroot dh-make debhelper debconf libstdc++6 dkms libqtgui4 wget execstack libelfg0 ia32-libs

lan:

Download the driver from the Qualcomm website (direct link).

mkdir -p ~/drivers/lan-atheros && cd ~/drivers/lan-atheros
mv ~/Downloads/alx-linux-v2.0.0.6.rar ./
unrar x alx-linux-v2.0.0.6.rar   # extract the archive first (needs the unrar package)
cd alx-linux-v2.0.0.6            # the extracted directory name may differ
sudo make && sudo make install
sudo modprobe alx

wifi:

sudo add-apt-repository ppa:lexical/hwe-wireless
sudo apt-get update
sudo apt-get install rtl8192ce-dkms
sudo modprobe r8192ce_pci

sound:

I encountered a problem with the sound configuration: sound doesn’t switch to the headphones when you plug them in, it just keeps playing through the laptop speakers. I was able to fix it by upgrading ALSA to version 1.0.25; just use this guide on how to do it (replace 1.0.23 with 1.0.25).

video:

I was able to install the latest ATI Catalyst drivers by following this guide; the manual installation route was the one that succeeded.

card reader:

Download the driver from the Realtek website. Make sure you switch to the superuser (not just sudo) when running make; it will fail if you don’t.

mkdir ~/drivers/cardreader-realtek/ && cd ~/drivers/cardreader-realtek/
mv ~/Downloads/rts_pstor.tar.bz2 ./
tar -xjvf rts_pstor.tar.bz2
cd rts_pstor
sudo su
make
make install
depmod
exit
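
To confirm the driver is in place, load it and check (the module name below is assumed from the tarball name):

sudo modprobe rts_pstor
lsmod | grep rts_pstor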

additional packages:

sudo apt-get install vim-gtk ubuntu-restricted-extras pidgin-otr pidgin-libnotify openssh-server subversion rapidsvn


How to configure a virtualized Munin server to monitor 100+ servers in CentOS/RHEL

We use Munin primarily to gather historical data. The data in turn is used for capacity planning (e.g., server upgrades). The graphs are also a good tool for spotting unusual server behavior (e.g., spikes in memory or CPU usage), and we use them as indicators or pointers to what caused a server crash.

Since we consolidated our servers and migrated them to virtualized ones, our Munin server was also affected. When we virtualized our Munin server, the first few days were a disaster: it simply couldn’t handle the load because the disk I/O required was too great!

To determine which parts we can tweak to improve performance, it’s important to first take a look at how Munin generates those lovely graphs. The Munin server process has four steps:

  1. munin-update -> updates the RRD files; if you have a lot of nodes, disk I/O gets hammered!
  2. munin-limits
  3. munin-graph -> generates graphs out of the RRD files; multiple CPU cores are a must!
  4. munin-html

We only need to tweak steps #1 and #3 to increase performance. But before I go into the details, here are the specs of our Munin server:
  • OS: CentOS 6.2 x86_64
  • CPU: 4 cores
  • RAM: 3.5GB
  • HDD: 10GB
  • Munin: version 1.4.6

Note: Add the EPEL repository to install Munin 1.4.6 using yum.

Yup, I need that much RAM to address #1. Since it’s way cheaper to buy more memory than an SSD or an array of 10k/15k RPM drives, I used tmpfs to solve the disk I/O problem. This makes all RRD updates happen in memory. It’s not a new idea; the approach has been used for years.

I added these lines in /etc/fstab:

# tmpfs for munin files
/var/lib/munin /var/lib/munin tmpfs size=1280M,nr_inodes=1m,mode=775,uid=munin,gid=munin,noatime 0 0
/var/www/munin /var/www/munin tmpfs size=768M,nr_inodes=1m,mode=775,uid=munin,gid=munin,noatime 0 0
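
One gotcha: mounting tmpfs over a directory hides whatever files are already in it, so if you’re converting a live Munin server, copy the existing RRDs aside first. A minimal sketch, assuming the fstab entries above and that Munin’s cron job is paused:

# preserve the on-disk RRDs, mount the tmpfs, then copy them back in
mv /var/lib/munin /var/lib/munin.disk && mkdir /var/lib/munin
mount /var/lib/munin
cp -a /var/lib/munin.disk/* /var/lib/munin/
chown -R munin.munin /var/lib/munin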

And this is what it looks like in production, mounted and in use:

[root@munin ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 9.6G 6.3G 3.0G 69% /
tmpfs 1.8G 0 1.8G 0% /dev/shm
/var/lib/munin 1.3G 937M 344M 74% /var/lib/munin
/var/www/munin 768M 510M 259M 67% /var/www/munin

Since all RRD files are now stored in RAM, they will simply disappear into oblivion if the server is rebooted for any reason. To compensate, I added these maintenance scripts to root’s cron:

[root@munin ~]# crontab -l
# create RRD files backup
*/15 * * * * mkdir -p $HOME/munin-files/munin-lib/ &&  rsync --archive /var/lib/munin/* $HOME/munin-files/munin-lib/ > /dev/null 2>&1

# restore RRD files at reboot
@reboot mkdir -p /var/www/munin/ /var/lib/munin/ && chown -R munin.munin /var/www/munin/ /var/lib/munin/ && cp -a -r $HOME/munin-files/munin-lib/* /var/lib/munin/

# cleanup: remove inactive rrd and png files
@daily find /var/lib/munin/ -type f -mtime +7 -name '*.rrd' | xargs rm -f
@daily find $HOME/munin-files/munin-lib/ -type f -mtime +7 -name '*.rrd' | xargs rm -f
@daily find /var/www/munin/ -type f -mtime +7 -name '*.png' | xargs rm -f

What these jobs do:

  1. create a backup of the RRD files every 15 minutes
  2. restore the RRD files from #1 in case the server is rebooted or crashes
  3. delete inactive RRD and PNG (graph) files to reduce tmpfs usage

To date, our Munin server is monitoring 131 servers, which equates to 18,000+ RRD files, and disk I/O is not an issue during munin-update, thanks to tmpfs.

[root@munin ~]# pcregrep '^\s*\[' /etc/munin/munin.conf | wc -l
131
[root@munin ~]# find /var/lib/munin/ -type f -name '*.rrd' | wc -l
18635

This is the typical CPU usage of our Munin server over a day; iowait is negligible.

As for #3, the munin-graph step: this simply requires raw CPU power, multiple cores, and some configuration tweaks. As reflected in the CPU graph above, I allotted 4 cores to our Munin server and about 75% of that is constantly in use. The KVM hypervisor of our Munin server has a Xeon E5504, not really the best there is, but it gets the job done.

Since I allotted 4 cores for the Munin server VM, I set max_graph_jobs to 4:

[root@munin ~]# grep max_graph_jobs /etc/munin/munin.conf
# max_graph_jobs.
max_graph_jobs 4

Note: munin-graph ran as a single process in older versions of Munin. I recommend you use version 1.4.6.

Test your configuration and see how it behaves. You have to calibrate this value depending on what your CPU is and how many cores it has (e.g., if you have a Xeon X56xx, 4 cores may be overkill).
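
One quick way to calibrate is to time a full Munin pass by hand as the munin user; munin-cron is the same entry point the packaged cron job uses (the path below is where the EPEL package puts it, adjust if yours differs):

[root@munin ~]# time su - munin -s /bin/bash -c /usr/bin/munin-cron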

This graph contains enough information to check which steps of the Munin server you need to tweak…

As reflected in the graph above, munin-graph took about 200 seconds at most to finish. If this value goes beyond 300 (Munin’s master process runs every 5 minutes), I may have to add a core and change max_graph_jobs to 5, or move the VM to a better hypervisor; otherwise the graphs will be 5+ minutes late or filled with gaps.

That’s it. This is how I manage our Munin server to monitor 100+ servers. Of course, this only applies to Munin 1.4.x; I read that Munin 2.0 will be a lot different. Hopefully Munin 2.0 can support hundreds of nodes out of the box, no tweaking needed… I guess we’ll see… 🙂

ganeti and KVM Virtualization

We’ve been using KVM virtualization for almost 2 years now and we’re happy with it. But as the number of hypervisors & VM instances increases, so does the complexity of server management, which can be frustrating at times.

I realized that we have to find a way to manage it somehow. I’ve been scouring the net for possible solutions. I’ve read about OpenStack & Eucalyptus, but the disparity between how they deploy VM instances and our current deployment is so big that migrating would be difficult.

I have 6 requirements for the target platform:

  1. cost
  2. centralized management
  3. learning curve / ease of deployment
  4. migration constraints (lesser, the better)
  5. performance / high availability
  6. community support

My boss forwarded me this blog about ganeti a few months ago. I was skeptical to try it at first because deployment was Debian-centric, and we’re using CentOS, so that could be a problem. But after reading the documentation + mailing lists, I realized that migrating to ganeti would be less painful than the other solutions (in theory), so I installed a test cluster and ran it for a few weeks.

Testing phase is over and ganeti is promising (drbd + live migration rocks!). Our current cluster has 5 nodes but that will surely change as we go into full production 🙂

applications and browser add-ons/extensions that I really find useful

Here’s my list:

1. vi / vim / gvim
This is my editor of choice. Knowing the keyboard shortcuts is a must. The learning curve is a bit steep but it’s all worth it.

These are my current vim settings, ~/.vimrc:

set nu
set nowrap
set tabstop=4
set shiftwidth=4
set expandtab
set wmw=0
set guioptions=mic
set autoindent
set statusline=%F%m%r%h%w\ [FORMAT=%{&ff}]\ [TYPE=%Y]\ [ASCII=\%03.3b]\ [HEX=\%02.2B]\ [POS=%04l,%04v][%p%%]\ [LEN=%L]
set laststatus=2
set guifont=Bitstream\ Vera\ Sans\ Mono\ 12
set backupdir=~/.vimbackup,/tmp
syntax enable
colorscheme desert

" open tabbed left/right
nmap <C-h> :tabprev<CR>
nmap <C-l> :tabnext<CR>

" move the current line up/down
nmap <C-j>  :m+<CR>==
nmap <C-k> :m-2<CR>==

" move the selected block up/down
vmap <C-j>  :m'>+<CR>gv=gv
vmap <C-k> :m'<-2<CR>gv=gv

" indent/unindent selected
vmap <Tab> >gv
vmap <S-Tab> <gv

2. gnome-terminal
A good terminal is a good friend if you’re a Linux admin/developer. Yeah, I know some server settings can be configured with GUI tools, especially if you’re using a RHEL-based distribution, but if most (if not all) of your Linux servers are headless, racked in a cramped space, and you have to access them remotely, your terminal is still your best friend.

I’m using these keyboard shortcuts patterned from vim 🙂

3. rapidsvn
This is my preferred GUI svn client. It’s simple and it works, and that’s all I need.

4. meld
This is a diff viewer for GNOME. It can browse SVN working copies as well.

5. tomboy notes
This one is a life saver. It’s built for saving small tidbits of information, and I don’t have to worry about pressing save (it doesn’t have a save button). I find the search and Ubuntu One synchronization features very useful. I seldom use the “note linking” feature though.

6. gnome-dictionary (with offline enabled)
Having a handy-dandy dictionary while you’re reading articles is always a good thing; you never know when you’ll encounter those pesky new words. By default, gnome-dictionary needs an internet connection, but you can install an offline dictionary as well. Here’s a good guide that I followed to make it work offline.

7. pidgin
This was the default IM client of Ubuntu before it was replaced by Empathy. One good reason why I can’t let go of Pidgin is its Off-the-Record plugin (pidgin-otr): basically, encryption support. The last time I checked, Empathy is not interested in OTR, so I’m not interested in switching in the foreseeable future either.

8. browsers & add-ons/extensions
I interchangeably use Firefox and Chrome. I was planning to switch to Chrome completely but it doesn’t support the Offline feature of Gmail.

These are the add-ons or extensions that I use…

Firefox:
downthemall!, this is a nice download manager
gmarks, this is how I keep my bookmarks in sync with Chrome; its keyboard shortcuts are a gem: ctrl-d to add, tap Home twice to search
flashblock, nice tool to block annoying, huge flash applications/ads

Google Chrome:
chrome bird, nice twitter client with built-in support for URL shorteners
google bookmarks, extension to access my Google bookmarks. But unlike its Firefox counterpart it doesn’t have keyboard shortcuts… 😦
flashblock, same as the firefox add-on

9. Ubuntu One
Sync files. That’s it. I sync my documents to Ubuntu One so I don’t have to bring my work laptop every time I travel. All I need to do is set up June’s netbook to access my Ubuntu One account and I’m ready to access my documents in case I need to.

By the way, this only works in Ubuntu, and the first 2GB is free 🙂

10. Wammu
This application is very helpful if you need to send/receive SMS in your Ubuntu machine. See my previous post for more details.

11. gnucash
This is accounting software. It’s a big help if you need to keep track of your finances (I think we all need to). The learning curve can be steep if you don’t have a background in simple accounting (debits/credits). GnuCash’s built-in help can assist if you find it difficult.

12. frogr
If you have a flickr account and you need to upload hundreds of photos, this is a good tool you can use in Linux. I’ve tried other tools but I’ve settled with this one because it can handle intermittent connections better. It’s not in the Ubuntu repository yet but you can download frogr here.

🙂