Troubleshooting Mail : Basic Tips

In a shared or managed hosting environment, many people would agree that one of the issues most commonly reported to our helpdesk is email.

It has become a best practice to perform basic email troubleshooting, depending on the error or bounce-back reported. These steps can be performed before or after the issue has been fixed, either to reproduce a particular problem or to confirm that mail is working.

The purpose of this document is to describe how mail gets from one user’s email client to another user’s mailbox. With a few exceptions, each action that takes place can be tested and verified by some level of admin. After going through this document you should understand what takes place when a user sends email or has email sent to them.

Test SMTP Authentication

SMTP uses BASE64 encoding to transmit usernames and passwords. You will need to take the username and the password of the client and convert them to BASE64:


telnet 25
helo mailserver
auth login
BASE64 Encoding of USERNAME
BASE64 Encoding of PASSWORD

Test Successful if: 235 Authentication successful
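
The BASE64 strings for the auth login exchange can be produced on the command line. The following is a minimal sketch, assuming the coreutils base64 utility is installed; the function names and the sample strings are illustrative only.

```shell
# Encode a string for the SMTP AUTH LOGIN exchange.
# printf is used instead of echo so that no trailing newline gets encoded.
b64_encode() {
    printf '%s' "$1" | base64
}

# Decode a BASE64 string, such as the prompts the server sends after "auth login".
b64_decode() {
    printf '%s' "$1" | base64 -d
}

# Example (sample strings, not real credentials):
#   b64_encode 'username'        # dXNlcm5hbWU=
#   b64_decode 'VXNlcm5hbWU6'    # Username:
```

Paste the encoded username and password into the telnet session, one per line, in response to the two 334 prompts.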

Testing if user is local or remote

Our mailserver will only accept email for a remote host if you authenticate first. Without authentication it will only accept email for local domains.

telnet 25
helo mailserver
mail from:
rcpt to:

Local Domain if: 250 ok
Remote Domain if: 553 sorry, that domain isn’t in my list of allowed rcpthosts

Looking up an MX record

Two or three lookups should be done against different DNS servers, for example a public recursive name server and the name server specified in the WHOIS record of the domain. Ideally these will all match; if they don’t, connections may be made to the incorrect mail server.


nslookup
set type=mx
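
The comparison between name servers can also be scripted. This is a rough sketch, assuming the dig utility is available; the function names, the domain and the name servers in the example are placeholders, not values from this document.

```shell
# Ask one name server for the MX records of a domain and return them
# sorted, so that answers from different servers can be compared.
lookup_mx() {
    # $1 = domain, $2 = name server to query
    dig +short MX "$1" @"$2" | sort
}

# Succeed (exit status 0) only if two MX listings are identical.
mx_match() {
    [ "$1" = "$2" ]
}

# Example (placeholder domain and name servers):
#   a=$(lookup_mx example.com ns1.example.com)
#   b=$(lookup_mx example.com 192.0.2.53)
#   mx_match "$a" "$b" || echo "MX records differ between name servers"
```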

Testing to see if remote server will accept a message

The test we use to see if our mail server will take a message as a local delivery can also be run against the remote server to see if it will accept the message.

telnet 25
helo mailserver
mail from:
rcpt to:

Test Successful if: 250 ok

Send an entire message to a mailserver

telnet 25
helo mailserver
auth login
BASE64 Encoding of USERNAME
BASE64 Encoding of PASSWORD
mail from:
rcpt to:
data
To:     StarDot Sales
Subject: This is a test email

Dear SDH,
This is the body of your email.
.

Test Successful if: 250 ok

Check to see if a remote server is in an RBL

You will need to find the IP address of the machine that actually sends the email out to the internet. Many tools are available to check your domain / IP address against multiple RBLs.
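
An RBL can also be queried directly over DNS: the octets of the IP address are reversed and the RBL zone is appended. The sketch below only builds the query name; the function names are illustrative and rbl.example.org is a placeholder zone, not a real list.

```shell
# Reverse the octets of an IPv4 address: 192.0.2.10 -> 10.2.0.192
reverse_ipv4() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 }'
}

# Build the DNS name to look up for an address in a given RBL zone.
rbl_query_name() {
    # $1 = IPv4 address, $2 = RBL zone
    echo "$(reverse_ipv4 "$1").$2"
}

# Example (placeholder zone):
#   host "$(rbl_query_name 192.0.2.10 rbl.example.org)"
# An answer in 127.0.0.0/8 usually means the address is listed;
# NXDOMAIN means it is not.
```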




SpamAssassin Troubleshooting

Via the webmail (SquirrelMail), place the address that is trying to email the customer in the whitelist. When the email arrives, check the email headers, which will show any SpamAssassin checks that applied points to the message. Details on the SA tests and how to check email headers are available at the links below:

* Headers:
* SA Tests:

Testing for weak SSL ciphers for security audits

During security audits, such as a PCI-DSS compliance audit, it is common to test the ciphers that a website or server uses and supports to ensure that weak or outdated ciphers are not used.

Weak ciphers allow for an increased risk in encryption compromise, man-in-the-middle attacks and other related attack vectors.

Due to historic export restrictions of high grade cryptography, legacy and new web servers are often able and configured to handle weak cryptographic options.

Even if high grade ciphers are normally used and installed, some server misconfiguration could be used to force the use of a weaker cipher to gain access to the supposedly secure communication channel.

Testing SSL / TLS cipher specifications and requirements for site

The http clear-text protocol is normally secured via an SSL or TLS tunnel, resulting in https traffic. In addition to providing encryption of data in transit, https allows the identification of servers (and, optionally, of clients) by means of digital certificates.

Historically, there have been limitations set in place by the U.S. government to allow cryptosystems to be exported only for key sizes of, at most, 40 bits, a key length which could be broken and would allow the decryption of communications. Since then, cryptographic export regulations have been relaxed (though some constraints still hold); however, it is important to check the SSL configuration being used to avoid putting in place cryptographic support which could be easily defeated. SSL-based services should not offer the possibility to choose weak ciphers.

Testing for weak ciphers : examples

In order to detect possible support of weak ciphers, the ports associated to SSL/TLS wrapped services must be identified. These typically include port 443, which is the standard https port; however, this may change because a) https services may be configured to run on non-standard ports, and b) there may be additional SSL/TLS wrapped services related to the web application. In general, a service discovery is required to identify such ports.

The nmap scanner, via the “-sV” scan option, is able to identify SSL services. Vulnerability scanners, in addition to performing service discovery, may include checks against weak ciphers (for example, the Nessus scanner has the capability of checking SSL services on arbitrary ports, and will report weak ciphers).

Example 1. SSL service recognition via nmap.

[root@test]# nmap -F -sV localhost

Starting nmap 3.75 ( ) at 2005-07-27 14:41 CEST
Interesting ports on localhost.localdomain (
(The 1205 ports scanned but not shown below are in state: closed)

443/tcp   open  ssl             OpenSSL
901/tcp   open  http            Samba SWAT administration server
8080/tcp  open  http            Apache httpd 2.0.54 ((Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g PHP/4.3.11)
8081/tcp  open  http            Apache Tomcat/Coyote JSP engine 1.0

Nmap run completed -- 1 IP address (1 host up) scanned in 27.881 seconds

Example 2. Identifying weak ciphers with Nessus. The following is an anonymized excerpt of a report generated by the Nessus scanner, corresponding to the identification of a server certificate allowing weak ciphers:

 https (443/tcp)
 Here is the SSLv2 server certificate:
 Version: 3 (0x2)
 Serial Number: 1 (0x1)
 Signature Algorithm: md5WithRSAEncryption
 Issuer: C=**, ST=******, L=******, O=******, OU=******, CN=******
 Not Before: Oct 17 07:12:16 2002 GMT
 Not After : Oct 16 07:12:16 2004 GMT
 Subject: C=**, ST=******, L=******, O=******, CN=******
 Subject Public Key Info:
 Public Key Algorithm: rsaEncryption
 RSA Public Key: (1024 bit)
 Modulus (1024 bit):
 Exponent: 65537 (0x10001)
 X509v3 extensions:
 X509v3 Basic Constraints:
 Netscape Comment:
 OpenSSL Generated Certificate
 X509v3 Subject Key Identifier:
 X509v3 Authority Key Identifier:
 Signature Algorithm: md5WithRSAEncryption
 Here is the list of available SSLv2 ciphers:

The SSLv2 server offers 5 strong ciphers, but also 0 medium strength and 2 weak "export class" ciphers.
The weak/medium ciphers may be chosen by an export-grade or badly configured client software. They only offer a limited protection against a brute force attack

Solution: disable those ciphers and upgrade your client software if necessary.
This SSLv2 server also accepts SSLv3 connections. This SSLv2 server also accepts TLSv1 connections.
 Vulnerable hosts
 (list of vulnerable hosts follows)

Example 3. Manually audit weak SSL cipher levels with OpenSSL. The following attempts to connect to the target server using SSLv2 only.

[root@test]# openssl s_client -no_tls1 -no_ssl3 -connect
depth=0 /C=US/ST=California/L=Mountain View/O=Google Inc/
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 /C=US/ST=California/L=Mountain View/O=Google Inc/
verify error:num=27:certificate not trusted
verify return:1
depth=0 /C=US/ST=California/L=Mountain View/O=Google Inc/
verify error:num=21:unable to verify the first certificate
verify return:1
Server certificate
subject=/C=US/ST=California/L=Mountain View/O=Google Inc/
issuer=/C=ZA/ST=Western Cape/L=Cape Town/O=Thawte Consulting cc/OU=Certification Services Division/CN=Thawte Premium Server CA/
No client certificate CA names sent
Ciphers common between both SSL endpoints:
RC4-MD5         EXP-RC4-MD5     RC2-CBC-MD5
SSL handshake has read 1023 bytes and written 333 bytes
New, SSLv2, Cipher is DES-CBC3-MD5
Server public key is 1024 bit
Compression: NONE
Expansion: NONE
    Protocol  : SSLv2
    Cipher    : DES-CBC3-MD5
    Session-ID: 709F48E4D567C70A2E49886E4C697CDE
    Master-Key: 649E68F8CF936E69642286AC40A80F433602E3C36FD288C3
    Key-Arg   : E8CB6FEB9ECF3033
    Start Time: 1156977226
    Timeout   : 300 (sec)
    Verify return code: 21 (unable to verify the first certificate)

These tests usually provide a very in-depth and reliable method for ensuring weak and vulnerable ciphers are not used in order to comply with said audits.
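
The manual OpenSSL check can be repeated per cipher. The following is a sketch under several assumptions: OpenSSL 1.1.1 or later (for the -no_tls1_3 option), a build that no longer speaks SSLv2 at all, and a pattern list that reflects only one possible view of what an audit treats as weak. The function names are illustrative.

```shell
# Succeed if an OpenSSL cipher name looks weak.
# Assumed definition of weak: export-grade (EXP), NULL or anonymous
# suites, RC2/RC4, DES/3DES and MD5-based suites.
is_weak_cipher() {
    case "$1" in
        EXP-*|*NULL*|ADH-*|AECDH-*|*RC2*|*RC4*|*DES*|*MD5*) return 0 ;;
        *) return 1 ;;
    esac
}

# Offer each weak cipher known to the local OpenSSL build to host:port,
# one at a time, and print the ones the server accepts.
list_weak_ciphers() {
    # $1 = host:port
    for c in $(openssl ciphers 'ALL:eNULL' | tr ':' ' '); do
        is_weak_cipher "$c" || continue
        # TLS 1.3 is disabled so that -cipher alone decides the outcome
        if echo | openssl s_client -no_tls1_3 -cipher "$c" -connect "$1" >/dev/null 2>&1; then
            echo "$c"
        fi
    done
}
```

An empty result only shows that the local OpenSSL build could not negotiate a weak cipher; a dedicated scanner may still find ciphers or protocols that the local build does not support.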

Personally, I prefer the Nessus audit scans. Usually the default “free” plugins are enough to complete these types of one-off audits. There are, however, commercial Nessus plugins designed specifically for PCI-DSS compliance audits, which are available for purchase from the Nessus site.

How to repair damaged MySQL tables

Once in a while something will happen to a server and the MySQL database will get corrupted.

A specific instance comes to mind on one of our Cacti monitoring servers.

The /var partition filled up due to too many messages being sent to the root user in /var/spool/. This caused MySQL to crash as well, since the Cacti poller couldn’t write to the poller_output table in MySQL.

The result was all graphs being blank within Cacti.

In any case, a thorough analysis of the MySQL database was in order, and I decided to post this quick tutorial for performing quick or lengthy table checks on offline and online MySQL databases.

Stage 1: Checking your tables


Run

myisamchk *.MYI

or

myisamchk -e *.MYI

if you have more time. Use the -s (silent) option to suppress unnecessary information.

If the mysqld server is stopped, you should use the --update-state option to tell myisamchk to mark the table as “checked.”

You have to repair only those tables for which myisamchk announces an error. For such tables, proceed to Stage 2.

If you get unexpected errors when checking (such as out of memory errors), or if myisamchk crashes, go to Stage 3.
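
Stage 1 can be wrapped in a small loop that lists only the tables that need a repair. This is a sketch, assuming myisamchk is on the PATH, that it exits with a non-zero status when a table has errors, and that mysqld is stopped or not using the tables; the function name and the directory in the example are hypothetical.

```shell
# Check every MyISAM index file in a database directory and print the
# names of the tables that myisamchk reports as damaged.
find_damaged_tables() {
    # $1 = database directory
    for index in "$1"/*.MYI; do
        [ -e "$index" ] || continue      # directory has no MyISAM tables
        # -s (silent) limits the output to errors only
        if ! myisamchk -s "$index" >/dev/null 2>&1; then
            basename "$index" .MYI
        fi
    done
}

# Example (hypothetical data directory):
#   find_damaged_tables /var/lib/mysql/db_name
```

Each name printed is a candidate for Stage 2.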

Stage 2: Easy safe repair

First, try :

myisamchk -r -q tbl_name 

(-r -q means “quick recovery mode”).

This attempts to repair the index file without touching the data file. If the data file contains everything that it should and the delete links point at the correct locations within the data file, this should work, and the table is fixed. Start repairing the next table. Otherwise, use the following procedure:

1. Make a backup of the data file before continuing.

2. Use

myisamchk -r tbl_name 

(-r means “recovery mode”)

This removes incorrect rows and deleted rows from the data file and reconstructs the index file.

3. If the preceding step fails, use

myisamchk --safe-recover tbl_name

Safe recovery mode uses an old recovery method that handles a few cases that regular recovery mode does not (but is slower).

Note: If you want a repair operation to go much faster, you should set the values of the sort_buffer_size and key_buffer_size variables each to about 25% of your available memory when running myisamchk.

If you get unexpected errors when repairing (such as out of memory errors), or if myisamchk crashes, go to Stage 3.

Stage 3: Difficult repair

You should reach this stage only if the first 16KB block in the index file is destroyed or contains incorrect information, or if the index file is missing. In this case, it is necessary to create a new index file. Do so as follows:

1. Move the data file to a safe place.

2. Use the table description file to create new (empty) data and index files:

   shell> mysql db_name
     mysql> SET AUTOCOMMIT=1;
     mysql> TRUNCATE TABLE tbl_name;
     mysql> quit

3. Copy the old data file back onto the newly created data file. (Do not just move the old file back onto the new file. You want to retain a copy in case something goes wrong.)

Go back to Stage 2. myisamchk -r -q should work.

(This should not be an endless loop.)

You can also use the REPAIR TABLE tbl_name USE_FRM SQL statement, which performs the whole procedure automatically. There is also no possibility of unwanted interaction between a utility and the server, because the server does all the work when you use REPAIR TABLE. See “REPAIR TABLE Syntax” in the MySQL reference manual.

Stage 4: Very difficult repair

You should reach this stage only if the .frm description file has also crashed. That should never happen, because the description file is not changed after the table is created:

1. Restore the description file from a backup and go back to Stage 3. You can also restore the index file and go back to Stage 2. In the latter case, you should start with

myisamchk -r

2. If you do not have a backup but know exactly how the table was created, create a copy of the table in another database. Remove the new data file, and then move the .frm description and .MYI index files from the other database to your crashed database. This gives you new description and index files, but leaves the .MYD data file alone. Go back to Stage 2 and attempt to reconstruct the index file.

That’s it!

How to set up a slave DNS Nameserver with BIND

When implementing redundancy as far as DNS is concerned, automated is always better. In a hosting environment, new zone files are constantly being created.

The need for a DNS master/slave implementation, where new zone files are transferred between the master nameserver and the slave, became apparent as operations grew and geographic DNS redundancy became a requirement.

Obviously some commercial DNS products provide this type of functionality out-of-the-box, but I will show you how to do this with a simple BIND DNS distribution.

I wrote this tutorial to help you, hopefully, to create an automated DNS slave / zone file transfer environment. Obviously you can create as many slave servers as you feel necessary.


On the master server:

1. Edit /etc/named.conf and add the following to the options section, where xx.xx.xx.xx is the IP of your slave server:

allow-transfer { xx.xx.xx.xx; };

2. Create a script with the following, where somedirectory is the directory on your SLAVE server that stores the slave zones, yy.yy.yy.yy is your MASTER server IP, somewwwdir is a directory browsable via HTTP, and someslavefile.conf is the output file to which your slave config is written:

# Build one slave zone definition for every master zone in named.conf
for domain in `/bin/grep ^zone /etc/named.conf | /bin/grep "type master" | /bin/awk '{print $2}' | /bin/awk -F\" '{print $2}'`
do
    /usr/bin/printf "zone \"${domain}\" { type slave; file \"/var/named/slaves/somedirectory/${domain}.db\"; masters { yy.yy.yy.yy; }; };\n"
done > /var/www/html/somewwwdir/someslavefile.conf

3. Test the script to ensure it is writing out the appropriate format.

4. Run the script via cron as any user with permission to write to the HTTP-visible directory:

0 4 * * * /path/to/script > /dev/null 2>&1


On the slave server:

1. Transfer the rndc.key file from your master server to the slave:

scp MASTERSERVER:/etc/rndc.key /etc/ns1rndc.key

2. Edit ns1rndc.key and change the name of the key definition.

3. Edit named.conf and add the following to the options section:

allow-transfer { zz.zz.zz.zz; };

4. Append the following to the named.conf file:

include "/etc/ns1rndc.key";
include "/path/to/someslavefile.conf";

5. Run the following commands

touch /path/to/someslavefile.conf
mkdir /var/named/slaves/somedirectory/
chown -R named:named /var/named/slaves/somedirectory/
/etc/init.d/named restart

6. Create a script:

/usr/bin/wget http://yy.yy.yy.yy/somewwwdir/someslavefile.conf  -O /var/named/slaves/someslavefile.conf
/etc/init.d/named restart

7. Add to root’s crontab

0 4 * * * /path/to/script

In the second slave script, you see that the transfer is done via wget. This can be replaced by many other more secure methods. If ssh based key authentication is employed, a simple scp or even rsync can be utilized to accomplish the actual zone transfer.

Quick tips using FIND, SSH, TAR, PS and GREP Commands

Administering hundreds of systems can be tedious. Sometimes scripting repetitive tasks, or replicating tasks across many servers is necessary.

Over time, I’ve jotted down several quick useful notes regarding using various linux/unix commands. I’ve found them very useful when navigating and performing various tasks. I decided to share them with you, so hopefully you will find them a useful reference at the very least!

To find files within a time range and add up the total size of all those files :

find /opt/uploads -mtime -365 -printf "%s\n" | awk '{sum+=$0} END {print sum}'

To watch a command’s progress :

watch -n1 'du -h -c --max-depth=1'

Transfer files or folders, compressing them midstream over the network and uncompressing them on the receiving end:

ssh -l root '(cd /opt/uploads/ && tar -czf - . -C /opt/uploads)' | tar -xzf -

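The following returns the PID of any XYZ process that has used more than 10 hours of CPU time. Note that column 7 of ps -ef is the cumulative CPU time (TIME), not the time elapsed since the process started.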

ps -ef | grep XYZ | awk '{ print $7 ":" $2 }' | awk 'BEGIN { FS =":" }; {if ($1 > 10) print $4}'
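
A variant that selects processes by how long they have been running can be sketched as follows. It assumes a procps-ng ps that supports the etimes (elapsed seconds) column; the function name is illustrative.

```shell
# Read "PID ELAPSED_SECONDS COMMAND" lines on stdin and print the PID of
# every process named $1 that has been running for more than $2 hours.
older_than_hours() {
    awk -v name="$1" -v hours="$2" '$3 == name && $2 > hours * 3600 { print $1 }'
}

# Example (XYZ is the process name used above):
#   ps -eo pid=,etimes=,comm= | older_than_hours XYZ 10
```

Matching on the command name column also avoids picking up the grep process itself.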

Check web logs on www server for specific ip address access:

grep "ip_address" [a-h]*/logs/*access*4		<-- check a-h websites
grep "ip_address" [A-Z,0-9]*/logs/*access*4	<-- check A-Z, 0-9 websites

Those are just a few of the useful commands that can be applied to many different functions. I particularly like sending files across the network and compressing them mid stream :)

The above kind of administration is made even easier when you employ ssh key based authentication -- your commands can be scripted to execute across many servers in one sitting (just be careful) ;)

MySQL Replication : Setting up a Simple Master / Slave

When designing high availability environments, it is often necessary to implement database replication with MySQL.

This simple how-to is intended to set up a simple master / slave relationship.


On the master:

1. Select a master server. It can be either one.

2. Make sure all databases that you want to replicate to the slave already exist! The easiest way is to copy the database directories inside your MySQL data directory intact over to your slave, and then recursively chown them to “mysql:mysql”. Remember, the binary structures are file-system dependent, so you can’t do this between MySQL servers on different operating systems. In that case you will most likely want to use mysqldump.

3. Create /etc/my.cnf if you do not already have one:

[mysqld]
socket=/tmp/mysql.sock   # enter YOUR path to mysql.sock here
server-id=1              # must be unique; the slave uses a different value
log-bin=mysql-bin        # binary logging must be enabled on the master
binlog-do-db=bossdb      # input the database which should be replicated
binlog-ignore-db=mysql1  # input the database that should be ignored for replication
binlog-ignore-db=mysql2  # input the database that should be ignored for replication

4. Permit your slave server to replicate by issuing the following SQL command (substituting your slave’s IP and preferred password):

mysql> GRANT REPLICATION SLAVE ON *.* TO 'replicate'@'' IDENTIFIED BY 'somepass';

5. Shut down and restart MySQL daemon and verify that all is functional.


On the slave:

1. Create /etc/my.cnf if you do not already have one:

[mysqld]
socket=/tmp/mysql.sock   # enter YOUR path to mysql.sock here
server-id=2              # MUST be different from the master

2. Shut down and restart MySQL on slave.

3. Issue the following SQL command to check status:

mysql> show slave status\G

Ensure that the following two fields are showing this :

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

If not, try to issue the following command :

mysql> start slave;

This will manually start the slave process. Note that only tables and entries updated after the slave process has started will be sent from the master to the slave; it is not a differential replication.
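
The status check lends itself to monitoring. The sketch below reads the slave status output and succeeds only when both threads report Yes; the function name is illustrative, and the usage example assumes that the mysql client can log in without prompting (for instance through ~/.my.cnf).

```shell
# Read the output of SHOW SLAVE STATUS\G on stdin and succeed only if
# both replication threads report "Yes".
slave_is_running() {
    awk -F': *' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example (assumes client credentials are already configured):
#   mysql -e 'SHOW SLAVE STATUS\G' | slave_is_running || echo "replication is not running"
```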


To test replication, update some data on the master and query that record on the slave. The update should appear almost immediately.

Test creating a table on the master MySQL server database :

mysql> use replicateddb;
Database changed

mysql> CREATE TABLE example4( id INT NOT NULL AUTO_INCREMENT,  PRIMARY KEY(id),  name VARCHAR(30),   age INT);
Query OK, 0 rows affected (0.04 sec)

And check the database on the slave to ensure that the recently created table on the master was replicated properly.

Detect ARP poisoning on LAN

ARP Poisoning : Potential MITM attack

Occasionally during security audits it may be necessary to check your LAN for rogue machines. All the potential rogue machine in your LAN needs to do is poison your ARP cache so that the cache thinks that the attacker is the router or the destination machine. Then all packets to that machine will go through the rogue machine, and it will be, from the network’s standpoint, between the client and the server, even though technically it’s just sitting next to them. This is actually fairly simple to do, and is also fairly easy to detect as a result.

In this sample case, the rogue machine was in a different room but still on the same subnet. Through simple ARP poisoning it convinced the router that it was our server, and convinced the server that it was the router. It then had an enjoyable time functioning as both a password sniffer and a router for unsupported protocols.

By simply pinging all the local machines (nmap -sP will do this quickly) and then checking the ARP table (arp -an) for duplicates, you can detect ARP poisoning quite quickly.

$ arp -an| awk '{print $4}'| sort | uniq -c | grep -v ' 1 '
    5 F8:F0:11:15:34:51

Then I simply looked at the IP addresses used by that ethernet address in ‘arp -an’ output, ignoring those that were blatantly poisoned (such as the router) and looked up the remaining address in DNS to see which machine it was.
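
The lookup of IP addresses per duplicated ethernet address can be scripted as well. This is a sketch that assumes the Linux arp -an output format, with the IP address in parentheses in the second field and the MAC address in the fourth; the function name is illustrative.

```shell
# Read `arp -an` output on stdin and, for every MAC address claimed by
# more than one IP address, print the MAC followed by those IPs.
duplicate_macs() {
    awk '
        $4 ~ /:/ {                      # skip <incomplete> entries
            gsub(/[()]/, "", $2)        # "(192.0.2.1)" -> "192.0.2.1"
            count[$4]++
            ips[$4] = ips[$4] " " $2
        }
        END {
            for (mac in count)
                if (count[mac] > 1)
                    print mac ":" ips[mac]
        }
    '
}

# Example:
#   arp -an | duplicate_macs
```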

Below is a script I wrote to automate this process (perhaps in a cron job) and send out an alert email if any ARP poisoning is detected.

ARP Poisoning Check Script

This can ideally run as a cron job (e.g. 30 * * * *)

#!/bin/sh
# Star Dot Hosting
# detect arp poisoning on LAN

logpath=/var/log      # placeholder: set to the directory for the audit log
alertemail=root       # placeholder: set to the recipient of the alert email
currentmonth=`date "+%Y-%m-%d %H:%M:%S"`

rm -f $logpath/arpwatch.log

echo "ARP Poisoning Audit: " $currentmonth >> $logpath/arpwatch.log
echo "-----------------------------------------" >> $logpath/arpwatch.log
echo >> $logpath/arpwatch.log

# any ethernet address listed more than once in the ARP table is suspicious
arp -an | awk '{print $4}' | sort | uniq -c | grep -v ' 1 ' > /dev/null

if [ "$?" -eq 0 ]
then
        arp -an | awk '{print $4}' | sort | uniq -c | grep -v ' 1 ' >> $logpath/arpwatch.log 2>&1
        cat $logpath/arpwatch.log | mail -s 'Potential ARP Poisoning ALERT!' $alertemail
else
        echo "No potential ARP poisoning instances found..." >> $logpath/arpwatch.log
fi