Troubleshooting Mail : Basic Tips

In a shared or managed hosting environment, one of the most commonly reported issues to our helpdesk is email.

It has become a best practice to perform basic email troubleshooting, depending on the error / bounce-back reported. These steps can be performed before or after the issue has been fixed — either to confirm it is working or to reproduce a particular problem.

The purpose of this document is to describe how mail gets from one user's email client to another user's mailbox. Each action that takes place can be tested and verified by an admin at some level, with a few exceptions. After going through this document you should understand what takes place when a user sends email or has email sent to them.

Test SMTP Authentication

SMTP uses BASE64 encoding to transmit usernames and passwords. You will need to take the username of the client ( user@domain.com ) and the password and convert them to BASE64. Links are provided below to do this:

* http://www.webpan.com/customers/Email/base64_conversion.htm
* http://makcoder.sourceforge.net/demo/base64.php
* http://www.opinionatedgeek.com/dotnet/tools/Base64Encode/
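
As a sketch, a manual AUTH LOGIN test over telnet looks like the following; mail.example.com and the credentials are placeholders, and the encoded strings are the BASE64 forms of user@domain.com and password (you can also generate them locally with the base64 utility):

    $ printf 'user@domain.com' | base64
    dXNlckBkb21haW4uY29t
    $ printf 'password' | base64
    cGFzc3dvcmQ=
    $ telnet mail.example.com 25
    220 mail.example.com ESMTP
    EHLO test.example.com
    250-mail.example.com
    250 AUTH LOGIN PLAIN
    AUTH LOGIN
    334 VXNlcm5hbWU6
    dXNlckBkb21haW4uY29t
    334 UGFzc3dvcmQ6
    cGFzc3dvcmQ=
    235 Authentication successful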

Test successful if: 235 Authentication successful

Testing if user is local or remote

Our mailserver will only accept email for a remote host if you authenticate first. Without authentication it will only accept email for local domains.

Local Domain if: 250 ok
Remote Domain if: 553 sorry, that domain isn’t in my list of allowed rcpthosts
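
A sketch of the unauthenticated test session (all hostnames and addresses are placeholders):

    $ telnet mail.example.com 25
    220 mail.example.com ESMTP
    HELO test.example.com
    250 mail.example.com
    MAIL FROM:<test@example.com>
    250 ok
    RCPT TO:<user@localdomain.com>
    250 ok
    RCPT TO:<user@remotedomain.com>
    553 sorry, that domain isn't in my list of allowed rcpthosts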

Looking up an MX record

Two or three lookups should be done against various DNS servers: ns1.stardothosting.com, a public recursive NS server (168.144.1.130), and the NS server specified in the WHOIS of the domain. Ideally these will all match; if they don't, connections may be made to the incorrect mailserver.

WINDOWS COMMAND PROMPT
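
Using nslookup interactively, a lookup against each of the servers above looks something like this (domain.com is a placeholder; repeat with each nameserver in turn):

    C:\> nslookup
    > server ns1.stardothosting.com
    > set type=MX
    > domain.com
    domain.com  MX preference = 10, mail exchanger = mail.domain.com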

Testing to see if remote server will accept a message

Similar to the test we used to see if our mail server will take the message as a local delivery, you can run the same test on the remote server to see if it will take the message.
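
A sketch of the session (hostnames and addresses are placeholders):

    $ telnet mail.remotedomain.com 25
    220 mail.remotedomain.com ESMTP
    HELO test.example.com
    250 mail.remotedomain.com
    MAIL FROM:<user@domain.com>
    250 ok
    RCPT TO:<user@remotedomain.com>
    250 ok
    QUIT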

Test successful if: 250 ok

Send an entire message to a mailserver
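
Continuing a session like the one above through DATA (the exact wording of the responses varies by mail server):

    DATA
    354 go ahead
    From: user@domain.com
    To: user@remotedomain.com
    Subject: test message

    This is a test.
    .
    250 ok 1293847561 qp 12345
    QUIT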

Test successful if: 250 ok

Check to see if a remote server is in an RBL

You will need to find out the IP address of the machine that is actually sending the email out to the internet. There are many tools out there to check your domain / IP address against many RBL lists:

* http://www.robtex.com/rbl/
* http://www.anti-abuse.org/multi-rbl-check/
* http://checker.msrbl.com/

RBLs Checked Against:

* relays.ordb.org
* sbl-xbl.spamhaus.org
* bl.spamcop.net
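
You can also query an RBL directly with dig by reversing the octets of the sending IP and appending the RBL zone; for example, to check 192.0.2.10 (a placeholder) against bl.spamcop.net:

    $ dig +short 10.2.0.192.bl.spamcop.net
    127.0.0.2

An answer in 127.0.0.0/8 means the IP is listed; no answer (NXDOMAIN) means it is not.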

SpamAssassin Troubleshooting

Via the webmail client (SquirrelMail), place the address that is trying to email the customer in the whitelist. When the email arrives, check the email headers, which will show any SpamAssassin checks that applied points to the message. Details on the SA tests and how to check email headers are available at the links below:

* Headers: http://www.spamcop.net/fom-serve/cache/19.html
* SA Tests: http://spamassassin.apache.org/tests_3_1_x.html
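
For illustration, a message that SpamAssassin has scored typically carries headers along these lines (the test names and scores here are made up):

    X-Spam-Flag: YES
    X-Spam-Level: *******
    X-Spam-Status: Yes, score=7.5 required=5.0 tests=BAYES_99,HTML_MESSAGE,
        URIBL_BLACK autolearn=no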

Testing for weak SSL ciphers for security audits

During security audits, such as a PCI-DSS compliance audit, it is commonplace to test the cipher mechanisms that a website / server uses and supports, to ensure that weak / outdated cipher methods are not used.

Weak ciphers increase the risk of encryption compromise, man-in-the-middle attacks, and other related attack vectors.

Due to historic export restrictions on high grade cryptography, legacy and new web servers are often able and configured to handle weak cryptographic options.

Even if high grade ciphers are normally used and installed, some server misconfiguration could be exploited to force the use of a weaker cipher to gain access to the supposedly secure communication channel.

Testing SSL / TLS cipher specifications and requirements for site

The http clear-text protocol is normally secured via an SSL or TLS tunnel, resulting in https traffic. In addition to providing encryption of data in transit, https allows the identification of servers (and, optionally, of clients) by means of digital certificates.

Historically, there have been limitations set in place by the U.S. government to allow cryptosystems to be exported only for key sizes of, at most, 40 bits, a key length which could be broken and would allow the decryption of communications. Since then, cryptographic export regulations have been relaxed (though some constraints still hold); however, it is important to check the SSL configuration being used to avoid putting in place cryptographic support which could be easily defeated. SSL-based services should not offer the possibility to choose weak ciphers.

Testing for weak ciphers : examples

In order to detect possible support of weak ciphers, the ports associated with SSL/TLS wrapped services must be identified. These typically include port 443, which is the standard https port; however, this may change because a) https services may be configured to run on non-standard ports, and b) there may be additional SSL/TLS wrapped services related to the web application. In general, a service discovery is required to identify such ports.

The nmap scanner, via the "-sV" scan option, is able to identify SSL services. Vulnerability scanners, in addition to performing service discovery, may include checks against weak ciphers (for example, the Nessus scanner has the capability of checking SSL services on arbitrary ports, and will report weak ciphers).

Example 1. SSL service recognition via nmap.
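
A sketch (hostname and output are illustrative):

    $ nmap -sV -p 443 www.example.com

    PORT    STATE SERVICE  VERSION
    443/tcp open  ssl/http Apache httpd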

Example 2. Identifying weak ciphers with Nessus. Nessus reports this class of finding in a form similar to the excerpt below.
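
(The following is reconstructed for illustration from the wording of the Nessus weak-cipher plugin, not taken from an actual scan.)

    Synopsis :
    The remote service supports the use of weak SSL ciphers.

    Description :
    The remote host supports the use of SSL ciphers that offer either weak
    encryption or no encryption at all.

    Solution :
    Reconfigure the affected application, if possible, to avoid the use of
    weak ciphers.

    Risk Factor :
    Medium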

Example 3. Manually audit weak SSL cipher levels with OpenSSL. The following will attempt to connect to Google.com with SSLv2.
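
(Note that modern OpenSSL builds are compiled without SSLv2 support, so the -ssl2 option may be unavailable; on older builds the test looks like this.)

    $ openssl s_client -connect google.com:443 -ssl2
    CONNECTED(00000003)

If the server rejects SSLv2, the handshake fails with an error, which is the desired result; a completed handshake means the weak protocol is still enabled.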

These tests provide an in-depth and reliable method of ensuring that weak and vulnerable ciphers are not in use, which is exactly what such audits require.

Personally, I prefer the Nessus audit scans. Usually the default "free" plugins are enough to complete these types of one-off audits. There are, however, commercial Nessus plugins designed just for PCI-DSS compliance audits, available for purchase from the Nessus site.

How to repair damaged MySQL tables

Once in a while something will happen to a server and the MySQL database will get corrupted.

A specific instance comes to mind on one of our Cacti monitoring servers.

The /var partition filled up due to too many messages being sent to the root user in /var/spool/. This caused MySQL to crash as well, since the Cacti poller couldn't write to the poller_output table in MySQL.

The result was all graphs being blank within cacti.

In any case, a thorough analysis of the MySQL database was in order, and I decided to post this quick tutorial for performing quick / lengthy table checks on offline and online MySQL databases.

Stage 1: Checking your tables

Run:
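
    myisamchk *.MYI        # quick check of all MyISAM tables in the database directory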

or
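
    myisamchk -e *.MYI     # extended, more thorough check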

if you have more time. Use the -s (silent) option to suppress unnecessary information.

If the mysqld server is stopped, you should use the --update-state option to tell myisamchk to mark the table as "checked."

You have to repair only those tables for which myisamchk announces an error. For such tables, proceed to Stage 2.

If you get unexpected errors when checking (such as out of memory errors), or if myisamchk crashes, go to Stage 3.

Stage 2: Easy safe repair

First, try:
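
    myisamchk -r -q tbl_name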


(-r -q means “quick recovery mode”).

This attempts to repair the index file without touching the data file. If the data file contains everything that it should and the delete links point at the correct locations within the data file, this should work, and the table is fixed. Start repairing the next table. Otherwise, use the following procedure:

1. Make a backup of the data file before continuing.

2. Use
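
    myisamchk -r tbl_name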

(-r means “recovery mode”)

This removes incorrect rows and deleted rows from the data file and reconstructs the index file.

3. If the preceding step fails, use
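
    myisamchk --safe-recover tbl_name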

Safe recovery mode uses an old recovery method that handles a few cases that regular recovery mode does not (but is slower).

Note: If you want a repair operation to go much faster, you should set the values of the sort_buffer_size and key_buffer_size variables each to about 25% of your available memory when running myisamchk.
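
For example (the exact variable syntax differs slightly between MySQL versions, so check myisamchk --help on your install):

    myisamchk --sort_buffer_size=256M --key_buffer_size=256M -r tbl_name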

If you get unexpected errors when repairing (such as out of memory errors), or if myisamchk crashes, go to Stage 3.

Stage 3: Difficult repair

You should reach this stage only if the first 16KB block in the index file is destroyed or contains incorrect information, or if the index file is missing. In this case, it is necessary to create a new index file. Do so as follows:

1. Move the data file to a safe place.

2. Use the table description file to create new (empty) data and index files:
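
    shell> mysql db_name
    mysql> SET autocommit=1;
    mysql> TRUNCATE TABLE tbl_name;
    mysql> quit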

3. Copy the old data file back onto the newly created data file. (Do not just move the old file back onto the new file. You want to retain a copy in case something goes wrong.)

Go back to Stage 2 and run:
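
    myisamchk -r -q tbl_name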

(This should not be an endless loop.)

You can also use the REPAIR TABLE tbl_name USE_FRM SQL statement, which performs the whole procedure automatically. There is also no possibility of unwanted interaction between a utility and the server, because the server does all the work when you use REPAIR TABLE. See Section 12.5.2.6, “REPAIR TABLE Syntax”.

Stage 4: Very difficult repair

You should reach this stage only if the .frm description file has also crashed. That should never happen, because the description file is not changed after the table is created:

1. Restore the description file from a backup and go back to Stage 3. You can also restore the index file and go back to Stage 2. In the latter case, you should start with
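
    myisamchk -r tbl_name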

2. If you do not have a backup but know exactly how the table was created, create a copy of the table in another database. Remove the new data file, and then move the .frm description and .MYI index files from the other database to your crashed database. This gives you new description and index files, but leaves the .MYD data file alone. Go back to Stage 2 and attempt to reconstruct the index file.

That's it!

How to setup a slave DNS Nameserver with Bind

When it comes to DNS redundancy, automation is always better. In a hosting environment, new zone files are constantly being created.

The need for a DNS master/slave implementation, where new zone files are automatically transferred from the master nameserver to the slave, grew as operations expanded and geographic DNS redundancy became a requirement.

Obviously some commercial DNS products provide this type of functionality out-of-the-box, but I will show you how to do it with a plain BIND DNS distribution.

I wrote this tutorial to help you create an automated DNS slave / zone file transfer environment. You can create as many slave servers as you feel necessary.

MASTER Server

1. Edit /etc/named.conf and add the following to the options section, where xx.xx.xx.xx is the IP of your slave server:
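
The exact directives depend on your layout, but it is likely something along these lines:

    options {
        ...
        notify yes;
        allow-transfer { xx.xx.xx.xx; };
        also-notify { xx.xx.xx.xx; };
    };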

2. Create a script with the following, where somedirectory is the directory on your SLAVE server that stores the slave zones, yy.yy.yy.yy is your MASTER server IP, somewwwdir is a directory browsable via http, and someslavefile.conf is the output file for your slave config:
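
A sketch of such a script; it assumes zone statements in /etc/named.conf appear one per line, and every name in it (somedirectory, somewwwdir, someslavefile.conf, yy.yy.yy.yy) is a placeholder to adjust:

    #!/bin/bash
    # Generate a slave zone config from the master's named.conf and publish
    # it in an http-visible directory for the slave to fetch.
    MASTER_IP="yy.yy.yy.yy"
    SLAVE_DIR="somedirectory"
    OUT="somewwwdir/someslavefile.conf"

    > "$OUT"
    # pull the zone names out of named.conf (filter out localhost/root zones as needed)
    grep '^zone' /etc/named.conf | awk -F'"' '{ print $2 }' | while read zone; do
        printf 'zone "%s" {\n\ttype slave;\n\tfile "%s/%s.db";\n\tmasters { %s; };\n};\n\n' \
            "$zone" "$SLAVE_DIR" "$zone" "$MASTER_IP" >> "$OUT"
    done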

3. Test the script to ensure it is writing out the appropriate format.

4. Run the script via cron as any user with permission to write to an http-visible directory.

SLAVE SERVER

1. Transfer the rndc.key file from your master server to the slave:
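
    scp /etc/rndc.key root@xx.xx.xx.xx:/etc/ns1rndc.key   # paths vary by distribution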

2. Edit ns1rndc.key and change the name of the key definition.

3. Edit named.conf and add the following to the options section:
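
Most likely something like the following, so the slave accepts NOTIFY messages from the master (yy.yy.yy.yy):

    allow-notify { yy.yy.yy.yy; };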

4. Append the following to the named.conf file:
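
Plausibly the key you copied over, plus the slave zone config that the master's script generates (paths are placeholders):

    include "/etc/ns1rndc.key";
    include "/etc/namedb/slave/someslavefile.conf";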

5. Run the following commands:
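
Something along these lines, creating the slave zone directory and reloading (the named user and paths vary by distribution):

    mkdir -p /etc/namedb/slave
    chown -R named:named /etc/namedb/slave
    rndc reload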

6. Create a script:
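
A sketch; it fetches the master-generated config over http and reloads named (URL and paths are placeholders):

    #!/bin/bash
    # Pull the slave zone config published by the master, then reload named.
    wget -q -O /etc/namedb/slave/someslavefile.conf http://yy.yy.yy.yy/someslavefile.conf
    rndc reload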

7. Add it to root's crontab:
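
    # e.g. sync every 30 minutes (the script name/location is a placeholder)
    */30 * * * * /usr/local/bin/slave-zone-sync.sh > /dev/null 2>&1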

In the slave script above, you can see that the transfer is done via wget. This could be replaced by other, more secure methods: if ssh-based key authentication is employed, a simple scp or even rsync can be used to accomplish the actual zone transfer.

Quick tips using FIND , SSH, TAR , PS and GREP Commands

Administering hundreds of systems can be tedious. Sometimes scripting repetitive tasks, or replicating tasks across many servers, is necessary.

Over time, I’ve jotted down several quick useful notes regarding using various linux/unix commands. I’ve found them very useful when navigating and performing various tasks. I decided to share them with you, so hopefully you will find them a useful reference at the very least!

To find files within a time range and add up the total size of all those files:
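
One way to do it with GNU find (the path and time window are placeholders):

    # files modified between 1 and 7 days ago, with their sizes summed
    find /some/path -type f -mtime +1 -mtime -7 -printf '%s\n' | \
        awk '{ total += $1 } END { print total " bytes" }'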

To watch a command’s progress :
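
For example, re-running a command every couple of seconds to watch output grow:

    watch -n 2 'du -sh /path/to/growing/file'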

Transfer a file / folders, compress it midstream over the network, and uncompress it on the receiving end:
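
    tar czf - /path/to/folder | ssh user@remotehost "tar xzf - -C /destination"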

The below will return the PID of any XYZ process that is older than 10 hours.
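
One way, assuming a procps ps that supports etimes (elapsed time in seconds):

    ps -eo pid,etimes,comm | awk '$3 == "XYZ" && $2 > 36000 { print $1 }'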

Check web logs on the www server for access from a specific IP address:
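
    # hostname, IP, and log path are placeholders
    ssh root@www.example.com "grep '192.0.2.10' /var/log/httpd/access_log"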

Those are just a few of the useful commands that can be applied to many different functions. I particularly like sending files across the network and compressing them midstream 🙂

The above kind of administration is made even easier when you employ ssh key based authentication — your commands can be scripted to execute across many servers in one sitting (just be careful) 😉

MySQL Replication : Setting up a Simple Master / Slave

It is often necessary, when designing high-availability environments, to implement a database replication scenario with MySQL.

This simple how-to is intended to set up a basic master / slave relationship.

PREPARATION OF MASTER SERVER

1. Select a master server. It can be either one.

2. Make sure all databases that you want to replicate to the slave already exist! The easiest way is to just copy the database dirs inside your MySQL data directory intact over to your slave, and then recursively chown them to "mysql:mysql". Remember, the binary structures are file-system dependent, so you can't do this between MySQL servers on different OSes; in that case you will most likely want to use mysqldump.

3. Create /etc/my.cnf if you do not already have one:
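
A minimal sketch for the master (the server-id value is arbitrary but must be unique across the pair):

    [mysqld]
    server-id = 1
    log-bin   = mysql-bin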

4. Permit your slave server to replicate by issuing the following SQL command (substituting your slave’s IP and preferred password):
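
('repl' and 'slavepass' below are placeholders.)

    mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'xx.xx.xx.xx' IDENTIFIED BY 'slavepass';
    mysql> FLUSH PRIVILEGES;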

5. Shut down and restart the MySQL daemon and verify that all is functional.

PREPARATION OF SLAVE

1. Create /etc/my.cnf if you do not already have one:
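
A minimal sketch for the slave. Embedding the master connection details in my.cnf only works on older MySQL versions; on newer ones, use the CHANGE MASTER TO statement instead:

    [mysqld]
    server-id       = 2
    master-host     = yy.yy.yy.yy
    master-user     = repl
    master-password = slavepass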

2. Shut down and restart MySQL on the slave.

3. Issue the following SQL command to check status:
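
    mysql> SHOW SLAVE STATUS\G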

Ensure that the following two fields show this:
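
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes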

If not, try issuing the following command:
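
    mysql> START SLAVE;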

This will manually start the slave process. Note that only tables and entries updated after the slave process has started will be sent from the master to the slave — it is not a differential replication.

TESTING

Just update some data on the master, and query that record on the slave. The update should be nearly instantaneous.

Test creating a table on the master MySQL server database:
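
(repl_test is just a throwaway table name.)

    mysql> CREATE TABLE repl_test (id INT);
    mysql> INSERT INTO repl_test VALUES (1);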

And check the database on the slave to ensure that the recently created table on the master was replicated properly.

Detect ARP poisoning on LAN

ARP Poisoning : Potential MITM attack

Occasionally during security audits it may be necessary to check your LAN for rogue machines. All a potential rogue machine in your LAN needs to do is poison your ARP cache so that the cache thinks the attacker's machine is the router or the destination machine. All packets to that machine will then go through the rogue machine, and it will be, from the network's standpoint, between the client and the server, even though technically it is just sitting next to them. This is fairly simple to do, and as a result it is also fairly easy to detect.

In this sample case, the rogue machine was in a different room but still on the same subnet. Through simple ARP poisoning it convinced the router that it was our server, and convinced the server that it was the router. It then had an enjoyable time functioning as both a password sniffer and a router for unsupported protocols.

By simply pinging all the local machines (nmap -sP 192.168.1.0/24 will do this quickly) and then checking the ARP table (arp -an) for duplicates, you can detect ARP poisoning quite quickly.

Then I simply looked at the IP addresses used by that ethernet address in 'arp -an' output, ignoring those that were blatantly poisoned (such as the router), and looked up the remaining address in DNS to see which machine it was.

Below is a script to automate this process (perhaps in a cron job) and send out an alert email if any ARP poisoning is detected.

ARP Poisoning Check Script
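
A minimal sketch of such a check, assuming a Linux host with nmap, arp, and mail available; the subnet and alert address are placeholders:

    #!/bin/bash
    # Ping-sweep the LAN to populate the ARP cache, then flag any MAC address
    # that claims more than one IP and mail an alert if duplicates are found.
    SUBNET="192.168.1.0/24"
    ALERT_EMAIL="root@localhost"

    nmap -sP "$SUBNET" > /dev/null 2>&1

    DUPES=$(arp -an | awk '{ print $4 }' | grep -v '<incomplete>' | sort | uniq -d)

    if [ -n "$DUPES" ]; then
        {
            echo "Possible ARP poisoning detected. Duplicate MAC addresses:"
            for mac in $DUPES; do
                arp -an | grep "$mac"
            done
        } | mail -s "ARP poisoning alert on $SUBNET" "$ALERT_EMAIL"
    fi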

This can ideally run as a cron job (e.g. 30 * * * *).

Simple!