Architecture
Mail Servers
The MailArchiva server archives emails from external mail systems such as Microsoft Exchange, Postfix, Sendmail and others. It can either accept SMTP or Sendmail milter traffic from these systems, or fetch mail from them using IMAP or POP. The MailArchiva server can run on any machine on your network, provided it has TCP/IP connectivity to your mail server. For optimal performance, and to minimize changes to the server hosting your mail system, it is recommended that MailArchiva run on a dedicated server.
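Before installing, it can help to confirm that the machine chosen for MailArchiva can reach the mail server at all. The Python sketch below only attempts a TCP connection and reports whether it succeeded. It is not a MailArchiva tool, and the function name `can_connect` is hypothetical. This check covers the IMAP/POP direction, where MailArchiva connects to the mail server. For SMTP or milter, the mail server connects to MailArchiva, so the equivalent check would be run from the mail server instead.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers DNS failures, refused connections and timeouts
        return False
```

For example, `can_connect("mail.example.com", 993)` (a placeholder host, with 993 being the standard IMAPS port; IMAP is 143, POP3 110, POP3S 995) returning `False` points to a firewall or routing problem to resolve before setting up IMAP or POP archiving.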
Web Console
In addition to archiving emails, the server provides a web interface for administering the product. This interface, referred to as the “Server Console”, also lets users search for and retrieve emails. Unlike traditional mail clients, MailArchiva's search function allows users with sufficient privileges to search across all emails in the entire company, not just a single mailbox.
Authentication
Access to the server console is restricted to authenticated users. An authenticated user may assume an administrator, auditor, user or custom-defined role. Each role grants a different set of entitlements, which are discussed in Logins. For simplicity, the server can be configured to authenticate users against credentials stored in a simple XML configuration file (Basic Authentication).
Alternatively, the server can be set up to authenticate users against Microsoft Active Directory (Active Directory Authentication) or using basic LDAP authentication. The benefit of authenticating with Active Directory or an LDAP server is that user accounts can be managed centrally using standard administration tools.
Archiving Process
Emails are typically received via MailArchiva's built-in SMTP, IMAP or milter interfaces. Received data is immediately written to the Receive Queue. Once written there, the data is considered safe: if the server is rebooted, MailArchiva continues processing the remaining items in the queue.
Multiple archiving threads retrieve items from the Receive Queue, generate a unique ID for each document, and store the data in the active volume. The document's unique ID determines which of 4096 AES-encrypted archive files it is stored in. Spreading data across many archive files allows MailArchiva to write to several files in parallel, which improves archiving performance.
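The mapping from document ID to archive file can be illustrated with a short Python sketch. The text does not say how MailArchiva derives the file from the ID, so the scheme here is an assumption: a random UUID serves as the document ID, and the first 12 bits of its SHA-256 hash select one of the 4096 files (2^12 = 4096). Any uniform, deterministic hash would serve the same purpose. The same ID always maps to the same file, and successive documents scatter across different files, so concurrent archiving threads seldom write to the same file at once. The function names are hypothetical.

```python
import hashlib
import uuid
from collections import Counter

NUM_ARCHIVE_FILES = 4096  # from the text: data is spread across 4096 archive files


def new_document_id() -> str:
    """Generate a unique ID for a document (UUID4 is an assumption, not MailArchiva's format)."""
    return uuid.uuid4().hex


def archive_file_index(doc_id: str) -> int:
    """Map a document ID to a file number in 0..4095.

    Hypothetical scheme: the top 12 bits of the SHA-256 digest of the ID.
    Deterministic, so a document can always be located again from its ID.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big") >> 4  # 16 bits -> keep the top 12


if __name__ == "__main__":
    # Simulate 100,000 documents and check how evenly they spread across files
    counts = Counter(archive_file_index(new_document_id()) for _ in range(100_000))
    print(f"files used: {len(counts)} of {NUM_ARCHIVE_FILES}")
    print(f"emails per file: min {min(counts.values())}, max {max(counts.values())}")
```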
Once data has been written to the volume, it is passed to the indexer. The indexer parses the contents of each document and indexes all of its fields. The resulting search index is then used to provide high-speed search.
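The flow described above (Receive Queue, archiving threads, volume, indexer) can be modelled in Python with the standard `queue` and `threading` modules. This is a simplified sketch, not MailArchiva's implementation. The receive queue here lives in memory, whereas the text describes a queue that survives a reboot. The "volume" is a dictionary, and the "indexer" is a toy inverted index mapping lowercase words to document IDs. The class and method names are hypothetical.

```python
from __future__ import annotations

import queue
import re
import threading
import uuid
from collections import defaultdict


class ArchivePipeline:
    """Toy model of: receive queue -> archiving threads -> volume -> indexer.

    Unlike the real Receive Queue, this queue is held in memory and does
    NOT survive a restart.
    """

    def __init__(self, num_archivers: int = 4) -> None:
        self.receive_queue: queue.Queue = queue.Queue()
        self.volume: dict[str, str] = {}  # doc_id -> raw message
        self.index: defaultdict[str, set[str]] = defaultdict(set)  # word -> doc_ids
        self._lock = threading.Lock()
        self._workers = [
            threading.Thread(target=self._archive_loop, daemon=True)
            for _ in range(num_archivers)
        ]
        for worker in self._workers:
            worker.start()

    def receive(self, raw_message: str) -> None:
        """Entry point for the SMTP/IMAP/milter front end: enqueue and return."""
        self.receive_queue.put(raw_message)

    def _archive_loop(self) -> None:
        """Archiving thread: take items off the queue until a None sentinel arrives."""
        while True:
            raw = self.receive_queue.get()
            if raw is None:
                self.receive_queue.task_done()
                return
            doc_id = uuid.uuid4().hex  # unique ID for this document
            with self._lock:
                self.volume[doc_id] = raw  # step 1: store in the active volume
            self._index(doc_id, raw)  # step 2: hand the document to the indexer
            self.receive_queue.task_done()

    def _index(self, doc_id: str, raw: str) -> None:
        """Toy indexer: record every lowercase word of the message."""
        words = set(re.findall(r"\w+", raw.lower()))
        with self._lock:
            for word in words:
                self.index[word].add(doc_id)

    def search(self, word: str) -> set[str]:
        """Return IDs of documents containing `word`, using only the index."""
        with self._lock:
            return set(self.index.get(word.lower(), set()))

    def shutdown(self) -> None:
        """Let queued items finish, then stop every archiving thread."""
        for _ in self._workers:
            self.receive_queue.put(None)  # sentinels queue up behind pending items
        self.receive_queue.join()


if __name__ == "__main__":
    pipeline = ArchivePipeline()
    pipeline.receive("Subject: Q3 budget\n\nPlease review the budget.")
    pipeline.receive("Subject: Lunch\n\nSee you at noon.")
    pipeline.shutdown()
    print(len(pipeline.volume), "archived;", len(pipeline.search("budget")), "match 'budget'")
```

The store-then-index order follows the text. In this sketch, a document is already in the volume before it becomes searchable.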
Storage Scheme
MailArchiva stores archived data in logical volumes that can be rolled over periodically (e.g. monthly, or on another schedule). Within each volume's store directory, data is spread evenly across 4096 RAES-encrypted ZIP files (with a .zz extension). Inside each ZIP file, every archived email is stored as an .eml file. This storage scheme is deliberately designed to meet the following goals:
(1) no file in a volume is so large that popular backup products cannot back it up;
(2) volumes do not contain so many files that backups take too long to complete;
(3) volumes contain a manageable number of files, making them easy to copy and move around;
(4) standards-based formats allow volume data to be accessed 50 years from now;
(5) very high compression reduces storage costs;
(6) writing to multiple files simultaneously improves archiving performance.
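To make the layout concrete, the Python sketch below adds an email to a volume as an .eml entry inside one of the 4096 .zz ZIP files. Two parts are assumptions. The file naming (three hex digits, 000.zz to fff.zz) and the volume directory name are hypothetical. Encryption is left out entirely, because Python's standard `zipfile` module can decrypt but cannot create encrypted entries, so the RAES encryption described above is not reproduced. Maximum Deflate compression stands in for the "very high compression" goal.

```python
import tempfile
import zipfile
from pathlib import Path

NUM_ARCHIVE_FILES = 4096  # per the text: 4096 .zz files per volume


def archive_path(volume_dir: Path, bucket: int) -> Path:
    """Hypothetical naming: bucket 0..4095 -> 000.zz .. fff.zz."""
    if not 0 <= bucket < NUM_ARCHIVE_FILES:
        raise ValueError(f"bucket must be in range 0..{NUM_ARCHIVE_FILES - 1}")
    return volume_dir / f"{bucket:03x}.zz"


def store_eml(volume_dir: Path, bucket: int, doc_id: str, eml: bytes) -> Path:
    """Add one email as <doc_id>.eml to the bucket's ZIP file and return its path.

    No encryption: Python's zipfile cannot create encrypted entries, so this
    stands in for the RAES-encrypted files described in the text.
    """
    path = archive_path(volume_dir, bucket)
    path.parent.mkdir(parents=True, exist_ok=True)
    # mode "a" creates the ZIP on first use and appends to it afterwards
    with zipfile.ZipFile(path, mode="a", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        zf.writestr(f"{doc_id}.eml", eml)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        volume = Path(tmp) / "volume-2024-01"  # hypothetical volume directory
        zz = store_eml(volume, 0xABC, "example-id", b"Subject: hi\r\n\r\nHello.\r\n")
        with zipfile.ZipFile(zz) as zf:
            print(zz.name, zf.namelist())
```

Because each email remains a plain .eml entry inside a standard ZIP container, ordinary ZIP and email tools can read the data once it has been decrypted. This supports the long-term accessibility goal.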
Communication Ports
* These are the default ports; they can be changed.
Performance
The performance of MailArchiva depends largely on the hardware it runs on. When planning your hardware configuration, consider factors such as motherboard/chip architecture, CPU speed, number of cores, amount of memory, Ethernet speed and storage configuration. Larger sites may require more CPU power and memory.
Ethernet
Because of the large volume of traffic passing between your mail server and MailArchiva, it is a good idea (especially at larger sites) to install a 1 Gbps or faster Ethernet link between them. This is especially important if you plan to connect MailArchiva to Ethernet-based network storage, since the same link may then carry both the retrieval and the storage of emails.
Storage
The choice of storage hardware and configuration depends greatly on the volume of email the archiving server is expected to handle. In small environments (0-100 mailboxes), two built-in SATA drives in a RAID configuration are often sufficient. At larger sites, because fast searches across large indexes require low-latency disk access, it is advisable to keep the search index and the email archive store on separate drives.
While MailArchiva is capable of, and indeed optimized for, archiving to remote Network Attached Storage (NAS) devices, it is never a good idea to store the index remotely, as this will degrade search performance. In addition to NAS devices, enterprise customers can also configure MailArchiva to archive emails to a Storage Area Network (SAN).