Volumes
MailArchiva organises archived data into one or more logical volumes. Each volume consists of an index and a store. A volume's index contains information used purely for searching purposes. A volume store contains the actual archived information. If the store data is deleted from disk, it is not possible to recover archived information unless a backup is available. The index data, however, is used purely for searching purposes and can be regenerated at any stage from the information residing in the store.
All Volume functions are accessible in Configuration->Volumes from the server console.
Store Encryption Password
Volume store data is encrypted using a chosen store encryption password. This measure ensures that sensitive archived data remains private and intact. All store data is encrypted using AES-128 encryption and compressed using standard ZIP compression. The Store Encryption Password must be chosen carefully as it is not possible to change it once it has been set.
Important: Ensure that the store encryption password can never be forgotten. Also ensure that the password is correctly documented and that this record will survive termination of employment.
New Volume
Click the New Volume button in Configuration->Volumes to create a new logical volume. After doing so, a new volume will appear with its index and store paths editable.
Both the index and store paths can refer to any path location on disk. By selecting the down arrow inside each text box, a folder selection dialog appears, enabling one to select an appropriate index and store path. Alternatively, simply type in the location of an appropriate index and store path (e.g. “c:\store1” and “c:\index2”).
A volume's index and store path cannot refer to the same location on the disk. Each volume must have its own unique store and index path. There cannot be any overlap with the store and index paths defined by any other volume. Furthermore, if the index and store path specified do not exist, they will be created automatically. Click Save when you're satisfied the index and store path are correct.
Volume Formats
MailArchiva supports a variety of storage formats, each of which has its trade-offs (see table below).
How To Switch Volume Format
To switch volume formats:
- Log in to the MailArchiva web console.
- In Configuration->Archive, set New Volume Format to the desired format and click Save.
- In Configuration->Volumes, create a New Volume and click Save.
Using Object Stores
MailArchiva supports archiving data to Backblaze B2, Amazon S3/Glacier, EMC Atmos, OpenStack Swift, Azure Blob Storage, Azure Archival Storage, Rackspace Cloud Files and Minio object storage services. By default, the MailArchiva setup wizard will create a local volume. However, the product also supports the creation of volumes residing in remote cloud object stores. It is possible for MailArchiva to reference both local and remote volumes at the same time.
The process for configuring archival to an external object store is as follows:
- Either create an account with a public object store service (e.g. Backblaze B2, AWS or Azure), or install an on-premise object store service (e.g. Minio or OpenStack Swift).
- Define an object store connection in Configuration->Connections and click Save.
- Test the object store connection to ensure that it is able to connect.
- In Configuration->Archive, set New Volume Format to External and click Save.
- If the local volume created by default by the setup wizard is empty, delete it. This volume cannot be switched to an external one.
- Create a new logical volume by clicking New in Configuration->Volumes. When entering the store path, click the down arrow to select the object store connection. In the index path, enter or select a local path on disk. Index data is not suited to residing in object storage, and thus the index location must still refer to a local path.
- Click Save to save changes.
MailArchiva stores object store data in a bucket whose name is specified during the creation of the object store connection. The data resides in a subdirectory whose name is taken from the ID of the volume.
Using Remote Storage
While it is possible to refer to locations on a remote NAS, it is not recommended to place the index data at a remote location. For performance reasons, MailArchiva's search engine requires very low latency when accessing the index. It is, however, entirely acceptable for store data to reside on a remote SAN or NAS since this data is accessed relatively infrequently.
Unfortunately, certain NAS and SAN devices cannot cope with the sophisticated file locking needs of the V2 storage format. To circumvent the problem, it is possible to switch to the older (less sophisticated) V1 storage format. To do this, change the Volume Storage Format to V1 in Configuration->Archive. Thereafter, close the existing ACTIVE volume (if necessary) and create a new volume in Configuration->Volumes. Data written to the new volume will be stored in the V1 storage format.
Windows
In the Windows version of MailArchiva, it is possible to specify a UNC path referring to a remote store path location (e.g. \\server\store\store0). Before doing this, ensure that the MailArchiva service is running under an administrator account.
Follow the steps below to grant the appropriate permissions so that MailArchiva is able to read/write to the remote network drive:
- Open the Windows Services Control Panel applet (not the MailArchiva task tray icon configuration!)
- Select the MailArchiva Server Service
- Right click, select Properties, then select the Log On tab
- Enter the login details of the domain administrator account
- Save and restart the MailArchiva server.
If, after restarting, the volume store path is set to a remote location and the volume is shown as EJECTED, it is likely there is still a permissions issue. To resolve this, log out of the workstation, log in using the Windows service account chosen above, and attempt to access the Windows share from Explorer. Create a text file with arbitrary text inside it and open the text file to ensure that the share can be written to and read from using the chosen account.
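The manual write/read test described above can also be scripted and run while logged in as the service account. This is a minimal sketch under stated assumptions: the function name and the test file name are hypothetical, and the script is not part of MailArchiva.

```python
import os
import uuid


def check_share_read_write(share_path):
    """Write a small text file to share_path, read it back, then delete it.

    Returns True only if all three steps succeed under the current account.
    """
    # Hypothetical file name; a random suffix avoids clashing with real data.
    test_file = os.path.join(share_path, f"mailarchiva-test-{uuid.uuid4().hex}.txt")
    payload = "arbitrary text"
    try:
        with open(test_file, "w", encoding="utf-8") as fh:
            fh.write(payload)
        with open(test_file, "r", encoding="utf-8") as fh:
            ok = fh.read() == payload
        os.remove(test_file)
        return ok
    except OSError:
        # Covers a missing share, denied permissions and network errors.
        return False
```

Calling check_share_read_write(r"\\server\store\store0") from a session running under the service account gives a quick indication of whether the permissions are in place.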
Linux
UNC paths are not supported in the Unix versions of MailArchiva. Rather, a mount point must be defined in your /etc/fstab file and set to the base location of your NAS or SAN disk. Refer to Network Attached Storage for more information on how to set up Linux mount points correctly.
After the appropriate mount point has been created, enter the equivalent of /mnt/archive/store0 for the volume store path.
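Before entering the store path, it can be worth confirming that the mount point is actually mounted. Otherwise, data written to /mnt/archive/store0 would land on the local disk underneath the unmounted directory. The helper below is a sketch only; its name is hypothetical and it is not part of MailArchiva.

```python
import os


def store_path_on_mount(store_path, mount_point):
    """Return True if mount_point is a mounted filesystem and store_path lies beneath it."""
    store = os.path.realpath(store_path)
    mount = os.path.realpath(mount_point)
    # os.path.ismount is False for plain directories and for missing paths.
    return os.path.ismount(mount) and os.path.commonpath([store, mount]) == mount
```

For instance, store_path_on_mount("/mnt/archive/store0", "/mnt/archive") should only report True once the NAS or SAN is mounted at /mnt/archive.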
Volume Status
When a new volume is created and the configuration saved, it will be assigned the “UNUSED” status. When the first email or document is archived, the server will automatically switch over to the first unused volume on the list and set its status to “ACTIVE”. This volume will stay active until such time as its maximum size is exceeded, the disk is full, or the volume is explicitly closed.
Only one volume can be active at a time and once the active volume is closed, no further data can be written to it and it cannot be reopened from the server console. The purpose of this behaviour is to ensure that archive data is stored chronologically across multiple volumes.
If at any stage during the archiving process, the server finds that an active volume is not available, it will always activate the next unused volume on its list. Assuming there are no remaining unused volumes available, the server will stop the archiving process until a new volume is added.
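The status transitions described above can be modelled in a few lines. This is an illustrative sketch rather than MailArchiva's implementation: the Volume class and both functions are hypothetical, and only the UNUSED, ACTIVE and CLOSED states are modelled.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    status: str = "UNUSED"  # one of UNUSED, ACTIVE or CLOSED


def volume_for_archiving(volumes):
    """Return the volume to write to, activating the first UNUSED one if needed.

    Returns None when archiving has to stop until a new volume is added.
    """
    for vol in volumes:
        if vol.status == "ACTIVE":
            return vol
    # No active volume: switch to the first unused volume on the list.
    for vol in volumes:
        if vol.status == "UNUSED":
            vol.status = "ACTIVE"
            return vol
    return None


def close_volume(vol):
    """Close a volume; a closed volume is never reopened for writing."""
    vol.status = "CLOSED"
```

Because a closed volume never becomes active again, the volumes are filled strictly in list order, which is what keeps the archive data chronological.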
When using removable disks, it is not recommended to remove the disk containing the active volume data without closing/unmounting the volume first. Any physical disk containing a closed volume may be removed provided that the volume whose store path refers to it is unmounted first.
When users search for emails, the search is conducted across both active and closed volumes.
Rollover Volume
In addition to defining volumes manually, one can configure MailArchiva to create and roll over to new volumes based on certain conditions, such as when the volume is full or when a certain time period has elapsed. This feature is useful for two reasons:
- It allows one to keep volumes to a defined size so that they can be backed up on DVD media.
- It allows one to store archive information on a monthly, quarterly or annual basis so that the information can be organized chronologically.
During volume rollover, a new volume index and store path are automatically chosen. The index and store path of a newly created volume will be based on the store and index paths of the last volume that was created. There are two ways in which the paths are chosen:
- By date - A date in the format YYYYMM is appended to both the store and index path (e.g. C:\store\201001 and C:\index\201001), that is, any string followed by YYYYMM. The store and index path of the next volume will have a new date appended. Paths are matched against the regular expression (.*\\D)([0-9]{6,6})([a-z]{0,1}).*$. If there is more than one path with the same name, a letter suffix is appended and naming proceeds as follows: ../store/vol201711b /index/vol2017b, /store/vol201711c /index/vol2017c and so forth.
- By number - A numeric value is appended to both the store and index path (e.g. C:\store\store1 and C:\index\index1). The store and index path of the next volume will have an incremented value appended (../store1 ../index0; store1/index1, store1/index2, etc.). Paths are matched against the regular expression (.*\\D)(\\d{1,3})$.
If volumes should increment by date, enter a store and index path for the first volume with a YYYYMM date appended. If volumes are to be incremented by number, append a numeric value instead.
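The two naming schemes can be sketched with the regular expressions quoted above. The sketch assumes that the doubled backslashes in those expressions are string-literal escapes, so \\D is written here as \D. The function names, the taken argument and the handling of the letter suffix are assumptions made for illustration; MailArchiva's actual behaviour (for instance, with zero-padded numbers) may differ.

```python
import re

# Patterns taken from the text, with string-literal escaping removed.
DATE_RE = re.compile(r"(.*\D)([0-9]{6,6})([a-z]{0,1}).*$")
NUM_RE = re.compile(r"(.*\D)(\d{1,3})$")


def next_path_by_number(path):
    """Increment the trailing number of a path, or return None if none is found."""
    m = NUM_RE.match(path)
    if not m:
        return None
    return f"{m.group(1)}{int(m.group(2)) + 1}"


def next_path_by_date(path, yyyymm, taken=()):
    """Replace the YYYYMM part of a path, adding a letter if the name is taken.

    `taken` holds paths already in use (hypothetical input from the caller).
    """
    m = DATE_RE.match(path)
    if not m:
        return None
    base = m.group(1)
    candidate = f"{base}{yyyymm}"
    if candidate not in taken:
        return candidate
    # Suffixes start at "b", mirroring the vol201711b, vol201711c sequence.
    for letter in "bcdefghijklmnopqrstuvwxyz":
        candidate = f"{base}{yyyymm}{letter}"
        if candidate not in taken:
            return candidate
    return None
```

With these definitions, next_path_by_number(r"C:\store\store1") yields C:\store\store2, and next_path_by_date(r"C:\store\201001", "201002") yields C:\store\201002. A path that matches neither pattern returns None, which corresponds to a volume that does not conform to the rollover format.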
The following rollover options are available:
- By size - rollover to the next volume when the maximum allocated size is reached.
- By month - rollover on a monthly basis
- By quarter - rollover on a quarterly basis
- By year - rollover on a yearly basis
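One way to picture the four options is as a check performed against the active volume. The sketch below assumes that time-based rollover follows calendar boundaries relative to the volume's created date; the text does not state this, so treat it as an assumption. All names are hypothetical.

```python
from datetime import date


def period_key(d, mode):
    """Map a date to the calendar period it falls in for the given mode."""
    if mode == "month":
        return (d.year, d.month)
    if mode == "quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    if mode == "year":
        return (d.year,)
    raise ValueError(f"unknown rollover mode: {mode}")


def rollover_due(mode, created, today, size_bytes=0, max_size_bytes=0):
    """Return True if the active volume should be rolled over."""
    if mode == "size":
        return size_bytes >= max_size_bytes
    # Time-based modes: due once 'today' is in a later period than 'created'.
    return period_key(today, mode) != period_key(created, mode)
```

Under this reading, a volume created on 15 January 2010 with monthly rollover becomes due on 1 February 2010, and with quarterly rollover on 1 April 2010.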
For a rollover to be successful, all of the following conditions must be met:
- There must be an active volume
- The active volume must conform to the store and index path rollover format (as defined in online help)
- The prospective rolled over paths must be accessible, writeable and have enough disk space.
- There must be no existing store or index in the prospective rolled over path
- The active volume path must have a created date (it should do so anyway)
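The file system conditions in the list above (accessible, writeable, enough disk space, no existing store or index) can be checked ahead of a rollover. This is a rough sketch, not MailArchiva's own check: the function names are hypothetical, the free space threshold is supplied by the caller because the text does not state one, and "no existing store or index" is interpreted as the directory being absent or empty.

```python
import os
import shutil


def nearest_existing_parent(path):
    """Walk upwards until a path that exists on disk is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return path


def check_prospective_path(path, min_free_bytes):
    """Return a list of problems with a prospective store or index path."""
    problems = []
    if os.path.isdir(path) and os.listdir(path):
        problems.append("path already contains data")
    # The path itself may not exist yet, so test the closest existing folder.
    parent = nearest_existing_parent(path)
    if not os.access(parent, os.W_OK):
        problems.append("path is not writable")
    elif shutil.disk_usage(parent).free < min_free_bytes:
        problems.append("not enough free disk space")
    return problems
```

An empty result for both the prospective store path and the prospective index path suggests that these particular conditions are met; the remaining conditions concern the active volume itself.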
When creating a new volume during rollover, the base folder for the store and index paths is taken from the store and index paths of the active volume (the one due to be closed). Thus, when starting out, create a volume with the needed base index and store paths. The next time rollover occurs, MailArchiva should create a volume that uses the same base path.
At the time of rollover, if there is already an UNUSED volume present, MailArchiva will roll over to it rather than create a new one. This behaviour is a convenient way to change the base index and store path of a forthcoming volume. If you do not intend to change the base path, simply ensure that there are no UNUSED volumes defined and MailArchiva will create a new volume in accordance with the volume index and store paths above.
As of MailArchiva v6.2.2, if a rollover is not successful, a notification will be sent alerting the Administrator as to why a rollover could not be performed. Furthermore, under such conditions, the active volume will be closed and received traffic will build up in the Receive Queue until a new volume is manually created.
Reindex Volume
It is necessary to reindex one or more volumes in three situations:
a) Occasionally, when upgrading to a major release (for example, from MailArchiva V2 to MailArchiva V3)
b) When a volume's search index is corrupted
c) When expected emails are missing from the search results
Before reindexing a volume, consider whether it is desirable/necessary/practical to index attachment content. Including attachment content in the index will result in considerably larger indexes. Therefore, searches on such indexes, particularly with high doc counts, will be slower to perform. Furthermore, greater disk space will be consumed by the index. If searching for attachment content is required, ensure that the indexes are located on dedicated fast SAS/SSD local disks. To disable attachment indexing, refer to Configuration->Index.
To reindex a volume, click on the Reindex button next to the desired volume. To reindex all volumes, click on the Reindex button at the top of all volumes. A full reindex of five million emails can take a day or more to complete. While reindexing is taking place, it is still possible to perform searches on data that has already been indexed, even after logging out of the console.
Preparation
- Ensure that the server has the correct memory settings applied. For instance:
- Heap should be set to around two thirds of the installed physical memory (for example, assuming 2GB physical RAM, set heap to 1384MB).
- Indexing requires available virtual memory (free space outside of the heap/permgen).
Monitoring The Reindex Process
To check whether a reindex is working or has completed successfully:
- Verify that the reindex process is running by visiting Status->Processes.
- Check the doc counts in Configuration->Volumes and Status->Volumes
- During a reindex, the doc count should visibly increase. If not, there is a problem that must be investigated.
- Click the View button in Status->Tasks to see the Reindex log. There should be no reported errors.
- After the reindex process has been completed, the doc count should reflect the final tally.
If a volume cannot be reindexed correctly, refer to Indexing Troubles.
Update Volume Index
This action is similar to the Reindex Volume action described above, except that existing volume index data is not deleted at the start of the procedure. It has the advantage that users will still be able to access all volume data during the reindex process. The downside is that if a volume index is corrupted, deletion of the index may be necessary in order to recover from the corruption.
Convert Volume
The store format changed with the advent of MailArchiva v3. This option converts volumes archived with MailArchiva v2 and lower to the MailArchiva v3 format. The MailArchiva v3 format is advantageous since it stores data in a total of 4096 archive files as opposed to storing each email in its own file. This strategy is superior in that it is easier to work with the data on disk, the store data consumes less space on disk, and third party backup products are able to back up the store data far more quickly. The conversion process is a time-consuming one and can take several days to complete.
To proceed with the conversion of a volume to the new V3 format, click the Convert Volume button in Configuration->Volumes. MailArchiva will immediately create a new volume with "v3" appended to the store and index path. For example, assuming the original volume had a store path of C:\volume\store1 and index path of C:\volume\index1, the new V3 volume will have a store path of C:\volume\store1v3 and an index path of C:\volume\index1v3.
When the conversion process is completed, the old V1 volume will be removed from MailArchiva's configuration, although the old volume data will still be present on disk. For safety's sake, the conversion process does not delete any source volume store or index data.
Backup Volume
This function initiates the backup of the entire contents of a volume to a remote location (such as an SMBFS mount, or remote object store). Before initiating the manual backup of a volume, it is necessary to configure and enable backups in Configuration->Backup.
Merge Volume
The presence of too many volumes (>200) can impact search performance, because MailArchiva needs to search across many volume indexes and then combine the result set. As such, the need may arise to merge volume data. To initiate a merge of volume data, in Configuration->Volumes, select all the volumes that need to be merged. Click the Merge button at the top left, choose the target volume where the combined volume data should reside, and proceed. As with most other operations in MailArchiva, the results of the Merge operation will appear in the Tasks view.
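To illustrate why the number of volumes matters, the sketch below combines ranked hit lists from several volume indexes into one result set. It is a generic illustration and not MailArchiva's search code: the (score, id) tuple format and the function name are assumptions. Each additional volume adds another index to query and another list to merge.

```python
import heapq
from itertools import islice


def combine_results(per_volume_hits, limit):
    """Merge per-volume hit lists into a single top-`limit` list.

    Each input list holds (score, message_id) tuples already sorted by
    descending score, as a per-index search would typically return them.
    """
    merged = heapq.merge(*per_volume_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))
```

With two volumes returning [(0.9, "a1"), (0.4, "a2")] and [(0.8, "b1"), (0.1, "b2")], a limit of 3 gives the hits a1, b1 and a2 in that order.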
Import Data
The Import Data button allows one to import data from outside sources. Using this function, it is possible to import email data from PST, EML, MSG and OST files, via MS Exchange direct import, etc. Refer to Email Migration for specific instructions on how to import data.
Export Data
The Export Data button provides options for exporting data to EML (RFC 2822) format. The purpose of this option is to facilitate the bulk export of data to external systems.
Import Volume
The Import Volume feature is used when old volume data is being restored from backup or when migrating data from another MailArchiva server. In this case, click the Import Volume button and enter the store and index path of the volume to be imported. If the index data was not saved, enter the location of an empty directory and reindex after clicking Save. Refer to Import Volume for instructions on how to import a volume created in another MailArchiva instance.
Rencrypt Volume
MailArchiva may not be able to read an imported volume because it was encrypted using a different encryption key to the one that is defined in the system. To recover from the situation, it may be necessary to reencrypt data associated with an old volume. Refer to Rencrypt Volume for more information on this operation.
Mount / Unmount Volume
A mounted volume is considered available for use by the system. Conversely, an unmounted volume is considered inaccessible by the system and cannot be used for archiving/search purposes.
There are a variety of situations wherein it is useful to unmount volumes:
- Moving Volumes - When a volume is unmounted, its store and index path may be modified. Refer to Move Server for further instructions on how to move volumes.
- Compaction - When a volume is unmounted, the option to compact (further compress) the volume becomes available. Refer to Compact Volume further below.
- Reconfiguring Network Shares - When temporarily disconnecting a network share, it is advisable to unmount the volume first.
- Removable Media - It is recommended to unmount a volume when its data resides on removable media and that media is to be temporarily removed from the system.
Compact Volume
The Compact Volume feature is used for further compressing volume data. In many instances, further savings of up to 25% in disk space can be obtained by simply compacting old volume data. Before compacting a volume, ensure there is at least 4 GB of space available on the disk pointed to by the volume's store path. The extra space is needed since the compaction process will rewrite all archive files residing in the volume's store path.
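The free space requirement can be verified before compaction begins. In this short sketch, "4 GB" is read as 4 GiB, which is an assumption, and the function name is hypothetical.

```python
import shutil

REQUIRED_FREE_BYTES = 4 * 1024**3  # "4 GB" interpreted as 4 GiB (assumption)


def enough_space_to_compact(store_path, required=REQUIRED_FREE_BYTES):
    """Return True if the disk holding store_path has the required free space."""
    return shutil.disk_usage(store_path).free >= required
```

Pass the volume's store path, since that is the disk on which the archive files are rewritten.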
To compact a mounted volume with Active, Closed or Backup status, click the Compact button next to the target volume in Configuration->Volumes to begin the compaction process. Since compaction involves rewriting data, it is advisable not to restart or shut down the server until the compaction process has completed.
Rebuild Statistics
The Analytics section offers insights into the nature of traffic flowing through MailArchiva. It relies upon the Druid columnar database for storing and querying statistical information. If the Druid columnar database gets corrupted, it may be necessary to initiate a rebuild of the statistical data by clicking on the Rebuild Statistics button in Configuration->Volumes. The process steps through all data, and regenerates the statistical information needed for presentation of analytical information.
Build Threads
In the search interface, it is possible to view emails in the context of their hierarchical conversation threads. MailArchiva uses sophisticated algorithms to determine how and which emails are related to one another. Throughout the running of the system, it scans and links newly archived email data into conversation threads. A graph database is used to store the threading information. Should this graph database become corrupted, click on the Build Threads button to initiate a rebuild of threading data.