Volumes

 

MailArchiva organises archived data into one or more logical volumes. Each volume consists of an index and a store. A volume's index contains information used purely for searching purposes. A volume store contains the actual archived information. If the store data is deleted from disk, it is not possible to recover archived information unless a backup is available. The index data, however, is used purely for searching purposes and can be regenerated at any stage from the information residing in the store.

 

All Volume functions are accessible in Configuration->Volumes from the server console. To further your understanding of MailArchiva's volume features, click on the topics that interest you below:

 

Section | Description
Store Encryption Password | Volume data is encrypted using AES password-based encryption
New Volume | Creating your first volume (or a new volume)
Volume Formats | Choosing a suitable volume format
Using Object Storage | Referencing locations on an object storage service (e.g. Amazon AWS, Azure Blob Storage)
Using Remote Storage | Referencing locations on a remote network drive (e.g. NFS, SMBFS)
Volume Status | Understanding volume statuses and their meaning
Volume Rollover | Configuring volumes to roll over periodically
Volume Reindex | Regenerating volume indexes
Volume Index Update | Update volume indexes
Volume Backup | Initiate manual backup of a volume
Volume Merge | Merge the data of two or more volumes into a target volume
Volume Conversion | Convert old volumes to the latest format
Volume Reencrypt | Reencrypt a volume to use a different key
Import Data | Import data from outside sources
Export Data | Export data to common formats
Import Volume | Import backed up volumes or volumes from another server
Unmount/Mount Volume | Unmount/mount volumes
Compact Volume | Compact volume data
Rebuild Statistics | Regenerate analytics data
Build Threads | Regenerate threading information


Store Encryption Password
 

Volume store data is encrypted using a chosen store encryption password. This measure ensures that sensitive archived data remains private and intact. All store data is encrypted using AES-128 encryption and compressed using standard ZIP compression. The Store Encryption Password must be chosen carefully as it is not possible to change it once it has been set.
 

Big Note: Please ensure that the store encryption password can never be forgotten. Also ensure that the password is correctly documented and will survive termination of employment.
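
To see why a forgotten password cannot be worked around, it helps to recall how password-based encryption works in general: the encryption key is derived from the password, so a different password produces a different key. The Python sketch below illustrates the principle only. The use of PBKDF2, the salt and the iteration count are assumptions made for the illustration; the key derivation that MailArchiva actually uses is not described here.

```python
import hashlib


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 128-bit (16-byte) key from a password (illustration only)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=16
    )


salt = b"example-salt"  # hypothetical fixed salt so the demo is repeatable
key_a = derive_key("correct horse", salt)
key_b = derive_key("correct horse", salt)
key_c = derive_key("wrong horse", salt)

print(len(key_a) * 8)   # key length in bits
print(key_a == key_b)   # the same password always gives the same key
print(key_a == key_c)   # a different password gives a different key
```

Without the original password there is no way to arrive at the same key again, which is why the password must be recorded safely.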

 


New Volume
 

Click the New Volume button in Configuration->Volumes to create a new logical volume. After doing so, a new volume will appear with its index and store paths editable.

Both the index and store paths can refer to any path location on disk. By selecting the down arrow inside each text box, a folder selection dialog appears, enabling one to select an appropriate index and store path.  Alternatively, simply type in the location of an appropriate index and store path (e.g. “c:\store1” and “c:\index2”). 

 

Object Store Connection: If there is a requirement to archive to remote object stores, it is also possible to select an object store connection in the store path. Refer to Using Object Stores below.

 

A volume's index and store path cannot refer to the same location on the disk. Each volume must have its own unique store and index path. There cannot be any overlap with the store and index paths defined by any other volume. Furthermore, if the index and store path specified do not exist, they will be created automatically. Click Save when you're satisfied the index and store path are correct.

 

Important Safety Tip:  Please do not place volume store directories anywhere under the MailArchiva program directory (e.g.  C:\Program Files\MailArchiva or /opt/mailarchiva) as the program directory will be recursively deleted during upgrades. Your volume stores should always reside outside the MailArchiva program directory.
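
The path rules above can be checked before a volume is saved. The following Python sketch is one possible way to express them, not MailArchiva's own validation. The function names are hypothetical, "overlap" is assumed to mean that two paths are equal or that one lies inside the other, and POSIX-style paths are used (PureWindowsPath would be the equivalent for Windows paths).

```python
from pathlib import PurePosixPath


def overlaps(a: str, b: str) -> bool:
    """Return True if the paths are equal or one lies inside the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def check_new_volume(store, index, existing_paths, program_dir="/opt/mailarchiva"):
    """Return a list of problems; an empty list means the paths look acceptable."""
    problems = []
    # Rule 1: a volume's store and index must not share a location.
    if overlaps(store, index):
        problems.append("store and index paths overlap")
    # Rule 2: no overlap with the paths of any other volume.
    for used in existing_paths:
        for candidate in (store, index):
            if overlaps(candidate, used):
                problems.append(f"{candidate} overlaps {used}")
    # Rule 3: the store must not live under the program directory.
    prog = PurePosixPath(program_dir)
    store_path = PurePosixPath(store)
    if store_path == prog or prog in store_path.parents:
        problems.append("store is under the program directory")
    return problems


print(check_new_volume("/data/store1", "/data/index1", []))
print(check_new_volume("/opt/mailarchiva/store", "/data/index1", []))
```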

 

Volume Formats

 

MailArchiva supports a variety of storage formats, each of which has its own trade-offs, as summarised below.

 

Volume Format: V1 (Default MailArchiva v1-v2)
Description: Stores data in individual files. The filename is the hash of specific message headers. Data is stored in the format /123/456/7890abcdefghijk.mrc. Attachments are stored separately as /123/456/7890abcdefghijk.mrc.att.
Advantages:
  • Corruption is limited to the files that are corrupted
  • Archiving is fast (200+ blobs / sec)
  • ZLib compression
  • Low memory footprint
Disadvantages:
  • Time-consuming to back up and copy if the backup works at individual file level (not disk or block level)
  • Consumes more space on disk due to small files filling half blocks
  • Defaults to the PBEWithMD5AndTripleDES algorithm (it can be changed in config if so desired)
  • Too many empty directories
  • Many wasted inodes (not optimal)
Recommendation: Legacy only

Volume Format: V2 (Default MailArchiva v3-v6)
Description: Stores data in 4096 compound archive files (ZIP format) with a .zz extension. The filename is the hash of specific message headers. Data is stored as 123.zz/456/7890abcdefghijk.nfo and attachments are stored as 123.zz/456/7890abcdefghijk.att.
Advantages:
  • Easier to back up and copy data since there are far fewer files
  • AES-128 encryption
  • ZIP compression
Disadvantages:
  • If disk corruption occurs, it can corrupt compound archive files containing large amounts of data
  • Uses more memory, as an index to the compound archives resides in memory
  • Limited to around 6 million messages per volume
  • Archiving is slower (80 blobs / sec)
  • Access times are slower
Recommendation: Suitable for small / medium size companies. Not suitable for large companies due to performance and memory constraints.

Volume Format: V3
Description: Stores data in individual files. The filename is a hash of specific message headers. Data is stored as follows: 12/34/1234567890abcdefghijk. Attachments are not stored separately. The blob is an AES256 gzipped container including the header ContentType. The folder structure is a balanced tree (16^2 on the first layer, 16^2 on the second layer) with a total of 16^4 (65536) folders in the structure. The number of inodes used is 16^4 (for the structure) + the number of blobs.
Advantages:
  • Corruption is limited to the files that are corrupted
  • Archiving is very fast (500 blobs / sec)
  • AES-256 encryption
  • Zlib compression
  • Unlimited volume size
  • Low memory footprint
Disadvantages:
  • Time-consuming to back up and copy if the backup works at individual file level (not disk or block level)
  • Consumes more space on disk due to small files filling half blocks
  • Attachments are not stored separately, as our studies indicate that there is negligible disk space saving in real world scenarios
Recommendation: Suitable for large companies or companies that require high performance archiving.

Volume Format: EXTERNAL
Description: Stores data in remote object stores (e.g. Minio, Backblaze, etc.). Data is stored in one bucket as follows: 1234567890abcdefghijk. Attachments are not stored separately.
Advantages:
  • Usually fast
  • Redundancy - data is stored in three or more places
  • Volume size - highly scalable
  • AES-256 encryption
  • Zlib compression
Disadvantages:
  • Time-consuming to back up and copy
  • Attachments are not stored separately, as our studies indicate that there is negligible disk space saving in real world scenarios
Recommendation: Suitable for medium to large companies or companies that wish to store their data in the cloud.
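
The V3 folder layout (12/34/1234567890abcdefghijk) and its inode count can be illustrated with a few lines of Python. This is a sketch only: which message headers are hashed and which hash function is used are not stated here, so SHA-1 over a Message-ID string is purely an assumption, and both function names are hypothetical.

```python
import hashlib


def v3_blob_path(message_id: str) -> str:
    """Build a 12/34/1234... style relative path from a hex digest."""
    # Assumption: SHA-1 of the Message-ID stands in for the real header hash.
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # The first two characters name the first layer, the next two the second
    # layer, and the full digest is the file name.
    return f"{digest[:2]}/{digest[2:4]}/{digest}"


def v3_inodes(blob_count: int) -> int:
    """Inodes used: 16^4 for the folder structure plus one per blob."""
    return 16 ** 4 + blob_count


print(v3_blob_path("<example@mail.invalid>"))
print(v3_inodes(1_000_000))
```

With two hexadecimal characters per layer, each layer fans out into 16^2 = 256 folders, which is what keeps the tree balanced regardless of how many blobs are stored.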
 
How To Switch Volume Format

 

To switch volume formats:

 

  1. Log in to the MailArchiva web console.
  2. In Configuration->Archive, set New Volume Format to the desired format and click Save.
  3. In Configuration->Volumes, create a New Volume and click Save.

 

Using Object Stores

 

MailArchiva supports archiving data to Backblaze B2, Amazon S3/Glacier, EMC Atmos, OpenStack Swift, Azure Blob Storage, Azure Archival Storage, Rackspace Cloud Files and Minio object storage services. By default, the MailArchiva setup wizard will create a local volume. However, the product also supports the creation of volumes residing in remote cloud object stores. It is possible for MailArchiva to reference both local and remote volumes at the same time.

 

The process for configuring archival to an external object store is as follows:

 

  1. Either create an account with a public object store service (e.g. Backblaze B2, AWS or Azure), or install an on-premise object store service (e.g. Minio or OpenStack Swift).
  2. Define an object store connection in Configuration->Connections, Save
  3. Test the object store connection to ensure that it is able to connect.
  4. In Configuration->Archive, set New Volume Format to External, Save.
  5. By default, the MailArchiva setup wizard will create a local volume. This volume cannot be switched to an external one and ought to be deleted (if empty).
  6. Create a new logical volume by clicking New in Configuration->Volumes. When entering the store path, click the down arrow to select the object store connection. In the index path, enter or select a local path on disk. Index data is not suited to residing in object storage, and thus the index location must still refer to a local path.
  7. Click Save to save changes.

MailArchiva stores object store data in a bucket whose name is specified during the creation of the object store connection. The data resides in a sub directory whose name is taken from the ID of the volume.  

 

How To Move Historical Volumes To Object Storage: Create a new volume that references an object store connection as above, then use the Volume Merge feature to merge data from historical volumes into the newly created object store volume.

 

Using Remote Storage

 

 

While it is possible to refer to locations on a remote NAS, it is not recommended to place the index data at a remote location. For performance reasons, MailArchiva’s search engine requires very low latency when accessing the index. It is, however, entirely acceptable for store data to reside on a remote SAN or NAS since this data is accessed relatively infrequently.

 

Important Compatibility Notice: By default, MailArchiva uses a very sophisticated storage engine (we call it V2 storage format) that involves simultaneously appending to a total of 4096 archive files. In some environments, particularly virtualized ones that involve the use of specific NASes or SANs, archiving may occur extremely slowly.

Unfortunately, it appears that certain NASes and SANs simply cannot cope with the sophisticated file locking needs of the V2 storage format. To circumvent the problem, it is possible to switch to the older (less sophisticated) V1 storage format. To do this, change the Volume Storage Format to V1 in Configuration->Archive. Thereafter, close the existing ACTIVE volume (if necessary) and create a new volume in Configuration->Volumes. Data written to the new volume will be stored in the less sophisticated V1 storage format.


Windows

In the Windows version of MailArchiva, it is possible to specify a UNC path referring to a remote store path location (e.g. \\server\store\store0). Before doing this, ensure that the MailArchiva service is running under an administrator account.

 

Note: Please ensure that the MailArchiva Windows Service is running under a logon account that has sufficient privileges to read and write to the remote drive.


Follow the steps below to grant the appropriate permissions so that MailArchiva is able to read/write to the remote network drive:
 

  1. Open the Windows Services Control Panel applet (not the MailArchiva task tray icon configuration!)
  2. Select the MailArchiva Server Service
  3. Right click, select Properties, select Logon Tab
  4. Enter the login details of the domain administrator account
  5. Save and restart the MailArchiva server.

If, after restarting, the volume store path is set to a remote location and the volume is shown as EJECTED, it is likely there is still a permissions issue. To resolve: log out of the workstation, log in using the Windows service account chosen above, and attempt to access the Windows share from Explorer. Create a text file with arbitrary text inside it and open the text file to ensure that the share can be written to and read from using the chosen account.
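
The manual write-then-read test described above can also be scripted. The Python sketch below is an illustration rather than a MailArchiva tool: the function name is hypothetical, and the script only proves anything when it is run under the same account that the MailArchiva service uses. A local temporary directory stands in for the share in the usage lines; in practice the argument would be the UNC path.

```python
import os
import tempfile
import uuid


def can_read_write(directory: str) -> bool:
    """Write a small probe file into `directory`, read it back, then delete it."""
    probe = os.path.join(directory, f"probe-{uuid.uuid4().hex}.txt")
    try:
        with open(probe, "w", encoding="utf-8") as handle:
            handle.write("probe")
        with open(probe, encoding="utf-8") as handle:
            readable = handle.read() == "probe"
        os.remove(probe)
        return readable
    except OSError:
        # Missing share, denied permission, read-only share, and so on.
        return False


# Usage: a temporary local directory stands in for the remote share here.
with tempfile.TemporaryDirectory() as demo_dir:
    print(can_read_write(demo_dir))
```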


Linux

UNC paths are not supported in the Unix versions of MailArchiva. Rather, a mount point must be defined in your /etc/fstab file and set to the base location of your NAS or SAN disk. Refer to Network Attached Storage for more information on how to set up Linux mount points correctly.

 

Note: When using remote storage devices from Linux, it is imperative to ensure that the immutable flag is set on the mount point. Refer to Network Attached Storage for more information.

 

After the appropriate mount point has been created, enter  the equivalent of /mnt/archive/store0 for the volume store path.
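
Before entering such a path, it can be worth confirming that the mount point is actually mounted, since a path beneath an unmounted mount point would otherwise resolve to the local disk. The Python sketch below shows one way to check this with the standard library; the function name is hypothetical and /mnt/archive is simply the example location used above.

```python
import os


def store_path_is_on_mount(store_path: str, mount_point: str) -> bool:
    """True if mount_point is mounted and store_path lies at or below it."""
    mount = os.path.realpath(mount_point)
    store = os.path.realpath(store_path)
    inside = os.path.commonpath([mount, store]) == mount
    return os.path.ismount(mount) and inside


# Usage with the example paths from the text above.
print(store_path_is_on_mount("/mnt/archive/store0", "/mnt/archive"))
```

The check prints False whenever /mnt/archive is not currently a mount point, which is the situation to catch before archiving begins.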

 

Note: If you are running MailArchiva in ‘Appliance Mode’, the external mount points can be defined from within the Volumes section of the web console configuration.


Volume Status

 

When a new volume is created and the configuration saved, it will be assigned the “UNUSED” status. When the first email or document is archived, the server will automatically switch over to the first unused volume on the list and set its status to “ACTIVE”. This volume will stay active until such time as its maximum size is exceeded, the disk is full, or the volume is explicitly closed.

 

 

Note: Both CLOSED and ACTIVE volumes are searchable. At least one ACTIVE or UNUSED volume must be available for archiving to function correctly.
   
Volume Status | Description
NEW | The volume has just been created and has not been saved.
UNUSED | The volume has been saved but it does not contain any information.
ACTIVE | The volume is currently being used for archiving purposes.
CLOSED | The volume is searchable, however, no further information can be written to it.
UNMOUNTED | The volume is not searchable, nor can it be made active.
EJECTED | The volume cannot access the volumeinfo file on the volume store path (either the file does not exist or there is a permissions issue).
BACKUP | The volume is used for backup purposes. It is not searchable.
UNREADABLE | The volume is unreadable as it was most likely encrypted using a different encryption key.


Only one volume can be active at a time and once the active volume is closed, no further data can be written to it and it cannot be reopened from the server console. The purpose of this behaviour is to ensure that archive data is stored chronologically across multiple volumes.

If at any stage during the archiving process, the server finds that an active volume is not available, it will always activate the next unused volume on its list. Assuming there are no remaining unused volumes available, the server will stop the archiving process until a new volume is added.
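
The selection behaviour described above can be summarised in a short Python sketch. It models the documented rules only and is not MailArchiva's implementation; the Status enum mirrors the status names used on this page, while the function name and the dictionary layout are hypothetical.

```python
from enum import Enum


class Status(Enum):
    NEW = "NEW"
    UNUSED = "UNUSED"
    ACTIVE = "ACTIVE"
    CLOSED = "CLOSED"
    UNMOUNTED = "UNMOUNTED"
    EJECTED = "EJECTED"
    BACKUP = "BACKUP"
    UNREADABLE = "UNREADABLE"


# Only these two statuses take part in searches.
SEARCHABLE = {Status.ACTIVE, Status.CLOSED}


def pick_archive_volume(volumes):
    """Return the volume to archive to, activating the next UNUSED one if needed.

    `volumes` is an ordered list of dicts with 'name' and 'status' keys.
    Returns None when no volume is available, i.e. archiving has to stop.
    """
    for vol in volumes:
        if vol["status"] is Status.ACTIVE:
            return vol
    for vol in volumes:
        if vol["status"] is Status.UNUSED:
            vol["status"] = Status.ACTIVE  # first unused volume on the list
            return vol
    return None


volumes = [
    {"name": "vol1", "status": Status.CLOSED},
    {"name": "vol2", "status": Status.UNUSED},
    {"name": "vol3", "status": Status.UNUSED},
]
print(pick_archive_volume(volumes)["name"])
```

Because the function returns as soon as it finds an ACTIVE volume, at most one volume is ever activated per call, which matches the one-active-volume rule.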

When using removable disks, it is not recommended to remove the disk containing the active volume data without closing/unmounting the volume first. Any physical disk containing a closed volume may be removed provided that the volume whose store path refers to it is unmounted first.

When users search for emails, the search is conducted across both active and closed volumes.

 

Reopening a CLOSED Volume: For good reason, MailArchiva does not allow CLOSED volumes to be made ACTIVE again. This measure is needed to ensure that data kept in volumes stays in chronological order. That being said, if you know what you are doing, a volume can be "reopened" by editing a file called volumeinfo in the root of the volume store path. The volumeinfo file can be edited using a text editor such as Notepad. Look for the status CLOSED and change to ACTIVE. Please ensure that only one volume is  ACTIVE at any time, otherwise the system may become unstable.

 

Rollover Volume

 

In addition to defining volumes manually, one can configure MailArchiva to create and rollover to new volumes based on certain conditions, such as when the volume is full or when a certain time period has elapsed. This feature is useful for two reasons:
 

  • It allows one to keep volumes to a defined size so that they can be backed up on DVD media. 
  • It allows one to store archive information on a monthly, quarterly or annual basis so that the information can be organized chronologically.

 

During volume rollover, a new volume index and store path is automatically chosen. The index and store path of a newly created volume will be based on the store and index paths of the last volume that was created. There are two ways in which the paths are chosen:

 

  • By date - A date in the format YYYYMM is appended to both the store and index path (e.g. C:\store\201001 and C:\index\201001), i.e. any string followed by YYYYMM. The store and index path of the next volume will have a new date appended. The regular expression used to match is (.*\\D)([0-9]{6,6})([a-z]{0,1}).*$. If more than one path has the same name, a letter suffix is added, e.g. ../store/vol201711b and ../index/vol201711b, then ../store/vol201711c and ../index/vol201711c, and so forth.
  • By number - A numeric value is appended to both the store and index path (e.g. C:\store\store1 and C:\index\index1). The store and index path of the next volume will have an incremented value appended (e.g. store1 and index1, then store2 and index2, etc.). The regular expression used to match is (.*\\D)(\\d{1,3})$.
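
The two naming schemes can be tried out with the Python sketch below. The patterns are the two regular expressions quoted above, written as Python raw strings on the assumption that the doubled backslashes in the text are string-literal escapes. The function names, the `taken` parameter and the exact way the letter suffix is chosen are hypothetical; they follow the examples above rather than MailArchiva's source.

```python
import re

DATE_RE = re.compile(r"(.*\D)([0-9]{6})([a-z]?).*$")
NUM_RE = re.compile(r"(.*\D)(\d{1,3})$")


def next_by_number(path):
    """Increment the trailing number, e.g. store1 -> store2."""
    match = NUM_RE.match(path)
    if match is None:
        return None  # path does not follow the numeric format
    return f"{match.group(1)}{int(match.group(2)) + 1}"


def next_by_date(path, yyyymm, taken=()):
    """Replace the trailing YYYYMM, adding a letter if the name is taken."""
    match = DATE_RE.match(path)
    if match is None:
        return None  # path does not follow the date format
    base = match.group(1) + yyyymm
    if base not in taken:
        return base
    for letter in "bcdefghijklmnopqrstuvwxyz":
        if base + letter not in taken:
            return base + letter
    return None


print(next_by_number(r"C:\store\store1"))
print(next_by_date(r"C:\store\201001", "201002"))
print(next_by_date("/store/vol201711", "201711", taken={"/store/vol201711"}))
```

A path that matches neither pattern yields None, which corresponds to a volume whose paths do not conform to either rollover format.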

 

If it is desired that volumes increment by date, enter a store and index path for the first volume with a YYYYMM appended. If volumes are to be incremented by number, enter a numeric value.

The following rollover options are available:

 

  • By size - rollover to the next volume when the maximum allocated size is reached.
  • By month - rollover on a monthly basis
  • By quarter - rollover on a quarterly basis
  • By year - rollover on a yearly basis
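
As a rough illustration of the four options, the Python sketch below decides whether a rollover is due. It assumes that the time-based options follow calendar boundaries (a new month, quarter or year has begun since the volume was created) and that "reached" means the size is greater than or equal to the maximum; neither detail is confirmed here, and the function and parameter names are hypothetical.

```python
from datetime import date


def rollover_due(option, created, today, size_bytes=0, max_bytes=0):
    """Return True if the active volume should roll over under `option`."""
    if option == "size":
        return max_bytes > 0 and size_bytes >= max_bytes
    if option == "month":
        return (today.year, today.month) != (created.year, created.month)
    if option == "quarter":
        # Quarters are numbered 0..3 from the month number.
        created_q = (created.year, (created.month - 1) // 3)
        today_q = (today.year, (today.month - 1) // 3)
        return today_q != created_q
    if option == "year":
        return today.year != created.year
    raise ValueError(f"unknown rollover option: {option}")


print(rollover_due("month", date(2024, 1, 31), date(2024, 2, 1)))
print(rollover_due("quarter", date(2024, 1, 15), date(2024, 3, 31)))
```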

 

Rollover Period Advice: It is not advisable to set a monthly rollover. The reason is that, over a long period of time, hundreds of volumes will be created. When there are a large number of volumes, search speed will slow down considerably and the sheer number of volumes in use will be difficult to manage.

 

For a rollover to be successful, all of the following conditions must be met:

 

  • There must be an active volume
  • The active volume must conform to the store and index path rollover format (as defined in online help)
  • The prospective rolled over paths must be accessible, writeable and have enough disk space.
  • There must be no existing store or index in the prospective rolled over path
  • The active volume path must have a created date (it should do so anyway)

 

When creating a new volume during rollover, the base folder for the store and index paths is taken from the store and index paths of the active volume (the one due to be closed). Thus, when starting out, create a volume with the needed base index and store paths. The next time rollover occurs, MailArchiva should create a volume that uses the same base path.

 

At the time of rollover, if there is already an UNUSED volume present, MailArchiva will rollover to it and deliberately neglect to create a new one. This mini-feature is a convenient way to change the base index and store path of a forthcoming volume. If you do not intend to change the base path, simply ensure that there are no UNUSED volumes defined and MailArchiva will create a new volume in accordance with the volume index and store paths above.

 

As of MailArchiva v6.2.2, if a rollover is not successful, a notification will be sent alerting the Administrator as to why a rollover could not be performed. Furthermore, under such conditions, the active volume will be closed and received traffic will build up in the Receive Queue until a new volume is manually created.

 

Reindex Volume

 

It is necessary to reindex one or more volumes in three situations:


a) Occasionally, when upgrading to a major release (for example, from MailArchiva V2 to MailArchiva V3)

b) When a volume’s search index is corrupted

c) When expected emails are missing from the search results

 

Before reindexing a volume, consider whether it is desirable/necessary/practical to index attachment content. Including attachment content in the index will result in considerably larger indexes. Therefore, searches on such indexes, particularly with high doc counts, will be slower to perform. Furthermore, greater disk space will be consumed by the index. If searching for attachment content is required, ensure that the indexes are located on dedicated fast SAS/SSD local disks. To disable attachment indexing, refer to Configuration->Index.

 

To reindex a volume, click on the Reindex button next to the desired volume. To reindex all Volumes, click on the Reindex button at the top of all volumes. A full Reindex of five million emails can take a day or more to complete. While reindexing is taking place, after logging out of the console, it is still possible to perform searches on indexed data.

 

Note: Index formats have changed from MailArchiva V2 to MailArchiva V3. Thus, when upgrading from MailArchiva V2 to V3, a reindex of all volumes is required. Refer to the Upgrade Instructions for more information.


Preparation

  • Ensure that the server has the correct memory settings applied.  For instance:
    • Heap should be set to around two thirds the size of the installed physical memory. (for example, assuming 2GB physical RAM, set heap to 1384MB)
    • Indexing requires available virtual memory. (free space outside of the heap/permgen)

Monitoring The Reindex Process

 

How to know whether a reindex is working / has been completed successfully? 

  • Verify that the reindex process is running by visiting Status->Processes.
  • Check the doc counts in Configuration->Volumes and Status->Volumes
  • During a reindex, the doc count should visibly increase. If not, there is a problem that must be investigated.
  • Click the View button in Status->Tasks to see the Reindex log. There should be no reported errors.
  • After the reindex process has been completed, the doc count should reflect the final tally. 

 

If a volume cannot be reindexed correctly, refer to Indexing Troubles.

 

Update Volume Index

 

This action is similar to the Reindex Volume described above, except existing volume index data is not deleted at the start of the procedure. It has the advantage that users will still be able to access all volume data during the reindex process. The downside is that if a volume index is corrupted, deletion of the index may be necessary in order to recover from the corruption.

 

Convert Volume

 

The store format changed with the advent of MailArchiva v3. This option converts volumes archived with MailArchiva v2 and lower to the MailArchiva v3 format. The MailArchiva v3 format is advantageous since it stores data in a total of 4096 archive files as opposed to storing each email in its own file. This strategy is superior in that it is easier to work with the data on disk, the store data consumes less space on disk, and third party backup products are able to back up the store data far more quickly. The conversion process is a time-consuming one and can take several days to complete.
 

Note: When upgrading from MailArchiva V2 to V3, it is not necessary to convert older volumes to V3 format as MailArchiva v3 is capable of reading and writing to older volumes.

 

Note: Conversion of old volumes to the new MailArchiva V3 format is about 10 times slower than reindexing them. Therefore, when upgrading, it is recommended to simply reindex old volumes as opposed to converting them. Refer to the Upgrade Instructions for more information.

 

Note: It is not necessary to reindex volumes after they have been converted since the conversion process also involves regenerating the index.

 

To proceed with the conversion of a volume to the new V3 format, click the Convert Volume button in Configuration->Volumes. MailArchiva will immediately create a new volume with "v3" appended to the store and index path. For example, assuming the original volume had a store path of C:\volume\store1 and index path of C:\volume\index1, the new V3 volume will have a store path of c:\volume\store1v3 and an index path of C:\volume\index1v3. 

 

When the conversion process is completed, the old V1 volume will be removed from MailArchiva's configuration, although the old volume data will still be present on disk. For safety's sake, the conversion process does not delete any source volume store or index data.

 

Conversion Rollback: If there is a problem with the conversion process, it is possible to roll back to the old volume by deleting the newly converted volume in Configuration->Volumes and importing the old volume using the Import Volume feature. To import the old volume, click Import Volume and enter the original volume's store and index path.


Backup Volume

 

This function initiates the backup of the entire contents of a volume to a remote location (such as an SMBFS mount, or remote object store). Before initiating the manual backup of a volume, it is necessary to configure and enable backups in Configuration->Backup. 

 

Merge Volume

 

The presence of too many volumes (>200) can impact search performance. The reason is that MailArchiva needs to search across many volume indexes and then combine the result set. As such, the need may arise to merge volume data. To initiate a merge of volume data, in Configuration->Volumes, select all the volumes that need to be merged. Click the Merge button at the top left, choose the target volume where the combined volume data should reside, and proceed. As with most other operations in MailArchiva, the results of the Merge operation will appear in the Tasks view.

 

Import Data

 

The Import Data button allows one to import data from outside sources. Using this function, it is possible to import email data from PST, EML, MSG and OST files, directly from MS Exchange, etc. Refer to Email Migration for specific instructions on how to import data.


Export Data
 

The Export Data button provides options for exporting data to EML (RFC 2822) format. The purpose of this option is to facilitate the bulk export of data to external systems.

 

Import Volume

 

The Import Volume feature is used when old volume data is being restored from backup or when migrating data from another MailArchiva server. In this case, click the Import Volume button and enter the store and index path of the volume to be imported. If the index data was not saved, enter the location of an empty directory and reindex after clicking Save. Refer to Import Volume for instructions on how to import a volume created in another MailArchiva instance.

 

Duplicate Volumes: The system will not allow volumes to be imported whose index or store path matches those of an existing volume configured in the system. In addition, each volume must have a unique volume ID. The volume ID is specified in the volumeinfo file present in the store path, or index path (in the case of object stores).
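
The duplicate rules can be expressed as a simple pre-import check. The Python sketch below is illustrative only: the function name and the dictionary keys are hypothetical, and paths are compared as plain strings, whereas a real check would probably need to normalise them first.

```python
def import_conflicts(candidate: dict, existing: list) -> list:
    """Return the reasons why `candidate` would be rejected (empty if none).

    Each volume is a dict with 'id', 'store' and 'index' keys.
    """
    reasons = []
    for vol in existing:
        used_paths = {vol["store"], vol["index"]}
        if candidate["store"] in used_paths or candidate["index"] in used_paths:
            reasons.append(f"path already used by volume {vol['id']}")
        if candidate["id"] == vol["id"]:
            reasons.append(f"volume ID {vol['id']} already exists")
    return reasons


existing = [{"id": "vol-a", "store": "/data/store1", "index": "/data/index1"}]
candidate = {"id": "vol-b", "store": "/data/store2", "index": "/data/index2"}
print(import_conflicts(candidate, existing))
```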

 

Reencrypt Volume

 

MailArchiva may not be able to read an imported volume because it was encrypted using a different encryption key to the one that is defined in the system. To recover from the situation, it may be necessary to reencrypt the data associated with the old volume. Refer to Reencrypt Volume for more information on this operation.

 

Mount / Unmount Volume

 

A mounted volume is considered available for use by the system. Conversely, an unmounted volume is considered inaccessible by the system and cannot be used for archiving/search purposes. 

 

There are a variety of situations wherein it is useful to unmount volumes:
 

  • Moving Volumes - When a volume is unmounted, its store and index path may be modified. Refer to Move Server for further instructions on how to move volumes.
  • Compaction - When a volume is unmounted, the option to compact (further compress) the volume becomes available. Refer to Compact Volume further below.
  • Reconfiguring Network Shares - When temporarily disconnecting a network share, it is advisable to unmount the volume first.
  • Removable Media - It is recommended to unmount a volume when its data resides on removable media and that media is due to be temporarily removed from the system.

 

Compact Volume

 

The Compact Volume feature is used for further compressing volume data. In many instances, further savings of up to 25% in disk space can be obtained by simply compacting old volume data. Before compacting a volume, ensure there is at least 4 GB of space available on the disk pointed to by the volume's store path. The extra space is needed since the compaction process will rewrite all archive files residing in the volume's store path.
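
The free-space requirement can be verified up front with Python's standard library. In the sketch below the function name is hypothetical and 4 GB is interpreted as 4 x 1024^3 bytes, which is an assumption; the current directory is used only so that the example runs anywhere.

```python
import shutil

REQUIRED_FREE_BYTES = 4 * 1024 ** 3  # 4 GB, interpreted here as 4 GiB


def enough_space_to_compact(store_path: str, required: int = REQUIRED_FREE_BYTES) -> bool:
    """True if the disk holding store_path has at least `required` bytes free."""
    return shutil.disk_usage(store_path).free >= required


# Usage: replace "." with the volume's store path.
print(enough_space_to_compact("."))
```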

 

To compact a mounted volume with Active, Closed or Backup status, click the Compact button next to the target volume in Configuration->Volumes to begin the compaction process. It is important to note that since compaction involves rewriting data, it is advisable not to restart or shut down the server until the compaction process has completed.

 

Rebuild Statistics

 

The Analytics section offers insights into the nature of traffic flowing through MailArchiva. It relies upon the Druid columnar database for storing and querying statistical information. If the Druid columnar database gets corrupted, it may be necessary to initiate a rebuild of the statistical data by clicking on the Rebuild Statistics button in Configuration->Volumes. The process steps through all data, and regenerates the statistical information needed for presentation of analytical information.

 

Build Threads

 

In the search interface, it is possible to view emails in the context of their hierarchical conversation threads. MailArchiva uses sophisticated algorithms to determine how and which emails are related to one another. Throughout the running of the system, it scans and links newly archived email data into conversation threads. A graph database is used to store the threading information. Should this graph database become corrupted, click on the Build Threads button to initiate a rebuild of threading data. 

© 2005 - 2024 ProProfs
