23. Email Threading

Linear review of emails is often time-consuming and expensive. One reason is that emails frequently quote the text of earlier emails in the thread, which produces a lot of redundant text. Take, for example, these three emails:

Email Threading Example

Marked in red is the redundant text. The text of the first two emails is quoted in full in the last email. When a reviewer reads the last email, he or she has read everything there is to read in this thread. The reality is often more complex, e.g. because people respond to the same root email, remove part of the quoted text, forward it to new recipients, or even alter the quoted text to cover up certain facts. Therefore, it is not always as simple as reading the last email in the thread.

Intella helps with this type of review through the process of email threading. First, it identifies the emails that belong to the same thread. Within each thread, it links the replies and forwards to their parent emails, constructing a graph of how the conversation unfolded. All duplicates of an email are represented by the same node in this graph. Next, it compares the emails within the thread and determines the set of “inclusive” and “non-inclusive” emails. By default, an email is marked as inclusive. When Intella detects that one of the follow-ups of an email (a reply or a forward) contains all of that email's text and attachments, the earlier email is marked as non-inclusive, because reading the follow-up implies having read the earlier email as well. Reading all the inclusive emails and their attachments in a thread therefore implies having read everything there is to read in the thread. This can greatly reduce the time needed to review a large collection of emails.
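
The inclusive/non-inclusive rule described above can be illustrated with a minimal Python sketch. This is not Intella's implementation: the Email class, its fields, and the mark_inclusive function are hypothetical names, the comparison uses exact matches on normalized paragraphs, and only direct follow-ups are compared with their parent.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Email:
    # Hypothetical model of one node in a thread, for illustration only.
    id: str
    parent_id: Optional[str]                # email this one replies to or forwards
    paragraphs: FrozenSet[str]              # normalized body paragraphs, quoted text included
    attachments: FrozenSet[str] = frozenset()  # e.g. content hashes of the attachments


def mark_inclusive(thread: List[Email]) -> Dict[str, bool]:
    """Map each email id to True (inclusive) or False (non-inclusive)."""
    inclusive = {email.id: True for email in thread}  # inclusive by default
    by_id = {email.id: email for email in thread}
    for follow_up in thread:
        parent = by_id.get(follow_up.parent_id)
        if parent is None:
            continue  # root email, or the parent is missing from the data
        # The parent becomes non-inclusive when a follow-up contains
        # all of its text and all of its attachments.
        if (parent.paragraphs <= follow_up.paragraphs
                and parent.attachments <= follow_up.attachments):
            inclusive[parent.id] = False
    return inclusive
```

For a thread in which every reply quotes its parent in full, only the last email stays inclusive. If a reply drops part of the quoted text or an attachment, the parent stays inclusive as well.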

Besides separating inclusive from non-inclusive emails, email threading enables several other functionalities:

  • Sort the emails in a thread in the Details view, to read the entire email thread sequentially.
  • Group the emails in the Details view by thread.
  • Visualize a specific email thread in the Email Thread tab of the Previewer. This shows how the previewed email relates to the other emails in the thread, e.g. which email it replied to, which replies it triggered, whether the thread has different branches, how its content was forwarded, etc.
  • Tag all emails in a thread at once.
  • Identify missing emails in a thread. These are emails that are referred to in the email headers or in the metadata embedded in an email body, but that cannot be found in the current evidence data. This may indicate missing evidence data that an investigator may still be able to acquire, e.g. from other custodians or from a backup. If additional evidence becomes available later, it can be added to the case. The email threading processing will then attempt to use the new emails to resolve the missing emails.
  • List the normalized subjects of the email threads in the Email Thread facet.
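
To illustrate how missing emails can be detected from email headers, here is a minimal Python sketch. It assumes the standard Message-ID, In-Reply-To and References headers and uses Python's standard email module. The function name missing_referents is hypothetical, and the sketch does not cover metadata embedded in the email body or any container-specific metadata that Intella may use.

```python
from email import message_from_string
from typing import Dict, Iterable, Set


def missing_referents(raw_messages: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each referenced but absent Message-ID to the ids of the emails citing it."""
    known: Set[str] = set()
    referenced: Dict[str, Set[str]] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        if message_id:
            known.add(message_id)
        # Both headers hold whitespace-separated message ids.
        refs = (msg.get("In-Reply-To") or "").split()
        refs += (msg.get("References") or "").split()
        for ref in refs:
            referenced.setdefault(ref, set()).add(message_id)
    # An id that is referenced but never seen as a Message-ID is a missing email.
    return {ref: ids for ref, ids in referenced.items() if ref not in known}
```

Running the function again after more evidence has been added yields a smaller result once the newly added emails supply the previously missing ids.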

Each email item that was processed by the Email Threading analysis is assigned the following properties:

  • Threaded - Indicates whether the item has been subjected to email thread analysis.
  • Inclusive - Indicates whether the email is inclusive.
  • Non-Inclusive - Indicates whether the email is non-inclusive.
  • Missing Email Referent - Indicates that the threading process has detected that the email item is a reply to another email or a forwarded email, but the email that was replied to or forwarded is not available in the case.
  • Email Thread ID - The unique identifier of the thread that the email has been placed in.
  • Email Thread Name - The normalized subject of the thread that the email has been placed in.
  • Email Thread Node Count - The number of nodes in the thread that the email has been placed in.
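
The exact rules Intella uses to normalize subjects are not described here. As an illustration of the idea only, the following Python sketch assumes that normalization means stripping leading reply and forward prefixes (Re:, Fw:, Fwd:) and collapsing whitespace. The function name normalize_subject is hypothetical.

```python
import re

# Assumption: only these English prefixes are stripped; real mail clients
# also use localized variants such as "AW:" or "TR:".
_PREFIXES = re.compile(r"^\s*(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading Re:/Fw:/Fwd: prefixes and collapse runs of whitespace."""
    without_prefixes = _PREFIXES.sub("", subject)
    return " ".join(without_prefixes.split())
```

With this sketch, "RE: Fwd: re:  Quarterly   report" and "Quarterly report" normalize to the same thread name, while a subject such as "Report: final" is left unchanged.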

Furthermore, the algorithm establishes for each follow-up email whether it is a Reply, Reply All, or Forward. This status is derived from the sender and receiver information, rather than from e.g. the Subject line. A loose but conceptually practical definition is:

  • If the set of participants of the response email is the same as that of the email it is responding to (the previous email in the thread), it is a Reply All, unless this is a conversation between only two people, in which case it is a Reply.
  • If the response email is going to one or more people, and none of them was involved in the original email, it is a Forward.
  • In all other cases, it is a Reply.
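
The loose definition above can be written down as a short Python sketch. The function name and parameters are hypothetical, addresses are compared as plain strings, and the sketch follows only the three rules listed here, not any additional heuristics Intella may apply.

```python
from typing import Iterable


def classify_follow_up(parent_sender: str, parent_recipients: Iterable[str],
                       sender: str, recipients: Iterable[str]) -> str:
    """Classify a follow-up email as 'Reply', 'Reply All' or 'Forward'."""
    recipients = set(recipients)
    parent_participants = {parent_sender, *parent_recipients}
    participants = {sender, *recipients}
    if participants == parent_participants:
        # Same participants: Reply All, unless only two people are talking.
        return "Reply" if len(participants) <= 2 else "Reply All"
    if recipients and not (recipients & parent_participants):
        # Sent only to people who were not involved in the original email.
        return "Forward"
    return "Reply"
```

For example, if Alice writes to Bob and Carol, then Bob answering Alice and Carol is a Reply All, Bob answering only Alice is a Reply, and Bob sending the email on to Dave is a Forward.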

Note

Performing email threading analysis is governed by the ‘Can perform email threading’ permission. Users who have not been granted this permission will not see the Email Threading action in the contextual menu.

As email threading is a computationally expensive algorithm, it runs as a post-processing step that must be triggered explicitly. To start the Email Threading procedure, select one or more items in the Details view and choose “Email Threading…” in the right-click menu. This will open the dialog shown below:

Email Threading Dialog

Select Discard existing email threading data if you want to clear the Email Thread facet and all the data generated as part of previous runs of the Email Threading procedure.

Select Analyze headers embedded in email body if you want the algorithm to take the headers embedded in the email body into account. Such headers are typically placed above the quoted text, referencing the original author and time of the quoted text and sometimes other metadata. This can be used to link emails together when the SMTP or mail container-specific metadata is missing or incomplete. This option may produce better results but is computationally expensive. When speed is not of the essence, we recommend turning this feature on.
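
To give an idea of what headers embedded in an email body look like to a program, here is a minimal Python sketch that extracts them. It assumes the common English layout in which consecutive lines start with From:, Sent: or Date:, To:, Cc: and Subject:. Real mail clients use many other layouts and languages, and the function name embedded_header_blocks is hypothetical, so this shows the principle and not Intella's parser.

```python
import re
from typing import Dict, List

# Assumption: embedded headers appear as consecutive "Name: value" lines.
HEADER_LINE = re.compile(r"^(From|Sent|Date|To|Cc|Subject):\s*(.*)$", re.IGNORECASE)


def embedded_header_blocks(body: str) -> List[Dict[str, str]]:
    """Return one dict per block of header-like lines found in the body."""
    blocks: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in body.splitlines() + [""]:  # the trailing "" flushes the last block
        text = line.lstrip("> \t").rstrip()  # ignore quote markers such as "> "
        match = HEADER_LINE.match(text)
        if match:
            current.setdefault(match.group(1).lower(), match.group(2))
        else:
            # Keep a block only if it names an author and has at least one more field.
            if "from" in current and len(current) >= 2:
                blocks.append(current)
            current = {}
    return blocks
```

Each returned dict gives the author, time and subject of a quoted email, which is the kind of information that can link emails together when the transport metadata is missing or incomplete.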

Click the Run button to start the email threading process.

Once the process is done, the Email Thread facet will be populated and the email items that were part of the threading analysis will be augmented with the threading-related information.

Besides processing the selected items, Intella will automatically process all duplicate items and parent items as well.

Note

The “Analyze paragraphs” indexing option is a prerequisite for determining the inclusiveness of emails. If this option was not used during indexing, all emails will be marked as Inclusive.