7. Working with facets

Besides keyword searching, the indexed items can be browsed using facets, which represent specific item properties.

Every facet organizes the items into groups (possibly hierarchical) depending on a specific item property.

Clicking on a facet in the facets panel will open a list of all values of the selected facet right below the facet name. In the example below, the Type facet has a list of file types as values.

Facet panel

To search for items that match a facet value, select the value and click the Search button beneath the values.

Note

It is possible to select more than one facet value at a time by holding down the Ctrl key when clicking on the facet values.

7.1. Available facets

7.1.1. Saved Searches

The Saved Searches is a list of previous sets of searches that the user has stored.

When search results are displayed in the Cluster Map and the Searches list, the Save button beneath the Searches list will be shown.

Saved searches

When the user clicks this button, a dialog opens that lets the user enter a name for the saved search. After clicking on the OK button, the chosen name will appear in the list in the Saved Searches facet.

Note

Predefined Saved Search called ‘Possible spam’ is added to every newly created case. It can be found under “Default searches” branch.

Click on the name of the saved search and then on the Restore button to bring the Cluster Map and the Searches list back into the state it had when the Save option was used.

The “Replace current results” checkbox controls what happens with the currently displayed searches when you restore a saved search. When turned on, the Cluster Map and Searches list will be emptied first. When selected, the contents of the saved search will be appended to them.

When the ‘Combine queries’ checkbox is selected, searches contained in the selected saved search will be combined to search for items matching any of the contained searches (Boolean OR operator). The items will be returned as a single set of results (one cluster).

Note

Backwards compatibility

As underlying model of how Intella Connect stores the data was changed in version 1.8.x so some of the Saved searches created with previous versions might not be compatible with 1.8.x anymore - in that case they will be marked as obsolete.

7.1.2. Features

The Features facet allows you to identify items that fall in certain special purpose categories:

  • Encrypted: all items that are encrypted. Example: password-protected PDF documents. When you select this category and click the Search button, you will be shown all items that are encrypted.

    Note

    Sometimes files inside an encrypted ZIP file are visible without entering a password, but a password still needs to be entered to extract the file. Such files cannot be exported by Intella Connect if the password has not been provided prior to indexing. In this case both the ZIP file and its encrypted entries will be marked as Encrypted, so searching for all encrypted items and exporting those will capture the parent ZIP file as well.

  • Decrypted: all items in the Encrypted category that Intella Connect was able to decrypt using the specified access credentials.

  • Unread: all emails, SMS/MMS, chat messages and conversations that are marked as “unread” in the source file. Note that this status is not related to previewing in Intella Connect.

    Note

    This property is only available for PST and OST emails and some cellphone dumps. If the Unread property is not set, it could mean that either the item was not read or that the property is not available for this item. Some tools allow the user to reset a message’s unread status, so even when the flag is set, it cannot be said with certainty that the message has not been read.

  • Empty documents: all items that have no text while text was expected. Example: a PDF file containing only images.

  • Has Duplicates: all items that have a copy in the case, i.e. an item with the same MD5 or message hash.

  • Has Geolocation: indicates whether a geolocation has been associated with the item, either as part of the original metadata or through an IP geolocation lookup.

  • OCRed: indicates whether the item has been OCRed after indexing.

  • Has Imported Text: all items that have text imported using importText option in Intella Command-line interface.

  • Content Analyzed: all items for which the Content Analysis procedure has been applied.

  • Exception items: all items that experienced processing errors during indexing. This has six subcategories:

    • Unprocessable items: the data cannot be processed because it is corrupt, malformed or not understood by the processor. Retrying will most likely result in the same result.
    • I/O errors: the processing failed due to I/O errors. The processing might succeed in a repeated processing attempt.
    • Decryption failures: the data cannot be processed because it is encrypted and a matching decryption key is not available. The processing might succeed in a repeated processing attempt when the required decryption key is supplied.
    • Timeout errors: the processing took too long and was aborted.
    • Truncated text: the extracted text was not processed entirely because it exceeded the limit.
    • Out of memory errors: the processing failed due to a lack of memory.
    • Processing errors: the processing failed due to a problem/bug in the processor. The description should contain the stack trace.
    • Decryption failed: The data cannot be processed because it is encrypted and a matching decryption key is not available. The processing might succeed in a repeated processing attempt when the required decryption key is supplied.
    • I/O errors: The processing failed due to I/O errors. The processing might succeed in a repeated processing attempt.
    • Out of memory: The processing failed due to a lack of memory.
    • Processing error: The processing failed due to a problem/bug in the processor. The description should contain the stack trace.
    • Timeout: The data processing took too long and was aborted.
    • Unprocessable data: The data cannot be processed because it is corrupt, malformed or not understood by the processor. Retrying will most likely result in the same result.
  • Extraction Unsupported: all items that are larger than zero bytes, whose type could be identified by Intella Connect, are not encrypted, but for which Intella Connect does not support content extraction. An example would be AutoCAD files: we detect this image type but do not support extraction any content out of it.

  • Text Fragments Extracted: indicates whether heuristic string extraction has been applied on a (typically unrecognized or unsupported) binary item.

  • Irrelevant: all items that fall into one of the categories below and that themselves are considered to be of little relevance to a review (as opposed to their child items):

    • Folders
    • Email containers (PST, NSF, Mbox, …)
    • Disk images (E01, L01, DD, …)
    • Cellphone reports (UFDR, XRY XML, …)
    • Archives (ZIP, RAR, …)
    • Executables (EXE, BAT, …)
    • Load files (DII, DAT, …)
    • Empty (zero byte) file
    • Embedded images - defined below
  • Threaded: all items that have been subjected to email threading processing and that were subsequently assigned to a thread (see the Email Thread facet). Subtypes:

    • Inclusive: all email items marked as inclusive.
    • Non-Inclusive: all email items marked as non-inclusive.
    • Missing Email Referent: Indicates that the threading process has detected that the email item is a reply to another email or a forwarded email, but the email that was replied to or that has been forwarded is not available in the case.
  • Recovered: all items that were deleted from a PST, NSF, EDB, disk image, cellphone report or cloud source and that Intella Connect could still (partially) recover. The items recovered from PST, NSF and EDB are the items that appear in the artificial “<RECOVERED>” and “<ORPHAN ITEMS>” folders of these files in the Location facet. This branch has the following sub-branches, based on the recovery type and the container type:

    • Recovered from PST.
    • Orphan from EDB.
    • Orphan from NSF.
    • Orphan from PST.
    • Recovered from cellphone.
    • Recovered file metadata from disk image.
    • Recovered entire file content from disk image.
    • Recovered partial file content from disk image.
  • Attached: all items that are attached to an email. Only the direct attachments are reported; any items nested in these attachments are not classified as Attachment.

  • Has attachments: all emails, documents and user activities that have other items attached to it. Note that it does NOT include embedded images.

  • Embedded Images: all items that have been extracted from a document, spreadsheet or presentation.

  • Tagged: all items that are tagged.

  • Flagged: all items that are flagged.

  • Batched: all items that are assigned to at least one batch

  • Commented: all items that have a comment made by a reviewer.

  • Previewed: all items that have been opened in Intella’s Previewer.

  • Opened: all items that have been opened in their native application.

  • Exported: all items that have been exported.

  • Redaction: all items that have been subject to one of the redaction procedures. See the section on Redaction for more information.

    • Redacted: all items that have one or more parts blacked out due to redactions. Items on which the Redact function has been used but in which no parts have actually been marked as redacted are not included in this category.
    • Queued for Redaction: all items that have their Queued for Redaction checkbox selected. These will turn to Redacted once the user performs the Process Redaction Queue function on them.
    • Missing keyword hits: all items that had a redaction issue when Process Redaction Queue was invoked.
  • Top-Level Parent: all items that are the top-level parent. Top-level parents are determined per the Show Parents settings, configurable with desktop versions of Intella.

  • All items: all items (non-deduplicated) in the entire case.

Note

In cases in which multiple reviewers have been active, i.e. shared cases or cases with imported Work Reports, the Previewed, Opened, Exported, Commented, Tagged, Flagged and Redacted nodes shown in the Facet panel will have sub-nodes, one node for each user.

7.1.3. Tags

Tags are labels defined by the user to group individual items. Typically used tags in an example are for example “relevant”, “not relevant” and “legally privileged”. Tags are added to items by right-clicking in the Details panel and choosing the Add Tags… option. Tags can also be added in the Previewer or by applying a Coding decision to an item via Coding Form.

To search for all items with a certain tag, select the tag from the Tags list and click the Search button below the list.

The Tags facet panel will have a drop-down list at the bottom, listing the names of all reviewers that have been active in this case. You can use this list to filter the tags list for taggings made by a selected reviewer only. Select the “All users” option to show taggings from all users.

If the same tag has been used by different reviewers, their names and the numbers of tagged items are displayed in the tag statistics line (in the parentheses after the tag name).

The tags can be organized into a hierarchical system by the creation of sub-tags within an existing (parent) tag group. You can create a sub-tag, from “Add tags…” dialog by specifying a parent tag in a drop down list.

To rename a tag or change the tag description, select the tag in the facet and choose “Edit…” in the context menu.

Important: When a tag is renamed, all items associated with this tag will be assigned the new tag name automatically. However, some operations that depend on specific tag names (such as indexing tasks with the Tag condition) may need to be corrected manually.

To delete a tag, select it in the facet and choose “Delete…” in the context menu. This might require a special permission being assigned to your user.

7.1.4. Custodians

Custodians are assigned to items to indicate the owner from whom an evidence item was obtained. The “Custodians” facet lists all custodian names in the current case and allows searching for all items with a certain attribute value. Custodian name attributes are assigned to items either automatically (as part of post-processing) or manually in the Details panel. To assign a custodian to items selected in the Details panel, use the “Set Custodian…” option in the right-click menu.

To remove custodian information from selected items, choose the “Clear Custodian…” option.

To delete a custodian name from the case and clear the custodian attribute in all associated items, select the value in the facet panel and choose “Delete” in the right-click menu.

7.1.5. Location

This facet represents the folder structure inside your sources. Select a folder and click Search to find all items in that folder.

Location facet

When “Search subfolders” is selected, the selected folder, all items in that folder, and all items nested in subfolders will be returned, i.e. all items in that entire sub-tree.

When “Search subfolders” is not selected, only the items nested in that folder will be returned. Items nested in subfolders will not be returned, nor will the selected folder itself be returned.

When your case consists of a single indexed folder, then the Location tree will show a single root representing this folder. Selecting this root node and clicking Search with “Search subfolders” switched on will therefore return all items in your case.

When your case consists of multiple mail files that have been added separately, e.g. by using the PST and NSF source types in the New Source wizard, then each of these files will be represented by a separate top-level node in the Location tree.

By default Location facet will expand all root sources so that their children are immediately visible. This behavior can be changed using Preferences > Facets.

7.1.6. Email Address

This facet represents the names of persons involved in sending and receiving emails. The names are grouped in ten categories:

  • From
  • Sender
  • To
  • Cc
  • Bcc
  • Addresses in Text
  • All Senders (From, Sender)
  • All Receivers (To, Cc, Bcc)
  • All Senders and Receivers
  • All Addresses

Most emails typically only have a From header, not a Sender. The Sender header is often used in the context of mailing lists. When a list server forwards a mail sent to a mailing list to all subscribers of that mailing list, the message send out to the subscribers usually has a From header representing the conceptual sender (the author of the message) and a Sender header representing the list server sending the message to the subscriber on behalf of the author.

7.1.7. Phone Number

This facet lists phone numbers observed in phone calls from cellphone reports as well as phone numbers listed in PST contacts and vCard files.

The “incoming” and “outgoing” branches are specific to phone calls. The “All Phone Numbers” branch combines all of the above contexts.

7.1.8. Chat Account

This facet lists chat accounts used to send or receive chat messages, such as Skype and WhatsApp account IDs. Phone numbers used for SMS and MMS messages are also included in this facet.

This facet also supports the filtering options described in the Email Address section.

7.1.9. Recipient Count

This facet lets the user query for the number of recipients of communications. The primary use case for this is filtering out all emails, chat messages etc. that are between two parties and no one else.

7.1.10. Date

This facet lets the user search on date ranges by entering a From and To date. Please note that the date entered in the To field is considered part of the date range.

Besides start and end dates, Intella Connect lets the user control which date attribute(s) are used:

  • Sent (e.g. all e-mail items)
  • Received (e.g. all e-mail items)
  • File Last Modified (e.g. file items)
  • File Last Accessed (e.g. file items)
  • File Created (e.g. file items)
  • Content Created (e.g. file items and e-mail items from PST files)
  • Content Last Modified (e.g. file items and e-mail items from PST files)
  • Primary Date
  • Family Date
  • Last Printed (e.g. documents)
  • Called (e.g. phone calls)
  • Start Date (e.g. meetings)
  • End Date (e.g. meetings)
  • Due Date (e.g. tasks)

All fields can be de/selected with “Check / uncheck all” checkbox.

The Date facet will only show the types of dates that actually occur in the evidence data of the current case.

Furthermore it is possible to narrow the search to only specific days or specific hours. This makes it possible to e.g. search for items sent outside of regular office hours.

Primary and Family dates

While processing the dates of all items, Intella Connect will try to pick a matching date rule based on the item’s type and use it to determine the Primary Date attribute for that item. The rules affecting this process are configurable with desktop versions of Intella and currently cannot be changed in Intella Connect. Reindexing the case or modifying rules used to compute Primary Dates may also affect values of Family Date attribute for items, as those two attributes are tightly related. To learn more about those attributes please refer to this section of the manual.

7.1.11. Type

This facet represents the file types (Microsoft Word, PDF, JPEG, etc.), organized into categories (Communication, Documents, Media etc.) and in some cases further into subcategories. To refine your query with a specific file type, select a type from the list and click the Search button.

Note that you can search for both specific document types like PNG Images, but also for the entire Image category.

Empty (zero byte) files are classified as “Empty files” in the “Others branch”.

7.1.12. Author

This facet represents the name(s) of the person(s) involved in the creation of documents. The names are grouped into two categories:

  • Creator
  • Contributor

To refine your query by a specific creator or contributor name, select the name and click the Search button.

7.1.13. Content Analysis

The Content Analysis facet allows you to search items based on specific types of entities that have been found in the textual content of these items. Three of the categories in this facet are populated automatically during indexing and are available immediately afterwards. These are:

  • Credit card numbers
  • Social security numbers (SSNs)
  • Phone numbers

The other categories are more computationally expensive to calculate and therefore require an explicitly triggered post-processing step. These categories are:

  • Person names
  • Organizations (e.g. company names)
  • Locations (e.g. city and country names)
  • Monetary amounts
  • Time (words and phrases related to the hours, minutes, weekdays, dates, etc.)
  • Skin tone (sub-categorized as Weak, Medium and Strong based on the presence of human skin colors, applies only to images)
  • Custom regular expressions (for searching e.g. bank account numbers, patent numbers and other types of codes that can be formally described as a regular expression)

To learn more about how to conduct Content Analysis please refer to section Details panel > Content analysis.

7.1.14. Email Thread

In the Email Thread facet you can search for emails based on the email thread identified by the email threading procedure. To populate this facet, a user needs to perform the email threading procedure on a selected set of items. Please see the Email Threading section for instructions.

Be default, all threads containing only a single email are hidden from view, as they can greatly increase the length of the list and are typically of little use. To include these threads in the list, uncheck the “Hide threads with one email” checkbox.

7.1.15. Keyword Lists

In the Keyword Lists facet you can load a keyword list, to automate the searching with sets of previously determined search terms.

A keyword list is a text file in UTF-8 encoding that contains one search term per line. Note that a search term can also be a combination of search terms, like “Paris AND Lyon”.

Once loaded, all the search terms (or queries) found in the keyword list are shown in the Keyword Lists facet. They are now available for search.

When the ‘Combine queries’ checkbox is selected, multiple keywords selected from a specific keyword list will be combined to search for items matching any of the selected terms (Boolean OR operator). The items will be returned as a single set of results (one cluster). If the checkbox is not selected, the selected terms will be searched separately, resulting in as many result sets as there are selected queries in the list.

Tip

Keyword lists can be used to share search terms between investigators.

7.1.16. MD5 and Message Hash

Intella can calculate MD5 and message hashes to check the uniqueness of files and messages. If two files have the same MD5 hash, Intella considers them to be duplicates. Similarly, two emails or SMS messages with the same message hash are considered to be duplicates. With the MD5 and Message Hash facet you can:

  1. Find items with a specific MD5 or message hash and
  2. Find items that match with a list of MD5 and message hashes.

Specific MD5 or message hash

You can use Intella Connect to search for files that have a specific MD5 or message hash. To do so, enter the hash (32 hexadecimal digits) in the field and click the Search button.

List of MD5 or message hashes

The hash list feature allows you to search the entire case for MD5 and message hash values from an imported list. Create a text file (.txt) with one hash value per line. Use the Add… button in the MD5 Hash facet to add the list. Select the imported text file in the panel and click the Search button below the panel. The items that match with the MD5 or message hashes in the imported list will be returned as a single set of results (one cluster).

Tip

Install a free tool such as MD5 Calculator by BullZip to calculate the MD5 hash of a file. You can then search for this calculated hash in Intella Connect to determine if duplicate files have been indexed.

7.1.17. Item ID Lists

In the Item ID Lists facet you can load a list of item IDs, to automate the searching with sets of previously determined item IDs.

An item ID list is a text file in UTF-8 encoding that contains one item ID per line.

Once loaded into the case, you can select the list name and click Search. The result will be a single result set consisting of the items with the specified IDs. Invalid item IDs will be ignored.

7.1.18. Language

This facet shows a list of languages that are automatically detected in your item texts.

To refine your query with a specific language, select the language from the list and click the Search button.

Important

If Intella cannot determine the language of an item, e.g. because the text is too short or mixes multiple languages, then the item will be classified as “Unidentified”.

Important

When language detection is not applicable to the item’s file type, e.g. images, then the item is classified as “Not Applicable”.

7.1.19. Size

This facet groups items based on their size in bytes.

To refine your query with a specific size range, select a value from the list and click the Search button.

7.1.20. Duration

This facet reflects the duration of phone calls listed in a cellphone report, grouped into meaningful categories.

7.1.21. Device Identifier

This facet groups items from cellphones by the IMEI and IMSI identifiers associated with these items. Please consult the documentation of the forensic cellphone toolkit provider for more information on what these numbers mean.

7.1.22. Export Sets

All export sets that have been defined during exporting are listed in this facet. Searching for the set returns all items that have been exported as part of that export set.

7.2. Including and excluding facet values

Facet values can be included and excluded. This enables filtering items on facet values without these values appearing as individual result sets in the Cluster Map visualization.

To include or exclude items based on a facet value, select the value and click on the arrow in the facet’s Search button. This will reveal a drop-down menu with the Include and Exclude options.

Search include exclude

7.2.1. Including a facet value

Including a facet value means that only those search results will be shown that also match with the chosen included facet value.

Example

The user selects the facet value “PDF Document” and includes this facet value with the drop-down menu of the Search button in the facet panel. The Searches panel in the Cluster Map shows that “PDF Document” is now an included term. This means that from now on all result sets and clusters will only hold PDF Documents. Empty clusters will be filtered out.

See the image on the right side for an example: the “Enron” search term resulted in 1,606,638 items, but after applying the PDF Documents category with its 22,167 items as an inclusion filter, only 6,325 items remain.

When multiple includes are used, the results are filtered for all items that are in at least one of the include sets, i.e. it is like filtering with the union of all includes.

7.2.2. Excluding a facet value

Excluding a facet value means that only those search results will be shown that do not match with the chosen excluded facet value.

Example

The user selects the facet value “PDF Document” and excludes this facet value with the drop-down menu of the Search button in the facet panel. The searches panel in the Cluster Map shows that “PDF Document” is excluded. As long as this exclusion remains, all result sets and clusters will not hold any PDF Documents. Empty clusters will be filtered out.

Note

Excludes are often used to filter out privileged items before exporting a set of items, e.g. by tagging items that match the privilege criteria with a tag called “privileged”.

In this scenario it is important to realize that when exporting an email to e.g. Original Format or PST format, it is exported with all its attachments embedded in it. The same applies to a Word document: it is exported intact, i.e. with all embedded items. Therefore, when an attachment is tagged as “privileged” and “privileged” is excluded from all results, but the email holding the attachment is in the set of items to export, the privileged attachment will still end up in the exported items.

The solution is to also tag both the parent email and its attachment as “privileged”. The tagging preferences can be configured so that all parent items and the items nested in them automatically inherit a tag when a tag is applied to a set of items. When filtering privileged information with the intent to export the remaining information, we recommend that you verify the results by indexing the exported results as a separate case and checking that there are no items matching your criteria for privileged items.