dtSearch – Features
Instantly Search Terabytes of Text
Document Filters and Supported Data
Document filters overview
dtSearch products embed dtSearch’s proprietary document filters to support a broad range of data types.
- For all supported data types, support covers parsing, indexing and searching of retrieved full-text and metadata.
- The document filters can also convert non-web-ready content like Microsoft Office document and email formats “on the fly” to HTML for web-based display, etc., with highlighted hits.
Supported data types
dtSearch’s proprietary document filters support parsing, indexing, searching and display with highlighted hits of text and metadata across a broad range of data types.
- Web-ready content: supports integrated images and text in HTML, XML/XSL, PDF, ASP.NET, CMS, PHP, WordPress, SharePoint, etc.
- Other databases and data sources: supports XML, Access, XBASE, CSV, etc.; dtSearch Engine APIs support NoSQL and SQL-type databases, along with the full-text of BLOB data; dtSearch Engine APIs also support disk images, network data streams and other non-file data.
- MS Office formats: supports integrated browser-ready images and text in Word (RTF/DOC/DOCX), PowerPoint (PPT/PPTX), Excel (XLS/XLSX), Access (MDB/ACCDB) and OneNote (ONE); support includes documents saved from Office 365.
- Other “Office” formats, PDF and other printer formats, compression formats: supports other “Office” suite formats; EMF Spool (SPL) files; compression formats like RAR, ZIP, GZIP and TAR; PDF, PDF Portfolio, and many encrypted PDFs. New: PDF 2.0 support.
- Emails and attachments: supports integrated browser-ready images, text and attachments in Outlook/Exchange (PST/OST/MSG) and Thunderbird (MBOX/EML); support includes emails saved from Office 365.
- Recursively embedded objects: supports recursively embedded objects and images in supported email types and MS Office formats. For example, the dtSearch document filters would support an email attachment consisting of a ZIP container including both a PDF and an Access database, where the latter also includes an embedded PowerPoint with embedded images.
- Using dtSearch with cloud storage (OneDrive, Amazon S3, etc.)
- Full list of supported document types.
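The recursive-container case described above (an attachment holding a ZIP that itself holds further files) can be illustrated with a minimal Python sketch. This is not the dtSearch document filter API: `walk_container` and `make_zip` are hypothetical helper names, only ZIP containers are recognized, and the payloads are placeholder bytes rather than real PDF or Office files.

```python
import io
import zipfile


def walk_container(data: bytes, name: str, depth: int = 0):
    """Yield (depth, name) for an item and, if it is a ZIP, for everything nested in it."""
    yield depth, name
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return  # not a ZIP container: nothing further to descend into
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Recurse so that containers inside containers are also visited.
            yield from walk_container(archive.read(info), info.filename, depth + 1)


def make_zip(members: dict) -> bytes:
    """Hypothetical helper: build an in-memory ZIP from {member name: payload bytes}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for member_name, payload in members.items():
            archive.writestr(member_name, payload)
    return buffer.getvalue()


# Placeholder payloads standing in for real documents (assumption for the demo).
inner = make_zip({"notes.txt": b"plain text payload for the nested archive"})
outer = make_zip({
    "report.pdf": b"%PDF placeholder payload, not a real PDF file",
    "nested.zip": inner,
})

for level, item in walk_container(outer, "attachment.zip"):
    print("  " * level + item)
```

Because the sketch treats anything with a ZIP structure as a container, ZIP-based formats such as DOCX would also be descended into. A real filter would additionally handle the email, Office and database containers listed above.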
Federated searching and the dtSearch Spider
dtSearch products provide federated search across any number of directories, emails (with nested attachments), and databases.
The dtSearch Spider adds local and remote online content to a search. The Spider can index sites to any level of depth, with support for public and secure online content, including log-ins and forms-based authentication. dtSearch products provide integrated relevancy ranking with highlighted hits across both online and offline data. Note: for developers, the Spider is presented as a .NET API.
Document filter APIs
All developer APIs (C++, Java and .NET through current versions) expose dtSearch’s text parsing, extraction, conversion and hit-highlighting capabilities.
- An “object extraction” API lets developers navigate through the structure of each embedded object as a hierarchy, and optionally extract each object, such as an image in an MS Word file embedded in an MS Access database, compressed and attached to an email.
- General dtSearch Engine licenses include the document filters along with dtSearch indexing and searching functionality.
- The document filters are also available under a separate license for developers who require text parsing, extraction and conversion “only,” without search.
Searching Multiple Data Sources
- dtSearch products support integrated relevancy ranking with highlighted hits across both online and offline data repositories.
- Federated searching spans any number of directories, emails (with nested attachments), and databases.
- The Spider adds local and remote, static and dynamic online content to a search. The Spider can index sites to any level of depth, with support for public and private or secure online content, including log-ins and forms-based authentication.
Basic Search Types
- Natural language searching lets you enter a “plain English” (or any other international language) unstructured search request.
- Phrase searching finds phrases like: due process of law.
- Boolean operators like and/or/not can join words and phrases: due process of law and not (equal protection or civil rights).
- Proximity searching finds a word or phrase within “n” words of another word or phrase: apple pie w/38 peach cobbler.
- Directed proximity searching finds a word or phrase “n” words before another word or phrase: apple pie pre/38 peach cobbler.
- Phonic searching finds words that sound alike, like Smythe in a search for Smith.
- Stemming finds variations on endings, like applies, applied, applying in a search for apply.
- Numeric range searching finds any number between two numbers, such as between 6 and 36.
- Macro capabilities make it easy to include frequently used items in a search request.
- Wildcard support allows ? to hold a single letter place, and * to hold multiple letter places: apple* and not appl?sauce.
- Regular expressions support provides a way to search for combinations of characters.
- Digit character matching enables searching for patterns of numbers.
- Unicode support allows for searching of all Unicode-based international languages, including support for “right to left” languages and special options for Asian character handling.
- New multicolor hit-highlighting search options support up to 10 highlight colors.
- Fuzzy searching uses a proprietary algorithm to find search terms even if they are misspelled.
- Search fuzziness adjusts from 0 to 10 so you can fine-tune fuzziness to the level of OCR or typographical errors in your files.
- A search for alphabet with a fuzziness of 1 would find alphaqet; with a fuzziness of 3, it would find both alphaqet and alpkaqet.
- Fuzziness is not built into the index, so you can vary fuzziness at the time of each search.
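The idea of choosing fuzziness at search time, rather than baking it into the index, can be sketched in Python. dtSearch's fuzzy algorithm is proprietary, so this sketch swaps in plain Levenshtein edit distance as a stand-in; treating the fuzziness level as a maximum edit distance is an assumption made only for illustration, and `edit_distance` and `fuzzy_matches` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term: str, indexed_words, fuzziness: int):
    """Return indexed words within `fuzziness` edits of `term` (assumed mapping)."""
    return [
        word for word in indexed_words
        if edit_distance(term.lower(), word.lower()) <= fuzziness
    ]


# The word list is built once; fuzziness is supplied per search.
indexed_words = ["alphaqet", "alpkaqet", "zebra"]
print(fuzzy_matches("alphabet", indexed_words, fuzziness=1))
print(fuzzy_matches("alphabet", indexed_words, fuzziness=3))
```

Under this stand-in metric, alphaqet is one substitution away from alphabet and alpkaqet is two, so the first is returned at level 1 and both at level 3, matching the behavior described in the example above. The real product's scale from 0 to 10 need not correspond to edit distance.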
Concept / Synonym / Thesaurus Searching
- Concept searching lets you search for fast and also find quick, speedy, etc.
- dtSearch offers variable levels of automatic synonym expansion based on a comprehensive semantic network of the English language.
- You can also add your own thesaurus terms.
- For example, Frank and Jones would not be synonyms covered by the built-in semantic network. But if you are working on the Frank Jones case, you may want to make them synonyms for purposes of your case. The same principle applies to technical jargon like airbags and SRS.
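User-defined thesaurus expansion of the kind described above can be sketched as follows. This is an illustrative Python model, not dtSearch's thesaurus format or API: `build_thesaurus` and `expand_query` are hypothetical names, and synonyms are assumed to be symmetric within each user-defined group.

```python
def build_thesaurus(groups):
    """Map each lowercased term to the set of other terms in its synonym group."""
    thesaurus = {}
    for group in groups:
        lowered = {term.lower() for term in group}
        for term in lowered:
            thesaurus.setdefault(term, set()).update(lowered - {term})
    return thesaurus


def expand_query(terms, thesaurus):
    """Return each query term followed by its synonyms, without duplicates."""
    expanded = []
    for term in terms:
        key = term.lower()
        for candidate in [key, *sorted(thesaurus.get(key, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Case-specific synonym groups taken from the examples in the text.
user_thesaurus = build_thesaurus([["Frank", "Jones"], ["airbags", "SRS"]])
print(expand_query(["Frank"], user_thesaurus))
print(expand_query(["SRS", "deployment"], user_thesaurus))
```

A search engine would then treat the expanded list as alternatives for the original term. Terms with no thesaurus entry, such as deployment here, pass through unchanged.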
Metadata Search Options
- dtSearch products support metadata in all supported search types.
- The dtSearch Engine also supports faceted searching and other advanced data classification.
- Along with natural language algorithms, dtSearch products also support positional scoring and full-text and metadata-based variable term weighting.
Combining Search Types
- Nearly all search types are combinable.
- You can make your search request as complex as you want.
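To make the combination of search types concrete, here is a small Python sketch that evaluates a wildcard term, a proximity condition and a Boolean "not" together over a tokenized text. It models the concepts only and does not parse dtSearch request syntax: the function names are hypothetical, terms are limited to single words, and the way word distance is counted is an assumption.

```python
import fnmatch
import re


def tokenize(text: str):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def positions(tokens, pattern: str):
    """Indexes of tokens matching a wildcard pattern (? = one character, * = any run)."""
    return [
        index for index, token in enumerate(tokens)
        if fnmatch.fnmatchcase(token, pattern.lower())
    ]


def within(tokens, first: str, second: str, n: int, directed: bool = False) -> bool:
    """True if `first` occurs within n words of `second`.

    With directed=True, `first` must appear before `second`.
    """
    for i in positions(tokens, first):
        for j in positions(tokens, second):
            gap = j - i
            if directed and 0 < gap <= n:
                return True
            if not directed and 0 < abs(gap) <= n:
                return True
    return False


tokens = tokenize("She baked an apple pie and later a peach cobbler")

# Wildcard + proximity + Boolean "not" in one condition.
hit = within(tokens, "appl*", "cobbler", 6) and not positions(tokens, "appl?sauce")
print(hit)
```

The same building blocks compose further, for example by passing `directed=True` to require that the first term precede the second, or by joining several `within` calls with `and` and `or`.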