DocWire SDK
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing possible for security and confidentiality
docwire::content_type::detector Class Reference

Content type detection chain element.

#include <content_type.h>

Inheritance chain for docwire::content_type::detector:
docwire::content_type::detector → docwire::chain_element → docwire::with_pimpl< chain_element > → docwire::with_pimpl_base

Public Member Functions

 detector (ref_or_owned< by_signature::database > signatures_db_to_use=by_signature::database{})
 Constructs a new detector with the given database of signatures.
 
continuation operator() (message_ptr msg, const message_callbacks &emit_message) override
 
bool is_leaf () const override
 Checks whether the chain element is a leaf (the last element, which does not produce any messages). Currently, only exporters are leaves.
 
- Public Member Functions inherited from docwire::chain_element
 chain_element (chain_element &&)=default
 
chain_element & operator= (chain_element &&)=default
 
virtual bool is_generator () const
 

Additional Inherited Members

- Protected Types inherited from docwire::with_pimpl< chain_element >
using impl_type = pimpl_impl< chain_element >
 
- Protected Member Functions inherited from docwire::with_pimpl< chain_element >
impl_typecreate_impl (Args &&... args)
 
 with_pimpl (Args &&... args)
 
 with_pimpl (with_pimpl< chain_element > &&other) noexcept
 
 with_pimpl (std::nullptr_t)
 
with_pimpl & operator= (with_pimpl &&other) noexcept
 
impl_type & impl ()
 
const impl_type & impl () const
 

Detailed Description

Content type detection chain element.

Detects and assigns content types to the provided data source. As a chain element, it combines the following detection methods:

  • By file extension
  • By file signature
  • Image content detection
  • ODF and OOXML format detection
  • ASP content detection
  • HTML content detection
  • iWork content detection
  • ODF Flat format detection
  • Outlook format detection
  • XLSB format detection
See also
performing file type detection example
content_type::detect
content_type::by_file_extension::detector
content_type::by_signature::detector
content_type::image::detector
content_type::odf_ooxml::detector
content_type::asp::detector
content_type::html::detector
content_type::iwork::detector
content_type::odf_flat::detector
content_type::outlook::detector
content_type::xlsb::detector

Definition at line 106 of file content_type.h.
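
The first two methods in the list above, detection by file extension and by file signature, can be illustrated with a small standalone sketch. It does not use the DocWire API: the names `mime_from_extension`, `mime_from_signature` and `detect_mime` are hypothetical, the tables are deliberately tiny, and the rule that a signature match wins over an extension match is an assumption made for this example rather than documented DocWire behavior.

```cpp
#include <array>
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of DocWire): look up a MIME type by file extension.
// The comparison is case-sensitive to keep the sketch short.
inline std::optional<std::string> mime_from_extension(const std::filesystem::path& p)
{
    struct entry { std::string_view ext; std::string_view mime; };
    static constexpr std::array<entry, 3> table{{
        {".pdf", "application/pdf"},
        {".png", "image/png"},
        {".html", "text/html"},
    }};
    const std::string ext = p.extension().string();
    for (const auto& e : table)
        if (ext == e.ext) return std::string{e.mime};
    return std::nullopt;
}

// Hypothetical helper (not part of DocWire): match the leading bytes of the
// content against a few well-known file signatures ("magic numbers").
inline std::optional<std::string> mime_from_signature(std::string_view bytes)
{
    using namespace std::string_view_literals;
    struct entry { std::string_view magic; std::string_view mime; };
    static constexpr std::array<entry, 2> table{{
        {"%PDF-"sv, "application/pdf"sv},
        {"\x89PNG\r\n\x1a\n"sv, "image/png"sv},
    }};
    for (const auto& e : table)
        if (bytes.substr(0, e.magic.size()) == e.magic) return std::string{e.mime};
    return std::nullopt;
}

// Assumption for this sketch: content evidence (signature) is trusted more
// than the file name, and unknown data falls back to a generic type.
inline std::string detect_mime(const std::filesystem::path& p, std::string_view bytes)
{
    if (auto m = mime_from_signature(bytes)) return *m;
    if (auto m = mime_from_extension(p)) return *m;
    return "application/octet-stream";
}
```

With these helpers, a file named `picture.dat` whose content starts with the PNG signature is reported as `image/png`, while `page.html` with no recognizable signature falls back to the extension table.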

Constructor & Destructor Documentation

◆ detector()

docwire::content_type::detector::detector ( ref_or_owned< by_signature::database > signatures_db_to_use = by_signature::database{})
inline

Constructs a new detector with the given database of signatures.

The detector uses the provided database of signatures for content type detection. If no database is provided, a new one is created and loaded.

Parameters
signatures_db_to_use    The database of signatures to be used for content type detection.
See also
content_type::by_signature::database

Definition at line 120 of file content_type.h.
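
The constructor parameter lets a caller either pass an existing signatures database or rely on the default. One way such a "reference or owned" parameter could work is sketched below. This is a simplified illustration only: `ref_or_owned_sketch`, `signature_db_sketch` and `detector_sketch` are hypothetical stand-ins, and the actual `ref_or_owned` and `by_signature::database` types in DocWire may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a signature database (not DocWire's type).
struct signature_db_sketch
{
    std::vector<std::string> signatures;
};

// Hypothetical simplified holder: it either refers to a caller-owned object
// or owns an object moved in from a temporary.
template <typename T>
class ref_or_owned_sketch
{
public:
    // Lvalue: store a non-owning pointer; the caller keeps the object alive.
    ref_or_owned_sketch(T& ref) : m_ptr(&ref) {}
    // Rvalue: take ownership by moving the value into shared storage.
    ref_or_owned_sketch(T&& value)
        : m_owned(std::make_shared<T>(std::move(value))), m_ptr(m_owned.get()) {}

    T& get() const { return *m_ptr; }
    bool owns() const { return m_owned != nullptr; }

private:
    std::shared_ptr<T> m_owned; // empty when referring to a caller-owned object
    T* m_ptr;                   // always points at the object in use
};

// Hypothetical detector that mirrors the shape of the documented constructor:
// the default argument is a freshly created database.
class detector_sketch
{
public:
    explicit detector_sketch(ref_or_owned_sketch<signature_db_sketch> db = signature_db_sketch{})
        : m_db(std::move(db)) {}

    std::size_t signature_count() const { return m_db.get().signatures.size(); }
    bool owns_db() const { return m_db.owns(); }

private:
    ref_or_owned_sketch<signature_db_sketch> m_db;
};
```

In this sketch, two detectors constructed from the same lvalue database observe later changes to it, whereas a default-constructed detector holds its own private instance. Sharing one database this way would avoid repeating whatever loading cost the database has, assuming loading is the expensive step.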

Member Function Documentation

◆ is_leaf()

bool docwire::content_type::detector::is_leaf ( ) const
inline override virtual

Checks whether the chain element is a leaf (the last element, which does not produce any messages). Currently, only exporters are leaves.

Returns
true if leaf

Implements docwire::chain_element.

Definition at line 145 of file content_type.h.
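
The role of a leaf in a processing chain can be illustrated with a standalone sketch. The interface below is hypothetical and much simpler than DocWire's `chain_element` (plain strings instead of `message_ptr`, a `std::function` instead of `message_callbacks`, no `continuation` result). It only shows the idea stated above: a non-leaf element emits messages for the next element, while a leaf consumes them and emits nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical simplified element interface (not DocWire's chain_element).
struct element_sketch
{
    using emit_fn = std::function<void(const std::string&)>;
    virtual ~element_sketch() = default;
    // Process one message; call emit for each message to pass downstream.
    virtual void process(const std::string& msg, const emit_fn& emit) = 0;
    virtual bool is_leaf() const = 0;
};

// Non-leaf: annotates the message and forwards it, loosely analogous to a
// detector that attaches a content type.
struct tagger_sketch : element_sketch
{
    void process(const std::string& msg, const emit_fn& emit) override
    {
        emit("text/plain:" + msg);
    }
    bool is_leaf() const override { return false; }
};

// Leaf: consumes messages and produces none, loosely analogous to an exporter.
struct collector_sketch : element_sketch
{
    std::vector<std::string> received;
    void process(const std::string& msg, const emit_fn&) override
    {
        received.push_back(msg);
    }
    bool is_leaf() const override { return true; }
};

// Pass a message through the elements starting at index; nothing is
// forwarded past a leaf.
inline void run_chain(const std::vector<element_sketch*>& elements,
                      std::size_t index, const std::string& msg)
{
    if (index >= elements.size()) return;
    element_sketch& current = *elements[index];
    current.process(msg, [&](const std::string& out) {
        if (!current.is_leaf()) run_chain(elements, index + 1, out);
    });
}
```

Running a chain of `tagger_sketch` followed by `collector_sketch` on the message `hello` leaves one entry, `text/plain:hello`, in the collector. Since the description says only exporters are leaves, a detector would presumably report that it is not a leaf, as the tagger does here.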


The documentation for this class was generated from the following file: