DocWire SDK
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boost efficiency in text extraction, web data extraction, data mining, document analysis. Offline processing possible for security and confidentiality.
A parser for ODF and OOXML document formats.
#include <odf_ooxml_parser.h>
Public Member Functions | |
| void | parse (const data_source &data, const message_callbacks &emit_message) |
| Parses the given data source. | |
| odf_ooxml_parser () | |
| Default constructor. | |
| continuation | operator() (message_ptr msg, const message_callbacks &emit_message) override |
| Processes a message in the parsing chain. | |
| bool | is_leaf () const override |
| Checks whether this chain element is a leaf, i.e. the last element, which does not produce any messages. Currently only exporters are leaves. | |
Public Member Functions inherited from docwire::common_xml_document_parser< default_safety_level > | |
| void | registerODFOOXMLCommandHandler (const std::string &xml_tag, const CommandHandler &handler) |
| Registers a handler for a specific XML tag. | |
| std::string | parseXmlData (xml::children_view< safety_level > xml_nodes, XmlParseMode mode, zip_reader *zipfile) |
| Parses XML data from a view of nodes. | |
| std::string | parseXmlChildren (xml::node_ref< safety_level > &xml_node, XmlParseMode mode, zip_reader *zipfile) |
| Parses the children of a given XML node. | |
| void | extractText (std::string_view xml_contents, XmlParseMode mode, zip_reader *zipfile, std::string &text) |
| Extracts text from raw XML content. | |
| void | parseODFMetadata (std::string_view xml_content, attributes::metadata &metadata) const |
| Parses ODF metadata from XML content. | |
| const std::string | formatComment (const std::string &author, const std::string &time, const std::string &text) |
| Formats a comment for output. | |
| size_t & | getListDepth () |
| Returns the current nesting depth of lists. | |
| ListStyleMap & | getListStyles () |
| Gets the map of list styles. | |
| CommentMap & | getComments () |
| Gets the map of comments. | |
| RelationshipMap & | getRelationships () |
| Gets the map of relationships. | |
| SharedStringVector & | getSharedStrings () |
| Gets the vector of shared strings. | |
| bool | disabledText () const |
| Checks if text extraction is currently disabled. | |
| xml::reader_blanks | blanks () const |
| Gets the current blank node handling policy. | |
| void | disableText (bool disable) |
| Enables or disables text extraction. | |
| void | set_blanks (xml::reader_blanks blanks) |
| Sets the blank node handling policy for the XML reader. | |
| void | activeEmittingSignals (bool flag) |
| Controls whether signal emission (callbacks) is active. | |
| common_xml_document_parser () | |
| Default constructor. | |
Public Member Functions inherited from docwire::chain_element | |
| chain_element (chain_element &&)=default | |
| chain_element & | operator= (chain_element &&)=default |
| virtual bool | is_generator () const |
Additional Inherited Members | |
Public Types inherited from docwire::common_xml_document_parser< default_safety_level > | |
| enum | ODFOOXMLListStyle |
| Enum for list styles (e.g., numbered or bulleted). | |
| typedef std::vector< ODFOOXMLListStyle > | ListStyleVector |
| Type alias for a vector of list styles. | |
| using | ListStyleMap = std::map< std::string, common_xml_document_parser< safety_level >::ListStyleVector > |
| Type alias for a map of list style names to their definitions. | |
| using | CommentMap = std::map< int, common_xml_document_parser< safety_level >::comment > |
| Type alias for a map of comment IDs to Comment objects. | |
| using | RelationshipMap = std::map< std::string, common_xml_document_parser< safety_level >::relationship > |
| Type alias for a map of relationship IDs to Relationship objects. | |
| using | SharedStringVector = std::vector< shared_string > |
| Type alias for a vector of shared strings. | |
| typedef std::function< void(xml::node_ref< safety_level > &xml_node, XmlParseMode mode, zip_reader *zipfile, std::string &text, bool &children_processed, std::string &level_suffix, bool first_on_level)> | CommandHandler |
| Defines the function signature for an XML tag command handler. | |
Protected Types inherited from docwire::with_pimpl< chain_element > | |
| using | impl_type = pimpl_impl< chain_element > |
Protected Types inherited from docwire::with_pimpl< common_xml_document_parser< safety_level > > | |
| using | impl_type = pimpl_impl< common_xml_document_parser< safety_level > > |
Protected Types inherited from docwire::with_pimpl< odf_ooxml_parser< default_safety_level > > | |
| using | impl_type = pimpl_impl< odf_ooxml_parser< default_safety_level > > |
Protected Member Functions inherited from docwire::with_pimpl< chain_element > | |
| impl_type * | create_impl (Args &&... args) |
| with_pimpl (Args &&... args) | |
| with_pimpl (with_pimpl< chain_element > &&other) noexcept | |
| with_pimpl (std::nullptr_t) | |
| with_pimpl & | operator= (with_pimpl &&other) noexcept |
| impl_type & | impl () |
| const impl_type & | impl () const |
Protected Member Functions inherited from docwire::with_pimpl< common_xml_document_parser< safety_level > > | |
| impl_type * | create_impl (Args &&... args) |
| with_pimpl (Args &&... args) | |
| with_pimpl (with_pimpl< common_xml_document_parser< safety_level > > &&other) noexcept | |
| with_pimpl (std::nullptr_t) | |
| with_pimpl & | operator= (with_pimpl &&other) noexcept |
| impl_type & | impl () |
| const impl_type & | impl () const |
Protected Member Functions inherited from docwire::with_pimpl< odf_ooxml_parser< default_safety_level > > | |
| impl_type * | create_impl (Args &&... args) |
| with_pimpl (Args &&... args) | |
| with_pimpl (with_pimpl< odf_ooxml_parser< default_safety_level > > &&other) noexcept | |
| with_pimpl (std::nullptr_t) | |
| with_pimpl & | operator= (with_pimpl &&other) noexcept |
| impl_type & | impl () |
| const impl_type & | impl () const |
A parser for ODF and OOXML document formats.
Template Parameters
| safety_level | The safety policy to use. |
Definition at line 28 of file odf_ooxml_parser.h.
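The inherited `registerODFOOXMLCommandHandler` and `CommandHandler` entries describe a per-tag dispatch scheme in which a handler can append text and flag that it has already processed a node's children. The following is a minimal, self-contained sketch of that idea only. `toy_node`, `toy_handler`, `toy_tag_parser` and `demo_tag_dispatch` are hypothetical stand-ins invented for illustration, with a reduced handler signature; they are not DocWire types and do not reflect how the library is implemented.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML node (not DocWire's xml::node_ref).
struct toy_node {
    std::string tag;
    std::string text;
    std::vector<toy_node> children;
};

// Reduced handler signature (assumption): append output to `text` and set
// `children_processed` when the handler has dealt with the node's children.
using toy_handler =
    std::function<void(const toy_node&, std::string& text, bool& children_processed)>;

class toy_tag_parser {
public:
    // Associates a handler with an XML tag name, replacing any previous one.
    void register_handler(const std::string& xml_tag, toy_handler handler) {
        handlers_[xml_tag] = std::move(handler);
    }

    // Visits each child: runs its handler if one is registered, then recurses
    // into the child's own children unless the handler claimed them.
    std::string parse_children(const toy_node& parent) const {
        std::string text;
        for (const toy_node& child : parent.children) {
            bool children_processed = false;
            auto it = handlers_.find(child.tag);
            if (it != handlers_.end())
                it->second(child, text, children_processed);
            if (!children_processed)
                text += parse_children(child);
        }
        return text;
    }

private:
    std::map<std::string, toy_handler> handlers_;
};

// Builds a small tree and extracts the text of every "p" node.
std::string demo_tag_dispatch() {
    toy_tag_parser parser;
    parser.register_handler("p", [](const toy_node& n, std::string& text, bool& done) {
        text += n.text + "\n";
        done = true;  // a paragraph's children are not visited again
    });

    toy_node first{"p", "first", {}};
    toy_node second{"p", "second", {}};
    toy_node section{"section", "", {second}};
    toy_node root{"body", "", {first, section}};

    std::string result = parser.parse_children(root);
    assert(!result.empty());
    return result;
}
```

Here the unhandled `section` node is simply descended into, so the nested paragraph is still reached by the `p` handler.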
bool docwire::odf_ooxml_parser< safety_level >::is_leaf () const [inline, override, virtual]
Checks whether this chain element is a leaf, i.e. the last element, which does not produce any messages. Currently only exporters are leaves.
Implements docwire::chain_element.
Definition at line 71 of file odf_ooxml_parser.h.
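To illustrate the leaf concept, here is a small sketch under stated assumptions. `toy_element`, `toy_parser_element`, `toy_exporter_element` and `chain_ends_in_leaf` are hypothetical names, not part of DocWire, and the rule that a chain should end in exactly one leaf is an assumption made for the example. That a parser reports `false` is inferred from the statement that only exporters are leaves.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified model of a chain element (not docwire::chain_element).
struct toy_element {
    virtual ~toy_element() = default;
    // True when the element consumes messages without producing new ones.
    virtual bool is_leaf() const = 0;
};

// A parser produces messages for the next element, so it is not a leaf.
struct toy_parser_element : toy_element {
    bool is_leaf() const override { return false; }
};

// An exporter is the final consumer, so it is a leaf.
struct toy_exporter_element : toy_element {
    bool is_leaf() const override { return true; }
};

// Assumed rule for this sketch: the last element is a leaf and no earlier one is.
bool chain_ends_in_leaf(const std::vector<std::unique_ptr<toy_element>>& chain) {
    if (chain.empty())
        return false;
    for (std::size_t i = 0; i + 1 < chain.size(); ++i)
        if (chain[i]->is_leaf())
            return false;
    return chain.back()->is_leaf();
}

// Checks a parser-only chain, then the same chain with an exporter appended.
bool demo_chain_check() {
    std::vector<std::unique_ptr<toy_element>> chain;
    chain.push_back(std::make_unique<toy_parser_element>());
    const bool parser_only_rejected = !chain_ends_in_leaf(chain);

    chain.push_back(std::make_unique<toy_exporter_element>());
    const bool full_chain_accepted = chain_ends_in_leaf(chain);

    assert(parser_only_rejected);
    return parser_only_rejected && full_chain_accepted;
}
```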
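continuation docwire::odf_ooxml_parser< safety_level >::operator() (message_ptr msg, const message_callbacks & emit_message) [override, virtual]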
Processes a message in the parsing chain.
Implements docwire::chain_element.
void docwire::odf_ooxml_parser< safety_level >::parse (const data_source & data, const message_callbacks & emit_message)
Parses the given data source.
Parameters
| data | The data source to parse. |
| emit_message | Callback to emit messages (e.g., document elements). |
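The callback-driven shape of `parse` can be sketched as follows. This is a simplified model, not DocWire code: `toy_emit`, `toy_parse` and `demo_collect_messages` are hypothetical, the input is a plain string rather than a `data_source`, and each non-empty line stands in for one emitted document element.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for message_callbacks: receives one element at a time.
using toy_emit = std::function<void(const std::string& paragraph)>;

// Splits the input on newlines and emits every non-empty line through the
// callback, so the caller decides what to do with each element.
void toy_parse(std::string_view data, const toy_emit& emit_message) {
    std::size_t start = 0;
    while (start <= data.size()) {
        std::size_t end = data.find('\n', start);
        if (end == std::string_view::npos)
            end = data.size();
        if (end > start)
            emit_message(std::string(data.substr(start, end - start)));
        start = end + 1;
    }
}

// Collects the emitted elements into a vector.
std::vector<std::string> demo_collect_messages() {
    std::vector<std::string> received;
    toy_parse("Title\n\nBody text",
              [&received](const std::string& p) { received.push_back(p); });
    assert(!received.empty());
    return received;
}
```

Passing the sink as a callback keeps the parser independent of what happens to the output: the same function could feed an exporter, a counter, or a test harness.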