DocWire SDK
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including mailboxes and OCR. Boosts efficiency in text extraction, web data extraction, data mining, and document analysis. Offline processing is possible for security and confidentiality.
A parser for flat ODF XML documents.
#include <odfxml_parser.h>
Public Member Functions | |
| odfxml_parser () | |
| Default constructor. | |
| continuation | operator() (message_ptr msg, const message_callbacks &emit_message) override |
| Processes a message in the parsing chain. | |
| bool | is_leaf () const override |
| Checks whether the chain element is a leaf, that is, the last element in a chain, which does not produce any messages. Currently only exporters are leaves. | |
Public Member Functions inherited from docwire::common_xml_document_parser< default_safety_level > | |
| void | registerODFOOXMLCommandHandler (const std::string &xml_tag, const CommandHandler &handler) |
| Registers a handler for a specific XML tag. | |
| std::string | parseXmlData (xml::children_view< safety_level > xml_nodes, XmlParseMode mode, zip_reader *zipfile) |
| Parses XML data from a view of nodes. | |
| std::string | parseXmlChildren (xml::node_ref< safety_level > &xml_node, XmlParseMode mode, zip_reader *zipfile) |
| Parses the children of a given XML node. | |
| void | extractText (std::string_view xml_contents, XmlParseMode mode, zip_reader *zipfile, std::string &text) |
| Extracts text from raw XML content. | |
| void | parseODFMetadata (std::string_view xml_content, attributes::metadata &metadata) const |
| Parses ODF metadata from XML content. | |
| const std::string | formatComment (const std::string &author, const std::string &time, const std::string &text) |
| Formats a comment for output. | |
| size_t & | getListDepth () |
| Returns the current nesting depth of lists. | |
| ListStyleMap & | getListStyles () |
| Gets the map of list styles. | |
| CommentMap & | getComments () |
| Gets the map of comments. | |
| RelationshipMap & | getRelationships () |
| Gets the map of relationships. | |
| SharedStringVector & | getSharedStrings () |
| Gets the vector of shared strings. | |
| bool | disabledText () const |
| Checks if text extraction is currently disabled. | |
| xml::reader_blanks | blanks () const |
| Gets the current blank node handling policy. | |
| void | disableText (bool disable) |
| Enables or disables text extraction. | |
| void | set_blanks (xml::reader_blanks blanks) |
| Sets the blank node handling policy for the XML reader. | |
| void | activeEmittingSignals (bool flag) |
| Controls whether signal emission (callbacks) is active. | |
| common_xml_document_parser () | |
| Default constructor. | |
Public Member Functions inherited from docwire::chain_element | |
| chain_element (chain_element &&)=default | |
| chain_element & | operator= (chain_element &&)=default |
| virtual bool | is_generator () const |
Protected Member Functions | |
| auto | create_base_context_guard (const message_callbacks &emit_message) |
Protected Member Functions inherited from docwire::with_pimpl< chain_element > | |
| impl_type * | create_impl (Args &&... args) |
| with_pimpl (Args &&... args) | |
| with_pimpl (with_pimpl< chain_element > &&other) noexcept | |
| with_pimpl (std::nullptr_t) | |
| with_pimpl & | operator= (with_pimpl &&other) noexcept |
| impl_type & | impl () |
| const impl_type & | impl () const |
Protected Member Functions inherited from docwire::with_pimpl< common_xml_document_parser< safety_level > > | |
| impl_type * | create_impl (Args &&... args) |
| with_pimpl (Args &&... args) | |
| with_pimpl (with_pimpl< common_xml_document_parser< safety_level > > &&other) noexcept | |
| with_pimpl (std::nullptr_t) | |
| with_pimpl & | operator= (with_pimpl &&other) noexcept |
| impl_type & | impl () |
| const impl_type & | impl () const |
Protected Member Functions inherited from docwire::with_pimpl< odfxml_parser< default_safety_level > > | |
| impl_type * | create_impl (Args &&... args) |
| with_pimpl (Args &&... args) | |
| with_pimpl (with_pimpl< odfxml_parser< default_safety_level > > &&other) noexcept | |
| with_pimpl (std::nullptr_t) | |
| with_pimpl & | operator= (with_pimpl &&other) noexcept |
| impl_type & | impl () |
| const impl_type & | impl () const |
Additional Inherited Members | |
Public Types inherited from docwire::common_xml_document_parser< default_safety_level > | |
| enum | ODFOOXMLListStyle |
| Enum for list styles (e.g., numbered or bulleted). | |
| typedef std::vector< ODFOOXMLListStyle > | ListStyleVector |
| Type alias for a vector of list styles. | |
| using | ListStyleMap = std::map< std::string, common_xml_document_parser< safety_level >::ListStyleVector > |
| Type alias for a map of list style names to their definitions. | |
| using | CommentMap = std::map< int, common_xml_document_parser< safety_level >::comment > |
| Type alias for a map of comment IDs to Comment objects. | |
| using | RelationshipMap = std::map< std::string, common_xml_document_parser< safety_level >::relationship > |
| Type alias for a map of relationship IDs to Relationship objects. | |
| using | SharedStringVector = std::vector< shared_string > |
| Type alias for a vector of shared strings. | |
| typedef std::function< void(xml::node_ref< safety_level > &xml_node, XmlParseMode mode, zip_reader *zipfile, std::string &text, bool &children_processed, std::string &level_suffix, bool first_on_level)> | CommandHandler |
| Defines the function signature for an XML tag command handler. | |
Protected Types inherited from docwire::with_pimpl< chain_element > | |
| using | impl_type = pimpl_impl< chain_element > |
Protected Types inherited from docwire::with_pimpl< common_xml_document_parser< safety_level > > | |
| using | impl_type = pimpl_impl< common_xml_document_parser< safety_level > > |
Protected Types inherited from docwire::with_pimpl< odfxml_parser< default_safety_level > > | |
| using | impl_type = pimpl_impl< odfxml_parser< default_safety_level > > |
A parser for flat ODF XML documents.
Template Parameters
| safety_level | The safety policy to use. |
Definition at line 28 of file odfxml_parser.h.
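The inherited registerODFOOXMLCommandHandler member associates an XML tag with a CommandHandler. The following is a minimal, self-contained sketch of that tag-to-handler dispatch idea. It does not use DocWire itself: the node and tag_dispatcher types, the simplified three-argument handler signature, the fallback for unregistered tags, and the choice of example tags are all hypothetical stand-ins, and the real parser's traversal may well differ.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML node; not a DocWire type.
struct node {
    std::string tag;
    std::string text;
    std::vector<node> children;
};

// Hypothetical registry that maps tag names to handlers.
class tag_dispatcher {
public:
    // A handler may append to `out`. It sets `children_processed` to true
    // when the dispatcher should not descend into the node's children.
    using handler =
        std::function<void(const node&, std::string& out, bool& children_processed)>;

    void register_handler(const std::string& tag, handler h) {
        assert(h && "handler must be callable");
        handlers_[tag] = std::move(h);
    }

    // Depth-first walk: a registered handler wins; otherwise the node's own
    // text is appended (an assumption made for this sketch).
    std::string parse(const std::vector<node>& nodes) const {
        std::string out;
        for (const node& n : nodes) {
            bool children_processed = false;
            if (auto it = handlers_.find(n.tag); it != handlers_.end())
                it->second(n, out, children_processed);
            else
                out += n.text;
            if (!children_processed)
                out += parse(n.children);
        }
        return out;
    }

private:
    std::map<std::string, handler> handlers_;
};

// Builds a dispatcher with three illustrative handlers.
inline tag_dispatcher make_example_dispatcher() {
    tag_dispatcher d;
    d.register_handler("text:tab",
        [](const node&, std::string& out, bool&) { out += '\t'; });
    d.register_handler("text:line-break",
        [](const node&, std::string& out, bool&) { out += '\n'; });
    // Suppress annotation content entirely by claiming the children.
    d.register_handler("office:annotation",
        [](const node&, std::string&, bool& children_processed) {
            children_processed = true;
        });
    return d;
}
```

In this sketch, the children_processed flag plays the role that the parameter of the same name appears to play in CommandHandler: a handler that has dealt with a subtree itself tells the caller not to descend again.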
| bool is_leaf () const | inline override virtual |
Checks whether the chain element is a leaf, that is, the last element in a chain, which does not produce any messages. Currently only exporters are leaves.
Implements docwire::chain_element.
Definition at line 54 of file odfxml_parser.h.
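The leaf concept described for is_leaf() can be illustrated with a small sketch. The element, parser_like, and exporter_like types and the chain_is_complete helper below are hypothetical and are not part of DocWire. The sketch assumes that a parser reports false, since only exporters are documented as leaves, and that a well-formed chain ends in exactly one leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical minimal interface modelled on the documented is_leaf() member.
struct element {
    virtual ~element() = default;
    virtual bool is_leaf() const = 0;
};

// A parser produces messages for the next element, so it is not a leaf.
struct parser_like : element {
    bool is_leaf() const override { return false; }
};

// An exporter consumes messages and produces none, so it is a leaf.
struct exporter_like : element {
    bool is_leaf() const override { return true; }
};

// Returns true when the chain is non-empty, its last element is a leaf,
// and no earlier element is a leaf.
inline bool chain_is_complete(const std::vector<std::unique_ptr<element>>& chain) {
    if (chain.empty())
        return false;
    for (std::size_t i = 0; i + 1 < chain.size(); ++i) {
        assert(chain[i] != nullptr);
        if (chain[i]->is_leaf())
            return false;
    }
    assert(chain.back() != nullptr);
    return chain.back()->is_leaf();
}
```

Under these assumptions, a chain holding only a parser_like element is incomplete, and appending an exporter_like element completes it.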
| continuation operator() (message_ptr msg, const message_callbacks &emit_message) | override virtual |
Processes a message in the parsing chain.
Implements docwire::chain_element.
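The general shape of such a message-processing call operator can be sketched as follows. This is an illustration only and does not use DocWire's types. The continuation enumerators (proceed, skip, stop), the emit_fn callback alias, the use of std::string as the message type, and the line_splitter element are all assumptions introduced for the example, not the library's actual message_ptr or message_callbacks API.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>

// Assumed set of outcomes; the real enumerators may differ.
enum class continuation { proceed, skip, stop };

// Hypothetical callback used to hand a new message to the next element.
using emit_fn = std::function<continuation(const std::string&)>;

// Hypothetical chain element: splits an incoming message into lines and
// emits each line downstream. It stops early if downstream asks to stop.
struct line_splitter {
    continuation operator()(const std::string& msg, const emit_fn& emit) const {
        assert(emit && "emit callback must be callable");
        std::istringstream in(msg);
        std::string line;
        while (std::getline(in, line)) {
            if (emit(line) == continuation::stop)
                return continuation::stop;  // propagate the stop request
        }
        return continuation::proceed;
    }
};
```

The design point the sketch tries to capture is that an element both receives a message and may emit any number of new messages through the callback, while the returned value tells the caller whether processing should go on.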