|
| template<data_source_compatible_type T> |
| | data_source (const T &source) |
| | Constructs a data_source from a compatible type.
|
| |
| template<data_source_compatible_type T> |
| | data_source (T &&source) |
| | Constructs a data_source by moving from a compatible type.
|
| |
| template<data_source_compatible_type T> |
| | data_source (const T &source, file_extension file_extension) |
| | Constructs a data_source with an explicit file extension.
|
| |
| template<data_source_compatible_type T> |
| | data_source (T &&source, file_extension file_extension) |
| | Constructs a data_source by moving, with an explicit file extension.
|
| |
| template<data_source_compatible_type T> |
| | data_source (const T &source, mime_type mime_type, confidence mime_type_confidence) |
| | Constructs a data_source with an initial MIME type and confidence.
|
| |
| template<data_source_compatible_type T> |
| | data_source (T &&source, mime_type mime_type, confidence mime_type_confidence) |
| | Constructs a data_source by moving, with an initial MIME type and confidence.
|
| |
| std::span< const std::byte > | span (std::optional< length_limit > limit=std::nullopt) const |
| | Returns the content as a span of bytes.
|
| |
| std::string | string (std::optional< length_limit > limit=std::nullopt) const |
| | Returns the content as a string.
|
| |
| std::string_view | string_view (std::optional< length_limit > limit=std::nullopt) const |
| | Returns the content as a string_view.
|
| |
|
std::shared_ptr< std::istream > | istream () const |
| | Returns an input stream for reading the data.
|
| |
|
std::optional< std::filesystem::path > | path () const |
| | Returns the file path if the source is a file, otherwise std::nullopt.
|
| |
|
std::optional< docwire::file_extension > | file_extension () const |
| | Returns the file extension if available.
|
| |
|
unique_identifier | id () const |
| | Returns the unique identifier for this data source.
|
| |
| std::optional< std::pair< mime_type, confidence > > | highest_confidence_mime_type_info () const |
| | Returns the MIME type with the highest confidence and its confidence level.
|
| |
|
std::optional< mime_type > | highest_confidence_mime_type () const |
| | Returns the MIME type with the highest confidence.
|
| |
|
confidence | highest_mime_type_confidence () const |
| | Returns the highest confidence level found among detected MIME types.
|
| |
| bool | has_highest_confidence_mime_type_in (const std::vector< mime_type > &mts) const |
| | Checks whether the highest-confidence MIME type is present in the given list.
|
| |
|
void | assert_not_encrypted () const |
| | Asserts that the data source is not encrypted.
|
| |
|
confidence | mime_type_confidence (mime_type mt) const |
| | Returns the confidence level for a specific MIME type.
|
| |
| void | add_mime_type (mime_type mt, confidence c) |
| | Adds a MIME type with a confidence level.
|
| |
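The MIME type members listed above suggest a small bookkeeping structure that maps each candidate MIME type to a confidence level. The following is a minimal, self-contained sketch of that idea, not docwire's implementation. The `mime_type_tracker` class, the `mime_type` alias, and the values of the `confidence` enum are hypothetical stand-ins, and two behaviors are assumptions: adding the same type twice keeps the higher confidence, and an unknown type reports `confidence::none`.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; docwire's real mime_type and confidence types may differ.
using mime_type = std::string;
enum class confidence { none, low, medium, high, highest };

class mime_type_tracker
{
public:
    // Records a MIME type. Assumption: a repeated type keeps its higher confidence.
    void add_mime_type(mime_type mt, confidence c)
    {
        auto [it, inserted] = m_types.try_emplace(std::move(mt), c);
        if (!inserted && it->second < c)
            it->second = c;
    }

    // Confidence for one MIME type. Assumption: unknown types report none.
    confidence mime_type_confidence(const mime_type& mt) const
    {
        auto it = m_types.find(mt);
        return it == m_types.end() ? confidence::none : it->second;
    }

    // Best candidate and its confidence, or std::nullopt when nothing was added.
    std::optional<std::pair<mime_type, confidence>> highest_confidence_mime_type_info() const
    {
        auto best = std::max_element(m_types.begin(), m_types.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        if (best == m_types.end())
            return std::nullopt;
        return std::pair<mime_type, confidence>{best->first, best->second};
    }

    // True when the best candidate is one of the listed MIME types.
    bool has_highest_confidence_mime_type_in(const std::vector<mime_type>& mts) const
    {
        auto info = highest_confidence_mime_type_info();
        return info && std::find(mts.begin(), mts.end(), info->first) != mts.end();
    }

private:
    std::map<mime_type, confidence> m_types;
};

// Usage: a weak guess is superseded by a stronger one.
inline void tracker_example()
{
    mime_type_tracker t;
    t.add_mime_type("text/plain", confidence::low);
    t.add_mime_type("application/pdf", confidence::high);
    assert(t.highest_confidence_mime_type_info()->first == "application/pdf");
}
```

Returning `std::optional` from the lookup mirrors the signature of `highest_confidence_mime_type_info` in the table, so a caller has to handle the case where no MIME type has been detected yet.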
This class represents a binary data source for data processing. It can be initialized with a file path, a memory buffer, an input stream, or another data source; all popular C++ data sources are supported. Document parsers and third-party libraries need access to the data in the form their implementation requires, such as a memory buffer, a file path, a stream, or a range, and that requirement cannot be changed. Sometimes one access method is faster than another, so a parser needs to know the state of the data source in order to choose. Converting data from one storage form to another should be possible in all combinations, but should be performed only when required (lazily) and cached inside the class; for example, a file should be read into memory only once. Performance is very important; for example, a memory buffer passed to the class should not be duplicated.
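The lazy conversion and caching described above can be illustrated with a minimal, self-contained sketch. It is not docwire's implementation: the `lazy_source` class, its `file_reads()` counter, and its internal layout are hypothetical and exist only to show the pattern of reading a file into memory once and handing out an in-memory buffer without copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical illustration only; not docwire's data_source.
class lazy_source
{
public:
    // Source backed by a file on disk; nothing is read yet.
    explicit lazy_source(std::filesystem::path p)
        : m_origin(std::in_place_type<std::filesystem::path>, std::move(p)) {}

    // Source backed by an in-memory buffer; the buffer is moved in, not copied.
    explicit lazy_source(std::string buffer)
        : m_origin(std::in_place_type<std::string>, std::move(buffer)) {}

    // Returns the file path if the source is a file, otherwise std::nullopt.
    std::optional<std::filesystem::path> path() const
    {
        if (auto* p = std::get_if<std::filesystem::path>(&m_origin))
            return *p;
        return std::nullopt;
    }

    // Returns the content as a view. A file is read on the first call only.
    std::string_view string_view() const
    {
        if (auto* s = std::get_if<std::string>(&m_origin))
            return *s; // already in memory: no copy is made
        if (!m_cache)
        {
            std::ifstream in(std::get<std::filesystem::path>(m_origin), std::ios::binary);
            m_cache.emplace(std::istreambuf_iterator<char>(in),
                            std::istreambuf_iterator<char>());
            ++m_file_reads;
        }
        return *m_cache;
    }

    // Number of times the file was actually read (for demonstration).
    std::size_t file_reads() const { return m_file_reads; }

private:
    std::variant<std::filesystem::path, std::string> m_origin;
    mutable std::optional<std::string> m_cache; // filled lazily for file sources
    mutable std::size_t m_file_reads = 0;
};

// Usage: an in-memory source never touches the file system.
inline void in_memory_example()
{
    lazy_source src(std::string("abc"));
    assert(src.string_view() == "abc");
    assert(src.file_reads() == 0);
}
```

The cache is declared `mutable` so that a `const` accessor can fill it on first use, which matches the `const` qualifiers on the accessors listed in the member table. A production class would also need to consider thread safety and the stream and range cases, which this sketch leaves out.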
- Examples
- local_embedding_similarity.cpp.
Definition at line 127 of file data_source.h.