Merged
4 changes: 2 additions & 2 deletions docs/usage.rst
@@ -12,8 +12,8 @@ which is always derived from std::exception.

All classes are defined in the namespace zim.
Copying is allowed and is made as cheap as possible.
The reading part of the libzim is most of the time thread safe.
Searching and creating part are not. You have to serialize access to the class yourself.
The reading part of libzim (including search) is thread safe most of the time.
The creating part is not. You have to serialize access to the Creator class yourself.

The main class, which accesses an archive, is |Archive|.
It actually holds a reference to an implementation, so copies of the class just reference the same file.
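
Reviewer note: the "serialize access to the Creator class yourself" requirement can be sketched with a plain mutex. The snippet below is a minimal illustration that does not use libzim: `FakeCreator`, `addItem` and `addItemsSerialized` are hypothetical stand-ins for any writer object that is not thread safe, and the only point shown is that every call goes through the same lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a writer that is NOT thread safe.
// This is not the libzim Creator class; it only models unsynchronized state.
class FakeCreator {
public:
  void addItem(const std::string& path) { m_paths.push_back(path); }
  std::size_t itemCount() const { return m_paths.size(); }
private:
  std::vector<std::string> m_paths;
};

// Adds `perThread` items from each of `nbThreads` threads.
// Every call into the creator is made while holding the same mutex.
std::size_t addItemsSerialized(int nbThreads, int perThread) {
  FakeCreator creator;
  std::mutex creatorMutex;
  std::vector<std::thread> workers;
  for (int t = 0; t < nbThreads; ++t) {
    workers.emplace_back([&creator, &creatorMutex, t, perThread] {
      for (int i = 0; i < perThread; ++i) {
        std::lock_guard<std::mutex> lock(creatorMutex);
        creator.addItem("item_" + std::to_string(t) + "_" + std::to_string(i));
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return creator.itemCount();
}
```

Holding the lock per call keeps the critical section small; an application that adds items in batches might prefer to take the lock once per batch instead.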
167 changes: 145 additions & 22 deletions include/zim/archive.h
@@ -42,6 +42,75 @@ namespace zim
efficientOrder
};

/**
* Configuration to pass to archive constructors.
*
* Configuration options specifying how to open a zim archive.
* For now, they only relate to data preloading, but this may change in the future.
*
* Archive may preload some data to speed up later accesses.
* However, the preloading itself can take time.
*
* OpenConfig allows the user to define how Archive should preload data.
*/
struct LIBZIM_API OpenConfig {
/**
* Default configuration.
*
* - Dirent range preloading is activated.
* - Xapian preloading is activated.
*/
OpenConfig();

/**
* Configure xapian preloading.
*
* This method modifies the configuration and returns itself.
*/
OpenConfig& preloadXapianDb(bool load) { m_preloadXapianDb = load; return *this; }

/**
* Configure xapian preloading.
*
* This method creates a new configuration with the new value.
*/
OpenConfig preloadXapianDb(bool load) const {
auto other = *this;
other.m_preloadXapianDb = load;
return other;
}

/**
* Configure direntRanges preloading.
*
* libzim will load `nbRanges + 1` dirents to create `nbRanges` dirent ranges.
* This will be used to speed up dirent lookup. This is an extra layer on top of
* the classic dirent cache.
*
* This method modifies the configuration and returns itself.
*/
OpenConfig& preloadDirentRanges(int nbRanges) { m_preloadDirentRanges = nbRanges; return *this; }

/**
* Configure direntRanges preloading.
*
* libzim will load `nbRanges + 1` dirents to create `nbRanges` dirent ranges.
* This will be used to speed up dirent lookup. This is an extra layer on top of
* the classic dirent cache.
*
* This method creates a new configuration with the new value.
*/
OpenConfig preloadDirentRanges(int nbRanges) const {
auto other = *this;
other.m_preloadDirentRanges = nbRanges;
return other;
}

bool m_preloadXapianDb;
int m_preloadDirentRanges;
};
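
Reviewer note: each setter above comes as a pair, a non-const overload that mutates and returns `*this` and a const overload that returns a modified copy. The following is a standalone sketch of that pattern, copied from the struct in this diff so that it compiles without libzim. The constructor body is an assumption (the diff only declares `OpenConfig()`), and in particular the default of 1024 ranges is illustrative. `overloadsBehaveAsDocumented` is a hypothetical helper name.

```cpp
#include <cassert>

// Standalone copy of the struct from the diff; it does not link against libzim.
// ASSUMPTION: the default values below (true, 1024) are for illustration only.
struct OpenConfig {
  OpenConfig() : m_preloadXapianDb(true), m_preloadDirentRanges(1024) {}

  // Non-const overload: modifies this object and returns it for chaining.
  OpenConfig& preloadXapianDb(bool load) { m_preloadXapianDb = load; return *this; }

  // Const overload: leaves this object untouched and returns a modified copy.
  OpenConfig preloadXapianDb(bool load) const {
    auto other = *this;
    other.m_preloadXapianDb = load;
    return other;
  }

  OpenConfig& preloadDirentRanges(int nbRanges) { m_preloadDirentRanges = nbRanges; return *this; }

  OpenConfig preloadDirentRanges(int nbRanges) const {
    auto other = *this;
    other.m_preloadDirentRanges = nbRanges;
    return other;
  }

  bool m_preloadXapianDb;
  int m_preloadDirentRanges;
};

// Returns true when both overload families behave as their comments describe.
bool overloadsBehaveAsDocumented() {
  // On a const object only the copying overloads are viable, so `base` is unchanged.
  const OpenConfig base;
  OpenConfig derived = base.preloadXapianDb(false).preloadDirentRanges(0);

  // On a non-const object the mutating overload is selected.
  OpenConfig inPlace;
  inPlace.preloadXapianDb(false);

  return base.m_preloadXapianDb
      && !derived.m_preloadXapianDb
      && derived.m_preloadDirentRanges == 0
      && !inPlace.m_preloadXapianDb;
}
```

With the real header, the same chain would presumably be passed as the second constructor argument, along the lines of `zim::Archive("foo.zim", zim::OpenConfig().preloadXapianDb(false))`.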


/**
* The Archive class to access content in a zim file.
*
@@ -93,6 +162,20 @@ namespace zim
*/
explicit Archive(const std::string& fname);

/** Archive constructor.
*
* Construct an archive from a filename.
* The file is opened read-only.
*
* The filename is the "logical" path.
* So if you want to open a split zim file (foo.zimaa, foo.zimab, ...)
* you must pass the `foo.zim` path.
*
* @param fname The name of the file to open (UTF-8 encoded)
* @param openConfig The open configuration to use.
*/
Archive(const std::string& fname, OpenConfig openConfig);

#ifndef _WIN32
/** Archive constructor.
*
@@ -106,6 +189,19 @@ namespace zim
*/
explicit Archive(int fd);

/** Archive constructor.
*
* Construct an archive from a file descriptor.
* The fd is used only at Archive creation.
* Ownership of the fd is not taken; it must be closed by the caller.
*
* Note: This function is not available under Windows.
*
* @param fd The descriptor of a seekable file representing a ZIM archive
* @param openConfig The open configuration to use.
*/
Archive(int fd, OpenConfig openConfig);

/** Archive constructor.
*
* Construct an archive from a descriptor of a file with an embedded ZIM
@@ -123,6 +219,24 @@ namespace zim
*/
Archive(int fd, offset_type offset, size_type size);

/** Archive constructor.
*
* Construct an archive from a descriptor of a file with an embedded ZIM
* archive inside.
* The fd is used only at Archive creation.
* Ownership of the fd is not taken; it must be closed by the caller.
*
* Note: This function is not available under Windows.
*
* @param fd The descriptor of a seekable file with a continuous segment
* representing a complete ZIM archive.
* @param offset The offset of the ZIM archive relative to the beginning
* of the file (rather than the current position associated with fd).
* @param size The size of the ZIM archive.
* @param openConfig The open configuration to use.
*/
Archive(int fd, offset_type offset, size_type size, OpenConfig openConfig);

/** Archive constructor.
*
* Construct an archive from a descriptor of a file with an embedded ZIM
@@ -137,6 +251,21 @@ namespace zim
*/
explicit Archive(FdInput fd);

/** Archive constructor.
*
* Construct an archive from a descriptor of a file with an embedded ZIM
* archive inside.
* The fd is used only at Archive creation.
* Ownership of the fd is not taken; it must be closed by the caller.
*
* Note: This function is not available under Windows.
*
* @param fd A FdInput (tuple) containing the fd (int), offset (offset_type) and size (size_type)
* referencing a continuous segment representing a complete ZIM archive.
* @param openConfig The open configuration to use.
*/
Archive(FdInput fd, OpenConfig openConfig);

/** Archive constructor.
*
* Construct an archive from several file descriptors.
@@ -151,6 +280,22 @@ namespace zim
* referencing a series of segments representing a complete ZIM archive.
*/
explicit Archive(const std::vector<FdInput>& fds);

/** Archive constructor.
*
* Construct an archive from several file descriptors.
* Each part may be embedded in a file.
* The fds are used only at Archive creation.
* Ownership of the fds is not taken; they must be closed by the caller.
* The fd (int) can be the same in several FdInput if the parts belong to the same file.
*
* Note: This function is not available under Windows.
*
* @param fds A vector of FdInput (tuple) containing the fd (int), offset (offset_type) and size (size_type)
* referencing a series of segments representing a complete ZIM archive.
* @param openConfig The open configuration to use.
*/
Archive(const std::vector<FdInput>& fds, OpenConfig openConfig);
#endif

/** Return the filename of the zim file.
@@ -576,28 +721,6 @@ namespace zim
*/
void setDirentCacheMaxSize(size_t nbDirents);

/** Get the size of the dirent lookup cache.
*
* The returned size returns the default size or the last set size.
* This may not correspond to the actual size of the dirent lookup cache.
* See `set_dirent_lookup_cache_max_size` for more information.
*
* @return The maximum number of sub ranges created in the lookup cache.
*/
size_t getDirentLookupCacheMaxSize() const;

/** Set the size of the dirent lookup cache.
*
* Contrary to other `set_<foo>_cache_max_size`, this method is useless once
* the lookup cache is created.
* The lookup cache is created at first access to a entry in the archive.
* So this method must be called before any access to content (including metadata).
* It is best to call this method first, just after the archive creation.
*
* @param nbRanges The maximum number of sub ranges created in the lookup cache.
*/
void setDirentLookupCacheMaxSize(size_t nbRanges);

#ifdef ZIM_PRIVATE
cluster_index_type getClusterCount() const;
offset_type getClusterOffset(cluster_index_type idx) const;
4 changes: 2 additions & 2 deletions include/zim/entry.h
@@ -39,7 +39,7 @@ namespace zim
class LIBZIM_API Entry
{
public:
explicit Entry(std::shared_ptr<FileImpl> file_, entry_index_type idx_);
explicit Entry(std::shared_ptr<const FileImpl> file_, entry_index_type idx_);

bool isRedirect() const;
std::string getTitle() const;
@@ -84,7 +84,7 @@ namespace zim
entry_index_type getIndex() const { return m_idx; }

protected: // so that Item can be implemented as a wrapper over Entry
std::shared_ptr<FileImpl> m_file;
std::shared_ptr<const FileImpl> m_file;
entry_index_type m_idx;
std::shared_ptr<const Dirent> m_dirent;
};
5 changes: 1 addition & 4 deletions include/zim/item.h
@@ -38,9 +38,6 @@ namespace zim
*/
class LIBZIM_API Item : private Entry
{
public: // types
typedef std::pair<std::string, offset_type> DirectAccessInfo;

public: // functions
std::string getTitle() const { return Entry::getTitle(); }
std::string getPath() const { return Entry::getPath(); }
@@ -84,7 +81,7 @@ namespace zim
* If it is not possible to have direct access for this item,
* return a pair of `{"", 0}`
*/
DirectAccessInfo getDirectAccessInformation() const;
zim::ItemDataDirectAccessInfo getDirectAccessInformation() const;

entry_index_type getIndex() const { return Entry::getIndex(); }

7 changes: 2 additions & 5 deletions include/zim/search.h
@@ -26,7 +26,6 @@
#include "archive.h"
#include <vector>
#include <string>
#include <map>

namespace Xapian {
class Enquire;
@@ -48,10 +47,8 @@ class SearchResultSet;
* A Searcher is mainly used to create new `Search`
* Internally, this is mainly a wrapper around a Xapian database.
*
* You should consider that all search operations are NOT threadsafe.
* It is up to you to protect your calls to avoid race competition.
* However, Searcher (and subsequent classes) do not maintain a global/share state.
* You can create several Searchers and use them in different threads.
* All search operations (with the exception of SearchIterator) are thread safe.
* You can freely create several Search objects from one Searcher and use them in different threads.
*/
class LIBZIM_API Searcher
{
7 changes: 6 additions & 1 deletion include/zim/search_iterator.h
@@ -25,7 +25,6 @@
#include <memory>
#include <iterator>
#include "entry.h"
#include "archive.h"
#include "uuid.h"

namespace zim
@@ -35,6 +34,12 @@ class SearchResultSet;
/**
* An iterator over search results (each result is an Entry)
*
* SearchIterator is mostly thread safe:
* - Manipulating the iterator itself (incrementing it, ...) is not thread safe.
* You should not share an iterator between different threads (and you probably have no use case for that).
* - Reading from two iterators (getPath, ...) in two different threads is OK.
* (i.e. you can pass an iterator from one thread to another)
*
* Be aware that the referenced/pointed Entry is generated and stored
* in the iterator itself.
* Once the iterator is destructed or incremented/decremented, you must NOT
38 changes: 38 additions & 0 deletions include/zim/zim.h
@@ -23,6 +23,7 @@
#define ZIM_ZIM_H

#include <cstdint>
#include <string>

#ifdef __GNUC__
#define DEPRECATED __attribute__((deprecated))
@@ -135,6 +136,43 @@
*/
COUNT
};

/**
* Information needed to directly access an item's data, bypassing the libzim library.
*
* Some items may have their data stored uncompressed in the zim archive.
* In such a case, a user can read the item data directly by (re)opening the file and
* seeking to the right offset.
*/
struct ItemDataDirectAccessInfo {

/**
* The filename to open.
*/
std::string filename;

/**
* The offset to seek to before reading.
*/
offset_type offset;

explicit ItemDataDirectAccessInfo()
: filename(),
offset()
{}

ItemDataDirectAccessInfo(const std::string& filename, offset_type offset)
: filename(filename),
offset(offset)
{}

/**
* Return whether the ItemDataDirectAccessInfo is valid.
*/
bool isValid() const {
return !filename.empty();
}
};
}

#endif // ZIM_ZIM_H
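
Reviewer note: the struct comment describes reading item data by reopening the file and seeking to the offset. The following is a minimal standalone sketch of that idea, assuming `offset_type` is a 64-bit unsigned integer and that the caller already knows how many bytes to read (the struct itself carries no size). `readDirect` and `directReadWorks` are hypothetical helper names, and the struct is a copy of the one in this diff so that the snippet compiles without libzim.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// ASSUMPTION: offset_type is a 64-bit unsigned integer.
using offset_type = std::uint64_t;

// Standalone copy of the struct from the diff.
struct ItemDataDirectAccessInfo {
  std::string filename;
  offset_type offset;

  ItemDataDirectAccessInfo() : filename(), offset() {}
  ItemDataDirectAccessInfo(const std::string& filename, offset_type offset)
    : filename(filename), offset(offset) {}

  bool isValid() const { return !filename.empty(); }
};

// Hypothetical helper: reads `size` bytes starting at info.offset.
// Returns an empty string if direct access is not possible or the read fails.
std::string readDirect(const ItemDataDirectAccessInfo& info, std::size_t size) {
  if (!info.isValid()) return std::string();
  std::ifstream in(info.filename, std::ios::binary);
  if (!in) return std::string();
  in.seekg(static_cast<std::streamoff>(info.offset));
  std::string data(size, '\0');
  in.read(&data[0], static_cast<std::streamsize>(size));
  if (static_cast<std::size_t>(in.gcount()) != size) return std::string();
  return data;
}

// Self-check on a scratch file: 6 bytes of header followed by 7 bytes of payload.
bool directReadWorks() {
  const std::string path = "direct_access_demo.bin";  // arbitrary scratch file name
  {
    std::ofstream out(path, std::ios::binary);
    out << "HEADERpayload";
  }
  return readDirect(ItemDataDirectAccessInfo(path, 6), 7) == "payload"
      && readDirect(ItemDataDirectAccessInfo(), 7).empty();
}
```

Checking `isValid()` first mirrors the documented contract: a default-constructed value (empty filename) signals that direct access is not available for the item.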