Description
Platform
- SeqAn version: 3.0.2
- Operating system: Linux ubuntuvb 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Compiler: GCC 9.2.1
Question
I have a question regarding the new FM index implementation when trying to upgrade from SeqAn2 to SeqAn3.
When using the SeqAn FM index data structure with my own algorithm on top of it, it was previously possible to obtain the position of a node in the underlying suffix tree and use it later to access the same node. This is necessary, for example, when writing interim states of the algorithm to disk and reading them back later to continue the index search.
In SeqAn2, I used the following way to achieve this:
// Index config
typedef seqan::FastFMIndexConfig<void, uint64_t,2 ,1> FMIConfig;
// Index type
typedef seqan::Index<seqan::StringSet<seqan::DnaString>, seqan::FMIndex<void, FMIConfig> > FMIndex;
// Vertex descriptor
typedef seqan::Iter<FMIndex,seqan::VSTree<seqan::TopDown<seqan::Preorder>>>::TVertexDesc FMVertexDescriptor;
// [...]
// Build index and do some search
// [...]
// Now I can simply use the FMVertexDescriptor to store the current index position of the algorithm:
// vDesc is an instance of FMVertexDescriptor
std::vector<char> data(sizeof(FMVertexDescriptor)); // reserve room for the descriptor bytes before copying
char* d = data.data();
memcpy(d, &vDesc, sizeof(FMVertexDescriptor));
// [...]
// And do the same to create a new vertex descriptor and continue the algorithm
void deserialize(char * d)
{
FMVertexDescriptor vDesc;
memcpy(&vDesc, d, sizeof(FMVertexDescriptor));
bytes += sizeof(FMVertexDescriptor);
//[...]
}
When I tried to use the SeqAn3 FM index for the same thing, I realized that in principle this should be possible using the seqan3::fm_index_cursor, which contains the node used for searching the index. Like this (minimal example):
// Assuming index_t is the index type used
index_t index;
// [...]
// build index
// [...]
seqan3::fm_index_cursor<index_t> cursor(index);
cursor.extend_right('G'_dna5);
// works fine so far.
// How could I now serialize the cursor, write it to a file,
// and load it later again to create a new cursor and continue search?
My question: As far as I can see, there is no way to access the private members node, parent_lb, parent_rb and sigma of seqan3::fm_index_cursor<index_t> in order to store the cursor location. It also does not seem possible to create a cursor at a previously computed position, e.g. via a constructor that takes the members mentioned above. Is there currently any way to serialize or otherwise obtain the underlying values of an FM index cursor position and create a new cursor from them later? Or is there any other way to perform one part of the search, store the current state, and continue the search later with a new cursor instance?
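To make the intended round trip concrete, here is a minimal, library-independent sketch of the byte-level store and restore I rely on in SeqAn2. The struct CursorState and its fields are hypothetical placeholders, not SeqAn types; they only stand in for whatever trivially copyable state a cursor would need to expose.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a vertex descriptor / cursor position.
// The field names are illustrative only and do not mirror any SeqAn type.
struct CursorState
{
    std::uint64_t lb;    // assumed: left bound of the current interval
    std::uint64_t rb;    // assumed: right bound of the current interval
    std::uint64_t depth; // assumed: length of the matched query so far
};

// A raw memcpy round trip is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable_v<CursorState>,
              "CursorState must be trivially copyable");

// Append the raw bytes of `state` to the end of `buffer`.
void serialize(CursorState const & state, std::vector<char> & buffer)
{
    std::size_t const offset = buffer.size();
    buffer.resize(offset + sizeof(CursorState));
    std::memcpy(buffer.data() + offset, &state, sizeof(CursorState));
}

// Read one state from `data` and return a pointer just past the bytes read.
char const * deserialize(char const * data, CursorState & state)
{
    std::memcpy(&state, data, sizeof(CursorState));
    return data + sizeof(CursorState);
}
```

Several states can be appended to the same buffer and read back in order by chaining the pointer returned from deserialize. The raw bytes depend on the platform's endianness and struct layout, so this is only meant for files written and read by the same build on the same machine.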