FM index cursor (de-)serialization #2044

@tloka

Question

I have a question about the new FM index implementation that came up while upgrading from SeqAn2 to SeqAn3.
When using the SeqAn FM index data structure but running my own algorithm on it, it was previously possible to obtain the node position of the underlying suffix tree and use it later to access the same node. This is necessary, for example, when writing interim states of the algorithm to disk and reading them back later to continue the index search.

In SeqAn2, I achieved this as follows:

// Index config
typedef seqan::FastFMIndexConfig<void, uint64_t, 2, 1> FMIConfig;

// Index type
typedef seqan::Index<seqan::StringSet<seqan::DnaString>, seqan::FMIndex<void, FMIConfig> > FMIndex;

// Vertex descriptor
typedef seqan::Iter<FMIndex,seqan::VSTree<seqan::TopDown<seqan::Preorder>>>::TVertexDesc FMVertexDescriptor;

// [...]
// Build index and do some search
// [...]

// Now I can simply use the FMVertexDescriptor to store the current index position of the algorithm:
// vDesc is an instance of FMVertexDescriptor
std::vector<char> data(sizeof(FMVertexDescriptor)); // size the buffer before copying into it
char* d = data.data();
memcpy(d, &vDesc, sizeof(FMVertexDescriptor));

// [...]

// And do the same to create a new vertex descriptor and continue the algorithm
void deserialize(char * d)
{
  FMVertexDescriptor vDesc;
  memcpy(&vDesc, d, sizeof(FMVertexDescriptor));
  bytes += sizeof(FMVertexDescriptor);
  //[...]
}
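
For readers who want to try the byte-copy pattern above without SeqAn2 at hand, here is a minimal self-contained sketch. `vertex_desc` and its fields are hypothetical stand-ins for `FMVertexDescriptor` (the real type has its own layout), and the `serialize`/`deserialize` helpers are illustrative names only. The approach assumes the descriptor is trivially copyable, which the `static_assert` checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for FMVertexDescriptor; not the actual SeqAn2 layout.
struct vertex_desc
{
    std::uint64_t range_begin;
    std::uint64_t range_end;
    std::uint32_t depth;
};

static_assert(std::is_trivially_copyable_v<vertex_desc>,
              "memcpy-based serialization requires a trivially copyable type");

// Append the raw bytes of v to the end of the buffer.
void serialize(std::vector<char> & buffer, vertex_desc const & v)
{
    std::size_t const old_size = buffer.size();
    buffer.resize(old_size + sizeof(vertex_desc));
    std::memcpy(buffer.data() + old_size, &v, sizeof(vertex_desc));
}

// Read one descriptor starting at offset and advance offset past it.
vertex_desc deserialize(std::vector<char> const & buffer, std::size_t & offset)
{
    assert(offset + sizeof(vertex_desc) <= buffer.size());
    vertex_desc v;
    std::memcpy(&v, buffer.data() + offset, sizeof(vertex_desc));
    offset += sizeof(vertex_desc);
    return v;
}
```

The raw bytes are only meaningful to a program built with the same type layout and endianness, and only together with the same index they were computed on.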

When I tried to do the same thing with the SeqAn3 FM index, I realized that in principle this should be possible using seqan3::fm_index_cursor, which contains the node that is used for searching the index. For example (minimal example):

// Assuming index_t is the index type used
index_t index;

// [...]
// build index
// [...]

seqan3::fm_index_cursor<index_t> cursor(index);
cursor.extend_right('G'_dna5);
// works fine so far. 

// How could I now serialize the cursor, write it to a file, 
// and load it later again to create a new cursor and continue search?

My question: As far as I can see, there is no way to access the private members node, parent_lb, parent_rb and sigma of seqan3::fm_index_cursor<index_t> in order to store the cursor location. It also does not seem possible to create a cursor at a previously computed position, e.g. through a constructor that takes the members mentioned above. Is there currently any way to serialize or obtain the underlying values of an FM index cursor position and to create a new cursor from these values later? Or is there any other way to perform one part of the search, store the current state, and continue the search later with a new cursor instance?
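
To make the request concrete, the kind of round trip being asked about could look roughly like the sketch below. Everything in it is an assumption for illustration: `cursor_state`, its fields, `save_state` and `load_state` are hypothetical and not part of SeqAn3. The field names only loosely mirror the private members mentioned above. Turning a loaded state back into a real `seqan3::fm_index_cursor` would additionally need a constructor or setter, which is exactly what the question is about.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>

// Hypothetical snapshot of a cursor position; NOT a SeqAn3 type.
struct cursor_state
{
    std::uint64_t lb;        // assumed: left bound of the current suffix array interval
    std::uint64_t rb;        // assumed: right bound of the current suffix array interval
    std::uint64_t depth;     // assumed: length of the query matched so far
    std::uint64_t parent_lb; // assumed: left bound of the parent interval
    std::uint64_t parent_rb; // assumed: right bound of the parent interval
};

static_assert(std::is_trivially_copyable_v<cursor_state>,
              "raw binary I/O requires a trivially copyable type");

// Write the state as raw bytes; returns false if the file could not be written.
bool save_state(std::string const & path, cursor_state const & s)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<char const *>(&s), sizeof(s));
    return static_cast<bool>(out);
}

// Read the state back; returns false if the file is missing or too short.
bool load_state(std::string const & path, cursor_state & s)
{
    std::ifstream in(path, std::ios::binary);
    in.read(reinterpret_cast<char *>(&s), sizeof(s));
    return static_cast<bool>(in);
}
```

A stored state like this would only be valid for the exact index it was computed on, so the index itself would have to be stored and reloaded alongside it.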
