Enhance deltacode matching #112

@chinyeungli

Description

Given two inputs (either CSV or JSON) from a scancode scan, the tool should compare them based on the path value and return pathmatch and pathscore information.

The current behavior is to compare the "full" path. A better approach would be to compare path segments.
Input A is the path for which we want to find matches.
For instance,
Input A:
/tmp/project/a/b/c/d.java
Input B:
/project/test/a/b/c/d.java

deltacode may conclude that the above two paths do not match.
Instead, deltacode should return a pathscore of 4 (because the two paths share 4 consecutive matching segments, counted from the end of the path, right to left) and let the user decide whether this is a real match.
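
A minimal sketch of the scoring described above, to make the proposal concrete. The function name `path_score` is hypothetical and not part of deltacode; the sketch also assumes paths are "/"-separated and that empty segments (such as the one before a leading slash) are ignored.

```python
def path_score(path_a, path_b):
    """Count consecutive matching segments, comparing from right to left.

    Hypothetical helper for illustration; not an existing deltacode function.
    """
    # Split on "/" and drop empty segments produced by leading/trailing slashes.
    segs_a = [s for s in path_a.split("/") if s]
    segs_b = [s for s in path_b.split("/") if s]

    score = 0
    # Walk both paths from the last segment backwards; stop at the first mismatch.
    for a, b in zip(reversed(segs_a), reversed(segs_b)):
        if a != b:
            break
        score += 1
    return score


# Input A vs. Input B from the example above.
print(path_score("/tmp/project/a/b/c/d.java", "/project/test/a/b/c/d.java"))
```

For Input A and Input B, the matching run is `d.java`, `c`, `b`, `a`, and the comparison stops at `project` vs. `test`.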

In addition, it should automatically filter the results: for each input, keep only the candidate with the highest pathscore as a match, and keep all of them if several candidates tie on that pathscore.
For instance,
Input C:
/project/c/d.java
Input D:
/tmp/test/a/b/c/d.java

Input C will automatically be ignored because it only has a pathscore of 2.
Inputs B and D should both be kept because each has a pathscore of 4.
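
The filtering step could be sketched as follows. The names `path_score` and `best_matches` are hypothetical, and dropping candidates with a score of 0 is an assumption that the issue does not state explicitly.

```python
def path_score(path_a, path_b):
    """Count consecutive matching segments from right to left (hypothetical helper)."""
    segs_a = [s for s in path_a.split("/") if s]
    segs_b = [s for s in path_b.split("/") if s]
    score = 0
    for a, b in zip(reversed(segs_a), reversed(segs_b)):
        if a != b:
            break
        score += 1
    return score


def best_matches(target, candidates):
    """Return (highest pathscore, candidates that reach it); ties are all kept.

    Assumption: a candidate with score 0 is never reported as a match.
    """
    scores = {c: path_score(target, c) for c in candidates}
    top = max(scores.values(), default=0)
    if top == 0:
        return 0, []
    # Preserve the original candidate order among the tied best matches.
    return top, [c for c in candidates if scores[c] == top]


input_a = "/tmp/project/a/b/c/d.java"
candidates = [
    "/project/test/a/b/c/d.java",  # Input B
    "/project/c/d.java",           # Input C
    "/tmp/test/a/b/c/d.java",      # Input D
]
print(best_matches(input_a, candidates))
```

With the example inputs, B and D tie on the highest score and are both returned, while C is filtered out.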
