scanoss-qg
Contributor

@scanoss-qg scanoss-qg commented Feb 23, 2022

This PR relates to Add SCANOSS Agent #2166
The agent detects licenses by querying file information from the osskb database.

Description

This agent fingerprints every file in the upload and queries information about each one. Only license findings are kept; all other information is discarded. The scan results are stored in the license_file table.

Changes

This is the initial commit, which only reports license information. Other useful information, such as copyright and version, could be included later.

How to test

Just do an upload and check "SCANOSS Toolkit" in the optional analysis list.

@scanoss-qg
Contributor Author

We have updated the mod_deps file to install the library dependencies. Please verify the build process again.

@ag4ums ag4ums self-assigned this Mar 8, 2022
@github-actions

This pull request has conflicts, please rebase with master to resolve those before we can evaluate the pull request.

@github-actions github-actions bot added has merge conflicts PR to be rebased and removed has merge conflicts PR to be rebased labels Mar 14, 2022
@scanoss-qg scanoss-qg force-pushed the scanoss-qg/2166/scanoss-agent branch from 435364e to 9b09ce1 Compare March 14, 2022 14:46
@scanoss-qg
Contributor Author

scanoss-qg commented Mar 14, 2022

Done with the rebase! If I can be of assistance, please do not hesitate to contact me @ag4ums

@scanoss-qg scanoss-qg force-pushed the scanoss-qg/2166/scanoss-agent branch from 9b09ce1 to 0941e8f Compare March 14, 2022 21:24
@ag4ums
Contributor

ag4ums commented Mar 21, 2022

@scanoss-qg, I just tested this PR and have a few questions.
The scanner stamps the files with a license, which is great.
However, the user has no way to identify or verify why a particular file is marked with a particular license (maybe I am missing something).
Also, I think a snippet scanner should show the matched snippets found in different files; I don't see anything similar here.

@scanoss-qg
Contributor Author

scanoss-qg commented Mar 21, 2022

Dear @ag4ums,

As far as we understand, FOSSology takes a single field of data from the agent: the license name. We can of course provide a comprehensive array of matching data (matched line ranges, component name, vendor, PURL, download URL, etc.). However, providing any of this extra metadata would require an update to FOSSology's data model.

Having said that, the scope of this first agent is to let you detect a license on a file that does not necessarily contain a license header. Unlike all other agents, SCANOSS obtains the license by comparing the file (or its snippets) against our Open Source Knowledge.

We understand that this initial agent already brings to FOSSology a capability not available until now. Would it make sense to merge this contribution and then discuss in detail any further integration?

It would be nice to hear your feedback

Best Regards,

@ag4ums
Contributor

ag4ums commented Mar 22, 2022

Hello @scanoss-qg, let me discuss this with the community members, as I think this feature in its current form is not mergeable.

  • Snippet findings need a dedicated page with consolidated information (the copyright page or the Software Heritage page are examples).
  • You could also consider filling in the package information on the conf page; many of the fields already exist but currently have to be filled in manually.

@scanoss-qg scanoss-qg force-pushed the scanoss-qg/2166/scanoss-agent branch from 732b1f9 to 6cfcb88 Compare March 25, 2022 19:10
@scanoss-qg
Contributor Author

Hi again @ag4ums:
We went through a second round with the agent. First of all, we have changed the agent name to avoid confusion. We have also added the matching file information to the [Info page]. There you can now find the purl, download URL, remote matching file, line ranges (in the case of a snippet match), and path within the original project. Our free service does not currently retrieve copyrights, so we have not included them in this agent version. However, we have tested the idea on an internal agent version (a tab between the scancode findings and the FOSSology findings).
As for automatically filling in the information in the conf section, I suggest leaving that for the next agent version. It can undoubtedly be done. We look forward to hearing your suggestions and comments.

Contributor Author

@scanoss-qg scanoss-qg left a comment

It seems all changes are there

@ag4ums
Contributor

ag4ums commented Mar 31, 2022

> It seems all changes are there

looking into it....

@scanoss-qg
Contributor Author

Hi @ag4ums, sorry for the insistence. Did you have a chance to take a look at the new version of the agent? Once this basic functionality is merged, we can continue adding more features. Thanks in advance!

@ag4ums
Contributor

ag4ums commented Apr 13, 2022

> Hi @ag4ums, sorry for the insistence. Did you have a chance to take a look at the new version of the agent? Once this basic functionality is merged, we can continue adding more features. Thanks in advance!

Hi @scanoss-qg, I have looked into it. Many thanks for this PR. As of now it works like the other agents that find the licenses of a file; the snippet finding information is still missing, perhaps planned for a future implementation. What would help here is a plan of all the features you intend to implement, perhaps as an issue with some milestones. You could also consider joining our community meeting to discuss it (the link is in the wiki). In the current situation I would like to merge this PR as an experimental feature until a basic snippet scanning feature is included. We can discuss this once you have created the issues/feature list.

Comment on lines 35 to 27
  'scancode' => 'S',
  'spasht' => 'Sp',
- 'reso' => 'Rs'
+ 'reso' => 'Rs',
+ 'scanoss' => 'Sc'
Member

@ag4ums should the scancode be renamed to Sc and scanoss to So?

Member

/cc @ag4ums

Contributor Author

@scanoss-qg scanoss-qg left a comment

Looks good

Contributor Author

@scanoss-qg scanoss-qg left a comment

All suggestions have been accepted and applied. The SSL and OpenSSL dependencies have been removed; we now use the GCrypt library instead.
We have also added a README.md as a short guide.

@shaheemazmalmmd
Member

@scanoss-qg Can you please update the commit messages as per CONTRIBUTING.md. Also please squash the commits to one.

@scanoss-qg scanoss-qg force-pushed the scanoss-qg/2166/scanoss-agent branch from d5a39af to 009f32b Compare May 24, 2022 13:54
@scanoss-qg
Contributor Author

@shaheemazmalmmd @GMishx @ag4ums I think we have addressed all the suggestions. Please let us know if you need further information or modifications.

case "$DISTRO" in
Debian|Ubuntu)
apt-get $YesOpt install \
libcurl4 libcurl4-gnutls-dev libssl-dev jq
Member

Can the openssl dependencies be replaced with libgcrypt20-dev?

Contributor Author

libssl dependency was removed from src and makefile. I will remove the dependency from mod_deps too

Member

@ag4ums , please check the changes before merging PR.

@ag4ums
Contributor

ag4ums commented Jun 21, 2022

@scanoss-qg, is the scanning working in general? @shaheemazmalmmd, @GMishx
I am getting the error below. Although it shows that the scanoss agent ran successfully, it doesn't show any findings.
From the geeky scan details:
[screenshot]

@scanoss-qg
Contributor Author

Hi @ag4ums! It seems to be an SQL issue caused by an unclean uninstall. I will get back to you soon.

@scanoss-qg
Contributor Author

Hi @ag4ums, as I said before, the issue seems to be related to an unclean installation. In any case, I have modified the code so it can recover from this kind of situation. Please let me know if you find anything strange; perhaps we could schedule a meeting to finally close this ticket.

@GMishx GMishx self-assigned this Jul 1, 2022
@GMishx
Member

GMishx commented Jul 1, 2022

When I first ran the agent, it sent the following line to the scheduler:
2022-07-01 13:26:58 scheduler [31847] :: JOB[-23].scanoss[31918.localhost]: "row number 0 is out of range 0..-1"

While the agent was running, multiple warnings and even some segmentation faults were generated:

2022-07-01 14:52:09 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "Segmentation fault"
2022-07-01 14:52:09 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "parse error: Unfinished JSON term at EOF at line 2, column 0"
2022-07-01 14:52:09 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "parse error: Unfinished JSON term at EOF at line 2, column 0"
2022-07-01 14:52:09 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "parse error: Objects must consist of key:value pairs at line 5, column 1"
2022-07-01 14:52:09 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "parse error: Objects must consist of key:value pairs at line 5, column 1"

The scanner finally failed after scanning 325/2086 files, with the following messages:

2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "parse error: Unfinished JSON term at EOF at line 2, column 0"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.293: Snippet scan: failed to dump scan results"
2022-07-01 15:01:29 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: "ERROR snippet_scan.c.253: Snippet scan: failed to run scan "
2022-07-01 15:01:30 scanoss [0] :: JOB[4042].scanoss[36976.localhost]: agent was killed by signal: 11.Segmentation fault

@GMishx
Member

GMishx commented Jul 1, 2022

I ran the agent from the command line to capture the core dump and found the following issue:

$ echo "321" | ./src/scanoss/agent/scanoss --userID=3 --groupID=3 --scheduler_start
VERSION: "4.1.0.17"

OK
FATAL snippet_scan.c.206: Snippet_scan: The scan throws an invalid result
Segmentation fault (core dumped)

$ gdb src/scanoss/agent/scanoss /var/crash/core-scanoss-11-1000-1000-41585-1656668103
#0  0x00007f44bd392217 in fclose () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000562285fb5b53 in scanTempFile (key=16310) at snippet_scan.c:209
#2  0x0000562285fb5ef2 in ProcessUpload (upload_pk=upload_pk@entry=321) at snippet_scan.c:369
#3  0x0000562285fb4a4d in main (argc=<optimized out>, argv=<optimized out>) at main.c:248

Here: https://github.com/fossology/fossology/pull/2167/files#diff-35a047fd2940234261f524e719c1efcc8e80a90c71e3fdf7dd51d37b1809550cR204-R209

If f == NULL, fclose(f) is undefined behavior and crashes, as the backtrace shows.
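
A minimal sketch of the guard, assuming `scanTempFile` opens its result file with `fopen`. The function names `close_if_open` and `count_result_bytes` are hypothetical and not taken from the PR. The idea is to check the stream before using or closing it, so a failed open is reported to the caller instead of crashing:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the PR): closes a stream only if it was
 * actually opened. Returns -1 when there is nothing to close. */
int close_if_open(FILE *f)
{
    if (f == NULL) {
        return -1;
    }
    return fclose(f);
}

/* Hypothetical reader: returns the number of bytes in the result file,
 * or -1 if the file could not be opened. */
long count_result_bytes(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        /* Report the failure to the caller; never pass NULL to fclose. */
        return -1;
    }

    long count = 0;
    while (fgetc(f) != EOF) {
        count++;
    }

    close_if_open(f);
    return count;
}
```

With this shape, a missing or unreadable result file yields -1 and the caller can log the error and skip the file.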

@GMishx
Member

GMishx commented Jul 1, 2022

The segmentation faults in the job view come from the following:

Core was generated by `scanner /tmp/scanoss/scanoss.tmp -o /tmp/scanoss/scanoss.tmp.json -H https://os'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005579255f5df1 in scan_request_by_chunks (s=0x557926bbe2e0) at src/scanner.c:384
384                 *last_bracket = ','; //replace } by ,
(gdb) bt
#0  0x00005579255f5df1 in scan_request_by_chunks (s=0x557926bbe2e0) at src/scanner.c:384
#1  0x00005579255f73bc in scanner_recursive_scan (scanner=0x557926bbe2e0, wfp_only=false) at src/scanner.c:806
#2  0x00005579255f4c58 in main (argc=8, argv=0x7ffc80b5f828) at src/main.c:184
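
The crash at scanner.c:384 writes through `last_bracket`. A plausible cause, consistent with the "Unfinished JSON term at EOF" errors in the scheduler log, is that the pointer is NULL because the response contained no closing brace. This is an assumption, since the surrounding scanner code is not shown in this thread. A minimal sketch of a guarded version, using the hypothetical helper name `replace_last_brace_with_comma`:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from scanner.c): replaces the final '}' of a
 * JSON chunk with ',' so that another chunk can be appended.
 * Returns false, leaving the buffer untouched, if there is no '}'. */
bool replace_last_brace_with_comma(char *json)
{
    assert(json != NULL);

    char *last_bracket = strrchr(json, '}');
    if (last_bracket == NULL) {
        /* Empty or truncated response: nothing to patch. */
        return false;
    }

    *last_bracket = ','; /* replace } by , */
    return true;
}
```

The caller can then treat a `false` return as a failed chunk and log it rather than dereferencing a NULL pointer.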

@scanoss-qg
Contributor Author

scanoss-qg commented Jul 1, 2022 via email

@GMishx
Member

GMishx commented Jul 1, 2022

> Thanks for your feedback, I really appreciate it.
> Can you share with me the project you are trying to scan?

I scanned logback-v_1.1.7.

> Core was generated by `scanner /tmp/scanoss/scanoss.tmp -o /tmp/scanoss/scanoss.tmp.json -H https://os'. As far as I can see, there is a problem there with the URL of the API. Could you check the value in the Sysconfig table of the DB where variablename=ScAPIURL?

The URL is correct (https://osskb.org/scan/direct). I think gdb trimmed the string to fit in the window.

> I will continue working on this today; any information is welcome.

Sure thanks.

@GMishx
Member

GMishx commented Jul 1, 2022

While you are working on it, @scanoss-qg, I would recommend adding the following changes for Debian packaging as well:

diff --git a/debian/control b/debian/control
index 727dd2899..e50afd49b 100644
--- a/debian/control
+++ b/debian/control
@@ -8,7 +8,7 @@ Build-Depends: debhelper (>=9~), libglib2.0-dev, libmagic-dev, libxml2-dev,
  libboost-program-options-dev, libjsoncpp-dev, libjson-c-dev, libpq-dev,
  php7.0-cli|php7.2-cli|php7.3-cli|php7.4-cli, php-mbstring, php-zip,
  php-xml, libboost-system-dev, libboost-filesystem-dev, libgcrypt20-dev,
- composer
+ composer, libcurl4, libcurl4-gnutls-dev
 Standards-Version: 3.9.1
 Homepage: https://fossology.org
 
@@ -392,6 +392,19 @@ Description: architecture to fetch license, copyright information from scancode.
  .
  This package contains the scancode agent programs and their resources.
 
+Package: fossology-scanoss
+Architecture: any
+Depends: fossology-common, fossology-ununpack, fossology-wgetagent,
+ ${shlibs:Depends}, ${misc:Depends}
+Description: architecture to fetch license and snippet information from scanoss.
+ The FOSSology project is a web based framework that allows you to
+ upload software to be picked apart and then analyzed by software agents
+ which produce results that are then browsable via the web interface.
+ Existing agents include license analysis, metadata extraction, and MIME
+ type identification.
+ .
+ This package contains the scanoss agent programs and their resources.
+
 Package: fossology-spasht
 Architecture: any
 Depends: fossology-common, ${shlibs:Depends}, ${misc:Depends}
diff --git a/src/scanoss/mod_deps b/src/scanoss/mod_deps
index c267a79da..e3c3e76c9 100755
--- a/src/scanoss/mod_deps
+++ b/src/scanoss/mod_deps
@@ -2,7 +2,7 @@
 ######################################################################
 # SCANOSS Agent for FOSSLogy
 # Copyright (C) 2018-2022 SCANOSS.COM
-# 
+#
 #  This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 2 of the License, or
@@ -77,7 +77,7 @@ if [[ $BUILDTIME ]]; then
   case "$DISTRO" in
     Debian|Ubuntu)
       apt-get $YesOpt install \
-        libcurl4 libcurl4-gnutls-dev libssl-dev jq
+        libcurl4 libcurl4-gnutls-dev jq
       ;;
     RedHatEnterprise*|CentOS|Fedora)
       yum $YesOpt install \
@@ -91,7 +91,7 @@ if [[ $RUNTIME ]]; then
   case "$DISTRO" in
     Debian|Ubuntu)
       apt-get $YesOpt install \
-        libcurl4 libcurl4-gnutls-dev libssl-dev jq
+        libcurl4 libcurl4-gnutls-dev jq
       ;;
     RedHatEnterprise*|CentOS|Fedora)
       yum $YesOpt install \

@scanoss-qg
Contributor Author

scanoss-qg commented Jul 1, 2022 via email

@scanoss-qg
Contributor Author

scanoss-qg commented Jul 1, 2022 via email

@GMishx
Member

GMishx commented Jul 2, 2022

The scancode wrapper runs similarly:

string command =
"PYTHONPATH='/home/" + projectUser + "/pythondeps/' " +
"SCANCODE_CACHE=" + cacheDir + "/scancode " + // Use fossology's cache
"/home/" + projectUser + "/pythondeps/bin/scancode -" +
state.getCliOptions() +
" --custom-output - --custom-template scancode_template.html " +
file.getFileName() + " --quiet " +
((state.getCliOptions().find('l') != string::npos) ? " --license-text --license-score " + to_string(MINSCORE): "");

And installs the agent as a pip package:

# Install pip dependencies together.
# Using --target causes conflict in bin dir, using --upgrade overwrites it.
# See https://github.com/pypa/pip/issues/8063
# Dependencies for scancode
su $TARGETUSER -c 'python3 -m pip install --target=$HOME/pythondeps --no-input setuptools wheel'
###########################################################################
if [[ $EXPERIMENTAL ]]; then
# Include experimental dependencies
su $TARGETUSER -c 'PYTHONPATH="$HOME/pythondeps" python3 -m pip install --target=$HOME/pythondeps --no-input --upgrade spacy pandas scancode-toolkit'
su $TARGETUSER -c 'PYTHONPATH="$HOME/pythondeps" python3 -m spacy download en_core_web_sm --target=$HOME/pythondeps'
else
# Only non-experimental dependencies
su $TARGETUSER -c 'PYTHONPATH="$HOME/pythondeps" python3 -m pip install --target=$HOME/pythondeps --no-input --upgrade scancode-toolkit'
fi

Just a heads up with the problems we are facing with scancode:

  1. The agent calls scancode cli once per file.
  2. scancode has a significant bootstrap time.
  3. This results in the agent running 3-4 times slower in magnitudes compared to cli.

If scanoss does not have such bootstrap time, it is possible to do it.
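
The effect of points 1 and 2 can be sketched as simple arithmetic. The function names and the timing values in the note after the code are hypothetical, chosen only for illustration; they are not measurements of scancode or scanoss.

```c
#include <assert.h>

/* Total time when the CLI is started once per file:
 * the bootstrap cost is paid again for every file. */
long per_file_invocation_ms(long files, long bootstrap_ms, long scan_ms)
{
    assert(files >= 0 && bootstrap_ms >= 0 && scan_ms >= 0);
    return files * (bootstrap_ms + scan_ms);
}

/* Total time when the CLI is started once for the whole upload:
 * the bootstrap cost is paid a single time. */
long single_invocation_ms(long files, long bootstrap_ms, long scan_ms)
{
    assert(files >= 0 && bootstrap_ms >= 0 && scan_ms >= 0);
    return bootstrap_ms + files * scan_ms;
}
```

For example, with 2086 files, an assumed 2000 ms bootstrap, and an assumed 50 ms of scanning per file, the per-file approach costs 4276300 ms while a single invocation costs 106300 ms. The gap shrinks toward nothing as the bootstrap time approaches zero.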

What do you think about the suggestion @ag4ums?

@ag4ums
Contributor

ag4ums commented Jul 5, 2022

Completely agree, @GMishx. Bootstrap time is the issue we see here; if we can somehow reduce it for Python packages, this should be fine.

@scanoss-qg
Contributor Author

scanoss-qg commented Jul 5, 2022

Hi @ag4ums and @GMishx: I have fixed the reported issues. It seems that some processes were not closed when scanning big projects; that is now fixed. Regarding adding our official CLI, we would suffer from the same issue, but we could solve it by recreating the project first and then scanning it in one shot. Let's leave this improvement for the next agent release.

@GMishx
Member

GMishx commented Jul 21, 2022

Hello @scanoss-qg , this time the agent completed successfully but there were no results :-(

The identified license is empty.
[two screenshots]

Nothing in the info page either.
[screenshot]

@scanoss-qg
Contributor Author

Thanks @GMishx for your feedback! By default, we skip snippet scanning on .md files. However, the segmentation fault should not happen, and I think it is related to the JSON parsing. I will try to reproduce the error. Are you scanning the logback-v_1.1.7 project? I will make a fix ASAP. Stay tuned! :)

@scanoss-qg scanoss-qg force-pushed the scanoss-qg/2166/scanoss-agent branch from a9b3197 to 0966dda Compare July 27, 2022 12:01
@scanoss-qg
Contributor Author

Hi @GMishx. It was hard to replicate the error. It seems that some processes opened by the agent were not closed correctly. I forced the error by creating hundreds of dummy processes until the agent failed. I think the issue is now fixed. I tested with other, bigger projects and it works fine. I also added errno to the LOG_ERROR messages for debugging purposes. However, please bear in mind that some types of non-source files will be skipped (e.g. .md, xls, xml, etc.). I look forward to your feedback!
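
One common way for child processes to end up "not closed" is a `popen` without a matching `pclose`, which leaves the pipe open and the child unreaped. Whether the agent launches the scanner with `popen` is an assumption, and the helper name `run_and_reap` is hypothetical. A minimal sketch that always reaps the child and reports errno on failure:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper (not from the PR): runs a shell command, discards
 * its output, and returns its exit code. Returns -1 if the command could
 * not be started or did not exit normally. */
int run_and_reap(const char *command)
{
    assert(command != NULL);

    FILE *out = popen(command, "r");
    if (out == NULL) {
        fprintf(stderr, "popen failed: %s\n", strerror(errno));
        return -1;
    }

    char buf[256];
    while (fgets(buf, sizeof buf, out) != NULL) {
        /* Drain the pipe so the child never blocks on a full pipe. */
    }

    /* pclose waits for the child, so no zombie process is left behind. */
    int status = pclose(out);
    if (status == -1) {
        fprintf(stderr, "pclose failed: %s\n", strerror(errno));
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Pairing every `popen` with a `pclose` on all paths, including error paths, keeps the number of live children bounded regardless of how many files are scanned.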

@github-actions github-actions bot added the has merge conflicts PR to be rebased label Aug 1, 2022
@github-actions

github-actions bot commented Aug 1, 2022

This pull request has conflicts, please rebase with master to resolve those before we can evaluate the pull request.

@scanoss-qg scanoss-qg force-pushed the scanoss-qg/2166/scanoss-agent branch from 0966dda to 521933f Compare August 2, 2022 14:50
@github-actions github-actions bot removed the has merge conflicts PR to be rebased label Aug 2, 2022
@scanoss-qg
Contributor Author

Hi @ag4ums and @GMishx, just a quick comment (not directly related to the agent). After doing the rebase, we did a fresh installation. The post-install script fails because of an error related to the DB connection. We found that the file /usr/local/etc/fossology/Db.conf contains an SPDX header with ";".
Removing those header lines solves the problem. If you want, I can create a PR for this.

@github-actions github-actions bot added the has merge conflicts PR to be rebased label Nov 30, 2022
@github-actions

This pull request has conflicts, please rebase with master to resolve those before we can evaluate the pull request.

@juliancoccia

Sorry to bump this thread. Is there anything we can do to help you merge this PR?
Meanwhile, for users willing to build the SCANOSS agent, we have brought up a repo with instructions here:
https://github.com/scanoss/fossology

@GMishx
Member

GMishx commented Dec 19, 2022

Hello @scanoss-qg and @juliancoccia , we were waiting to release FOSSology 4.2.0 before merging this PR. Since we have released 4.2.1, we can proceed with merging the scanoss branch to master.

However, we have recently changed our build system from make to CMake, so some changes are required on the branch.

We have updated the documentation accordingly; see the following wiki pages:

  1. Install from Source
  2. Building Debian packages with CMake

Please feel free to contact us if you need support with CMake.

@scanoss-qg scanoss-qg closed this Dec 22, 2022
@scanoss-qg scanoss-qg force-pushed the scanoss-qg/2166/scanoss-agent branch from 521933f to 97d888b Compare December 22, 2022 11:37
5 participants