feat(scanoss-agent): Initial version of scanoss agent for Fossology #2167
8b10b74
to
d86fd85
Compare
We have updated the mod_deps file in order to install the library dependencies. Please verify the building process again. |
This pull request has conflicts, please rebase with master to resolve those before we can evaluate the pull request. |
435364e
to
9b09ce1
Compare
Done with the rebase! If I can be of assistance, please do not hesitate to contact me @ag4ums |
9b09ce1
to
0941e8f
Compare
@scanoss-qg, I just tested this PR, I have few questions |
Dear @ag4ums, As far as we understand, FOSSology takes a single field of data from the agent: the license name. We can of course provide a comprehensive array of matching data (range of matched lines, component name, vendor, PURL, download URL, etc.). However, providing any of this extra metadata would require an update to FOSSology’s data model. Having said that, the scope of this first agent is to give you the ability to detect a license on a file that does not necessarily contain a license header. Unlike all other agents, the license provided by SCANOSS is obtained by comparing the file (or snippets) against our Open Source Knowledge. We understand that this initial agent already brings to FOSSology a capability not available until now. Would it make sense to merge this contribution and then discuss any further integration in detail? It would be nice to hear your feedback. Best Regards, |
Hello @scanoss-qg, let me discuss this with the community members, as I think this feature in its current form is not mergeable.
|
732b1f9
to
6cfcb88
Compare
Hi again @ag4ums: |
It seems all changes are there
looking into it.... |
Hi @ag4ums, sorry for the insistence. Did you have the chance to take a look at the new version of the agent? Once this basic functionality is merged, we can continue adding more features. Thanks in advance! |
Hi @scanoss-qg, I have looked into it; many thanks for this PR. As of now it is like the other agents that find the licenses of a file; the snippet-finding information is still missing, maybe it is meant for future implementations. What would be good here is a plan of all the features that you are planning to implement, maybe as an issue with some milestones. You could also consider joining our community meeting to discuss it (the link is in the wiki). In the current situation I would like to merge this PR as an experimental feature until we have a basic snippet scanning feature included. We can discuss this once you have the issues/feature list created. |
src/lib/php/Data/AgentRef.php
Outdated
  'scancode' => 'S',
  'spasht' => 'Sp',
- 'reso' => 'Rs'
+ 'reso' => 'Rs',
+ 'scanoss' => 'Sc'
@ag4ums should the scancode be renamed to Sc and scanoss to So?
/cc @ag4ums
Looks good
Guys, all suggestions have been accepted and applied. The SSL and OpenSSL dependencies have been removed; the GCrypt library is used instead.
We have also added a README.md as a short guide
@scanoss-qg Can you please update the commit messages as per CONTRIBUTING.md. Also please squash the commits to one. |
d5a39af
to
009f32b
Compare
@shaheemazmalmmd @GMishx @ag4ums I think that we have committed all the suggestions. Please let us know if you need further information or modifications. |
src/scanoss/mod_deps
Outdated
case "$DISTRO" in
Debian|Ubuntu)
apt-get $YesOpt install \
libcurl4 libcurl4-gnutls-dev libssl-dev jq
Can the openssl dependencies be replaced with libgcrypt20-dev?
libssl dependency was removed from src and makefile. I will remove the dependency from mod_deps too
@ag4ums , please check the changes before merging PR.
@scanoss-qg , is the scanning working in general? @shaheemazmalmmd , @GMishx |
Hi @ag4ums! It seems to be an SQL issue caused by a non-clean uninstall. I will get back to you soon. |
Hi @ag4ums, as I said before, the issue seems to be related to a non-clean installation. Anyway, I have modified the code so that it can recover from this kind of situation. Please let me know if you find something strange; perhaps we could schedule a meeting to finally close this ticket. |
When I first run the agent, it sends the following line to the scheduler:
Running the agent, multiple warnings and even some segmentation faults were generated:
Scanner finally failed after scanning 325/2086 files with following message:
|
I ran the agent from command line to capture the coredump and found following issue:
If f == null then fclose(f) will cause an issue. |
The segmentation faults in the job view are from below:
|
Thanks for your feedback, I really appreciate it.
Can you share with me the project you are trying to scan?
Core was generated by `scanner /tmp/scanoss/scanoss.tmp -o /tmp/scanoss/scanoss.tmp.json -H https://os'.
As far as I can see, there is a problem with the URL of the API.
Could you check the value of variablename=ScAPIURL in the Sysconfig table of the DB?
I will continue working on this today; any information is welcome.
…On Fri, Jul 1, 2022 at 6:43 AM Gaurav Mishra ***@***.***> wrote:
I ran the agent from command line to capture the coredump and found
following issue:
$ echo "321" | ./src/scanoss/agent/scanoss --userID=3 --groupID=3 --scheduler_start
VERSION: "4.1.0.17"
OK
FATAL snippet_scan.c.206: Snippet_scan: The scan throws an invalid result
Segmentation fault (core dumped)
$ gdb src/scanoss/agent/scanoss /var/crash/core-scanoss-11-1000-1000-41585-1656668103
#0 0x00007f44bd392217 in fclose () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000562285fb5b53 in scanTempFile (key=16310) at snippet_scan.c:209
#2 0x0000562285fb5ef2 in ProcessUpload ***@***.***=321) at snippet_scan.c:369
#3 0x0000562285fb4a4d in main (argc=<optimized out>, argv=<optimized out>) at main.c:248
Here:
https://github.com/fossology/fossology/pull/2167/files#diff-35a047fd2940234261f524e719c1efcc8e80a90c71e3fdf7dd51d37b1809550cR204-R209
If f == null then fclose(f) will cause issue.
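The crash pattern described above (calling fclose on a stream that was never opened) can be avoided with a guard like the following. This is only a minimal sketch, not the agent's actual code: the helper names `close_if_open` and `read_first_line` are hypothetical and do not come from snippet_scan.c.

```c
#include <stdio.h>

/* Hypothetical helper: close a stream only if it was actually opened.
 * Returns 0 when there was nothing to close, otherwise fclose()'s result. */
int close_if_open(FILE *f)
{
    if (f == NULL) {
        return 0; /* fclose(NULL) is undefined behaviour, so skip it */
    }
    return fclose(f);
}

/* Hypothetical helper: read the first line of a result file.
 * Returns 0 on success and -1 if the file cannot be opened or is empty. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1; /* report the failure; there is no stream to close */
    }
    int rc = (fgets(buf, (int)len, f) != NULL) ? 0 : -1;
    fclose(f); /* safe: f is known to be non-NULL here */
    return rc;
}
```

With a guard of this kind, a missing or unreadable result file is reported as an error to the caller instead of crashing the agent.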
|
I scanned logback-v_1.1.7.
The URL is correct (
Sure thanks. |
While you are working on it @scanoss-qg, I would recommend adding the following changes as well for Debian packaging:
diff --git a/debian/control b/debian/control
index 727dd2899..e50afd49b 100644
--- a/debian/control
+++ b/debian/control
@@ -8,7 +8,7 @@ Build-Depends: debhelper (>=9~), libglib2.0-dev, libmagic-dev, libxml2-dev,
libboost-program-options-dev, libjsoncpp-dev, libjson-c-dev, libpq-dev,
php7.0-cli|php7.2-cli|php7.3-cli|php7.4-cli, php-mbstring, php-zip,
php-xml, libboost-system-dev, libboost-filesystem-dev, libgcrypt20-dev,
- composer
+ composer, libcurl4, libcurl4-gnutls-dev
Standards-Version: 3.9.1
Homepage: https://fossology.org
@@ -392,6 +392,19 @@ Description: architecture to fetch license, copyright information from scancode.
.
This package contains the scancode agent programs and their resources.
+Package: fossology-scanoss
+Architecture: any
+Depends: fossology-common, fossology-ununpack, fossology-wgetagent,
+ ${shlibs:Depends}, ${misc:Depends}
+Description: architecture to fetch license and snippet information from scanoss.
+ The FOSSology project is a web based framework that allows you to
+ upload software to be picked apart and then analyzed by software agents
+ which produce results that are then browsable via the web interface.
+ Existing agents include license analysis, metadata extraction, and MIME
+ type identification.
+ .
+ This package contains the scanoss agent programs and their resources.
+
Package: fossology-spasht
Architecture: any
Depends: fossology-common, ${shlibs:Depends}, ${misc:Depends}
diff --git a/src/scanoss/mod_deps b/src/scanoss/mod_deps
index c267a79da..e3c3e76c9 100755
--- a/src/scanoss/mod_deps
+++ b/src/scanoss/mod_deps
@@ -2,7 +2,7 @@
######################################################################
# SCANOSS Agent for FOSSLogy
# Copyright (C) 2018-2022 SCANOSS.COM
-#
+#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
@@ -77,7 +77,7 @@ if [[ $BUILDTIME ]]; then
case "$DISTRO" in
Debian|Ubuntu)
apt-get $YesOpt install \
- libcurl4 libcurl4-gnutls-dev libssl-dev jq
+ libcurl4 libcurl4-gnutls-dev jq
;;
RedHatEnterprise*|CentOS|Fedora)
yum $YesOpt install \
@@ -91,7 +91,7 @@ if [[ $RUNTIME ]]; then
case "$DISTRO" in
Debian|Ubuntu)
apt-get $YesOpt install \
- libcurl4 libcurl4-gnutls-dev libssl-dev jq
+ libcurl4 libcurl4-gnutls-dev jq
;;
RedHatEnterprise*|CentOS|Fedora)
yum $YesOpt install \ |
Sure, I will take your advice. Thank you!
|
I would like to ask you a question that could simplify the agent. Currently, the agent wraps the deprecated version of our scanner, which is written in C and does not contain advanced features. Would it be possible to replace it with the Python client that is currently in production? I mean, just add the dependency during installation (pip3 install scanoss) and then the agent can use it directly. Or should we include and compile the scanner source code too?
This would let us simplify the code, keep the scanner updated and make scans faster. We are ready to deliver a High Precision Snippet Matching feature that brings accuracy and precision to the scan.
I'd really appreciate your feedback!
|
The scancode wrapper runs similarly: fossology/src/scancode/agent/scancode_wrapper.cc Lines 86 to 93 in 20ca3a4
And installs the agent as a pip package: fossology/install/fo-install-pythondeps Lines 163 to 176 in 20ca3a4
Just a heads up with the problems we are facing with scancode:
If scanoss does not have such bootstrap time, it is possible to do it. What do you think about the suggestion @ag4ums? |
completely agree @GMishx, bootstrap time is an issue we see here... if we can reduce the time somehow for python packages should be fine. |
Hi @ag4ums and @GMishx: I have fixed the reported issues. It seems that when scanning big projects, some processes were not closed; that is now fixed. Regarding adding our official CLI, we would suffer from the same issue, but we could solve it by recreating the project first and then scanning it in one shot. Let's leave this improvement for the next agent release. |
Hello @scanoss-qg , this time the agent completed successfully but there were no results :-( |
Thanks @GMishx for your feedback! By default, we are omitting snippet scanning on md files. However, the segmentation fault should not happen, and I think it is related to the json parsing. I will try to reproduce the error. Are you trying to scan logback-v_1.1.7 project? I will make a fix ASAP. Stay tuned! :) |
a9b3197
to
0966dda
Compare
Hi @GMishx. It was hard to replicate the error. It seems that some processes opened by the agent were not closed correctly. I forced the error by creating hundreds of dummy processes, after which the agent failed. I think the issue is now fixed. I tested with other, bigger projects and it works fine. I also added errno to the LOG_ERROR output for debugging purposes. However, please consider that some types of non-source files will be skipped (e.g. .md, .xls, .xml). I will be waiting for your feedback! |
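To illustrate the kind of leak described above, here is a minimal sketch of launching an external command and always reaping it. It assumes the agent starts the scanner through popen, which the thread does not confirm; the helper name `run_and_reap` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: run a shell command, drain its output and reap the
 * child. Returns the command's exit status, or -1 on any failure. */
int run_and_reap(const char *cmd)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL) {
        return -1; /* could not start the child process */
    }

    char buf[256];
    while (fgets(buf, sizeof buf, p) != NULL) {
        /* Output is discarded here; a real agent would parse it. */
    }

    /* pclose() waits for the child, so no zombie process is left behind. */
    int status = pclose(p);
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling pclose on every path that successfully called popen keeps the number of child processes bounded even when thousands of files are scanned.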
This pull request has conflicts, please rebase with master to resolve those before we can evaluate the pull request. |
0966dda
to
521933f
Compare
Hi @ag4ums and @GMishx, just a quick comment (not directly related to the agent). After doing the rebase, we did a fresh installation. The post-install script fails because of an error related to the DB connection. We found that the file /usr/local/etc/fossology/Db.conf contains an SPDX header with ";" |
This pull request has conflicts, please rebase with master to resolve those before we can evaluate the pull request. |
Sorry to bump this thread. Is there anything we can do to help you merge this PR? |
Hello @scanoss-qg and @juliancoccia, we were waiting to release FOSSology 4.2.0 before merging this PR. Since we have released 4.2.1, we can proceed with merging the scanoss branch to master. However, since we recently changed our build system from make to CMake, some changes are required in the branch. We have updated the documentation accordingly; it can be found in the following wikis: Please feel free to contact us if you need support with CMake. |
521933f
to
97d888b
Compare
This PR relates to Add SCANOSS Agent #2166
The agent detects licenses by querying file info from the osskb DB.
Description
This agent fingerprints all the files related to the upload and queries information about each file. Only licenses are kept; other information is discarded. The scan results are placed in the license_file table.
Changes
This is the initial commit, which only reports license information. Other useful information such as copyright and version could be included.
How to test
Just do an upload and check "SCANOSS Toolkit" in the optional analysis list.