Conversation

@ziadhany ziadhany commented Dec 23, 2025

@ziadhany ziadhany marked this pull request as ready for review December 24, 2025 10:47

ziadhany commented Dec 24, 2025

OSV logs for the following importers:

  • pypa_importer_v2
  • pysec_importer_v2
  • oss_fuzz_importer_v2
  • github_osv_importer_v2

osv_v2.zip

Add support to collect commits

Signed-off-by: ziad hany <[email protected]>
Use parse_advisory_data_v3 for GitHub OSV.

Signed-off-by: ziad hany <[email protected]>
Update the function docs osv_v2

Signed-off-by: ziad hany <[email protected]>
Fix CVSSv4 vector length issue

Signed-off-by: ziad hany <[email protected]>
Signed-off-by: ziad hany <[email protected]>
…es to keep ranges consistent.

Signed-off-by: ziad hany <[email protected]>
{
"reference_id": "",
"reference_type": "",
"url": "http://git.fedorahosted.org/cgit/freeipa.git/commit/?id=a1991aeac19c3fec1fdd0d184c6760c90c9f9fc9"
Contributor

Can we classify these as the commit type?

@ziadhany ziadhany Dec 29, 2025

The problem is that this is passed as part of the reference URLs, so it’s not easy to detect whether it’s a commit URL or just an article. We have already classified some of these as reference_type=commit because they were passed as a Git range.

I think one option is to improve our capabilities to parse different commit URLs in the packageurl-python library and rely on improvers like pipelines/v2_improvers/collect_commits.py to handle this case.
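
To illustrate the detection problem described above, here is a minimal sketch of a URL heuristic. It is not the packageurl-python API: the `guess_commit_hash` helper and its patterns are hypothetical, and it only covers the `/commit/<sha>` path layout and the cgit `commit/?id=<sha>` layout seen in the reference under review.

```python
import re
from urllib.parse import parse_qs, urlparse

# A full or abbreviated hexadecimal Git object id.
SHA_RE = re.compile(r"^[0-9a-f]{7,40}$")


def guess_commit_hash(url):
    """Return the commit hash embedded in `url`, or None if none is recognized.

    Hypothetical helper for illustration only; real hosting services use
    more URL layouts than the two handled here.
    """
    parsed = urlparse(url)
    segments = [segment for segment in parsed.path.split("/") if segment]

    # Layout 1: .../commit/<sha> (path-based, as used by GitHub and GitLab).
    for previous, current in zip(segments, segments[1:]):
        if previous in ("commit", "commits") and SHA_RE.match(current.lower()):
            return current.lower()

    # Layout 2: .../commit/?id=<sha> (query-based, as in the cgit URL above).
    if "commit" in segments:
        for value in parse_qs(parsed.query).get("id", []):
            if SHA_RE.match(value.lower()):
                return value.lower()

    return None
```

A URL that yields a hash could be stored with reference_type=commit, and anything else would stay a plain reference. Article URLs that merely mention a commit would return None here.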


ziadhany commented Dec 31, 2025

pypa_importer_v2

from vulnerabilities.models import AdvisoryV2
from django.db.models import Count
duplicates = (
    AdvisoryV2.objects
    .values('avid')
    .annotate(count=Count('id'))
    .filter(count__gt=1)
)
len(duplicates)
Out[5]: 0
AdvisoryV2.objects.count()
Out[6]: 3227

pysec_v2.txt

from vulnerabilities.models import AdvisoryV2
from django.db.models import Count
duplicates = (
    AdvisoryV2.objects
    .values('avid')
    .annotate(count=Count('id'))
    .filter(count__gt=1)
)
len(duplicates)
Out[5]: 0
AdvisoryV2.objects.count()
Out[6]: 3289

oss_fuzz_v2 has duplicated advisory_id values, for example OSV-2023-152:
https://github.com/search?q=repo%3Agoogle%2Foss-fuzz-vulns+OSV-2023-152&type=code
I’m not sure what the best option is in this case. Should we use the full path, such as
vulns/https:/gitlab.com/wireshark/wireshark.git/OSV-2023-152.yaml, as the advisory_id to avoid duplication?

from vulnerabilities.models import AdvisoryV2
from django.db.models import Count
duplicates = (
    AdvisoryV2.objects
    .values('avid')
    .annotate(count=Count('id'))
    .filter(count__gt=1)
)
len(duplicates)
Out[5]: 5
duplicates
Out[6]: <AdvisoryQuerySet [{'avid': 'oss_fuzz_importer_v2/OSV-2023-69', 'count': 2}, {'avid': 'oss_fuzz_importer_v2/OSV-2023-152', 'count': 2}, {'avid': 'oss_fuzz_importer_v2/OSV-2022-1108', 'count': 2}, {'avid': 'oss_fuzz_importer_v2/OSV-2023-38', 'count': 2}, {'avid': 'oss_fuzz_importer_v2/OSV-2023-49', 'count': 2}]>
AdvisoryV2.objects.count()
Out[7]: 3781
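
The full-path option raised above could look roughly like the sketch below. This is only an illustration: `advisory_id_from_path` is a hypothetical helper, and the checkout location and the `project-a`/`project-b` directories are made up for the example.

```python
from pathlib import PurePosixPath


def advisory_id_from_path(file_path, repo_root):
    """Build an advisory_id from the file's path relative to the repo root.

    Hypothetical helper: the file extension is dropped and the directory
    part is kept, so two files with the same name stay distinct.
    """
    relative = PurePosixPath(file_path).relative_to(repo_root)
    return relative.with_suffix("").as_posix()


# Two files that share a file name but live in different (made-up) directories.
first = advisory_id_from_path(
    "/tmp/oss-fuzz-vulns/vulns/project-a/OSV-2023-152.yaml", "/tmp/oss-fuzz-vulns"
)
second = advisory_id_from_path(
    "/tmp/oss-fuzz-vulns/vulns/project-b/OSV-2023-152.yaml", "/tmp/oss-fuzz-vulns"
)
```

The trade-off is that the advisory_id would no longer equal the OSV id on its own, so anything that looks advisories up by the bare OSV id would need to account for the path prefix.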

github_osv_importer_v2:

from vulnerabilities.models import AdvisoryV2
from django.db.models import Count
duplicates = (
    AdvisoryV2.objects
    .values('avid')
    .annotate(count=Count('id'))
    .filter(count__gt=1)
)
duplicates
Out[5]: <AdvisoryQuerySet []>
len(duplicates)
Out[6]: 0


scoring_elements = models.CharField(
-    max_length=150,
+    max_length=200,
Collaborator Author

Is a maximum length of 200 characters enough for CVSS v4? I think so.
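
One way to sanity-check the 200-character limit is to build a vector that sets every metric to its longest value and measure it. The sketch below does that under a loud assumption: the metric list and values reflect my reading of the CVSS v4.0 specification and should be verified against it before relying on the result.

```python
# (metric, longest value) pairs in vector order.
# ASSUMPTION: transcribed from my reading of the CVSS v4.0 spec, not from this PR.
LONGEST_METRIC_VALUES = [
    # Base metrics
    ("AV", "N"), ("AC", "L"), ("AT", "N"), ("PR", "N"), ("UI", "N"),
    ("VC", "H"), ("VI", "H"), ("VA", "H"), ("SC", "H"), ("SI", "H"), ("SA", "H"),
    # Threat metric
    ("E", "A"),
    # Environmental metrics
    ("CR", "H"), ("IR", "H"), ("AR", "H"),
    ("MAV", "N"), ("MAC", "L"), ("MAT", "N"), ("MPR", "N"), ("MUI", "N"),
    ("MVC", "H"), ("MVI", "H"), ("MVA", "H"), ("MSC", "H"), ("MSI", "S"), ("MSA", "S"),
    # Supplemental metrics ("Amber" is the longest value of U)
    ("S", "P"), ("AU", "Y"), ("R", "I"), ("V", "C"), ("RE", "H"), ("U", "Amber"),
]


def longest_cvss4_vector():
    """Return a CVSS v4.0 vector string with every metric present."""
    metrics = "/".join(f"{name}:{value}" for name, value in LONGEST_METRIC_VALUES)
    return f"CVSS:4.0/{metrics}"


def fits(vector, max_length):
    """Return True if the vector fits in a CharField of the given max_length."""
    return len(vector) <= max_length


vector = longest_cvss4_vector()
fits_old_limit = fits(vector, 150)
fits_new_limit = fits(vector, 200)
```

If the metric list is right, a fully populated vector overflows the old 150-character field but fits within 200.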


ziadhany commented Dec 31, 2025

For logs, we need to make a clear decision about which logs and data we ignore, and which ones we need to fix/support.

pysec_importer_v2:

ERROR 2025-12-31 05:03:11.688415 UTC Unsupported PyPI advisory data file: GHSA-227r-w5j2-6243.json
ERROR 2025-12-31 05:03:12.411085 UTC Unsupported PyPI advisory data file: MAL-2025-917.json

oss_fuzz_importer_v2:

Unsupported package type: {'package': {'name': 'libbpf', 'ecosystem': 'OSS-Fuzz'}, 'ranges': [{'type': 'GIT', 'repo': 'https://github.com/libbpf/libbpf', 'events': [{'introduced': '421213a052aebb0c357b6d0872d6c57f2113800d'}, {'fixed': '741277511035893c72a34df05da3b943afa747a4'}]}], 'versions': ['v0.6.0', 'v0.6.1', 'v0.7.0', 'v0.8.0', 'v0.8.1', 'v1.0.0', 'v1.0.1'], 'ecosystem_specific': {'severity': 'HIGH'}} in OSV: 'OSV-2021-1576'

github_osv_importer_v2:

Unsupported package type: {'package': {'ecosystem': 'Go', 'name': 'github.com/cortexproject/cortex'}, 'ranges': [{'type': 'ECOSYSTEM', 'events': [{'introduced': '0'}, {'last_affected': '1.9.0'}]}]} in OSV: 'GHSA-jphm-g89m-v42p'
(This appears to be a bug.) Unsupported package type: {'package': {'ecosystem': 'crates.io', 'name': 'deno'}, 'ranges': [{'type': 'ECOSYSTEM', 'events': [{'introduced': '1.5.0'}, {'fixed': '1.10.2'}]}], 'database_specific': {'last_known_affected_version_range': '<= 1.10.1'}} in OSV: 'GHSA-xpwj-7v8q-mcgj'
Unsupported package type: {'package': {'ecosystem': 'Pub', 'name': 'archive'}, 'ranges': [{'type': 'ECOSYSTEM', 'events': [{'introduced': '0'}, {'fixed': '3.3.8'}]}], 'database_specific': {'last_known_affected_version_range': '<= 3.3.7'}} in OSV: 'GHSA-r285-q736-9v95'
Unsupported package type: {'package': {'ecosystem': 'SwiftURL', 'name': 'github.com/marmelroy/Zip'}, 'ranges': [{'type': 'ECOSYSTEM', 'events': [{'introduced': '0'}, {'last_affected': '2.1.2'}]}]} in OSV: 'GHSA-g454-wj9r-jpg4'
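
To help decide which of these to ignore and which to support, the log lines can be tallied by ecosystem. The sketch below assumes the message format shown in the logs above ("Unsupported package type: {...} in OSV: '...'"); `count_unsupported_ecosystems` is a hypothetical helper, not part of the importers.

```python
import ast
import re
from collections import Counter

# Captures the Python-literal dict and the OSV id from one log line.
LOG_RE = re.compile(
    r"Unsupported package type: (?P<affected>\{.*\}) in OSV: '(?P<osv_id>[^']+)'"
)


def count_unsupported_ecosystems(lines):
    """Count 'Unsupported package type' log lines per OSV ecosystem."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # Not an "Unsupported package type" message.
        # The logged payload is a Python dict repr, so literal_eval can parse it.
        affected = ast.literal_eval(match.group("affected"))
        counts[affected["package"]["ecosystem"]] += 1
    return counts


sample_lines = [
    "Unsupported package type: {'package': {'ecosystem': 'Pub', 'name': 'archive'}, "
    "'ranges': [{'type': 'ECOSYSTEM', 'events': [{'introduced': '0'}, {'fixed': '3.3.8'}]}]} "
    "in OSV: 'GHSA-r285-q736-9v95'",
    "Unsupported package type: {'package': {'name': 'libbpf', 'ecosystem': 'OSS-Fuzz'}, "
    "'ranges': []} in OSV: 'OSV-2021-1576'",
    "ERROR Unsupported PyPI advisory data file: MAL-2025-917.json",
]
ecosystem_counts = count_unsupported_ecosystems(sample_lines)
```

Sorting the result with `ecosystem_counts.most_common()` would show which unsupported ecosystems appear most often in a run.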
