flowchart TB
%% Diagram direction: top-to-bottom (TB)
%% Point of Interaction
subgraph Point of Interaction
user((User))
github((Automation via GitHub))
end
%% User Scenarios
subgraph User Scenarios
cweInteractively[Interactively Assign CWEs]
cvePublish[Interactively populate CVE fields for CVE Publication]
bulkAssign[Bulk Assign CWEs]
end
%% Solutions
subgraph Solutions
cweExpert[CWE Expert<br/>🤖 NotebookLM Gemini Pro Free]
cweCvePublisher[CVE CWE Oracle]
bulkAssignSolution[Bulk Assign CWEs]
end
%% Building Blocks
subgraph Building Blocks
getContent[Get CVE Content<br/>Source of Truth 🗃️]
findDuplicates[Find CVE Description Duplicates 🛠️]
assignImpact[Assign MITRE Technical Impact<br/>Gemini FineTuned 🤖]
createVulnDesc[Create Vulnerability Description<br/>PoC. Any LLM 🤖]
getReferences[Get CVE References Content<br>Repo of refined CVE reference content 🗃️]
getReferencesCrawler[Get CVE References Crawler 🛠️<br/> Gemini 🤖]
getKeyphrases[Get CVE Vulnerability Keyphrases.<br>Repo of extracted keyphrases for CVEs 🗃️]
extractKeyphrases[Extract CVE Vulnerability Keyphrases<br>🛠️]
KeyPhraseExtractionModel[KeyPhrase Extraction Model<br>Gemini FineTuned 🤖]
KeyPhraseExtractionModelAnalyzer[KeyPhrase Extraction Model Analyzer<br>to evaluate, and change, the model output 🛠️]
checkQuality[Check Vulnerability Description quality 🛠️]
end
%% Connections between sections
github --> bulkAssign
user --> cweInteractively
user --> cvePublish
cvePublish --> cweCvePublisher
cweCvePublisher --> createVulnDesc
cweCvePublisher --> checkQuality
checkQuality --> getKeyphrases
cweInteractively --> cweExpert
bulkAssign --> bulkAssignSolution
bulkAssignSolution --> getContent
bulkAssignSolution --> getKeyphrases
bulkAssignSolution --> getReferences
getKeyphrases --> assignImpact
getReferences --> getReferencesCrawler
getKeyphrases --> extractKeyphrases
extractKeyphrases --> KeyPhraseExtractionModel
KeyPhraseExtractionModelAnalyzer --> KeyPhraseExtractionModel
%% Styling
style cweCvePublisher fill:#d3d3d3,stroke:#000,stroke-width:2,stroke-dasharray:5 5
style createVulnDesc fill:#d3d3d3,stroke:#000,stroke-width:2,stroke-dasharray:5 5
%% Clickable links
click cweInteractively "https://github.com/CyberSecAI#create-a-cwe-expert-interactively-assign-cwes" "Interactively Assign CWEs" _blank
click checkQuality "https://github.com/CyberSecAI/VulnerabilityDescriptionQualityChecker" "Check Vulnerability Description quality" _blank
click bulkAssign "https://github.com/CyberSecAI#bulk-assign-cwes" "Bulk Assign CWEs" _blank
click assignImpact "https://github.com/CyberSecAI/Mitre_technical_impact_dataset" "Assign MITRE Technical Impact" _blank
click cweExpert "https://github.com/CyberSecAI/CWE-Expert" "CWE Expert Tool" _blank
click createVulnDesc "https://github.com/CyberSecAI#create-vulnerability-description-from-advisory-patch-and-other-existing-vulnerability-information" "Vulnerability Description Tool" _blank
click bulkAssignSolution "https://github.com/CyberSecAI/top25_cwe_assign_compare" "Bulk Assign Solution Tool" _blank
click getContent "https://github.com/CyberSecAI/cve_source_of_truth" "Get CVE Content" _blank
click findDuplicates "https://github.com/CyberSecAI/cve_dedup" "Find CVE Duplicates" _blank
click getKeyphrases "https://github.com/CyberSecAI/cve_info" "Get Keyphrases" _blank
click getReferences "https://github.com/CyberSecAI/cve_info_refs" "Get CVE References" _blank
click getReferencesCrawler "https://github.com/CyberSecAI/cve_info_refs_crawler" "Get CVE References Crawler" _blank
click extractKeyphrases "https://github.com/CyberSecAI/KeyPhraseExtraction" "Extract KeyPhrase" _blank
click KeyPhraseExtractionModel "https://github.com/CyberSecAI/KeyPhraseExtractionModel" "KeyPhrase Extraction Model" _blank
click KeyPhraseExtractionModelAnalyzer "https://github.com/CyberSecAI/keyphrase_analyzer" "KeyPhrase Extraction Model Analyzer" _blank
- CVE Enrichment User Scenarios
- User Stories
- Repo Overview
Tip
See https://github.com/CyberSecAI/cve_info.
For a given CVE Description, the following is available in the JSON file for that CVE:
- description: original CVE Description
- keyphrases: Vulnerability Key Phrases extracted from the CVE Description per https://www.cve.org/Resources/General/Key-Details-Phrasing.pdf
- Optional: mitre_technical_impacts: The Impact(s) mapped to MITRE Technical Impacts per https://cwe.mitre.org/community/swa/priority.html
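A minimal sketch of reading one of these per-CVE JSON files in Python. The top-level field names come from the list above. The exact file layout, the shape of keyphrases (assumed here to be a flat mapping from keyphrase type to text), and the sample record are assumptions for illustration, not copied from the cve_info repository.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CveInfo:
    """One CVE with its original description and extracted Key Details Phrases."""
    cve_id: str
    description: str
    keyphrases: dict[str, str]
    # Optional in the data, so it defaults to an empty list.
    mitre_technical_impacts: list[str] = field(default_factory=list)


def load_cve_info(cve_id: str, raw_json: str) -> CveInfo:
    """Parse one per-CVE JSON document (layout assumed, see above)."""
    data = json.loads(raw_json)
    return CveInfo(
        cve_id=cve_id,
        description=data["description"],
        keyphrases=data.get("keyphrases", {}),
        mitre_technical_impacts=data.get("mitre_technical_impacts", []),
    )


# Hypothetical record: the product and the keyphrase type names are made up.
sample = """{
  "description": "SQL injection in the login form of Example App 1.2 allows remote attackers to read the database via the user parameter.",
  "keyphrases": {
    "weakness": "SQL injection",
    "product": "Example App",
    "version": "1.2",
    "attacker": "remote attackers",
    "impact": "read the database",
    "vector": "the user parameter"
  }
}"""

info = load_cve_info("CVE-0000-0000", sample)  # placeholder CVE ID
print(info.keyphrases["impact"])     # read the database
print(info.mitre_technical_impacts)  # [] because the optional field is absent
```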
Vulnerability Descriptions should use Key Details Phrasing because "the correct amount and type of information in a description is important".
There are many use cases for Vulnerability Descriptions:
- Assigning other CVE data, e.g. CWE, CPE, etc.
- Data analysis and research to determine salient characteristics or trends
Vulnerability Descriptions vary significantly in quality, and some descriptions contain no vulnerability information at all.
Where Key Phrases are extracted from Vulnerability Descriptions, this tends to be done ad hoc and in isolation, using tools ranging from regular expressions to language models. The resulting data is not publicly available or shared.
For all CVEs, the Key Phrases from the Vulnerability Description are available in a repository that
- is publicly available
- uses a consistent format
- allows feedback and updating
- is accurate i.e. the Key Phrases are correct for the Vulnerability Description
This increases the quality of Vulnerability Descriptions, and the associated data derived from them.
Tip
See the comments on https://github.com/orgs/CyberSecAI/projects/2/views/1?pane=issue&itemId=86534944&issue=CyberSecAI%7CCWEMap%7C34 for Proof of Concept demo examples of creating a CVE Vulnerability Description from a set of links.
Currently, Vulnerability Descriptions are generally written manually, and they often lack quality and consistency.
MITRE CVE prescribes Description formats: https://www.cve.org/Resources/General/Key-Details-Phrasing.pdf.
The Vulnogram editor (https://vulnogram.github.io/#editor) encourages and supports this format in the CVE Description:
[PROBLEMTYPE] in [COMPONENT] in [VENDOR] [PRODUCT] [VERSION] on [PLATFORMS] allows [ATTACKER] to [IMPACT] via [VECTOR]
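The template can be filled mechanically once the individual Key Details are known. This is a hypothetical helper (render_description is not part of Vulnogram or any CyberSecAI repo) that drops the optional "in [COMPONENT]", "on [PLATFORMS]" and "via [VECTOR]" phrases when they are empty; which slots count as required is an assumption of this sketch.

```python
def render_description(fields: dict[str, str]) -> str:
    """Fill the Key Details template, dropping phrases whose slot is empty.

    Treating PROBLEMTYPE, PRODUCT, ATTACKER and IMPACT as required is an
    assumption made for this sketch, not a CVE Program rule.
    """
    def get(slot: str) -> str:
        return fields.get(slot, "").strip()

    missing = [s for s in ("PROBLEMTYPE", "PRODUCT", "ATTACKER", "IMPACT") if not get(s)]
    if missing:
        raise ValueError(f"missing required slots: {missing}")

    parts = [get("PROBLEMTYPE")]
    if get("COMPONENT"):
        parts.append(f"in {get('COMPONENT')}")
    # [VENDOR] [PRODUCT] [VERSION], skipping whichever of vendor/version is empty.
    product = " ".join(p for p in (get("VENDOR"), get("PRODUCT"), get("VERSION")) if p)
    parts.append(f"in {product}")
    if get("PLATFORMS"):
        parts.append(f"on {get('PLATFORMS')}")
    parts.append(f"allows {get('ATTACKER')} to {get('IMPACT')}")
    if get("VECTOR"):
        parts.append(f"via {get('VECTOR')}")
    return " ".join(parts) + "."


# Hypothetical vulnerability details.
example = {
    "PROBLEMTYPE": "Buffer overflow",
    "COMPONENT": "the PNG parser",
    "VENDOR": "Acme",
    "PRODUCT": "Viewer",
    "VERSION": "before 2.1",
    "ATTACKER": "remote attackers",
    "IMPACT": "execute arbitrary code",
    "VECTOR": "a crafted image file",
}
print(render_description(example))
# Buffer overflow in the PNG parser in Acme Viewer before 2.1 allows remote
# attackers to execute arbitrary code via a crafted image file.
```

Raising on missing required slots, rather than emitting a partial sentence, is one way to make the easy path also the high-quality path.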
Broadly, there are two approaches:
- The user provides reference links and any information they already have. A tool generates the content for the CVE Description and other fields (in whatever format) with quality built in, and the user reviews and tweaks it as required.
- The user fills in the fields manually (in whatever format), and a tool checks them.
Notes
- My overall approach to improving CVE enrichment would be to reduce manual effort, i.e. approach 1.
- This relates to extracting data from CVE reference links to inform CWE, KeyPhrase, and MITRE Technical Impact assignment.
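As a rough illustration of the first step of extracting data from reference links, the sketch below pulls the visible text out of an already-downloaded advisory page using only Python's standard html.parser module. It is a plain-parsing stand-in, not how cve_info_refs_crawler works (the diagram shows that tool using Gemini), and fetching the page (e.g. with urllib.request) is left out.

```python
from html.parser import HTMLParser


class ReferenceTextExtractor(HTMLParser):
    """Collect visible text from a reference page, skipping script and style."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside <script> or <style>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def reference_text(html: str) -> str:
    """Return the page's visible text as one whitespace-joined string."""
    parser = ReferenceTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical advisory page.
page = ("<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
        "<body><h1>Advisory</h1><p>Heap overflow in <b>libfoo</b> 1.0.</p></body></html>")
print(reference_text(page))  # Advisory Heap overflow in libfoo 1.0.
```

The extracted text could then be passed to whatever model assigns CWE, KeyPhrases or MITRE Technical Impact.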
A user manually writes the Vulnerability Description, based on their understanding of the vulnerability and of what makes a good vulnerability description.
For all CVEs, the Vulnerability Descriptions
- follow the template
- are current, complete, correct
- can be auto-generated from existing advisory and patch information, per the Proof of Concept
- can be auto-reviewed and rated for quality
It should be easy for users to do the right thing, i.e. the easy path gives the best result. The easy path is to auto-create good Vulnerability Descriptions from:
- existing advisory and patch information, per the Proof of Concept
- user-supplied information
Tip
See https://github.com/CyberSecAI/VulnerabilityDescriptionQualityChecker for an analysis of how much information is in Vulnerability Descriptions, specifically which Key Details Phrases are present.
Vulnerability Description Quality can be rated by:
- Which Key Details Phrases are present, e.g.
  - No vulnerability information: the original version of https://nvd.nist.gov/vuln/detail/cve-2024-45346#VulnChangeHistorySection:
    "The Xiaomi Security Center expresses heartfelt thanks to Ken Gannon and Ilyes Beghdadi of NCC Group working with Trend Micro Zero Day Initiative! At the same time, we also welcome more outstanding and professional security experts and security teams to join the Mi Security Center (MiSRC) to jointly ensure the safe access of millions of Xiaomi users worldwide Life."
  - Only Impact: https://nvd.nist.gov/vuln/detail/CVE-1999-0015:
    "Teardrop IP denial of service."
  - Product and Weakness:
    - https://nvd.nist.gov/vuln/detail/CVE-2022-23264:
      "Microsoft Edge (Chromium-based) Spoofing Vulnerability"
    - https://nvd.nist.gov/vuln/detail/CVE-2022-21904:
      "Windows GDI Information Disclosure Vulnerability"
- Whether only characters from a specified character set are used, e.g. ASCII or UTF-8
- Whether HTML tags are present, e.g.
  - there are 686 <p> and </p> tag pairs, mostly in CVE-2020- CVEs and mostly in Microsoft CVEs
  - there are 10 <strong> and </strong> tag pairs
- Typos, spelling, and grammar issues
- JSON Schema conformance issues
CVEProject/cvelistV5 issues related to quality:
- CVEProject/cvelistV5#5 "118955 CVE records don't have an affected product/vendor or version"
  "118955 records have not a valid affected software in their details. With some random picks to verify, the software is only noted down in the descriptions[] fields as text, but are not set inside the containers/cna/affected Array inside the JSON file."
- CVEProject/cvelistV5#64 "A large number of CVEs do not have products and versions"
  "A large number of CVEs do not have products and versions, but corresponding versions such as fastjson can be found on the CVE website"
- CVEProject/cvelistV5#19 ""opertion" misspelling"
- CVEProject/cvelistV5#56 "HTML tag inside json"
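Issues #5 and #64 are the kind of problem a simple schema-level check can catch at publication time. A minimal sketch against the CVE JSON 5 record layout (containers.cna.affected[] with vendor, product and versions[].version). Treating "n/a" as missing is an assumption, and entries that identify software via collectionURL/packageName instead of vendor/product are not handled.

```python
# Values treated as "not provided"; including "n/a" is an assumption.
PLACEHOLDERS = {"", "n/a"}


def _missing(value) -> bool:
    return str(value or "").strip().lower() in PLACEHOLDERS


def affected_product_issues(record: dict) -> list[str]:
    """List problems in containers.cna.affected of a CVE JSON 5 record."""
    affected = record.get("containers", {}).get("cna", {}).get("affected") or []
    if not affected:
        return ["no affected entries"]
    issues = []
    for i, entry in enumerate(affected):
        # Only vendor/product is checked; collectionURL/packageName entries are ignored.
        for key in ("vendor", "product"):
            if _missing(entry.get(key)):
                issues.append(f"affected[{i}]: missing {key}")
        # all() is True for an empty list, so no versions at all also counts as missing.
        versions = entry.get("versions") or []
        if all(_missing(v.get("version")) for v in versions):
            issues.append(f"affected[{i}]: missing version")
    return issues


# Hypothetical record where the software is only named in the description.
record = {
    "cveMetadata": {"cveId": "CVE-0000-0000"},
    "containers": {"cna": {
        "affected": [{"vendor": "n/a", "product": "n/a",
                      "versions": [{"version": "n/a", "status": "affected"}]}],
        "descriptions": [{"lang": "en", "value": "Example description."}],
    }},
}
print(affected_product_issues(record))
```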
Ensuring Vulnerability Description quality is largely a manual effort, both when the Vulnerability Description is created and when the CVE is published.
A quality rating is applied to CVE Descriptions:
- if a Vulnerability Description does not meet some minimum standard, then it is flagged.
- the quality issues are listed
- an overall score is given
Users can immediately determine if their CVE Description meets the required quality rating.
Low Quality Vulnerability Descriptions are eradicated from published CVEs by
- Detecting Low Quality Vulnerability Descriptions
- Rejecting Low Quality Vulnerability Descriptions
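A minimal sketch of such a rating, combining the criteria listed earlier (Key Details coverage, HTML tags, character set) into a list of issues, a 0 to 1 score, and a flag that could drive rejection. It assumes the keyphrases have already been extracted. The keyphrase type names are borrowed from the template slots (the VulnerabilityDescriptionQualityChecker repo counts 8 types, which may differ), and the 0.1 penalty and 0.5 threshold are arbitrary illustrative choices.

```python
import re
from dataclasses import dataclass

# Hypothetical keyphrase types, borrowed from the Key Details template slots.
KEY_DETAIL_TYPES = ("problemtype", "component", "vendor", "product",
                    "version", "platforms", "attacker", "impact", "vector")
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*\b[^>]*>")


@dataclass
class QualityReport:
    score: float        # 0.0 (worst) to 1.0 (best)
    issues: list[str]   # human-readable list of quality issues
    flagged: bool       # True if below the minimum standard


def rate_description(description: str, keyphrases: dict[str, str],
                     min_score: float = 0.5) -> QualityReport:
    present = [t for t in KEY_DETAIL_TYPES if keyphrases.get(t, "").strip()]
    missing = [t for t in KEY_DETAIL_TYPES if t not in present]

    format_issues = []
    if HTML_TAG.search(description):
        format_issues.append("contains HTML tags")
    if not description.isascii():
        format_issues.append("contains non-ASCII characters")

    # Coverage of Key Details, minus an arbitrary 0.1 per formatting issue.
    score = max(len(present) / len(KEY_DETAIL_TYPES) - 0.1 * len(format_issues), 0.0)
    issues = ([f"missing key details: {', '.join(missing)}"] if missing else []) + format_issues
    return QualityReport(round(score, 2), issues, score < min_score)


report = rate_description("Teardrop IP denial of service.", {"impact": "denial of service"})
print(report.score, report.flagged)  # 0.11 True
```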
Tip
See https://github.com/cisagov/vulnrichment/issues?q=is%3Aissue%20state%3Aclosed%20author%3ACrashedmind for some auto-generated CWE assignment examples for CISA Vulnrichment.
These include
- CVE Description extracted KeyPhrases
- supporting examples of similar CVEs from the MITRE CWE Observed Examples, and MITRE CWE Top 25 examples
- Assigned CWE with the CWE mapping metadata
Per CWE Guidance
Root cause mapping is the identification of the underlying cause(s) of a vulnerability. This is best done by correlating CVE Records and/or bug or vulnerability tickets with CWE entries. Today, this is not done accurately at scale by the vulnerability management ecosystem.
Accurate root cause mapping is valuable because it directly illuminates where investments, policy, and practices can address the root causes responsible for vulnerabilities so that they can be eliminated. This applies to both industry and government decision makers. Additionally, it enables:
- Driving the removal of classes of vulnerabilities: Root cause mapping encourages a valuable feedback loop into a vendor’s SDLC or architecture design planning
- Saving money: the more weaknesses avoided in your product development, the less vulnerabilities to manage after deployment
- Trend analysis (e.g., how big of a problem is memory safety compared to other problems like injection)
- Further insight to potential “exploitability” based on root cause (e.g., command injection vulnerabilities tend to see increased adversary attention, be targeted by certain actors)
- Organizations demonstrating transparency to customers how they are targeting and tackling problems in their products
The MITRE CWE (Common Weakness Enumeration) specification is a comprehensive list of software and hardware weakness types. It categorizes weaknesses to help identify and mitigate security flaws, covering types ranging from coding errors to design flaws.
While it provides a clear framework, it is complex: the classification system is extensive, the subject matter is highly technical, and using it effectively requires a detailed understanding of security, coding practices, and risk management.
There are approximately 1000 CWEs, and the PDF version is almost 3000 pages.
People struggle with the amount of information in MITRE CWE, and may not have the understanding of security required to assign CWEs.
Today, assigning CWEs is not done accurately at scale by the vulnerability management ecosystem. It is generally done manually.
Given a Vulnerability Description and related text (e.g. from bug or vulnerability references), the CWE(s) are automatically assigned with
- the root cause and other weaknesses highlighted
- the rationale for their choice including other CVE examples from CWE Observed Examples or CWE Top25 Mappings with similar weakness(es) and CWE assignment(s).
- the chain of CWEs from root cause to follow on weaknesses
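One way to represent that output is as a small data structure holding the root cause, the chain, the rationale and the supporting examples. The class and field names below are hypothetical, not the format any CyberSecAI tool emits; the CWE-190 to CWE-787 chain in the usage is a textbook integer-overflow-to-out-of-bounds-write example rather than an assignment for a real CVE.

```python
from dataclasses import dataclass, field


@dataclass
class SupportingExample:
    cve_id: str
    cwe_id: str
    source: str  # e.g. "CWE Observed Examples" or "CWE Top 25 mappings"


@dataclass
class CweAssignment:
    cve_id: str
    root_cause: str          # CWE ID of the root-cause weakness
    chain: list[str]         # root cause first, then follow-on weaknesses
    rationale: str
    examples: list[SupportingExample] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The chain must begin at the root cause it claims to explain.
        if not self.chain or self.chain[0] != self.root_cause:
            raise ValueError("chain must start with the root-cause CWE")

    def chain_text(self) -> str:
        return " -> ".join(self.chain)


assignment = CweAssignment(
    cve_id="CVE-0000-0000",  # placeholder ID
    root_cause="CWE-190",
    chain=["CWE-190", "CWE-787"],
    rationale="Integer overflow in a size calculation leads to an out-of-bounds write.",
)
print(assignment.chain_text())  # CWE-190 -> CWE-787
```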
Tip
💁 A CWE-Expert can be built for free in a browser in less than 1 minute using the instructions and CWE Corpus files provided here.
You can ask your CWE-Expert
- any questions about CWEs
- to assign CWEs to vulnerability descriptions
See https://github.com/CyberSecAI/CWE-Expert
Demo
- YouTube video showing general CWE questions and CWE assignment for several CVEs
- Some nuanced CWE assignments
As any user, I want to assign CWEs with the assistance of an expert on the MITRE CWE specification and security:
- to be able to get answers to my questions
- to get recommendations
- to be able to provide input and feedback to the expert
Users need to have a deep understanding of the MITRE CWE specification, or need to spend time searching for the appropriate CWE information.
Users can chat with a CWE expert that provides CWE recommendations for vulnerability descriptions and answers general questions about CWEs.
User Stories are shorter, more specific requirements than User Scenarios.
As any user, I want to check CWEs for CVE Descriptions in bulk automatically, so I can then assign the correct CWEs.
Tip
A Solution Proof of Concept was implemented that asked a consensus of 3 state-of-the-art LLMs whether a population of CWEs assigned by CISA Vulnrichment was correct.
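The consensus step itself is a majority vote. A minimal sketch, assuming each model's answer has already been reduced to "correct" or "incorrect"; the judges are hypothetical callables standing in for the actual LLM calls, which are not shown.

```python
from collections import Counter
from typing import Callable

# A judge takes (description, cwe_id) and answers "correct" or "incorrect".
Judge = Callable[[str, str], str]


def consensus(verdicts: list[str]) -> tuple[str, bool]:
    """Majority vote; returns (verdict, unanimous). A tie yields "undecided"."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    (top, votes), *rest = Counter(verdicts).most_common()
    if rest and rest[0][1] == votes:
        return "undecided", False
    return top, votes == len(verdicts)


def review_cwe(description: str, cwe_id: str, judges: list[Judge]) -> tuple[str, bool]:
    """Ask every judge (e.g. 3 LLMs) and combine their answers."""
    return consensus([judge(description, cwe_id) for judge in judges])


# Stub judges and placeholder inputs in place of real model calls.
judges = [
    lambda d, c: "correct",
    lambda d, c: "correct",
    lambda d, c: "incorrect",
]
print(review_cwe("Example description.", "CWE-79", judges))  # ('correct', False)
```

With an odd number of binary verdicts a tie cannot occur; the tie branch only matters if more models or more answer categories are added. Returning whether the vote was unanimous lets non-unanimous cases be routed to a human reviewer.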
As any user, I want to know which CVE Descriptions are exact or close (fuzzy) duplicates.
The numbers:
- 237863: CVEs at the time of analysis
- 157158: Unique CVE Descriptions
- 80705: Duplicate CVE Descriptions (exact or close (fuzzy) duplicates)
- 21429: Number of duplicate groups
- 669: Largest duplicate group per plot, i.e. 669 CVE Descriptions are (almost) the same.
- 2: The most common number of duplicates per plot
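A minimal sketch of how exact and fuzzy duplicate groups like these can be formed: difflib.SequenceMatcher similarity after light normalization, plus union-find to merge matching pairs into groups. The 0.9 threshold and the technique are assumptions, not necessarily what cve_dedup does, and the pairwise comparison is O(n²), so at the scale above a blocking or embedding-based approach would be needed.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lower-case and collapse whitespace so trivial differences don't matter.
    return re.sub(r"\s+", " ", text.lower()).strip()


def duplicate_groups(descriptions: dict[str, str], threshold: float = 0.9) -> list[list[str]]:
    """Group CVE IDs whose descriptions are exact or fuzzy duplicates."""
    ids = list(descriptions)
    parent = {cve_id: cve_id for cve_id in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    texts = {cve_id: normalize(descriptions[cve_id]) for cve_id in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if texts[a] == texts[b] or SequenceMatcher(None, texts[a], texts[b]).ratio() >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for cve_id in ids:
        groups.setdefault(find(cve_id), []).append(cve_id)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Placeholder CVE IDs.
sample = {
    "CVE-0000-0001": "Windows GDI Information Disclosure Vulnerability",
    "CVE-0000-0002": "windows gdi  information disclosure vulnerability",  # exact after normalizing
    "CVE-0000-0003": "Windows GDI+ Information Disclosure Vulnerability",  # fuzzy
    "CVE-0000-0004": "Teardrop IP denial of service.",
}
print(duplicate_groups(sample))
# [['CVE-0000-0001', 'CVE-0000-0002', 'CVE-0000-0003']]
```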
classDiagram
class VulnerabilityDescriptionQualityChecker {
Analysis of KeyPhrases from published CVEs
This can be used to assess the quality of CVE Descriptions
based on how many of the 8 vulnerability keyphrase types they have.
}
class cve_info {
Vulnerability KeyPhrases extracted from published CVEs
}
class keyphrase_analyzer {
This analyzes and refines the KeyPhrases extracted by the KeyPhraseExtractionModel
to improve the dataset so it can be used to finetune a KeyPhraseExtractionModel with more and better data.
}
class KeyPhraseExtractionModel {
FineTuned LLM for extracting KeyPhrases from published CVEs
}
class cve_source_of_truth {
TBC: Single repo as source of truth
Collates info from NVD, CVEv5project
and other sources
}
class nvd_cve_data {
CVE data downloaded from
NVD to a CSV file
}
class cve_dedup {
An analysis of what CVEs are exact / fuzzy duplicates
}
class CWE_Expert {
The corpus files and instructions to build an interactive CWE Expert.
}
class CyberSecAI_github_io {
Source for a guide to using LLMs for Cybersecurity
}
VulnerabilityDescriptionQualityChecker --> cve_info
keyphrase_analyzer --> cve_info
cve_source_of_truth --> nvd_cve_data
cve_info <-- KeyPhraseExtractionModel
