HDLParser is a tool for collecting patch-related commits and extracting real bug fixes in hardware description languages (HDLs). It automatically collects bug-fixing commits from HDL repositories and parses the code changes of patches using hdlConvertor and GumTree. Furthermore, it can measure the redundancy of bug-fixing commits.
 
An important problem is the lack of knowledge about the characteristics of bug fixes in HDLs. Such knowledge would improve hardware developers' understanding and provide useful insights for new research directions in automated bug fixing for HDLs.
However, few studies focus on bug fixes in HDLs, which hinders the proposal of automated program repair (APR) techniques targeting HDLs. There are two main barriers. On the one hand, there is a lack of research on the characteristics of bug fixes in HDLs. On the other hand, whether the redundancy assumption still holds in HDLs has not yet been validated.
With this motivation, we propose an automated technique named HDLParser for analyzing bug fixes in HDLs. We run HDLParser to perform a fine-grained analysis of patches and to validate the redundancy assumption on bug-fixing commits. We obtain some interesting findings. All the relevant artifacts are available in this repository.
- Ubuntu 20.04
 - Python >= 3.8.0
 - hdlConvertor
 - GumTree 3.0.0
 
The parsing script uses hdlConvertor to get the AST of HDL files and then transforms the AST into XML format as the input of GumTree. The steps to use the parsing script are as follows:
- Add hdlparser/hdlparser to the system path.
- hdlparser can then be used as a standalone tool like this:
hdlparser /path/to/HDLfile
Support for HDLs can be configured by following GumTree's support for Python. The configuration files are placed in gumtree-3.0.0-SNAPSHOT.
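The AST-to-XML step described above can be illustrated with a minimal sketch. It assumes the AST has already been reduced to nested dictionaries (a hypothetical intermediate form, not hdlConvertor's own object model) and that GumTree reads nested `<tree>` elements carrying `type`, `label`, `pos` and `length` attributes, as its external parsers emit. The node type names and the function `ast_to_gumtree_xml` are illustrative only and are not taken from the HDLParser sources.

```python
import xml.etree.ElementTree as ET


def ast_to_gumtree_xml(node):
    """Recursively convert a nested-dict AST node into a <tree> element.

    Assumed node shape: {"type", "label" (optional), "pos", "length", "children"}.
    """
    elem = ET.Element("tree", {
        "type": node["type"],
        "pos": str(node.get("pos", 0)),
        "length": str(node.get("length", 0)),
    })
    # Only leaf-like nodes (identifiers, operators) carry a label.
    if node.get("label"):
        elem.set("label", node["label"])
    for child in node.get("children", []):
        elem.append(ast_to_gumtree_xml(child))
    return elem


# Hypothetical AST for the Verilog statement: assign y = a & b;
example_ast = {
    "type": "HdlStmAssign", "pos": 0, "length": 17,
    "children": [
        {"type": "HdlValueId", "label": "y", "pos": 7, "length": 1},
        {"type": "HdlOp", "label": "AND", "pos": 11, "length": 5,
         "children": [
             {"type": "HdlValueId", "label": "a", "pos": 11, "length": 1},
             {"type": "HdlValueId", "label": "b", "pos": 15, "length": 1},
         ]},
    ],
}

xml_text = ET.tostring(ast_to_gumtree_xml(example_ast), encoding="unicode")
print(xml_text)
```

Keeping the conversion independent of the parser means the same function could serve Verilog, VHDL and SystemVerilog, as long as each front end produces the same dictionary shape.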
Repositories in HDLs
- Verilog: e200_opensource, picorv32, wujian100, darkriscv, hw, amiga2000-gfxcard, verilog-ethernet, hdl, zipcpu, miaow
 - VHDL: ghdl, aws-fpga, Open-Source-FPGA-Bitcoin-Miner, FPGA_Webserver, chipwhisperer, neorv32, gplgpu, vunit, gcvideo, awesome-model-quantization
 - SystemVerilog: opentitan, swerv_eh1, MinecraftHDL, rsd, hdmi, ibex, lowrisc-chip, cv32e40p, Cores-SweRV, nontrivial-mips
 
Commands
./collect_subjects.sh
After running it, ten repositories each for Verilog, VHDL and SystemVerilog are cloned into subjects, subjects2 and subject3, respectively.
Collecting patch-related commits, parsing code changes of patches and measuring commit redundancy
./run.sh
If it executes successfully, the script performs the following steps:
- The first step computes LOC statistics, which show the number of code lines of each project.
- The second step collects bug-fix-related commits from the project repositories using bug-related keywords. It also filters out changes to test code. The results for Verilog, VHDL and SystemVerilog are stored in data, data2 and data3, respectively. Its output consists of three kinds of files:
  - Buggy version of an HDL code file containing a bug, stored in the directory "<HDLdata>/PatchCommits/Keyword/<ProjectName>/prevFiles/".
  - Fixed version of the HDL code file, stored in the directory "<HDLdata>/PatchCommits/Keyword/<ProjectName>/revFiles/".
  - Diff hunk of the code changes that fix the bug, stored in the directory "<HDLdata>/PatchCommits/Keyword/<ProjectName>/DiffEntries/".
- The third step further filters out the HDL code files that only contain non-HDL code changes (e.g. comments).
- The fourth step computes statistics on the diff hunk sizes of code changes. The results are stored in the directory "<HDLdata>/DiffentrySizes/".
- The fifth step parses the code changes of patches and computes statistics on the fine-grained code entities impacted by patches. The results are stored in the directory "<HDLdata>/ParseResults/". Meanwhile, the fix patterns are also collected and stored in the directory "<HDLdata>/ParseResults/".
- The sixth step measures the redundancy of the patch-related commits. The results are stored in the directory "<HDLdata>/ParseResults/".
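The keyword-based collection and test-code filtering in the second step can be sketched roughly as follows. This is a minimal illustration, not HDLParser's implementation: the keyword list, the file extensions, the test-path heuristics and all function names are assumptions chosen for the example.

```python
import re

# Assumed keyword prefixes; the actual list used by HDLParser may differ.
BUG_KEYWORDS = ("bug", "fix", "error", "fault", "defect", "issue")
# Assumed extensions for Verilog, VHDL and SystemVerilog sources.
HDL_EXTENSIONS = (".v", ".vhd", ".vhdl", ".sv")
# Assumed directory names that mark test code.
TEST_DIRS = {"test", "tests", "tb", "testbench"}


def is_bug_fix_message(message):
    """True if any word of the commit message starts with a bug-related keyword."""
    words = re.findall(r"[a-z]+", message.lower())
    return any(w.startswith(k) for w in words for k in BUG_KEYWORDS)


def looks_like_test(path):
    """Heuristic: the file sits in a test directory or has a test-style name."""
    parts = path.lower().split("/")
    stem = parts[-1].rsplit(".", 1)[0]
    in_test_dir = any(p in TEST_DIRS for p in parts[:-1])
    return (in_test_dir
            or stem.startswith(("tb_", "test_"))
            or stem.endswith(("_tb", "_test")))


def is_hdl_source(path):
    """True for HDL files that are not test code."""
    return path.lower().endswith(HDL_EXTENSIONS) and not looks_like_test(path)


def select_patch_files(commits):
    """Keep bug-fix commits and, within them, only non-test HDL files.

    commits: iterable of (message, [changed file paths]).
    """
    selected = []
    for message, paths in commits:
        if not is_bug_fix_message(message):
            continue
        kept = [p for p in paths if is_hdl_source(p)]
        if kept:
            selected.append((message, kept))
    return selected


commits = [
    ("Fix off-by-one error in FIFO read pointer",
     ["rtl/fifo.v", "tb/fifo_tb.v", "README.md"]),
    ("Add UART documentation", ["doc/uart.md"]),
    ("bugfix: reset polarity", ["test/reset_test.sv"]),
]
print(select_patch_files(commits))
```

Prefix matching on words is deliberately loose, so it can admit false positives (for example, "fixture"); a later filtering pass, such as the third step above, is what narrows the set down.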
 
If correctly executed, HDLParser provides detailed information (e.g. occurrences of buggy code and the corresponding repair actions) that helps hardware developers deepen their understanding of real bug fixes. This knowledge can help developers repair programs effectively.
Based on the repair actions parsed from the collected patches, HDLParser can provide the most frequent fix patterns, which facilitate the design of pattern-based APR for HDLs.
HDLParser validates the redundancy assumption of bug-fixing commits in HDLs, which is a fundamental assumption of various APR techniques. The redundancy assumption provides a smaller search space for donor code.
These two areas of knowledge can support the development of APR in HDLs.
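To make the redundancy assumption concrete, the following sketch computes a simple line-level redundancy score: the fraction of lines added by a fix that already occur somewhere in the pre-fix code base. The granularity (whole lines, whitespace-normalized) and the function names are assumptions made for illustration; the metric HDLParser actually uses may be defined differently.

```python
def normalize(line):
    """Collapse runs of whitespace so that formatting differences are ignored."""
    return " ".join(line.split())


def line_redundancy(added_lines, codebase_lines):
    """Fraction of non-empty added lines that already exist in the code base."""
    pool = {normalize(line) for line in codebase_lines}
    pool.discard("")
    added = [normalize(line) for line in added_lines if line.strip()]
    if not added:
        return 0.0
    hits = sum(1 for line in added if line in pool)
    return hits / len(added)


# Hypothetical pre-fix code base and the lines added by a fix.
codebase = [
    "always @(posedge clk) begin",
    "  if (!rst_n) cnt <= 0;",
    "  else cnt <= cnt + 1;",
    "end",
    "assign done = (cnt == MAX);",
]
added = [
    "if (!rst_n)   cnt <= 0;",          # already present elsewhere (donor code)
    "assign done = (cnt == MAX - 1);",  # genuinely new
]
print(line_redundancy(added, codebase))  # 0.5
```

A score near 1 would mean most fix ingredients can be found as donor code in the existing project, which is the situation redundancy-based APR techniques rely on.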
We will continue to develop and maintain this project to make it a better tool for the community. All contributions are welcome.