Details are organized in LeNet-5.pdf.
python
- Train the original model and the light model.
- Get the data, such as weights and biases, from the trained models.
- Get the input data to be tested and the corresponding answers.
data
- Created by `/python/*_training.ipynb` and `/python/get_data_*.ipynb`.
- Trained weights and biases are stored as text (`*.data`); a loading sketch follows this list.
- Weights and biases for the original model are in `/data/origin`.
- Weights and biases for the light model are in `/data/lite`.
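For reference, a minimal sketch of how such a text-based `*.data` file could be loaded on the C++ side. The one-value-per-whitespace-separated-token layout and the helper name `loadData` are assumptions for illustration, not necessarily how the repository's code reads the files.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Minimal sketch: read whitespace-separated floating-point values from a
// text *.data file (e.g. a weight file under /data/origin) into a vector.
// File names and layout are assumptions for illustration only.
static std::vector<float> loadData(const std::string &path) {
    std::ifstream in(path);
    std::vector<float> values;
    float v;
    while (in >> v) {        // stops at EOF or on a malformed token
        values.push_back(v);
    }
    return values;
}
```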
cplusplus
- Implementations for each model and data type (a data-type sketch follows this list):
- `origin`: Original model with floating-point data type.
- `floating-point`: Light model with floating-point data type.
- `fixed-point`: Light model with fixed-point data type.
- `hls-stream`: Light model with the `hls::stream` data type.
- `hls-parallel`: Partially parallel model with the `hls::stream` data type.
- `etc`: Analyzes results, saves the input txt file as a binary, determines the number of bits in the fixed-point integer part, or generates weight arrays.
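The main difference between the `floating-point` and `fixed-point` variants is the arithmetic type. Below is a minimal sketch of how the two could be switched with a single typedef; the `ap_fixed<16, 5>` width (16 bits total, 5 integer bits) and the `USE_FIXED_POINT` macro are illustrative assumptions, not the configuration used in this repository.

```cpp
#include <ap_fixed.h>   // HLS arbitrary-precision fixed-point types

// Illustrative only: total width and integer width are assumptions.
// ap_fixed<W, I> has W bits in total, I of them for the integer part
// (the integer width would come from the max/min analysis in etc/).
#ifdef USE_FIXED_POINT
typedef ap_fixed<16, 5> dtype;   // light model, fixed-point variant
#else
typedef float dtype;             // original / floating-point variants
#endif

// Every layer then computes on dtype instead of hard-coding float:
static dtype mac(const dtype *w, const dtype *x, int n) {
    dtype acc = 0;
    for (int i = 0; i < n; ++i)
        acc += w[i] * x[i];
    return acc;
}
```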
hls
- `dtype`: Compares floating point and fixed point in terms of latency, resources, and maximum frequency using a simple function.
- `predict`: Partially parallel model that can be synthesized with HLS; the top function name is `predict`. A sketch of such a top function follows this list.
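As an illustration of the `hls::stream`-based `predict` interface, here is a minimal sketch of a top function with AXI4-Stream ports. The payload type, pragmas, and image/output sizes are assumptions and are not copied from the repository's source.

```cpp
#include <ap_axi_sdata.h>
#include <hls_stream.h>

// Hypothetical stream payload: 32-bit data beats with minimal side channels.
typedef ap_axis<32, 1, 1, 1> axis_t;

// Sketch of an HLS top function named `predict` with AXI4-Stream ports,
// the kind of interface an AXI DMA can drive. Sizes are placeholders for a
// LeNet-5-style network (32x32 input, 10 class scores).
void predict(hls::stream<axis_t> &in_stream, hls::stream<axis_t> &out_stream) {
#pragma HLS INTERFACE axis port=in_stream
#pragma HLS INTERFACE axis port=out_stream
#pragma HLS INTERFACE s_axilite port=return   // block-level control, exposes ap_done

    float image[32 * 32];
    union { int i; float f; } conv;

    for (int i = 0; i < 32 * 32; ++i) {
#pragma HLS PIPELINE II=1
        axis_t beat = in_stream.read();
        conv.i = beat.data.to_int();          // assume float pixels packed in 32 bits
        image[i] = conv.f;
    }

    float scores[10] = {0};                   // placeholder result buffer
    // ... convolution, pooling, and fully connected layers would run here ...

    for (int i = 0; i < 10; ++i) {
#pragma HLS PIPELINE II=1
        axis_t beat;
        conv.f = scores[i];
        beat.data = conv.i;
        beat.keep = -1;                       // all bytes valid
        beat.strb = -1;
        beat.last = (i == 9);                 // assert TLAST on the final beat
        out_stream.write(beat);
    }
}
```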
arm
- SW driver code that reads the input, checks accuracy, and measures latency.
- Reads the input binary file from the SD card.
- Runs 10,000 test cases to measure accuracy.
- Uses the AXI Timer IP on the PL (Programmable Logic) side to measure latency (a timer sketch follows this list).
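A minimal sketch of how latency could be measured from the PS side with the standalone `XTmrCtr` driver for the AXI Timer. The device-ID macro is a placeholder that depends on the block design, and the repository's driver code may wrap these calls differently.

```cpp
#include "xparameters.h"
#include "xtmrctr.h"
#include "xil_printf.h"

// Placeholder device ID: the exact macro comes from xparameters.h for the
// block design in BlockDesign.pdf (an axi_timer instance on the PL).
#define TIMER_DEVICE_ID  XPAR_TMRCTR_0_DEVICE_ID

static XTmrCtr Timer;

// Measure how many PL clock ticks one inference takes.
void measureLatencyExample(void) {
    XTmrCtr_Initialize(&Timer, TIMER_DEVICE_ID);

    XTmrCtr_Reset(&Timer, 0);     // timer/counter 0 of the AXI Timer
    XTmrCtr_Start(&Timer, 0);

    // ... start the DMA transfer and wait for the predict IP here ...

    XTmrCtr_Stop(&Timer, 0);
    u32 ticks = XTmrCtr_GetValue(&Timer, 0);

    // Divide by the AXI Timer clock frequency (from xparameters.h) to get seconds.
    xil_printf("latency: %d ticks\r\n", (int)ticks);
}
```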
etc
- `waveform`: Waveform captured with the Integrated Logic Analyzer (ILA).
- `BlockDesign.pdf`: Block design of the programmable logic.
- `LeNet-5.pdf`: Presentation PDF that summarizes the content of this LeNet-5 project.
python
- TensorFlow (2.10.0) must be installed in advance.
- Since the files have the `*.ipynb` extension, installing the `Jupyter` extension is recommended if you use VS Code.
- If you run all the `*.ipynb` files, the `/data` folder will be generated.
cplusplus
- Each directory can be compiled with the command `make`.
- In most cases, it can be run with the command `./main`.
- Exceptionally, there are three options for the `floating-point` directories (a sketch of the option parsing follows this list).
  - Run just one input.
    - `./main --input ../../data/input_N.data` or `./main -i ../../data/input_N.data`
  - Run one input and print intermediate results.
    - `./main --input ../../data/input_N.data --print` or `./main -i ../../data/input_N.data -p`
  - Run all (check accuracy and find the max/min values of intermediate outputs, weights, and biases).
    - `./main --all` or `./main -a`
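A minimal sketch of how these three options could be parsed with `getopt_long`; this only illustrates the command-line behavior described above and is not the repository's actual `main` implementation.

```cpp
#include <getopt.h>
#include <cstdio>

int main(int argc, char **argv) {
    const char *inputPath = nullptr;
    bool printIntermediate = false;
    bool runAll = false;

    // Long/short options matching the commands shown above.
    static const struct option longOpts[] = {
        {"input", required_argument, nullptr, 'i'},
        {"print", no_argument,       nullptr, 'p'},
        {"all",   no_argument,       nullptr, 'a'},
        {nullptr, 0, nullptr, 0}
    };

    int c;
    while ((c = getopt_long(argc, argv, "i:pa", longOpts, nullptr)) != -1) {
        switch (c) {
        case 'i': inputPath = optarg;        break;
        case 'p': printIntermediate = true;  break;
        case 'a': runAll = true;             break;
        default:
            std::fprintf(stderr, "usage: ./main [-i file] [-p] [-a]\n");
            return 1;
        }
    }

    // ... run one input (optionally printing intermediate results) or run all cases ...
    (void)inputPath; (void)printIntermediate; (void)runAll;
    return 0;
}
```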
- The AXI DMA provides high-speed data movement between system memory and an AXI4-Stream based target IP.
- The Integrated Logic Analyzer (ILA) IP core is a logic analyzer core that can be used to monitor the internal signals of a design.
- The AXI Timer provides an AXI4-Lite interface to communicate with the PS (Processing System).
- An interrupt is raised when the `ap_done` block-level interface signal is asserted High (see the interrupt setup sketch below).
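A minimal sketch of how such an `ap_done` interrupt could be hooked up on the PS side with the `XScuGic` driver. The interrupt-ID macro is hypothetical and depends on the block design; the repository's driver may organize this differently.

```cpp
#include "xparameters.h"
#include "xscugic.h"
#include "xil_exception.h"

static XScuGic Gic;
volatile int predictDone = 0;

// Called when the predict IP raises its interrupt (ap_done asserted High).
static void predictIsr(void *ref) {
    (void)ref;
    predictDone = 1;
    // A real driver would also clear the IP's interrupt status register here.
}

// Hypothetical interrupt ID: the real macro name comes from xparameters.h.
#define PREDICT_IRQ_ID  XPAR_FABRIC_PREDICT_0_INTERRUPT_INTR  /* assumption */

int interruptInit(void) {
    XScuGic_Config *cfg = XScuGic_LookupConfig(XPAR_SCUGIC_SINGLE_DEVICE_ID);
    if (cfg == NULL) return XST_FAILURE;
    if (XScuGic_CfgInitialize(&Gic, cfg, cfg->CpuBaseAddress) != XST_SUCCESS)
        return XST_FAILURE;

    XScuGic_Connect(&Gic, PREDICT_IRQ_ID, predictIsr, NULL);
    XScuGic_Enable(&Gic, PREDICT_IRQ_ID);

    // Route GIC interrupts to the ARM exception handler and enable them.
    Xil_ExceptionRegisterHandler(XIL_EXCEPTION_ID_INT,
                                 (Xil_ExceptionHandler)XScuGic_InterruptHandler, &Gic);
    Xil_ExceptionEnable();
    return XST_SUCCESS;
}
```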
- Initialize the AXI DMA IP and the predict IP through `dmaInit()` and `predictInit()`.
- Flush the cache before transferring data via DMA through `cacheFlush()`.
- The AXI DMA IP reads data from DDR and transfers it to the predict IP through `dataTx()`.
- Wait for the predict IP to finish processing, and read the result when the interrupt signal is raised (the per-input flow is sketched below).
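A minimal sketch of what this flow could look like on top of the standard Xilinx `XAxiDma` and cache APIs. The helper names (`dmaInit()`, `cacheFlush()`, `dataTx()`) come from the list above, but their bodies here are assumptions, as are the device-ID macro and buffer sizes.

```cpp
#include "xparameters.h"
#include "xaxidma.h"
#include "xil_cache.h"
#include "xstatus.h"

static XAxiDma Dma;
extern volatile int predictDone;   // set by the ap_done interrupt handler sketched above

// Sketch of dmaInit(): body is an assumption built on the standard XAxiDma API.
int dmaInit(void) {
    XAxiDma_Config *cfg = XAxiDma_LookupConfig(XPAR_AXIDMA_0_DEVICE_ID);
    if (cfg == NULL) return XST_FAILURE;
    return XAxiDma_CfgInitialize(&Dma, cfg);
    // predictInit() would similarly configure the HLS predict IP via its driver.
}

// Sketch of cacheFlush(): make the input buffer in DDR visible to the DMA.
void cacheFlush(const float *buf, unsigned lenBytes) {
    Xil_DCacheFlushRange((UINTPTR)buf, lenBytes);
}

// Sketch of dataTx(): stream one input image from DDR to the predict IP.
int dataTx(const float *image, unsigned lenBytes) {
    return XAxiDma_SimpleTransfer(&Dma, (UINTPTR)image, lenBytes, XAXIDMA_DMA_TO_DEVICE);
}

// Per-input flow matching the list above: flush, send, then wait for the interrupt.
int runOne(const float *image, unsigned lenBytes) {
    predictDone = 0;
    cacheFlush(image, lenBytes);
    if (dataTx(image, lenBytes) != XST_SUCCESS) return XST_FAILURE;
    while (!predictDone) { /* wait for the ap_done interrupt */ }
    // ... read the class scores back (e.g. via a second DMA channel or AXI4-Lite) ...
    return XST_SUCCESS;
}
```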
- The predict IP on the PL is 40.34x faster than the original model compiled with -O0 on the PS.
- The predict IP on the PL is 7.38x faster than the original model compiled with -O2 on the PS.
- The predict IP on the PL is 16.01x faster than the lite model compiled with -O0 on the PS.
- The predict IP on the PL is 1.57x faster than the lite model compiled with -O3 on the PS.
- SDK issue [Closed] Issue #1