Following the "Tutorial_for_Llama3_2_1B_QCS6490_IoT" document, I was able to convert the Llama 3.2 1B model and run it with genie-t2t-run on an SA8295 Android device. The QAIRT version is 2.37.1.250807, and the model's output is as expected. However, I noticed that top reports the CPU usage of genie-t2t-run as high as 200%, out of a maximum of 800% for the whole CPU.
I am confused, because I expected the model to run on the NPU, so I would like to know:
- Is this CPU usage of genie-t2t-run normal?
- How can I confirm that the model is using the NPU for inference? Are there any tools that can show NPU usage for a given process?
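For context on the 200% figure, here is a minimal sketch (Python, assuming the Linux-style /proc filesystem that Android also exposes) of how a top-style per-process CPU percentage is derived from /proc/<pid>/stat. In this convention 100% corresponds to one fully busy core, so a total of 800% implies eight cores, and 200% means roughly two cores' worth of CPU time. The function names are hypothetical helpers, and the sketch measures CPU time only; it says nothing about NPU utilization.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # Field 2 (the process name) is wrapped in parentheses and may contain
    # spaces, so only split the text after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_s: float, clk_tck: int) -> float:
    """Top-style CPU%: 100% = one core fully busy, N cores max out at N*100%."""
    busy_s = (ticks_after - ticks_before) / clk_tck
    return 100.0 * busy_s / elapsed_s


def sample(pid: int, interval_s: float = 1.0) -> float:
    """Sample a live process's CPU% over interval_s seconds (/proc systems only)."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    t0 = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    return cpu_percent(before, after, time.monotonic() - t0, clk_tck)
```

For example, a process that accumulates 200 ticks of CPU time in one second with a 100 Hz clock tick is reported as 200%, i.e. two busy cores.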
Looking forward to your reply.