Running the Llama 3.2 1B model with genie-t2t-run uses too much CPU #118

Following the《Tutorial_for_Llama3_2_1B_QCS6490_IoT》document, I was able to convert the Llama 3.2 1B model and run it with genie-t2t-run on an SA8295 Android device. The QAIRT version is 2.37.1.250807, and the model's output is as expected. However, I noticed in top that the CPU usage of genie-t2t-run peaks at about 200%, out of a total of 800% for the whole CPU.
I am confused, because I expected the model to run inference on the NPU rather than the CPU. So I would like to know:
1) Is this CPU usage normal for genie-t2t-run?
2) How can I confirm that the model is actually using the NPU for inference? Is there a tool that shows NPU usage for a given process?
Looking forward to your reply.
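
For anyone who wants to cross-check the 200% figure independently of top, here is a minimal sketch. It assumes two snapshots of `/proc/<pid>/stat` for the genie-t2t-run process (for example captured with `adb shell cat /proc/<pid>/stat`) taken a known number of seconds apart. The function names `cpu_ticks` and `cpu_percent` are hypothetical helpers, and the tick rate of 100 per second is an assumption that is typical for Linux and Android but should be verified on the device. It only measures CPU time and says nothing about NPU usage.

```python
def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces, so split on the LAST ')' before tokenising the rest.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    utime = int(rest[11])
    stime = int(rest[12])
    return utime + stime


def cpu_percent(before: str, after: str, interval_s: float,
                clk_tck: int = 100) -> float:
    """CPU usage in the same units as top, where 100% equals one full core.

    clk_tck is the number of clock ticks per second (assumed 100 here).
    """
    delta_ticks = cpu_ticks(after) - cpu_ticks(before)
    cpu_seconds = delta_ticks / clk_tck
    return 100.0 * cpu_seconds / interval_s


if __name__ == "__main__":
    # Synthetic snapshots for illustration only, not real measurements.
    first = "1234 (genie-t2t-run) S 1 1234 1234 0 -1 4194304 100 0 0 0 500 300 0 0 20 0 4 0"
    second = "1234 (genie-t2t-run) S 1 1234 1234 0 -1 4194304 100 0 0 0 600 400 0 0 20 0 4 0"
    print(f"{cpu_percent(first, second, interval_s=1.0):.0f}%")
```

A result near 200% over a sustained interval would mean the process keeps about two cores busy, which matches what top reports.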
