
Running the Llama 3.2 1B model with genie-t2t-run uses too much CPU #118

@Leonslam


Following the "Tutorial_for_Llama3_2_1B_QCS6490_IoT" document, I was able to convert the Llama 3.2 1B model and run it with genie-t2t-run on an SA8295 Android device. The QAIRT version is 2.37.1.250807, and the model's output is as expected. However, I noticed that top reports CPU usage for genie-t2t-run as high as 200%, out of 800% for the whole CPU.
I am confused because I expected the model to run on the NPU, so I would like to know:
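The 200% and 800% figures follow top's usual convention, where 100% means one core fully busy, so 800% is the capacity of all eight cores and 200% is about a quarter of the machine. The sketch below is a minimal illustration of how such a per-process figure can be derived from `/proc/<pid>/stat` (utime + stime in clock ticks). It assumes a Linux-style `/proc` filesystem and a Python interpreter on the device; the function names are hypothetical helpers, not part of QAIRT or Genie. It only measures CPU time and says nothing about NPU utilization.

```python
import os
import sys
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so split on the LAST ')' and count the remaining fields from there.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state); utime is field 14, stime is field 15.
    utime, stime = int(fields[11]), int(fields[12])
    return utime + stime


def cpu_percent(ticks_before: int, ticks_after: int,
                elapsed_s: float, clk_tck: int) -> float:
    """top-style CPU%: 100% means one core fully busy over the interval."""
    return (ticks_after - ticks_before) / clk_tck / elapsed_s * 100.0


def machine_share(percent: float, num_cores: int) -> float:
    """Convert a per-core-scaled CPU% into a share of the whole machine."""
    return percent / (num_cores * 100.0) * 100.0


def read_ticks(pid: int) -> int:
    # Assumes /proc is readable for the target process.
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def sample(pid: int, interval_s: float = 1.0) -> float:
    """Sample a process twice and return its CPU% over the interval."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    t0, ticks0 = time.monotonic(), read_ticks(pid)
    time.sleep(interval_s)
    t1, ticks1 = time.monotonic(), read_ticks(pid)
    return cpu_percent(ticks0, ticks1, t1 - t0, clk_tck)


if __name__ == "__main__" and len(sys.argv) > 1:
    pct = sample(int(sys.argv[1]))
    cores = os.cpu_count() or 1
    print(f"{pct:.1f}% (about {machine_share(pct, cores):.1f}% of {cores} cores)")
```

For example, a process that accumulates 200 ticks in one second at 100 ticks/s reads as 200%, which on an 8-core machine is 25% of total CPU capacity.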

  1. Is this CPU usage of genie-t2t-run normal?
  2. How can I confirm that the model is using the NPU for inference? Is there a tool that shows NPU usage for a given process?

Looking forward to your reply.

Metadata

Assignees: No one assigned
Labels: question (Further information is requested)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
