
Conversation

@jensroth-git
Contributor

- Added background color clearing in the Draw function of examples/dof.c.
- Updated r3d_pass_final_blit to use the render size for HiDPI/Retina support.
- Introduced an r3d_try_internal_format function to probe texture formats and updated the texture support checks in r3d_support_check_texture_internal_formats.
- Fixed the custom framebuffer height calculation.
@jensroth-git
Contributor Author

Just tested on Windows and Mac; unfortunately I can't test on Linux.

@Bigfoot71
Owner

For the blit, thanks for the fix! I think I’ll switch to a simple “copy” via shader anyway, since blit causes quite a few other problems…

These two cases (format check and blit) are still bothering me, I’ll think about what can be done.

I'm merging and will test and refine this right away.

Thanks!

@Bigfoot71 Bigfoot71 merged commit e2a99d3 into Bigfoot71:master Aug 19, 2025
6 checks passed
@Bigfoot71
Owner

@jensroth-git The internal format support verification code has been revised; I decided to simplify everything by following your approach. We now check both format support and whether the format can be used as a framebuffer attachment.

If everything still works, nothing should be broken, but it would be great if you could confirm that.

The commit is here: a6a2c8a

@jensroth-git
Contributor Author

It works great on macOS!

@jensroth-git
Contributor Author

I should note, though, that on macOS the examples only get half the native resolution when GetScreenWidth/Height is used for R3D_Init; you need to use GetRenderWidth/Height instead.

This happens regardless of whether the HighDPI flag is used when setting up the raylib window...

I have not looked into how raylib handles (or doesn't handle) this yet.
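
For illustration, a minimal sketch of the setup I mean (the three-argument R3D_Init(width, height, flags) signature is assumed here; adjust it to the actual one):

#include "raylib.h"
#include "r3d.h"

int main(void)
{
    // Request a HiDPI-aware framebuffer where the platform supports it
    SetConfigFlags(FLAG_WINDOW_HIGHDPI);
    InitWindow(800, 600, "r3d HiDPI example");

    // On Retina displays GetScreenWidth/Height return logical points (half the
    // pixel count), while GetRenderWidth/Height return the real framebuffer size,
    // so the internal resolution should be based on the latter.
    R3D_Init(GetRenderWidth(), GetRenderHeight(), 0);   // signature assumed

    while (!WindowShouldClose()) {
        BeginDrawing();
        ClearBackground(BLACK);
        // ... draw the scene through r3d here ...
        EndDrawing();
    }

    // ... shut down r3d here ...
    CloseWindow();
    return 0;
}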

@Bigfoot71
Owner

Thanks, I’ll look into that. However, I don’t have a Mac to test on, so if I do something it’ll be a shot in the dark x)

@jensroth-git
Contributor Author

I think it's really a raylib issue, not something r3d should concern itself with?

int GetScreenWidth(void);                                   // Get current screen width
int GetScreenHeight(void);                                  // Get current screen height
int GetRenderWidth(void);                                   // Get current render width (it considers HiDPI)
int GetRenderHeight(void);                                  // Get current render height (it considers HiDPI)

So the screen width is not even supposed to account for HiDPI anyway?
Maybe we should just switch the samples to GetRenderWidth/Height on Windows too?

I'll have to check tomorrow what values GetScreenWidth/Height give at different DPI settings and compare with macOS.
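
Something like this quick check is what I have in mind (all standard raylib calls):

#include "raylib.h"

int main(void)
{
    SetConfigFlags(FLAG_WINDOW_HIGHDPI);
    InitWindow(800, 600, "DPI check");

    // Logical screen size vs. actual framebuffer size vs. reported DPI scale
    Vector2 scale = GetWindowScaleDPI();
    TraceLog(LOG_INFO, "Screen: %dx%d", GetScreenWidth(), GetScreenHeight());
    TraceLog(LOG_INFO, "Render: %dx%d", GetRenderWidth(), GetRenderHeight());
    TraceLog(LOG_INFO, "DPI scale: %.2f x %.2f", scale.x, scale.y);

    CloseWindow();
    return 0;
}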

@Bigfoot71
Owner

I think it's really a raylib issue, not something r3d should concern itself with?

Yes, it seems to me that I've already seen this come up quite a few times as an issue on raylib.

As for the examples, I'm not sure what visual impact it has: if it's really blurry or pixelated, then yes, otherwise leave them as is. Apart from the blit, r3d shouldn't be handling the rest anyway.

@jensroth-git
Contributor Author

jensroth-git commented Aug 20, 2025

Well, they are rendering at half resolution, which is quite noticeable.

Can I get your input on something?
I'm currently working on optimizing the DoF, but I had to implement a simple profiling framework first.
I think it will help once we want to optimize particular shaders or paths?

Unless you have a better option?

Here's my current implementation:

// Profile this GPU Section
R3D_PROF_ZONE_GPU("DOF Pass") {
    glBindFramebuffer(GL_FRAMEBUFFER, R3D.framebuffer.pingPong.id);
    {
        glViewport(0, 0, R3D.state.resolution.width, R3D.state.resolution.height);

        r3d_framebuffer_swap_pingpong(R3D.framebuffer.pingPong);

        r3d_shader_enable(screen.dof);
        {
            // ...
        }
        r3d_shader_disable();
    }
}

To get the zone values later, there is:

double R3D_ProfGetGPUZoneMS(const char *zoneName, int samplesAverage);

// usage to get last 64 frame average
char profText[64];
snprintf(profText, sizeof(profText), "DoF: %.2f", R3D_ProfGetGPUZoneMS("DOF Pass", 64));
DrawText(profText, 10, 10, 20, WHITE);

Maybe I'm over-engineering, but it seems quite usable 🤔
It could easily be expanded to measure min/max and CPU time, and even to draw a profiling UI separately later on.

@Bigfoot71
Owner

Bigfoot71 commented Aug 20, 2025

Oh yes, I thought of that too; actually it would be nice to have something like that beforehand.

Your example looks really good, and I have nothing to add. As long as we can set up a precise test in under a minute, that would be perfect.

However, how do you plan to profile the GPU?

AFAIK there are queries available in GL 3.3; GL_TIME_ELAPSED and GL_SAMPLES_PASSED could be very useful here.


By the way, in your example I'm not sure if you plan to store the results by name or otherwise, but if it's only for internal testing we could simplify things by using a BEGIN / END macro system that would call glBeginQuery and glEndQuery. The END macro could either block the CPU until the result is ready by automatically calling glGetQueryObjectui64v, or alternatively store the results to retrieve them all at once later, though that would also require dynamic management of query generation.

It’s just a quick thought, this would need to be carefully verified.

@jensroth-git
Contributor Author

jensroth-git commented Aug 20, 2025

Here's the current implementation.
I'm storing the zones in a struct array with lookup by name (I'm yearning for std::map, tbh 😭) and the results in a ring buffer; I just want to be able to average a certain number of frames to get more stable values.
The DoF runs at 2.9 ms at 1920x1080 with radius 20 for me at the moment, but I noticed it's crucial to disable the TargetFPS cap to get accurate results.
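
Roughly, the storage behind it looks like this (the names and sizes here are illustrative, not the exact code from my fork):

#include <stdio.h>
#include <string.h>

#define PROF_MAX_ZONES   32
#define PROF_MAX_SAMPLES 256

typedef struct {
    char name[64];                      // zone identifier used for lookup
    double samplesMS[PROF_MAX_SAMPLES]; // ring buffer of per-frame timings
    int head;                           // next write position in the ring
    int count;                          // number of valid samples so far
} prof_zone_t;

static prof_zone_t zones[PROF_MAX_ZONES];
static int zoneCount = 0;

// Linear lookup by name; creates the zone on first use
static prof_zone_t *prof_get_zone(const char *name)
{
    for (int i = 0; i < zoneCount; i++) {
        if (strcmp(zones[i].name, name) == 0) return &zones[i];
    }
    if (zoneCount >= PROF_MAX_ZONES) return NULL;
    prof_zone_t *z = &zones[zoneCount++];
    snprintf(z->name, sizeof(z->name), "%s", name);
    return z;
}

// Average of the most recent samples; roughly what R3D_ProfGetGPUZoneMS builds on
static double prof_zone_average(const prof_zone_t *z, int samplesAverage)
{
    int n = (samplesAverage < z->count) ? samplesAverage : z->count;
    if (n <= 0) return 0.0;
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        int idx = (z->head - 1 - i + PROF_MAX_SAMPLES) % PROF_MAX_SAMPLES;
        sum += z->samplesMS[idx];
    }
    return sum / n;
}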

@jensroth-git
Contributor Author

I pushed the pre-optimization state with the current profiler onto my fork, if you want to take a look.

@Bigfoot71
Owner

I read your example carefully, and if I understood the goal, everything could be summed up in a simple macro that would do everything locally:

#define GPU_PROFILE_BLOCK(Name, NSAMPLES, CodeBlock)                          \
do {                                                                          \
    static GLuint query = 0;                                                  \
    static double hist[NSAMPLES] = {0};                                       \
    static int count = 0, index = 0;                                          \
                                                                              \
    if (!query) glGenQueries(1, &query);                                      \
                                                                              \
    glBeginQuery(GL_TIME_ELAPSED, query);                                     \
    do { CodeBlock } while(0);                                                \
    glEndQuery(GL_TIME_ELAPSED);                                              \
                                                                              \
    GLuint64 ns = 0;                                                          \
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &ns);                       \
    double ms = ns / 1e6;                                                     \
                                                                              \
    hist[index] = ms;                                                         \
    index = (index + 1) % NSAMPLES;                                           \
    if (count < NSAMPLES) count++;                                            \
                                                                              \
    if (count == NSAMPLES) {                                                  \
        double sum = 0.0;                                                     \
        for (int i = 0; i < NSAMPLES; i++) sum += hist[i];                    \
        printf("[GPU] %s avg(%d) = %.3f ms\n", Name, NSAMPLES, sum/NSAMPLES); \
        count = 0;                                                            \
    }                                                                         \
} while(0)

The idea is that you name the section via 'Name', 'NSAMPLES' is the number of samples you want before getting the average, and 'CodeBlock' is the code you want to profile.

Here's how it can be used:

GPU_PROFILE_BLOCK("DoSomething", 32, {
    glDoSomething();
});

Every 32 runs this will give an average in this form:

[GPU] DoSomething avg(32) = 0.001 ms

I made an example with SDL if you want to try it: https://gist.github.com/Bigfoot71/ac652f10e73b364a5fd5a3b99ef590b3

@jensroth-git
Contributor Author

jensroth-git commented Aug 21, 2025

That is much simpler indeed; we could even add code to aggregate into a public profiler later. I'm just not sure if blocking the CPU is going to give incorrect results, or if we should make it work with GL_QUERY_RESULT_AVAILABLE and polling, like my version did?
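
For reference, a minimal sketch of the non-blocking variant I mean: a small ring of query objects, with results read back only once GL_QUERY_RESULT_AVAILABLE says they're ready (names are illustrative, and this only handles a single zone):

#include <glad/glad.h>  // or whichever GL loader r3d uses

#define QUERY_RING 4    // a few frames of latency is enough to avoid stalling

static GLuint queries[QUERY_RING];
static int writeIdx = 0, readIdx = 0;

static void gpu_zone_begin(void)
{
    if (queries[0] == 0) glGenQueries(QUERY_RING, queries);
    glBeginQuery(GL_TIME_ELAPSED, queries[writeIdx]);
}

// Ends the zone; returns the latest finished timing in ms, or -1.0 if none is ready yet
static double gpu_zone_end(void)
{
    glEndQuery(GL_TIME_ELAPSED);
    writeIdx = (writeIdx + 1) % QUERY_RING;

    double ms = -1.0;
    GLuint available = 0;
    glGetQueryObjectuiv(queries[readIdx], GL_QUERY_RESULT_AVAILABLE, &available);
    if (available) {
        GLuint64 ns = 0;
        glGetQueryObjectui64v(queries[readIdx], GL_QUERY_RESULT, &ns);
        ms = ns / 1e6;  // nanoseconds to milliseconds
        readIdx = (readIdx + 1) % QUERY_RING;
    }
    return ms;
}

The reported value then lags a few frames behind, which is fine for an averaged display; if the ring wraps before a result is read, that sample is simply dropped.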

@Bigfoot71
Owner

just not sure if blocking the CPU is going to give incorrect results

No, it won’t give incorrect results; in fact it can even be a bit more precise. What it really tells the CPU is "wait until the GPU has finished the operation and give me the result right away once it’s done".

Overall it will slow the program down since the CPU is blocked, but the GPU which runs in parallel will execute the operation as usual. The GPU behavior may change slightly at the end because of the stall, but the operation itself is still measured correctly.

we could even add code to aggregate into a public profiler later

But that would require a more complex system, one that doesn’t simply stall the CPU each time. And since all GPU operations are actually "flushed" at R3D_End, you would need to measure each pass globally and make the results accessible. At that point you are at the boundary between a rendering framework and a full engine, so I’m not entirely sure what would be the most relevant way to expose it.

I’m also not sure such GPU timing should be included in release builds. Either it’s only exposed in debug builds (though not many developers will necessarily compile R3D in debug when making a game), or it would need to be a more advanced system that is dynamic and adds virtually no overhead when not active.

Most engines only include their profiler in the editor and simply don’t ship it in the final release build, which makes things "simpler" on that side...
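
For the debug-only route, the usual compile-time gating would already cover most of it; R3D_ENABLE_GPU_PROFILING and the r3d_prof_gpu_* helpers below are purely illustrative names, not existing r3d symbols:

// Hypothetical build option, not an existing r3d define
#ifdef R3D_ENABLE_GPU_PROFILING
    #define R3D_PROF_ZONE_GPU_BEGIN(name)   r3d_prof_gpu_begin(name)
    #define R3D_PROF_ZONE_GPU_END()         r3d_prof_gpu_end()
#else
    // Compiles away entirely when the profiler is disabled
    #define R3D_PROF_ZONE_GPU_BEGIN(name)   ((void)0)
    #define R3D_PROF_ZONE_GPU_END()         ((void)0)
#endif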

@Bigfoot71
Owner

Bigfoot71 commented Aug 22, 2025

Just a note: the solution I proposed is meant as a quick aid while developing a feature and was not intended to be included publicly; the macro does not even release the generated query...

So yes, your system could be included, but I’m not sure yet about the right way to do it.

And some operations are currently difficult to profile accurately (globally), such as draw calls, and that’s really due to the internal design...
