Skip to content

Conversation

@AndrianBdn
Copy link
Contributor

I came across libdns while checking an outdated Caddy DNS provider module that depends on it.

I noticed there were no functional tests defined - only a recommendation to test. Because of that, many providers have issues.

To help, I added a small set of end-to-end tests and a simple in-memory (non-synchronized) DNS provider. It is a basic implementation that passes the tests, to test the tests.

I also implemented these tests on my Cloudflare libdns provider fork:
libdns/cloudflare@master...AndrianBdn:cloudflare:master

It turns out the Cloudflare DNS provider has some weird behavior due to unusual FQDN normalization in CNAME (this is actually very common problem - might we worth documenting if libdns clients expect dots in external CNAMEs or not).

I've tried to test DigitalOcean implementation, however it quickly become evident that its current SetRecords implementation is just a hack: https://github.com/libdns/digitalocean/blob/master/provider.go#L64

I think the libdns community would benefit from having these basic e2e tests. The tests could also live in the template repo, but having a basic reference in-memory provider and test suite in the main library would likely be more useful.

I would open PRs with e2e tests and fixes to cloudflare, route53, maybe to digitalocean too once the framework is established.

@gucci-on-fleek
Copy link
Collaborator

I noticed there were no functional tests defined - only a recommendation to test. Because of that, many providers have issues.

To help, I added a small set of end-to-end tests and a simple in-memory (non-synchronized) DNS provider. It is a basic implementation that passes the tests, to test the tests.

Nice, thanks! My tests for the rfc2136 provider (conversion_test.go, provider_test.go) look a little bit more comprehensive at the moment, so you might want to copy some of that over to your PR.

I think the libdns community would benefit from having these basic e2e tests. [...] having a [...] test suite in the main library would likely be more useful.

I agree that this is the best place to store the tests, because if we find common bugs, we can just update this single repo instead of having to make PRs to every repo to add new tests.

The tests could also live in the template repo

It might be a good idea to add a GitHub Actions workflow to the template repo so that new users are pushed to implement e2e tests (or make the conscious decision to delete the file). The required configuration is really short for this.

basic reference in-memory provider

Hmm, I'm unsure if this is a good idea. DNS is essentially 100 different edge cases piled on top of each other, so it will be fairly tricky to make a minimal in-memory provider that will work like a “real” DNS provider. My suggestion would be to (either)

  1. Use the rfc2136 provider for this purpose. There's almost the same number of lines of code for dummy (124) and rfc2136 (224), but rfc2136 uses a real DNS server, so it should properly validate all the weird edge cases. Feel free to email me if you want API keys to test against my server—I've already given access to libdns/rfc2136 and DNSControl for similar purposes.

  2. Implement a basic Bind zonefile provider. This has the advantage of being intrinsically useful, and we could shell out to named-checkzone during the tests to validate with real DNS software (that runs completely locally).

But if you think that the in-memory provider is the best solution, then I'm sure that I could be convinced.

I would open PRs with e2e tests and fixes to cloudflare, route53, maybe to digitalocean too once the framework is established.

👍

gucci-on-fleek added a commit to gucci-on-fleek/libdns-rfc2136 that referenced this pull request Aug 31, 2025
Copy link
Collaborator

@gucci-on-fleek gucci-on-fleek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've added some code to my fork of libdns/rfc2136, and the tests all ran successfully. There are still a few things that I'm unsure about, but this definitely appears to be on the right track. Thanks again!

gucci-on-fleek added a commit to gucci-on-fleek/libdns-rfc2136 that referenced this pull request Aug 31, 2025
gucci-on-fleek added a commit to gucci-on-fleek/libdns that referenced this pull request Sep 1, 2025
`SvcParams.String()` used to iterate over keys in a random order, which
meant that calling `ServiceBinding.RR()` could return different `Data`
for identical records. This commit sorts the keys in lexicographic
order, making the result deterministic.

Furthermore, since it seems useful for code to assume that it can safely
roundtrip between `Record.RR()` and `RR.Parse()` and that `Record.RR()`
is deterministic, I've now documented this. These are both already the
case, but a formal guarantee should make external usage easier.

Related to libdns#189.
@AndrianBdn
Copy link
Contributor Author

Thanks for the comment @gucci-on-fleek and for looking at this PR.

We're already aligned on test placement. I'm fine with a GitHub Actions template too - most providers will still need some zones & secrets set up by maintainers, so CI isn't the hardest part here.

[About the in-memory provider, DNS complexity, etc]

I called these tests "e2e", but they're really just sanity checks. They don't do real DNS queries - we just CRUD records via libdns interfaces and check consistency. I think that's still useful for the libdns community, since many providers are incomplete or have quirks. Maybe we should rename them to 'sanity'? )

I don't see much value in testing actual DNS behavior. That's the DNS server/service job (AWS, GCP, Azure, etc.). And even if we find oddities (TTL rounding, delayed propagation, partial support of certain values, etc.), getting them fixed is slow, and most would just end up as test config flags (TestShouldIgnoreTTL, TestSleepAfterAdd: 4s).

On the "test provider”: I like the idea of keeping libdns self-contained. You clone, run tests, and it just works. Adding more cases is straightforward. Depending on an external provider like rfc2136, which itself depends on a live server, adds complexity - and even if that server is spec-perfect, most big DNS providers aren't.

Open to input: should we keep it small and focused, or go all-in with more record types and real DNS checks?

@gucci-on-fleek
Copy link
Collaborator

@AndrianBdn

[About the in-memory provider, DNS complexity, etc]

I called these tests "e2e", but they're really just sanity checks. They don't do real DNS queries - we just CRUD records via libdns interfaces and check consistency. I think that's still useful for the libdns community, since many providers are incomplete or have quirks. Maybe we should rename them to 'sanity'? )

[…]

On the "test provider”: I like the idea of keeping libdns self-contained. You clone, run tests, and it just works. Adding more cases is straightforward. Depending on an external provider like rfc2136, which itself depends on a live server, adds complexity - and even if that server is spec-perfect, most big DNS providers aren't.

I guess my question is, what's the point of the in-memory provider? If the only purpose of it is to make adding new tests easier, it doesn't really seem worth it—we'd want to test any new tests with at least 1 real provider before adding it upstream (here), so having the in-memory provider just adds another provider that can potentially fail.

I don't see much value in testing actual DNS behavior. That's the DNS server/service job (AWS, GCP, Azure, etc.). And even if we find oddities (TTL rounding, delayed propagation, partial support of certain values, etc.), getting them fixed is slow, and most would just end up as test config flags (TestShouldIgnoreTTL, TestSleepAfterAdd: 4s).

By testing actual DNS behaviour, I meant more something like that an actual DNS server will block us from setting a CNAME record for the same name as a A/TXT record, won't let us set ipv4hint=2001:db8::1, won't let us set an MX record with an IP address for a target, etc.; not making sure that the providers aren't lying to us. Using any real DNS provider should stop us from adding such test cases, but these will all work with the current in-memory provider (I think?).

Open to input: should we keep it small and focused, or go all-in with more record types and real DNS checks?

I think that we will probably want tests at least as comprehensive as the rfc2136 ones (conversion_test.go, provider_test.go). I think that the minimum required to get this merged is probably:

  1. Test to make sure that implementations are handling SetRecords vs AppendRecords correctly (Most implementations missing deletion behavior for SetRecords #186). I think that you've already done this ☑️

  2. Test every single Record type and test setting every single field. This should be trivial to implement, just lots of annoying copy-and-pasting. You can probably copy this list, or just copy-and-paste some of your current records and then change their types.

Eventually (for a later PR if someone is interested), it would probably be a good idea to do the following:

  1. Test weird edge cases, like this list from DNSControl.
  2. Add tests for previously-fixed issues (like .toCAA() doesn't handle CAA-values with spaces #178) to prevent any future regressions.

Copy link
Collaborator

@gucci-on-fleek gucci-on-fleek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we're getting close; thanks again for working on this.

mholt pushed a commit that referenced this pull request Sep 2, 2025
`SvcParams.String()` used to iterate over keys in a random order, which
meant that calling `ServiceBinding.RR()` could return different `Data`
for identical records. This commit sorts the keys in lexicographic
order, making the result deterministic.

Furthermore, since it seems useful for code to assume that it can safely
roundtrip between `Record.RR()` and `RR.Parse()` and that `Record.RR()`
is deterministic, I've now documented this. These are both already the
case, but a formal guarantee should make external usage easier.

Related to #189.
@AndrianBdn
Copy link
Contributor Author

@gucci-on-fleek thanks for you suggestions!

  • Adjusted cleanup logic.
  • Added support for more DNS record types. I had to put in exclusions. Better to let providers adapt tests and keep other records support as an open issue.
  • After thinking it through, I’m going to support only one Provider interface set. Incomplete providers can work around this, or just add the missing method.

I meant more something like that an actual DNS server will block us from setting a CNAME record for the same name as a A/TXT record, won't let us set ipv4hint=2001:db8::1, won't let us set an MX record with an IP address for a target, etc.; not making sure that the providers aren't lying to us.

I don’t think that’s really a libdns issue. If a service allows something invalid (like IP in MX), that’s more about the service than libdns. In theory some validation can live in libdns — e.g. MX or SVCB record checks.

I’m not attached to the in-memory dummy provider. The only benefit is that it creates a double-entry system for tests and a dummy implementation. But since I haven’t tested with any real provider that supports all record types, the dummy provider may not be correct indeed.

I can remove it, and we can add a note that e2e tests should be verified with rfc2136 as a reference implementation of a libdns provider (can libdns maintainers give it such status?). BTW it would also help to have a Dockerfile + script there to start a DNS server (bind?) with a preconfigured zone and key.

@gucci-on-fleek
Copy link
Collaborator

@AndrianBdn

thanks for you suggestions!

  • Adjusted cleanup logic.

  • Added support for more DNS record types. I had to put in exclusions. Better to let providers adapt tests and keep other records support as an open issue.

Thanks, I'll look at the code a little while later.

  • After thinking it through, I’m going to support only one Provider interface set. Incomplete providers can work around this, or just add the missing method.

I think that it might be easiest to just leave out ListZones entirely. Only 8 out of 79 providers currently support it, and it's not used by CertMagic/Caddy for anything.

I meant more something like that an actual DNS server will block us from setting a CNAME record for the same name as a A/TXT record, won't let us set ipv4hint=2001:db8::1, won't let us set an MX record with an IP address for a target, etc.; not making sure that the providers aren't lying to us.

I don’t think that’s really a libdns issue. If a service allows something invalid (like IP in MX), that’s more about the service than libdns. In theory some validation can live in libdns — e.g. MX or SVCB record checks.

I’m not attached to the in-memory dummy provider. The only benefit is that it creates a double-entry system for tests and a dummy implementation. But since I haven’t tested with any real provider that supports all record types, the dummy provider may not be correct indeed.

Ok, I still don't really see the point of the in-memory provider, but you've already written the code, and it should be trivial to maintain, so I'm okay with including it.

I can remove it, and we can add a note that e2e tests should be verified with rfc2136 as a reference implementation of a libdns provider (can libdns maintainers give it such status?).

I'm not really sure that we need a reference provider—in theory libdns should behave the same everywhere, so if 2 providers disagree, then there's probably a bug somewhere. And different people are more comfortable testing with different providers, so it's probably easier to let people use whatever they prefer.

BTW it would also help to have a Dockerfile + script there to start a DNS server (bind?) with a preconfigured zone and key.

Hmm, that's definitely a good idea, I'll look into that.

@AndrianBdn
Copy link
Contributor Author

I'm not really sure that we need a reference provider—in theory libdns should behave the same everywhere, so if 2 providers disagree, then there's probably a bug somewhere.

Maybe I am over-thinking it, but here is my concern:

When someone submits a PR that changes tests, how do we know the change is valid? One option is to require them to show it works against some provider. A reference provider gives us a common baseline. It could be in-memory (although it is limited in DNS sense and probably buggy), or rfc2136, or something else. This is not about "truth" in DNS - only about having something consistent to measure test changes against.

Obviously we can introduce a policy, that requires to adopt new tests in any two providers. But that feels like unnecessary friction.

@mholt
Copy link
Contributor

mholt commented Sep 5, 2025

Sorry I've been absent from this for a while!

I think that it might be easiest to just leave out ListZones entirely.

Could it be optional? Maybe with a warning or something if it's not implemented, as opposed to a hard fail?

When someone submits a PR that changes tests, how do we know the change is valid? One option is to require them to show it works against some provider. A reference provider gives us a common baseline. It could be in-memory (although it is limited in DNS sense and probably buggy), or rfc2136, or something else. This is not about "truth" in DNS - only about having something consistent to measure test changes against.

This is a good point IMO.

Overall I'm in favor of this change, although I might have a nit about the folder structure/placement. I dunno -- will think on it.

@mholt mholt linked an issue Sep 5, 2025 that may be closed by this pull request
@AndrianBdn
Copy link
Contributor Author

@mholt @gucci-on-fleek I introduced a different way to skip ZoneLister test

@gucci-on-fleek
Copy link
Collaborator

@AndrianBdn

When someone submits a PR that changes tests, how do we know the change is valid?

The practical/realistic answer is that it depends on who merges it. If I merge it, then I'll probably test it against rfc2136, and if @mholt merges it, then he'll probably test it against cloudflare. Having PR acceptance depend on who merges it probably isn't an ideal situation, but in theory the providers should all act the same, so it hopefully shouldn't cause too many issues.

One option is to require them to show it works against some provider.

Yes, I think that as long as someone has tested their changes against any provider, we're totally willing to work with them to make sure that it works properly with multiple providers.

A reference provider gives us a common baseline.

Well, in some sense, having contributors test against a non-rfc2136/cloudflare provider is a good thing, since that means that any merged PR will have been tested against at least 2 different providers.

But yeah, I definitely agree with you that this is a tricky question, and I'm still unsure on what the best option is here.

Copy link
Collaborator

@gucci-on-fleek gucci-on-fleek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple small issues, but LGTM otherwise. This is a rather large change, so I'll wait to hear from @mholt before merging.

Once this is merged, I think that we'll probably tag a beta release right away, and then wait for 3ish providers to use it before “officially” releasing it (at which point we're committing to some degree of backwards compatibility, although much less than for the main package).

@mholt
Copy link
Contributor

mholt commented Sep 8, 2025

lol, Gmail did NOT like my notification emails on this thread, since some of the domains are flagged for malware:

image

Probably a good thing if we stick to well-known example domains 😆

Copy link
Contributor

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay yeah, this is looking good. I like the suggestion to use a map for what to skip -- is record type the only useful filter there, or should we make the filter more generalized?

Approving since we can iterate on this still, but probably worth addressing the remaining questions first if possible.

@mholt
Copy link
Contributor

mholt commented Sep 8, 2025

One other quick thought: could we rename the package e2e? Something like libdnstest maybe? Just a little more specific/descriptive, since I think "e2e" is too broad.

@AndrianBdn
Copy link
Contributor Author

@mholt @gucci-on-fleek Thanks for the reviews -- code updated.

One question: should providers expect the trailing dot in FQDNs (e.g., CNAME targets)?

I haven’t found anything in libdns about it. Many providers ignore the dot, because some APIs don't distinguish between a record reference and an FQDN.

For our CNAME records in tests, do we keep the trailing dot or drop it?

@gucci-on-fleek
Copy link
Collaborator

One question: should providers expect the trailing dot in FQDNs (e.g., CNAME targets)?

Yes, see https://docs.dnscontrol.org/language-reference/why-the-dot for a better explanation.

I haven’t found anything in libdns about it.

Hmm, good point; I'll open a PR for that shortly.

For our CNAME records in tests, do we keep the trailing dot or drop it?

Definitely keep it.

Copy link
Collaborator

@gucci-on-fleek gucci-on-fleek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that there are still a few minor things that we might need to change, but I'm going to merge this and tag an alpha release so that we can get some outside testing.

Thanks @AndrianBdn!

@mholt
Copy link
Contributor

mholt commented Sep 11, 2025

Thanks for iterating on this, both of you -- happy to merge it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unit tests for providers (and template)

3 participants