Reducing Go Dependencies

A case study of dependency reduction in Huma.

Intro

This article is a practical look at reducing dependencies in Go libraries. We'll start by looking at how Go dependencies work, then go into a few ideas around reducing dependencies. Finally, we'll go into a few ways I've implemented these ideas in Huma and the results. Hopefully you can use some of the same techniques in your own projects.

Why Reduce Dependencies

Why would you want to reduce the number of dependencies in your Go library? There are a few reasons:

  • Reduced Complexity: Fewer dependencies means less overall code to understand and maintain.

  • Reduced Risk: Fewer dependencies means fewer potential security vulnerabilities.

  • Reduced Build Times: Fewer dependencies can mean faster builds.

  • Reduced Binary Size: Fewer dependencies can mean smaller binaries.

A bit more subjective, perhaps, is that reducing dependencies can make your library more approachable to potential users. If they see that your library has fewer dependencies, they might be more likely to use it.

It can also mean the difference between using your library and not using it for certain constrained organizations, like governments, NGOs, or large corporations with strict security policies around third party code.

How Go Dependencies Work

Go dependencies are managed using the go.mod file. This file lists the dependencies of your project, and is used by the go tool to download and build your dependencies. When a user of your package runs go get, the go tool will download your package and some or all of its dependencies.

Go module dependency graph

Since Go 1.17, the go tool performs what is called module graph pruning, meaning it only records and downloads the modules that provide packages your project actually imports, directly or transitively. If a module you depend on requires another module whose packages your build never imports, that other module does not need to be downloaded.

Huma is a great example of this, as it uses an adapter concept to support many different router packages. The Huma go.mod contains a dependency for each router, but because each one is in its own package within the Huma module, only the one that is actually used by the project will be downloaded.

Another great example is the popular stretchr/testify module, which provides a package for test assertions. Many, many libraries use it for testing, but consumers of the library do not need to download that package because they are not using it in their own code.

💡
This is important to remember: not every dependency you see in a library's go.mod will wind up in your project's go.mod because of this pruning!

Ideas Around Reducing Dependencies

Here are a few general ideas around reducing dependencies in Go libraries:

  • Use The Standard Library: Use the Go standard library as much as possible. It's well-tested, well-documented, and has a lot of functionality built in. Sometimes a dependency is just a convenience wrapper around the standard library and isn't strictly necessary.

  • Build It Yourself: Sometimes it's easier to build a small piece of functionality yourself than to bring in a dependency. This can be especially true for small, well-defined pieces of functionality.

  • It's Okay To Copy: If you only need a small part of a library, consider copying the code into your own project. This can be a good idea if the library is large or has a lot of dependencies.

  • Reframe The Problem: Sometimes you can reframe the problem you're trying to solve to avoid a dependency. For example, if you're using a library to marshal to a specific format, you might be able to use a simpler format or a different approach to avoid the dependency.

  • Move Examples: It's common and helpful to include examples or demos in your codebase to make it easy for new users to adopt and experiment. However, these examples can sometimes bring in a lot of dependencies. Consider moving them to a separate package or directory to avoid adding unnecessary dependencies to your main package.

A Side Note On Versioning

Keep in mind that as you work to reduce dependencies you have two options related to versioning (assuming you use semantic versioning):

  1. Try your best to remain backward-compatible. You cannot break existing users of the library. This is the most desirable outcome and should result in a new minor version.

  2. Make a new major version with breaking changes. This will require all users to change their own code, but it may be worth the benefits.

A third option might be to deprecate, announce, and later remove (or move) features without a new major version. It's not strictly semantic versioning then, but more like library evolution and may or may not work for your users.

A Case Study: Huma

Huma is a micro web framework for Go that is designed to be easy to use and provide OpenAPI & JSON Schema support on top of existing routers. It has a number of adapters for different routers, including http.ServeMux, chi, fiber, gorilla/mux, gin, etc.

When originally designing Huma v2.0.0, significant time was spent on the library's interface, usability, and feature set. I think that was the right call, and various dependencies allowed me to move faster and try various approaches easily. However, at release Huma had a lot of dependencies (both direct and indirect), and I wanted to reduce them, in part because of feedback from the community and in part because I wanted to make the library more approachable to potential users.

Pull request #223 significantly reduces the dependencies between Huma versions v2.3.0 and v2.4.0. Read on for a detailed analysis.

Environment Variables

Huma depended on spf13/viper as a way to bind environment variables to command-line options so that a service can be configured either via env vars or flags on the command line. This was a convenience feature, but it was also a significant dependency. I realized that I was pulling in a dozen dependencies just for this, when instead I could write a small amount of code using the standard library to accomplish essentially the same thing. Whereas before I had code similar to this:

cfg := viper.New()
cfg.SetEnvPrefix("SERVICE")
cfg.SetEnvKeyReplacer(strings.NewReplacer("-", "_"))
cfg.AutomaticEnv()

// ... later on ...
cfg.SetDefault(name, defaultValue)
cfg.BindPFlag(name, flag)

Instead, I could write a small amount of code like this:

envName := "SERVICE_" + casing.Snake(name, strings.ToUpper)
defaultValue := field.Tag.Get("default")
if v := os.Getenv(envName); v != "" {
    // Env vars will override the default value, which is used to document
    // what the value is if no options are passed.
    defaultValue = v
}

Now, instead of pulling in a dozen dependencies, I was only using the standard library plus the small casing package. Command-line options would either have the default set from the service code or be overridden by the environment variable, causing the help text to be accurate for that particular run.

This worked great as I really didn't need the full power of viper and was able to replace it with a small amount of code. You should consider whether you really need the full power of a library you are about to pull into your project.
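To make the idea concrete, here is a minimal, self-contained sketch of environment-backed flag defaults using only the standard library. It is not Huma's actual code: the helper names envName and envDefault are hypothetical, the flag package stands in for pflag, and a simple strings.ReplaceAll stands in for casing.Snake, so it only handles kebab-case option names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// envName maps an option name such as "log-level" to "SERVICE_LOG_LEVEL".
// Hypothetical helper: it only replaces dashes and uppercases the result.
func envName(prefix, option string) string {
	return prefix + "_" + strings.ToUpper(strings.ReplaceAll(option, "-", "_"))
}

// envDefault returns the environment value for the option when it is set and
// non-empty, otherwise the fallback from the service code.
func envDefault(prefix, option, fallback string) string {
	if v := os.Getenv(envName(prefix, option)); v != "" {
		return v
	}
	return fallback
}

func main() {
	fs := flag.NewFlagSet("service", flag.ContinueOnError)

	// The environment value (if any) becomes the flag default, so the help
	// text shows the value that would be used for this particular run.
	host := fs.String("host", envDefault("SERVICE", "host", "localhost"), "hostname to listen on")
	level := fs.String("log-level", envDefault("SERVICE", "log-level", "info"), "log verbosity")

	if err := fs.Parse(os.Args[1:]); err != nil {
		return // Parse has already reported the problem on stderr.
	}
	fmt.Println("host:", *host, "log-level:", *level)
}
```

With this shape the precedence is: explicit flag, then environment variable, then the default written in the code.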

YAML Generation

Huma supports generating OpenAPI 3.1 from your service, and it's common for OpenAPI to use both JSON and YAML for the spec document. It's also common for people to use extensions to the OpenAPI spec, meaning they put their own fields into the resulting JSON/YAML document. This is also the only part of my code that needs to use YAML, and it only needs to be marshaled (i.e. no parsing).

I was originally using goccy/go-yaml as it has built-in support for ,inline to merge in extension fields, as well as a JSON formatter during serialization (which works because JSON is a subset of YAML). This worked great to support both JSON and YAML using one library, but it was a significant dependency with its own validation and parsing code, colorized output support using additional dependencies, special error dependencies, etc.

Rethinking and reframing this problem, I realized that I could use the standard library to generate JSON, and then use a small amount of code to merge in the extension fields using custom MarshalJSON() ([]byte, error) methods on my OpenAPI and JSON Schema structs (the marshalJSON utility is omitted for brevity):

func (i *Info) MarshalJSON() ([]byte, error) {
    return marshalJSON([]jsonFieldInfo{
        {"title", i.Title, omitNever},
        {"description", i.Description, omitEmpty},
        {"termsOfService", i.TermsOfService, omitEmpty},
        {"contact", i.Contact, omitEmpty},
        {"license", i.License, omitEmpty},
        {"version", i.Version, omitNever},
    }, i.Extensions)
}
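Since the marshalJSON utility is omitted, here is one way such a helper could look. This is a sketch under assumptions, not Huma's implementation: the jsonFieldInfo layout, the omitNever/omitEmpty constants, and the isEmpty helper are guesses based on the call site above, and the Info struct is trimmed down to three fields.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

type omitType int

const (
	omitNever omitType = iota // always write the field
	omitEmpty                 // skip the field when its value is empty
)

// jsonFieldInfo describes one field to write, in order (assumed layout).
type jsonFieldInfo struct {
	name  string
	value any
	omit  omitType
}

// isEmpty reports whether v is nil, a zero value, or an empty map or slice.
func isEmpty(v any) bool {
	if v == nil {
		return true
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Map, reflect.Slice:
		return rv.Len() == 0
	}
	return rv.IsZero()
}

// marshalJSON writes the fields in the given order, then the extensions in
// sorted key order. It does not check for extension keys that collide with
// field names.
func marshalJSON(fields []jsonFieldInfo, extensions map[string]any) ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteByte('{')
	first := true
	write := func(name string, value any) error {
		k, err := json.Marshal(name)
		if err != nil {
			return err
		}
		v, err := json.Marshal(value)
		if err != nil {
			return err
		}
		if !first {
			buf.WriteByte(',')
		}
		first = false
		buf.Write(k)
		buf.WriteByte(':')
		buf.Write(v)
		return nil
	}
	for _, f := range fields {
		if f.omit == omitEmpty && isEmpty(f.value) {
			continue
		}
		if err := write(f.name, f.value); err != nil {
			return nil, err
		}
	}
	keys := make([]string, 0, len(extensions))
	for k := range extensions {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if err := write(k, extensions[k]); err != nil {
			return nil, err
		}
	}
	buf.WriteByte('}')
	return buf.Bytes(), nil
}

// Info is a reduced stand-in for the OpenAPI info object.
type Info struct {
	Title       string
	Description string
	Version     string
	Extensions  map[string]any
}

func (i *Info) MarshalJSON() ([]byte, error) {
	return marshalJSON([]jsonFieldInfo{
		{"title", i.Title, omitNever},
		{"description", i.Description, omitEmpty},
		{"version", i.Version, omitNever},
	}, i.Extensions)
}

func main() {
	info := &Info{Title: "Demo", Version: "1.0.0", Extensions: map[string]any{"x-owner": "team-a"}}
	b, err := json.Marshal(info)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"title":"Demo","version":"1.0.0","x-owner":"team-a"}
}
```

Writing fields from an explicit list keeps the output order stable and lets extension fields land at the same level as the regular ones, which is what ,inline provided in the YAML library.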

Once I had the JSON, it was actually not that difficult to convert it to YAML. I began to write a small library to do so, but found someone else had already done it in just a few lines and was able to copy it into my codebase.
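To give a feel for how small such a converter can be, here is a rough sketch. It is not the code that was copied into Huma: the function names jsonToYAML, writeValue, and writeBlock are hypothetical. It assumes that emitting scalars and keys in their JSON form is acceptable (JSON strings are valid YAML double-quoted strings), and it sorts object keys, so it does not preserve the field order of the input.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// jsonToYAML converts a JSON document into block-style YAML. Only the
// structure changes; scalars and keys keep their JSON encoding.
func jsonToYAML(data []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.UseNumber() // keep numbers exactly as written in the input
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	var sb strings.Builder
	writeValue(&sb, v, "")
	// writeValue starts with a space or newline separator; drop it at the root.
	return strings.TrimLeft(sb.String(), " \n"), nil
}

// writeValue writes what follows a "key:" or "-" marker.
func writeValue(sb *strings.Builder, v any, childIndent string) {
	switch val := v.(type) {
	case map[string]any:
		if len(val) == 0 {
			sb.WriteString(" {}\n")
			return
		}
		sb.WriteString("\n")
		writeBlock(sb, val, childIndent)
	case []any:
		if len(val) == 0 {
			sb.WriteString(" []\n")
			return
		}
		sb.WriteString("\n")
		writeBlock(sb, val, childIndent)
	default:
		// Strings, numbers, booleans, and null decoded from valid JSON
		// always marshal successfully, so the error is ignored here.
		b, _ := json.Marshal(val)
		sb.WriteString(" " + string(b) + "\n")
	}
}

// writeBlock writes the entries of a non-empty map or slice at the given indent.
func writeBlock(sb *strings.Builder, v any, indent string) {
	switch val := v.(type) {
	case map[string]any:
		keys := make([]string, 0, len(val))
		for k := range val {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			kb, _ := json.Marshal(k)
			sb.WriteString(indent + string(kb) + ":")
			writeValue(sb, val[k], indent+"  ")
		}
	case []any:
		for _, item := range val {
			sb.WriteString(indent + "-")
			writeValue(sb, item, indent+"  ")
		}
	}
}

func main() {
	out, err := jsonToYAML([]byte(`{"openapi":"3.1.0","info":{"title":"Demo","x-tags":["a","b"]},"paths":{}}`))
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The output is more heavily quoted than hand-written YAML, but the sketch avoids having to decide which strings need quoting, which is where most of the complexity of a real YAML encoder lives.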

This removed a significant dependency and replaced it with mostly standard library code. I kept the ,inline field tags so if someone using Huma wants to augment or serialize using a proper YAML library, everything will still work as expected.

JSON Schema Validation

Huma has its own JSON Schema request validator built-in, but for some validations it depended on third-party libraries, particularly for validating the format of strings.

For example, it used google/uuid to validate that a UUID is in the correct format. But that library does a lot more than simple validation: it also has parsing, string generation, and other features that I didn't need. It's a mature library, so I was able to just copy a small amount of code from it to validate UUIDs in my own codebase.
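As an illustration of how little code format-only validation needs, here is a minimal sketch. It is neither the code from google/uuid nor what Huma ships: isUUID and isHex are hypothetical names, and the check assumes only the canonical 36-character hyphenated form should pass, whereas a full library may accept additional encodings.

```go
package main

import "fmt"

// isHex reports whether c is an ASCII hexadecimal digit.
func isHex(c byte) bool {
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')
}

// isUUID reports whether s has the canonical 8-4-4-4-12 hexadecimal layout.
// It checks the format only, not the version or variant bits.
func isUUID(s string) bool {
	if len(s) != 36 {
		return false
	}
	for i := 0; i < len(s); i++ {
		switch i {
		case 8, 13, 18, 23:
			if s[i] != '-' {
				return false
			}
		default:
			if !isHex(s[i]) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(isUUID("123e4567-e89b-12d3-a456-426614174000")) // true
	fmt.Println(isUUID("123e4567e89b12d3a456426614174000"))     // false: hyphens missing
	fmt.Println(isUUID("123e4567-e89b-12d3-a456-42661417400z")) // false: non-hex character
}
```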

Text Manipulation

As part of the OpenAPI generation, Huma uses a registry of schemas which are created from your Go operation handler input/output structs. These schemas are named using the struct name and support unicode, generics, and scalar types too.

For consistency the code which generates these names converts the input to PascalCase, which is common for public (exported) structs in Go code but not used everywhere, and generics using scalars result in type names like MyGeneric[int] which we want converted to MyGenericInt. This means you need to title case the parts of the name after removing any generics brackets.

Unfortunately strings.Title is deprecated, and the suggestion is to use golang.org/x/text/cases instead. That brings in over 40 MB of source as a dependency due to all the unicode handling and other features it provides! This makes it hard to justify using it for just a single line of code, slows down compilation, potentially increases the binary size, and makes it impossible for users to share runnable snippets on the Go Playground as the builds will time out.

Instead, I was able to write a tiny amount of code to handle this case and avoid the dependency entirely. It's not as featureful as the golang.org/x/text/cases package, but it's good enough for my use case and avoids a significant dependency while still supporting unicode and generics:

result := ""
for _, part := range strings.FieldsFunc(name, func(r rune) bool {
    // Split on special characters. Note that `,` is used when there are
    // multiple inputs to a generic type.
    return r == '[' || r == ']' || r == '*' || r == ','
}) {
    // Split fully qualified names like `github.com/foo/bar.Baz` into `Baz`.
    fqn := strings.Split(part, ".")
    base := fqn[len(fqn)-1]

    // Add to result, and uppercase for better scalar support (`int` -> `Int`).
    // Use unicode-aware uppercase to support non-ASCII characters.
    r, size := utf8.DecodeRuneInString(base)
    result += strings.ToUpper(string(r)) + base[size:]
}

It continues to pass a pretty extensive suite of test cases, and I'm happy with the result.
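For anyone who wants to experiment with this, the snippet above can be wrapped into a runnable program along these lines. The function name schemaName is hypothetical, and this version differs slightly from the original: it uses strings.Builder and unicode.ToUpper, and it adds a guard for empty name parts.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// schemaName converts a Go type name, possibly generic and fully qualified,
// into a PascalCase-style name without brackets or package paths.
func schemaName(name string) string {
	var sb strings.Builder
	parts := strings.FieldsFunc(name, func(r rune) bool {
		// `,` separates multiple type arguments; `*` marks pointers.
		return r == '[' || r == ']' || r == '*' || r == ','
	})
	for _, part := range parts {
		// Keep only what follows the last dot: `github.com/foo/bar.Baz` -> `Baz`.
		base := part[strings.LastIndex(part, ".")+1:]
		if base == "" {
			continue // Guard against a trailing dot leaving nothing to capitalize.
		}
		// Uppercase the first rune only, which also works for non-ASCII names.
		r, size := utf8.DecodeRuneInString(base)
		sb.WriteRune(unicode.ToUpper(r))
		sb.WriteString(base[size:])
	}
	return sb.String()
}

func main() {
	fmt.Println(schemaName("MyGeneric[int]"))                       // MyGenericInt
	fmt.Println(schemaName("Pair[string,*github.com/foo/bar.Baz]")) // PairStringBaz
	fmt.Println(schemaName("élan"))                                 // Élan
}
```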

Code Examples

Huma has a "transformer" concept, which enables you to write a piece of code which modifies the response of an HTTP request before it is marshaled to e.g. JSON. It works with Go instances, and is a powerful feature used to e.g. insert $schema JSON Schema references, which enable code completion and linting as you type in editors like VSCode.

Since it's not a concept found in other routers or web frameworks, I wanted to provide more examples of what might be possible with it. My mistake was including them in the main package, causing additional dependencies like danielgtaylor/shorthand and danielgtaylor/mexpr to get pulled in.

Moving such examples into an examples directory with its own go.mod makes things much cleaner and avoids pulling in unnecessary dependencies. It also makes it easier for users to understand the core library and what is just an example or toy to show it off.

Results

The results speak for themselves. Using the Huma tutorial code as a benchmark, and replacing the Chi router with Go 1.22's improved http.ServeMux router to truly show the required dependencies of Huma's core library, we can see the following results from Huma v2.3.0 (before the changes):

module example.com/demo

go 1.22

require github.com/danielgtaylor/huma/v2 v2.3.0

require (
    github.com/danielgtaylor/casing v0.0.0-20210126043903-4e55e6373ac3 // indirect
    github.com/danielgtaylor/mexpr v1.8.0 // indirect
    github.com/danielgtaylor/shorthand/v2 v2.1.1 // indirect
    github.com/fatih/color v1.15.0 // indirect
    github.com/fsnotify/fsnotify v1.6.0 // indirect
    github.com/fxamacker/cbor/v2 v2.5.0 // indirect
    github.com/goccy/go-yaml v1.11.0 // indirect
    github.com/google/uuid v1.3.1 // indirect
    github.com/hashicorp/hcl v1.0.0 // indirect
    github.com/inconshreveable/mousetrap v1.1.0 // indirect
    github.com/magiconair/properties v1.8.7 // indirect
    github.com/mattn/go-colorable v0.1.13 // indirect
    github.com/mattn/go-isatty v0.0.20 // indirect
    github.com/mitchellh/mapstructure v1.5.0 // indirect
    github.com/pelletier/go-toml/v2 v2.0.8 // indirect
    github.com/spf13/afero v1.9.5 // indirect
    github.com/spf13/cast v1.5.1 // indirect
    github.com/spf13/cobra v1.8.0 // indirect
    github.com/spf13/jwalterweatherman v1.1.0 // indirect
    github.com/spf13/pflag v1.0.5 // indirect
    github.com/spf13/viper v1.15.0 // indirect
    github.com/subosito/gotenv v1.4.2 // indirect
    github.com/x448/float16 v0.8.4 // indirect
    golang.org/x/exp v0.0.0-20230515195305-f3d0a9c9a5cc // indirect
    golang.org/x/net v0.19.0 // indirect
    golang.org/x/sys v0.15.0 // indirect
    golang.org/x/text v0.14.0 // indirect
    golang.org/x/xerrors v0.0.0-20220907171357-04be3eba64a2 // indirect
    gopkg.in/ini.v1 v1.67.0 // indirect
    gopkg.in/yaml.v3 v3.0.1 // indirect
)

After the changes above from PR #223 which are released in Huma v2.4.0, the dependencies are significantly reduced:

module example.com/demo

go 1.22

require github.com/danielgtaylor/huma/v2 v2.4.0

require (
    github.com/danielgtaylor/casing v1.0.0 // indirect
    github.com/fxamacker/cbor/v2 v2.5.0 // indirect
    github.com/inconshreveable/mousetrap v1.1.0 // indirect
    github.com/spf13/cobra v1.8.0 // indirect
    github.com/spf13/pflag v1.0.5 // indirect
    github.com/x448/float16 v0.8.4 // indirect
)

It's now so small and fast that it's easy to show off various features in the Go Playground, for example: https://go.dev/play/p/Qmv5VwNTg-L.

There could be further improvement, but not without breaking changes. For example, the optional CLI functionality could be moved to its own package and that would get rid of the spf13/cobra, spf13/pflag, and inconshreveable/mousetrap dependencies if not using that functionality. In practice, every service I've seen does use it so the utility of doing so is limited. Maybe this can be deprecated and removed over time, but for now I'm extremely happy with the results.

Aside from the smaller go.mod, build times were reduced by 12% and compiled binary size was reduced by 20% just from having to parse/compile/link less code.

Hopefully this dive into reducing dependencies in Huma has given you some ideas for your own projects. It's a good exercise to go through your dependencies and see if you really need them all, and if not, to consider whether you can replace them with a small amount of code or a different approach.