
I had a bit of a read but I didn't find where it explains (code or doc) how it achieves reproducible builds.

It seems like a stricter, huge make-like harness (in fact it reminds me a bit of the Mozilla Firefox Python build system).

It's not bad by any means, but it seems to me that it doesn't "magically" fix the "be reproducible" problem at all (which is what it seems to claim).

Am I missing something?



You are absolutely correct: Bazel by itself does not make your builds reproducible. If a tool calls rand() or bakes the current time into its output, reproducibility goes out of the window.

What Bazel does, however, is to make it possible to run build steps in a sandbox (although the current one is kinda leaky) so that your build is isolated from the environment and thus behaves in the same way on any computer. It also tracks dependencies correctly so that it knows when a specific action needs to be re-run.

This makes it possible to diagnose non-reproducible build steps easily. At Google, the hit rate of our distributed build cache usually floats around 99%, and this would be impossible without reproducible build steps.
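
To illustrate why cache hits depend on reproducible steps, here is a rough Python sketch of a content-addressed action cache. It is not Bazel's actual cache format; `action_key` and `ActionCache` are names made up for this example, and the key is assumed to be a hash of the command line plus the contents of the declared inputs. A step whose output depends on anything outside that key (the clock, rand(), an undeclared file) would get a cached result that no longer matches what a fresh run would produce.

```python
import hashlib


def action_key(command, inputs):
    """Hash a command line and its declared inputs into a cache key.

    command: list of argument strings.
    inputs: dict mapping file name -> file contents (bytes).
    Hypothetical scheme for illustration, not Bazel's real key format.
    """
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode("utf-8"))
        h.update(b"\0")
    h.update(b"--inputs--\0")
    # Sort names so the key does not depend on dict insertion order.
    for name in sorted(inputs):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(inputs[name]).digest())
    return h.hexdigest()


class ActionCache:
    """In-memory cache: re-run a step only when its key changes."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, command, inputs, execute):
        key = action_key(command, inputs)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        output = execute(command, inputs)
        self._store[key] = output
        return output
```

Reusing a cached output is only safe if `execute` is a pure function of `command` and `inputs`, which is exactly the property the sandbox is trying to enforce.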


Does work done by Debian to make Linux packages build reproducibly help Bazel?

https://wiki.debian.org/ReproducibleBuilds

Would Bazel help with the remaining long tail of packages in Debian?


Conceptually, your build results should be a pure function of your source tree. If I understand correctly, within Google, the cross-compilers are actually checked into the source tree, so that the distributed jobs will use the same compiler to build your code. It seems like Bazel currently only uses whatever is in /usr/bin, though[0]. For Java compilations, Bazel additionally has its own jar builder that sorts the filenames and zeros the timestamps within the zip file[1].

[0]: https://github.com/google/bazel/tree/master/tools/cpp

[1]: https://github.com/google/bazel/tree/master/src/java_tools/b...
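
For a concrete picture of the "sort the filenames and zero the timestamps" trick, here is a minimal Python sketch using the standard `zipfile` module. This is not Bazel's jar builder, just the same idea under my own assumptions: `build_deterministic_zip` and `FIXED_DATE` are names I made up, entries are stored uncompressed so the bytes don't depend on the zlib version, and file permissions are pinned to a fixed value.

```python
import io
import zipfile

# The zip format cannot represent dates before 1980, so "zeroing" a
# timestamp in practice means pinning it to a fixed date like this one.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def build_deterministic_zip(files):
    """Return zip archive bytes for files (dict: archive name -> bytes).

    Entries are written in sorted name order with a fixed timestamp, so
    the result depends only on the names and contents passed in.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE)
            # Pin permissions too, instead of copying them from disk.
            info.external_attr = 0o644 << 16
            zf.writestr(info, files[name])
    return buf.getvalue()
```

With this, two builds that add the same files in a different order, or at a different time of day, should produce byte-identical archives on the same platform.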


You're right - it doesn't magically solve build reproducibility. Bazel pushes you towards a build configuration where you have to describe (in a terse way) the entire dependency graph of what is being built. It allows Bazel to be smart about where in the graph things are stale.

If you run a script that outputs intermediate files, Bazel needs to know about that script's inputs and outputs. And it works better if it knows them ahead of time.
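
Here's a toy Python sketch of what "being smart about where in the graph things are stale" could look like once every step declares its inputs. The graph format and the `stale_targets` function are made up for illustration; real Bazel rules are far richer than a dict of lists.

```python
def stale_targets(deps, changed):
    """Return the set of targets that must be rebuilt.

    deps: dict mapping each target to the list of things it is built
          from (source files or other targets).
    changed: iterable of source files (or targets) that changed.
    """
    # Invert the graph: for each input, which targets consume it?
    consumers = {}
    for target, inputs in deps.items():
        for item in inputs:
            consumers.setdefault(item, set()).add(target)

    # Walk forward from the changed nodes, marking everything reachable.
    stale = set()
    stack = list(changed)
    while stack:
        node = stack.pop()
        for target in consumers.get(node, ()):
            if target not in stale:
                stale.add(target)
                stack.append(target)
    return stale


# A script (gen.py) that produces an intermediate file (gen.c) is just
# another node with declared inputs and outputs.
EXAMPLE_DEPS = {
    "gen.c": ["gen.py", "schema.txt"],
    "app": ["main.c", "gen.c"],
    "docs": ["README"],
}
```

For example, `stale_targets(EXAMPLE_DEPS, ["schema.txt"])` marks `gen.c` and `app` but leaves `docs` alone. If `gen.py` read some file that wasn't declared, changes to that file would go unnoticed, which is why the declarations matter.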


I invented the Python bits of the Firefox build system (moz.build files). I learned after I implemented them that Google's internal approach with Blaze was very similar. It felt reassuring that I independently reinvented a similar solution :)

There are a handful of Blaze derivatives built by Xooglers. Pants and Buck come to mind. They also share the trait of using sandboxed Python to define a build configuration. I'll take it over make syntax any day!


It's not magic; you have to work at it. (For example, make sure that zip doesn't put timestamps in the file.) But it's designed so that code generators should act as pure functions from input files to output files, and many generators actually are, especially the built-in ones. If you do this, the build system will help you.

Writing generators to run this way is kind of a pain, actually, sort of like writing code to run in a sandbox. Also, the generators themselves must be checked in, and often built from source. But we consider the results worth it.
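
One cheap way to check whether a generator behaves like a pure function is to run it more than once on the same inputs and compare output hashes. A small Python sketch follows; `is_reproducible`, `pure_gen` and `stamped_gen` are hypothetical names for this example, and the generators take their inputs as a dict of name to bytes rather than real files.

```python
import hashlib
import os


def is_reproducible(generator, inputs, runs=2):
    """Run generator(inputs) several times and compare output digests.

    Identical digests do not prove determinism; differing digests do
    prove the generator is not a pure function of its inputs.
    """
    digests = {
        hashlib.sha256(generator(inputs)).hexdigest() for _ in range(runs)
    }
    return len(digests) == 1


def pure_gen(inputs):
    """Output depends only on the input names and contents."""
    return b"".join(inputs[name] for name in sorted(inputs))


def stamped_gen(inputs):
    """Same as pure_gen, but appends random bytes, standing in for a
    tool that bakes rand() or the current time into its output."""
    return pure_gen(inputs) + os.urandom(8)
```

Here `is_reproducible(pure_gen, ...)` holds, while `stamped_gen` fails the check on essentially every run.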


I gather that it runs builds inside a chroot where the only available files are the dependencies you specified explicitly (including the compiler[1]), at least in "strict" mode[2]. Or else it must monitor which files are opened during the build step and fail the build if it sees an unexpected file being opened.

It never explains any of this explicitly, but there are hints: [1], [2], [3].

[1] "Many rules also have additional attributes for rule-specific kinds of dependency, e.g. 'compiler'" -- http://bazel.io/docs/build-ref.html#types_of_dependencies

[2] http://bazel.io/docs/build-encyclopedia.html#cc_binary.hdrs_...

[3] "The build system runs tests in an isolated directory where only files listed as 'data' are available" -- http://bazel.io/docs/build-ref.html#data

Edit: A comment below seems to suggest that this is not the case: "Within Google we use a form of sandboxing to enforce that" (emphasis mine). -- https://news.ycombinator.com/item?id=9259147
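
To show the general idea of "only the declared files are available", here's a Python sketch that copies the declared inputs into a fresh scratch directory and runs the step there. This is a guess at the concept, not how Bazel does it: `run_in_scratch_dir` is a made-up name, and unlike a chroot this gives no real isolation, since the step can still open absolute paths like /usr/bin. It only catches undeclared dependencies that are referenced by relative path.

```python
import os
import shutil
import subprocess
import tempfile


def run_in_scratch_dir(command, declared_inputs, src_root):
    """Run command in a temp dir containing only the declared inputs.

    command: argument list for subprocess.
    declared_inputs: paths relative to src_root.
    Returns the CompletedProcess (with stdout/stderr captured as text).
    """
    with tempfile.TemporaryDirectory() as scratch:
        for rel in declared_inputs:
            dst = os.path.join(scratch, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(os.path.join(src_root, rel), dst)
        # A step that reads an undeclared relative path fails here,
        # because that file was never copied into the scratch dir.
        return subprocess.run(
            command, cwd=scratch, capture_output=True, text=True
        )
```

A step that quietly depends on a file it didn't declare will fail with "file not found" instead of succeeding on one machine and breaking on another.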



