Exploring the cross platform dependency management situation in Python: piptools
I’ve chosen to split this post into at least two parts, as the preamble to give context became a blog post in itself. So this first piece focuses on the context around introducing stricter dependency management, and outlines the cracks that appear when trying to come up with a solution that works on multiple platforms.
Recently, I’ve been looking into transitioning a project (in this case, a large transport model dev codebase) from a heavy development cycle into a production-like state. The main goals of such a shift are to reduce inconsistencies between different developers, their local hardware and the deployed environment where official results are produced. It’s also a reflection of the maturity of the project; a lot has changed over the course of two years of development.
There are a number of things we’re implementing to make this transition, but I thought I’d share a little bit on the dependency management situation. There are already a number of blog posts out there comparing poetry, pipenv and pip-tools, so I won’t repeat what others have already said better (but if you are interested in how they compare, ideology and strong opinions, I’d recommend Should You Use Upper Bound Version Constraints? and Python Application Dependency Management, which I find rather compelling).
I’ve elected to proceed with piptools as it’s relatively lightweight and just solves the dependency management/lockfile problem I’m trying to solve, not 15 other things at the same time.
I suppose I should also address another question which comes to mind. Why bother? Why go through the hassle of researching and setting up a tool when pip freeze > requirements.txt probably would have got me four-fifths of the way there? Working at a firm where most people would hesitate to call themselves software developers, this definitely would have been the path of least resistance. But there are a few key reasons that come to mind. To summarise these succinctly:
- It is difficult to track direct dependencies separately from transitive dependencies (the dependencies of your dependencies)
- In turn this makes it difficult to keep transitive dependencies up to date
- Consequently one is less likely to get security patches in a timely fashion
- Determining if a dependency is no longer required becomes hard
- The status quo becomes avoiding updating dependencies if at all possible (which is short sighted, but remains a tempting business decision in the world of consulting).
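To make the first point concrete, here is a minimal sketch of telling direct and transitive dependencies apart once they live in separate files. The file contents, the pinned versions and the helper names (package_names, transitive_only) are all hypothetical and exist only for illustration; they are not real pip-compile output.

```python
import re

# Hypothetical file contents for illustration only (not real pip-compile output).
REQUIREMENTS_IN = """\
# direct dependencies only
geopandas
"""

REQUIREMENTS_TXT = """\
geopandas==0.12.2
numpy==1.24.2      # via pandas, shapely
pandas==1.5.3      # via geopandas
shapely==2.0.1     # via geopandas
"""

# A requirement line starts with the package name; option lines such as
# "-r other.txt" start with "-" and are therefore skipped.
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def package_names(requirements_text):
    """Return the normalised package names listed in a requirements file."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = _NAME.match(line)
        if match:
            # Normalise per PEP 503 so "Foo_Bar" and "foo-bar" compare equal.
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def transitive_only(direct_text, locked_text):
    """Names pinned in the lock file that were never requested directly."""
    return sorted(package_names(locked_text) - package_names(direct_text))


print(transitive_only(REQUIREMENTS_IN, REQUIREMENTS_TXT))
```

With only a pip freeze style file there is nothing to subtract from, which is why the distinction gets lost.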
Collectively, these amount to the situation where either packages are almost never updated, or every request to update a package or refresh dependencies comes to me, or was prompted by me in the first place. Not a workflow I’m especially keen on supporting until the end of time. Fortunately, through a combination of pip-tools, some wrapper tooling, and a healthy amount of documentation, I’ve managed to cobble together a solution which I’m optimistic will provide a simple way for the project team to interact with requirements files themselves, and avoid situations like working with geopandas 0.5 in 2021 (unfortunately no, I’m not making that up). Generally, working with packages a little out of date isn’t so bad, but geopandas pains me particularly, being involved in its development, and knowing that significant re-architecting of the internals has happened since to improve performance and make it generally more robust. Not to mention that an old version of it pulls in old versions of most of the Python geospatial stack, which is notoriously difficult to install and work with, even at the best of times.
Cross platform dependencies - the problem
With preamble aside, I wanted to share an aspect of the process that I’m still not quite happy with, and dig into it a little bit, as on the surface it seems rather perplexing that this isn’t a solved problem.
To explain in more detail, let’s consider the following unpinned dependencies, which correspond to a requirements.in file in the piptools model.
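As a rough illustration of what such an input file could look like and how a platform constraint decides which direct requirements apply, consider the sketch below. The file contents are reconstructed from the description in the text, and the exact marker (sys_platform != "win32") is an assumption. The tiny evaluator handles only that one marker form; real tooling evaluates markers with the packaging library instead.

```python
# Hypothetical requirements.in, reconstructed from the description in the text.
# The exact marker used in the original file is an assumption.
REQUIREMENTS_IN = """\
jax; sys_platform != "win32"
jupyterlab
"""


def split_requirement(line):
    """Split 'name; marker' into (name, marker), with marker None if absent."""
    name, _, marker = line.partition(";")
    return name.strip(), (marker.strip() or None)


def marker_applies(marker, sys_platform):
    """Evaluate only the simple form: sys_platform == "x" or sys_platform != "x"."""
    if marker is None:
        return True
    variable, operator, value = marker.split()
    if variable != "sys_platform" or operator not in ("==", "!="):
        raise ValueError(f"unsupported marker in this sketch: {marker!r}")
    matches = sys_platform == value.strip("\"'")
    return matches if operator == "==" else not matches


def direct_requirements(text, sys_platform):
    """Names from the input file that apply for the given sys.platform value."""
    selected = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, marker = split_requirement(line)
        if marker_applies(marker, sys_platform):
            selected.append(name)
    return selected


print(direct_requirements(REQUIREMENTS_IN, "linux"))
print(direct_requirements(REQUIREMENTS_IN, "win32"))
```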
The packages in question are a little confected, but they illustrate the situation quite nicely. Jax is not supported on Windows, so it is listed with a platform constraint (our codebase consists of many related but independent pieces, so there are portions which run quite happily on Windows despite missing a “requirement”. Technically we could have a plethora of requirements files for different bits of the code and then this “inconsistency” would disappear. But then I’d have a plethora of requirements files to maintain, which is certainly not the lesser evil). jupyterlab is a cross platform package but has some Windows specific dependencies which show up if I use pip-compile to generate a requirements file. Here’s what that looks like if I compare the pip-compile output for Linux and Windows:
Firstly, it’s clear that results are platform dependent. This is clearly noted in the pip-tools documentation, along with the suggestion that one should have a requirements file per platform, Python version, CPU type etc. This, however, isn’t a solution that scales well, especially if one already has more than one target requirements file.
Secondly, we notice that the platform markers don’t propagate. jaxlib pulled in numpy, pexpect, ptyprocess and scipy, without the restriction that these were transitive dependencies of a Linux only package. Fortunately, these all install on Windows (even if they’re not intended for or tested on the platform), but the situation for Windows dependencies is not so kind. pywin32 and pywinpty neither make sense nor exist for Linux, so the requirements file generated on Windows is not installable on Linux.
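To picture what propagating markers would mean, the sketch below walks a toy dependency graph and records the platforms on which each package is actually reachable from the direct requirements. The graph is a simplified assumption loosely based on the packages named above, not real package metadata, and propagate and marker_for are hypothetical helpers.

```python
ALL = frozenset({"linux", "win32"})
LINUX = frozenset({"linux"})
WINDOWS = frozenset({"win32"})

# Toy graph: package -> [(dependency, platforms on which that edge applies)].
# The edges are illustrative assumptions, not real package metadata.
GRAPH = {
    "jax": [("jaxlib", ALL)],
    "jaxlib": [("numpy", ALL), ("scipy", ALL)],
    "jupyterlab": [("terminado", ALL)],
    "terminado": [("ptyprocess", LINUX), ("pywinpty", WINDOWS)],
}
# Direct requirements, with jax restricted to Linux as in the text.
ROOTS = [("jax", LINUX), ("jupyterlab", ALL)]


def propagate(roots, graph):
    """Return {package: platforms on which it is reachable from the roots}."""
    platforms = {}
    stack = list(roots)
    while stack:
        name, allowed = stack.pop()
        seen = platforms.get(name, frozenset())
        new = allowed - seen
        if not new:
            continue  # nothing new to learn about this package
        platforms[name] = seen | new
        for dependency, edge_platforms in graph.get(name, []):
            # A dependency inherits its parent's restriction as well as its own.
            stack.append((dependency, new & edge_platforms))
    return platforms


def marker_for(platforms):
    """Render a platform set as a marker string, empty if unrestricted."""
    if platforms == ALL:
        return ""
    return " or ".join(f'sys_platform == "{p}"' for p in sorted(platforms))


for name, where in sorted(propagate(ROOTS, GRAPH).items()):
    marker = marker_for(where)
    print(f"{name}; {marker}" if marker else name)
```

In this toy graph numpy and scipy end up restricted to Linux because their only route in is through jax, which is the restriction the compiled output drops.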
Realising this limitation of piptools was a bit of a let-down. I now had some nice, clear, easy to use tooling with an ugly end step requiring manual intervention to merge OS specific files together. It also seemed to me to be a strange limitation: why would dependency resolution be dependent on the operating system? It seems like the metadata pip collates from pyproject.toml / setup.cfg / setup.py around dependencies of a package should be queryable on any platform, even if the package is not installable on every platform. That question is what prompted this blog post, to see if there’s a better answer than “it’s not supported, cope”.
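For a sense of what the manual merge step involves, here is a minimal sketch that combines per-platform pins into a single list of requirement lines, adding a sys_platform marker wherever a pin is not shared by every platform. The pins are hypothetical and heavily truncated, and merge_pins is an illustrative helper rather than anything pip-tools provides.

```python
# Hypothetical, heavily truncated pins, as if parsed from two pip-compile runs.
LINUX_PINS = {"jupyterlab": "3.6.1", "ptyprocess": "0.7.0", "terminado": "0.17.1"}
WINDOWS_PINS = {"jupyterlab": "3.6.1", "pywinpty": "2.0.10", "terminado": "0.17.1"}


def merge_pins(pins_by_platform):
    """Merge {sys_platform: {name: version}} into requirement lines with markers."""
    platforms = sorted(pins_by_platform)
    names = sorted({name for pins in pins_by_platform.values() for name in pins})
    lines = []
    for name in names:
        # Group the platforms by the version each one pinned for this package.
        by_version = {}
        for platform in platforms:
            version = pins_by_platform[platform].get(name)
            if version is not None:
                by_version.setdefault(version, []).append(platform)
        for version, where in sorted(by_version.items()):
            if where == platforms:
                # Same pin everywhere: no marker needed.
                lines.append(f"{name}=={version}")
            else:
                marker = " or ".join(f'sys_platform == "{p}"' for p in where)
                lines.append(f"{name}=={version}; {marker}")
    return lines


merged = merge_pins({"linux": LINUX_PINS, "win32": WINDOWS_PINS})
print("\n".join(merged))
```

If two platforms pin different versions of the same package, this sketch emits one marked line per version. Whether that is acceptable, or whether the versions should be forced to agree, is a design choice the sketch does not settle.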
The second part of this will dive deeper into the rabbit hole of why piptools works how it does, and whether my naive expectation that this problem should be solvable stacks up. I also have a look at how some of the contemporaries to piptools behave in the same situation.
Author Matt Richards
LastMod February 27, 2023 (480d57c)