Setting up a pandas/main environment for GeoPandas
Although it’s not something I tend to advertise very often, I’m a maintainer for
GeoPandas, which is the de facto standard tool for tabular geospatial analysis
in Python. My contributions there wax and wane with the amount of free time and mental space I have to volunteer. As
part of this, we keep the software in sync with developments in pandas, and our
CI tests against pandas/main. From time to time (more frequently with the upcoming pandas 2.0 release) it’s necessary
to test locally against the main branch, make use of a debugger, and work out how best to migrate the code towards an
upcoming change. This short post is just some reminders for me for setting up this process. I’m borrowing heavily from the
respective pandas and GeoPandas contributor guides, but slicing down to the bits I care about (and changing the bits I don’t like).
Environment Setup
I’ve been using mambaforge for environment management outside of work for quite a while, and these instructions lightly assume it (note: coming from normal conda, the biggest implicit assumption is that the default channel is conda-forge).
- Clone the pandas fork:
    git clone git@github.com:<fork_author>/pandas
- cd into the pandas fork checkout and add the upstream remote:
    git remote add upstream git@github.com:pandas-dev/pandas
- Bring the fork up to date:
    git fetch upstream
    git merge upstream/main
- Fetch the tags:
    git fetch --all --tags
  This is sometimes a gotcha when dealing with compatibility code based on pandas version. In a dev environment pandas.__version__ is git aware, so missing tags can produce a misleading version.
- Create the environment:
    mamba create -n pandas-dev python=3.11 Cython versioneer pytest pytest-xdist numpy python-dateutil pytz matplotlib pyarrow scipy
  I use this manual list as the environment.yml has a zoo of optional dependencies I don’t overly care about from a GeoPandas context, and I’d rather use a python more recent than 3.8.
- Activate the environment:
    mamba activate pandas-dev
- Build the cython extension modules, with 4 cores:
    python setup.py build_ext -j 4
- Install pandas in editable mode:
    python -m pip install -e . --no-build-isolation --no-use-pep517
- Install the pre-commit hooks (technically optional if only working on geopandas):
    pre-commit install
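The tags gotcha is easier to see with a concrete comparison. The sketch below is a minimal, stdlib-only illustration and not GeoPandas’ actual compatibility code: `release_tuple` and `at_least` are hypothetical helpers, and both version strings are hand-written examples of the general shape I’d expect from a git-aware version with and without tags available.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment of a version string.

    '2.1.0.dev0+123.gabc1234' -> (2, 1, 0). Pre-release and local parts
    are ignored, which is a simplification compared with a full PEP 440 parser.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_least(version: str, minimum: str) -> bool:
    """True if the release segment of `version` is >= that of `minimum`."""
    have, want = release_tuple(version), release_tuple(minimum)
    # Pad with zeros so that '2' and '2.0.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


# Hand-written example strings (assumptions, not real build output).
with_tags = "2.1.0.dev0+123.gabc1234"
without_tags = "0+untagged.123.gabc1234"

print(at_least(with_tags, "2.0.0"))     # True
print(at_least(without_tags, "2.0.0"))  # False
```

In the real environment the input would be `pandas.__version__`, and real compatibility code would more likely lean on `packaging.version.Version` than on a hand-rolled parser like this one.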
Add GeoPandas on top
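- Install the GeoPandas dependencies and dev tools:
    mamba install -y fiona pyproj shapely pyogrio black pre-commit ipython jupyterlab
- cd to the geopandas fork dir, install it in editable mode and set up the pre-commit hooks:
    python -m pip install -e .
    pre-commit install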
At this point, if I’ve done everything right, I should be able to run the geopandas test suite mostly successfully.
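Before running the tests, it can be worth confirming that Python is picking up the editable checkouts rather than some other installed copy. This is a small sketch using only the standard library; `module_origin` is a hypothetical helper name, and it reports where a module would be imported from without actually importing it.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it can't be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


for name in ("pandas", "geopandas"):
    origin = module_origin(name)
    print(f"{name}: {origin or 'not importable in this environment'}")
```

If both paths point inside the fork checkouts, the editable installs are the ones in use.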
Reminder: how to update the pandas code
Despite being reasonably happy reading and editing (and on rare occasion, writing) cython, I never seem to dig into it enough to remember what the right invocation for the compile step is. This is my self reminder:
- cd into the pandas fork checkout.
- Fetch and merge upstream:
    git fetch upstream --tags
    git merge upstream/main
- Update main:
    git checkout main
    git pull
- Rebuild the extension modules:
    python setup.py build_ext -j 4
  This should indicate it’s cython-ising a bunch of things. Note that on Windows this step will often produce a bunch of warnings and sometimes fail with exit code 1, even if it succeeds.
- Reinstall in editable mode:
    python -m pip install -e . --no-build-isolation --no-use-pep517
  (Sometimes I’ve also had this stuck in a bad state, and running python setup.py develop has gotten things back. This should be mostly equivalent to the pip editable install, though.)
Extra: pyogrio dev env on windows with OSGeo4W.
After experimenting with this, I’ve reverted to using WSL as it seems less flaky overall: I ended up with a broken environment down the track and I’m not exactly sure why. These are my notes on installing pyogrio from source on Windows, which flesh out the notes in the docs (https://pyogrio.readthedocs.io/en/latest/install.html#windows). I’ve done this most recently with GDAL 3.6.4 from OSGeo4W with QGIS 3.30, but also with GDAL 3.5.1 in the past.
- Download OSGeo4W network installer https://www.qgis.org/en/site/forusers/download.html
- (As administrator) run installer for all users, install gdal and gdal-devel (the latter adds header files and populates the \include dir)
- Create conda env:
    conda create -n pyogrio_dev python=3.11 pandas shapely Cython pyproj ipython pytest pyarrow versioneer
  (Do not install fiona! This will cause DLL loading errors from the conflicting versions of GDAL. Perhaps this can work if building fiona from source as well, but I haven’t tried.)
- Activate the environment:
    conda activate pyogrio_dev
- In the OSGeo4W shell, run:
    gdalinfo --version
  We need to know the version of GDAL to pass to the installer.
- Switch to the dir containing the checkout of pyogrio.
- Install pyogrio:
    python -m pip install --install-option=build_ext --install-option="-IC:\OSGeo4W\include" --install-option="-lgdal_i" --install-option="-LC:\OSGeo4W\lib" --no-deps --no-use-pep517 --install-option=--gdalversion --install-option=3.6.4 -e . -v
  (where you replace 3.6.4 with whatever version of gdal is reported by gdalinfo). Note this looks a bit odd supplying --gdalversion and 3.6.4 separately, but the pyogrio setup code looks specifically for the key --gdalversion, so we have to pass these as two consecutive arguments.
- Alternatively, set environment variables:
    $env:GDAL_VERSION="3.6.4"; $env:GDAL_LIBRARY_PATH="C:\OSGeo4W\lib"; $env:GDAL_INCLUDE_PATH="C:\OSGeo4W\include"
  and run:
    python -m pip install --no-deps --force-reinstall --no-use-pep517 -e . -v
- You might have to set the environment variable GDAL_DATA. I’ve now set this to:
    $env:GDAL_DATA="C:\OSGeo4W\apps\gdal\share\gdal"
  but I remember this “just working” in the past.
- If everything has gone well, importing pyogrio will work and the tests will pass when run.
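The environment-variable route can also be assembled programmatically, which avoids retyping the version reported by gdalinfo. The following is only a sketch under some assumptions: `parse_gdal_version` and `gdal_build_env` are hypothetical helpers, the OSGeo4W root is the `C:\OSGeo4W` used in these notes, and the sample gdalinfo text is hand-written in the usual "GDAL X.Y.Z, released ..." shape with a placeholder date.

```python
import re
from pathlib import PureWindowsPath


def parse_gdal_version(gdalinfo_output: str) -> str:
    """Extract 'X.Y.Z' from text shaped like 'GDAL X.Y.Z, released ...'."""
    match = re.search(r"GDAL (\d+\.\d+\.\d+)", gdalinfo_output)
    if match is None:
        raise ValueError(f"could not find a GDAL version in {gdalinfo_output!r}")
    return match.group(1)


def gdal_build_env(osgeo4w_root: str, gdal_version: str) -> dict[str, str]:
    """Build the three variables set in the alternative install step."""
    root = PureWindowsPath(osgeo4w_root)
    return {
        "GDAL_VERSION": gdal_version,
        "GDAL_LIBRARY_PATH": str(root / "lib"),
        "GDAL_INCLUDE_PATH": str(root / "include"),
    }


# Hand-written sample; the release date is a placeholder, not real output.
sample_output = "GDAL 3.6.4, released 2023/01/01"
env = gdal_build_env(r"C:\OSGeo4W", parse_gdal_version(sample_output))
for key, value in env.items():
    print(f"{key}={value}")
```

The resulting dict could be merged into a copy of `os.environ` and handed to `subprocess.run` along with the pip command, though I’ve only ever set these by hand in the shell.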
pip 23.1 compatibility
In pip 23.1, the --install-option flag was removed. For now, it seems that using --config-settings (the
apparent replacement) doesn’t behave. Instead, supply the environment variables as in the alternative install step
above (GDAL_VERSION, GDAL_LIBRARY_PATH and GDAL_INCLUDE_PATH). There’s potentially some work to
do on the packaging of pyogrio to make this a little easier, but I’m not a packaging expert.
Extra: pyogrio linux install
- Create conda env:
    mamba create -n pyogrio_dev python=3.11 pandas shapely Cython pyproj ipython pytest pyarrow versioneer gdal
- Clone pyogrio & fetch tags.
- Get us a GDAL to build against:
  - Using apt:
      sudo apt install gdal-bin
      sudo apt-get install libgdal-dev
  - Using conda: I’m yet to actually test this directly because it seems to require there to not be a system GDAL /
    system GDAL not on path, and I can’t do that without breaking my existing environment. Originally I presumed this
    wouldn’t install the requisite header files to build other packages against, but pyogrio does this in its CI. In
    theory though, this is great because you’re not tied to the version of GDAL bundled into Debian stable, and don’t
    have to update Ubuntu to get a new version of GDAL.
      mamba install gdal
- Build and install pyogrio:
    python setup.py develop
- Install geopandas without its dependencies:
    pip install --no-deps geopandas
  We don’t want to install fiona, which has another version of GDAL bundled into the wheel (could perhaps use conda/mamba to install this too).
Author Matt Richards
LastMod May 27, 2023 (72feab2)