🚫🐍: Using Python without using Python - Part 1
Python is one of the most-used programming languages in the world.
By some measure that I didn’t bother to look up, the GitHub blog has Python holding steady as the second-most popular programming language as of 2023 (first-most: JavaScript 😿).
Python is the lingua franca (“French language”) of data science, machine learning, and technical phone screens, beloved by students, researchers, and engineers alike for its relatively light syntax and extremely large collection of high quality libraries.
Most computers today come with a Python interpreter pre-installed or just-a-click-away (I’m looking at you, Windows), and for those that don’t there is a simple, standardized method for setting up a new Python interpreter from scratch (actually, two), and a number of standard tools for Python package and environment management (a,b,c).
Ultimately, though, popularity is not quality and “widely-used” does not mean “suitable for all problems”. In my career I have seen Python applied in many questionable ways (e.g., pytest as a universal build orchestrator), and in many places where its nimble, dynamic, fast-and-loose nature quickly became more of a liability than a selling point.
In this post, I’m going to play with a few different methods of using Python “on the outside” to talk to some other language “on the inside”, so that you can spend most of your time writing “not-Python” while still shipping your code as part of a Python library or application. I won’t comment on why or when you might want to do this except to say that there are some good standard reasons (compile-time type safety and code optimization, true multithreaded parallelism, etc.) and that you can do some really fun but probably ill-advised things.
In everything that follows I will assume we are using CPython, since I have never used PyPy or Jython or any other flavor that might exist. In particular, I will be working with a conda environment:
Further, the example we will use for interop will be relatively contrived: we would like a function to concatenate two strings, separated by a comma and single space, and return the result.
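To pin down the target behavior before wrapping anything, here is a minimal pure-Python sketch of the function. The name greet matches the name used for the C++ function later in this post; the parameter names are my own assumption.

```python
def greet(first: str, second: str) -> str:
    """Join two strings with a comma and a single space."""
    return f"{first}, {second}"


if __name__ == "__main__":
    # Every wrapped implementation in this post should behave the same way.
    print(greet("Hello", "World"))  # prints "Hello, World"
```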
pybind11: writing C++ and liking it
The header-only pybind11 library is maybe the most popular approach to Python<->C++ interoperability.
Since I’m already using conda, I’ll just install pybind11 using the command
With a slight variation on the basic example provided by the pybind11 docs, we can write a simple C++ file, pybind_example.cxx. The module name passed to PYBIND11_MODULE must match the name of the compiled extension, so I keep the source filename the same to avoid confusion:
To build this Python module, we will use setuptools with a minimal setup.py file:
We build and install this locally into the current directory with
and that’s it, we’re done.
One of the benefits of pybind11 is just how little boilerplate is required to get started, though admittedly there is not much complicated going on in this example.
Vanilla extensions: C with some really strange types
Because CPython is implemented in C, it is in theory straightforward to add new Python modules “just” by writing C and gluing it to Python. The Python documentation refers to these as extension modules. In practice, however, writing extension modules directly in this fashion can be quite a bit more verbose than using pybind11.
In particular, writing the C extension module requires manually specifying how your function should be called in terms of the CPython runtime types. In our example here, this is mostly mechanical, but it can get cumbersome in practice. The upside of this verbosity is that extension modules written directly in this fashion have full control over how data ownership and reference counting work between the C/C++ code and the Python code, which becomes necessary for more complex applications, especially when performance is a factor. The file extension_example.cxx is:
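As an aside (this is not the C++ source itself): the reference counting that a hand-written extension has to manage explicitly, via Py_INCREF and Py_DECREF on the C side, can be observed from Python with sys.getrefcount. The sketch below assumes CPython, and the variable names are purely illustrative.

```python
import sys

# A fresh list, so nothing else in the program holds a reference to it.
words = ["Hello", "World"]

# The absolute number is an implementation detail (the call itself typically
# adds a temporary reference); only the differences matter here.
baseline = sys.getrefcount(words)

alias = words  # binding a second name adds one reference
with_alias = sys.getrefcount(words)

del alias  # dropping the name releases that reference again
after_delete = sys.getrefcount(words)

if __name__ == "__main__":
    print(baseline, with_alias, after_delete)
```

In Python this bookkeeping is automatic; in a vanilla extension module, getting it wrong tends to show up as leaks or crashes.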
As before, we will use setuptools to install the extension module locally by defining an appropriate setup.py:
and running
The result can be used as in the pybind11 case:
Cython: “[We have] outlived most other […] static compilers for the Python language”
I am sure I have used Cython in the past, but it is a bit of an older methodology in my mind and not something I’ve ever set up and used on purpose.
We once again begin by installing Cython into the current conda environment and writing our C++ function (“cython_example.hpp”):
New to Cython, we write a “wrapper.pxd” file containing our C++ code interface and a “cython_example.pyx” file containing our Python wrapper (surprisingly, each with working syntax highlighting!), mixing in some custom Cython syntax to define our Python wrapper for our C++ function:
Finally, we define our setuptools setup.py, install, and use our function:
SWIG: the Simplified Wrapper and Interface Generator
Not specific to Python, SWIG is an interesting tool for generating wrappers for all sorts of languages (e.g., Go and OCaml). It is pretty old and involves a few more pieces working together than the other methods I’ve played with here, but it is still very much in use.
To begin, we will install the swig and pipx packages into our conda environment so that we can build our SWIG wrappers completely from Python. Obviously this is not the preferred method across other languages, but it is pretty convenient for Python:
We separate our C++ code into the header file and implementation file, with nothing out of the ordinary here:
New to SWIG, now, is the interface file, swig_example.i, which defines the code to be wrapped. We do nothing more here than include the header with the declaration of our greet function and redeclare greet as part of the interface.
Finally, to build the Python module using our SWIG wrapper, we again define a setup.py:
Whereas previously we just used pip to install the module to the local directory, with SWIG I found it necessary to explicitly build the extension with --inplace (as suggested by the SWIG docs), which compiles the dynamic library directly into the local directory so that the Python module can load it at runtime without having to worry about paths. Otherwise, the basics are the same.
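To make the point about paths a little more concrete, here is a small sketch that lists the file names CPython would accept for a compiled module and checks whether one of them already sits in a given directory. It assumes the SWIG module is named swig_example, in which case SWIG's default Python output pairs a swig_example.py wrapper with a compiled _swig_example library; both helper functions are hypothetical names of my own.

```python
import importlib.machinery
from pathlib import Path


def candidate_filenames(module_name: str) -> list[str]:
    """Return the file names CPython would accept for a compiled module."""
    return [module_name + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


def find_built_extension(module_name: str, directory: Path) -> list[Path]:
    """List compiled files for module_name that already exist in directory."""
    return [directory / name
            for name in candidate_filenames(module_name)
            if (directory / name).exists()]


if __name__ == "__main__":
    # Assumption: the compiled half of the SWIG module is _swig_example.
    print(candidate_filenames("_swig_example"))
    print(find_built_extension("_swig_example", Path.cwd()))
```

After an in-place build, the second call should report a file next to the generated wrapper, which is what lets a plain import work from that directory.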
Finally, we have our greeter:
cppyy: a challenger approaches!
A relatively new method for Python<->C++ interop is cppyy (pronounced “cppyy”), by which I mean I had never heard of it before starting to write this. Based on the Cling C++ interpreter, cppyy differs from some of the other options here in that it can compile your C++ code at runtime, which can lead to additional opportunities for performance.
We begin by installing cppyy with
and then write a minimal cppyy_example.hpp, which knows nothing about Python:
Compiling and using this code in Python is as easy as:
Parting Thoughts
That’s a lot of syntax for one day, so let’s stop here for now and follow up in another post.
My initial impressions, looking at all of this together, are that pybind11 is really nice, cppyy seems interesting for toy use cases, and everything else gets a bit more involved. Not to mention, this code barely does anything! Things get much more complicated as we venture into wrapping objects, memory management / reference counting, etc., and I imagine that is where some of the more complicated options here start to become necessary rather than just verbose.
Next time, I aim to look at CFFI, PyO3, gRPC, and Mojo 🔥.