GIL removal and the Faster CPython project

This article brought to you by LWN subscribers

Subscribers to LWN.net made this article — and everything that
surrounds it — possible. If you appreciate our content, please
buy a subscription and make the next
set of articles possible.

By Jake Edge

August 2, 2023

The Python global interpreter lock (GIL) has long been a barrier to
increasing the performance of programs by using multiple threads—the GIL
serializes access to the interpreter’s virtual machine such that only one thread
can be executing Python code at any given time. There are other mechanisms
to provide
concurrency for the language, but the specter of the GIL (and its reality as
well) has often been cited as a major negative for Python. Back in October
2021, Sam Gross introduced
a proof-of-concept, no-GIL version of the
language. It was met with a lot of excitement at the time, but
seemed to languish to a certain extent for more than a year; now, the Python
Steering
Council has announced its intent to accept the
no-GIL feature. It will still be some time before it lands in a
released Python version—and there is the possibility that it all has to be
rolled back at some point—but there are several companies backing the
effort, which gives it all a good chance to succeed.
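
As a rough illustration of the serialization described above, the following
sketch times the same CPU-bound loop run sequentially and then in several
threads. It is not taken from any of the discussions covered here; the function
names and the workload size are made up for the example. On a conventional
(with-GIL) build one would expect the two timings to be roughly similar,
since only one thread executes Python code at a time, but the actual numbers
depend on the machine and the interpreter.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound loop with no I/O (hypothetical workload)."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, workers):
    """Run the workload `workers` times, one after another; return seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run the same workload in `workers` threads at once; return seconds."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N, WORKERS = 2_000_000, 4  # arbitrary sizes chosen for the example
    print(f"sequential: {run_sequential(N, WORKERS):.2f}s")
    print(f"threaded:   {run_threaded(N, WORKERS):.2f}s")
```

The sketch only measures wall-clock time; it makes no claim about how a
no-GIL interpreter would perform on the same workload.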

After its introduction in 2021, and the discussion around that, the next public
appearance for the feature was at the 2022
Python Language Summit
in April. Gross gave a talk about
his no-GIL fork in the
hopes of getting some tacit agreement on proceeding with the work. That
agreement
was not forthcoming, in part because the full details and implications of
a no-GIL interpreter were not really known.
Meanwhile, the Faster CPython project, which came about in mid-2021, had been working along
on its plan to increase the single-threaded performance of the
interpreter.
Mark Shannon reported
on the status of that effort
at the 2022 summit as well. He also
authored PEP 659
(“Specialized Adaptive Interpreter”) that describes the kinds of changes
being made, some of which have found their way into Python 3.10
and 3.11.

At this year’s PyCon, two of the Faster CPython team gave talks describing
the techniques they have been using to improve the performance of the
interpreter: Brandt Bucher looked at adaptive
instructions, while Shannon described memory layout improvements and other
optimizations.
Given the GIL, nearly all existing Python
programs are single threaded, so improving the performance of
those programs will effectively speed up the entire Python world. One of
the concerns that has been heard about no-GIL Python is what its impact on
single-threaded programs would be.

PEP 703

In January 2023, Łukasz Langa posted
the first version of PEP 703 (“Making the
Global Interpreter Lock Optional in CPython”), which was authored by Gross;
Langa is sponsoring the PEP
as a core developer. As might be guessed, that set off a lengthy thread,
with, once again, a lot of excitement. There were also some concerns expressed
with regard to the implications of not having a GIL, especially for Python
extensions written in C; since the GIL protects that code from
many concurrency problems, removing it might well lead to bugs.
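
The concern is about C extensions, but the same general idea can be sketched
at the Python level: state that is shared between threads is safer when it is
guarded by an explicit lock than when it leans on the interpreter's global
lock. The example below is only an analogy under that assumption; the
`Counter`, `hammer`, and `run` names are hypothetical and do not come from
the PEP or the discussion.

```python
import threading


class Counter:
    """A counter whose increments are guarded by an explicit lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self._value += 1` is a read-modify-write sequence, so it is
        # protected explicitly instead of relying on the interpreter.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def hammer(counter, times):
    """Increment the shared counter `times` times from the calling thread."""
    for _ in range(times):
        counter.increment()


def run(workers=4, times=10_000):
    """Increment one shared counter from several threads; return the total."""
    counter = Counter()
    threads = [
        threading.Thread(target=hammer, args=(counter, times))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run(4, 1000)` yields 4000 regardless of how the
threads are scheduled.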

One
thing that everyone wants to avoid is another “flag day” transition like
that of Python 2 to 3. The huge and unfortunate impact of
Python 3 being incompatible with its predecessor was not foreseen—the
core developers vastly underestimated the growing popularity of the
language, for one thing—but that mistake will not be repeated. Any switch
to remove the GIL will need to smoothly work with code that is not (yet)
ready for it.

There was a question
from Shannon about “what people think is a acceptable slowdown for
single-threaded code”. To a large extent, that question went
unanswered in the thread, but he had estimated an impact “in the 15-20%
range, but it could be more, depending on the impact on PEP 659”.

Another Faster CPython team member, Eric Snow, posted
a lengthy analysis with a bunch of questions, which he summarized as:
“tl;dr I’m really excited by this proposal but have significant
concerns, which I genuinely hope the PEP can address.” He noted that
he was the author of a “competing” concurrency option in PEP 684 (“A
Per-Interpreter GIL”), along with the related PEP 683 (“Immortal
Objects, Using a Fixed Refcount”), though he does not truly see multiple
sub-interpreters, each with their own GIL, as being incompatible with the
no-GIL work. Much of his concern was focused on the impacts on the C
extensions (which is also a problem for PEP 684, though to a lesser
extent), but single-threaded performance was also mentioned. Gross replied
that the impact on the extensions was not completely negative:

There are also substantial benefits to extension module maintainers. The
PEP includes quotes from a number of maintainers of widely used C API
extensions who suffer the complexity of working around the GIL. For
example, Zachary DeVito, PyTorch core developer, wrote “On three separate
occasions in the past couple of months… I spent an order-of-magnitude more
time figuring out how to work around GIL limitations than actually solving
the particular problem.”

Updated PEP

The thread had mostly run its course by the end of January.
In early May, Gross posted
an updated version of PEP 703, along with an implementation based on the
in-progress Python 3.12. There was just one response
early on (which Gross replied to). On May 12, Gross asked the
steering council to decide on the PEP. As it turned out, there was
still a lot more discussion to go before any decision would be made.

On June 2, Shannon posted
a performance assessment
of the PEP with some pretty eye-opening
numbers (that were disputed) on the impact of the changes; his estimates
of the impact ranged from 11 to 30%.
He also noted that removing the GIL had some negative impacts on the
existing and planned Faster CPython work:

The adaptive specializing interpreter relies on the GIL; it is not
thread-friendly.
If NoGIL is accepted, then some redesign of our optimization strategy will
be necessary.
While this is perfectly possible, it does have a cost.
The effort spent on this redesign and resulting implementation is not being
spent on actual optimizations.

Shannon has noted that he is not a fan of the free-threading, shared-memory
concurrency model; his assessment ends with a suggestion that
sub-interpreters provide a better concurrency solution with fewer of the
performance and other concerns that no-GIL brings. Others, including
steering council member Gregory P. Smith,
found that analysis to be
somewhat oversimplified. Langa posted
benchmark numbers
that showed considerably less impact than Shannon’s
estimates. Langa followed
that up
with some additional results that correspond closely with what
Gross had reported in the PEP.

Guido van Rossum, who heads up the Faster CPython team, wanted to
ensure that everyone
learned from the mistakes made in the past:

If there’s one lesson we’ve learned from the Python 2 to 3 transition, it’s
that it would have been very beneficial if Python 2 and 3 code could
coexist in the same Python interpreter. We blew it that time, and it
set us
back by about a decade.

Let’s not blow it this time. If we’re going forward with nogil (and I’m not
saying we are, but I can’t exclude it), let’s make sure there is a way to
be able to import extensions requiring the GIL in a nogil interpreter
without any additional shenanigans – neither the application code nor the
extension module should have to be modified in any way […]

Meanwhile, Smith replied to Gross’s steering-council request (and copied
it to the forum thread):

The steering council is going to take its time on this. A huge thank you
for working to keep it up to date! We’re not ready to simply pronounce on
703 as it has a HUGE blast radius.

[…] That does not mean “no” to this. There is demand for it. (personally,
I’ve wanted this since forever!) It’s just that it won’t be easy and we’ll
need to consider the entire ecosystem and how to smoothly allow such a
change to happen without breaking the world.

I’m glad to see the continued discuss thread with faster-cpython folks in
particular piping up. The intersection between this work and ongoing single
threaded performance improvements will always be high and we don’t want to
hamper that in the near term.

Gross largely
disagreed
with Shannon’s assessment and, in particular, with his
characterization of threading. He was also, seemingly, somewhat
unhappy
with Smith’s reply:

You wrote that the Steering Council’s decision does not mean “no,” but the
steering council has not set a bar for acceptance, stated what evidence is
actually needed, nor said when a final decision will be made. Given the
expressed demand for PEP 703, it makes sense to me
for the steering committee to develop a timeline for identifying the
factors it may need to consider and for determining the steps that would be
required for the change to happen smoothly.

Without these timelines and milestones in place, I would like to explain
that the effect of the Steering Council’s answer is a “no” in practice. I
have been funded to work on this for the past few years with the milestone
of submitting the PEP along with a comprehensive implementation to convince
the Python community. Without specific concerns or a clear bar for
acceptance, I (and my funding organization) will have to treat the current
decision-in-limbo as a “no” and will be unable to pursue the PEP further.

That obviously put pressure on the council, as did the users who were
clamoring for a no-GIL Python, but the decision is clearly not a simple
one. On June 14, more pressure was applied from the Faster CPython
team. Van Rossum described
some of
the costs of no-GIL, but also expressed concern about waiting for a
decision:

We’ve had a group discussion about how our work would be affected by free
threading. Our key conclusion is that merging nogil will set back our
current results by a significant amount of time, and in addition will
reduce our velocity in the future. We don’t see this as a reason to reject
nogil – it’s just a new set of problems we would have to overcome, and we
expect that our ultimate design would be quite different as a result. But
there is a significant cost, and it’s not a one-time cost. We could use
help from someone (Sam?) who has experience thinking about the problems
posed by the new environment.

[…] In the meantime we’re treading water, unsure whether to put our
efforts in continuing with the current plan, or in designing a new,
thread-safe optimization architecture.

Fast, free threading

The next day, Shannon started
a new thread
(titled: “A fast, free threading Python”) that described
three possible options for a way forward. It started with a lengthy
description of the tradeoffs for optimization of a dynamic language like
Python. Of the three aspects that he thinks need to be considered
(single-threaded performance, parallelism, and mutability), the last has
mostly been glossed over in earlier discussions, “but it is key”:

It isn’t quite:

Performance, parallelism, mutability: pick two.

but more like:

Performance, parallelism, mutability: pick one to restrict.

He also cautioned that there are some unknowns:

Performing the optimizations necessary to make Python fast in a
free-threading environment will need some original research. That makes it
more costly and a lot more risky.

The options for the steering council amount to choosing a fast
single-threaded interpreter as currently planned, a no-GIL free-threading
interpreter with an unknown (but non-zero) impact on single-threaded
performance, or both at the same time. His preference is for both,
but he is concerned that the council might choose no-GIL without also
committing to the rest of the work needed:

Please don’t choose option 2 [no-GIL] hoping that we will get option 3
[both], because
“someone will sort out the performance issue”. They won’t, unless the
resources are there.

If we must choose option 1 [current Faster CPython plans] or 2, then I
think it has to be option 1.
It gives us a speedup at much lower cost in CPUs and energy,
by doing things more efficiently rather than just throwing lots of cores at
the problem.

Marc-André Lemburg asked
about a phased approach, where, effectively, GIL or no-GIL were chosen at
the command line; over time, the two could slowly be merged. “Or would
this not be feasible because the ‘slow merge’ would actually require
redesigning the whole specialization approach?” Smith replied
that he thinks that is more or less what PEP 703 is proposing; even
though Shannon basically recommended against it, Smith thinks pursuing
both at once is possible:

I’d more or less expect work on specialization for to proceed in parallel
without worrying if those benefits cannot yet be available in a free
threaded build for a few of releases. Turning it mostly into an additional
code maintenance and test matrix burden on the CPython core dev side to
keep both our still-primary single threaded GIL based interpreter and the
experimental free threaded build working.

I figure this is basically exactly what Mark claims not to
want. Presumably
due to the interim added build and maintenance complexity. But also seems
like the most likely way to get to his “both” option 3 that I suspect we
all magically wish would just happen.

Smith followed that up
by noting that free threading will need to be addressed at some
point; even if the Faster CPython plans work out and Python 3.15 is
five times faster than Python 3.10, nobody will “be satisfied at ‘just
5x’ in the end”. Van Rossum agreed,
but was also concerned that the council “might be betting on hope as a
strategy” by choosing no-GIL and hoping for the best.

Like Mark, I hope that you’re choosing (3) – like Mark says, it’s clearly
the best option. But we will need to be honest about it, and accept that we
need more resources to improve single-threaded performance. (And, as I
believe someone already pointed out, it will also be harder to do future
maintenance on CPython’s C code, since so much of it is now exposed to
potential race conditions. This is a problem for a language that’s for a
large part maintained by volunteers.)

The talk of “more resources” led Itamar Oren to
wonder what that means: “It’s not clear to me to what extent the SC
[steering council] is in a position to tie PEP acceptance or rejection to
allocation of funding.” Van Rossum replied
that Microsoft was committed to continue funding the team and that “our
charter is not limited to single-threaded performance work”, but that there is
extra work to do in a no-GIL world:

Meanwhile, we can start adapting the specialization and optimization work
to a no-GIL world, with the goal of obtaining Mark’s Option 3 (free
threading and faster per-thread performance). Ideally we would reach a
state where we can make no-GIL the one and only build mode without a drop
in single-threaded performance (important for apps that haven’t been
re-architected, e.g. apps that currently use multi-processing, or
algorithms that are hard to parallelize).

It is this latter step (getting to Option 3) that requires extra resources
– for example, it would be great if Meta or another tech company could
spare some engineers with established CPython internals experience to help
the core dev team with this work.

Finally, I want to re-emphasize that while Microsoft has a team using the
Faster CPython moniker, we don’t intend to own CPython performance – we
believe in good citizenship and want to contribute in a way that puts our
skills and experience to the best possible use for the Python community.

Van Rossum did not just choose Meta out of a hat, here; Gross works for the
company, which presumably funded his no-GIL work, and the Cinder CPython fork
is maintained by a team at Meta. Carl Meyer said
that he expected the Cinder team to work on no-GIL Python. In fact, on
July 7, Meyer announced
that Meta would fund work on the no-GIL interpreter:

If PEP 703 is accepted, Meta can commit to support in the form of three
engineer-years (from engineers experienced working in CPython internals)
between the acceptance of PEP 703 and the end of 2025, to collaborate with
the core dev team on landing the PEP 703 implementation smoothly in CPython
and on ongoing improvements to the compatibility and performance of nogil
CPython.

On July 19, Anaconda followed
suit. Stan Seibert said that the company would fund work on “the
packaging challenges that will be associated with adopting PEP 703,
including any work on pip, cibuildwheel, and conda-forge that will be
needed to get nogil-compatible packages into the hands of the Python
community”. Some of that funding commitment likely helped the council
reach a verdict, but the results of a core-developer
poll on no-GIL
also pushed the council in the direction of accepting the
PEP. That poll showed 87% of 46 voters thought that free-threaded Python
should be actively pursued and 63% of 38 voters said that they were willing
to help support and maintain a no-GIL Python based on PEP 703.

Steering council decision

On July 28, council member Thomas Wouters announced
that the council would be accepting PEP 703, though it was “still
working on the acceptance details”. The idea would be to introduce the
no-GIL version of the interpreter in order to give everyone a chance to
figure out what pieces are missing, so that they can be filled in before
no-GIL becomes the default and, eventually, the only, version of Python.
The time frame for that transition is estimated to be around five years,
but there will be no repeat of earlier mistakes:

We do not want another Python 3 situation, so any changes in third-party
code needed to accommodate no-GIL builds should just work in with-GIL
builds (although backward compatibility with older Python versions will
still need to be addressed). This is not Python 4. We are still considering
the requirements we want to place on ABI compatibility and other details
for the two builds and the effect on backward compatibility.

As was noted in the various discussions, there is more to removing the GIL
than simply adopting a PEP. Wouters made it clear that the core developers
will need to gain experience with no-GIL Python so that they can lead the
rest of the community:

We will probably need to figure out new C APIs and Python APIs as we sort
out thread safety in existing code. We also need to bring along the rest of
the Python community as we gain those insights and make sure the changes we
want to make, and the changes we want them to make, are palatable.

If the Python community finds that the switch is “just
going to be too disruptive for too little gain”, the council wants to
be able to change its mind anytime before declaring no-GIL as the default
mode for the language. Wouters outlined the steps that the council sees,
starting with a short-term (perhaps for Python 3.13, which is due in
October 2024) experimental no-GIL build of the interpreter that core
developers and others can try out. In the medium term, no-GIL would be a
supported option, but not the default; when that happens depends a lot on
how quickly the community adopts and supports the no-GIL build. In the
long term,
no-GIL would be the default build and the GIL would be completely excised
(“without unnecessarily breaking backward compatibility”). Along
the way, periodic reviews will be needed:

Throughout the process we (the core devs, not just the SC) will need to
re-evaluate the progress and the suggested timelines. We don’t want this to
turn into another ten year backward compatibility struggle, and we want to
be able to call off PEP 703 and find another solution if it looks to become
problematic, and so we need to regularly check that the continued work is
worth it.

As might be guessed, that spawned multiple congratulatory and
excited-for-the-future responses, though there are a few who think that
keeping the GIL would be a better choice for the language. The
announcement presumably
also sent the Faster CPython folks back to their drawing boards; though
there were some accusations of turf wars in the discussions, that did not
really seem to be the case. The Faster CPython team simply wanted to
ensure that all of the costs were taken into consideration; overall, the
team seems quite excited to work on surmounting the challenges of producing
a no-GIL
interpreter, with minimal (or, ideally, no) performance impact on
single-threaded
code.

It is quite a turning point in the history of the language, but the work is
(obviously) not done yet. There is a huge amount of researching, coding,
testing, experimenting,
documenting, and so on between here and a no-GIL-only version of the language
in, say, Python 3.17 in October 2028. One guesses that the work
will not be done, then, either—there will be more optimizations to be
found and applied if there is still funding available to do so.
Meanwhile, we have yet to dig into the details of the PEP itself; that will
come soon. We will be keeping an eye on the no-GIL development process as
it plays
out over the coming years as well.





