Python etc
Regular tips about Python and programming in general

Owner — @pushtaev

© CC BY-SA 4.0 — mention if repost
This post is provided by @PavelDurmanov:

As you may know, generators in Python are executed step by step. Their state is preserved between steps, so there should be a way to "see" it.

All of a generator's local variables are stored in its frame, and we can access that frame through the gi_frame attribute:

def gen():
    x = 5
    yield x
    yield x
    yield x

g = gen()
next(g)  # 5
g.gi_frame.f_locals  # {'x': 5}


So if we can see it, we should be able to modify it, right?

g.gi_frame.f_locals["x"] = 10
next(g) # still gives us 5


The dict you get from f_locals is a new object, built from the actual frame variables on access. It's a snapshot: modifying it doesn't affect the real variables in the frame.

But there's a way to bypass that with the C API:

import ctypes

# after we've changed the frame locals, we need to "flush" them back,
# telling the interpreter to update the underlying frame
# from the current contents of the f_locals dict
ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(g.gi_frame), ctypes.c_int(0))


So now we can verify that the generator's locals have actually changed:

next(g)  # 10


You might wonder what ctypes.c_int(0) is. The second argument is a flag with 2 "modes", 0 and 1. With 0, the update only adds and/or updates the frame local vars present in the dict. So if we removed x from the locals dict and called the update with c_int(0), it would do nothing, as this mode cannot delete vars.

If you want to actually delete a variable from the frame, call the update with c_int(1). That replaces the underlying frame locals with exactly what's in the .f_locals dict, clearing any variable that's missing from it.
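
For illustration, a minimal sketch of the deletion mode, continuing the example above. It leans on the same CPython internals as the rest of this post (the frame caches its f_locals dict, which holds for CPython before 3.13):

# grab the cached snapshot dict and remove x from it
locs = g.gi_frame.f_locals
del locs["x"]
# mode 1: clear everything missing from the dict
ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(g.gi_frame), ctypes.c_int(1))
next(g)  # UnboundLocalError: x was cleared from the frame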

And as you may know, coroutines in Python are implemented on top of generators, so the same logic applies to them as well; just use cr_frame instead of gi_frame.
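
For example, a minimal sketch that pokes at a coroutine object directly, without an event loop (it assumes awaiting asyncio.sleep(0) suspends the coroutine exactly once, which is true in CPython):

import asyncio

async def coro():
    x = 5
    await asyncio.sleep(0)

c = coro()
c.send(None)          # advance to the first suspension point
c.cr_frame.f_locals   # {'x': 5}
c.close()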
The os.curdir is a trap!

import os
os.curdir
# '.'

It's a constant indicating how the current directory is denoted in the current OS. And for all OSes that CPython supports (Windows and POSIX), it's always a dot. It might be different, though, if you run your code with MicroPython on some niche OS.

Anyway, to actually get the path to the current directory, you need os.getcwd:

os.getcwd()
# '/home/gram'

Or use pathlib:

from pathlib import Path
Path().absolute()
# PosixPath('/home/gram')
Python 3.11 is released! The most interesting features:

+ Fine-grained error location in tracebacks.
+ ExceptionGroup and the new except* syntax to handle it.
+ A new module to parse TOML.
+ Atomic grouping and possessive quantifiers for regexes.
+ Significant performance improvements.
+ New Self type.
+ Variadic generics.
+ Data class transforms.

That's a lot of smart words! Don't worry, we'll tell you about each of these features in detail in the upcoming posts. Stay tuned!
PEP 657 (landed in Python 3.11) enhanced tracebacks so that they now include the precise location where the error occurred:

Traceback (most recent call last):
  File "query.py", line 24, in add_counts
    return 25 + query_user(user1) + query_user(user2)
                ^^^^^^^^^^^^^^^^^
  File "query.py", line 32, in query_user
    return 1 + query_count(db, response['a']['b']['c']['user'], retry=True)
               ~~~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable

It shows not only where the error occurred for each frame, but also which code was executed. Beautiful!
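
If you want to see it yourself, run something like this (a made-up snippet) under Python 3.11:

data = {'user': None}
print(data['user']['name'])
# TypeError: 'NoneType' object is not subscriptable,
# with ~~~~ under data['user'] and ^^^^ under the failing ['name'] subscript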
PEP 678 (landed in Python 3.11) introduced a new method, add_note, on the BaseException class. You can call it on any exception to provide additional context, which will be shown at the end of the traceback for that exception:

try:
    1/0
except Exception as e:
    e.add_note('oh no!')
    raise
# Traceback (most recent call last):
#   File "<stdin>", line 2, in <module>
# ZeroDivisionError: division by zero
# oh no!

The PEP gives a good example of how it can be useful: the hypothesis library uses notes to include in the traceback the arguments that caused the tested code to fail.
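
For instance, here is a sketch of how a test runner might do something similar (check and the note text are made up for illustration):

def check(func, args):
    try:
        func(*args)
    except Exception as e:
        # attach the failing inputs so they show up in the traceback
        e.add_note(f'falsifying arguments: {args!r}')
        raise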
PEP 654 (landed in Python 3.11) introduced ExceptionGroup. It's an exception that nicely wraps and shows multiple exceptions:

try:
    1/0
except Exception as e:
    raise ExceptionGroup('wow!', [e, ValueError('oh no')])

# Traceback (most recent call last):
#   File "<stdin>", line 2, in <module>
# ZeroDivisionError: division by zero

# During handling of the above exception, another exception occurred:

#   + Exception Group Traceback (most recent call last):
#   |   File "<stdin>", line 4, in <module>
#   | ExceptionGroup: wow! (2 sub-exceptions)
#   +-+---------------- 1 ----------------
#     | Traceback (most recent call last):
#     |   File "<stdin>", line 2, in <module>
#     | ZeroDivisionError: division by zero
#     +---------------- 2 ----------------
#     | ValueError: oh no
#     +------------------------------------

It's very helpful when multiple unrelated exceptions have occurred and you want to report all of them: for example, when retrying an operation or when calling multiple callbacks.
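
A sketch of the callback case (run_callbacks is a made-up helper):

def run_callbacks(callbacks):
    errors = []
    for callback in callbacks:
        try:
            callback()
        except Exception as e:
            errors.append(e)
    if errors:
        # report every failure at once instead of only the first one
        raise ExceptionGroup('callback failures', errors)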
PEP 654 introduced not only ExceptionGroup itself but also a new syntax to handle it. Let's start right with an example:

try:
    raise ExceptionGroup('', [
        ValueError(),
        KeyError('hello'),
        KeyError('world'),
        OSError(),
    ])
except* KeyError as e:
    print('caught1:', repr(e))
except* ValueError as e:
    print('caught2:', repr(e))
except* KeyError as e:
    1/0

The output:

caught1: ExceptionGroup('', [KeyError('hello'), KeyError('world')])
caught2: ExceptionGroup('', [ValueError()])
  + Exception Group Traceback (most recent call last):
  |   File "<stdin>", line 2, in <module>
  | ExceptionGroup:  (1 sub-exception)
  +-+---------------- 1 ----------------
    | OSError
    +------------------------------------

This is what happened:

1. When an ExceptionGroup is raised, it's checked against each except* block.

2. The except* KeyError block catches an ExceptionGroup that contains a KeyError.

3. A matched except* block receives not the whole ExceptionGroup but a copy of it containing only the matched sub-exceptions. In the case of except* KeyError, it includes both KeyError('hello') and KeyError('world').

4. Each sub-exception is handled by at most one block, the first one that matches (so 1/0 in the example wasn't reached).

5. Sub-exceptions that haven't matched yet are tried against the remaining except* blocks.

6. If some sub-exceptions are still unmatched after all of that, an ExceptionGroup with them is raised. So, ExceptionGroup('', [OSError()]) was raised (and beautifully formatted).
There is one more thing you should know about except*. It can match not only sub-exceptions of an ExceptionGroup but regular exceptions too. For simplicity of handling, a regular exception will be wrapped into an ExceptionGroup:

try:
    raise KeyError
except* KeyError as e:
    print('caught:', repr(e))
# caught: ExceptionGroup('', (KeyError(),))
I often find myself writing a context manager to temporarily change the current working directory:

import os
from contextlib import contextmanager

@contextmanager
def enter_dir(path):
    old_path = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_path)


Since Python 3.11, a context manager with the same behavior is available as contextlib.chdir:

import os
from contextlib import chdir

print('before:', os.getcwd())
# before: /home/gram
with chdir('/'):
    print('inside:', os.getcwd())
    # inside: /
print('after:', os.getcwd())
# after: /home/gram
The typing.assert_type function (added in Python 3.11) does nothing at runtime, like most of the typing module. However, if the type of the first argument doesn't match the type provided as the second argument, the type checker will report an error. It can be useful for writing simple "tests" for your library to ensure it is well annotated.

For example, say you have a library that defines a lot of decorators, like this:

from typing import Callable, TypeVar

C = TypeVar('C', bound=Callable)

def good_dec(f: C) -> C:
    return f

def bad_dec(f) -> Callable:
    return f


We want to be 100% sure that all decorators preserve the original type of the decorated function. So, let's write a test for it:

from typing import Callable, assert_type

@good_dec
def f1(a: int) -> str: ...

@bad_dec
def f2(a: int) -> str: ...

assert_type(f1, Callable[[int], str]) # ok
assert_type(f2, Callable[[int], str]) # not ok
PEP 681 (landed in Python 3.11) introduced the typing.dataclass_transform decorator. It can be used to mark a class that behaves like a dataclass: the type checker will assume it has an __init__ that accepts the annotated attributes as arguments, as well as __eq__ and __ne__. For example, it can be used to annotate SQLAlchemy or Django models, attrs classes, pydantic models, and so on. It's useful not only for libraries that don't provide a mypy plugin, but also if you use a non-mypy type checker. For instance, pyright, which is used by the VS Code Python extension to show types, highlight errors, provide autocompletion, and so on.
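A minimal sketch of how a library might use it (the model decorator is made up; at runtime a real library would actually generate the methods):

from typing import TypeVar, dataclass_transform

T = TypeVar('T')

@dataclass_transform()
def model(cls: type[T]) -> type[T]:
    # a real library would synthesize __init__, __eq__, etc. here
    return cls

@model
class User:
    name: str
    age: int

# the type checker now understands the synthesized signature:
user = User(name='Guido', age=67)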
As we covered 3 years back (gosh, the channel is old), if a method of a base class returns an instance of the current class, a TypeVar should be used as the annotation:

from typing import TypeVar

U = TypeVar('U', bound='BaseUser')

class BaseUser:
    @classmethod
    def new(cls: type[U]) -> U:
        ...

    def copy(self: U) -> U:
        ...

That's quite verbose, but it has to be done this way for the return type to be correct in inherited classes.

PEP 673 (landed in Python 3.11) introduced a new type Self that can be used as a shortcut for exactly such cases:

from typing import Self

class BaseUser:
    @classmethod
    def new(cls) -> Self:
        ...

    def copy(self) -> Self:
        ...
The reveal_type function doesn't exist, so calling it fails at runtime. However, if you call it and then run a type checker (like mypy or pyright) on the file, it will show the type of the passed object:

a = 1
reveal_type(a)
reveal_type(len)

Now, let's run mypy:

$ mypy tmp.py
tmp.py:2: note: Revealed type is "builtins.int"
tmp.py:3: note: Revealed type is "def (typing.Sized) -> builtins.int"

It's quite helpful to see what type mypy inferred for the variable in some tricky cases.

For convenience, the reveal_type function was also added to the typing module in Python 3.11:

from typing import reveal_type
a = 1
reveal_type(a)
# prints: Runtime type is 'int'
reveal_type(len)
# prints: Runtime type is 'builtin_function_or_method'

And for the curious, here is its definition:

def reveal_type(__obj: T) -> T:
    print(
        f"Runtime type is {type(__obj).__name__!r}",
        file=sys.stderr,
    )
    return __obj
PEP 675 (landed in Python 3.11) introduced a new type, typing.LiteralString. It matches any Literal type, which is the type of explicit literals and constants in the code. The PEP shows a very good example of how it can be used to implement a SQL driver with type-checker-level protection against SQL injections:

from typing import LiteralString, Final

def run_query(sql: LiteralString): ...

run_query('SELECT * FROM students') # ok

ALL_STUDENTS: Final = 'SELECT * FROM students'
run_query(ALL_STUDENTS) # ok

arbitrary_query = input()
run_query(arbitrary_query) # type error, don't do that
The isinstance function checks whether an object is an instance of a class or of a subclass thereof:

class A: pass
class B(A): pass
b = B()
isinstance(b, B) # True
isinstance(b, A) # True
isinstance(b, object) # True
isinstance(b, str) # False
isinstance(str, type) # True


Type-checkers understand isinstance checks and use them to refine the type:

a: object
reveal_type(a)
# ^ Revealed type is "builtins.object"
if isinstance(a, str):
    reveal_type(a)
    # ^ Revealed type is "builtins.str"


One more cool thing about isinstance is that you can pass it a tuple of types to check if the object is an instance of any of them:

isinstance(1, (str, int)) # True
PEP 427 introduced (and PEP 491 improved) a new format for Python distributions called "wheel".

Before the PEP, Python distributions were just tar.gz archives containing the source code of the distributed library, some additional files (README.rst, LICENSE, sometimes tests), and a setup.py file. To install a library from such a distribution, pip had to download the archive, extract it into a temporary directory, and execute python setup.py install.

Did it work? Well, kind of. It worked well enough for pure Python packages, but if a package contained C code, it had to be built on the target machine every time the package was installed, because the built binary highly depends on the target OS, architecture, and Python version.

The new wheel format significantly speeds up the process. It changed 2 significant things:

1. The file name for wheel packages is standardized. It contains the name and version of the package, the minimal supported Python version (2.7, 3.8), the Python implementation (CPython, PyPy), the OS name, the architecture, and the ABI version. For example, flask-1.0.2-py2.py3-none-any.whl says "this is the flask package, version 1.0.2, for both Python 2 and 3, any ABI, and any OS". That means Flask is a pure Python package, so it can be installed anywhere. And psycopg2-2.8.6-cp310-cp310-linux_x86_64.whl says "this is psycopg2, version 2.8.6, for CPython 3.10 on 64-bit Linux". That means psycopg2 ships prebuilt C libraries for a very specific environment. A package can have multiple wheel distributions per version, and pip will pick and download the one that fits your environment (see the sketch after this list).

2. Instead of setup.py, the archive (which is now zip instead of tar.gz) contains already-parsed metadata. So, to install the package, it's enough to extract it into the site-packages directory; no need to execute anything.
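
To see how the filename fields decompose, here is a sketch using the third-party packaging library (assuming it is installed):

from packaging.utils import parse_wheel_filename

name, version, build, tags = parse_wheel_filename('flask-1.0.2-py2.py3-none-any.whl')
print(name, version)  # flask 1.0.2
print(sorted(str(tag) for tag in tags))
# ['py2-none-any', 'py3-none-any']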

Currently, the wheel distribution format is well-adopted and available for almost all modern packages.

When you create a new virtual environment, make sure you have the latest version of setuptools for tarballs, and the latest version of the wheel package for wheels. No, really, do it. The wheel package is not installed by default in the new venvs, and without it, installation of some packages will be slow and painful.

python3 -m venv .venv
.venv/bin/pip install -U pip setuptools wheel
PEP-518 introduced changes not to Python itself but rather to its ecosystem. The idea is pretty simple: let's store configs for all tools in the pyproject.toml file, each in its own tool.TOOL_NAME section. For example, for mypy:

[tool.mypy]
files = ["my_project"]
python_version = "3.8"

At this moment, almost all popular tools support pyproject.toml as a configuration file in one way or another: mypy, pytest, coverage, isort, bandit, tox, etc. The only exception I know of is flake8.

Before pyproject.toml, many tools used setup.cfg for the same purpose, but that format (INI) has a few disadvantages compared to TOML: it's not well standardized, and the only supported value type is string.
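
For illustration, the same hypothetical mypy config in both formats; in INI everything is a string the tool must parse itself, while TOML values carry types:

# setup.cfg (INI): every value is a string
[mypy]
python_version = 3.8
files = my_project, tests

# pyproject.toml: values are typed
[tool.mypy]
python_version = "3.8"
files = ["my_project", "tests"]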
PEP-517 and PEP-518 introduced the build-system section in pyproject.toml that tells package management tools (like pip) how to build wheel distributions for the project. For example, this is the section if you use flit:

[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"


It tells pip to install flit_core of the given version and then call hooks inside flit_core.buildapi, which build the distribution for the project.

Having this section allows pip to build and install any Python project from source, no matter what build system it uses. Before the PEP, tools like poetry and flit had to generate a special setup.py file so that pip could install the project from source (or from a non-wheel tarball distribution).
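
Those hooks are standardized by PEP 517: a build backend is simply a module that provides them. A minimal sketch of the interface (bodies omitted):

# a toy build backend; the module named in build-backend must expose these
def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # build a .whl into wheel_directory and return its file name
    ...

def build_sdist(sdist_directory, config_settings=None):
    # build a source distribution into sdist_directory and return its file name
    ...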
To recap: PEP-518 introduced pyproject.toml, and many Python tools started to use it to store their configs. The issue, however, was that there was no module in the stdlib to parse TOML. So, different tools started to use different third-party packages for the task:

+ tomli (used by mypy) is a pure Python library that can only read TOML.
+ toml (used by most of the tools) can both read and write TOML.
+ tomlkit (used by poetry) can read, write, and modify TOML (preserving the original formatting and comments).

PEP 680 (landed in Python 3.11) brought tomli into the stdlib. But why tomli and not another library? It's pure Python and minimalistic. It cannot write TOML files, but reading is enough for most tools to work with pyproject.toml. And to avoid unpleasant conflicts when tomli is installed in the same environment, the module was renamed to tomllib.
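Usage is straightforward; note that tomllib.load expects a binary file object:

import tomllib

with open('pyproject.toml', 'rb') as stream:
    config = tomllib.load(stream)
print(config['tool']['mypy'])
# e.g. {'files': ['my_project'], 'python_version': '3.8'}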
The float type is infamous for being less precise than you might expect. When you add 2 numbers, the result may contain a small precision error. And the more numbers you add together, the bigger the error:

sum([.9] * 1_000)
# 899.9999999999849

sum([.9] * 1_000_000)
# 900000.0000153045


If you want to minimize the error when summing together a list of floats, use math.fsum:

import math

math.fsum([.9] * 1_000_000)
# 900000.0