Learning: Python Study Resources & Notes




Finally started learning Python. Here are some recommended resources, and I’m putting my notes here too.

  • 卡瓦邦噶! - 如何学Python?
    A very detailed collection of Python learning resources. From zero-to-one, intermediate, deep dives, interviews: it has pretty much everything.

    The author also has another page: 卡瓦邦噶! - 珍藏资料, with lots of good stuff as well.

  • Piglei - Python 工匠
    Python gives you a lot of freedom, but the tradeoff is that the gap between the floor and the ceiling can be huge.
    This book discusses how to turn code that runs into great code, and how to write more Pythonic code by taking advantage of Python language features.
    It also covers knowledge that applies beyond Python: naming, how to unify code style in large projects, etc.
    It’s easy to read. The GitHub version is free; if you like it, you can buy the book version, which is more detailed than the web version.

  • Python Documentation
    Sometimes you read third-hand materials and still can’t understand them. In those moments, checking the official docs might be what makes it click. If you can, read more docs.
    Both 卡瓦邦噶! and Python 工匠 recommend reading the official itertools docs (Python Documentation - itertools), to learn the wheels in the standard library and avoid reinventing them.

  • CS61A
    The famous CS61A. I’ve heard about it forever, but haven’t watched/read it yet.

  • 捕蛇者说
    A podcast. The snake in the name is of course Python.
    Listening casually once in a while can lead you to lots of new resources.

    One of the hosts is laixintao, the author of 卡瓦邦噶! mentioned above.


Notes

These are notes I took while learning Python: excerpts or paraphrases from what I read, plus a few points I personally find valuable. It helps reinforce memory and makes it easier to look things up later.

== None or is None

Reading Python 工匠: 与 None 值的比较
On 2024.07.14

When checking for None, use is None

In Python, there are two ways to compare values: == and is.
The author’s explanation:

  • ==: checks whether the values are equal
  • is: checks whether they refer to the same object in memory, i.e. whether id(x) equals id(y)

None is a singleton object in Python. If you want to check whether a variable is None, remember to use is instead of ==, because only is strictly means “this object is None”.

== actually calls the __eq__ magic method, which a class can override to change the result of the comparison.
So when checking whether a variable is None, you should use is rather than ==.
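
To convince myself, here is a minimal sketch. AlwaysEqual is a made-up class for illustration (it is not from the book): its __eq__ claims to be equal to anything, so == None gives a misleading answer while is None does not.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# == goes through AlwaysEqual.__eq__, so the answer is misleading
equal_to_none = (obj == None)  # noqa: E711

# "is" compares object identity and cannot be overridden
is_none = (obj is None)

print(equal_to_none, is_none)
```

The object is clearly not None, yet only the is check says so.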

A case where you need to override __eq__

In the book version of Python 工匠, section 12.1.2 比较运算符重载, the author gives an example where you should override __eq__.
For example, a class Square representing a square, taking side length as input and computing area:

class Square:
    """A square

    :param length: side length
    """

    def __init__(self, length):
        self.length = length

    def area(self):
        return self.length ** 2

Obviously, two squares with the same side length can be considered “the same square”.
But by default, == falls back to comparing object identity, so Python treats them as different objects and returns False:

>>> x = Square(4)
>>> y = Square(4)
>>> x == y
False

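So you need to override __eq__. For full comparison support you would also define the other five comparison operators: __ne__, __lt__, __le__, __gt__, __ge__ (in Python 3, __ne__ defaults to the inverse of __eq__).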

Using the built-in @total_ordering decorator from the functools module reduces the work.
You only need to override __eq__ plus one of __lt__, __le__, __gt__, __ge__.

from functools import total_ordering


@total_ordering
class Square:
    """A square

    :param length: side length
    """

    def __init__(self, length):
        self.length = length

    def area(self):
        return self.length ** 2

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.length == other.length
        return False

    def __lt__(self, other):
        if isinstance(other, self.__class__):
            return self.length < other.length
        return NotImplemented

With this, the Square class can be compared normally.
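
A quick usage sketch, assuming a trimmed copy of the class above (area() is left out so the snippet runs on its own). The variable names small, same, and big are mine, just for illustration.

```python
from functools import total_ordering


@total_ordering
class Square:
    """Trimmed copy of the Square class, enough to show comparisons."""

    def __init__(self, length):
        self.length = length

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.length == other.length
        return False

    def __lt__(self, other):
        if isinstance(other, self.__class__):
            return self.length < other.length
        return NotImplemented


small, same, big = Square(4), Square(4), Square(5)

print(small == same)   # value equality via the hand-written __eq__
print(small is same)   # still two distinct objects in memory
print(small < big)     # the hand-written __lt__
print(big >= small)    # __ge__ is filled in by total_ordering
```

Only __eq__ and __lt__ were written by hand; the remaining ordering methods come from the decorator.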

Iterable objects (iterable) and iterators (iterator)

Reading Python 工匠: 区分迭代器与可迭代对象
On 2024.07.14

What is an iterable

The official docs define an iterable here: official definition

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements sequence semantics.

So an iterable is an object type that can return one member at a time.
Each iteration yields one member, until the iterable is exhausted.

Obviously list, str, and tuple satisfy those conditions.
The docs also list dict and file objects as iterables, which seems fine (I don’t know the exact details).
And you can make any class iterable by defining an __iter__() method.

I originally thought a set should not be iterable, because it’s unordered.
But when I happened to read this: Python Documentation: Howto - Functional, I learned that sets are iterable too.
I looked it up: a set doesn’t guarantee the same order across iterations. The order is determined by hash values, and can change when the set changes.

Different from an iterator

Now that we know what an iterable is, let’s try iterating with next().

>>> # Define a simple list
>>> l = [4, 3, 2, 1]
>>> # Take a look at the list
>>> l
[4, 3, 2, 1]
>>> # OK, looks fine
>>> # Try to iterate it
>>> next(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
>>> # Error: a list is not an iterator

Why doesn’t it work?

Back to the docs: the official definition of an iterable also has a second paragraph:

Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.

  • Iterables can be used in loops and other places that need sequences.
  • If you pass an iterable to iter(), it returns an iterator for that iterable.
  • Iterators are suitable for one-time use.
  • When using iterables, you usually don’t need to call iter() or deal with iterators manually.
  • For example, in a for loop, Python automatically manages the iterator for you.

Therefore, the common loop for i in range(4): is not iterating range(4), but iterating the iterator returned by iter(range(4)).
Python’s for loop just automates this process.
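
Roughly, what the for loop automates can be sketched by hand like this. manual_for is a name I made up for illustration; the real loop is implemented inside the interpreter, so treat this as a mental model rather than the actual implementation.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    iterator = iter(iterable)      # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: pull the next value
        except StopIteration:      # step 3: the iterator is exhausted
            break
        collected.append(item)     # this line plays the role of the loop body
    return collected


print(manual_for(range(4)))
print([i for i in range(4)])
```

Both lines print the same list, which matches the description in the docs.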

Let’s verify in the REPL:

>>> # Continuing with the same list
>>> l
[4, 3, 2, 1]
>>> # Call iter() on the list object
>>> iter(l)
<list_iterator object at 0x7faffaac46d0>
>>> # This creates a list_iterator object
>>> # Call next() on the iterator
>>> next(iter(l))
4
>>> # Got the first value of the list

In theory, if you keep calling next() on an iterator, you get the values from the list in order. Let's try:

>>> l
[4, 3, 2, 1]
>>> iter(l)
<list_iterator object at 0x7faaf1ac46d0>
>>> next(iter(l))
4
>>> next(iter(l))
4
>>> next(iter(l))
4
>>> next(iter(l))
4
>>>

You might expect 4, 3, 2, 1, but next(iter(l)) keeps returning the first value.
That’s because iter(l) inside next(iter(l)) creates a new iterator each time.
So every time you call next(iter(l)), you’re calling next() on a fresh iterator, which can only return the first value.

You can see it in the REPL:

>>> iter(l)
<list_iterator object at 0x7faaf1ae4430>
>>> iter(l)
<list_iterator object at 0x7faaf1ac46d0>
>>> iter(l)
<list_iterator object at 0x7faaf1c2d240>

Calling iter(l) three times returns three different list_iterator objects (different addresses after at).

So if you want next() to walk through the whole iterator, you need to store the iterator in a variable and keep calling next() on that same object:

>>> # An iterable
>>> l
[4, 3, 2, 1]
>>> # Calling next() on it raises an error
>>> next(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
>>> # Assign the iterator object to l_iterator
>>> l_iterator = iter(l)
>>> # Call next() on it
>>> next(l_iterator)
4
>>> next(l_iterator)
3
>>> next(l_iterator)
2
>>> next(l_iterator)
1
>>> next(l_iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Now you can get all values from the iterator.

Generators and generator expressions

Iterators seem convenient, but they’re one-time consumables. If creating one required a bunch of setup every time, it would defeat the purpose.
So Python provides two easy ways to create iterators:

  • Generators
    Use yield instead of return in a function. Calling the function then returns a generator (a kind of iterator), and each next() call runs the body up to the next yield and hands back that value.

  • Generator expressions
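    Written like a list comprehension but with parentheses instead of brackets, e.g. (i * i for i in range(4)). The values are produced lazily, one at a time. (Note that a plain for i in range(4): loop is not a generator expression.)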

    Python Documentation: generator expression gives another example:

    >>> sum(i*i for i in range(10)) # sum of squares 0, 1, 4, ... 81
    285
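
Here is a small sketch of both forms. countdown is a made-up generator function for illustration, not something from the docs.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, one value per next() call."""
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)
first = next(gen)    # runs the body up to the first yield
rest = list(gen)     # consumes whatever is left

# Generator expression: lazy, nothing is computed until it is consumed
squares = (n * n for n in range(4))
square_list = list(squares)

print(first, rest, square_list)
```

Like any iterator, gen and squares are exhausted after one pass.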

How to count elements in an iterator

For a normal list, you can do len(list) to get element count.
But an iterator is lazy and generally doesn’t support len(); you can’t know how many elements it has without iterating through it.
And once you iterate through it, the iterator is exhausted.
So counting it doesn’t seem that meaningful.

An iterator is a one-time consumable. Use it and throw it away.
If you need to loop multiple times, use a normal object like a list.
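
If you really do need the count, one common idiom (my own example, not from the book) is to add 1 per element. This sketch also shows the cost: afterwards the iterator is empty.

```python
numbers = iter([4, 3, 2, 1])

# Counting means consuming: walk the iterator and add 1 per element
count = sum(1 for _ in numbers)

# The iterator is now exhausted, so nothing is left to loop over
leftover = list(numbers)

print(count, leftover)
```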

Ref:
Python Documentation: iterable
Python Documentation: iterator
Python Documentation: generator
Python Documentation: generator expression

Functional programming

Reading Python Documentation: Functional Programming HOWTO
On 2024.07.15

TL;DR: the last section recommends not using lambda expressions

Fredrik Lundh once suggested the following set of rules for refactoring uses of lambda:

  1. Write a lambda function.
  2. Write a comment explaining what the heck that lambda does.
  3. Study the comment for a while, and think of a name that captures the essence of the comment.
  4. Convert the lambda to a def statement, using that name.
  5. Remove the comment.

I really like these rules, but you’re free to disagree about whether this lambda-free style is better.
Using lambda expressions may save you two lines of code, but the comprehension cost skyrockets.
From the perspective of readability and comprehension cost, functional programming is not recommended. Just use def and for loops in a plain way.
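
A tiny sketch of those refactoring steps, with made-up data (records and quantity are names I invented for illustration):

```python
records = [("pear", 3), ("apple", 10), ("fig", 7)]

# Step 1: the lambda version. What does r[1] mean? You have to work it out.
by_lambda = sorted(records, key=lambda r: r[1])


# Steps 2-5: name what the lambda does and turn it into a def
def quantity(record):
    """Return the quantity stored in a (name, quantity) record."""
    return record[1]


by_def = sorted(records, key=quantity)

print(by_def)
```

Both calls sort the same way; the def version just tells the reader what the key is.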

The intro part of this article is nice. It says programming languages have roughly four schools of thought for solving problems:
  • Procedural: e.g. C, Pascal, Unix shells
  • Declarative: e.g. SQL
  • Object-oriented: e.g. Smalltalk, Java
  • Functional: e.g. the ML family, Haskell
Python is very flexible: you can do procedural, object-oriented, or functional, depending on the scenario.

This article includes many examples about iterables, iterators, and generators, which also served as a review for the previous section.
I also saw a few functions that might be useful:

  • filter(predicate, iter)
    Yields the elements for which predicate returns true. filter(is_even, range(10)) has the same effect as the generator expression (x for x in range(10) if is_even(x))

  • enumerate(iter, start=0)
    Often used to get index and value at the same time

  • any(iter) and all(iter)
    any() returns True if at least one element is truthy; all() returns True if every element is truthy

  • zip(iterA, iterB, …, strict=False)
    Pairs up the i-th elements of each iterable, similar to transposing rows/columns in Excel. By default, it stops at the shortest iterable.
    If you want to strictly require all iterables to have the same length, set strict=True (Python 3.10+), which raises ValueError on a mismatch.

  • itertools.count(start, step)
    Returns an infinite iterator

  • itertools.repeat(elem, [n])
    Repeat elem n times, or endlessly if n is omitted

  • itertools.islice(iter, [start], stop, [step])
    Slicing. One arg is stop; two args are start and stop; three args are start, stop, step

  • itertools.tee(iter, [n])
    Copy one iterator into n iterators. I looked it up: seems used in data analysis, log analysis, etc.
    If you need to analyze the same dataset multiple times, tee can split it into multiple iterators for separate analyses.

  • itertools.filterfalse(predicate, iter)
    The opposite of filter

  • itertools.dropwhile(predicate, iter)
    Drops elements while predicate is true, then yields every remaining element

  • itertools.combinations(iterable, r)
    Combinatorics

    itertools.combinations([1, 2, 3, 4, 5], 2) =>
    (1, 2), (1, 3), (1, 4), (1, 5),
    (2, 3), (2, 4), (2, 5),
    (3, 4), (3, 5),
    (4, 5)

After reading it, I thought it was pretty interesting.

One more: itertools.product(*iterables, repeat=1) is basically nested loops.
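
To check my understanding of the functions listed above, here is a short tour. is_even comes from the docs' example; is_small and all the sample data are made up for illustration.

```python
import itertools


def is_even(x):
    return x % 2 == 0


def is_small(x):
    """Hypothetical predicate used for the dropwhile example."""
    return x < 3


# filter and its opposite
evens = list(filter(is_even, range(10)))
odds = list(itertools.filterfalse(is_even, range(10)))

# index + value, and pairing up two iterables (stops at the shortest)
indexed = list(enumerate(["a", "b"], start=1))
pairs = list(zip("abc", [1, 2]))

# count() is infinite, so slice off the first three values
first_three = list(itertools.islice(itertools.count(10, 5), 3))
repeated = list(itertools.repeat("x", 3))

# dropwhile only drops the leading run; the trailing 1 survives
after_small = list(itertools.dropwhile(is_small, [1, 2, 5, 1]))

# tee: two independent iterators over the same one-shot source
left, right = itertools.tee(iter([1, 2, 3]), 2)
total, biggest = sum(left), max(right)

# product is the same as a nested loop
product_pairs = list(itertools.product("ab", [1, 2]))
nested_pairs = [(c, n) for c in "ab" for n in [1, 2]]

print(evens, odds, first_three, after_small, total, biggest)
```

One thing to keep in mind with tee: after splitting, the original iterator should not be used directly anymore.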

Tools commonly used in large projects

Reading the book version of Python 工匠, section 13.1 开发大型项目常用工具介绍
On 2024.07.16

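Linters and related tools:

  • flake8: linter that checks code style, customizable
  • black: opinionated code formatter that rewrites code into one style, with very few options
  • isort: sorts import statements
  • pre-commit: runs checks like the ones above as Git hooks before each commit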
  • mypy: static type checking with type annotations

Four common built-in container types

Reading Python 工匠:4. 容器的门道
On 2024.07.11

In Python, the four most common built-in container types are: list, tuple, dict, set.

  • List: ordered, mutable, allows duplicates. Suitable for storing/accessing data in order.
  • Tuple: ordered, immutable, allows duplicates. Suitable for storing immutable data.
  • Dictionary: keeps insertion order (Python 3.7+), mutable, unique keys. Suitable for fast key lookup.
  • Set: unordered, mutable, no duplicates. Suitable for unique elements and set operations (intersection/union).
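
A quick sketch of those properties, using made-up sample values:

```python
# list: keeps order and duplicates, and can grow
items = [3, 1, 3]
items.append(2)

# tuple: cannot be changed once created
point = (1, 2)
tuple_is_immutable = False
try:
    point[0] = 9
except TypeError:
    tuple_is_immutable = True

# dict: unique keys, keeps insertion order (Python 3.7+)
ages = {"bob": 30, "amy": 25}
ages["bob"] = 31          # updates the existing key, no duplicate is added
key_order = list(ages)

# set: duplicates collapse, and set operations are available
unique = set(items)
common = unique & {1, 2, 9}

print(items, tuple_is_immutable, key_order, unique, common)
```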

Python 工匠:写更快的代码

  1. Avoid frequently growing lists / creating new lists

    • Use iterators more: yield, generator expressions
    • Prefer lazy objects provided by modules:
      • Use re.finditer instead of re.findall
      • Use iterable file objects directly: for line in fp instead of for line in fp.readlines()
  2. Use deque when you often operate at the head of a list

    A list is implemented as an array. When you insert at the head (list.insert(0, item)), all later elements must be moved, so the time complexity is O(n). This makes head insertion much slower than appending at the tail (list.append(item) is O(1)).

    If your code needs to do this many times, consider using collections.deque instead. deque is a double-ended queue; whether you append at the head or tail, it’s O(1).

  3. Use sets/dicts to test membership

    When you need to check whether an element exists in a container, sets are better than lists. item in [...] is O(n), while item in {...} is O(1) on average. That’s because dicts and sets are implemented as hash tables.

    Hint: strongly recommend reading TimeComplexity - Python Wiki for time complexity details of common container types.

    If you’re interested in dict implementation details, also strongly recommend Raymond Hettinger’s talk: Modern Dictionaries(YouTube)
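
The three tips can be sketched like this. The sample text and names are my own, and io.StringIO stands in for a real open file so the snippet runs on its own.

```python
import io
import re
from collections import deque

# 1. Lazy objects: finditer yields matches one by one instead of building a list
text = "a1 b22 c333"
digits = [m.group() for m in re.finditer(r"\d+", text)]

# 1. Iterate the file object directly instead of calling readlines()
fp = io.StringIO("first\nsecond\n")   # stand-in for open("some_file")
line_lengths = [len(line.rstrip("\n")) for line in fp]

# 2. deque: adding at either end is O(1)
queue = deque([2, 3])
queue.appendleft(1)
queue.append(4)

# 3. set membership is a hash lookup rather than a scan
allowed = {"get", "post", "put"}
is_allowed = "post" in allowed

print(digits, line_lengths, list(queue), is_allowed)
```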

Edge cases, and the walrus operator

Reading Python 工匠:15. 在边界处思考
Reading Python 工匠:16. 语句、表达式和海象操作符
On 2024.07.16

The book version of Python 工匠 does not include these two chapters.

Thinking at the boundaries

The author thinks it’s best to use language features to handle edge cases in a more elegant way. Concretely in Python:

  • Use try to catch exceptions rather than if else for edge cases
    The Python community prefers EAFP (Easier to Ask for Forgiveness than Permission) over LBYL (Look Before You Leap), because raising exceptions is lightweight in Python.
    The author recommends: Write Cleaner Python: Use Exceptions
  • Handling missing items in containers
    collections.defaultdict handles missing keys in dicts
    Use dict.setdefault to read a key and fill in a default when it’s missing
    Use dict.pop(key, None) to delete a key that may not exist
    List slicing doesn’t raise IndexError, so you don’t need boundary handling
  • When using or short-circuit to set defaults, besides None, values like 0, empty strings, and empty containers are also falsy, so an explicit if x is None check is more accurate for boundary handling
  • Don’t hand-roll user input validation; use pydantic or your framework’s built-in validators
  • Make good use of math operations such as %, abs(), and math.floor() to simplify boundary handling
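
A sketch of several of these points in one place. The helper names (to_int, with_or, with_is_none) and the data are made up for illustration; they are not the book's examples.

```python
from collections import defaultdict


def to_int(value, default=0):
    """EAFP: just try the conversion and handle the failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


# defaultdict: a missing key starts at int() == 0
counts = defaultdict(int)
for word in ["a", "b", "a"]:
    counts[word] += 1

# setdefault: creates the list on first use, then returns it
groups = {}
groups.setdefault("fruit", []).append("apple")

# pop with a default: no KeyError when the key is missing
settings = {"debug": True}
removed = settings.pop("verbose", None)

# An out-of-range slice gives an empty list instead of IndexError
tail = [1, 2, 3][10:]


def with_or(items=None):
    return items or ["default"]     # also replaces an empty list!


def with_is_none(items=None):
    if items is None:               # only replaces a missing value
        items = ["default"]
    return items


print(to_int("42"), to_int("oops"), dict(counts), groups, removed, tail)
print(with_or([]), with_is_none([]))
```

The last line shows the or pitfall: an empty list passed on purpose is silently swapped for the default.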

Statements, expressions, and the walrus operator

The walrus operator (:=, Python 3.8+) assigns a value to a name as part of an expression.
It can be used in conditionals, loops, and list comprehensions, and helps reduce duplicated code.
But it also increases comprehension cost, so you shouldn’t blindly chase conciseness.

The walrus operator feels a bit like functional programming: concise, but higher comprehension cost.
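
A small before/after sketch (requires Python 3.8+; the sample line and names are my own):

```python
import re

line = "total: 42 items"

# Without the walrus operator: assign first, then test
match = re.search(r"\d+", line)
if match:
    number_plain = int(match.group())

# With the walrus operator: assign inside the condition
if (m := re.search(r"\d+", line)):
    number_walrus = int(m.group())

# In a comprehension: compute n / 2 once, then both filter on it and keep it
halves_over_one = [h for n in [1, 2, 3, 4] if (h := n / 2) > 1]

print(number_plain, number_walrus, halves_over_one)
```

The walrus version saves a line, but the reader now has to spot the assignment hiding inside the condition.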