Finally started learning Python. Here are some recommended resources, and I’m putting my notes here too.
Recommended Resources
-
卡瓦邦噶! - 如何学Python?
A very detailed collection of Python learning resources. From zero-to-one, intermediate, deep dives, interviews: it has pretty much everything. The author also has another page, 卡瓦邦噶! - 珍藏资料, with lots of good stuff as well.
-
Piglei - Python 工匠
Python gives you a lot of freedom, but the tradeoff is that the gap between the floor and the ceiling can be huge.
This book discusses how to turn code that runs into great code, and how to write more Pythonic code by taking advantage of Python language features.
It also covers knowledge that applies beyond Python: naming, how to unify code style in large projects, etc.
It’s easy to read. The GitHub version is free; if you like it, you can buy the book version, which is more detailed than the web version.
-
Python Documentation
Sometimes you read third-hand materials and still can’t understand them. In those moments, checking the official docs might be what makes it click. If you can, read more docs.
Both 卡瓦邦噶! and Python 工匠 recommend reading the official itertools docs (Python Documentation - itertools) to learn the wheels in the standard library and avoid reinventing them.
-
CS61A
The famous CS61A. I’ve heard about it forever, but haven’t watched/read it yet.
-
捕蛇者说
A podcast. The snake in the name is of course Python.
Listening casually once in a while can lead you to lots of new resources.
One of the hosts is laixintao, the author of 卡瓦邦噶! mentioned above.
Notes
These are notes I took while learning Python: excerpts or paraphrases from what I read, plus a few points I personally find valuable. It helps reinforce memory and makes it easier to look things up later.
== None or is None
Reading Python 工匠: 与 None 值的比较
On 2024.07.14
When checking for None, use is None
In Python, there are two ways to compare values: == and is.
The author’s explanation:
==: checks whether the values are equal
is: checks whether they refer to the same object in memory, i.e. whether id(x) equals id(y)
None is a singleton object in Python. If you want to check whether a variable is None, remember to use is instead of ==, because only is strictly means “this object is None”.
== actually calls the __eq__ magic method, which a class can override to change the result of the comparison.
So when checking whether a variable is None, you should use is rather than ==.
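A minimal sketch of why this matters. AlwaysEqual is a made-up class for illustration, not something from the book: its __eq__ claims to be equal to everything, so == None gives a misleading answer while is None does not.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

print(obj == None)  # True: == calls AlwaysEqual.__eq__, which always says yes
print(obj is None)  # False: obj is not the None singleton
```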
A case where you need to override __eq__
In the book version of Python 工匠, section 12.1.2 比较运算符重载, the author gives an example where you should override __eq__.
For example, a class Square representing a square, taking side length as input and computing area:
    class Square:
        """A square.

        :param length: side length
        """

        def __init__(self, length):
            self.length = length

        def area(self):
            return self.length ** 2

Obviously, two squares with the same side length can be considered “the same square”.
But by default, Python considers them different objects and returns False:
    >>> x = Square(4)
    >>> y = Square(4)
    >>> x == y
    False

So you need to override __eq__, and the other five comparison operators: __ne__, __lt__, __le__, __gt__, __ge__.
Using the built-in @total_ordering decorator from the functools module reduces the work.
You only need to override __eq__ plus one of __lt__, __le__, __gt__, __ge__.
    from functools import total_ordering


    @total_ordering
    class Square:
        """A square.

        :param length: side length
        """

        def __init__(self, length):
            self.length = length

        def area(self):
            return self.length ** 2

        def __eq__(self, other):
            if isinstance(other, self.__class__):
                return self.length == other.length
            return False

        def __lt__(self, other):
            if isinstance(other, self.__class__):
                return self.length < other.length
            return NotImplemented

With this, the Square class can be compared normally.
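To check that claim, here is a small self-contained sketch that repeats a trimmed-down Square (docstring and area() left out) and tries a few comparisons. The variable names small, same, and big are mine.

```python
from functools import total_ordering


@total_ordering
class Square:
    def __init__(self, length):
        self.length = length

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.length == other.length
        return False

    def __lt__(self, other):
        if isinstance(other, self.__class__):
            return self.length < other.length
        return NotImplemented


small, same, big = Square(4), Square(4), Square(6)

print(small == same)  # True: our own __eq__
print(small != big)   # True: derived from __eq__
print(small <= big)   # True: filled in by total_ordering
print(big >= small)   # True: filled in by total_ordering
```

One side effect worth knowing: a class that defines __eq__ without also defining __hash__ becomes unhashable, so these Square objects can't be put in a set or used as dict keys.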
Iterable objects (iterable) and iterators (iterator)
Reading Python 工匠: 区分迭代器与可迭代对象
On 2024.07.14
What is an iterable
The official docs define an iterable here: official definition
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements sequence semantics.
So an iterable is an object that can return its members one at a time.
Each iteration yields one member, until the iterable is exhausted.
Obviously list, str, and tuple satisfy those conditions.
The docs also list dict and file objects as iterables, which seems fine (I don’t know the exact details).
And you can make any class iterable by defining an __iter__() method.
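A minimal sketch of a user-defined iterable. Countdown is a hypothetical class made up for this note. It also shows what iterating over a dict gives you: its keys. (A file object, similarly, yields its lines.)

```python
class Countdown:
    """Hypothetical iterable that counts down from `start` to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a fresh iterator on every call, so the same
        # Countdown object can be looped over more than once.
        return (n for n in range(self.start, 0, -1))


c = Countdown(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # [3, 2, 1] again, because __iter__ made a new iterator

# Iterating over a dict yields its keys.
print(list({"a": 1, "b": 2}))  # ['a', 'b']
```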
I originally thought a set should not be iterable, because it’s unordered.
But when I happened to read this: Python Documentation: Howto - Functional, I learned that sets are iterable too.
I looked it up: a set doesn’t guarantee the same order across iterations. The order is determined by hash values, and can change when the set changes.
Different from an iterator
Now that we know what an iterable is, let’s try iterating with next().
    >>> # Define a simple list
    >>> l = [4, 3, 2, 1]

    >>> # Look at the list
    >>> l
    [4, 3, 2, 1]
    >>> # OK, no problem

    >>> # Try to iterate
    >>> next(l)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'list' object is not an iterator
    >>> # Error: a list is not an iterator

Why doesn’t it work?
Back to the docs: the official definition of an iterable also has a second paragraph:
Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.
- Iterables can be used in loops and other places that need sequences.
- If you pass an iterable to iter(), it returns an iterator for that iterable.
- Iterators are suitable for one-time use.
- When using iterables, you usually don’t need to call iter() or deal with iterators manually.
- For example, in a for loop, Python automatically manages the iterator for you.
Therefore, the common loop for i in range(4): is not iterating range(4), but iterating the iterator returned by iter(range(4)).
Python’s for loop just automates this process.
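Roughly, the automation looks like the sketch below. manual_for is a name I made up, and this is a simplified picture of what a for loop does, not CPython's actual implementation.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    iterator = iter(iterable)      # step 1: get an iterator
    while True:
        try:
            item = next(iterator)  # step 2: ask for the next item
        except StopIteration:      # step 3: stop when it is exhausted
            break
        collected.append(item)     # the "loop body"
    return collected


print(manual_for(range(4)))  # [0, 1, 2, 3]
```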
Let’s verify in the REPL:
    >>> # Continuing with the same list
    >>> l
    [4, 3, 2, 1]

    >>> # Call iter() on the list object
    >>> iter(l)
    <list_iterator object at 0x7faffaac46d0>
    >>> # A list_iterator object was created

    >>> # Call next() on the iterator
    >>> next(iter(l))
    4
    >>> # Got the first value of the list

If you keep calling next() on an iterator, you can get values from the list in order.

    >>> l
    [4, 3, 2, 1]
    >>> iter(l)
    <list_iterator object at 0x7faaf1ac46d0>
    >>> next(iter(l))
    4
    >>> next(iter(l))
    4
    >>> next(iter(l))
    4
    >>> next(iter(l))
    4
    >>>

You might expect 4, 3, 2, 1, but next(iter(l)) keeps returning the first value.
That’s because iter(l) inside next(iter(l)) creates a new iterator each time.
So every time you call next(iter(l)), you’re calling next() on a fresh iterator, which can only return the first value.
You can see it in the REPL:
    >>> iter(l)
    <list_iterator object at 0x7faaf1ae4430>
    >>> iter(l)
    <list_iterator object at 0x7faaf1ac46d0>
    >>> iter(l)
    <list_iterator object at 0x7faaf1c2d240>

Calling iter(l) three times returns three different list_iterator objects (different addresses after at).
So if you want next() to walk through the whole iterator, you need an intermediary:
    >>> # The iterable
    >>> l
    [4, 3, 2, 1]

    >>> # Calling next() on it raises an error
    >>> next(l)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'list' object is not an iterator

    >>> # Assign the iterator object to l_iterator
    >>> l_iterator = iter(l)

    >>> # Call next() on it
    >>> next(l_iterator)
    4
    >>> next(l_iterator)
    3
    >>> next(l_iterator)
    2
    >>> next(l_iterator)
    1
    >>> next(l_iterator)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

Now you can get all values from the iterator.
Generators and generator expressions
Iterators seem convenient, but they’re one-time consumables. If you need a bunch of setup every time, it defeats the purpose.
So Python provides two easy ways to create iterators:
-
Generators
Use yield instead of return in a function. Calling the function then returns a generator, an iterator that produces the yielded values one at a time, and next() can pull them out.
-
Generator expressions
Written like a list comprehension, but with parentheses instead of square brackets, e.g. (i*i for i in range(4)). Python Documentation: generator expression gives another example:
    >>> sum(i*i for i in range(10))  # sum of squares 0, 1, 4, ... 81
    285
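A minimal sketch of both forms. countdown is a made-up generator function for illustration. Note that both are still one-time consumables once created; what they save is the setup code.

```python
def countdown(start):
    """Generator function: yields start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(list(gen))  # [1], only what is left

squares = (i * i for i in range(4))  # generator expression
print(list(squares))  # [0, 1, 4, 9]
print(list(squares))  # [], already exhausted
```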
How to count elements in an iterator
For a normal list, you can do len(list) to get element count.
But an iterator is lazy; you can’t know how many elements it has without iterating through it.
And once you iterate through it, the iterator is exhausted.
So counting it doesn’t seem that meaningful.
An iterator is a one-time consumable. Use it and throw it away.
If you need to loop multiple times, use a normal object like a list.
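If you really do need a count, one common idiom is to consume the iterator while counting. count_items below is a helper name I made up; the sketch also shows that nothing is left afterwards.

```python
def count_items(iterator):
    """Count items by consuming the iterator (it is empty afterwards)."""
    return sum(1 for _ in iterator)


it = iter([4, 3, 2, 1])
print(count_items(it))  # 4
print(list(it))         # [], the iterator has been used up

# If the values are needed again, materialize them into a list first.
items = list(iter([4, 3, 2, 1]))
print(len(items))       # 4
print(items)            # [4, 3, 2, 1], still available
```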
Ref:
Python Documentation: iterable
Python Documentation: iterator
Python Documentation: generator
Python Documentation: generator expression
Functional programming
Reading Python Documentation: Functional Programming HOWTO
On 2024.07.15
TL;DR: the last section recommends not using lambda expressions
Fredrik Lundh once suggested the following set of rules for refactoring uses of lambda:
- Write a lambda function.
- Write a comment explaining what the heck that lambda does.
- Study the comment for a while, and think of a name that captures the essence of the comment.
- Convert the lambda to a def statement, using that name.
- Remove the comment.
I really like these rules, but you’re free to disagree about whether this lambda-free style is better.
Using lambda expressions may save you two lines of code, but the comprehension cost skyrockets.
From the perspective of readability and comprehension cost, functional programming is not recommended. Just use def and for loops in a plain way.
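As a small illustration of those refactoring rules (my own example, not one from the HOWTO): the same sort key written first as a lambda, then as a named function.

```python
words = ["banana", "Cherry", "apple"]

# Lambda version: short, but the reader has to decode the intent.
by_lambda = sorted(words, key=lambda w: w.lower())


# def version: the function name now says what the lambda needed a comment for.
def case_insensitive(word):
    """Sort key that ignores letter case."""
    return word.lower()


by_def = sorted(words, key=case_insensitive)

print(by_lambda)  # ['apple', 'banana', 'Cherry']
print(by_def)     # ['apple', 'banana', 'Cherry']
```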
The intro part of this article is nice. It says programming languages have roughly four schools of thought for solving problems:
Procedural: e.g. C, Pascal, Unix shells
Declarative: e.g. SQL
Object-oriented: e.g. Smalltalk, Java
Functional: e.g. Haskell, the ML family
Python is very flexible: you can do procedural, object-oriented, or functional, depending on the scenario.
This article includes many examples about iterables, iterators, and generators, which also served as a review for the previous section.
I also saw a few functions that might be useful:
-
filter(predicate, iter)
Keeps the elements for which predicate is true. filter(is_even, range(10)) has the same effect as the generator expression (x for x in range(10) if is_even(x))
-
enumerate(iter, start=0)
Often used to get index and value at the same time
-
any(iter) and all(iter)
any() returns True if at least one element is truthy; all() returns True only if every element is
-
zip(iterA, iterB, …, strict=False)
Similar to pivoting rows/columns in Excel. By default, it stops at the shortest iterator.
If you want to strictly require all iterators to have the same length, set strict=True (Python 3.10+).
-
itertools.count(start, step)
Returns an infinite iterator
-
itertools.repeat(elem, [n])
Repeat elem n times, or endlessly if n is omitted
-
itertools.islice(iter, [start], stop, [step])
Slicing. One arg is stop; two args are start and stop; three args are start, stop, step
-
itertools.tee(iter, [n])
Copy one iterator into n independent iterators. I looked it up: seems used in data analysis, log analysis, etc.
If you need to analyze the same dataset multiple times, tee can split it into multiple iterators for separate analyses.
-
itertools.filterfalse(predicate, iter)
The opposite of filter: keeps the elements for which predicate is false
-
itertools.dropwhile(predicate, iter)
Literally what it says: drops elements while predicate is true, then returns everything after that
-
itertools.combinations(iterable, r)
Combinatorics
itertools.combinations([1, 2, 3, 4, 5], 2) =>
(1, 2), (1, 3), (1, 4), (1, 5),
(2, 3), (2, 4), (2, 5),
(3, 4), (3, 5),
(4, 5)
After reading it, I thought it was pretty interesting.
One more: itertools.product(*iterables, repeat=1) is basically nested loops.
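A quick tour of the functions above on tiny inputs, as a sanity check of my notes. is_even is a helper defined here, and the variable names are mine. zip's strict=True is left out because it needs Python 3.10+.

```python
import itertools


def is_even(x):
    return x % 2 == 0


evens = list(filter(is_even, range(10)))                 # [0, 2, 4, 6, 8]
odds = list(itertools.filterfalse(is_even, range(10)))   # [1, 3, 5, 7, 9]
numbered = list(enumerate("abc", start=1))               # [(1, 'a'), (2, 'b'), (3, 'c')]
pairs = list(zip([1, 2, 3], "ab"))                       # [(1, 'a'), (2, 'b')], shortest wins

# count() is infinite, so islice() takes just the first three values.
first_three = list(itertools.islice(itertools.count(10, 5), 3))  # [10, 15, 20]
repeated = list(itertools.repeat("x", 3))                # ['x', 'x', 'x']
rest = list(itertools.dropwhile(is_even, [2, 4, 5, 6]))  # [5, 6]

combos = list(itertools.combinations([1, 2, 3, 4, 5], 2))  # 10 pairs
grid = list(itertools.product("ab", [1, 2]))  # same as two nested for loops

# tee() gives two independent iterators over the same values.
a, b = itertools.tee(iter([1, 2, 3]), 2)
total, largest = sum(a), max(b)

print(evens, odds, numbered, pairs, sep="\n")
print(first_three, repeated, rest, sep="\n")
print(len(combos), grid, total, largest, sep="\n")
```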
Tools commonly used in large projects
Reading the book version of Python 工匠, section 13.1 开发大型项目常用工具介绍
On 2024.07.16
Linters and related tools:
- flake8: checks code style, customizable
- black: opinionated code formatter, almost not customizable
- isort: sorts import statements
- pre-commit: runs the checks as Git hooks before each commit
- mypy: static type checking with type annotations
Four common built-in container types
Reading Python 工匠:4. 容器的门道
On 2024.07.11
In Python, the four most common built-in container types are: list, tuple, dict, set.
- List: ordered, mutable, allows duplicates. Suitable for storing/accessing data in order.
- Tuple: ordered, immutable, allows duplicates. Suitable for storing immutable data.
- Dictionary: keeps insertion order (Python 3.7+), mutable, unique keys. Suitable for fast key lookup.
- Set: unordered, mutable, no duplicates. Suitable for unique elements and set operations (intersection/union).
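A tiny sketch of the four types side by side. The values are made up for illustration.

```python
point = (3, 4)                    # tuple: ordered, immutable
scores = [3, 1, 3]                # list: ordered, mutable, duplicates allowed
ages = {"bob": 30, "alice": 25}   # dict: unique keys, keeps insertion order

scores.append(2)                  # lists can grow in place
unique = set(scores)              # set: duplicates collapse to {1, 2, 3}
common = unique & {3, 4}          # set intersection

try:
    point[0] = 0                  # tuples reject item assignment
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(scores)            # [3, 1, 3, 2]
print(list(ages))        # ['bob', 'alice'], insertion order
print(common)            # {3}
print(tuple_is_mutable)  # False
```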
Python 工匠:写更快的代码
-
Avoid frequently growing lists / creating new lists
  - Use iterators more: yield, generator expressions
  - Prefer lazy objects provided by modules:
    - Use re.finditer instead of re.findall
    - Use iterable file objects directly: for line in fp instead of for line in fp.readlines()
-
Use deque when you often operate at the head of a list
A list is implemented as an array. When you insert at the head (list.insert(0, item)), all later elements must be moved, so the time complexity is O(n). This makes head insertion much slower than appending at the tail (list.append(item) is O(1)).
If your code needs to do this many times, consider using collections.deque instead. deque is a double-ended queue; whether you append at the head or tail, it’s O(1).
Use sets/dicts to test membership
When you need to check whether an element exists in a container, sets are better than lists.
item in [...]isO(n), whileitem in {...}isO(1). That’s because dicts and sets are implemented as hash tables.Hint: strongly recommend reading TimeComplexity - Python Wiki for time complexity details of common container types.
If you’re interested in dict implementation details, also strongly recommend Raymond Hettinger’s talk: Modern Dictionaries(YouTube)
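A minimal sketch of the three tips above on toy data (the values are made up). It only shows the APIs; it doesn't measure speed.

```python
import re
from collections import deque

# Lazy matching: finditer yields match objects one at a time,
# so we can stop after the first without building a full list.
first_number = next(re.finditer(r"\d+", "a1 b22 c333")).group()

# deque: cheap operations at both ends.
queue = deque([2, 3])
queue.appendleft(1)     # O(1) at the head (list.insert(0, x) is O(n))
queue.append(4)         # O(1) at the tail
head = queue.popleft()  # removes and returns the leftmost item

# Membership test on a set uses hashing instead of scanning.
allowed = {"red", "green", "blue"}
has_green = "green" in allowed

print(first_number)  # 1
print(head)          # 1
print(list(queue))   # [2, 3, 4]
print(has_green)     # True
```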
Edge cases, and the walrus operator
Reading Python 工匠:15. 在边界处思考
Reading Python 工匠:16. 语句、表达式和海象操作符
On 2024.07.16
The book version of Python 工匠 does not include these two chapters.
Thinking at the boundaries
The author thinks it’s best to use language features to handle edge cases in a more elegant way. Concretely in Python:
- Use try to catch exceptions rather than if/else for edge cases
  The Python community prefers EAFP (Easier to Ask for Forgiveness than Permission) over LBYL (Look Before You Leap), because raising exceptions is lightweight in Python.
  The author recommends: Write Cleaner Python: Use Exceptions
- Handling missing items in containers
  - collections.defaultdict handles missing keys in dicts
  - Use setdefault to read and update dict values
  - Use dict.pop with a default to delete keys that may be missing
  - List slicing doesn’t raise IndexError, so you don’t need boundary handling
- When using or short-circuit to set defaults, besides None, values such as 0, empty strings, and empty containers are also falsy, so an explicit if ... is None check is more accurate for boundary handling
- Don’t hand-roll user input validation; use pydantic or your framework’s built-in validators
- Use %, abs(), math.floor() well
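A minimal sketch of several of these points. to_int and pick are helper names I made up, and the data is for illustration only.

```python
from collections import defaultdict


def to_int(text, default=0):
    """EAFP: just try the conversion and handle the failure."""
    try:
        return int(text)
    except ValueError:
        return default


# defaultdict: a missing key starts at int() == 0, no KeyError.
counts = defaultdict(int)
for ch in "abca":
    counts[ch] += 1

# setdefault: read the value, creating it first if the key is missing.
groups = {}
groups.setdefault("x", []).append(1)

# pop with a default: removing a missing key does not raise KeyError.
removed = groups.pop("missing", None)

# Slicing past the end gives an empty list instead of IndexError.
tail = [1, 2, 3][10:]


def pick(value, default):
    """Use the default only when value is really None."""
    return default if value is None else value


print(to_int("42"), to_int("abc"))  # 42 0
print(counts["a"], groups, removed, tail)
print(0 or 10)      # 10: `or` treats 0 as missing
print(pick(0, 10))  # 0: the explicit None check keeps it
```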
Statements, expressions, and the walrus operator
The walrus operator (:=, added in Python 3.8) lets you assign a value as part of an expression.
It can be used in conditionals, loops, and list comprehensions, and helps reduce duplicated code.
But it also increases comprehension cost, so you shouldn’t blindly chase conciseness.
The walrus operator feels a bit like functional programming: concise, but higher comprehension cost.
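A small before/after sketch of my own (not from the book); it needs Python 3.8+. The log line and the names are made up.

```python
import re

line = "error code: 404"

# Without the walrus operator: assign first, then test.
match = re.search(r"\d+", line)
if match:
    code_plain = int(match.group())

# With the walrus operator: assign and test in one expression.
if (m := re.search(r"\d+", line)):
    code_walrus = int(m.group())

# In a comprehension: compute len(w) once, then both filter on it and keep it.
lengths = [n for w in ["a", "abcd", "xyz"] if (n := len(w)) > 2]

print(code_plain, code_walrus)  # 404 404
print(lengths)                  # [4, 3]
```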