Efficient Python

Weiwei QI

2022/11/27

[] run python in RStudio

package required:

reticulate

reticulate::py_config()$version_string
## [1] "3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]"

[] instead of lst[-1]…

Suppose I want the first and last elements of a list:

lst = [x for x in range(10)]

I know you want to do:

first, last = lst[0], lst[-1]

print(first, last)
## 0 9

But you can also use starred unpacking:

first, *_, last = lst  # requires at least two elements in lst

print(first, last)
## 0 9
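The starred form needs at least two items, because `first` and `last` each claim one. A small sketch of the edge cases, using made-up lists `pair` and `single` that are not in the original notes:

```python
pair = [0, 9]
first, *middle, last = pair
print(first, middle, last)  # the starred name receives an empty list

single = [0]
try:
    first, *middle, last = single
except ValueError as err:
    # One element cannot fill both `first` and `last`.
    print(f"unpacking failed: {err}")
```

For lists that may hold a single element, plain indexing with `lst[0]` and `lst[-1]` is the safer choice.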

[] flatten your list

lst = [[1,2,3], [4,5], [6]]

sum(lst, [])  # flattens one level only; deeper nesting is left as-is

# alternative: list(chain(*lst)), after `from itertools import chain`
## [1, 2, 3, 4, 5, 6]
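For longer lists, `itertools.chain.from_iterable` does the same one-level flattening without the repeated list concatenation that `sum` performs. For deeper nesting, a recursive generator is one option; the helper `flatten_deep` below is a hypothetical name and only a sketch that treats `list` instances as nested:

```python
from itertools import chain

lst = [[1, 2, 3], [4, 5], [6]]

# One level of nesting: chain the sublists together lazily.
flat = list(chain.from_iterable(lst))
print(flat)  # [1, 2, 3, 4, 5, 6]


def flatten_deep(items):
    """Recursively yield the non-list elements of an arbitrarily nested list."""
    for item in items:
        if isinstance(item, list):
            yield from flatten_deep(item)
        else:
            yield item


nested = [1, [2, [3, [4, 5]], 6]]
deep = list(flatten_deep(nested))
print(deep)  # [1, 2, 3, 4, 5, 6]
```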

[] good old map

map returns a lazy iterator, so wrap it in list() to see the values:

def square(x):
    return x * x
# map(square, range(10))
res = map(square, range(10))
# In a plain REPL, _ holds the last result, so list(_) would equal list(res);
# it is not used here because knitr bound _ to the wrong object.
list(res)
## [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
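Because map returns an iterator, its values can be consumed only once. A short sketch of that behaviour; the names `first_pass` and `second_pass` are illustrative:

```python
def square(x):
    return x * x


res = map(square, range(5))
first_pass = list(res)   # consumes the iterator
second_pass = list(res)  # nothing is left to yield

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []
```

If the results are needed more than once, keep the list rather than the map object.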

map can be used with more than one argument:

def stradd(a, b):
    return f"{a}{b}"
list(map(stradd, "ABCDEFGHIJ", range(10)))
## ['A0', 'B1', 'C2', 'D3', 'E4', 'F5', 'G6', 'H7', 'I8', 'J9']

This gives the same result as the following list comprehension:

[stradd(x, y) for x, y in zip("ABCDEFGHIJ", range(10))]
## ['A0', 'B1', 'C2', 'D3', 'E4', 'F5', 'G6', 'H7', 'I8', 'J9']
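With several iterables, map stops as soon as the shortest one runs out, just as zip does. A small sketch with a deliberately shorter string (the variable `short` is an illustrative name):

```python
def stradd(a, b):
    return f"{a}{b}"


# "ABC" has three characters, so only three pairs are produced.
short = list(map(stradd, "ABC", range(10)))
print(short)  # ['A0', 'B1', 'C2']
```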

[] f-string date format

from datetime import datetime
print(datetime.now().strftime("%Y-%m-%d"))
## 2022-11-28
print(f"{datetime.now():%Y-%m-%d}")
## 2022-11-28
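Everything after the colon in the f-string is passed to the datetime as a format spec, so any strftime directive can be used there. A sketch with a fixed, made-up timestamp (`moment`) so the output is reproducible:

```python
from datetime import datetime

# A fixed example timestamp instead of datetime.now().
moment = datetime(2022, 11, 27, 9, 5)

date_only = f"{moment:%Y-%m-%d}"
with_time = f"{moment:%Y-%m-%d %H:%M}"
labelled = f"Report for {moment:%d/%m/%Y}"

print(date_only)  # 2022-11-27
print(with_time)  # 2022-11-27 09:05
print(labelled)   # Report for 27/11/2022
```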

[] multi-var assignment (via Ned Batchelder)

You can assign multiple values at once

# the = specifier prints both the expression and its value; the f"{a = }" syntax requires Python 3.8+

a, b = 17, 42
print(f"{a = }, {b = }")
## a = 17, b = 42

You can use this to swap variables

a, b = b, a
print(f"{a = }, {b = }")
## a = 42, b = 17

It can also be used to define an infinite Fibonacci generator:

def fib():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a+b

# get the first 12 items from the fib generator

import itertools
list(itertools.islice(fib(), 12))
## [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
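Both the swap and the generator rely on the right-hand side being evaluated in full before any name is rebound. A minimal sketch contrasting that with two separate assignments; the variables `c` and `d` are added purely for illustration:

```python
# Tuple assignment: both right-hand values are computed from the old a and b.
a, b = 1, 2
a, b = b, a + b
print(a, b)  # 2 3

# Sequential assignment: the second statement already sees the new c.
c, d = 1, 2
c = d
d = c + d
print(c, d)  # 2 4
```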

[] combine dictionaries

some_data = {"name": "Rodrigo", "email": None}
more_data = {"email": "@somesite.com"}

When keys collide, the dictionary on the right wins:

{**some_data, **more_data}
## {'name': 'Rodrigo', 'email': '@somesite.com'}

The same merge using the | operator, new in Python 3.9:

# I was using 3.8.8 on this PC when writing, so the line below is commented out

# some_data | more_data

Updating a dict in place with |=, also new in Python 3.9:

# again, I am using 3.8.8 when writing this

# some_data |= more_data
# some_data
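One way to try both operators without breaking on older interpreters is to guard them with a version check. This is only a sketch; the `sys.version_info` guard and the `merged` variable are additions, not part of the original notes:

```python
import sys

some_data = {"name": "Rodrigo", "email": None}
more_data = {"email": "@somesite.com"}

if sys.version_info >= (3, 9):
    merged = some_data | more_data  # builds a new dict, inputs untouched
    some_data |= more_data          # updates some_data in place
else:
    # Equivalent spellings that work before Python 3.9.
    merged = {**some_data, **more_data}
    some_data.update(more_data)

print(merged)     # {'name': 'Rodrigo', 'email': '@somesite.com'}
print(some_data)  # {'name': 'Rodrigo', 'email': '@somesite.com'}
```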

[] Pandas DF multiproc

[] itertuples is faster

import pandas as pd
import numpy as np

cols = np.random.rand(1_000_000, 2)

df = pd.DataFrame(data = {'a': cols[:, 0], 'b': cols[:, 1]})

df.shape
## (1000000, 2)

%%time

for idx, row in df.iterrows():
    pass

%%time

for tup in df.itertuples():
    pass
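The `%%time` cell magic belongs to IPython/Jupyter, so it may not be available in a plain Python session. A rough alternative is to time both loops with `time.perf_counter`. The sketch below assumes a smaller 10,000-row frame to keep the run short, and the timings will vary by machine:

```python
import time

import numpy as np
import pandas as pd

# Smaller than the 1,000,000 rows used above, purely to keep the demo quick.
cols = np.random.rand(10_000, 2)
df = pd.DataFrame({"a": cols[:, 0], "b": cols[:, 1]})

start = time.perf_counter()
for idx, row in df.iterrows():  # builds a Series for every row
    pass
iterrows_seconds = time.perf_counter() - start

start = time.perf_counter()
for tup in df.itertuples():  # yields lightweight namedtuples
    pass
itertuples_seconds = time.perf_counter() - start

print(f"iterrows:   {iterrows_seconds:.4f} s")
print(f"itertuples: {itertuples_seconds:.4f} s")
```

Each `tup` exposes the row as attributes (`tup.Index`, `tup.a`, `tup.b`), so most loops written against `iterrows` can be ported with little change.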