Random Baat: Python trick

The following tricks I find pretty useful in my daily Python work. I also added a few I stumbled upon lately.

1. Use collections

This really makes your code more elegant and less verbose, a few examples I absorbed this week:

Named tuples:


>>> Point = collections.namedtuple('Point', ['x', 'y'])
>>> p = Point(x=1.0, y=2.0)
>>> p
Point(x=1.0, y=2.0)

Now you can index by keyword, much nicer than offset into tuple by number (less readable)


>>> p.x
1.0
>>> p.y

Elegantly used when looping through a csv:


with open('stock.csv') as f:
    f_csv = csv.reader(f)
 headings = next(f_csv)
 Row = namedtuple('Row', headings)
 for r in f_csv:
 row = Row(*r) # note the star extraction
 # ... process row ...

I like the unpacking star feature to throw away useless fields:


line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'
>>> uname, *fields, homedir, sh = line.split(':')
>>> uname
'nobody'
>>> homedir
'/var/empty'
>>> sh
'/usr/bin/false'

Superconvenient: the defaultdict:


from collections import defaultdict
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)",

Before I would init the list each time which leads to needless code:

if row['date'] not in rows_by_date:


 rows_by_date[row['date']] = []

You can use OrderedDict to leave the order of inserted keys:


>>> import collections
>>> d = collections.OrderedDict()
>>> d['a'] = 'A'
>>> d['b'] = 'B'
>>> d['c'] = 'C'
>>> d['d'] = 'D'
>>> d['e'] = 'E'
>>> for k, v in d.items():
...     print k, v
... 
a A
b B
c C
d D
e E

Another nice one is Counter:

from collections import Counter


words = [
  'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
  'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
  'eyes', ""don't"", 'look', 'around', 'the', 'eyes', 'look', 'into',
  'my', 'eyes', ""you're"", 'under'
]
word_counts = Counter(words)
top_three = word_counts.most_common(3)
print(top_three)
# Outputs [('eyes', 8), ('the', 5), ('look', 4)]",

Again, before I would write most_common manually. Not necessary, this is all done already somewhere in the stdlib :)

2. sorted() accepts a key arg which you can use to sort on something else

Here for example we sort on surname:


>>> sorted(names, key=lambda name: name.split()[-1].lower())
['Ned Batchelder', 'David Beazley', 'Raymond Hettinger', 'Brian Jones']

3. Create XMl from dict

Creating XML tags manually is usually a bad idea, I bookmarked this simple dict_to_xml helper:


from xml.etree.ElementTree import Element
def dict_to_xml(tag, d):
    '''
    Turn a simple dict of key/value pairs into XML
    '''
    elem = Element(tag)
    for key, val in d.items():
        child = Element(key)
        child.text = str(val)
        elem.append(child)
    return elem"

4. Oneliner to see if there are any python files in a particular directory

Sometimes ‘any’ is pretty useful:


import os
files = os.listdir('dirname')
if any(name.endswith('.py') for name in files):
5. Use set operations to match common items in lists
>>> a = [1, 2, 3, 'a']
>>> b = ['a', 'b', 'c', 3, 4, 5]
>>> set(a).intersection(b)
{3, 'a'}

6. Use re.compile

If you are going to check a regular expression in a loop, don’t do this:


for i in longlist:
  if re.match(r'^...', i)
yet define the regex once and use the pattern:
p = re.compile(r'^...')
for i in longlist: 
  if p.match(i)

7. Printing files with potential bad (Unicode) characters

The book suggested to print filenames of unknown origin, use this convention to avoid errors:


def bad_filename(filename):
    return repr(filename)[1:-1]
try:
    print(filename)
except UnicodeEncodeError:
    print(bad_filename(filename))

Handling unicode chars in files can be nasty because they can blow up your script. However the logic behind it is not that hard to grasp. A good snippet to bookmark is the encoding / decoding of Unicode:


>>> a
'pýtĥöñ is awesome\n'
>>> b = unicodedata.normalize('NFD', a)
>>> b.encode('ascii', 'ignore').decode('ascii')
'python is awesome\n'

O’Reilly has a course on Working with Unicode in Python.

8. Print is pretty cool (Python 3)

I am probably not the only one writing this kind of join operations:


>>> row = ["1", "bob", "developer", "python"]
>>> print(','.join(str(x) for x in row))
1,bob,developer,python

Turns out you can just write it like this:


>>> print(*row, sep=',')
1,bob,developer,python
Note again the * unpacking.

9. Functions like sum() accept generators / use the right variable type

I wrote this at a conference to earn me a coffee mug ;)


sum = 0
for i in range(1300):
    if i % 3 == 0 or i % 5 == 0:
        sum += i
print(sum)

Returns 394118, while handing it in I realized this could be written much shorter and efficiently:


>>> sum(i for i in range(1300) if i % 3 == 0 or i % 5 == 0)
394118

A generator:


lines = (line.strip() for line in f)

is more memory efficient than:


lines = [line.strip() for line in f] # loads whole list into memory at once

And concatenating strings is inefficient:


s = "line1\n"
s += "line2\n"
s += "line3\n"
print(s)

Better build up a list and join when printing:


lines = []
lines.append("line1")
lines.append("line2")
lines.append("line3")
print("\n".join(lines))
Another one I liked from the cookbook:
portfolio = [
  {'name':'GOOG', 'shares': 50},
  {'name':'YHOO', 'shares': 75},
  {'name':'AOL', 'shares': 20},
  {'name':'SCOX', 'shares': 65}
]
min_shares = min(s['shares'] for s in portfolio)

One line to get the min of a numeric value in a nested data structure.

10. Enumerate lines in for loop

You can number lines (or whatever you are looping over) and start with 1 (2nd arg), this is a nice debugging technique


for lineno, line in enumerate(lines, 1): # start counting at 0
    fields = line.split()
    try:
        count = int(fields[1])
        ...
    except ValueError as e:
        print('Line {}: Parse error: {}'.format(lineno, e))

11. Pandas

Import pandas and numpy:


import pandas as pd
import numpy as np

12. Make random dataframe with three columns:


df = pd.DataFrame(np.random.rand(10,3), columns=list('ABC'))
Select:
# Boolean indexing (remember the parentheses)
df[(df.A < 0.5) & (df.B > 0.5)]
# Alternative, using query which depends on numexpr
df.query('A < 0.5 & B > 0.5')
Project:
# One columns
df.A
# Multiple columns
# there may be another shorter way, but I don't know it
df.loc[:,list('AB')]

Often used snippets
Dates

13. Difference (in days) between two dates:

from datetime import date


d1 = date(2013,1,1)
d2 = date(2013,9,13)
abs(d2-d1).days
directory-of-script snippet
os.path.dirname(os.path.realpath(__file__))
# combine with
os.path.join(os.path.dirname(os.path.realpath(__file__)), 'foo','bar','baz.txt')

14. PostgreSQL-connect-query snippet


import psycopg2
conn = psycopg2.connect("host='localhost' user='xxx'  password='yyy' dbname='zzz'")
cur = conn.cursor()
cur.execute("""SELECT * from foo;""")
rows = cur.fetchall()
for row in rows:
  print "   ", row[0]
conn.close()
Input parsing functions

15. Expand input-file args:


# input_data: e.g. 'file.txt' or '*.txt' or 'foo/file.txt' 'bar/file.txt'
filenames = [glob.glob(pathexpr) for pathexpr in input_data]
filenames = [item for sublist in filenames for item in sublist]

15. Parse key-value pair strings like ‘x=42.0,y=1’:


kvp = lambda elem,t,i: t(elem.split('=')[i])
parse_kvp_str = lambda args : dict([(kvp(elem,str,0), kvp(elem,float,1)) for elem in args.split(',')])
parse_kvp_str('x=42.0,y=1')

Postgres database functions

16. Upper case in Python (just for example):


-- create extension plpythonu;
CREATE OR REPLACE FUNCTION python_upper
(
  input text
) RETURNS text AS
$$
  return input.upper()
$$ LANGUAGE plpythonu STRICT;

17. Convert IP address from text to integer:


CREATE FUNCTION ip2int(input text) RETURNS integer
LANGUAGE plpythonu
AS $$
if 'struct' in SD:
    struct = SD['struct']
else:
    import struct
    SD['struct'] = struct
if 'socket' in SD:
    socket = SD['socket']
else:
    import socket
    SD['socket'] = socket
return struct.unpack("!I", socket.inet_aton(input))[0]
$$;
Convert IP address from integer to text:
CREATE FUNCTION int2ip(input integer) RETURNS text
LANGUAGE plpythonu
AS $$
if 'struct' in SD:
    struct = SD['struct']
else:
    import struct
    SD['struct'] = struct
if 'socket' in SD:
    socket = SD['socket']
else:
    import socket
    SD['socket'] = socket
return socket.inet_ntoa(struct.pack("!I", input))
$$;

18. Commandline options


optparse-commandline-options snippet
from optparse import OptionParser
usage = "usage: %prog [options] arg "
parser = OptionParser(usage=usage)
parser.add_option("-x", "--some-option-x", dest="x", default=42.0, type="float",
                  help="a floating point option")
(options, args) = parser.parse_args()
print options.x
print args[0]

19. print-in-place (progress bar) snippet


import time
import sys
for progress in range(100):
  time.sleep(0.1)
  sys.stdout.write("Download progress: %d%%   \r" % (progress) ) 
  sys.stdout.flush()

Packaging snippets

20. poor-mans-python-executable trick

Learned this trick from voidspace. The trick uses two files (__main__.py and hashbang.txt):


__main__.py:
print 'Hello world'
hashbang.txt (adding a newline after ‘python2.6’ is important):
#!/usr/bin/env python2.6
Build an “executable”:
zip main.zip __main__.py
cat hashbang.txt main.zip > hello
rm main.zip
chmod u+x hello
Run “executable”:
$ ./hello
Hello world

21. import-class-from-file trick

Import class MyClass from a module file (adapted from stackoverflow):


import imp
mod = imp.load_source('name.of.module', 'path/to/module.py')
obj = mod.MyClass()

22. Occusional-usage snippets

Extract words from string


words = lambda text: ''.join(c if c.isalnum() else ' ' for c in text).split()
words('Johnny.Appleseed!is:a*good&farmer')
# ['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']

23. IP address to integer and back


import struct
import socket
def ip2int(addr):                                                               
    return struct.unpack("!I", socket.inet_aton(addr))[0]                       
def int2ip(addr):                                                               
    return socket.inet_ntoa(struct.pack("!I", addr))

24. Fluent Python Interface

Copied from riaanvddool.


# Fluent Interface Definition
class sql:
  class select:
    def __init__(self, dbcolumn, context=None):
      self.dbcolumn = dbcolumn
      self.context = context
    def select(self, dbcolumn):
      return self.__class__(dbcolumn,self)
# Demo
q = sql.select('foo').select('bar')
print q.dbcolumn  #bar
print q.context.dbcolumn  #foo
Flatten a nested lists
def flatten(elems):
 """
 [['a'], ['b','c',['d'],'e',['f','g']]]
 """
 stack = [elems]
 top = stack.pop()
 while top:  
  head, tail = top[0], top[1:]
  if tail: stack.append(tail)
  if not isinstance(head, list): yield head   
  else: stack.append(head)
  if stack: top = stack.pop()
  else: break
snap rounding
EPSILON = 0.000001
snap_ceil = lambda x: math.ceil(x) if abs(x - round(x)) > EPSILON else round(x)
snap_floor = lambda x: math.floor(x) if abs(x - round(x)) > EPSILON else round(x)
merge-two-dictionaries snippet
x = {'a': 42}
y = {'b': 127}
z = dict(x.items() + y.items())
# z = {'a': 42, 'b': 127}

25. anonymous-object snippet

Adapted from stackoverflow:


class Anon(object):
    def __new__(cls, **attrs):
        result = object.__new__(cls)
        result.__dict__ = attrs
        return result

26. Alternative:


class Anon(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return self.__str__()
    def __str__(self):
        return ", ".join(["%s=%s" % (key,value) for key,value in self.__dict__.items()])

27. generate-random-word snippet

Function that returns a random word (could also use random.choicewith this list of words):


import string, random
randword = lambda n: "".join([random.choice(string.letters) for i in range(n)])
setdefault tricks
Increment (and initialize) value:
d = {}
d[2] = d.setdefault(2,39) + 1
d[2] = d.setdefault(2,39) + 1
d[2] = d.setdefault(2,39) + 1
d[2] # value is 42

29. Append value to (possibly uninitialized) list stored under a key in dictionary:


d = {}
d.setdefault(2, []).append(42)
d.setdefault(2, []).append(127)
d[2] # value is [42, 127]

Binary tricks

30. add-integers-using-XOR snippet

Swap two integer variables using the XOR swap algorithm:


x = 42
y = 127
x = x ^ y
y = y ^ x
x = x ^ y
x # value is 127
y # value is 42

Random Baat

Monday, 9 October 2017

Python trick

No comments:

Post a Comment