Monday, 9 October 2017

Python trick


The following tricks I find pretty useful in my daily Python work. I also added a few I stumbled upon lately.
1. Use collections
This really makes your code more elegant and less verbose. A few examples I absorbed this week:
Named tuples:
  >>> import collections
  >>> Point = collections.namedtuple('Point', ['x', 'y'])
  >>> p = Point(x=1.0, y=2.0)
  >>> p
  Point(x=1.0, y=2.0)
Now you can access fields by name, which is much more readable than indexing into a tuple by position:
  >>> p.x
  1.0
  >>> p.y
  2.0
Elegantly used when looping through a CSV file:
  import csv
  from collections import namedtuple

  with open('stock.csv') as f:
      f_csv = csv.reader(f)
      headings = next(f_csv)
      Row = namedtuple('Row', headings)
      for r in f_csv:
          row = Row(*r)  # note the star unpacking
          # ... process row ...
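To try the CSV pattern without a file on disk, here is a minimal sketch. The SAMPLE text and the read_rows helper are made up for illustration; only csv.reader and namedtuple come from the snippet above.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for stock.csv so the example runs anywhere.
SAMPLE = "symbol,price,shares\nAA,39.48,100\nIBM,91.10,50\n"

def read_rows(fileobj):
    """Yield one namedtuple per data row; field names come from the header line."""
    reader = csv.reader(fileobj)
    headings = next(reader)
    Row = namedtuple('Row', headings)
    for fields in reader:
        yield Row(*fields)

rows = list(read_rows(io.StringIO(SAMPLE)))
# csv gives back strings, so convert before doing arithmetic.
total = sum(float(r.price) * int(r.shares) for r in rows)
print(rows[0].symbol, round(total, 2))
```

One caveat: namedtuple needs the headings to be valid Python identifiers. If your header contains spaces or dashes, passing rename=True to namedtuple is one way around that.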
I like the star-unpacking feature for soaking up the fields I don't need:
  >>> line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'
  >>> uname, *fields, homedir, sh = line.split(':')
  >>> uname
  'nobody'
  >>> homedir
  '/var/empty'
  >>> sh
  '/usr/bin/false'
Super convenient: the defaultdict:
  from collections import defaultdict
  rows_by_date = defaultdict(list)
  for row in rows:
      rows_by_date[row['date']].append(row)
Before, I would initialize the list by hand each time, which leads to needless code:
  if row['date'] not in rows_by_date:
      rows_by_date[row['date']] = []
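To see that both approaches produce the same grouping, here is a small sketch. The sample rows and the two function names are made up for illustration.

```python
from collections import defaultdict

# Made-up rows for illustration.
rows = [
    {'date': '2017-10-01', 'amount': 5},
    {'date': '2017-10-02', 'amount': 7},
    {'date': '2017-10-01', 'amount': 3},
]

def group_plain(rows):
    """Group rows by date, initializing each list by hand."""
    grouped = {}
    for row in rows:
        if row['date'] not in grouped:
            grouped[row['date']] = []
        grouped[row['date']].append(row)
    return grouped

def group_defaultdict(rows):
    """Same grouping, letting defaultdict create the empty lists."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row['date']].append(row)
    return grouped

same = group_plain(rows) == group_defaultdict(rows)
print(same, len(group_defaultdict(rows)['2017-10-01']))
```

Keep in mind that merely looking up a missing key on a defaultdict inserts it, so use `in` or `.get()` when you only want to test for a key.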
You can use OrderedDict to preserve the order in which keys were inserted (since Python 3.7 a plain dict does this too):
  >>> import collections
  >>> d = collections.OrderedDict()
  >>> d['a'] = 'A'
  >>> d['b'] = 'B'
  >>> d['c'] = 'C'
  >>> d['d'] = 'D'
  >>> d['e'] = 'E'
  >>> for k, v in d.items():
  ...     print(k, v)
  ...
  a A
  b B
  c C
  d D
  e E
Another nice one is Counter:
  from collections import Counter
  words = [
      'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
      'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
      'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
      'my', 'eyes', "you're", 'under'
  ]
  word_counts = Counter(words)
  top_three = word_counts.most_common(3)
  print(top_three)
  # Outputs [('eyes', 8), ('the', 5), ('look', 4)]
Again, before I would write most_common manually. Not necessary, this is all done already somewhere in the stdlib :)
2. sorted() accepts a key arg which you can use to sort on something else
Here, for example, we sort on surname:
  >>> names = ['David Beazley', 'Brian Jones', 'Raymond Hettinger', 'Ned Batchelder']
  >>> sorted(names, key=lambda name: name.split()[-1].lower())
  ['Ned Batchelder', 'David Beazley', 'Raymond Hettinger', 'Brian Jones']
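The same key argument also works for records. This sketch sorts a list of dicts on two fields with operator.itemgetter; the people list is made up for illustration (Anna Jones is a hypothetical entry added to show the tie-break).

```python
from operator import itemgetter

# Hypothetical records; 'Anna Jones' is invented to show the tie-break on first name.
people = [
    {'first': 'Brian', 'last': 'Jones'},
    {'first': 'David', 'last': 'Beazley'},
    {'first': 'Ned', 'last': 'Batchelder'},
    {'first': 'Anna', 'last': 'Jones'},
]

# itemgetter('last', 'first') returns a (last, first) tuple per record,
# so rows are ordered by surname first and first name second.
by_name = sorted(people, key=itemgetter('last', 'first'))
first_names = [p['first'] for p in by_name]
print(first_names)  # ['Ned', 'David', 'Anna', 'Brian']
```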
3. Create XML from dict
Creating XML tags manually is usually a bad idea. I bookmarked this simple dict_to_xml helper:
  from xml.etree.ElementTree import Element

  def dict_to_xml(tag, d):
      '''
      Turn a simple dict of key/value pairs into XML
      '''
      elem = Element(tag)
      for key, val in d.items():
          child = Element(key)
          child.text = str(val)
          elem.append(child)
      return elem
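A quick usage sketch for the helper, assuming you want the result serialized: ElementTree's tostring turns the element into bytes and escapes special characters for you, which is the main reason not to build tags by hand. The stock dict is made up for illustration.

```python
from xml.etree.ElementTree import Element, tostring

def dict_to_xml(tag, d):
    """Turn a simple dict of key/value pairs into an XML element."""
    elem = Element(tag)
    for key, val in d.items():
        child = Element(key)
        child.text = str(val)
        elem.append(child)
    return elem

# Made-up data; the 'note' value contains a character that must be escaped.
stock = {'name': 'GOOG', 'shares': 100, 'note': 'a < b'}
xml_bytes = tostring(dict_to_xml('stock', stock))
print(xml_bytes.decode('ascii'))
```

Note that the '<' in the note ends up as `&lt;` in the output, something string concatenation would not have handled.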
4. One-liner to see if there are any Python files in a particular directory
Sometimes ‘any’ is pretty useful:
  import os
  files = os.listdir('dirname')
  if any(name.endswith('.py') for name in files):
      print('Found Python files')
5. Use set operations to match common items in lists
  >>> a = [1, 2, 3, 'a']
  >>> b = ['a', 'b', 'c', 3, 4, 5]
  >>> set(a).intersection(b)
  {3, 'a'}
6. Use re.compile
If you are going to check a regular expression in a loop, don’t do this:
  import re
  for i in longlist:
      if re.match(r'^...', i):
          ...
Instead, define the regex once and use the compiled pattern:
  p = re.compile(r'^...')
  for i in longlist:
      if p.match(i):
          ...
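A runnable sketch of the compile-once pattern. The input lines and the date regex are made up for illustration; the original snippet leaves the actual pattern out.

```python
import re

# Hypothetical input and pattern, chosen only to make the loop runnable.
lines = ['2017-10-09 start', 'no date here', '2017-10-10 stop']
date_at_start = re.compile(r'^(\d{4})-(\d{2})-(\d{2})')

matches = []
for line in lines:
    m = date_at_start.match(line)  # reuse the compiled pattern on every iteration
    if m:
        matches.append(m.groups())

print(matches)  # [('2017', '10', '09'), ('2017', '10', '10')]
```

Giving the compiled pattern a descriptive name also documents what the regex is for, which I find at least as valuable as the speed-up.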
7. Printing files with potential bad (Unicode) characters
The book suggests this convention for printing filenames of unknown origin without errors:
  def bad_filename(filename):
      return repr(filename)[1:-1]

  try:
      print(filename)
  except UnicodeEncodeError:
      print(bad_filename(filename))
Handling Unicode characters in files can be nasty because they can blow up your script. However, the logic behind it is not that hard to grasp. A good snippet to bookmark is this normalize / encode / decode round trip, which strips accents:
  >>> import unicodedata
  >>> a = 'pýtĥöñ is awesome\n'
  >>> b = unicodedata.normalize('NFD', a)
  >>> b.encode('ascii', 'ignore').decode('ascii')
  'python is awesome\n'
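Wrapped in a function, the accent-stripping trick looks roughly like this. The name to_ascii is my own; the second call shows that the approach is lossy for characters that have no decomposition.

```python
import unicodedata

def to_ascii(text):
    """Decompose accented characters (NFD), then drop everything non-ASCII."""
    decomposed = unicodedata.normalize('NFD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

clean = to_ascii('pýtĥöñ is awesome')
lossy = to_ascii('Straße')  # 'ß' has no decomposition, so it is dropped entirely
print(clean)  # python is awesome
print(lossy)  # Strae
```

So this is fine for building search keys or slugs, but not something to run over text you still need to display.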
O’Reilly has a course on Working with Unicode in Python.
8. Print is pretty cool (Python 3)
I am probably not the only one writing this kind of join operation:
  >>> row = ["1", "bob", "developer", "python"]
  >>> print(','.join(str(x) for x in row))
  1,bob,developer,python
Turns out you can just write it like this:
  >>> print(*row, sep=',')
  1,bob,developer,python
Note again the * unpacking.
9. Functions like sum() accept generators / use the right variable type
I wrote this at a conference to earn me a coffee mug ;)
  total = 0  # don't call it sum, that shadows the builtin
  for i in range(1300):
      if i % 3 == 0 or i % 5 == 0:
          total += i
  print(total)
This prints 394118. While handing it in I realized it could be written much shorter and more efficiently:
  >>> sum(i for i in range(1300) if i % 3 == 0 or i % 5 == 0)
  394118
A generator:
  lines = (line.strip() for line in f)
is more memory efficient than:
  lines = [line.strip() for line in f]  # loads whole list into memory at once
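A rough way to see the difference, as a sketch: sys.getsizeof only measures the container itself (not the items), but that is enough to show the list growing with the data while the generator stays tiny. The variable names and the size N are arbitrary.

```python
import sys

N = 100000
squares_list = [n * n for n in range(N)]  # all values exist in memory now
squares_gen = (n * n for n in range(N))   # values are produced on demand

list_is_bigger = sys.getsizeof(squares_list) > sys.getsizeof(squares_gen)
first_pass = sum(squares_gen)
second_pass = sum(squares_gen)  # the generator is exhausted, so this adds nothing

print(list_is_bigger, first_pass == sum(squares_list), second_pass)
```

The trade-off is visible in the last value: a generator can be consumed only once, so use a list when you need to loop over the data more than once.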
And concatenating strings repeatedly is inefficient:
  s = "line1\n"
  s += "line2\n"
  s += "line3\n"
  print(s)
Better to build up a list and join when printing:
  lines = []
  lines.append("line1")
  lines.append("line2")
  lines.append("line3")
  print("\n".join(lines))
Another one I liked from the cookbook:
  portfolio = [
      {'name': 'GOOG', 'shares': 50},
      {'name': 'YHOO', 'shares': 75},
      {'name': 'AOL', 'shares': 20},
      {'name': 'SCOX', 'shares': 65}
  ]
  min_shares = min(s['shares'] for s in portfolio)
One line to get the minimum of a numeric value in a nested data structure.
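If you want the whole record rather than just the number, min also accepts a key argument, same as sorted. A small sketch using the portfolio from above:

```python
portfolio = [
    {'name': 'GOOG', 'shares': 50},
    {'name': 'YHOO', 'shares': 75},
    {'name': 'AOL', 'shares': 20},
    {'name': 'SCOX', 'shares': 65},
]

# Generator expression: yields just the smallest number.
min_shares = min(s['shares'] for s in portfolio)

# key argument: yields the record that holds the smallest number.
smallest = min(portfolio, key=lambda s: s['shares'])

print(min_shares, smallest['name'])  # 20 AOL
```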
10. Enumerate lines in for loop
You can number lines (or whatever you are looping over) and start counting at 1 (the second argument). This is a nice debugging technique:
  for lineno, line in enumerate(lines, 1):  # start counting at 1
      fields = line.split()
      try:
          count = int(fields[1])
          ...
      except ValueError as e:
          print('Line {}: Parse error: {}'.format(lineno, e))
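Here is a self-contained version of that loop you can run as-is. The sample lines and the sum_counts function are made up for illustration; errors are collected in a list instead of printed so they are easy to inspect.

```python
# Made-up input: the second line has a count that is not a number.
lines = ['apples 3', 'pears two', 'plums 5']

def sum_counts(lines):
    """Add up the second field of each line, recording 1-based line numbers of bad rows."""
    total = 0
    errors = []
    for lineno, line in enumerate(lines, 1):  # second argument: start counting at 1
        fields = line.split()
        try:
            total += int(fields[1])
        except ValueError as e:
            errors.append('Line {}: Parse error: {}'.format(lineno, e))
    return total, errors

total, errors = sum_counts(lines)
print(total)
print(errors[0])
```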
11. Pandas
Import pandas and numpy:
  import pandas as pd
  import numpy as np
12. Make random dataframe with three columns:
  df = pd.DataFrame(np.random.rand(10, 3), columns=list('ABC'))
Select:
  # Boolean indexing (remember the parentheses)
  df[(df.A < 0.5) & (df.B > 0.5)]
  # Alternative, using query, which uses numexpr when it is installed
  df.query('A < 0.5 & B > 0.5')
Project:
  # One column
  df.A
  # Multiple columns
  df.loc[:, list('AB')]
  # Shorter equivalent
  df[['A', 'B']]
  • Often used snippets
  • Dates
13. Difference (in days) between two dates:
  from datetime import date
  d1 = date(2013, 1, 1)
  d2 = date(2013, 9, 13)
  abs(d2 - d1).days
directory-of-script snippet
  import os
  os.path.dirname(os.path.realpath(__file__))
  # combine with
  os.path.join(os.path.dirname(os.path.realpath(__file__)), 'foo', 'bar', 'baz.txt')
14. PostgreSQL-connect-query snippet
  import psycopg2
  conn = psycopg2.connect("host='localhost' user='xxx' password='yyy' dbname='zzz'")
  cur = conn.cursor()
  cur.execute("""SELECT * from foo;""")
  rows = cur.fetchall()
  for row in rows:
      print(" ", row[0])
  conn.close()
Input parsing functions
15. Expand input-file args:
  import glob
  # input_data: a list of path expressions, e.g. ['file.txt'], ['*.txt'] or ['foo/file.txt', 'bar/file.txt']
  filenames = [glob.glob(pathexpr) for pathexpr in input_data]
  filenames = [item for sublist in filenames for item in sublist]  # flatten the list of lists
15. Parse key-value pair strings like ‘x=42.0,y=1’:
  kvp = lambda elem, t, i: t(elem.split('=')[i])
  parse_kvp_str = lambda args: dict([(kvp(elem, str, 0), kvp(elem, float, 1)) for elem in args.split(',')])
  parse_kvp_str('x=42.0,y=1')
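The two lambdas are compact but hard to read back later. Here is a sketch of the same parser as a plain function; the whitespace stripping is my own addition and not part of the original snippet.

```python
def parse_kvp_str(args):
    """Parse 'x=42.0,y=1' into {'x': 42.0, 'y': 1.0}."""
    result = {}
    for elem in args.split(','):
        # partition splits on the first '=' only, so values may contain '='-free floats
        key, _, value = elem.partition('=')
        result[key.strip()] = float(value)  # strip() is an extra, not in the lambda version
    return result

parsed = parse_kvp_str('x=42.0,y=1')
print(parsed)  # {'x': 42.0, 'y': 1.0}
```

As with the lambda version, a value that is not a number raises ValueError, which is probably what you want for command-line input.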
Postgres database functions
16. Upper case in Python (just for example):
  -- create extension plpythonu;
  CREATE OR REPLACE FUNCTION python_upper
  (
    input text
  ) RETURNS text AS
  $$
  return input.upper()
  $$ LANGUAGE plpythonu STRICT;
17. Convert IP address from text to integer:
  CREATE FUNCTION ip2int(input text) RETURNS integer
  LANGUAGE plpythonu
  AS $$
  if 'struct' in SD:
      struct = SD['struct']
  else:
      import struct
      SD['struct'] = struct
  if 'socket' in SD:
      socket = SD['socket']
  else:
      import socket
      SD['socket'] = socket
  return struct.unpack("!I", socket.inet_aton(input))[0]
  $$;
Convert IP address from integer to text:
  CREATE FUNCTION int2ip(input integer) RETURNS text
  LANGUAGE plpythonu
  AS $$
  if 'struct' in SD:
      struct = SD['struct']
  else:
      import struct
      SD['struct'] = struct
  if 'socket' in SD:
      socket = SD['socket']
  else:
      import socket
      SD['socket'] = socket
  return socket.inet_ntoa(struct.pack("!I", input))
  $$;
18. Commandline options
optparse-commandline-options snippet
  from optparse import OptionParser
  usage = "usage: %prog [options] arg "
  parser = OptionParser(usage=usage)
  parser.add_option("-x", "--some-option-x", dest="x", default=42.0, type="float",
                    help="a floating point option")
  (options, args) = parser.parse_args()
  print(options.x)
  print(args[0])
19. print-in-place (progress bar) snippet
  import time
  import sys
  for progress in range(100):
      time.sleep(0.1)
      sys.stdout.write("Download progress: %d%% \r" % (progress))
      sys.stdout.flush()
Packaging snippets
20. poor-mans-python-executable trick
Learned this trick from voidspace. The trick uses two files (__main__.py and hashbang.txt):
__main__.py:
  print 'Hello world'
hashbang.txt (adding a newline after python2.6 is important):
  #!/usr/bin/env python2.6
Build an “executable”:
  zip main.zip __main__.py
  cat hashbang.txt main.zip > hello
  rm main.zip
  chmod u+x hello
Run “executable”:
  $ ./hello
  Hello world
21. import-class-from-file trick
Import class MyClass from a module file (adapted from stackoverflow):
  import imp
  mod = imp.load_source('name.of.module', 'path/to/module.py')
  obj = mod.MyClass()
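The imp module has been deprecated since Python 3.4 and was removed in Python 3.12, so on current Python the trick needs importlib instead. Below is a sketch of an equivalent; the load_source wrapper, the module name and the throwaway module file are all made up for the demo.

```python
import importlib.util
import os
import tempfile

def load_source(module_name, path):
    """Load a module from an explicit file path (rough stand-in for imp.load_source)."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # actually runs the file's code
    return module

# Demo: write a throwaway module file, then import a class from it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'module.py')
    with open(path, 'w') as f:
        f.write("class MyClass:\n    greeting = 'hello'\n")
    mod = load_source('throwaway_module', path)
    obj = mod.MyClass()

print(obj.greeting)  # hello
```

Unlike a regular import, this does not register the module in sys.modules; add it there yourself if other code needs to import it by name.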
22. Occasional-usage snippets
Extract words from string
  words = lambda text: ''.join(c if c.isalnum() else ' ' for c in text).split()
  words('Johnny.Appleseed!is:a*good&farmer')
  # ['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']
23. IP address to integer and back
  import struct
  import socket

  def ip2int(addr):
      return struct.unpack("!I", socket.inet_aton(addr))[0]

  def int2ip(addr):
      return socket.inet_ntoa(struct.pack("!I", addr))
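On Python 3 the standard library's ipaddress module can do the same conversion without struct and socket. This is a sketch of that alternative, not what the snippet above uses:

```python
import ipaddress

def ip2int(addr):
    """Dotted-quad string to integer."""
    return int(ipaddress.IPv4Address(addr))

def int2ip(number):
    """Integer back to dotted-quad string."""
    return str(ipaddress.IPv4Address(number))

as_int = ip2int('192.168.0.1')
round_trip = int2ip(as_int)
print(as_int, round_trip)  # 3232235521 192.168.0.1
```

A nice side effect is validation: ipaddress.IPv4Address raises a ValueError subclass for malformed input such as '300.1.1.1'.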
24. Fluent Python Interface
Copied from riaanvddool.
  # Fluent Interface Definition
  class sql:
      class select:
          def __init__(self, dbcolumn, context=None):
              self.dbcolumn = dbcolumn
              self.context = context
          def select(self, dbcolumn):
              return self.__class__(dbcolumn, self)
  # Demo
  q = sql.select('foo').select('bar')
  print(q.dbcolumn)  # bar
  print(q.context.dbcolumn)  # foo
Flatten a nested list
  def flatten(elems):
      """
      [['a'], ['b','c',['d'],'e',['f','g']]]
      """
      stack = [elems]
      top = stack.pop()
      while top:
          head, tail = top[0], top[1:]
          if tail: stack.append(tail)
          if not isinstance(head, list): yield head
          else: stack.append(head)
          if stack: top = stack.pop()
          else: break
snap rounding
  import math
  EPSILON = 0.000001
  snap_ceil = lambda x: math.ceil(x) if abs(x - round(x)) > EPSILON else round(x)
  snap_floor = lambda x: math.floor(x) if abs(x - round(x)) > EPSILON else round(x)
merge-two-dictionaries snippet
  x = {'a': 42}
  y = {'b': 127}
  z = {**x, **y}  # Python 3.5+; on Python 2 use dict(x.items() + y.items())
  # z = {'a': 42, 'b': 127}
25. anonymous-object snippet
Adapted from stackoverflow:
  class Anon(object):
      def __new__(cls, **attrs):
          result = object.__new__(cls)
          result.__dict__ = attrs
          return result
26. Alternative:
  class Anon(object):
      def __init__(self, **kwargs):
          self.__dict__.update(kwargs)
      def __repr__(self):
          return self.__str__()
      def __str__(self):
          return ", ".join(["%s=%s" % (key, value) for key, value in self.__dict__.items()])
27. generate-random-word snippet
Function that returns a random word (could also use random.choice with this list of words):
  import string, random
  # string.letters is Python 2 only; string.ascii_letters works on both 2 and 3
  randword = lambda n: "".join([random.choice(string.ascii_letters) for i in range(n)])
setdefault tricks
Increment (and initialize) value:
  d = {}
  d[2] = d.setdefault(2, 39) + 1
  d[2] = d.setdefault(2, 39) + 1
  d[2] = d.setdefault(2, 39) + 1
  d[2]  # value is 42
29. Append value to (possibly uninitialized) list stored under a key in dictionary:
  d = {}
  d.setdefault(2, []).append(42)
  d.setdefault(2, []).append(127)
  d[2]  # value is [42, 127]
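Both setdefault tricks have a counterpart in the collections module from tip 1. Here is a sketch of the same two jobs with Counter and defaultdict; the word list is made up for illustration.

```python
from collections import Counter, defaultdict

# Counting: Counter treats missing keys as 0, so no initialization is needed.
counts = Counter()
for word in ['a', 'b', 'a']:
    counts[word] += 1

# Appending: defaultdict(list) creates the empty list on first access.
groups = defaultdict(list)
groups[2].append(42)
groups[2].append(127)

print(counts['a'], groups[2])  # 2 [42, 127]
```

setdefault still earns its place when you are handed a plain dict you cannot replace, or when the starting value is something other than zero or an empty container, like the 39 above.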
Binary tricks
30. swap-integers-using-XOR snippet
Swap two integer variables using the XOR swap algorithm:
  x = 42
  y = 127
  x = x ^ y
  y = y ^ x
  x = x ^ y
  x  # value is 127
  y  # value is 42
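For comparison, here is the XOR swap next to Python's tuple-unpacking swap. The two function names are made up; the point of the sketch is that both give the same result for integers, while only the tuple form works for arbitrary objects.

```python
def xor_swap(x, y):
    """Swap two integers with three XORs and no temporary variable."""
    x ^= y
    y ^= x
    x ^= y
    return x, y

def tuple_swap(x, y):
    """Swap any two values with tuple unpacking."""
    x, y = y, x
    return x, y

print(xor_swap(42, 127))         # (127, 42)
print(tuple_swap('left', 'right'))  # ('right', 'left')
```

In everyday Python the tuple form is what you want; the XOR version is mostly a fun bit-twiddling curiosity.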
