Functional Programming
With toolz
I'm assuming you are well aware of the builtin functools module. You may also have come across toolz.
Until recently, I used it almost exclusively for get_in and groupby. I recently rediscovered it for functional programming. Here are some examples of how it can be useful.
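As a quick refresher on the two helpers mentioned above, here is a minimal sketch of get_in and groupby. The record and words values are made-up sample data, not something from a real dataset.

```python
from toolz import get_in, groupby

# Hypothetical nested record, for illustration only
record = {"country": {"name": "Andorra", "capital": {"name": "Andorra la Vella"}}}

# get_in walks a path of keys and returns a default instead of raising
capital = get_in(["country", "capital", "name"], record)
missing = get_in(["country", "area", "km2"], record, default=None)

# groupby builds a dict mapping each key to the list of items that produced it
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = groupby(lambda w: w[0], words)

print(capital)
print(missing)
print(by_letter)
```

Both take the data last, which is what makes them pleasant to curry and compose later on.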
Created: 2021-04-30
Toolz
Installation
pip install toolz
There is also cytoolz, which is toolz written in Cython. When I first picked up toolz, around Python 3.4, cytoolz provided significant performance improvements. Recently (Python 3.8) I've gotten nearly even performance.
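If you want to check the performance difference on your own machine, one option is to import cytoolz when it is available and fall back to toolz otherwise, since the two expose the same API. This is a rough sketch; the tz alias, the count_by_mod helper, and the workload are my own choices for illustration, and the timing will vary by machine.

```python
import timeit

try:
    import cytoolz as tz  # Cython build, same function names as toolz
except ImportError:
    import toolz as tz  # pure-Python fallback

data = list(range(10_000))


def count_by_mod():
    # Group the numbers by their last digit
    return tz.groupby(lambda x: x % 10, data)


elapsed = timeit.timeit(count_by_mod, number=5)
print(f"{tz.__name__}: {elapsed:.4f}s for 5 runs")
```

Run it once with cytoolz installed and once without to compare the two on your Python version.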
Data Pipeline
For this example, I'll use the task of extracting data from HTML, a.k.a. web scraping. I'm using Scrape This Site as an example.
As you can see, we have 250 countries that we'd like to extract. If we think about our general steps it should look something like:
- Select a Country Element
  - Select Country Name Element
    - Get Country Name Text
  - Select Country Info Element
    - Select Capital Element
      - Get Capital Text
    - Select Population Element
      - etc...
These are simple operations: selecting or getting. Here's how we could define them with toolz and BeautifulSoup:
from toolz import curry, excepts, compose_left


@curry
def select(element, sel, method):
    # 'one' returns the first match (or None); anything else returns all matches
    if method == 'one':
        return element.select_one(sel)
    return element.select(sel)


@curry
def get_text(element):
    return element.text.strip()


@curry
def cast_to(x, to_type):
    return to_type(x)


@curry
def for_each(coll, func):
    return [func(c) for c in coll]


# Bind keyword arguments now; the element is passed positionally later.
select_all = select(method='all')
select_one = select(method='one')
# Built from select_all so that the method is bound as well as the selector
select_countries = select_all(sel="div.country")

to_int = cast_to(to_type=int)
to_float = cast_to(to_type=float)

get_country_name = compose_left(select_one(sel="h3.country-name"), get_text)
get_country_info = select_one(sel="div.country-info")
get_country_capital = compose_left(get_country_info, select_one(sel="span.country-capital"), get_text)
get_country_pop = compose_left(get_country_info, select_one(sel="span.country-population"), get_text, to_int)
get_country_area = compose_left(get_country_info, select_one(sel="span.country-area"), get_text, to_float)
I love how succinct it is. We can pipe an input through several functions in 1-2 lines.
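To see the pipeline run end to end, here is a small self-contained sketch that applies two of the composed getters to an inline HTML snippet. The SAMPLE_HTML markup is a hypothetical stand-in modeled on the selectors used above, not a copy of the real page, and I use the builtin int in place of to_int to keep it short.

```python
from bs4 import BeautifulSoup
from toolz import curry, compose_left

# Hypothetical sample markup shaped to match the selectors in this post
SAMPLE_HTML = """
<div class="country">
  <h3 class="country-name"> Andorra </h3>
  <div class="country-info">
    <span class="country-capital">Andorra la Vella</span>
    <span class="country-population">84000</span>
  </div>
</div>
"""


@curry
def select(element, sel, method):
    if method == "one":
        return element.select_one(sel)
    return element.select(sel)


def get_text(element):
    return element.text.strip()


select_one = select(method="one")
select_all = select(method="all")

get_country_name = compose_left(select_one(sel="h3.country-name"), get_text)
get_country_pop = compose_left(
    select_one(sel="div.country-info"),
    select_one(sel="span.country-population"),
    get_text,
    int,
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
countries = [
    {"name": get_country_name(c), "population": get_country_pop(c)}
    for c in select_all(soup, sel="div.country")
]
print(countries)
```

Each getter is just a function of one element, so adding a field means adding one more composed line.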
So how do we handle exceptions? What if an element does not exist?
# Apply excepts wherever it makes sense.
@curry
def in_case(ex, func, handler=lambda _: None):
    return excepts(ex, func, handler)


get_country_name = in_case(AttributeError, compose_left(select_one(sel="h3.country-name"), get_text))
get_country_info = select_one(sel="div.country-info")
get_country_capital = in_case(AttributeError,
                              compose_left(get_country_info, select_one(sel="span.country-capital"), get_text))
get_country_pop = in_case((AttributeError, ValueError),
                          compose_left(get_country_info, select_one(sel="span.country-population"), get_text, to_int))
# Multiple exception types must be passed as a single tuple.
get_country_area = in_case((AttributeError, ValueError),
                           compose_left(get_country_info, select_one(sel="span.country-area"), get_text, to_float))
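To make the behavior of excepts concrete without any HTML involved, here is a minimal sketch. The parse_population and strict_int names are hypothetical helpers for illustration. The first argument to excepts is either one exception class or a tuple of classes, and anything outside that set still propagates.

```python
from toolz import compose_left, excepts

# Catches both failure modes: input without .strip() and text that is not a number
parse_population = excepts(
    (AttributeError, ValueError),
    compose_left(lambda text: text.strip(), int),
    lambda _: None,  # the handler receives the exception and supplies the fallback
)

print(parse_population(" 84000 "))  # parsed normally
print(parse_population("unknown"))  # ValueError from int() is handled
print(parse_population(None))       # AttributeError from None.strip() is handled

# Only ValueError is listed here, so a TypeError would not be swallowed
strict_int = excepts(ValueError, int)
```

Listing only the exceptions you expect keeps genuine bugs visible instead of silently turning them into None.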