Writing Extensions
The pandex package allows any function that operates on a dataframe to
be installed as an extension. Functions may be stored in individual .py
files, in local directories or modules, or on GitHub.
The only requirement is that the function accept a pandas DataFrame as
the first argument.
Example extension
The following function is a valid pandex extension:
from math import pi

def circle_calculations(df, radius='radius'):
    """
    Calculates the circumference and area of a circle
    given a column of radius and adds the result to the
    dataframe

    Input:
    df -- dataframe
    radius -- column name containing the radius values
    """
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = pi * df[radius] ** 2
Note
The first argument to the function need not be named df.
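Because an extension is an ordinary Python function, it can be checked directly against a small dataframe before it is installed. The following is a minimal sketch that assumes only that pandas is available; the sample frames and the column name r are made up for illustration.

```python
from math import pi

import pandas as pd


def circle_calculations(df, radius='radius'):
    """Add circumference and area columns computed from a radius column."""
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = pi * df[radius] ** 2


# Illustrative data: call the function directly, without pandex.
frame = pd.DataFrame({'radius': [1.0, 2.0]})
circle_calculations(frame)
print(frame)

# The radius column can have any name if it is passed explicitly.
other = pd.DataFrame({'r': [3.0]})
circle_calculations(other, radius='r')
print(other)
```

The function modifies the dataframe in place and returns nothing, exactly as in the example above.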
If this function is stored in a file named circle_calc.py, then
the extension may be installed as follows:
pd.ext.import_extension('/path/to/circle_calc_file -> circle_calculations')
Note
circle_calc.py does not have to be in /path/to/circle_calc_file itself;
as long as it is in a subdirectory somewhere below that path, it will be installed.
Extensions may make use of any installed package. Dependencies that are not
installed will be highlighted with a warning during import, and also in
the output of pd.ext.show_extensions().
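To find out ahead of time whether the packages an extension needs are present, the standard library can be queried directly. This is only a sketch of one possible check and is not how pandex itself detects missing dependencies; missing_packages is a hypothetical helper name.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    # find_spec returns None when no top-level module of that name is found.
    return [name for name in names if find_spec(name) is None]


# Example: check the packages an extension intends to import.
print(missing_packages(['math', 'pandas']))
```

The helper is meant for top-level package names only; dotted names would cause parent packages to be imported during the lookup.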
Extension usage
If the user has created a dataframe named df, then the extension can be
accessed as follows:
df.ext.circle_calculations()
The docstring defined in the extension is available through the dataframe df:
help(df.ext.circle_calculations)
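How pandex implements this internally is not shown here. As a generic illustration only, the sketch below shows how an accessor object can pass its wrapped object as the first argument while keeping the original docstring, using functools.wraps. ExtAccessor and register are hypothetical names, and a plain dict stands in for the dataframe so that the example has no dependencies.

```python
from functools import wraps
from math import pi


class ExtAccessor:
    """Hypothetical stand-in for an ``ext`` accessor (not pandex's real code)."""

    _registry = {}  # extension name -> function

    def __init__(self, obj):
        self._obj = obj

    @classmethod
    def register(cls, func):
        """Store the function under its own name and return it unchanged."""
        cls._registry[func.__name__] = func
        return func

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            func = self._registry[name]
        except KeyError:
            raise AttributeError(name) from None

        @wraps(func)  # copies __name__ and __doc__ from the extension
        def bound(*args, **kwargs):
            # Supply the wrapped object as the first argument.
            return func(self._obj, *args, **kwargs)

        return bound


@ExtAccessor.register
def circle_calculations(df, radius='radius'):
    """Calculates the circumference and area of a circle."""
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = pi * df[radius] ** 2


data = {'radius': 2.0}  # plain dict standing in for a dataframe
ext = ExtAccessor(data)
ext.circle_calculations()
print(ext.circle_calculations.__doc__)
```

Because the wrapper carries the extension's docstring, calling help() on it displays the text written in the original function.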
Extension dependencies
If an extension requires supporting functions, these can be located in the same
file. So the example above may be rewritten to utilise an area() function:
from math import pi

def circle_calculations(df, radius='radius'):
    """
    Calculates the circumference and area of a circle
    given a column of radius and adds the result to the
    dataframe

    Input:
    df -- dataframe
    radius -- column name containing the radius values
    """
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = area(df[radius])

def area(radius_col):
    return pi * radius_col ** 2
Installation and usage remain as before.
Dependencies in an external file
If the extension depends on functions defined in another file, these must be imported using relative dot notation, as you would within a package.
So, if the area() function were defined in a file named area.py in the same
directory as circle_calc.py, then the extension would be written as follows:
from math import pi
from .area import area

def circle_calculations(df, radius='radius'):
    """
    Calculates the circumference and area of a circle
    given a column of radius and adds the result to the
    dataframe

    Input:
    df -- dataframe
    radius -- column name containing the radius values
    """
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = area(df[radius])
The pd.ext.import_extension call above remains the same; all the
required files will be installed.
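For reference, the companion file area.py in this layout would hold the helper shown in the earlier single-file version. The sketch below assumes the file contains nothing else.

```python
# area.py (lives in the same directory as circle_calc.py)
from math import pi


def area(radius_col):
    """Return the circle area for a radius value or a column of radii."""
    return pi * radius_col ** 2
```

Since the helper uses only arithmetic operators, it works on a pandas column and on a plain number alike, which makes it easy to test on its own.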