Writing Extensions

The pandex package allows any function that operates on a DataFrame to be installed as an extension. Functions may be stored in individual .py files, in local directories or modules, or on GitHub.

The only requirement is that the function accept a pandas DataFrame as the first argument.

Example extension

The following function is a valid pandex extension:

from math import pi

def circle_calculations(df, radius='radius'):
    """
    Calculates the circumference and area of circles from
    a column of radius values and adds the results to the
    dataframe as new columns
    Input:
        df -- dataframe
        radius -- column name containing the radius values
    """
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = pi * df[radius] ** 2
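Because an extension is an ordinary Python function, it can be tried out directly with pandas before it is installed. The sketch below repeats the function (with a shortened docstring) so that it runs on its own, and calls it on a small DataFrame. Note that the function adds columns to the DataFrame in place and returns None.

```python
from math import pi

import pandas as pd


def circle_calculations(df, radius='radius'):
    """Add circumference and area columns computed from a radius column."""
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = pi * df[radius] ** 2


df = pd.DataFrame({'radius': [1.0, 2.0]})
result = circle_calculations(df)  # modifies df in place

print(result)                # None
print(df.columns.tolist())   # ['radius', 'circumference', 'area']
```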

Note

The first argument to the function need not be named df.
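For instance, the following hypothetical function, scale_column, names its DataFrame parameter frame. It is only an illustration of the rule above, not something shipped with pandex.

```python
import pandas as pd


def scale_column(frame, column, factor=2):
    """Multiply `column` by `factor`, modifying `frame` in place.

    Hypothetical extension: the DataFrame parameter is named `frame`.
    """
    frame[column] = frame[column] * factor


data = pd.DataFrame({'x': [1, 2, 3]})
scale_column(data, 'x', factor=10)
print(data['x'].tolist())  # [10, 20, 30]
```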

If this function is stored in a file named circle_calc.py, the extension may be installed as follows:

pd.ext.import_extension('/path/to/circle_calc_file -> circle_calculations')

Note

circle_calc.py does not have to sit directly in /path/to/circle_calc_file; as long as it is somewhere in a subdirectory of that path, it will be installed.
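The note above suggests that pandex searches the given directory tree for the file that defines the requested function. The following is a rough sketch of that idea using only the standard library (pathlib and ast). It is an assumption about the behaviour, not pandex's actual implementation; the helper find_function_file and the directory names are hypothetical.

```python
import ast
import tempfile
from pathlib import Path


def find_function_file(root, func_name):
    """Return the first .py file under root (searched recursively) that
    defines a top-level function named func_name, or None if none does.

    Illustrative sketch only; this is not pandex's implementation.
    """
    for path in sorted(Path(root).rglob('*.py')):
        tree = ast.parse(path.read_text(encoding='utf-8'), filename=str(path))
        for node in tree.body:
            if isinstance(node, ast.FunctionDef) and node.name == func_name:
                return path
    return None


# Build a throwaway tree with circle_calc.py two levels below the root.
with tempfile.TemporaryDirectory() as root:
    subdir = Path(root) / 'geometry' / 'circles'  # hypothetical names
    subdir.mkdir(parents=True)
    (subdir / 'circle_calc.py').write_text(
        'def circle_calculations(df, radius="radius"):\n    pass\n',
        encoding='utf-8',
    )
    found = find_function_file(root, 'circle_calculations')
    relative = found.relative_to(root).as_posix()

print(relative)  # geometry/circles/circle_calc.py
```

Parsing with ast rather than importing each file means no extension code is executed while searching, which is one reasonable design choice for a discovery step like this.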

Extensions may use any installed package. Dependencies that are not installed will be highlighted with a warning during import, and also in the output of pd.ext.show_extensions().
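If you want to check an extension's requirements yourself before importing it, the standard library's importlib.util.find_spec() returns None for a top-level module that cannot be found. The helper missing_dependencies below is a hypothetical convenience, not part of pandex, which already reports missing dependencies on its own.

```python
import importlib.util


def missing_dependencies(module_names):
    """Return the top-level module names that cannot be found.

    Hypothetical helper, not part of pandex. Pass top-level names only:
    find_spec() raises ModuleNotFoundError for a dotted name whose parent
    package is missing.
    """
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


print(missing_dependencies(['math', 'json', 'no_such_package_xyz']))
# ['no_such_package_xyz']
```

Note that these are import names, which can differ from the names used to install a package (scikit-learn, for example, is imported as sklearn).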

Extension usage

If the user has created a dataframe named df, then the extension can be accessed as follows:

df.ext.circle_calculations()

The function's docstring is also available from the dataframe df:

help(df.ext.circle_calculations)
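How pandex implements df.ext is not described here. As a rough illustration only, pandas' public pd.api.extensions.register_dataframe_accessor API can produce a similar effect: the accessor below passes the DataFrame as the first argument and uses functools.wraps so that help() shows the original docstring. The accessor name demo_ext and the EXTENSIONS registry are hypothetical, chosen so as not to clash with pandex's own ext accessor.

```python
import functools
from math import pi

import pandas as pd


def circle_calculations(df, radius='radius'):
    """Add circumference and area columns computed from a radius column."""
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = pi * df[radius] ** 2


# Hypothetical registry of functions exposed through the accessor.
EXTENSIONS = {'circle_calculations': circle_calculations}


@pd.api.extensions.register_dataframe_accessor('demo_ext')
class DemoExtAccessor:
    """Toy accessor: df.demo_ext.<name>(...) calls <name>(df, ...)."""

    def __init__(self, pandas_obj):
        self._df = pandas_obj

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. extension names.
        try:
            func = EXTENSIONS[name]
        except KeyError:
            raise AttributeError(name) from None

        # Bind the DataFrame as the first argument; functools.wraps copies
        # __name__ and __doc__ so help() shows the extension's docstring.
        @functools.wraps(func)
        def bound(*args, **kwargs):
            return func(self._df, *args, **kwargs)

        return bound


df = pd.DataFrame({'radius': [1.0, 2.0]})
df.demo_ext.circle_calculations()
# help(df.demo_ext.circle_calculations) displays this same docstring.
print(df.demo_ext.circle_calculations.__doc__)
```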

Extension dependencies

If an extension requires supporting functions, they can be placed in the same file. For example, the extension above may be rewritten to use a helper area() function:

from math import pi

def circle_calculations(df, radius='radius'):
    """
    Calculates the circumference and area of circles from
    a column of radius values and adds the results to the
    dataframe as new columns
    Input:
        df -- dataframe
        radius -- column name containing the radius values
    """
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = area(df[radius])

def area(radius_col):
    return pi * radius_col ** 2

The installation and usage remain as before.
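A side benefit of moving the formula into area() is that the helper can be checked on its own. The self-contained sketch below (docstring omitted) confirms that the helper-based version produces the same area column as the inline formula from the first example.

```python
from math import pi

import pandas as pd


def area(radius_col):
    return pi * radius_col ** 2


def circle_calculations(df, radius='radius'):
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = area(df[radius])


df = pd.DataFrame({'radius': [0.5, 1.0, 3.0]})
circle_calculations(df)

# The helper-based column matches the inline formula exactly.
pd.testing.assert_series_equal(
    df['area'], pi * df['radius'] ** 2, check_names=False
)
```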

Dependencies in an external file

If the extension depends on functions defined in another file, these must be imported using relative (dot) imports, as you would within a package.

So, if the area() function were defined in a file named area.py in the same directory as circle_calc.py, the extension would be written as follows:

from math import pi
from .area import area

def circle_calculations(df, radius='radius'):
    """
    Calculates the circumference and area of circles from
    a column of radius values and adds the results to the
    dataframe as new columns
    Input:
        df -- dataframe
        radius -- column name containing the radius values
    """
    df['circumference'] = 2 * pi * df[radius]
    df['area'] = area(df[radius])

The pd.ext.import_extension call above remains the same; all the required files will be installed.
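Relative imports such as from .area import area only resolve when a module is imported as part of a package. Presumably pandex loads the files in a way that provides that package context; this is an assumption, not something documented here. The sketch below reproduces the two-file layout in a temporary directory, using a hypothetical package name circles_pkg with an empty __init__.py, and imports circle_calc through the package so the relative import succeeds.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

import pandas as pd

# Source for the two extension files plus an (empty) __init__.py.
FILES = {
    '__init__.py': '',
    'area.py': textwrap.dedent('''
        from math import pi

        def area(radius_col):
            return pi * radius_col ** 2
    '''),
    'circle_calc.py': textwrap.dedent('''
        from math import pi
        from .area import area

        def circle_calculations(df, radius="radius"):
            df["circumference"] = 2 * pi * df[radius]
            df["area"] = area(df[radius])
    '''),
}

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / 'circles_pkg'  # hypothetical package name
    pkg.mkdir()
    for name, source in FILES.items():
        (pkg / name).write_text(source, encoding='utf-8')

    sys.path.insert(0, root)
    importlib.invalidate_caches()  # the files were created after start-up
    try:
        # Importing via the package gives circle_calc a parent package,
        # which is what lets "from .area import area" resolve.
        circle_calc = importlib.import_module('circles_pkg.circle_calc')
    finally:
        sys.path.remove(root)

df = pd.DataFrame({'radius': [1.0, 2.0]})
circle_calc.circle_calculations(df)
print(list(df.columns))  # ['radius', 'circumference', 'area']
```

Running circle_calc.py directly as a script (python circle_calc.py) would instead fail with ImportError: attempted relative import with no known parent package.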