How to Install

Installing the package

The package can be installed from PyPI using pip:

pip install pandex

The only dependency is pandas, which will be installed if it is not already present.

The package is used as follows:

import pandex as pd

Once imported, pd references the standard pandas module and can be used in the normal way. The extension functionality resides in the pd.ext namespace.

Installing Extensions

Extensions are ordinary functions that take a DataFrame as their first parameter, followed by any other arguments they need. For example, an extension might look like this:

import numpy as np

def my_extension(df, arg1, arg2):
    # add col2: col1 scaled by arg2
    df['col2'] = df.col1 * arg2
    # add col3: 'foo' where col2 equals arg1, otherwise 'bar'
    df['col3'] = np.where(df.col2 == arg1, 'foo', 'bar')

No return value is needed. Because the extension receives the DataFrame itself, any changes it makes are applied in place when it runs. You can, of course, return a value if required.
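Because an extension is just an ordinary function, it can be tried out with plain pandas before it is installed. The snippet below calls my_extension directly on a small DataFrame without using pandex; treating this direct call as equivalent to calling the extension through the ext namespace is an assumption based on the description above.

```python
import numpy as np
import pandas as pd

def my_extension(df, arg1, arg2):
    # add col2 in place: col1 scaled by arg2
    df['col2'] = df.col1 * arg2
    # add col3 in place: 'foo' where col2 equals arg1, otherwise 'bar'
    df['col3'] = np.where(df.col2 == arg1, 'foo', 'bar')

df = pd.DataFrame({'col1': [3, 4, 5]})
result = my_extension(df, 10, 2)  # nothing is returned, so result is None

# df now has col2 == [6, 8, 10] and col3 == ['bar', 'bar', 'foo']
print(df)
```

The caller's df changes even though the function returns nothing, which is what "modified in place" means here.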

Extensions can be existing functions or ones written specially for this purpose. The function must be located either on the local filesystem or in a publicly accessible GitHub repository.

To install an extension located in a file in the directory bar, pass import_extension a string containing the path and the function name, separated by ->:

pd.ext.import_extension('path/to/bar -> my_extension')

You can also organise extensions into collection namespaces to make them easier to manage. This is done during import:

pd.ext.import_extension('path/to/bar -> collection1.my_extension')

If an extension is located in a GitHub repository, it can be imported as follows:

pd.ext.import_extension('github:username/repo -> my_extension')

Extensions from GitHub can also be organised into collections if required.
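Each import string combines a source (a local path, or a repository prefixed with github:), an optional collection and the function name, with -> separating the source from the rest. The parser below is a hypothetical illustration of how such a string decomposes: parse_extension_spec and ExtensionSpec are not part of pandex, and treating the last dot as the separator between collection and function name is an assumption drawn only from the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtensionSpec:
    source: str                # local path or GitHub "username/repo"
    is_github: bool            # True when the source used the github: prefix
    collection: Optional[str]  # e.g. 'collection1', or None if not in a collection
    name: str                  # name of the extension function


def parse_extension_spec(spec: str) -> ExtensionSpec:
    """Hypothetical parser for strings like 'path/to/bar -> collection1.my_extension'."""
    source, sep, target = spec.partition('->')
    if not sep:
        raise ValueError(f"expected 'source -> name', got {spec!r}")
    source, target = source.strip(), target.strip()
    is_github = source.startswith('github:')
    if is_github:
        source = source[len('github:'):]
    # everything before the last dot is treated as the collection name
    collection, _, name = target.rpartition('.')
    return ExtensionSpec(source, is_github, collection or None, name)


print(parse_extension_spec('path/to/bar -> collection1.my_extension'))
print(parse_extension_spec('github:username/repo -> my_extension'))
```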

Note that import_extension can be called from the Python interpreter or included in a script. If an extension is already installed, no action is taken.

Accessing installed extensions

Once installed, extensions are accessed from the ext namespace attached to the DataFrame:

import pandex as pd
df = pd.DataFrame({'col1': [3,4,5]})

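# call the extension (assumes it was installed with pd.ext.import_extension);
# df is supplied as the first argument, so 10 and 2 become arg1 and arg2
df.ext.my_extension(10, 2)

# if the extension was installed in a collection ...
df.ext.collection1.my_extension(10, 2)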
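One way an ext-style namespace could be built on plain pandas is with pandas' documented accessor mechanism, pandas.api.extensions.register_dataframe_accessor. The sketch below is an assumption, not pandex's implementation: it registers a hypothetical accessor named ext_demo (chosen so it does not clash with pandex's own ext), looks names up in two hypothetical dictionaries, TOP_LEVEL and COLLECTIONS, and binds the DataFrame as the first argument with functools.partial.

```python
from functools import partial
from types import SimpleNamespace

import numpy as np
import pandas as pd


def my_extension(df, arg1, arg2):
    # the example extension from above
    df['col2'] = df.col1 * arg2
    df['col3'] = np.where(df.col2 == arg1, 'foo', 'bar')


# hypothetical registry: top-level extensions, plus named collections of them
TOP_LEVEL = {'my_extension': my_extension}
COLLECTIONS = {'collection1': {'my_extension': my_extension}}


@pd.api.extensions.register_dataframe_accessor('ext_demo')
class ExtAccessor:
    """Expose registered extensions with the DataFrame bound as first argument."""

    def __init__(self, df):
        self._df = df

    def __getattr__(self, name):
        # only called for names that normal attribute lookup does not find
        if name in TOP_LEVEL:
            return partial(TOP_LEVEL[name], self._df)
        if name in COLLECTIONS:
            # bind the DataFrame to every function in the collection
            bound = {fname: partial(func, self._df)
                     for fname, func in COLLECTIONS[name].items()}
            return SimpleNamespace(**bound)
        raise AttributeError(f'no extension or collection named {name!r}')


df = pd.DataFrame({'col1': [3, 4, 5]})
df.ext_demo.my_extension(10, 2)              # same as my_extension(df, 10, 2)
df.ext_demo.collection1.my_extension(10, 2)  # looked up via the collection
```

Binding the DataFrame with partial is what lets the caller pass only the remaining arguments, and returning a SimpleNamespace for a collection gives the dotted collection1.my_extension form.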