dasi.utils.npdf

NumpyDataFrame.

Classes

Null

Not None.

NumpyDataFrame([data, apply])

The NumpyDataFrame is a class halfway between pandas and numpy.

NumpyDataFrameIndexer(df)

The indexer for NumpyDataFrames.

Exceptions

NumpyDataFrameException

Generic exceptions for NumpyDataFrame.

class dasi.utils.npdf.Null[source]

Bases: object

Not None.
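The terse description suggests that Null is a sentinel distinct from None; it appears further down as the default for fill_value in concat. A minimal sketch of that pattern, assuming this is its purpose (the `_Missing` class and `describe` function are hypothetical and not part of dasi):

```python
class _Missing:
    """Hypothetical sentinel type: means "no value given", which is not None."""


def describe(fill_value=_Missing):
    # None stays a legitimate fill value; only the sentinel means "not provided".
    if fill_value is _Missing:
        return "no fill value given"
    return f"fill with {fill_value!r}"


print(describe())
print(describe(None))
```

Using a dedicated sentinel lets callers pass None as a real value without it being mistaken for "argument omitted".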

class dasi.utils.npdf.NumpyDataFrame(data=None, apply=None)[source]

Bases: collections.abc.Mapping

The NumpyDataFrame is a class halfway between pandas and numpy. It has named columns, indexing, slicing, function applications, and mathematical operations. Unlike pandas however, it maintains the multi-dimensionality of underlying data (as np.ndarray), allowing broadcasting and complex indexing.

Usage:

indexing and columns

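All of the underlying arrays can be sliced and indexed using the standard slicing operations. The native np.ndarray indexing is used for each column, so basic slices return views that share memory with the original arrays. Note that NumPy's advanced indexing (for example, with an integer array) returns a copy instead.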

import numpy as np
from dasi.utils.npdf import NumpyDataFrame

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df[0]
df[0, 1]
df[np.array([0, 1])]
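Since each column is indexed with native np.ndarray indexing, the memory behaviour follows NumPy's own rules. The sketch below uses plain NumPy arrays only (it does not import dasi) to show the difference between a basic slice and an integer-array index; that NumpyDataFrame passes these results through unchanged is an assumption.

```python
import numpy as np

a = np.arange(10)

view = a[2:5]                  # basic slice: a view onto a's memory
picked = a[np.array([0, 1])]   # integer-array (advanced) index: a copy

view[0] = 99                   # writes through to a[2]
picked[0] = -1                 # leaves a untouched

print(np.shares_memory(a, view))
print(np.shares_memory(a, picked))
print(a[:3])
```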

Methods

aggregate(func, *args[, cols])

Group all of the np.ndarrays across all columns as a list and apply a function.

append(other)

Append the contents of the other df to this df.

apply(func, *args[, astype, preprocess, inplace])

Apply a function to each np.ndarray.

apply_to_col_names(func, *args[, cols, inplace])

Apply a function to the column names and return a new dataframe.

concat(others[, fill_value])

Concatenate several dfs into a single df.

copy()

Copy the df.

dump(f)

Dump byte repr of df to the specified path.

dumps()

Use msgpack to dump df to a byte string.

fill_value(cols, value)

Create new columns, if they are missing, and fill them with the specified value.

get(k[,d])

group_apply(others, func, *args[, expand, …])

Groups np.arrays according to their column name for several dataframes (as a list) and applies a function to each group.

hstack(others)

Apply np.hstack to each column for several dfs.

items()

Iterate key: arr for the underlying data dict.

keys()

load(f)

Load the byte repr of df from the specified path.

loads(s)

Use msgpack to load a df from a byte string.

merge(others)

Merge many dfs into a single df.

prefix(s[, cols, inplace])

Adds a prefix to all of the column names and returns a new dataframe.

reshape(shape)

Reshape all arrays in the df.

stack(others, axis)

Apply np.stack to each column for several dfs.

suffix(s[, cols, inplace])

Adds a suffix to all of the column names and returns a new dataframe.

to_df([force])

Force the NumpyDataFrame into a pandas.DataFrame.

update(data[, apply])

Update the df from a dict or another df.

validate()

Validate that the shapes of all of the np.ndarrays are the same.

values()

vstack(others)

Apply np.vstack to each column for several dfs.

Attributes

col

Return the column indexer.

columns

Return the column names.

data

The underlying data dict of the dataframe.

shape

Return the expected shape for the underlying np.ndarray.

Columns can be selected and returned:

df.col['A', 'B']    # return df with 'A' and 'B'
df.col['A']         # return df with only 'A'
print(list(df.col)) # return the column names
print(df.columns)   # also returns the column names

New columns can be added:

df.col['A'] = np.arange(10)

Columns can be deleted:

del df.col['B']

Add prefix or suffix to column names:

df.prefix('prefix_', cols=['A'])  # add prefix only to 'A', return new df
df.suffix('__suffix')             # add suffix to all columns, return new df
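The renaming described above can be pictured as rebuilding the column dict with new keys. The following is a rough sketch on a plain dict, not the library's implementation; `add_prefix` is a hypothetical helper, and treating `cols=None` as "all columns" is an assumption.

```python
def add_prefix(data, prefix, cols=None):
    """Return a new dict whose selected keys gain `prefix` (hypothetical helper)."""
    # Assumption: cols=None selects every column.
    selected = set(data) if cols is None else set(cols)
    return {(prefix + k if k in selected else k): v for k, v in data.items()}


data = {"A": [0, 1, 2], "B": [3, 4, 5]}
renamed = add_prefix(data, "prefix_", cols=["A"])
print(list(renamed))
```

The original dict is left unchanged, which mirrors the "return new df" behaviour noted in the comments above.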

apply

Functions can be applied to each column using apply. For example, the following applies np.reshape to all of the columns individually, returning a new df:

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.apply(np.reshape, (-1, 1))

Functions can be applied to the collection of all of the columns using aggregate. For example, the following stacks all of the columns horizontally and returns the result of the function:

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.aggregate(np.hstack)
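To make the contrast between the two calls concrete, here is a sketch on a plain dict of arrays (it does not import dasi). It assumes that apply calls the function once per column and that aggregate calls it once with the list of all columns, as the method summaries describe.

```python
import numpy as np

data = {"A": np.arange(10), "B": np.arange(10)}

# Per-column application: one call per array, one result per column.
reshaped = {name: np.reshape(arr, (-1, 1)) for name, arr in data.items()}

# Aggregation: a single call that receives every column at once.
stacked = np.hstack(list(data.values()))

print({name: arr.shape for name, arr in reshaped.items()})
print(stacked.shape)
```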

Functions can be applied to grouped columns of multiple dataframes using group_apply. For example, the following stacks each column across the dfs horizontally:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
NumpyDataFrame.group_apply((df1, df2), np.hstack)

In another example, we can apply np.divide to two dfs, using expand=True to expand the underlying arguments to properly run np.divide. The following two strategies are functionally equivalent:

# strategy 1
df1.group_apply((df1, df2), np.divide, expand=True)

# strategy 2
def div(a):
  return np.divide(a[0], a[1])
df1.group_apply((df1, df2), div)
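The equivalence of the two strategies can be checked with a small stand-in that works on plain dicts of arrays. This is a sketch under assumptions, not the library's code: the `group_apply` function below is hypothetical and only mimics the documented grouping by column name and the `expand` flag.

```python
import numpy as np


def group_apply(frames, func, expand=False):
    """Hypothetical stand-in: group arrays by column name, then apply func."""
    result = {}
    for name in frames[0]:
        group = [frame[name] for frame in frames]
        # expand=True unpacks the group into positional arguments.
        result[name] = func(*group) if expand else func(group)
    return result


f1 = {"A": np.arange(1, 11), "B": np.arange(1, 11)}
f2 = {"A": np.full(10, 2), "B": np.full(10, 4)}

expanded = group_apply((f1, f2), np.divide, expand=True)
wrapped = group_apply((f1, f2), lambda a: np.divide(a[0], a[1]))

print(np.array_equal(expanded["A"], wrapped["A"]))
```

The divisors here are non-zero on purpose; dividing np.arange(10) by itself would produce a 0/0 entry and a runtime warning.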

operations

Mathematical operations can be performed just as they would be on np.ndarrays:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})

df3 = df1 + df2  # add each element in each column
df3 += 10 # add 10 to each element
df3 *= 2. # multiply each element by 2.
df3 * df3 # multiply each element in each column element wise
df3 ** 2
df3 ** df2
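One way to picture these operators is as the same NumPy operation applied column by column, with scalars broadcast to every element. The sketch below uses plain dicts and a hypothetical `binary_op` helper; how NumpyDataFrame actually implements its operators is not stated in this page.

```python
import operator

import numpy as np


def binary_op(left, right, op):
    """Hypothetical helper: apply op per column; right is a dict or a scalar."""
    if isinstance(right, dict):
        return {name: op(arr, right[name]) for name, arr in left.items()}
    return {name: op(arr, right) for name, arr in left.items()}


d1 = {"A": np.arange(10), "B": np.arange(10)}
d2 = {"A": np.arange(10), "B": np.arange(10)}

d3 = binary_op(d1, d2, operator.add)   # element-wise sum per column
d3 = binary_op(d3, 10, operator.add)   # scalar broadcast to each element
d3 = binary_op(d3, 2.0, operator.mul)  # result becomes floating point

print(d3["A"])
```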

concatenations and appending

Dataframes can be concatenated together by the following:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
NumpyDataFrame.concat((df1, df2))

Dataframes with different columns can be concatenated together by setting a fill value:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'C': np.arange(10)})
NumpyDataFrame.concat((df1, df2), fill_value=np.nan)
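The effect of a fill value can be sketched with plain dicts of arrays: any column missing from a frame is replaced by an array of the fill value before joining. The `concat_with_fill` function is hypothetical, and the use of np.concatenate along the first axis is an assumption about what concat does internally.

```python
import numpy as np


def concat_with_fill(frames, fill_value):
    """Hypothetical stand-in: concatenate dict-of-array frames, filling gaps."""
    columns = []
    for frame in frames:
        for name in frame:
            if name not in columns:
                columns.append(name)

    result = {}
    for name in columns:
        parts = []
        for frame in frames:
            # Every column in a frame shares one shape, so any column will do.
            shape = next(iter(frame.values())).shape
            parts.append(frame[name] if name in frame else np.full(shape, fill_value))
        result[name] = np.concatenate(parts)
    return result


f1 = {"A": np.arange(10), "B": np.arange(10)}
f2 = {"A": np.arange(10), "C": np.arange(10)}
merged = concat_with_fill((f1, f2), np.nan)

print(list(merged))
print(merged["B"])
```

Because np.nan is a float, the filled columns come back as floating-point arrays even though the inputs were integers.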

conversions

to pandas:

df1.to_df()

to numpy:

df1.aggregate(np.hstack)

change dtype

Initializes a numpy data frame from a dict of string to np.ndarrays. The dict keys are representative of column names and the values are rows of that column. The shapes of each np.ndarray must be the same, else NumpyDataFrameException is raised.

Parameters
  • data – A dict of string to np.ndarrays

  • apply – Function to apply across the numpy data frame
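The shape requirement on the input dict can be illustrated with a small check. This is only a sketch: `ShapeMismatchError` is a hypothetical stand-in for NumpyDataFrameException, and `validate_shapes` is not the library's validate method.

```python
import numpy as np


class ShapeMismatchError(Exception):
    """Hypothetical stand-in for NumpyDataFrameException."""


def validate_shapes(data):
    """Return the common shape of all arrays, or raise if they differ."""
    shapes = {name: arr.shape for name, arr in data.items()}
    if len(set(shapes.values())) > 1:
        raise ShapeMismatchError(f"column shapes differ: {shapes}")
    return next(iter(shapes.values()), None)


good = {"A": np.zeros((3, 2)), "B": np.ones((3, 2))}
bad = {"A": np.zeros((3, 2)), "B": np.ones(3)}

print(validate_shapes(good))
try:
    validate_shapes(bad)
except ShapeMismatchError as err:
    print("rejected:", err)
```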

aggregate(func, *args, cols=None, **kwargs)[source]

Group all of the np.ndarrays across all columns as a list and apply a function.

Parameters
  • func – the function to apply (e.g. np.hstack)

  • args – the additional arguments of the function

  • cols – the cols to apply the function to. If False, all columns are used.

  • kwargs – the keyword arguments to apply to the function

Return type

Any

Returns

the result of the function

append(other)[source]

Append the contents of the other df to this df.

Return type

NumpyDataFrame

apply(func, *args, astype=None, preprocess=None, inplace=False, **kwargs)[source]

Apply a function to each np.ndarray.

Parameters
  • func (Callable) – The function to apply

  • args – the additional arguments of the function

  • astype – the type of data frame to return

  • preprocess – preprocess function to apply to each np.ndarray before applying ‘func’

  • inplace – If True, will apply the function to the current df and return the current df.

  • kwargs – the keyword arguments to apply to the function

Return type

Any

Returns

a new dataframe

apply_to_col_names(func, *args, cols=None, inplace=False, **kwargs)[source]

Apply a function to the column names and return a new dataframe.

Parameters
  • func – the function to apply

  • args – the additional function arguments

  • cols – the columns to apply the function to. If False, all columns are used.

  • inplace – if True, will apply the function to the current df and return the current df.

  • kwargs – the additional function keyword arguments

Return type

NumpyDataFrame

property col

Return the column indexer.

Return type

NumpyDataFrameIndexer

property columns

Return the column names.

Return type

Tuple[str, …]

classmethod concat(others, fill_value=<class 'dasi.utils.npdf.Null'>)[source]

Concatenate several dfs into a single df.

Return type

NumpyDataFrame

copy()[source]

Copy the df.

Return type

NumpyDataFrame

property data

The underlying data dict of the dataframe.

Return type

Dict[str, ndarray]

dump(f)[source]

Dump byte repr of df to the specified path.

dumps()[source]

Use msgpack to dump df to a byte string.

fill_value(cols, value)[source]

Create new columns, if they are missing, and fill them with the specified value.

Return type

None

get(k[, d]) → D[k] if k in D, else d. d defaults to None.

classmethod group_apply(others, func, *args, expand=False, _fill_value=<class 'dasi.utils.npdf.Null'>, **kwargs)[source]

Groups np.arrays according to their column name for several dataframes (as a list) and applies a function to each group. Returns a new df with the results.

Parameters
  • others (Iterable[NumpyDataFrame]) – iterable of dfs

  • func – the function to apply

  • args – additional arguments for the function

  • expand – If true, the list of np.arrays will be expanded, as in func(*list_of_arrs, …)

  • _fill_value –

  • kwargs – additional keyword arguments for the function

Return type

NumpyDataFrame

Returns

a new df

classmethod hstack(others)[source]

Apply np.hstack to each column for several dfs.

items()[source]

Iterate key: arr for the underlying data dict.

Return type

ItemsView[str, ndarray]

keys() → a set-like object providing a view on D's keys

classmethod load(f)[source]

Load the byte repr of df from the specified path.

Return type

NumpyDataFrame

classmethod loads(s)[source]

Use msgpack to load a df from a byte string.

Return type

NumpyDataFrame

classmethod merge(others)[source]

Merge many dfs into a single df.

Return type

NumpyDataFrame

prefix(s, cols=None, inplace=False)[source]

Adds a prefix to all of the column names and returns a new dataframe.

Return type

NumpyDataFrame

reshape(shape)[source]

Reshape all arrays in the df.

Return type

NumpyDataFrame

property shape

Return the expected shape for the underlying np.ndarray.

This is the shape of the array for any given column.

Return type

Tuple[int, …]

classmethod stack(others, axis)[source]

Apply np.stack to each column for several dfs.

suffix(s, cols=None, inplace=False)[source]

Adds a suffix to all of the column names and returns a new dataframe.

Return type

NumpyDataFrame

to_df(force=True)[source]

Force the NumpyDataFrame into a pandas.DataFrame.

Return type

DataFrame

update(data, apply=None)[source]

Update the df from a dict or another df.

validate()[source]

Validate that the shapes of all of the np.ndarrays are the same.

values() → an object providing a view on D's values

classmethod vstack(others)[source]

Apply np.vstack to each column for several dfs.
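The three stacking classmethods differ only in which NumPy function is applied to each column. For reference, this is how np.hstack, np.vstack, and np.stack behave on two 1-D arrays of length 10 (plain NumPy, no dasi import):

```python
import numpy as np

a = np.arange(10)
b = np.arange(10)

h_shape = np.hstack([a, b]).shape          # joined end to end
v_shape = np.vstack([a, b]).shape          # one row per input
s0_shape = np.stack([a, b], axis=0).shape  # new leading axis
s1_shape = np.stack([a, b], axis=1).shape  # new trailing axis

print(h_shape)   # (20,)
print(v_shape)   # (2, 10)
print(s0_shape)  # (2, 10)
print(s1_shape)  # (10, 2)
```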

exception dasi.utils.npdf.NumpyDataFrameException[source]

Bases: Exception

Generic exceptions for NumpyDataFrame.

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class dasi.utils.npdf.NumpyDataFrameIndexer(df)[source]

Bases: collections.abc.Mapping

The indexer for NumpyDataFrames.

Methods

get(k[,d])

items()

keys()

values()

get(k[, d]) → D[k] if k in D, else d. d defaults to None.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

values() → an object providing a view on D's values
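Both NumpyDataFrame and NumpyDataFrameIndexer list collections.abc.Mapping as their base, which is where get, items, keys, and values come from: a Mapping subclass only has to define __getitem__, __iter__, and __len__. A minimal sketch of that mechanism follows; `ColumnMapping` is a hypothetical class used purely for illustration.

```python
from collections.abc import Mapping


class ColumnMapping(Mapping):
    """Hypothetical read-only mapping over a dict of columns."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


cols = ColumnMapping({"A": [0, 1], "B": [2, 3]})

# get, keys, items, values and `in` are supplied by Mapping itself.
print(cols.get("A"))
print(cols.get("Z"))
print(list(cols.keys()))
```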