dasi.utils.npdf

NumpyDataFrame.

Classes

`Null`	Not None.
`NumpyDataFrame`([data, apply])	The NumpyDataFrame is a class halfway between pandas and numpy.
`NumpyDataFrameIndexer`(df)	The indexer for NumpyDataFrames.

Exceptions

NumpyDataFrameException

Generic exceptions for NumpyDataFrame.

class dasi.utils.npdf.Null

Bases: object

Not None.

class dasi.utils.npdf.NumpyDataFrame(data=None, apply=None)

Bases: collections.abc.Mapping

The NumpyDataFrame is a class halfway between pandas and numpy. It has named columns, indexing, slicing, function application, and mathematical operations. Unlike pandas, however, it maintains the multi-dimensionality of the underlying data (as np.ndarray), allowing broadcasting and complex indexing.

Usage:

indexing and columns

All of the underlying arrays can be sliced and indexed using the usual slicing operations. The native np.ndarray indexing is used for each column, so basic slices return views that share memory with the original arrays.

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df[0]
df[0, 1]
df[np.array([0, 1])]
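The per-column indexing described above can be sketched with plain NumPy. This is a minimal illustration, assuming each column is indexed independently with the same key; the `index_columns` helper is hypothetical and not part of dasi. It also shows a NumPy detail worth keeping in mind: basic slices yield views, whereas indexing with an integer array (as in `df[np.array([0, 1])]`) makes NumPy return a copy.

```python
import numpy as np


def index_columns(data, key):
    """Hypothetical helper: apply the same NumPy index to every column."""
    return {name: arr[key] for name, arr in data.items()}


data = {"A": np.arange(10), "B": np.arange(10)}

sliced = index_columns(data, slice(0, 2))      # basic slicing: views
fancy = index_columns(data, np.array([0, 1]))  # integer-array indexing: copies

# Check whether the results share memory with the original column.
view_shares = np.shares_memory(sliced["A"], data["A"])
copy_shares = np.shares_memory(fancy["A"], data["A"])
```

Writing into `sliced["A"]` would therefore modify `data["A"]`, while writing into `fancy["A"]` would not.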

Methods

`aggregate`(func, *args[, cols])	Group all of the np.ndarrays across all columns as a list and apply a function.
`append`(other)	Append the contents of the other df to this df.
`apply`(func, *args[, astype, preprocess, inplace])	Apply a function to each np.ndarray.
`apply_to_col_names`(func, *args[, cols, inplace])	Apply a function to the column names and return a new dataframe.
`concat`(others[, fill_value])	Concatenate several dfs into a single df.
`copy`()	Copy the df.
`dump`(f)	Dump byte repr of df to the specified path.
`dumps`()	Use msgpack to dump df to a byte string.
`fill_value`(cols, value)	Create new columns, if they are missing, and fill them with the specified value.
`get`(k[,d])
`group_apply`(others, func, *args[, expand, …])	Groups np.arrays according to their column name for several dataframes (as a list) and applies a function to each group.
`hstack`(others)	Apply np.hstack to each column for several dfs.
`items`()	Iterate key: arr for the underlying data dict.
`keys`()
`load`(f)	Load the byte repr of df from the specified path.
`loads`(s)	Use msgpack to load a df from a byte string.
`merge`(others)	Merge many dfs into a single df.
`prefix`(s[, cols, inplace])	Adds a prefix to all of the column names and returns a new dataframe.
`reshape`(shape)	Reshape all arrays in the df.
`stack`(others, axis)	Apply np.stack to each column for several dfs.
`suffix`(s[, cols, inplace])	Adds a suffix to all of the column names and returns a new dataframe.
`to_df`([force])	Force the NumpyDataFrame into a pandas.DataFrame.
`update`(data[, apply])	Update the df from a dict or another df.
`validate`()	Validate that the shapes of all of the np.ndarrays are the same.
`values`()
`vstack`(others)	Apply np.vstack to each column for several dfs.

Attributes

`col`	Return the column indexer.
`columns`	Return the column names.
`data`	The underlying data dict of the dataframe.
`shape`	Return the expected shape for the underlying np.ndarray.

Columns can be selected and returned:

df.col['A', 'B']    # return df with 'A' and 'B'
df.col['A']         # return df with only 'A'
print(list(df.col)) # return the column names
print(df.columns)   # also returns the column names

New columns can be added:

df.col['A'] = np.arange(10)

Columns can be deleted:

del df.col['B']

Add prefix or suffix to column names:

df.prefix('prefix_', cols=['A'])  # add prefix only to 'A', return new df
df.suffix('__suffix')             # add suffix to all columns, return new df
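Conceptually, renaming columns only touches the keys of the underlying data dict. The following is a rough sketch on a plain dict of arrays, not the dasi implementation; `add_prefix` is a hypothetical helper, and treating `cols=None` as "all columns" is an assumption based on the default in the method signature.

```python
import numpy as np


def add_prefix(data, prefix, cols=None):
    """Hypothetical helper: prefix the chosen column names, leaving arrays untouched."""
    # Assumption: cols=None selects every column.
    selected = set(data) if cols is None else set(cols)
    return {
        (prefix + name if name in selected else name): arr
        for name, arr in data.items()
    }


data = {"A": np.arange(10), "B": np.arange(10)}
prefixed = add_prefix(data, "prefix_", cols=["A"])  # only 'A' is renamed
```

The original dict keeps its keys, which mirrors the "return new df" behaviour noted in the comments above.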

apply

Functions can be applied to each column using apply. For example, the following applies np.reshape to all of the columns individually, returning a new df:

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.apply(np.reshape, (-1, 1))
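As a rough mental model, applying a function per column amounts to mapping it over the values of a dict of arrays. The sketch below uses plain NumPy; `apply_to_columns` is a hypothetical helper and does not reproduce the `astype`, `preprocess`, or `inplace` options.

```python
import numpy as np


def apply_to_columns(data, func, *args, **kwargs):
    """Hypothetical helper: call func on each column array and collect the results."""
    return {name: func(arr, *args, **kwargs) for name, arr in data.items()}


data = {"A": np.arange(10), "B": np.arange(10)}

# Equivalent in spirit to df.apply(np.reshape, (-1, 1)).
reshaped = apply_to_columns(data, np.reshape, (-1, 1))
```

Each column goes from shape (10,) to shape (10, 1), and the input dict is left unchanged.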

Functions can be applied to a list of all of the columns using aggregate. For example, the following stacks all of the columns horizontally, returning the result of np.hstack (a single np.ndarray) rather than a new df:

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.aggregate(np.hstack)
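In plain NumPy terms, aggregation collects the column arrays into a list and hands that list to the function in one call. The sketch below assumes that reading; `aggregate_columns` is a hypothetical helper, and treating `cols=None` as "all columns" is an assumption.

```python
import numpy as np


def aggregate_columns(data, func, *args, cols=None, **kwargs):
    """Hypothetical helper: pass the list of selected column arrays to func once."""
    names = list(data) if cols is None else list(cols)
    return func([data[name] for name in names], *args, **kwargs)


data = {"A": np.arange(10), "B": np.arange(10)}

# Two 1-D columns of length 10 become one 1-D array of length 20.
stacked = aggregate_columns(data, np.hstack)
```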

Functions can be applied to grouped columns of multiple dataframes using group_apply. For example, the following stacks each column across the dfs horizontally:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
NumpyDataFrame.group_apply((df1, df2), np.hstack)

In another example, we can apply np.divide to two dfs, using expand=True to expand the underlying arguments to properly run np.divide. The following two strategies are functionally equivalent:

# strategy 1
df1.group_apply((df1, df2), np.divide, expand=True)

# strategy 2
def div(a):
  return np.divide(a[0], a[1])
df1.group_apply((df1, df2), div)
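The effect of `expand` can be illustrated on plain dicts of arrays. This is a sketch under the assumption that every input has the same column names; `group_apply_sketch` is a hypothetical stand-in, not the dasi classmethod, and the inputs are chosen to avoid division by zero.

```python
import numpy as np


def group_apply_sketch(frames, func, *args, expand=False, **kwargs):
    """Hypothetical stand-in: group arrays by column name and apply func per group."""
    result = {}
    for name in frames[0]:
        group = [frame[name] for frame in frames]
        if expand:
            # func(a, b, ...) receives the arrays as separate arguments.
            result[name] = func(*group, *args, **kwargs)
        else:
            # func([a, b, ...]) receives the arrays as a single list.
            result[name] = func(group, *args, **kwargs)
    return result


d1 = {"A": np.arange(1, 11), "B": np.arange(1, 11)}
d2 = {"A": np.full(10, 2), "B": np.full(10, 2)}

expanded = group_apply_sketch((d1, d2), np.divide, expand=True)
wrapped = group_apply_sketch((d1, d2), lambda a: np.divide(a[0], a[1]))
```

Both calls produce the same per-column quotients, which is the equivalence the two strategies above describe.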

operations

Mathematical operations can be performed the same way as with np.ndarrays:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})

df3 = df1 + df2  # add each element in each column
df3 += 10 # add 10 to each element
df3 *= 2. # multiply each element by 2.
df3 * df3 # multiply each element in each column element wise
df3 ** 2
df3 ** df2
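One way to picture these operators is as the same NumPy operation applied column by column. The following sketch works on plain dicts; `binary_op` is a hypothetical helper, and how NumpyDataFrame actually implements its operators is not shown in this page.

```python
import operator

import numpy as np


def binary_op(left, right, op):
    """Hypothetical helper: apply op per column; right may be a dict or a scalar."""
    if isinstance(right, dict):
        return {name: op(arr, right[name]) for name, arr in left.items()}
    return {name: op(arr, right) for name, arr in left.items()}


d1 = {"A": np.arange(10), "B": np.arange(10)}
d2 = {"A": np.arange(10), "B": np.arange(10)}

total = binary_op(d1, d2, operator.add)         # like df1 + df2
shifted = binary_op(total, 10, operator.add)    # like df3 += 10
scaled = binary_op(shifted, 2.0, operator.mul)  # like df3 *= 2.
```

The sketch returns new arrays at each step. With plain integer arrays, a literal in-place `arr *= 2.` raises a casting error in NumPy, so building new arrays is the safer model here.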

concatenations and appending

Dataframes can be concatenated together by the following:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
NumpyDataFrame.concat((df1, df2))

Dataframes with different columns can be concatenated together by setting a fill value:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'C': np.arange(10)})
NumpyDataFrame.concat((df1, df2), fill_value=np.nan)
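A plausible reading of the fill value is that each missing column is padded with it before the pieces are joined. The sketch below follows that reading on plain dicts; `concat_sketch` is a hypothetical stand-in, and joining along axis 0 is an assumption rather than documented behaviour.

```python
import numpy as np


def concat_sketch(frames, fill_value):
    """Hypothetical stand-in: pad missing columns with fill_value, then join on axis 0."""
    # Collect column names in first-seen order.
    names = []
    for frame in frames:
        for name in frame:
            if name not in names:
                names.append(name)

    result = {}
    for name in names:
        parts = []
        for frame in frames:
            # All columns in one frame share a shape, so any column gives it.
            shape = next(iter(frame.values())).shape
            parts.append(frame[name] if name in frame else np.full(shape, fill_value))
        result[name] = np.concatenate(parts)
    return result


d1 = {"A": np.arange(10), "B": np.arange(10)}
d2 = {"A": np.arange(10), "C": np.arange(10)}
combined = concat_sketch((d1, d2), fill_value=np.nan)
```

Because np.nan is a float, the padded columns 'B' and 'C' come out as float arrays even though the inputs were integers.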

conversions

to pandas:

df1.to_df()

to numpy:

df1.aggregate(np.hstack)

change dtype

Initializes a numpy data frame from a dict of string to np.ndarrays. The dict keys are representative of column names and the values are rows of that column. The shapes of each np.ndarray must be the same, else NumpyDataFrameException is raised.

Parameters

data – A dict of string to np.ndarrays
apply – Function to apply across the numpy data frame
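The shape requirement on `data` can be sketched as a simple check over a dict of arrays. This is an illustration only: `ShapeMismatchError` is a hypothetical stand-in for NumpyDataFrameException, and `validate_shapes` is not the library's `validate` method.

```python
import numpy as np


class ShapeMismatchError(ValueError):
    """Hypothetical stand-in for NumpyDataFrameException."""


def validate_shapes(data):
    """Return the common column shape, or raise if the columns disagree."""
    shapes = {name: np.asarray(arr).shape for name, arr in data.items()}
    if len(set(shapes.values())) > 1:
        raise ShapeMismatchError(f"column shapes differ: {shapes}")
    # An empty dict has no shape to report.
    return next(iter(shapes.values()), None)


ok_shape = validate_shapes({"A": np.zeros((3, 2)), "B": np.ones((3, 2))})

try:
    validate_shapes({"A": np.zeros(3), "B": np.zeros(4)})
    mismatch_raised = False
except ShapeMismatchError:
    mismatch_raised = True
```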

aggregate(func, *args, cols=None, **kwargs)

Group all of the np.ndarrays across all columns as a list and apply a function.

Parameters

func – the function to apply (e.g. np.hstack)
args – the additional arguments of the function
cols – the cols to apply the function to. If False, all columns are used.
kwargs – the keyword arguments to apply to the function

Return type

Any

Returns

the result of the function

append(other)

Append the contents of the other df to this df.

Return type: NumpyDataFrame

apply(func, *args, astype=None, preprocess=None, inplace=False, **kwargs)

Apply a function to each np.ndarray.

Parameters

func (Callable) – The function to apply
args – the additional arguments of the function
astype – the type of data frame to return
preprocess – preprocess function to apply to each np.ndarray before applying 'func'
inplace – If True, will apply the function to the current df and return the current df.
kwargs – the keyword arguments to apply to the function

Return type

Any

Returns

a new dataframe

apply_to_col_names(func, *args, cols=None, inplace=False, **kwargs)

Apply a function to the column names and return a new dataframe.

Parameters

func – the function to apply
args – the additional function arguments
cols – the columns to apply the function to. If False, all columns are used.
inplace – if True, will apply the function to the current df and return the current df.
kwargs – the additional function keyword arguments

Return type

NumpyDataFrame

property col

Return the column indexer.

Return type: NumpyDataFrameIndexer

property columns

Return the column names.

Return type: Tuple[str, …]

classmethod concat(others, fill_value=<class 'dasi.utils.npdf.Null'>)

Concatenate several dfs into a single df.

Return type: NumpyDataFrame

copy()

Copy the df.

Return type: NumpyDataFrame

property data

The underlying data dict of the dataframe.

Return type: Dict[str, ndarray]

dump(f): Dump byte repr of df to the specified path.

dumps(): Use msgpack to dump df to a byte string.

fill_value(cols, value)

Create new columns, if they are missing, and fill them with the specified value.

Return type: None

get(k[, d]) → D[k] if k in D, else d. d defaults to None.

classmethod group_apply(others, func, *args, expand=False, _fill_value=<class 'dasi.utils.npdf.Null'>, **kwargs)

Groups np.arrays according to their column name for several dataframes (as a list) and applies a function to each group. Returns a new df with the results.

Parameters

others (Iterable[NumpyDataFrame]) – iterable of dfs
func – the function to apply
args – additional arguments for the function
expand – If true, the list of np.arrays will be expanded, as in func(*list_of_arrs, …)
_fill_value –
kwargs – additional keyword arguments for the function

Return type

NumpyDataFrame

Returns

a new df

classmethod hstack(others): Apply np.hstack to each column for several dfs.

items()

Iterate key: arr for the underlying data dict.

Return type: ItemsView[str, ndarray]

keys() → a set-like object providing a view on D's keys

classmethod load(f)

Load the byte repr of df from the specified path.

Return type: NumpyDataFrame

classmethod loads(s)

Use msgpack to load a df from a byte string.

Return type: NumpyDataFrame

classmethod merge(others)

Merge many dfs into a single df.

Return type: NumpyDataFrame

prefix(s, cols=None, inplace=False)

Adds a prefix to all of the column names and returns a new dataframe.

Return type: NumpyDataFrame

reshape(shape)

Reshape all arrays in the df.

Return type: NumpyDataFrame

property shape

Return the expected shape for the underlying np.ndarray.

This is the shape of the array for any given column.

Return type: Tuple[int, …]

classmethod stack(others, axis): Apply np.stack to each column for several dfs.

suffix(s, cols=None, inplace=False)

Adds a suffix to all of the column names and returns a new dataframe.

Return type: NumpyDataFrame

to_df(force=True)

Force the NumpyDataFrame into a pandas.DataFrame.

Return type: DataFrame

update(data, apply=None): Update the df from a dict or another df.

validate(): Validate that the shapes of all of the np.ndarrays are the same.

values() → an object providing a view on D's values

classmethod vstack(others): Apply np.vstack to each column for several dfs.
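The hstack, vstack, and stack classmethods defer to NumPy functions that differ mainly in the shape of the result. A short sketch on a single pair of 1-D columns, using plain NumPy rather than the dasi classmethods:

```python
import numpy as np

a = np.arange(10)
b = np.arange(10)

h = np.hstack((a, b))          # joined end to end along the existing axis
v = np.vstack((a, b))          # each input becomes one row
s0 = np.stack((a, b), axis=0)  # new leading axis, same layout as vstack here
s1 = np.stack((a, b), axis=1)  # new trailing axis, inputs become columns

shapes = (h.shape, v.shape, s0.shape, s1.shape)
```

For 1-D inputs of length 10 the shapes are (20,), (2, 10), (2, 10), and (10, 2) respectively.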

exception dasi.utils.npdf.NumpyDataFrameException

Bases: Exception

Generic exceptions for NumpyDataFrame.

with_traceback(): Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class dasi.utils.npdf.NumpyDataFrameIndexer(df)

Bases: collections.abc.Mapping

The indexer for NumpyDataFrames.

Methods

`get`(k[,d])
`items`()
`keys`()
`values`()

get(k[, d]) → D[k] if k in D, else d. d defaults to None.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

values() → an object providing a view on D's values