dasi.utils.npdf¶
NumpyDataFrame.
Classes

Null – Not None.
NumpyDataFrame – The NumpyDataFrame is a class halfway between pandas and numpy.
NumpyDataFrameIndexer – The indexer for NumpyDataFrames.

Exceptions

NumpyDataFrameException – Generic exceptions for NumpyDataFrame.
-
class dasi.utils.npdf.NumpyDataFrame(data=None, apply=None)[source]¶
Bases: collections.abc.Mapping
The NumpyDataFrame is a class halfway between pandas and numpy. It has named columns, indexing, slicing, function applications, and mathematical operations. Unlike pandas however, it maintains the multi-dimensionality of underlying data (as np.ndarray), allowing broadcasting and complex indexing.
Usage:
indexing and columns
All of the underlying arrays can be sliced and indexed using the standard slicing operations. The native np.ndarray indexing is used for each column, meaning a view sharing the same memory locations is returned.
df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df[0]
df[0, 1]
df[np.array([0, 1])]
Methods
aggregate(func, *args[, cols]) – Group all of the np.ndarrays across all columns as a list and apply a function.
append(other) – Append the contents of the other df to this df.
apply(func, *args[, astype, preprocess, inplace]) – Apply a function to each np.ndarray.
apply_to_col_names(func, *args[, cols, inplace]) – Apply a function to the column names and returns a new dataframe.
concat(others[, fill_value]) – Concatenate several dfs into a single df.
copy() – Copy the df.
dump(f) – Dump byte repr of df to the specified path.
dumps() – Use msgpack to dump df to a byte string.
fill_value(cols, value) – Create new columns, if they are missing, and fill them with the specified value.
get(k[, d]) – Return D[k] if k is in D, else d.
group_apply(others, func, *args[, expand, …]) – Groups np.arrays according to their column name for several dataframes (as a list) and applies a function to each group.
hstack(others) – Apply np.hstack to each column for several dfs.
items() – Iterate key: arr for the underlying data dict.
keys() – Return a set-like object providing a view on D's keys.
load(f) – Load the byte repr of df from the specified path.
loads(s) – Use msgpack to load a df from a byte string.
merge(others) – Merge many dfs into a single df.
prefix(s[, cols, inplace]) – Adds a prefix to all of the column names and returns a new dataframe.
reshape(shape) – Reshape all arrays in the df.
stack(others, axis) – Apply np.stack to each column for several dfs.
suffix(s[, cols, inplace]) – Adds a suffix to all of the column names and returns a new dataframe.
to_df([force]) – Force the NumpyDataFrame into a pandas.DataFrame.
update(data[, apply]) – Update the df from a dict or another df.
validate() – Validate that the shapes of all of the np.ndarrays are the same.
values() – Return an object providing a view on D's values.
vstack(others) – Apply np.vstack to each column for several dfs.
Attributes

col – Return the column indexer.
columns – Return the column names.
data – The underlying data dict of the dataframe.
shape – Return the expected shape for the underlying np.ndarray.
Columns can be selected and returned:
df.col['A', 'B']     # return df with 'A' and 'B'
df.col['A']          # return df with only 'A'
print(list(df.col))  # return the column names
print(df.columns)    # also returns the column names
New columns can be added:
df.col['A'] = np.arange(10)
Columns can be deleted:
del df.col['B']
Add prefix or suffix to column names:
df.prefix('prefix_', cols=['A'])  # add prefix only to 'A', return new df
df.suffix('__suffix')             # add suffix to all columns, return new df
apply
Functions can be applied to each column using apply. For example, the following applies np.reshape to each column individually, returning a new df:
df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.apply(np.reshape, (-1, 1))
Functions can be applied to the list of all column arrays using aggregate. For example, the following stacks all of the columns horizontally:
df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.aggregate(np.hstack)
Functions can be applied to grouped columns of multiple dataframes using group_apply. For example, the following stacks each column of each df horizontally:
df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
NumpyDataFrame.group_apply((df1, df2), np.hstack)
In another example, we can apply np.divide to two dfs, using expand=True to expand the underlying arguments to properly run np.divide. The following two strategies are functionally equivalent:
# strategy 1
df1.group_apply((df1, df2), np.divide, expand=True)

# strategy 2
def div(a):
    return np.divide(a[0], a[1])

df1.group_apply((df1, df2), div)
operations
Mathematical operations can be performed just as you would with np.ndarrays:
df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df3 = df1 + df2  # add each element in each column
df3 += 10        # add 10 to each element
df3 *= 2.        # multiply each element by 2.
df3 * df3        # multiply each element in each column element-wise
df3 ** 2
df3 ** df2
concatenations and appending
Dataframes can be concatenated together by the following:
df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
NumpyDataFrame.concat((df1, df2))
Dataframes with different columns can be concatenated by setting a fill value:
df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'C': np.arange(10)})
NumpyDataFrame.concat((df1, df2), fill_value=np.nan)
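A single df can also be appended with append; a minimal sketch, assuming append mutates the calling df as its summary above suggests:

df1 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df2 = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df1.append(df2)  # df1 now also contains the rows of df2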
conversions
to pandas:
df1.to_df()
to numpy:
df1.aggregate(np.hstack)
change dtype:
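The dtype example is not shown in the source docstring; a minimal sketch, assuming apply forwards keyword arguments to the applied function (as documented for apply below):

df1.apply(np.asarray, dtype=np.float64)  # cast every column to float64, returning a new df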
Initializes a numpy data frame from a dict mapping strings to np.ndarrays. The dict keys are the column names and the values are the rows of that column. The shapes of all np.ndarrays must be the same, else NumpyDataFrameException is raised.
- Parameters
data – A dict of string to np.ndarrays
apply – Function to apply across the numpy data frame
-
aggregate(func, *args, cols=None, **kwargs)[source]¶
Group all of the np.ndarrays across all columns as a list and apply a function.
- Parameters
func – the function to apply (e.g. np.hstack)
args – the additional arguments of the function
cols – the columns to apply the function to. If not provided, all columns are used.
kwargs – the keyword arguments to apply to the function
- Return type
Any
- Returns
the result of the function
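For example, the column arrays are gathered into a list and handed to the function; a minimal sketch (restricting the operation with the cols keyword is an assumption based on the signature above):

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.aggregate(np.vstack)              # np.vstack applied to [arr_A, arr_B], giving a (2, 10) array
df.aggregate(np.hstack, cols=['A'])  # assumed: aggregate only column 'A'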
-
apply(func, *args, astype=None, preprocess=None, inplace=False, **kwargs)[source]¶
Apply a function to each np.ndarray.
- Parameters
func (Callable) – the function to apply
args – the additional arguments of the function
astype – the type of data frame to return
preprocess – preprocess function to apply to each np.ndarray before applying 'func'
inplace – if True, will apply the function to the current df and return the current df
kwargs – the keyword arguments to apply to the function
- Return type
Any
- Returns
a new dataframe
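A short sketch of the keyword-argument and inplace behavior described above (forwarding decimals to np.round is assumed from the kwargs documentation):

df = NumpyDataFrame({'A': np.arange(10) / 3, 'B': np.arange(10) / 3})
df2 = df.apply(np.round, decimals=2)  # kwargs are forwarded to func; returns a new df
df.apply(np.abs, inplace=True)        # applies to the current df and returns it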
-
apply_to_col_names(func, *args, cols=None, inplace=False, **kwargs)[source]¶
Apply a function to the column names and return a new dataframe.
- Parameters
func – the function to apply
args – the additional function arguments
cols – the columns to apply the function to. If not provided, all columns are used.
inplace – if True, will apply the function to the current df and return the current df
kwargs – the additional function keyword arguments
- Return type
NumpyDataFrame
- Returns
a new dataframe
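For example, columns can be renamed with a string function; a minimal sketch, assuming each column name is passed as the first argument to func:

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.apply_to_col_names(str.lower)                    # columns become 'a' and 'b'
df.apply_to_col_names('{}_raw'.format, cols=['A'])  # assumed: only 'A' is renamed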
-
property col¶
Return the column indexer.
- Return type
NumpyDataFrameIndexer
-
property columns¶
Return the column names.
- Return type
Tuple[str, …]
-
classmethod concat(others, fill_value=<class 'dasi.utils.npdf.Null'>)[source]¶
Concatenate several dfs into a single df.
- Return type
NumpyDataFrame
-
property data¶
The underlying data dict of the dataframe.
- Return type
Dict[str, ndarray]
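The dict can be read directly, for example:

df = NumpyDataFrame({'A': np.arange(10)})
df.data['A']  # the underlying np.ndarray for column 'A'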
-
fill_value(cols, value)[source]¶
Create new columns, if they are missing, and fill them with the specified value.
- Return type
None
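Since the return type is None, the new columns are added in place; a minimal sketch, assuming cols accepts a list of column names (the name 'C' is illustrative):

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.fill_value(['C'], np.nan)  # adds a missing column 'C' filled with np.nan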
-
get(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
-
classmethod group_apply(others, func, *args, expand=False, _fill_value=<class 'dasi.utils.npdf.Null'>, **kwargs)[source]¶
Groups np.arrays according to their column name for several dataframes (as a list) and applies a function to each group. Returns a new df with the results.
- Parameters
others (Iterable[NumpyDataFrame]) – iterable of dfs
func – the function to apply
args – additional arguments for the function
expand – if True, the list of np.arrays will be expanded, as in func(*list_of_arrs, …)
_fill_value –
kwargs – additional keyword arguments for the function
- Return type
NumpyDataFrame
- Returns
a new df
-
items()[source]¶
Iterate key: arr for the underlying data dict.
- Return type
ItemsView[str, ndarray]
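For example, iterating over the columns and their arrays:

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
for name, arr in df.items():
    print(name, arr.shape)  # prints 'A (10,)' then 'B (10,)'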
-
keys() → a set-like object providing a view on D's keys¶
-
prefix(s, cols=None, inplace=False)[source]¶
Adds a prefix to all of the column names and returns a new dataframe.
- Return type
NumpyDataFrame
-
property shape¶
Return the expected shape for the underlying np.ndarray.
This is the shape of the array for any given column.
- Return type
Tuple[int, …]
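In the running example every column holds a length-10 1-D array, so the shared shape is (10,); after reshaping each column, the shape follows suit:

df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(10)})
df.shape                             # (10,)
df.apply(np.reshape, (-1, 1)).shape  # (10, 1)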
-
suffix(s, cols=None, inplace=False)[source]¶
Adds a suffix to all of the column names and returns a new dataframe.
- Return type
NumpyDataFrame
-
values() → an object providing a view on D's values¶
-
exception dasi.utils.npdf.NumpyDataFrameException[source]¶
Bases: Exception
Generic exceptions for NumpyDataFrame.
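For example, building a frame from arrays with mismatched shapes triggers this exception; a minimal sketch (whether the check runs in the constructor or only in validate() is not stated above, so both are covered):

import numpy as np
from dasi.utils.npdf import NumpyDataFrame, NumpyDataFrameException

try:
    df = NumpyDataFrame({'A': np.arange(10), 'B': np.arange(5)})  # mismatched shapes
    df.validate()
except NumpyDataFrameException as err:
    print(err)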
-
with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
-
class dasi.utils.npdf.NumpyDataFrameIndexer(df)[source]¶
Bases: collections.abc.Mapping
The indexer for NumpyDataFrames.
Methods
get(k[, d])
items()
keys()
values()
-
get(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
-
items() → a set-like object providing a view on D's items¶
-
keys() → a set-like object providing a view on D's keys¶
-
values() → an object providing a view on D's values¶
-