Utility Functions (datascience.util)

Utility functions

datascience.util.percentile(p, arr=None)

Returns the pth percentile of the input array (the value that is at least as great as p% of the values in the array).

If arr is not provided, percentile returns a curried version of itself: a function that takes an array and returns its pth percentile.

>>> percentile(67, [1, 3, 5, 9])
9
>>> percentile(66, [1, 3, 5, 9])
5
>>> f = percentile(66)
>>> f([1, 3, 5, 9])
5
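
The currying behaviour can be illustrated with a small standalone sketch. This is not the library's source: `percentile_sketch` is a hypothetical name, and the rule of rounding the fractional position up to the next element is an assumption chosen only because it reproduces the examples above.

```python
import math


def percentile_sketch(p, arr=None):
    """Illustrative stand-in for percentile; not the library implementation."""
    if arr is None:
        # No data supplied yet: return a function that remembers p.
        return lambda data: percentile_sketch(p, data)
    ordered = sorted(arr)
    # Assumed rule: locate the fractional position of the pth percentile
    # and round it up to the next element of the sorted data.
    position = p / 100 * (len(ordered) - 1)
    return ordered[math.ceil(position)]


print(percentile_sketch(67, [1, 3, 5, 9]))  # 9
print(percentile_sketch(66, [1, 3, 5, 9]))  # 5

f = percentile_sketch(66)                   # curried form
print(f([1, 3, 5, 9]))                      # 5
```

The curried form is convenient when a function of one argument is expected, for example when the same percentile is computed for many arrays.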
datascience.util.plot_cdf_area(rbound, lbound=None, mean=0, sd=1)

Plots a normal curve with the specified mean and standard deviation, and shades the area under the curve between lbound and rbound.

Args:

rbound (numeric): right boundary of shaded region

lbound (numeric): left boundary of shaded region; defaults to negative infinity

mean (numeric): mean/expectation of normal distribution

sd (numeric): standard deviation of normal distribution
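
The size of the shaded region is the probability that a normal variable falls between lbound and rbound. The sketch below computes that number with the standard library only; it does not draw anything and is not part of datascience.util. The names `normal_cdf` and `shaded_area` are hypothetical helpers introduced for illustration.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def shaded_area(rbound, lbound=None, mean=0.0, sd=1.0):
    """Area under the normal curve between lbound and rbound.

    Hypothetical helper mirroring the argument order of plot_cdf_area;
    a missing lbound is treated as negative infinity.
    """
    left = 0.0 if lbound is None else normal_cdf(lbound, mean, sd)
    return normal_cdf(rbound, mean, sd) - left


print(shaded_area(0))                 # 0.5: everything left of the mean
print(round(shaded_area(1, -1), 4))   # 0.6827: within one sd of the mean
```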

datascience.util.table_apply(table, func, subset=None)

Applies a function to each column and returns a Table.

Uses pandas apply under the hood, then converts the result back to a Table.

Args:

table (instance of Table): the table to apply the function to

func (function): any function that works with DataFrame.apply

subset (list | None): a list of columns to apply the function to; if None, the function is applied to all columns in table

Returns:

tab (instance of Table): a table with the given function applied; its shape is either the same as the shape of table, or (1, table.shape[1])
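
The two possible result shapes follow from how DataFrame.apply treats the function it is given. The sketch below uses pandas directly rather than a Table, and `apply_columns` is a hypothetical helper, not the library's code. It covers only the case where the function is applied to all columns, and it assumes a reduced result is turned back into a single row.

```python
import pandas as pd


def apply_columns(df, func):
    """Hypothetical helper: apply func to every column of a DataFrame."""
    result = df.apply(func)
    # A function that reduces each column to one value yields a Series;
    # turn it into a one-row frame so the result stays tabular.
    if isinstance(result, pd.Series):
        result = result.to_frame().T
    return result


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# A function that transforms each column keeps the original shape.
doubled = apply_columns(df, lambda col: col * 2)
print(doubled.shape)  # (3, 2)

# A function that reduces each column gives a single row.
totals = apply_columns(df, lambda col: col.sum())
print(totals.shape)   # (1, 2)
```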