datascience.tables.Table.sample

Table.sample(k=None, with_replacement=True, weights=None)

Return a new table where k rows are randomly sampled from the original table.

Args:
k – (int) The number of rows to sample from the table. Defaults to the number of rows in the table.

with_replacement – (bool) If True (the default), samples k rows with replacement; otherwise samples k rows without replacement.

weights – Array giving the probability that the ith row of the table is sampled. Defaults to None, which samples each row with equal probability. weights must be a valid probability distribution, i.e. an array whose length equals the number of rows in the table and whose entries sum to 1.

Raises:
ValueError – if the length of weights does not equal the number of rows in the table, or if weights does not sum to 1.

Returns:

A new instance of Table containing the k sampled rows.
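
The argument semantics described above can be modeled with a short standard-library sketch. This is not the library's implementation: the function name `sample_row_indices`, the `seed` parameter, and the `1e-8` tolerance are assumptions made for illustration, and the sketch returns row indices rather than a Table.

```python
import random

def sample_row_indices(num_rows, k=None, with_replacement=True,
                       weights=None, seed=None):
    """Hypothetical model of the documented arguments; returns row indices."""
    rng = random.Random(seed)
    if k is None:
        k = num_rows  # default: draw as many rows as the table has
    if weights is not None:
        if len(weights) != num_rows:
            raise ValueError("weights must have one entry per row")
        if abs(sum(weights) - 1) > 1e-8:  # tolerance is an assumption
            raise ValueError("weights must sum to 1")
    if with_replacement:
        # Draws are independent, so the same row may appear more than once.
        return rng.choices(range(num_rows), weights=weights, k=k)
    if k > num_rows:
        raise ValueError("cannot draw more rows than exist without replacement")
    # Without replacement: remove each chosen row from the pool before redrawing.
    remaining = list(range(num_rows))
    pool_weights = list(weights) if weights is not None else [1] * num_rows
    chosen = []
    for _ in range(k):
        pos = rng.choices(range(len(remaining)), weights=pool_weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        pool_weights.pop(pos)
    return chosen

# With weights (0.5, 0.5, 0, 0), only the first two rows can ever be drawn.
print(sample_row_indices(4, k=2, weights=[0.5, 0.5, 0, 0]))
```

Sampling without replacement with k equal to the number of rows yields a permutation of the rows, whereas the default settings can repeat some rows and omit others.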

>>> jobs = Table().with_columns(
...     'job',  make_array('a', 'b', 'c', 'd'),
...     'wage', make_array(10, 20, 15, 8))
>>> jobs
job  | wage
a    | 10
b    | 20
c    | 15
d    | 8
>>> jobs.sample() 
job  | wage
b    | 20
b    | 20
a    | 10
d    | 8
>>> jobs.sample(with_replacement=True) 
job  | wage
d    | 8
b    | 20
c    | 15
a    | 10
>>> jobs.sample(k=2)
job  | wage
b    | 20
c    | 15
>>> ws = make_array(0.5, 0.5, 0, 0)
>>> jobs.sample(k=2, with_replacement=True, weights=ws) 
job  | wage
a    | 10
a    | 10
>>> jobs.sample(k=2, weights=make_array(1, 0, 1, 0))
Traceback (most recent call last):
    ...
ValueError: probabilities do not sum to 1
>>> jobs.sample(k=2, weights=make_array(1, 0, 0)) # Weights must be length of table.
Traceback (most recent call last):
    ...
ValueError: 'a' and 'p' must have same size
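
The weights make_array(1, 0, 1, 0) above fail because they sum to 2. One way to avoid this is to rescale raw scores into probabilities before sampling. The helper below is a minimal sketch in plain Python; `normalize_weights` is a hypothetical name and is not part of the datascience library.

```python
def normalize_weights(raw, num_rows):
    """Hypothetical helper: rescale non-negative scores so they sum to 1."""
    if len(raw) != num_rows:
        raise ValueError("need exactly one weight per row")
    if any(w < 0 for w in raw):
        raise ValueError("weights must be non-negative")
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw]

probs = normalize_weights([1, 0, 1, 0], num_rows=4)
print(probs)  # [0.5, 0.0, 0.5, 0.0]
```

The result could then be passed as the weights argument, for example jobs.sample(k=2, weights=make_array(*probs)).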