datascience.tables.Table.sample
Table.sample(k=None, with_replacement=True, weights=None)
Returns a new table where k rows are randomly sampled from the original table.
- Args:
k – (int) the number of rows to sample from the table. Defaults to the number of rows in the table.
with_replacement – (bool) if True (the default), samples k rows with replacement from the table; otherwise samples k rows without replacement.
weights – array giving the probability that the ith row of the table is sampled. Defaults to None, which samples each row with equal probability. weights must be a valid probability distribution, i.e. an array whose length equals the number of rows in the table and whose entries sum to 1.
- Raises:
ValueError – if the length of weights does not equal the number of rows in the table, or if weights does not sum to 1.
- Returns:
A new instance of Table with k rows resampled.
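The argument rules above can be sketched in plain Python. This is an illustration only, not the library's implementation: sample_rows is a hypothetical helper that works on a list of rows, relies on the standard random module, and raises error messages of its own that are assumptions rather than the ones Table.sample produces.

```python
import math
import random


def sample_rows(rows, k=None, with_replacement=True, weights=None):
    """Hypothetical stand-in that mirrors the documented argument rules."""
    n = len(rows)
    if k is None:
        k = n  # default: draw as many rows as the table has
    if weights is not None:
        if len(weights) != n:
            raise ValueError("weights must have one entry per row")
        if not math.isclose(sum(weights), 1.0):
            raise ValueError("weights must sum to 1")
    if with_replacement:
        # Independent draws, so the same row may appear more than once.
        return random.choices(rows, weights=weights, k=k)
    if k > n:
        raise ValueError("cannot draw more rows than exist without replacement")
    if weights is None:
        # random.sample never picks the same position twice.
        return random.sample(rows, k)
    # Weighted draw without replacement: pick one row at a time and
    # remove it (and its weight) from the pool before the next pick.
    pool = list(range(n))
    pool_weights = list(weights)
    picked = []
    for _ in range(k):
        i = random.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        picked.append(rows[pool.pop(i)])
        pool_weights.pop(i)
    return picked
```

With rows such as [('a', 10), ('b', 20), ('c', 15), ('d', 8)], calling sample_rows(rows) returns four rows that may contain repeats, while sample_rows(rows, k=2, with_replacement=False) returns two distinct rows.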
>>> from datascience import *
>>> jobs = Table().with_columns(
...     'job',  make_array('a', 'b', 'c', 'd'),
...     'wage', make_array(10, 20, 15, 8))
>>> jobs
job  | wage
a    | 10
b    | 20
c    | 15
d    | 8
>>> jobs.sample()  # rows are drawn at random, so output varies between runs
job  | wage
b    | 20
b    | 20
a    | 10
d    | 8
>>> jobs.sample(with_replacement=True)
job  | wage
d    | 8
b    | 20
c    | 15
a    | 10
>>> jobs.sample(k = 2)
job  | wage
b    | 20
c    | 15
>>> jobs.sample(k = 2, with_replacement = True,
...     weights = make_array(0.5, 0.5, 0, 0))
job  | wage
a    | 10
a    | 10
>>> jobs.sample(k = 2, weights = make_array(1, 0, 1, 0))
Traceback (most recent call last):
    ...
ValueError: probabilities do not sum to 1
>>> jobs.sample(k = 2, weights = make_array(1, 0, 0))  # Weights must be length of table.
Traceback (most recent call last):
    ...
ValueError: a and p must have same size
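The first failing call above passes weights that sum to 2 rather than 1. When the starting point is a set of raw, non-negative scores, one way to turn them into a valid probability distribution is to divide each score by the total. A minimal sketch in plain Python; normalize_weights is a hypothetical helper and is not part of the datascience package.

```python
def normalize_weights(scores):
    """Scale non-negative scores so they sum to 1 (hypothetical helper)."""
    total = sum(scores)
    if total <= 0 or any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative with a positive total")
    return [s / total for s in scores]


weights = normalize_weights([1, 0, 1, 0])
print(weights)  # [0.5, 0.0, 0.5, 0.0]
```

The normalized values could then be supplied as weights, for example as make_array(*weights). The list must still contain exactly one entry per row of the table.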