Interface¶
Pyroed’s high-level interface includes a design language and a set of functions to operate on Python data structures.
The design language allows you to specify a problem by defining a SCHEMA, a list CONSTRAINTS of Constraint objects, a list FEATURE_BLOCKS defining cross features, and a list GIBBS_BLOCKS defining groups of features that are related to each other.
The examples in this module will use the following model specification:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]
CONSTRAINTS = [Not(And(TakesValue("aa1", None), TakesValue("aa2", None)))]
FEATURE_BLOCKS = [["aa1"], ["aa2"], ["aa3"], ["aa1", "aa2"], ["aa2", "aa3"]]
GIBBS_BLOCKS = [["aa1", "aa2"], ["aa2", "aa3"]]
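The single CONSTRAINTS entry above encodes the rule that "aa1" and "aa2" may not both be absent. As a plain-Python sketch of that predicate (not the pyroed Constraint API, which composes Not, And, and TakesValue objects):

```python
# Plain-Python sketch of Not(And(TakesValue("aa1", None), TakesValue("aa2", None))):
# a sequence is feasible unless aa1 and aa2 are both None.
def satisfies_constraint(seq):
    """seq is a dict mapping schema keys to chosen values."""
    return not (seq["aa1"] is None and seq["aa2"] is None)

print(satisfies_constraint({"aa1": "P", "aa2": None, "aa3": "R"}))   # True
print(satisfies_constraint({"aa1": None, "aa2": None, "aa3": "R"}))  # False
```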
After declaring the design space, we can progressively gather data into an experiment dict by using the functions in this module and by experimentally measuring sequences.
encode_design() and decode_design() convert between text representations of designs like [["P", "N", "R"], ["P", "N", "S"]] and PyTorch representations of designs like torch.tensor([[0, 0, 0], [0, 0, 1]]).
start_experiment() initializes an experiment dict, get_next_design() suggests the next set of sequences to test, and update_experiment() updates an experiment dict with measured responses.
Note that get_next_design() merely returns suggested sequences; you can ignore these suggestions or measure a different set of sequences if you want.
For example, if some of your measurements are lost due to technical reasons, you can simply pass a subset of the suggested design back to update_experiment().
- pyroed.api.decode_design(schema: Dict[str, List[Optional[str]]], sequences: torch.Tensor) List[List[Optional[str]]] [source]¶
Converts a tensor representation of a design into a human-readable list of sequences.
Example:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

sequences = torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0]])
design = decode_design(SCHEMA, sequences)
print(design)
# [["P", "N", "R"], ["P", "N", "S"], [None, "N", "R"]]
- Parameters
schema (OrderedDict) – A schema dict.
sequences (torch.Tensor) – A tensor of encoded sequences.
- Returns
A list of list of choices (strings or None).
- Return type
list
- pyroed.api.encode_design(schema: Dict[str, List[Optional[str]]], design: Iterable[List[Optional[str]]]) torch.Tensor [source]¶
Converts a human readable list of sequences into a tensor.
Example:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

design = [
    ["P", "N", "R"],
    ["P", "N", "S"],
    [None, "N", "R"],
    ["P", None, "R"],
]
sequences = encode_design(SCHEMA, design)
print(sequences)
# torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0], [0, 3, 0]])
- Parameters
schema (OrderedDict) – A schema dict.
design (list) – A list of list of choices (strings or None).
- Returns
A tensor of encoded sequences.
- Return type
torch.Tensor
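The encoding rule shared by encode_design() and decode_design() is simple: each choice is replaced by its index in the corresponding SCHEMA value list. A minimal plain-Python round-trip sketch of this rule (using nested lists in place of torch.Tensor; the helper names encode/decode are illustrative, not pyroed's):

```python
from collections import OrderedDict

SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

def encode(schema, design):
    # Replace each choice by its index in the corresponding schema value list.
    return [[values.index(choice) for values, choice in zip(schema.values(), seq)]
            for seq in design]

def decode(schema, rows):
    # Inverse mapping: look each index up in the schema value list.
    return [[values[i] for values, i in zip(schema.values(), row)]
            for row in rows]

design = [["P", "N", "R"], ["P", None, "R"]]
print(encode(SCHEMA, design))  # [[0, 0, 0], [0, 3, 0]]
assert decode(SCHEMA, encode(SCHEMA, design)) == design
```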
- pyroed.api.get_next_design(schema: Dict[str, List[Optional[str]]], constraints: List[Callable], feature_blocks: List[List[str]], gibbs_blocks: List[List[str]], experiment: Dict[str, torch.Tensor], *, design_size: int = 10, feature_fn: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, config: Optional[dict] = None) torch.Tensor [source]¶
Generate a new design given cumulative experimental data.
Under the hood this runs thompson_sample(), which performs Bayesian inference via either variational inference (fit_svi()) or MCMC (fit_mcmc()), and performs optimization via optimize_simulated_annealing(). These algorithms can be tuned through the config dict.
Example:
# Initialize experiment.
sequences = encode_design(SCHEMA, [
    ["P", "N", "R"],
    ["P", "N", "S"],
    [None, "N", "R"],
    ["P", None, "R"],
])
print(sequences)
# torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0], [0, 3, 0]])
experiment = {
    "sequences": sequences,
    "responses": torch.tensor([0.1, 0.4, 0.5, 0.2]),
    "batch_ids": torch.tensor([0, 0, 1, 1]),
}

# Run Bayesian optimization to get the next sequences to measure.
new_sequences = get_next_design(
    SCHEMA,
    CONSTRAINTS,
    FEATURE_BLOCKS,
    GIBBS_BLOCKS,
    experiment,
    design_size=2,
)
print(new_sequences)
# torch.tensor([[1, 1, 1], [1, 2, 0]])
print(decode_design(SCHEMA, new_sequences))
# [["L", "Y", "S"], ["L", "T", "R"]]
- Parameters
schema (OrderedDict) – A schema dict.
constraints (list) – A list of zero or more Constraint objects.
feature_blocks (list) – A list of choice blocks for linear regression.
gibbs_blocks (list) – A list of choice blocks for Gibbs sampling.
experiment (dict) – A dict containing all old experiment data.
design_size (int) – Number of designs to try to return (sometimes fewer designs are found).
feature_fn (callable) – An optional callback to generate additional features. If provided, this function should input a batch of sequences (say of shape batch_shape) and return a floating point tensor of shape batch_shape + (F,) for some number of features F. This will be called internally during inference.
config (dict) – Optional config dict. See keyword arguments to thompson_sample() for details.
- Returns
A tensor of encoded new sequences to measure, i.e. a design.
- Return type
torch.Tensor
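The inner optimizer that get_next_design() relies on is simulated annealing over the discrete design space. The toy sketch below illustrates the general idea only (it is not pyroed's optimize_simulated_annealing(), and the anneal/score/is_feasible names are hypothetical): propose single-site changes, always accept improvements, accept regressions with a temperature-dependent probability, and reject constraint-violating states.

```python
import math
import random

def anneal(schema_sizes, score, is_feasible, steps=1000, seed=0):
    # Toy simulated annealing over a discrete design space: maximize `score`,
    # accepting regressions with probability exp(delta / temperature).
    rng = random.Random(seed)
    state = [0] * len(schema_sizes)
    assert is_feasible(state)
    best, best_score = list(state), score(state)
    for step in range(steps):
        temperature = 1.0 - step / steps + 1e-6  # cool linearly toward zero
        site = rng.randrange(len(schema_sizes))
        proposal = list(state)
        proposal[site] = rng.randrange(schema_sizes[site])
        if not is_feasible(proposal):
            continue  # constraint-violating states are rejected outright
        delta = score(proposal) - score(state)
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            state = proposal
            if score(state) > best_score:
                best, best_score = list(state), score(state)
    return best

# Choice counts (3, 4, 2) as in SCHEMA above; the constraint forbids
# aa1 == None (index 2) together with aa2 == None (index 3).
sizes = [3, 4, 2]
feasible = lambda s: not (s[0] == 2 and s[1] == 3)
result = anneal(sizes, score=lambda s: sum(s), is_feasible=feasible)
```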
- pyroed.api.start_experiment(schema: Dict[str, List[Optional[str]]], sequences: torch.Tensor, responses: torch.Tensor, batch_ids: Optional[torch.Tensor] = None) Dict[str, torch.Tensor] [source]¶
Creates a cumulative experiment with initial data.
Example:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

sequences = torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0]])
responses = torch.tensor([0.1, 0.4, 0.5])
experiment = start_experiment(SCHEMA, sequences, responses)
- Parameters
schema (OrderedDict) – A schema dict.
sequences (torch.Tensor) – A tensor of encoded sequences that have been measured.
responses (torch.Tensor) – A tensor of the measured responses of sequences.
batch_ids (torch.Tensor) – An optional tensor of batch ids.
- Returns
A cumulative experiment dict.
- Return type
dict
- pyroed.api.update_experiment(schema: Dict[str, List[Optional[str]]], experiment: Dict[str, torch.Tensor], new_sequences: torch.Tensor, new_responses: torch.Tensor, new_batch_ids: Optional[torch.Tensor] = None) Dict[str, torch.Tensor] [source]¶
Updates a cumulative experiment by appending new data.
Note this does not modify its arguments; you must capture the result:
experiment = update_experiment(
    SCHEMA, experiment, new_sequences, new_responses, new_batch_ids
)
- Parameters
schema (OrderedDict) – A schema dict.
experiment (dict) – A dict containing all old experiment data.
new_sequences (torch.Tensor) – A set of new sequences that have been measured. These may simply be the design returned by get_next_design(), or may be arbitrary new sequences you have decided to measure, or old sequences you have measured again, or a combination of all three.
new_responses (torch.Tensor) – A tensor of the measured responses of sequences.
new_batch_ids (torch.Tensor) – An optional tensor of batch ids.
- Returns
A concatenated experiment.
- Return type
dict
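The append-without-mutation semantics noted above can be sketched in plain Python, with lists standing in for tensors (the append_experiment helper is hypothetical, not pyroed's implementation):

```python
# Sketch of update_experiment's append-without-mutation semantics, using plain
# lists in place of tensors (hypothetical helper, not pyroed's implementation).
def append_experiment(experiment, new_sequences, new_responses, new_batch_ids):
    assert len(new_sequences) == len(new_responses) == len(new_batch_ids)
    # Build a fresh dict; the input experiment is left untouched.
    return {
        "sequences": experiment["sequences"] + new_sequences,
        "responses": experiment["responses"] + new_responses,
        "batch_ids": experiment["batch_ids"] + new_batch_ids,
    }

old = {"sequences": [[0, 0, 0]], "responses": [0.1], "batch_ids": [0]}
new = append_experiment(old, [[0, 0, 1]], [0.4], [1])
print(new["sequences"])  # [[0, 0, 0], [0, 0, 1]]
print(old["sequences"])  # [[0, 0, 0]] -- the original is unchanged
```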