Interface¶
Pyroed’s high-level interface includes a design language and a set of functions to operate on Python data structures.
The design language allows you to specify a problem by defining a SCHEMA, a list CONSTRAINTS of Constraint objects, a list FEATURE_BLOCKS defining cross features, and a list GIBBS_BLOCKS defining groups of features that are related to each other.
The examples in this module will use the following model specification:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]
CONSTRAINTS = [Not(And(TakesValue("aa1", None), TakesValue("aa2", None)))]
FEATURE_BLOCKS = [["aa1"], ["aa2"], ["aa3"], ["aa1", "aa2"], ["aa2", "aa3"]]
GIBBS_BLOCKS = [["aa1", "aa2"], ["aa2", "aa3"]]
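The single CONSTRAINTS entry above encodes the rule that "aa1" and "aa2" may not both be absent. As a plain-Python sketch of that predicate (not the pyroed Constraint API, which composes Not, And, and TakesValue objects):

```python
# Plain-Python sketch of Not(And(TakesValue("aa1", None), TakesValue("aa2", None))):
# a sequence is feasible unless aa1 and aa2 are both None.
def satisfies_constraint(seq):
    """seq is a dict mapping schema keys to chosen values."""
    return not (seq["aa1"] is None and seq["aa2"] is None)

print(satisfies_constraint({"aa1": "P", "aa2": None, "aa3": "R"}))   # True
print(satisfies_constraint({"aa1": None, "aa2": None, "aa3": "R"}))  # False
```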
After declaring the design space, we can progressively gather data into an experiment dict by using the functions in this module and by experimentally measuring sequences.
encode_design() and decode_design() convert between text representations of designs like [["P", "N", "R"], ["P", "N", "S"]] and PyTorch representations of designs like torch.tensor([[0, 0, 0], [0, 0, 1]]).
start_experiment() initializes an experiment dict, get_next_design() suggests the next set of sequences to test, and update_experiment() updates an experiment dict with measured responses.
Note that get_next_design() merely returns suggested sequences; you can ignore these suggestions or measure a different set of sequences if you want.
For example, if some of your measurements are lost due to technical reasons, you can simply pass a subset of the suggested design back to update_experiment().
- pyroed.api.decode_design(schema: Dict[str, List[Optional[str]]], sequences: torch.Tensor) List[List[Optional[str]]] [source]¶
Converts a tensor representation of a design into a human-readable list of sequences.
Example:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

sequences = torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0]])
design = decode_design(SCHEMA, sequences)
print(design)
# [["P", "N", "R"], ["P", "N", "S"], [None, "N", "R"]]
- Parameters
schema (OrderedDict) – A schema dict.
sequences (torch.Tensor) – A tensor of encoded sequences.
- Returns
A list of list of choices (strings or None).
- Return type
list
- pyroed.api.encode_design(schema: Dict[str, List[Optional[str]]], design: Iterable[List[Optional[str]]]) torch.Tensor [source]¶
Converts a human readable list of sequences into a tensor.
Example:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

design = [
    ["P", "N", "R"],
    ["P", "N", "S"],
    [None, "N", "R"],
    ["P", None, "R"],
]
sequences = encode_design(SCHEMA, design)
print(sequences)
# torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0], [0, 3, 0]])
- Parameters
schema (OrderedDict) – A schema dict.
design (list) – A list of list of choices (strings or None).
- Returns
A tensor of encoded sequences.
- Return type
torch.Tensor
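The encoding rule shared by encode_design() and decode_design() is simple: each choice is replaced by its index in the corresponding SCHEMA value list. A minimal plain-Python round-trip sketch of this rule (using nested lists in place of torch.Tensor; the helper names encode/decode are illustrative, not pyroed's):

```python
from collections import OrderedDict

SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

def encode(schema, design):
    # Replace each choice by its index in the corresponding schema value list.
    return [[values.index(choice) for values, choice in zip(schema.values(), seq)]
            for seq in design]

def decode(schema, rows):
    # Inverse mapping: look each index up in the schema value list.
    return [[values[i] for values, i in zip(schema.values(), row)]
            for row in rows]

design = [["P", "N", "R"], ["P", None, "R"]]
print(encode(SCHEMA, design))  # [[0, 0, 0], [0, 3, 0]]
assert decode(SCHEMA, encode(SCHEMA, design)) == design
```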
- pyroed.api.get_next_design(schema: Dict[str, List[Optional[str]]], constraints: List[Callable], feature_blocks: List[List[str]], gibbs_blocks: List[List[str]], experiment: Dict[str, torch.Tensor], *, design_size: int = 10, feature_fn: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, config: Optional[dict] = None) torch.Tensor [source]¶
Generate a new design given cumulative experimental data.
Under the hood this runs thompson_sample(), which performs Bayesian inference via either variational inference (fit_svi()) or MCMC (fit_mcmc()), and performs optimization via optimize_simulated_annealing(). These algorithms can be tuned through the config dict.
Example:
# Initialize experiment.
sequences = encode_design(SCHEMA, [
    ["P", "N", "R"],
    ["P", "N", "S"],
    [None, "N", "R"],
    ["P", None, "R"],
])
print(sequences)
# torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0], [0, 3, 0]])
experiment = {
    "sequences": sequences,
    "responses": torch.tensor([0.1, 0.4, 0.5, 0.2]),
    "batch_ids": torch.tensor([0, 0, 1, 1]),
}

# Run Bayesian optimization to get the next sequences to measure.
new_sequences = get_next_design(
    SCHEMA,
    CONSTRAINTS,
    FEATURE_BLOCKS,
    GIBBS_BLOCKS,
    experiment,
    design_size=2,
)
print(new_sequences)
# torch.tensor([[1, 1, 1], [1, 2, 0]])
print(decode_design(SCHEMA, new_sequences))
# [["L", "Y", "S"], ["L", "T", "R"]]
- Parameters
schema (OrderedDict) – A schema dict.
constraints (list) – A list of zero or more Constraint objects.
feature_blocks (list) – A list of choice blocks for linear regression.
gibbs_blocks (list) – A list of choice blocks for Gibbs sampling.
experiment (dict) – A dict containing all old experiment data.
design_size (int) – Number of designs to try to return (sometimes fewer designs are found).
feature_fn (callable) – An optional callback to generate additional features. If provided, this function should input a batch of sequences (say of shape batch_shape) and return a floating point tensor of shape batch_shape + (F,) for some number of features F. This will be called internally during inference.
config (dict) – Optional config dict. See keyword arguments to thompson_sample() for details.
- Returns
A tensor of encoded new sequences to measure, i.e. a design.
- Return type
torch.Tensor
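The inner optimizer that get_next_design() relies on is simulated annealing over the discrete design space. The toy sketch below illustrates the general idea only (it is not pyroed's optimize_simulated_annealing(), and the anneal/score/is_feasible names are hypothetical): propose single-site changes, always accept improvements, accept regressions with a temperature-dependent probability, and reject constraint-violating states.

```python
import math
import random

def anneal(schema_sizes, score, is_feasible, steps=1000, seed=0):
    # Toy simulated annealing over a discrete design space: maximize `score`,
    # accepting regressions with probability exp(delta / temperature).
    rng = random.Random(seed)
    state = [0] * len(schema_sizes)
    assert is_feasible(state)
    best, best_score = list(state), score(state)
    for step in range(steps):
        temperature = 1.0 - step / steps + 1e-6  # cool linearly toward zero
        site = rng.randrange(len(schema_sizes))
        proposal = list(state)
        proposal[site] = rng.randrange(schema_sizes[site])
        if not is_feasible(proposal):
            continue  # constraint-violating states are rejected outright
        delta = score(proposal) - score(state)
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            state = proposal
            if score(state) > best_score:
                best, best_score = list(state), score(state)
    return best

# Choice counts (3, 4, 2) as in SCHEMA above; the constraint forbids
# aa1 == None (index 2) together with aa2 == None (index 3).
sizes = [3, 4, 2]
feasible = lambda s: not (s[0] == 2 and s[1] == 3)
result = anneal(sizes, score=lambda s: sum(s), is_feasible=feasible)
```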
- pyroed.api.start_experiment(schema: Dict[str, List[Optional[str]]], sequences: torch.Tensor, responses: torch.Tensor, batch_ids: Optional[torch.Tensor] = None) Dict[str, torch.Tensor] [source]¶
Creates a cumulative experiment with initial data.
Example:
SCHEMA = OrderedDict()
SCHEMA["aa1"] = ["P", "L", None]
SCHEMA["aa2"] = ["N", "Y", "T", None]
SCHEMA["aa3"] = ["R", "S"]

sequences = torch.tensor([[0, 0, 0], [0, 0, 1], [2, 0, 0]])
responses = torch.tensor([0.1, 0.4, 0.5])
experiment = start_experiment(SCHEMA, sequences, responses)
- Parameters
schema (OrderedDict) – A schema dict.
sequences (torch.Tensor) – A tensor of encoded sequences that have been measured.
responses (torch.Tensor) – A tensor of the measured responses of sequences.
batch_ids (torch.Tensor) – An optional tensor of batch ids.
- Returns
A cumulative experiment dict.
- Return type
dict
- pyroed.api.update_experiment(schema: Dict[str, List[Optional[str]]], experiment: Dict[str, torch.Tensor], new_sequences: torch.Tensor, new_responses: torch.Tensor, new_batch_ids: Optional[torch.Tensor] = None) Dict[str, torch.Tensor] [source]¶
Updates a cumulative experiment by appending new data.
Note this does not modify its arguments; you must capture the result:
experiment = update_experiment(
    SCHEMA, experiment, new_sequences, new_responses, new_batch_ids
)
- Parameters
schema (OrderedDict) – A schema dict.
experiment (dict) – A dict containing all old experiment data.
new_sequences (torch.Tensor) – A set of new sequences that have been measured. These may simply be the design returned by get_next_design(), or may be arbitrary new sequences you have decided to measure, or old sequences you have measured again, or a combination of all three.
new_responses (torch.Tensor) – A tensor of the measured responses of sequences.
new_batch_ids (torch.Tensor) – An optional tensor of batch ids.
- Returns
A concatenated experiment.
- Return type
dict
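The append-without-mutation semantics noted above can be sketched in plain Python, with lists standing in for tensors (the append_experiment helper is hypothetical, not pyroed's implementation):

```python
# Sketch of update_experiment's append-without-mutation semantics, using plain
# lists in place of tensors (hypothetical helper, not pyroed's implementation).
def append_experiment(experiment, new_sequences, new_responses, new_batch_ids):
    assert len(new_sequences) == len(new_responses) == len(new_batch_ids)
    # Build a fresh dict; the input experiment is left untouched.
    return {
        "sequences": experiment["sequences"] + new_sequences,
        "responses": experiment["responses"] + new_responses,
        "batch_ids": experiment["batch_ids"] + new_batch_ids,
    }

old = {"sequences": [[0, 0, 0]], "responses": [0.1], "batch_ids": [0]}
new = append_experiment(old, [[0, 0, 1]], [0.4], [1])
print(new["sequences"])  # [[0, 0, 0], [0, 0, 1]]
print(old["sequences"])  # [[0, 0, 0]] -- the original is unchanged
```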