Relevancy Interfaces
RankRelevancy
This interface defines one method: rank. The rank method
takes relevant and not-relevant example descriptor vectors
as sequences of numpy.ndarray and uses them to compute relevancy
scores (on a [0, 1] scale) for a provided pool of other descriptor
vectors.
- class smqtk_relevancy.interfaces.rank_relevancy.RankRelevancy(*args: Any, **kwargs: Any)
Algorithm that can rank a given pool of descriptors based on positively and negatively adjudicated descriptors.
- abstract rank(pos: Sequence[numpy.ndarray], neg: Sequence[numpy.ndarray], pool: Sequence[numpy.ndarray]) -> Sequence[float]
Assign a relevancy score to each input descriptor in pool based on the positively and negatively adjudicated descriptors in pos and neg respectively.
- Parameters
pos – Sequence of positively adjudicated descriptor vectors.
neg – Sequence of negatively adjudicated descriptor vectors.
pool – A sequence of descriptor vectors that we want to rank by topical relevancy relative to the given positive and negative examples.
- Returns
An ordered sequence of float values denoting the relevancy of pool elements
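The contract described above can be illustrated with a minimal, dependency-free sketch. It does not import smqtk_relevancy: RankRelevancySketch is a hypothetical stand-in that only mirrors the documented rank signature, and CentroidRanker is an illustrative toy scorer, not an implementation shipped with the library. Vectors are written as plain lists of floats here; 1-D numpy.ndarray values would be iterated the same way.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence

# Stand-in for a 1-D numpy.ndarray (assumption made to avoid dependencies).
Vector = Sequence[float]


class RankRelevancySketch(ABC):
    """Hypothetical stand-in mirroring the documented ``rank`` signature."""

    @abstractmethod
    def rank(self, pos: Sequence[Vector], neg: Sequence[Vector],
             pool: Sequence[Vector]) -> Sequence[float]:
        """Return one relevancy score in [0, 1] per element of ``pool``."""


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class CentroidRanker(RankRelevancySketch):
    """Toy scorer: relative closeness to the positive centroid.

    Assumes ``pos`` and ``neg`` are both non-empty.
    """

    def rank(self, pos, neg, pool):
        c_pos = centroid(pos)
        c_neg = centroid(neg)
        scores = []
        for vec in pool:
            d_pos = math.dist(vec, c_pos)  # distance to positive examples
            d_neg = math.dist(vec, c_neg)  # distance to negative examples
            total = d_pos + d_neg
            # The ratio lies in [0, 1]; 0.5 when both centroids coincide
            # with the vector.
            scores.append(0.5 if total == 0 else d_neg / total)
        return scores


pos = [[1.0, 0.0], [0.9, 0.1]]
neg = [[0.0, 1.0], [0.1, 0.9]]
pool = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = CentroidRanker().rank(pos, neg, pool)
```

In this sketch the scores are returned in pool order, so scores[i] belongs to pool[i]; the pool vector nearest the positive examples receives the highest score.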
RankRelevancyWithFeedback
This interface defines one method: rank_with_feedback. Like
RankRelevancy.rank(), rank_with_feedback takes relevant and
not-relevant example descriptor vectors as sequences of
numpy.ndarray and uses them to compute relevancy
scores (on a [0, 1] scale) for a provided pool of other descriptor
vectors. However, it also expects a sequence of corresponding UIDs
for the pool vectors and additionally returns a sequence of UIDs,
possibly not all from the pool, on which feedback would be most
useful.
- class smqtk_relevancy.interfaces.rank_relevancy.RankRelevancyWithFeedback(*args: Any, **kwargs: Any)
Similar to the RankRelevancy algorithm but with the added feature of also returning a sequence of elements from which feedback would be “most useful”.
What “most useful” means may be flexible but generally refers to the goal of reducing the number of adjudications required to separate true-positive examples from true-negative examples in provided pools via the assigned relevancy scores. E.g. other elements may be adjudicated in some quantity to achieve some level of relevant sample separation, but if the feedback requests are adjudicated instead, fewer elements may need to be adjudicated to achieve an equivalent level of separation.
Feedback requests ought to be returned in a form that lets the user convey the proper information to the adjudicating agent that actually performs adjudications. Additionally, we want to be able to request feedback on elements that may not be present in the given pool of descriptors.
Towards that end, this algorithm should be given a sequence of UIDs for the given pool of descriptors. This allows the implementation to potentially coordinate with an outside source of descriptor references such that the returned feedback requests may be interpreted uniformly.
- abstract _rank_with_feedback(pos: Sequence[numpy.ndarray], neg: Sequence[numpy.ndarray], pool: Sequence[numpy.ndarray], pool_uids: Sequence[collections.abc.Hashable]) -> Tuple[Sequence[float], Sequence[collections.abc.Hashable]]
Implement rank_with_feedback(). pool and pool_uids have already been checked to be of equal length.
See also rank_with_feedback()’s doc-string for the meanings of the parameters and their return values.
- rank_with_feedback(pos: Sequence[numpy.ndarray], neg: Sequence[numpy.ndarray], pool: Sequence[numpy.ndarray], pool_uids: Sequence[collections.abc.Hashable]) -> Tuple[Sequence[float], Sequence[collections.abc.Hashable]]
Assign a relevancy score to each input descriptor in pool based on the positively and negatively adjudicated descriptors in pos and neg respectively, additionally returning a sequence of UIDs of those descriptors for which adjudication feedback would be “most useful”.
- Parameters
pos – Sequence of positively adjudicated descriptor vectors.
neg – Sequence of negatively adjudicated descriptor vectors.
pool – A sequence of descriptor vectors that we want to rank by topical relevancy relative to the given positive and negative examples.
pool_uids – A sequence of hashable UID values, parallel in association with descriptors in pool.
- Returns
Ordered sequence of float values denoting relevancy of pool elements, as well as a sequence of Hashable values referencing in-pool or out-of-pool descriptors we recommend for adjudication feedback. In the latter sequence, descriptors are ordered by usefulness, most to least.
- Raises
ValueError – pool and pool_uids are of different length.
See also
RankRelevancyWithFeedback class doc-string for a discussion of what “most useful” means.
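The split between the public rank_with_feedback method (which checks lengths and raises ValueError) and the abstract _rank_with_feedback hook can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the library's code: RankWithFeedbackSketch is a hypothetical stand-in for the interface, and UncertaintyRanker is a toy implementation that interprets “most useful” as "score closest to 0.5", which is only one possible reading. It also requests feedback only on in-pool UIDs, although the interface permits out-of-pool ones.

```python
import math
from abc import ABC, abstractmethod
from typing import Hashable, List, Sequence, Tuple

# Stand-in for a 1-D numpy.ndarray (assumption made to avoid dependencies).
Vector = Sequence[float]
Result = Tuple[Sequence[float], Sequence[Hashable]]


class RankWithFeedbackSketch(ABC):
    """Hypothetical stand-in mirroring the documented method pair."""

    def rank_with_feedback(self, pos: Sequence[Vector],
                           neg: Sequence[Vector], pool: Sequence[Vector],
                           pool_uids: Sequence[Hashable]) -> Result:
        # Documented behavior: mismatched lengths raise ValueError.
        if len(pool) != len(pool_uids):
            raise ValueError("pool and pool_uids must be of equal length")
        return self._rank_with_feedback(pos, neg, pool, pool_uids)

    @abstractmethod
    def _rank_with_feedback(self, pos: Sequence[Vector],
                            neg: Sequence[Vector], pool: Sequence[Vector],
                            pool_uids: Sequence[Hashable]) -> Result:
        """Lengths of ``pool`` and ``pool_uids`` are already checked."""


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class UncertaintyRanker(RankWithFeedbackSketch):
    """Toy implementation; assumes ``pos`` and ``neg`` are non-empty."""

    def _rank_with_feedback(self, pos, neg, pool, pool_uids):
        c_pos, c_neg = centroid(pos), centroid(neg)
        scores = []
        for vec in pool:
            d_pos = math.dist(vec, c_pos)
            d_neg = math.dist(vec, c_neg)
            total = d_pos + d_neg
            scores.append(0.5 if total == 0 else d_neg / total)
        # Assumption: the least certain elements (score nearest 0.5) are
        # the most useful to adjudicate, so they come first.
        order = sorted(range(len(pool)), key=lambda i: abs(scores[i] - 0.5))
        feedback = [pool_uids[i] for i in order]
        return scores, feedback


ranker = UncertaintyRanker()
pos = [[1.0, 0.0], [0.9, 0.1]]
neg = [[0.0, 1.0], [0.1, 0.9]]
pool = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores, feedback = ranker.rank_with_feedback(pos, neg, pool, ["a", "b", "c"])
```

Keeping the length check in the public method means each concrete implementation only has to supply the scoring and feedback logic.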
RelevancyIndex
[Deprecated] Please use RankRelevancy instead of RelevancyIndex.
This interface defines two methods: build_index and rank.
Like that of a NearestNeighborsIndex, the build_index method is used to build an index of DescriptorElement instances.
The rank method takes relevant and not-relevant DescriptorElement examples, which the algorithm uses to rank (think sort) the indexed DescriptorElement instances by relevancy (on a [0, 1] scale).