Relevancy Interfaces

RankRelevancy

This interface defines one method: rank. The rank method takes sequences of relevant and not-relevant descriptor vectors, given as numpy.ndarray instances, and uses them to compute relevancy scores (on a [0, 1] scale) for a provided pool of other descriptor vectors.

class smqtk_relevancy.interfaces.rank_relevancy.RankRelevancy(*args: Any, **kwargs: Any)[source]

Algorithm that can rank a given pool of descriptors based on positively and negatively adjudicated descriptors.

abstract rank(pos: Sequence[numpy.ndarray], neg: Sequence[numpy.ndarray], pool: Sequence[numpy.ndarray]) → Sequence[float][source]

Assign a relevancy score to each input descriptor in pool based on the positively and negatively adjudicated descriptors in pos and neg respectively.

Parameters

pos – Sequence of positively adjudicated descriptor vectors.
neg – Sequence of negatively adjudicated descriptor vectors.
pool – A sequence of descriptor vectors that we want to rank by topical relevancy relative to the given positive and negative examples.

Returns

An ordered sequence of float values denoting the relevancy of pool elements.
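To make the rank contract concrete, here is a minimal sketch of one way such a ranker could behave. It is not the library's implementation: CentroidRanker and the centroid-distance scoring rule are hypothetical, the class does not subclass the real RankRelevancy interface, and plain lists of floats stand in for numpy.ndarray vectors to keep the example dependency-free. It also assumes pos and neg are both non-empty.

```python
import math
from typing import List, Sequence

# Assumption: a descriptor vector is any sequence of floats. The real
# interface is typed against numpy.ndarray; lists are used here only to
# avoid a third-party dependency.
Vector = Sequence[float]


def _centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class CentroidRanker:
    """Hypothetical ranker mirroring the documented rank() signature."""

    def rank(
        self,
        pos: Sequence[Vector],
        neg: Sequence[Vector],
        pool: Sequence[Vector],
    ) -> List[float]:
        pos_center = _centroid(pos)
        neg_center = _centroid(neg)
        scores = []
        for vector in pool:
            d_pos = math.dist(vector, pos_center)
            d_neg = math.dist(vector, neg_center)
            total = d_pos + d_neg
            # Closer to the positive centroid means a score nearer 1.
            # If both distances are zero the example is ambiguous: use 0.5.
            scores.append(0.5 if total == 0 else d_neg / total)
        return scores


if __name__ == "__main__":
    ranker = CentroidRanker()
    pool = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    scores = ranker.rank(pos=[[1.0, 0.0]], neg=[[0.0, 1.0]], pool=pool)
    # One score per pool element, in the same order as the pool.
    for vector, score in zip(pool, scores):
        print(vector, round(score, 3))
```

The returned list has one entry per pool element, in pool order, and every score falls in [0, 1], which matches the scale described for this interface.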

RankRelevancyWithFeedback

This interface defines one method: rank_with_feedback. Like RankRelevancy.rank(), rank_with_feedback takes sequences of relevant and not-relevant descriptor vectors, given as numpy.ndarray instances, and uses them to compute relevancy scores (on a [0, 1] scale) for a provided pool of other descriptor vectors. However, it also expects a sequence of corresponding UIDs for the pool vectors and additionally returns a sequence of UIDs, possibly not all from the pool, on which feedback would be most useful.

class smqtk_relevancy.interfaces.rank_relevancy.RankRelevancyWithFeedback(*args: Any, **kwargs: Any)[source]

Similar to the RankRelevancy algorithm but with the added feature of also returning a sequence of elements from which feedback would be “most useful”.

What “most useful” means may be flexible, but it generally refers to the goal of reducing the number of adjudications required to separate true-positive examples from true-negative examples in provided pools via the assigned relevancy scores. For example, other elements may be adjudicated in some quantity to achieve some level of relevant sample separation, but if the feedback requests are adjudicated instead, fewer elements may need to be adjudicated to achieve an equivalent level of separation.

Feedback requests ought to be returned in a form that lets the user convey the right information to the adjudicating agent so that the adjudications can actually be performed. Additionally, we want to be able to request feedback on elements that may not be present in the given pool of descriptors.

Towards that end, this algorithm should be given a sequence of UIDs for the given pool of descriptors. This allows the implementation to potentially coordinate with an outside source of descriptor references such that the returned feedback requests may be interpreted uniformly.

abstract _rank_with_feedback(pos: Sequence[numpy.ndarray], neg: Sequence[numpy.ndarray], pool: Sequence[numpy.ndarray], pool_uids: Sequence[collections.abc.Hashable]) → Tuple[Sequence[float], Sequence[collections.abc.Hashable]][source]

Implement rank_with_feedback(). pool and pool_uids have already been checked to be of equal length.

See also

rank_with_feedback()’s doc-string for the meanings of the parameters and the return values.

rank_with_feedback(pos: Sequence[numpy.ndarray], neg: Sequence[numpy.ndarray], pool: Sequence[numpy.ndarray], pool_uids: Sequence[collections.abc.Hashable]) → Tuple[Sequence[float], Sequence[collections.abc.Hashable]][source]

Assign a relevancy score to each input descriptor in pool based on the positively and negatively adjudicated descriptors in pos and neg respectively, additionally returning a sequence of UIDs of those descriptors for which adjudication feedback would be “most useful”.

Parameters

pos – Sequence of positively adjudicated descriptor vectors.
neg – Sequence of negatively adjudicated descriptor vectors.
pool – A sequence of descriptor vectors that we want to rank by topical relevancy relative to the given positive and negative examples.
pool_uids – A sequence of hashable UID values, parallel in association with descriptors in pool.

Returns

Ordered sequence of float values denoting relevancy of pool elements, as well as a sequence of Hashable values referencing in-pool or out-of-pool descriptors we recommend for adjudication feedback. In the latter sequence, descriptors are ordered by usefulness, most to least.

Raises

ValueError – pool and pool_uids are of different length
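The split between rank_with_feedback and _rank_with_feedback described above can be sketched as a public method that validates the inputs and then delegates to an abstract hook. This is an illustrative stand-in, not the library source: FeedbackRankerSketch and UncertaintyFeedbackRanker are hypothetical names, the nearest-example scoring rule is an assumption, and choosing the scores closest to 0.5 as "most useful" is just one possible reading of that term. Plain lists of floats again stand in for numpy.ndarray vectors, pos and neg are assumed non-empty, and this toy version only ever recommends in-pool UIDs.

```python
import abc
import math
from typing import Hashable, List, Sequence, Tuple

# Assumption: vectors are plain float sequences rather than numpy.ndarray.
Vector = Sequence[float]


class FeedbackRankerSketch(abc.ABC):
    """Hypothetical stand-in for the documented interface shape."""

    def rank_with_feedback(
        self,
        pos: Sequence[Vector],
        neg: Sequence[Vector],
        pool: Sequence[Vector],
        pool_uids: Sequence[Hashable],
    ) -> Tuple[List[float], List[Hashable]]:
        # Validate once here so implementations can rely on equal lengths.
        if len(pool) != len(pool_uids):
            raise ValueError(
                f"pool has {len(pool)} elements but pool_uids has "
                f"{len(pool_uids)}"
            )
        return self._rank_with_feedback(pos, neg, pool, pool_uids)

    @abc.abstractmethod
    def _rank_with_feedback(
        self,
        pos: Sequence[Vector],
        neg: Sequence[Vector],
        pool: Sequence[Vector],
        pool_uids: Sequence[Hashable],
    ) -> Tuple[List[float], List[Hashable]]:
        """Return (scores parallel to pool, UIDs ordered by usefulness)."""


class UncertaintyFeedbackRanker(FeedbackRankerSketch):
    """Hypothetical implementation: request feedback where scores are least certain."""

    def _rank_with_feedback(self, pos, neg, pool, pool_uids):
        scores = []
        for vector in pool:
            d_pos = min(math.dist(vector, p) for p in pos)
            d_neg = min(math.dist(vector, n) for n in neg)
            total = d_pos + d_neg
            scores.append(0.5 if total == 0 else d_neg / total)
        # Scores nearest 0.5 are the most ambiguous, so list them first.
        order = sorted(range(len(pool)), key=lambda i: abs(scores[i] - 0.5))
        feedback_uids = [pool_uids[i] for i in order]
        return scores, feedback_uids


if __name__ == "__main__":
    ranker = UncertaintyFeedbackRanker()
    scores, feedback = ranker.rank_with_feedback(
        pos=[[1.0, 0.0]],
        neg=[[0.0, 1.0]],
        pool=[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
        pool_uids=["a", "b", "c"],
    )
    print(scores, feedback)
```

Keeping the length check in the public method means every implementation of the hook gets the documented ValueError behavior without repeating the validation.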

RelevancyIndex

[Deprecated] Please use RankRelevancy instead of RelevancyIndex

This interface defines two methods: build_index and rank. As with a NearestNeighborsIndex, the build_index method is used to build an index of DescriptorElement instances. The rank method takes relevant and not-relevant DescriptorElement examples, which the algorithm uses to rank (think sort) the indexed DescriptorElement instances by relevancy (on a [0, 1] scale).