I am implementing the Maximal Marginal Relevance criterion for selecting relevant and diverse items from a collection R. Essentially, the implementation directly follows this formula (Carbonell and Goldstein, 1998):

    MMR = argmax_{D_i in R\S} [ lam * Sim1(D_i, Q) - (1 - lam) * max_{D_j in S} Sim2(D_i, D_j) ]
```python
from typing import Optional

import numpy as np
import torch as T


def maximal_marginal_relevance(
    sim_di_q: T.Tensor,
    sim_di_dq: T.Tensor,
    lam: float,
    num_iters: Optional[int] = None,
) -> T.Tensor:
    """Computes the Maximal Marginal Relevance (Carbonell and Goldstein, 1998)
    given two similarity arrays.

    Args:
        sim_di_q (T.Tensor): Similarity between candidates and query, shape: (N,).
        sim_di_dq (T.Tensor): Pairwise similarity between candidates, shape: (N, N).
        lam (float): Lambda parameter: 0 = maximal diversity, 1 = greedy.
        num_iters (int, optional): Number of results to return. If None, all items are used.

    Returns:
        T.Tensor: Ordering of the candidates with respect to the MMR scoring.
    """
    if num_iters is None:
        num_iters = len(sim_di_q)
    if not (0 < num_iters <= len(sim_di_q)):
        raise ValueError("Number of iterations must be in (0, num_pids].")
    if not (0 <= lam <= 1):
        raise ValueError("lambda must be in [0, 1].")

    R = np.arange(len(sim_di_q))
    S = np.array([sim_di_q.argmax().item()])
    for _ in range(num_iters - 1):
        cur_di = R[~np.isin(R, S)]  # Di in R \ S
        lhs = sim_di_q[cur_di]
        rhs = sim_di_dq[S[:, None], cur_di].max(dim=0).values
        idx = np.argmax(lam * lhs - (1 - lam) * rhs)
        S = np.append(S, cur_di[idx])
    return T.tensor(S)
```
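For what it's worth, here is the kind of tiny self-contained sanity check of the selection rule I mean, with invented similarity values, in plain NumPy:

```python
import numpy as np

# Toy check of the MMR selection rule (similarity values are made up).
sim_q = np.array([0.9, 0.8, 0.7, 0.3])   # candidate-query similarities
sim_dd = np.array([                       # pairwise candidate similarities
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
])
lam = 0.5

S = [int(sim_q.argmax())]                 # greedy start: most relevant item
for _ in range(2):
    rest = [i for i in range(len(sim_q)) if i not in S]
    scores = [lam * sim_q[i] - (1 - lam) * sim_dd[S, i].max() for i in rest]
    S.append(rest[int(np.argmax(scores))])
print(S)  # candidate 1 is skipped: highly similar to the already-picked 0
```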
My code works as expected (tested on a small example like the one above is meant to illustrate), but I feel like two things in particular can be improved:
- performance: I have some concerns regarding the repeated

  ```python
  S = np.append(S, cur_di[idx])
  ```
- code quality: type annotations, and in particular the possibility of expressing array shapes as type annotations. What is the best practice here?
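On the performance point: every `np.append` call copies the whole array, so the loop is quadratic in the number of selections overall. One common alternative is to preallocate the index array once, which is possible here because `num_iters` is known before the loop starts. A sketch with placeholder picks:

```python
import numpy as np

num_iters = 4
S = np.empty(num_iters, dtype=np.int64)  # allocated once, O(1) per step
S[0] = 2                                 # seed, e.g. sim_di_q.argmax()
for k in range(1, num_iters):
    idx = k * 10                         # placeholder for the MMR pick at step k
    S[k] = idx
    selected = S[: k + 1]                # the picks so far, as a view (no copy)
print(S.tolist())
```

A plain Python list with `.append`, converted once at the end, would also avoid the repeated copying.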
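On shape annotations: one stdlib option is `typing.Annotated`, which attaches shape strings as purely documentary metadata (ignored at runtime and by most type checkers); third-party packages such as `jaxtyping` make similar annotations actually checkable. A minimal stdlib sketch, with a hypothetical `mmr_scores` helper for illustration:

```python
from typing import Annotated

import numpy as np

# Shape info as metadata: documentation only, not enforced anywhere.
VecN = Annotated[np.ndarray, "shape: (N,)"]
MatNN = Annotated[np.ndarray, "shape: (N, N)"]

def mmr_scores(sim_q: VecN, sim_dd: MatNN, lam: float) -> VecN:
    # Hypothetical helper: combined MMR score of each candidate against
    # a selection set containing every candidate.
    return lam * sim_q - (1 - lam) * sim_dd.max(axis=0)

out = mmr_scores(np.array([1.0, 0.5]), np.eye(2), 0.7)
```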