# python – Vectorized implementation of Maximal Marginal Relevance

I am implementing the Maximal Marginal Relevance criterion for selecting relevant & diverse items from a collection R. Essentially, the implementation directly follows this formula:
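For reference, the criterion from the paper (with $R$ the candidate set, $S$ the already-selected set, and $Q$ the query) is:

$$
\mathrm{MMR} \triangleq \underset{D_i \in R \setminus S}{\arg\max}\left[\lambda \,\mathrm{sim}_1(D_i, Q) - (1-\lambda) \max_{D_j \in S} \mathrm{sim}_2(D_i, D_j)\right]
$$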

```python
from typing import Optional

import numpy as np
import torch as T


def maximal_marginal_relevance(
    sim_di_q: T.Tensor, sim_di_dq: T.Tensor, lam: float, num_iters: Optional[int] = None
) -> T.Tensor:
    """Computes the Maximal Marginal Relevance (Carbonell and Goldstein, 1998) given two similarity arrays.

    Args:
        sim_di_q (T.Tensor): Similarity between candidates and query, shape: (N,).
        sim_di_dq (T.Tensor): Pairwise similarity between candidates, shape: (N, N).
        lam (float): Lambda parameter: 0 = maximal diversity, 1 = greedy.
        num_iters (int, optional): Number of results to return. If None, all items are used.

    Returns:
        T.Tensor: Ordering of the candidates with respect to the MMR scoring.
    """
    if num_iters is None:
        num_iters = len(sim_di_q)

    if not (0 < num_iters <= len(sim_di_q)):
        raise ValueError("Number of iterations must be in (0, num_pids].")

    if not (0 <= lam <= 1):
        raise ValueError("lambda must be in [0, 1].")

    R = np.arange(len(sim_di_q))
    S = np.array([int(sim_di_q.argmax())])

    for _ in range(num_iters - 1):
        cur_di = R[~np.isin(R, S)]  # Di in R \ S

        lhs = sim_di_q[cur_di]
        rhs = sim_di_dq[S[:, None], cur_di].max(axis=0).values

        idx = np.argmax(lam * lhs - (1 - lam) * rhs)

        S = np.append(S, cur_di[idx])

    return T.tensor(S)
```

My code works as expected (tested on this example), but I feel like two things in particular could be improved:

- **performance**: I have some concerns regarding `S = np.append(S, cur_di[idx])`, since `np.append` copies the whole array on every iteration.
- **code quality**: type annotations, and the possibility of expressing the shapes of the arrays as type annotations — what's the best practice here?
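To make the first concern concrete, here is one alternative I have in mind: preallocate `S` once and track membership with a boolean mask instead of growing the array with `np.append` and re-scanning with `np.isin`. This is a pure-NumPy sketch for brevity (the torch version would differ only in `.max(axis=0).values`), and `mmr_preallocated` is a hypothetical name, not part of my actual code:

```python
from typing import Optional

import numpy as np


def mmr_preallocated(
    sim_di_q: np.ndarray,
    sim_di_dq: np.ndarray,
    lam: float,
    num_iters: Optional[int] = None,
) -> np.ndarray:
    """Same greedy MMR loop, but S is allocated once instead of grown per step."""
    n = len(sim_di_q)
    num_iters = n if num_iters is None else num_iters

    S = np.empty(num_iters, dtype=np.int64)  # selected indices, filled in order
    selected = np.zeros(n, dtype=bool)       # membership mask replaces np.isin(R, S)
    S[0] = int(sim_di_q.argmax())
    selected[S[0]] = True

    for k in range(1, num_iters):
        cur_di = np.flatnonzero(~selected)   # Di in R \ S
        lhs = sim_di_q[cur_di]
        # Max similarity of each remaining candidate to anything already selected.
        rhs = sim_di_dq[np.ix_(S[:k], cur_di)].max(axis=0)
        S[k] = cur_di[int(np.argmax(lam * lhs - (1 - lam) * rhs))]
        selected[S[k]] = True

    return S
```

This avoids the O(len(S)) copy that `np.append` performs on each iteration, though the `sim_di_dq[...]` gather still dominates for large N.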