Finding similar values within a list of lists in Python

I am working on a machine learning problem involving object detection. For now, I am trying to find GPS coordinates that are close to other GPS coordinates. If two points are close, I want to record that by index. In the test data below, the two areas are not actually close to one another, so each one's ‘close_points_index’ should contain only its own index. My actual data set, however, has ~100k observations.

My code produces correct output but is slow with 100k observations. I am looking for help optimizing it and would appreciate it if someone could point out any inefficiencies.

My data looks like:

``````
[{'area_name': 'ElephantRock', 'us_state': 'Colorado', 'url': 'https://www.mountainproject.com/area/105746486/elephant-rock', 'lnglat': (38.88463, -106.15182), 'metadata': {'lnglat_from_parent': False}},
 {'area_name': 'RaspberryBoulders', 'us_state': 'Colorado', 'url': 'https://www.mountainproject.com/area/108289128/raspberry-boulders', 'lnglat': (39.491, -106.0501), 'metadata': {'lnglat_from_parent': False}}]
``````

My code solution is below. I avoided using two for loops, but I realize that map() is essentially just syntactic sugar for a for loop. I assume latLongDistance is already fairly well optimized, but if it is not, I don’t mind. My focus is on my findClusters() function.

``````
from math import cos, asin, sqrt, pi
from functools import partial

def latLongDistance(coord1, coord2):
    # Haversine distance in km between two (lat, lon) pairs given in degrees.
    lat1 = coord1[0]
    lat2 = coord2[0]

    lon1 = coord1[1]
    lon2 = coord2[1]

    p = pi/180  # degrees to radians
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p) * cos(lat2*p) * (1-cos((lon2-lon1)*p))/2

    # 12742 km is the Earth's diameter (2 * 6371 km).
    kmDistance = 12742 * asin(sqrt(a))

    return kmDistance

def findClusters(listOfPoints, thresholdValueM = 800):

    coords = [x['lnglat'] for x in listOfPoints]

    for index, data in enumerate(listOfPoints):

        lngLat = data['lnglat']

        # Distance from this point to every point in the list, including itself.
        modifiedLLDistance = partial(latLongDistance, coord2 = lngLat)

        listOfDistances = list(map(modifiedLLDistance, coords))

        meterDistance = [x*1000 for x in listOfDistances]

        # Indices of all points closer than the threshold (in meters).
        closePoints = [i for i in range(len(meterDistance)) if meterDistance[i] < thresholdValueM]

        listOfPoints[index]['close_points_index'] = closePoints

    return listOfPoints
``````

After the function has run, the output looks like the example below. On the actual data set a point can have multiple indices. If I run just these two points, their indices should be [0] and [1] respectively.

``````
[{'area_name': 'ElephantRock', 'us_state': 'Colorado', 'url': 'https://www.mountainproject.com/area/105746486/elephant-rock', 'lnglat': (38.88463, -106.15182), 'metadata': {'lnglat_from_parent': False}, 'close_points_index': [0]},
 {'area_name': 'RaspberryBoulders', 'us_state': 'Colorado', 'url': 'https://www.mountainproject.com/area/108289128/raspberry-boulders', 'lnglat': (39.491, -106.0501), 'metadata': {'lnglat_from_parent': False}, 'close_points_index': [1]}]
``````
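One direction that might help, sketched here without having been timed on the full data set: findClusters() measures every point against every other point, so 100k observations mean roughly 10^10 distance calls. Because a great-circle distance can never be smaller than the north-south separation alone, points whose latitudes differ by more than the threshold can be skipped without computing anything. The names `find_clusters_banded`, `haversine_m`, and `EARTH_RADIUS_M` are hypothetical, and the 6371 km radius is simply half of the 12742 km constant used above.

```python
from bisect import bisect_left, bisect_right
from math import asin, cos, pi, sqrt

EARTH_RADIUS_M = 6371000.0  # assumption: half of the 12742 km diameter used above


def haversine_m(coord1, coord2):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1 = coord1
    lat2, lon2 = coord2
    p = pi / 180
    a = (0.5 - cos((lat2 - lat1) * p) / 2
         + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def find_clusters_banded(points, threshold_m=800):
    """Hypothetical variant of findClusters() that only measures a latitude band."""
    # Indices sorted by latitude, so candidates can be located with bisect.
    order = sorted(range(len(points)), key=lambda i: points[i]['lnglat'][0])
    lats = [points[i]['lnglat'][0] for i in order]
    # Points closer than threshold_m cannot differ in latitude by more than
    # this many degrees; widened slightly to absorb floating-point rounding.
    band_deg = 1.001 * threshold_m / (EARTH_RADIUS_M * pi / 180)

    for data in points:
        lat = data['lnglat'][0]
        lo = bisect_left(lats, lat - band_deg)
        hi = bisect_right(lats, lat + band_deg)
        # Exact distance check, but only for candidates inside the band.
        close = [j for j in order[lo:hi]
                 if haversine_m(points[j]['lnglat'], data['lnglat']) < threshold_m]
        data['close_points_index'] = sorted(close)
    return points


sample = [
    {'area_name': 'ElephantRock', 'lnglat': (38.88463, -106.15182)},
    {'area_name': 'RaspberryBoulders', 'lnglat': (39.491, -106.0501)},
]
find_clusters_banded(sample)
print([p['close_points_index'] for p in sample])  # [[0], [1]]
```

How much this saves depends on how widely the points are spread in latitude. If many points share nearly the same latitude, the band stays large, and a similar bound on longitude or a proper spatial index would probably be needed on top of it.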

I’ve experimented with a few things, but am coming up short. Primarily, I am a bit inexperienced with finding speed increases as I am relatively new to Python. Any critical input would be helpful. I have not posted here before, so let me know if I need to provide more information to make this reproducible.
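In case it helps with reproducing the slowdown, here is a minimal sketch that builds synthetic records in the same shape as the sample data. Everything in it is an assumption made for illustration: the helper name `make_test_points`, the placeholder area names, the empty URLs, and the bounding box (a rough rectangle around Colorado) are not taken from the real data set.

```python
import random


def make_test_points(n, seed=0):
    """Build n synthetic records shaped like the sample data (placeholder values)."""
    rng = random.Random(seed)  # fixed seed so repeated runs use the same points
    points = []
    for i in range(n):
        # Assumed bounding box roughly covering Colorado; not from the real data.
        lat = rng.uniform(37.0, 41.0)
        lon = rng.uniform(-109.0, -102.0)
        points.append({
            'area_name': f'SyntheticArea{i}',
            'us_state': 'Colorado',
            'url': None,
            'lnglat': (lat, lon),
            'metadata': {'lnglat_from_parent': False},
        })
    return points


test_points = make_test_points(1000)
print(len(test_points))  # 1000
```

Passing a larger `n` to `make_test_points` and timing findClusters() on the result should show how quickly the run time grows, without needing the real observations.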