I have datasetA with 90,000 rows and datasetB with 5,000 rows. Each dataset has a column called “ID” containing employee IDs. My goal is to create another column in datasetA that shows, as True/False, whether each employee ID in datasetA also appears in datasetB. Some employees/employee IDs most likely appear multiple times in both datasets. I am fairly certain that the code I wrote works, but it is far too slow. What could I change to make it faster? Thanks!
```python
# Creating the new column to identify whether the ID in datasetA is also in datasetB.
datasetA["inB"] = "Empty"

# Looping through every ID in datasetA
for id_num in datasetA["ID"]:
    # Boolean mask selecting all rows in datasetA that share this ID
    filt = datasetA["ID"] == id_num
    if (datasetB["ID"] == id_num).any():
        datasetA.loc[filt, "inB"] = True
    else:
        datasetA.loc[filt, "inB"] = False
```
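The loop is slow because every one of the 90,000 iterations builds two full boolean masks: one over all 90,000 rows of datasetA and one over all 5,000 rows of datasetB. That adds up to billions of element comparisons. A common vectorized alternative is pandas' `Series.isin`, which performs the membership test for the whole column in one call. The sketch below assumes datasetA and datasetB are pandas DataFrames. The small frames are hypothetical toy data with duplicate IDs included on purpose.

```python
import pandas as pd

# Hypothetical toy data standing in for datasetA and datasetB;
# duplicate IDs appear in both frames on purpose.
datasetA = pd.DataFrame({"ID": [101, 102, 102, 103, 104]})
datasetB = pd.DataFrame({"ID": [102, 104, 104, 999]})

# Series.isin tests every row of datasetA["ID"] against the IDs in
# datasetB["ID"] in a single vectorized call, instead of rescanning
# both columns once per row. Duplicates need no special handling:
# each row gets True if its ID appears anywhere in datasetB.
datasetA["inB"] = datasetA["ID"].isin(datasetB["ID"])

print(datasetA)
```

The new column holds real booleans rather than a mix of "Empty" and True/False. The same line should work unchanged on the full 90,000- and 5,000-row frames.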