I have a table, data, stored in an SQLite database ships.db. Each row is an hourly position report of a tracked ship. The table looks like this:
time     | shipId | longitude | latitude
---------|--------|-----------|---------
00:00:00 | 1      | xx.xxxx   | yy.yyyy
00:00:00 | 2      | xx.xxxx   | yy.yyyy
00:00:00 | 3      | xx.xxxx   | yy.yyyy
00:00:00 | 4      | xx.xxxx   | yy.yyyy
01:00:00 | 2      | xx.xxxx   | yy.yyyy
01:00:00 | 4      | xx.xxxx   | yy.yyyy
...      | ...    | ...       | ...
23:00:00 | 4      | xx.xxxx   | yy.yyyy
Splitting the whole earth into a grid of cells 5 degrees wide and 5 degrees tall, I want to get the number of fetched records per hour for each cell of that grid.
Note that the number of records is not the same every hour, because some ships are no longer active and are therefore not fetched.
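To make the binning concrete: a cell is identified by the coordinates of its south-west corner, i.e. each coordinate floored down to a multiple of 5 degrees. A minimal sketch of that mapping (the helper name cell_of is just for illustration):

from math import floor

def cell_of(longitude, latitude, size=5):
    # A cell is identified by its south-west corner: each coordinate
    # floored down to the nearest multiple of `size` degrees.
    return floor(longitude / size) * size, floor(latitude / size) * size

# e.g. cell_of(12.3456, -7.89) gives (10, -10)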
I wrote the code below in Python. It works, but it takes a long time because the database has roughly 250,000 records.
Is there another method or approach to make it better and faster in Python? (See the sketch after my script for the kind of thing I mean.)
My script:
import sqlite3

def writeToFile(string, file):
    with open(file, "a") as ouf:
        ouf.write(string + "\n")

output = "report.txt"

with sqlite3.connect("ships.db") as con:
    cur = con.cursor()
    # iterate over the hours from 0 to 23
    for hour in range(0, 24):
        # format each hour as "hh:00:00"
        time = str(hour).zfill(2) + ":00:00"
        # scan from longitude -180 (180 W) to +180 (180 E) in 5-degree steps
        for longitude in range(-180, 180, 5):
            # scan from latitude -90 (90 S) to +90 (90 N) in 5-degree steps
            for latitude in range(-90, 90, 5):
                sql = f'''SELECT time, count(*) AS occurence FROM data
                          WHERE time = '{time}'
                          AND latitude BETWEEN {latitude} AND {latitude + 5}
                          AND longitude BETWEEN {longitude} AND {longitude + 5}
                          GROUP BY time'''
                data = cur.execute(sql).fetchone()  # fetchone because of GROUP BY time
                if data is not None:
                    time, occurence = data
                else:  # a grid cell may hold no ship at a given hour
                    occurence = None
                result = (time, occurence, longitude, latitude)
                # write the result as a tab-separated line
                writeToFile("\t".join(str(x) for x in result), output)
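To illustrate the kind of alternative I am asking about, here is an untested sketch that makes a single pass over the table and does the binning in Python, instead of issuing one query per cell per hour (24 * 72 * 36 queries in my script above):

import sqlite3
from collections import Counter
from math import floor

counts = Counter()
with sqlite3.connect("ships.db") as con:
    cur = con.cursor()
    # Read the whole table once instead of running 24 * 72 * 36 queries.
    for time, longitude, latitude in cur.execute(
            "SELECT time, longitude, latitude FROM data"):
        # Same 5-degree binning as the nested loops above.
        cell = (time, floor(longitude / 5) * 5, floor(latitude / 5) * 5)
        counts[cell] += 1

# counts[("00:00:00", -10, 35)] would then be the occurence for that hour and cell.

One difference I can see: BETWEEN counts a ship sitting exactly on a 5-degree boundary in two neighbouring cells, while flooring puts it in only one.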