My current solution is to simply replace the delimiter with a single character before calling read_csv.
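from io import StringIO

import pandas as pd

def read_comma_space(fname):
    # Read the whole file and collapse the two-character delimiter ', ' to ','
    with open(fname, 'r') as f:
        text = f.read().replace(', ', ',')
    # Wrap the cleaned text in a file-like object so read_csv can use the C engine
    s = StringIO(text)
    return pd.read_csv(s, header=0, sep=',')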
This enables use of the C engine, but it has to make multiple passes over the data. Compared to a baseline read of the file (read_csv(fname, header=0, sep=',')), this solution adds about 50% to total execution time in my tests. This is much better than the python engine, which takes roughly 8x the baseline execution time.
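As a quick sanity check, here is a minimal sketch comparing the replace-then-parse approach against the python engine on a tiny in-memory sample. The sample data and the helper name read_comma_space_text are made up for illustration (the helper takes a string instead of a filename so the example needs no file on disk). It does not measure timing, only that both routes produce the same frame.

```python
from io import StringIO

import pandas as pd


def read_comma_space_text(text):
    """Hypothetical variant of read_comma_space that takes the file contents as a string."""
    # Collapse ', ' to ',' so the default C engine can parse the data.
    cleaned = text.replace(', ', ',')
    return pd.read_csv(StringIO(cleaned), header=0, sep=',')


# Made-up sample data using ', ' as the delimiter.
sample = "a, b, c\n1, 2, 3\n4, 5, 6\n"

fast = read_comma_space_text(sample)

# A multi-character separator is treated as a regular expression,
# which requires the slower python engine.
slow = pd.read_csv(StringIO(sample), header=0, sep=', ', engine='python')

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(fast, slow)
```

Note that the plain string replacement also rewrites any ', ' that appears inside a quoted field, so this is only safe when the field values themselves never contain a comma followed by a space.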