My current solution is to replace the two-character delimiter with a single character before calling read_csv:
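
    import pandas as pd
    from io import StringIO

    def read_comma_space(fname):
        # Read the whole file and collapse the delimiter ', ' to a single ','
        with open(fname, 'r') as f:
            text = f.read().replace(', ', ',')
        # Wrap the text in a file-like object so the C engine can parse it
        s = StringIO(text)
        return pd.read_csv(s, header=0, sep=',')
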
This keeps the C engine available, but the data has to be passed over more than once: the text is read, rewritten by replace, and only then parsed. Compared to a baseline read of the file (read_csv(fname, header=0, sep=',')), this solution adds about 50% to total execution time in my tests. That is much better than the python engine, which takes roughly 8x as long as the baseline.
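
One caveat with the replace step: it is only safe if the single-character delimiter never appears inside a field, because after the replacement it can no longer be told apart from a real delimiter. A minimal sketch of a guarded version is below. The helper name normalize_delimiter and its old/new parameters are my own, hypothetical names, and the in-field check is an added assumption rather than part of the solution above.

```python
from io import StringIO


def normalize_delimiter(fname, old=', ', new=','):
    """Return a StringIO of the file with `old` replaced by the single character `new`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if len(new) != 1:
        raise ValueError("new delimiter must be a single character")
    buf = StringIO()
    with open(fname, 'r') as f:
        for line in f:
            # Drop every real delimiter; any `new` still present is field data
            # that would be misread as a delimiter after the replacement.
            if new in line.replace(old, ''):
                raise ValueError(f"{new!r} also occurs inside a field: {line!r}")
            buf.write(line.replace(old, new))
    buf.seek(0)  # rewind so the parser reads from the start
    return buf
```

The returned buffer can be handed to the parser in the same way as before, for example pd.read_csv(normalize_delimiter(fname), header=0, sep=','). The extra check scans each line a second time, so it will cost somewhat more than the plain replace; I have not timed it.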