As you can see in the first image, the first data group is ** HOLE and the second data group is ** GEOL in a single CSV file. And I have almost 500 CSV files that basically have the same format.
I can successfully extract the group data (** HOLE) because the header line in all CSV files is set as follows:
import pandas as pd
from os import chdir
#define and direct to the file path
csv_file_path = 'C:/Users/Taurus Yong/Desktop/AGS3.1_new'
#create the file list in the path
csv_files = glob.glob('*.csv')
#create an empty list used later
list_data = ()
#loop to read csv files
for filename in csv_files:
holeid = pd.read_csv(filename,header=5,skiprows=(7),usecols=(0,10,11,21,22))
#get the holeid group only
holeid = holeid(:holeid('*HOLE_FDEP').isnull().argmax())
#filter all Trial Pits
holeid = holeid(~holeid('*HOLE_ID').str.contains('TP'))
#filter all the borehole depth less than 3m
holeid('*HOLE_FDEP') = pd.to_numeric(holeid('*HOLE_FDEP'))
#combine new and old datas
#merge all the data
datas = pd.concat(list_data,ignore_index=False)
#import all data to a new csv file
export_csv = datas.to_csv(r'C:UsersTaurus YongDesktopholeid.csv')
I would like to extract the group data (** GEOL) from all CSV files.
However, the code does not work if I have different records. As you can see in the picture, there are 3 records in this CSV file in the group (** HOLE) (22226DH1, 22226TP1, 22226TP2), but possibly 5 records in the group (** HOLE). Files, each CSV file has a different number of records, which means that the row as the header is not set for all CSV files.
I tried to run the code in a single CSV file:
#find the location of "**GEOL"
idx = data(data('*HOLE_ID') == '**GEOL').index
idx = idx + 1
data = pd.read_csv(io,header=idx)
But does not work since header =? must be an integer.
Is there a way to extract the ** GEOL data for all 500 CSV files and save the data frame in a new CSV file?
Thanks for your help!