Scenario: One of my friend was tasked with merging/concatenating multiple CSV files. Files have different/new column(s) i.e. some files having different shape/structure(varying column(s) and rows). How can you merge them into one CSV file? without loss of data (both columns and rows)?
Solution: My friend tried but nothing worked for him and asked me to give it a try. Solutions out there on the Web dint satisfactorily solved the requirement, also the code looked complex for me for a routine problem of merging, and I felt there can be a simple solution. As most of the solutions though combined multiple files, there was some loss of data(rows).
So after some thought below is working solution:-
Ensure all CSV files are placed in a single directory on the disk and use below code.
Above code was tested on merging 16 CSV files, each with approx:10,000 rows with varying column(s).
Please share your experiences on merging and if this post helped you.
Solution: My friend tried but nothing worked for him and asked me to give it a try. Solutions out there on the Web dint satisfactorily solved the requirement, also the code looked complex for me for a routine problem of merging, and I felt there can be a simple solution. As most of the solutions though combined multiple files, there was some loss of data(rows).
So after some thought below is working solution:-
Ensure all CSV files are placed in a single directory on the disk and use below code.
import pandas as pd import os import glob def merge_files(files): #files[0] contains the name of the 1st csv file you want to merge. df = pd.read_csv(files[0]) #Iterate from 2nd file to list of all file names to merge. for cur_csvfile in files[1:]: #And perform outer join with previous dataframe df = df.merge(pd.read_csv(cur_csvfile,encoding='latin1'), how='outer') return df #Code starts here srcpath="D:\\Data Science\\filenames\\" os.chdir(srcpath) #If you need a list of filenames that all have a certain extension, prefix, or any common string in the middle, #use glob instead of writing code to scan the directory contents yourself. fileNames_toMerge = glob.glob("*.csv") mergedDf = merge_files(fileNames_toMerge) mergedDf.to_csv("D:\\Data Science\\Full\\mergedFile.csv",index=None)
Above code was tested on merging 16 CSV files, each with approx:10,000 rows with varying column(s).
Please share your experiences on merging and if this post helped you.