Tuesday, December 13, 2011

CSV HELL!!!

Recently I had the pleasure of working with CSV files, and it ended in CSV hell for me! The functionality I wanted wasn't available anywhere (as often happens when you build custom solutions for your clients!). The client didn't want to rely completely on the system! (I question the need for the system in such circumstances, but the client wanted the system! Brilliant! Kind of like 'hey, I'll be travelling by bus, but just in case let's have that train available and the tracks laid out!')
Anyway, I couldn't get them to change their preference, and so this hell started. The situation is that the client uses OpenOffice and/or MS Excel to edit the CSVs. They keep the data in Excel sheets and export those to CSV. I assumed I could write a parser for CSV and that would be it. But when I got to the task, I just couldn't figure out the correct logic. One approach worked for some of the data and failed for the rest, and another worked on a different part and failed elsewhere.
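A common reason hand-rolled CSV logic works for some rows and fails for others is quoted fields that contain commas or escaped quotes. The sketch below is only an illustration in Python, with a made-up sample row. It is not the code from this project, and I can't say for sure these were the exact cases that bit me.

```python
import csv
import io

# Hypothetical sample: one quoted field contains a comma,
# another contains a doubled (escaped) double quote.
sample = 'id,name,note\n1,"Smith, John","said ""hi"""\n'

# Naive approach: split every line on commas.
naive = [line.split(",") for line in sample.splitlines()]

# Quote-aware approach: let the standard csv module handle quoting.
parsed = list(csv.reader(io.StringIO(sample)))

print(naive[1])   # the name is split in two, so the row has 4 fields
print(parsed[1])  # 3 fields, with quotes and the comma handled
```

The naive split looks fine on simple rows and only breaks once a quoted field shows up, which matches the "works for some, fails for others" pattern.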
After playing with it for a couple of days, I was frustrated enough that I dumped the CSV thing and instead used single quotes as field delimiters and "|" as separators. This made my logic simple, and in 2-3 days I had a working parser for these files.
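For illustration, here is roughly what reading that format could look like in Python with the standard csv module. This is a sketch under assumptions, not the parser I actually wrote: the function name `parse_pipe_file` and the sample data are made up, and it assumes a single quote inside a field is written as two single quotes.

```python
import csv
import io

def parse_pipe_file(text):
    """Parse text that uses '|' as separator and single quotes around fields."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar="'")
    return [row for row in reader]

# Hypothetical sample: commas need no special care here, a pipe inside
# quotes stays part of the field, and '' inside a field means one quote.
sample = "'1'|'Smith, John'|'a|b'\n'2'|'O''Brien'|''\n"

for row in parse_pipe_file(sample):
    print(row)
```

The appeal of this format is that commas, which are everywhere in real data, no longer need any escaping. Only pipes and single quotes do.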
The next hard part came when I tried exporting the Excel sheets from both MS Excel and OpenOffice: both would screw up the file somehow. Sometimes it was the field delimiters, sometimes the separators. After wasting one more day on this, I finally resorted to manually editing the files. gedit's find and replace function came in really handy then.
Four more days later I had the supposed-to-be CSVs ready and I finished the task. It shouldn't have taken that much time, but as it happens, the raw data coming from the client had many problems in it, like improper fields, fields exceeding their lengths (these brought in duplicate record errors in MySQL), and many others.
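Problems like these can be caught before the data ever reaches the database. Below is a minimal sketch in Python of such a pre-import check. The column names, the length limits, and the function `find_problems` are all hypothetical and do not come from the real schema. My guess at why long fields ended up as duplicate record errors is that over-long values were being cut down to the column size, so two different values could collide on a unique key.

```python
# Hypothetical column names and maximum lengths (not the real schema).
COLUMNS = ["code", "name"]
MAX_LENGTHS = {"code": 10, "name": 50}

def find_problems(rows, columns, max_lengths):
    """Return (line_number, message) pairs for rows that would cause trouble."""
    problems = []
    for line_no, row in enumerate(rows, start=1):
        # Wrong number of fields: report it and skip the per-field checks.
        if len(row) != len(columns):
            problems.append(
                (line_no, f"expected {len(columns)} fields, got {len(row)}")
            )
            continue
        # Flag any value longer than its column allows.
        for col, value in zip(columns, row):
            limit = max_lengths.get(col)
            if limit is not None and len(value) > limit:
                problems.append(
                    (line_no, f"{col} is {len(value)} chars, limit is {limit}")
                )
    return problems

# Made-up rows: the second has an over-long code, the third is missing a field.
rows = [["A1", "ok"], ["ABCDEFGHIJK", "x"], ["only one"]]
for line_no, message in find_problems(rows, COLUMNS, MAX_LENGTHS):
    print(f"line {line_no}: {message}")
```

Running a check like this on the client's raw data first would give a list of bad lines to send back, instead of discovering them one database error at a time.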
All in all, it was quite an experience for me to work on this challenge.
