This web page attempts to offer a solution to conversin of XLSX files (Excel workbooks) to CSV files.
This is easily done using Excel itself using its "Save As" option, but if one needs to do it in a batch mode and on a server machine,
where it is customary not to install MS Office nor any s/w drivers including OLEDB, it becomes a real problem.
So, we offer a pure Powershell solution to conversion of xlsx files into csv.
A few points before presenting the solution:
- It turned out that XLSX file is a zip file which contains several xml files with data.
- The individual sheets are kept as separate XML files.
- To save on space even more Microsoft stores all the character literal values in one common xml dictionary file. The individual cell value found for this string in the actual .xml file is just an occurence number in this dictionary file.
- Dates are stored as day number since 1900/01/01 (at least they are supposed to. I discovered that one has to subtract 2 from this number of days to get the correct conversion).
- Time portion of the date is stored as a fraction of a day, so it has to be multiplied by 60*60*24 (86400) to get the actual number os seconds.
- Microsoft does not store empty cells or rows, so any gaps between values have to be taken care by the code.
- To figure out the number of skipped columns one need to be able to figure out the distance between, say, cell "AB67" and "C67".
The way columns are named: A through Z, then AA through AZ, then AAA through AAZ, etc.,
suggests that we may assume they are using a base-26 system and therefore use a simple conversion method from a base-26 to the decimal system
and then use subtraction to find out the number of commas between columns.
- If a cell value contains a comma, the enire value has to be enclosed in doublequotes.
- If a cell value contains a doublequote each of them has to be doubled and then the value shoudl be enclosed in doublequotes.
With this in mind here is the code:
To call this code you may use a command like this:
The resulting will be printed, so you will have to redirect it to a file.