CSV (Comma-separated values) is a plain text format that stores tabular data with commas separating fields and line breaks separating records. Unlike proprietary spreadsheet formats, CSV files contain no formatting, formulas, or metadata, just raw data. For marketers, CSV serves as the exchange medium for bulk uploads to ecommerce platforms, keyword exports from SEO tools, and database migrations between incompatible systems.
What is CSV Format?
CSV represents tabular data where each line is a record and commas delimit fields within that record. The format originated in the early 1970s, with IBM's Fortran compiler supporting list-directed input/output using commas as delimiters by 1972 (IBM Fortran Documentation). The term "CSV" itself was in use by 1983 when it appeared in the Osborne Executive computer manual.
Before 2005, CSV existed as a de facto standard with fuzzy implementations across different software. The main standardization effort came with RFC 4180 in 2005, which defined CSV formally as a MIME Content Type with the identifier text/csv (RFC 4180). Later, W3C recommendations in 2013 and 2015 addressed metadata and semantic limitations in the original RFC (W3C Tabular Data Model).
CSV is technically a subset of delimiter-separated values. While the standard specifies commas, real-world usage includes variations using tabs (TSV) or semicolons, particularly in locales where commas serve as decimal points.
Why CSV Format Matters
CSV remains the workhorse for data exchange because it prioritizes compatibility over features:
- Universal interoperability. Every major ecommerce platform, spreadsheet program, and database system exports and imports CSV. You can move data from Google Analytics to Excel to MySQL without conversion loss.
- Human readability. Unlike binary formats, you can open a CSV in any text editor, examine the raw data, and fix errors by hand when automation fails.
- Small footprint. CSV files compress efficiently and consume less storage than equivalent Excel files, making them ideal for large datasets like keyword inventories or transaction logs (Adaltas Big Data Benchmark).
- Long-term accessibility. Because CSV contains only text, it survives software obsolescence. Data stored in legacy Excel formats may become unreadable; CSV persists.
- Bulk operations. SEO tools and advertising platforms universally accept CSV for bulk uploads of keywords, ad copy, product feeds, and redirects.
How CSV Format Works
The mechanics are simple, but edge cases create complexity:
Structure. A CSV file consists of records separated by line breaks. Each record contains fields separated by commas. The first line typically contains column headers, though RFC 4180 provides no way to confirm whether the first row is headers or data.
Quoting rules. Fields containing commas, double quotes, or line breaks must be enclosed in double quotes. If a field contains double quotes, those quotes are escaped by doubling them ("). For example: "Inflatable Elephant, African" requires quotes because of the embedded comma (Socrata CSV Documentation).
Line termination. RFC 4180 specifies carriage return and line feed (CR/LF) as the standard line terminator, though many Unix systems use only LF.
Encoding. CSV supports any character encoding (UTF-8, UTF-16, ASCII), but the format itself provides no mechanism to declare which encoding is used. This must be communicated separately, leading to common character display issues when sharing files across regions.
Data types. CSV stores everything as text. The string "1" and the number 1 look identical in the file. Applications must interpret types during import, and CSV makes no distinction between null and empty values (Frictionless Data Specs).
CSV Variations and Dialects
Not all CSV files follow identical rules. Python's csv module identifies several dialects:
| Dialect | Characteristics | Best For |
|---|---|---|
| excel | Default for Microsoft Excel; uses CRLF line endings | Windows environments |
| excel-tab | Uses tab characters instead of commas | Data with many commas |
| unix | Uses LF line endings and quotes all fields | Unix systems, data integrity |
Microsoft Excel specifically has restrictions compared to other software. As of 2019, Excel still cannot export CSV files in commonly used UTF-8 encoding without a Byte Order Mark workaround, and it does not enforce comma separators in all locales (Wikipedia CSV Article).
Best Practices
Explicitly set UTF-8 encoding. When generating CSVs for international markets, ensure your system exports UTF-8. Importing UTF-8 CSVs into older Excel versions may require using the Data > From Text wizard rather than double-clicking the file.
Include a header row. Always provide field names in the first row to prevent column mapping errors during import.
Validate column consistency. Every row should contain the same number of fields. Empty values should be represented as ,, not by omitting the delimiter.
Escape special characters properly. Scan your data for commas, quotation marks, and line breaks. Wrap these fields in double quotes and escape internal quotes by doubling them.
Check software limits before exporting large datasets. Microsoft Excel and LibreOffice limit CSV files to 1,048,576 rows. Apple Numbers limits files to 1,000,000 rows. Google Sheets limits to 10,000,000 cells total (Import2 CSV Limits Guide).
Common Mistakes
Mistake: Unescaped commas within fields. A product description containing "Red, White, and Blue" without quotes pushes subsequent data into the wrong columns. Fix: Wrap the field in quotes: "Red, White, and Blue".
Mistake: Assuming Excel handles UTF-8 automatically. Opening a UTF-8 CSV by double-clicking in Excel may display special characters as gibberish. Fix: Use Data > Get Data > From File > From Text/CSV to specify encoding during import.
Mistake: Confusing headers with data. Some applications treat the first row as data rather than headers, especially if the data looks similar (e.g., a SKU that resembles a word). Fix: Check your import settings to explicitly designate header rows.
Mistake: Exceeding row limits. Attempting to open a 2-million-row CSV in Excel truncates data silently without warning. Fix: Use database tools or specialized large-file viewers for datasets exceeding 1 million rows.
Mistake: Including line breaks within unquoted fields. A cell containing a hard return breaks the record structure. Fix: Ensure the field is quoted: "Line one\nLine two".
Examples
Basic keyword export:
Keyword,Search Volume,Difficulty
csv format,1900,45
comma separated values,800,32
Product feed with embedded commas:
SKU,Name,Description,Price
TEE-001,Graphic Tee,"Vintage style, 100% cotton, pre-shrunk",29.99
Database export with null values:
ID,Title,Author
1,"Great Expectations","Dickens"
2,,null
Note the empty field between commas representing a missing title, and the string "null" representing the author in record 2. CSV does not distinguish between null and empty strings (Python CSV Module Documentation).
CSV vs. Excel
Use CSV when you need portability and simplicity. Use Excel (XLSX) when you require formatting, multiple sheets, formulas, or data validation rules. CSV files open faster and parse programmatically with less memory overhead, making them superior for automated SEO pipelines. Excel files preserve presentation but create vendor lock-in and bloat file sizes.
Quick decision rule: If a human needs to read and format the data, use Excel. If a machine needs to process the data, use CSV.
FAQ
What is the difference between CSV and TSV? TSV (Tab-separated values) uses tabs instead of commas to separate fields. TSV works better for data containing many commas, such as text descriptions, but CSV enjoys wider software support.
Can CSV files contain formatting like colors or bold text? No. CSV stores only plain text data. If you need formatting, use Excel (XLSX) or OpenDocument (ODS) formats.
Why do special characters display incorrectly when I open a CSV in Excel? CSV files do not specify their character encoding internally. If the file uses UTF-8 but Excel assumes Windows-1252 encoding, characters like é or 中 appear garbled. Use Excel's import dialog to specify UTF-8 encoding explicitly.
Is there a standard for CSV? RFC 4180 from 2005 defines a common format and the MIME type text/csv, but adherence is voluntary. Many applications generate "CSV" files that deviate from the standard, particularly regarding line endings and quoting.
How do I handle commas within my data? Enclose the field in double quotes. If the data contains double quotes, escape them by doubling: " becomes "".
What are the size limits for CSV files? Individual applications impose limits. Excel supports 1,048,576 rows. LibreOffice supports the same. Google Sheets supports 10,000,000 cells (not rows). The CSV format itself has no theoretical limit.