Data Format Cheat Sheet: JSON, CSV, XML, and YAML
A coworker once exported 50,000 rows of customer data from a database as CSV, converted it to JSON for an API import, and watched 3,000 records silently corrupt. The culprit? A customer name containing a comma: "Smith, Jr." The CSV parser split it into two fields. The JSON converter dutifully created two separate properties where there should have been one. Nobody noticed until customers started receiving mail addressed to "Jr."
Data formats look interchangeable from a distance. They all store structured information. But they have different rules about delimiters, nesting, types, encoding, and whitespace—and those differences eat your data when you convert carelessly. Here's what each format actually does, when to use it, and what breaks during conversion.
JSON: The Web's Native Data Format
JSON (JavaScript Object Notation) won the data format wars for web APIs. It's readable, lightweight, natively supported in every browser, and maps cleanly to data structures in virtually every programming language. If you're building or consuming a web API in 2026, you're almost certainly using JSON.
Basic JSON syntax:
{
  "name": "Alice Chen",
  "age": 32,
  "active": true,
  "roles": ["admin", "editor"],
  "address": {
    "city": "Portland",
    "state": "OR"
  }
}
JSON has six data types: strings (in double quotes), numbers (no quotes), booleans (true/false), null, arrays (ordered lists), and objects (key-value pairs). That's it. No dates, no comments, no undefined. This simplicity is both its strength and its limitation.
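To see the six types in practice, here is a minimal sketch using Python's standard json module (Python is just my pick for illustration; the sample document and the "joined" field are made up). It shows where each JSON type lands and what happens when you hand the encoder a date.

```python
import json
from datetime import date

doc = (
    '{"name": "Alice Chen", "age": 32, "active": true, '
    '"manager": null, "roles": ["admin"], "address": {"city": "Portland"}}'
)
parsed = json.loads(doc)

# Each of the six JSON types maps onto one Python type.
type_names = {key: type(value).__name__ for key, value in parsed.items()}

# Dates are not a JSON type, so the standard encoder refuses them.
try:
    json.dumps({"joined": date(2026, 4, 13)})
    date_rejected = False
except TypeError:
    date_rejected = True

# A common workaround is an ISO 8601 string. The reader has to know
# to parse it back into a date, because JSON itself won't say so.
encoded = json.dumps({"joined": date(2026, 4, 13).isoformat()})
```

The ISO string convention is only a convention. Both sides of an API have to agree on it, which is exactly the kind of detail that gets lost in conversion.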
Common JSON mistakes that a JSON formatter catches instantly:
- Trailing commas. {"a": 1, "b": 2,} is invalid JSON. JavaScript allows trailing commas in object literals, so this catches people who write JSON by hand in a JS file.
- Single quotes. {'key': 'value'} is invalid. JSON requires double quotes. Python developers hit this constantly.
- Unquoted keys. {key: "value"} is valid JavaScript but invalid JSON. Keys must be double-quoted strings.
- Comments. JSON doesn't support comments. No //, no /* */. If you need commented configuration, use YAML or JSON5.
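If you don't have a formatter handy, any strict parser will flag the same mistakes. A quick sketch with Python's json module follows; the check helper and the sample strings are mine, and the exact error wording varies between Python versions, so treat the messages as informational.

```python
import json

candidates = {
    "trailing comma": '{"a": 1, "b": 2,}',
    "single quotes": "{'key': 'value'}",
    "unquoted key": '{key: "value"}',
    "comment": '{"a": 1} // note',
    "valid": '{"a": 1, "b": 2}',
}

def check(text):
    """Return None if text parses as JSON, otherwise a short description of the error."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return f"{err.msg} (line {err.lineno}, column {err.colno})"

results = {label: check(text) for label, text in candidates.items()}

for label, problem in results.items():
    print(f"{label}: {problem or 'ok'}")
```

Only the last candidate should come back clean. The line and column in the error usually point right at the offending character.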
CSV: Deceptively Simple
CSV (Comma-Separated Values) looks trivial. Values separated by commas, rows separated by newlines. A spreadsheet format. What could go wrong? Quite a lot, as it turns out.
name,age,city
Alice Chen,32,Portland
Bob Smith,28,"New York, NY"
"O'Brien, Pat",45,Chicago
The fundamental problem with CSV: there's no universal standard. RFC 4180 documents a common format, but it is informational rather than binding, and plenty of tools deviate from it. Different applications handle quoting, escaping, line endings, and encoding differently. Excel exports CSV differently than Google Sheets. A CSV that opens perfectly in one program garbles text in another.
CSV pitfalls:
- Values containing commas. Must be quoted. If your CSV writer doesn't quote them, parsing breaks.
- Values containing quotes. Double-quote characters inside a quoted field must be escaped as "". Miss this and parsing derails.
- Values containing newlines. A quoted field can contain a literal newline. Many naive parsers split on newlines without checking whether they're inside quotes.
- Encoding. Is the file UTF-8 or Latin-1? Excel on Windows often exports as Windows-1252, which corrupts non-ASCII characters when read as UTF-8.
- Leading zeros. Excel strips leading zeros from numeric-looking strings. ZIP code "07042" becomes "7042". Phone numbers lose their leading digits.
- No data types. Everything is a string. The number 42 and the string "42" look identical in CSV, and true is just four characters of text.
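Several of these pitfalls are easy to reproduce. The sketch below uses Python's csv module with made-up rows, and compares a quote-aware reader against the naive "split on newlines, then commas" approach. It also shows the encoding and leading-zero problems in miniature.

```python
import csv
import io

rows = [
    ["name", "zip", "note"],
    ["Smith, Jr.", "07042", 'Said "hello"'],
    ["O'Brien, Pat", "97201", "line one\nline two"],
]

# csv.writer quotes any field containing a comma, a quote, or a newline,
# and doubles embedded quote characters.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Naive parsing: split on line breaks, then on commas.
naive = [line.split(",") for line in text.splitlines()]

# Quote-aware parsing with the csv module.
parsed = list(csv.reader(io.StringIO(text, newline="")))

# Encoding: bytes written as Windows-1252 are not valid UTF-8.
raw = "José".encode("cp1252")
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Leading zeros survive as long as the value stays a string.
zip_as_text = parsed[1][1]
zip_as_number = int(zip_as_text)
```

The naive version ends up with four "rows" instead of three and splits "Smith, Jr." in two, which is the bug from the opening story. The csv reader returns the original rows unchanged.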
Despite these issues, CSV remains indispensable. It's the universal exchange format for tabular data. Every spreadsheet, database, and data analysis tool can import and export it. When you need to move flat, tabular data between systems, CSV is often the only format both sides support.
Converting between JSON and CSV requires understanding that they model data differently. JSON is hierarchical (objects within objects). CSV is flat (rows and columns). Converting a deeply nested JSON object to CSV forces you to flatten the hierarchy, which means inventing column names for nested properties or losing nested structure entirely.
A JSON to CSV converter handles the flattening automatically, typically using dot notation (address.city) for nested fields. Going the other direction, a CSV to JSON converter turns each row into a JSON object with column headers as keys. This works well for simple tabular data but can't reconstruct nesting that wasn't preserved in the CSV.
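Here is roughly what that flattening looks like, as a sketch rather than how any particular converter works. The flatten helper, the sample records, and the choice to join list items with a semicolon are all my assumptions; real tools pick their own conventions for arrays.

```python
import csv
import io

def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, using dot notation for the keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        elif isinstance(value, list):
            # Assumption: join list items with ";". This is lossy.
            flat[path] = ";".join(str(item) for item in value)
        else:
            flat[path] = value
    return flat

records = [
    {"name": "Alice Chen", "roles": ["admin", "editor"],
     "address": {"city": "Portland", "state": "OR"}},
    {"name": "Bob Smith", "roles": [],
     "address": {"city": "New York, NY"}},
]

flat_rows = [flatten(record) for record in records]

# Union of all column names, in first-seen order.
columns = list(dict.fromkeys(col for row in flat_rows for col in row))

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(flat_rows)
csv_text = buffer.getvalue()
```

Notice what the CSV can no longer tell you: whether Bob's roles were an empty list or a missing field, and whether his state was absent or an empty string. That is the information a CSV to JSON converter can't bring back.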
XML: Verbose but Powerful
XML (eXtensible Markup Language) was the king of data exchange before JSON dethroned it in the late 2000s. It's more verbose than JSON—a lot more—but it offers features JSON doesn't: schemas, namespaces, attributes, processing instructions, and mixed content (text interspersed with markup).
<?xml version="1.0" encoding="UTF-8"?>
<user id="1" active="true">
  <name>Alice Chen</name>
  <age>32</age>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
  <address>
    <city>Portland</city>
    <state>OR</state>
  </address>
</user>
Where XML still dominates in 2026:
- SOAP web services. Legacy enterprise systems, banking APIs, healthcare (HL7), government systems.
- Configuration files. Maven (pom.xml), Android manifests, .NET configuration, Spring contexts.
- Document formats. XHTML, SVG, RSS, Atom feeds, EPUB, Office Open XML (.docx is a ZIP of XML files).
- Data interchange with schemas. When you need strict validation of structure and types, XSD (XML Schema Definition) is more mature than JSON Schema.
Converting between XML and JSON has inherent ambiguity. XML has attributes and elements; JSON has only properties. An XML attribute like <user id="1"> could become {"user": {"@id": "1"}} or {"user": {"id": "1"}}—there's no single correct mapping.
An XML to JSON converter makes these decisions for you (typically prefixing attributes with @). A JSON to XML converter faces the reverse problem: should JSON properties become XML elements or attributes? Most converters default to elements, which is the safer choice.
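To make the ambiguity concrete, here is a small sketch built on Python's xml.etree.ElementTree. The element_to_dict helper and its attr_prefix parameter are hypothetical names of mine, and the mapping rules (repeated tags become lists, leftover text goes under "#text") are one convention among several, not a standard.

```python
import xml.etree.ElementTree as ET

XML = """<user id="1" active="true">
  <name>Alice Chen</name>
  <age>32</age>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
</user>"""

def element_to_dict(elem, attr_prefix="@"):
    """Convert an element to plain Python data (one possible convention)."""
    children = list(elem)
    if not children and not elem.attrib:
        return elem.text or ""          # leaf element: just its text
    node = {attr_prefix + name: value for name, value in elem.attrib.items()}
    for child in children:
        value = element_to_dict(child, attr_prefix)
        if child.tag in node:
            # Repeated tag: collect the values into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    if not children and elem.text and elem.text.strip():
        node["#text"] = elem.text       # element with attributes and text
    return node

root = ET.fromstring(XML)
with_prefix = {root.tag: element_to_dict(root)}
without_prefix = {root.tag: element_to_dict(root, attr_prefix="")}
```

Two things to notice. Both outputs are defensible, yet they have different keys ("@id" versus "id"). And age comes out as the string "32", because XML carries no types without a schema. A user with a single role would also produce a plain string instead of a list, which is a classic source of bugs downstream.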
YAML: Human-Friendly Configuration
YAML (YAML Ain't Markup Language) was designed to be the most human-readable data format. It uses indentation instead of brackets, supports comments, and can represent the same data structures as JSON with less visual noise.
# User configuration
name: Alice Chen
age: 32
active: true
roles:
  - admin
  - editor
address:
  city: Portland
  state: OR
YAML has become the dominant format for configuration: Docker Compose, Kubernetes manifests, GitHub Actions, Ansible playbooks, CI/CD pipelines, and application configs. If it's a DevOps tool, it probably uses YAML.
YAML's readability comes with traps:
- Indentation sensitivity. Mix tabs and spaces and your YAML is invalid. Use spaces only, and be consistent (2 or 4 spaces, not a mix). This is the single most common YAML error.
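- Implicit typing. YAML auto-detects types. Under YAML 1.1 rules, which many parsers (PyYAML among them) still follow, yes, no, true, false, on, and off are all parsed as booleans. Write country: NO (Norway's country code) and YAML reads it as false. To force a string, quote it: country: "NO". YAML 1.2 narrowed booleans to true and false, but you can't count on every tool having caught up.
- The Norway problem extends further. Version numbers like 1.0 become floats. Timestamps like 2026-04-13 become date objects. Under YAML 1.1, numbers with a leading zero like 0777 (Unix permissions) are read as octal, so you get the decimal number 511.
- Multi-line strings. YAML has two block styles for multi-line strings, literal (|) and folded (>), and each takes an optional chomping indicator (+ or -), which gives you variants like |+, |-, and >-. Each preserves or folds newlines differently.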
A YAML validator catches these issues before they break your deployment. Paste your YAML in, and it highlights syntax errors and potential type-coercion surprises. A YAML to JSON converter is also useful for debugging: convert your YAML to JSON to see exactly how the parser interprets your values. If Norway became false, you'll see it immediately.
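If you generate YAML from code, you can also guard against type coercion before the file is ever written. The sketch below is a rough lint, not a YAML parser: it uses the boolean spellings listed in the YAML 1.1 bool type, plus two loose patterns of my own for number-like and date-like values. The needs_quotes and emit helpers are hypothetical, individual parsers recognize slightly different sets, and the quoting here does not escape embedded quotes.

```python
import re

# Boolean spellings from the YAML 1.1 bool type definition.
YAML11_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
# Loose approximations, not the full YAML grammar.
LOOKS_NUMERIC = re.compile(r"^[-+]?(?:\d+\.?\d*|\.\d+)$")
LOOKS_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def needs_quotes(value: str) -> bool:
    """True if an unquoted scalar would likely not stay a string under YAML 1.1."""
    return bool(
        YAML11_BOOL.match(value)
        or LOOKS_NUMERIC.match(value)
        or LOOKS_DATE.match(value)
    )

def emit(key: str, value: str) -> str:
    """Write one key/value line, quoting the value only when it looks risky."""
    if needs_quotes(value):
        return f'{key}: "{value}"'
    return f"{key}: {value}"

lines = [
    emit("country", "NO"),
    emit("version", "1.0"),
    emit("released", "2026-04-13"),
    emit("mode", "0777"),
    emit("city", "Portland"),
]

# What a YAML 1.1 parser computes for an unquoted 0777.
octal_value = int("0777", 8)
```

Only the Portland line is written unquoted. A check like this is no substitute for a real validator, since it ignores null spellings, hex, and plenty of other cases, but it covers the traps listed above.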
Format Comparison at a Glance
| Feature | JSON | CSV | XML | YAML |
|---|---|---|---|---|
| Human readable | Good | Good (tabular only) | Fair (verbose) | Excellent |
| Nesting | Yes | No | Yes | Yes |
| Data types | 6 types | Strings only | Via schema | Auto-detected |
| Comments | No | No | Yes | Yes |
| Schema validation | JSON Schema | None standard | XSD, DTD | JSON Schema (via JSON) |
| File size | Moderate | Smallest | Largest | Moderate |
| Best for | APIs, web data | Spreadsheets, bulk data | Enterprise, documents | Configuration files |
Conversion Rules of Thumb
When converting between formats, keep these principles in mind:
- Flat to hierarchical is easy. CSV to JSON works cleanly when each row maps to one object. Just pick your column headers carefully.
- Hierarchical to flat loses information. A nested JSON object with three levels of depth can't be perfectly represented in a two-dimensional CSV. Something has to be flattened, concatenated, or dropped.
- Types get lost in CSV. The number 42 and the string "42" are indistinguishable in CSV. If types matter, validate after import.
- XML attributes have no JSON equivalent. Conversion tools use conventions (like an @ prefix) but these are tool-specific. Don't assume consistency between converters.
- JSON to YAML is lossless; YAML to JSON usually isn't. YAML 1.2 is a superset of JSON, so every valid JSON document is already valid YAML. Going the other way keeps the data in a typical config file, but comments disappear, anchors and aliases get expanded, and YAML-only features like custom tags or non-string mapping keys have no JSON equivalent.
- Always validate after converting. Run the output through a format-specific validator to catch issues the converter didn't flag.
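Here is what "validate after converting" can look like for the type-loss rule, as a minimal sketch with Python's csv module. The sample record, the EXPECTED mapping, and the type_mismatches helper are all hypothetical. In a real pipeline you would more likely reach for a schema validator.

```python
import csv
import io

original = [{"id": 42, "code": "42", "active": True, "note": None}]

# Round trip: Python objects -> CSV text -> Python objects.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(original[0]))
writer.writeheader()
writer.writerows(original)

restored = list(csv.DictReader(io.StringIO(buffer.getvalue(), newline="")))

# The types we expected to get back for each column.
EXPECTED = {"id": int, "code": str, "active": bool, "note": type(None)}

def type_mismatches(record, expected):
    """Return the keys whose values are not of the expected type."""
    return [key for key, kind in expected.items()
            if not isinstance(record[key], kind)]

before = type_mismatches(original[0], EXPECTED)
after = type_mismatches(restored[0], EXPECTED)
```

After the round trip every value is a string: the number and the string "42" are identical, True has become the text "True", and None has become an empty string. The check flags id, active, and note, which is your cue to coerce them back explicitly instead of hoping the importer guesses right.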
Data formats are tools, not religions. JSON for APIs, CSV for tabular exchange, XML for enterprise systems with strict schemas, YAML for configuration files you need to read and edit by hand. Know the right format for the job, understand what breaks during conversion, and keep a formatter bookmarked for the inevitable debugging session at 11 PM.