A validating parser is a software tool that checks an XML document for two levels of correctness: general syntax and adherence to a specific set of rules. It ensures data is structured properly so it can be exchanged between different systems without errors.
A validating parser is a program that reads XML code and verifies that it follows the "eXtensible Markup Language" rules. While all parsers check for "well-formedness" (basic syntax), a validating parser goes further by comparing the document against a Document Type Definition (DTD) or an XML schema.
What is a Validating Parser?
Parsers are the engines that read and interpret XML data. In technical environments, "correctness" comes in two degrees:
- Well-formedness: The document meets general XML rules, such as having a single root element and properly balanced tags.
- Validity: The document is well-formed and also conforms to a specific DTD or schema that defines which tags are allowed and where they can go.
A validating parser performs both checks. If you are using a validating parser, you must ensure the validation feature is turned on before checking your file.
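The first of the two checks can be illustrated with Python's standard library, whose xml.etree.ElementTree parser is non-validating: it reports well-formedness errors but never consults a DTD or schema. This is a minimal sketch. The helper name is_well_formed and the plan root element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents; <plan> is an illustrative root element.
balanced = "<plan><annuitantLastName>Smith</annuitantLastName></plan>"
unbalanced = "<plan><annuitantLastName>Smith</plan>"  # missing close tag
two_roots = "<plan/><plan/>"                          # more than one root

for sample in (balanced, unbalanced, two_roots):
    print(is_well_formed(sample), sample)
```

A document that passes this check is only well-formed. Whether it is valid still depends on a separate comparison against a DTD or schema.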
Why a Validating Parser matters
Using a validating parser reduces the risk of data rejection and improves system efficiency.
- Data Integrity: It ensures that information, such as financial or healthcare data, is structured exactly as the receiving system expects.
- Automated Rejection Prevention: Governments and agencies often use these tools to filter submissions. For instance, the Registered Plans Directorate will reject submissions with file format errors (Canada.ca).
- Extra Capabilities: Validating parsers can use DTD information to perform entity substitution and provide default attributes that the author might have omitted.
- Error Reporting: These tools provide specific feedback on where a document breaks the rules, making it easier to fix errors before going live.
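Entity substitution and attribute defaults can be seen in a small sketch. It uses Python's standard xml.etree.ElementTree, which is built on the non-validating expat parser but still reads declarations in a document's internal DTD subset. The submission element, the lang attribute, and the agency entity are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element, attribute, and entity names are illustrative.
DOC = """<!DOCTYPE submission [
  <!ENTITY agency "Canada Revenue Agency">
  <!ATTLIST submission lang CDATA "en">
]>
<submission>&agency;</submission>"""

root = ET.fromstring(DOC)

# The entity reference &agency; is replaced by its declared text.
print(root.text)

# The author never wrote lang="...", so the parser supplies the DTD default.
print(root.get("lang"))
```

The parser here uses the DTD for substitution and defaults only. It does not enforce the DTD's content rules, which is the part that requires a validating parser.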
How a Validating Parser works
The process follows a logical sequence to ensure the data is "clean" before it is processed by an application.
- Check Well-formedness: The parser scans for basic XML syntax like open/close tags and root elements.
- Locate Schema/DTD: The parser identifies the ruleset (schema) linked to the document.
- Compare Structure: It checks every tag against the schema. For example, it ensures a <annuitantLastName> tag does not exceed a 40-character limit.
- Confirm Constraints: It verifies that required tags are present and that data types (like numbers vs. text) are correct.
- Report Results: The parser either confirms the document is valid or lists specific errors for the user to fix.
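The sequence above can be sketched in a few lines of Python. This is not a real DTD or XML schema validator: the RULES table is a hand-written, hypothetical stand-in for the ruleset a real parser would load, and the plan root element is an assumption. Only the annuitantLastName tag and its 40-character limit come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a real DTD or XML schema.
RULES = {
    "root": "plan",
    "required": ["annuitantLastName"],
    "max_length": {"annuitantLastName": 40},
}


def validate(xml_text: str, rules: dict) -> list:
    """Return a list of error messages; an empty list means no errors found."""
    # Check well-formedness first; stop if the document cannot be parsed.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors = []
    # Compare structure: root name and allowed child element names.
    if root.tag != rules["root"]:
        errors.append(f"unexpected root element <{root.tag}>")
    allowed = set(rules["required"])
    for child in root:
        if child.tag not in allowed:
            errors.append(f"unknown element <{child.tag}>")
    # Confirm constraints: required elements and length limits.
    for name in rules["required"]:
        if root.find(name) is None:
            errors.append(f"missing required element <{name}>")
    for name, limit in rules["max_length"].items():
        for elem in root.iter(name):
            if len(elem.text or "") > limit:
                errors.append(f"<{name}> exceeds {limit} characters")
    # Report results to the caller.
    return errors


for message in validate("<plan><LastName>Smith</LastName></plan>", RULES):
    print(message)
```

Returning a list of messages rather than raising on the first problem mirrors the reporting step: the user sees every rule violation in one pass.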
Best practices
Turn on validation features. Many software parsers have validation turned off by default. Always verify it is active before running a check on critical files.
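As one illustration of why this matters, the sketch below asks a parser to switch validation on and checks whether the request was accepted. It uses Python's standard xml.sax module, whose default reader is built on expat and does not support validation, so the request is refused. The helper name can_enable_validation is an assumption made for this example.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_validation


def can_enable_validation(parser) -> bool:
    """Try to switch validation on; return True only if the parser accepts it."""
    try:
        parser.setFeature(feature_validation, True)
    except (SAXNotSupportedException, SAXNotRecognizedException):
        return False
    return bool(parser.getFeature(feature_validation))


parser = xml.sax.make_parser()  # expat-based reader by default
print("validation available:", can_enable_validation(parser))
```

If a check like this reports that validation is unavailable, the parser can still catch well-formedness errors, but a separate validating parser is needed for schema or DTD checks.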
Test before submission. Do not rely on the receiving party to test your files. Use the same XML schema provided by the agency (like the CRA) to self-test your data locally.
Use online tools for hand-edited code. If you are editing XML by hand, use web-based syntax checkers. These tools allow you to upload a file or paste text to see immediate errors.
Verify across multiple parsers. Because different parsers might interpret the XML Recommendation with slight variations, run a document through more than one parser to be certain of its correctness (XML.com).
Common mistakes
Mistake: Assuming a "well-formed" document is "valid." Fix: Always check against a schema. A document can have perfect tags but still be invalid if it uses a tag name the schema does not recognize.
Mistake: Using a non-validating parser for official data transfers. Fix: Use a dedicated validating parser. Non-validating parsers may ignore a DTD even if one is present.
Mistake: Leaving prohibited content in element values. Fix: Ensure the text inside a tag does not include extra spaces, initials, or titles (like Mr. or Mrs.) if the schema constraints forbid them.
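A local pre-check for this last mistake might look like the sketch below. The patterns are hypothetical examples of such constraints, not rules taken from any agency's schema, and the function name find_value_problems is an assumption.

```python
import re

# Hypothetical constraints for illustration; a real schema defines its own.
TITLE_PATTERN = re.compile(r"^(mr|mrs|ms|dr)\.?\s", re.IGNORECASE)
INITIAL_PATTERN = re.compile(r"\b[A-Za-z]\.")


def find_value_problems(value: str) -> list:
    """Return human-readable problems found in a last-name value."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing spaces")
    if TITLE_PATTERN.match(value.strip()):
        problems.append("title prefix")
    if INITIAL_PATTERN.search(value):
        problems.append("initial")
    return problems


for candidate in ("Smith", " Smith ", "Mrs. Smith", "J. Smith"):
    print(repr(candidate), find_value_problems(candidate))
```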
Examples
Example Scenario: Tax Reporting
A tax professional needs to submit a list of retirement savings accounts. The government provides an XML schema specifying that the "Last Name" field must use the exact tag name <annuitantLastName>.
- Valid Input: <annuitantLastName>Smith</annuitantLastName> (Correct tag name and data).
- Invalid Input: <LastName>Smith</LastName> (Well-formed, but invalid because the schema requires "annuitantLastName").
Example Scenario: Browser Differences
Web browsers use different internal parsers to read data. Microsoft Internet Explorer historically used a parser in MSXML.DLL, while the Mozilla browser used a C-based parser called "expat."
Validating Parser vs. Non-validating Parser
| Feature | Validating Parser | Non-validating Parser |
|---|---|---|
| Goal | Ensure correctness against a schema/DTD. | Ensure general XML syntax is correct. |
| Well-formedness | Always checked. | Always checked. |
| Compliance | Checks specific tag names and order. | Does not check tag names/order. |
| Attribute Defaults | Can supply default values from DTD. | Usually ignores these. |
| Speed | Slower (more checks). | Typically faster. |
FAQ
Which XML parser should I choose?
If you are simply browsing or editing XML with a GUI-based editor, the choice is usually made for you by the application developer. For example, Internet Explorer historically used MSXML.DLL. However, if you are developing software or submitting data to an agency, you must choose a parser that supports validation and fits your programming language (like Perl, Java, or C). By 1998, there were over 200 parsers available (XML.com).
What happens if my XML is not valid?
The results depend on who is receiving the data. In many cases, such as filing with the Canada Revenue Agency, the entire submission will be rejected, and the records will be considered "not filed" due to format errors.
Can a non-validating parser still use a DTD?
Yes, it is possible. A non-validating parser may make use of some DTD information, but it is not required to do so. Only a validating parser is guaranteed to enforce the DTD rules.
How do I test a document if I don't have a standalone parser?
You can use online validation services, such as the W3C Markup Validation Service or the Brown University STG parser. These allow you to enter a URL, upload a file, or paste XML code directly into the browser to check for errors.
Is speed a concern with validating parsers?
Speed is rarely a concern for documents with only a few hundred elements. It becomes a factor as document size grows or when you are processing thousands of files at once. In those cases, the size and efficiency of the parser code matter more.