HTML (HyperText Markup Language) is the standard markup language for documents displayed in web browsers. It defines the content and structure of web pages using a system of tags and elements. Search engines and screen readers parse this markup to understand your page hierarchy, making clean, semantic HTML essential for visibility and accessibility.
What is HTML?
HTML is the foundational code that tells browsers how to display content. Tim Berners-Lee began developing it at CERN in 1989, and it has since evolved from a simple document-sharing system into the structural backbone of the modern web.
The language uses elements marked by tags (names in angle brackets, such as <p>) to annotate text, images, and media. These elements create semantic structure that browsers render into visible pages. HTML5, published as a W3C Recommendation on 28 October 2014, expanded the language with native video and audio, plus semantic elements like <article> and <section>.
On 28 May 2019, the W3C announced that WHATWG would become the sole publisher of the HTML and DOM standards, shifting to a "Living Standard" model that updates continuously rather than through versioned releases.
Why HTML matters
- Search indexing: Search engine spiders rely on semantic HTML structures to rate the significance of text they find and index.
- Accessibility: Semantic markup helps screen readers and audio browsers interpret document structure without wasting time on irrelevant information.
- Rendering consistency: A valid doctype declaration triggers standards mode, preventing browsers from reverting to quirks mode that displays pages inconsistently.
- Content hierarchy: Heading tags (<h1> through <h6>) denote structural importance, helping algorithms understand topic relevance and page organization.
- Cross-platform delivery: HTML documents are sent over HTTP from servers to browsers and render as multimedia web pages across devices.
How HTML works
- Request: The browser requests an HTML document from a web server via HTTP.
- Parsing: The browser reads the markup and builds the Document Object Model (DOM), interpreting tags to structure content.
- Rendering: The browser converts the DOM into a visible page, applying default styles unless overridden by CSS.
- Execution: JavaScript embedded in or linked to the HTML executes to modify behavior and content dynamically.
The browser does not display raw tags. It uses them to embed images via <img>, create hyperlinks with <a>, and format text according to element semantics such as <strong> for importance or <em> for emphasis.
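The parsing step can be illustrated with a minimal sketch built on Python's standard-library html.parser module, which turns markup into a simplified tree of element nodes. It is a toy under stated assumptions, not how browsers work: a real browser follows the WHATWG parsing algorithm, which also repairs invalid markup and inserts implied elements such as <html>, <head>, and <body>. The Node, TreeBuilder, and outline names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML syntax.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal stand-in for a DOM element node (hypothetical, not the real DOM API)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # child Node objects or text strings


class TreeBuilder(HTMLParser):
    """Builds a simplified tree from tag events; it does not repair invalid markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # the element new content is appended to

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # descend until the matching end tag

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # drop whitespace-only text between tags
            self.current.children.append(text)


def outline(node, depth=0):
    """Return the tree below node as indented lines of text."""
    lines = []
    for child in node.children:
        if isinstance(child, str):
            lines.append("  " * depth + repr(child))
        else:
            lines.append("  " * depth + f"<{child.tag}>")
            lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed('<body><h1>Title</h1><p>Text with <a href="page.html">a link</a>.</p></body>')
builder.close()
print("\n".join(outline(builder.root)))
# <body>
#   <h1>
#     'Title'
#   <p>
#     'Text with'
#     <a>
#       'a link'
#     '.'
```

The printed outline shows the nesting the tags describe: the link sits inside the paragraph, which sits inside the body, while the tags themselves never appear as visible text.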
Types of HTML
| Version | Description | Status |
|---|---|---|
| HTML 4.01 | Previous standard with Strict, Transitional, and Frameset variants, published on 24 December 1999 | Legacy |
| HTML5 | Major revision adding native multimedia and semantic markup, published as a W3C Recommendation on 28 October 2014 | Superseded by the Living Standard |
| XHTML | XML-based reformulation requiring stricter syntax, now referred to as the XML syntax for HTML | Legacy (XML syntax still supported) |
| Living Standard | Continuously updated specification maintained by WHATWG; the sole authoritative HTML standard since 28 May 2019 | Current |
HTML 2.0, published as RFC 1866 on 24 November 1995, was the first standardized version of HTML intended for implementation.
Best practices
Validate your markup. Use the W3C Markup Validation Service to catch errors such as a missing doctype (which triggers quirks mode), unclosed tags, or improper nesting that can confuse crawlers.
Use semantic HTML5 elements. Where they describe the content, replace generic <div> containers with <header>, <nav>, <main>, <article>, and <footer> to clarify page structure for search engines and assistive technologies.
Write descriptive alt text. Always include the alt attribute on <img> tags so search engines can index the image, browsers can show a text substitute when it fails to load, and screen readers can describe it to visually impaired users.
Maintain heading hierarchy. Use one <h1> per page for the main topic, followed by sequential <h2> through <h6> tags for subsections. Do not skip levels.
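The heading rules above are mechanical enough to check automatically. Below is a minimal sketch using Python's html.parser, assuming the rules exactly as stated here (exactly one <h1>, no skipped levels when going deeper); HeadingCollector and heading_problems are hypothetical names, not part of any validator.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of each <h1>-<h6> start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of heading-hierarchy problems (empty if none are found)."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels

    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one <h1>, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        # Moving deeper by more than one level skips a level;
        # moving back up (for example <h3> to <h2>) is fine.
        if cur > prev + 1:
            problems.append(f"<h{prev}> is followed by <h{cur}>, skipping a level")
    return problems


print(heading_problems("<h1>Guide</h1><h2>Setup</h2><h4>Details</h4>"))
# ['<h2> is followed by <h4>, skipping a level']
```

Only start tags are inspected, so the check works on fragments as well as full pages.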
Minimize presentational markup. Avoid obsolete tags like <font> and <center>, and control appearance through CSS rather than presentational HTML attributes.
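An audit for leftover presentational markup can be sketched with Python's html.parser. The OBSOLETE_TAGS and OBSOLETE_ATTRIBUTES sets are a small illustrative sample of features the HTML Living Standard lists as obsolete, not the full list; PresentationalMarkupFinder and find_presentational_markup are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser

# A small, non-exhaustive sample of presentational features that the
# HTML Living Standard lists as obsolete.
OBSOLETE_TAGS = {"big", "center", "font", "marquee", "strike", "tt"}
OBSOLETE_ATTRIBUTES = {"align", "bgcolor"}


class PresentationalMarkupFinder(HTMLParser):
    """Counts obsolete presentational tags and attributes."""

    def __init__(self):
        super().__init__()
        self.found = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in OBSOLETE_TAGS:
            self.found[f"<{tag}>"] += 1
        for name, _value in attrs:
            if name in OBSOLETE_ATTRIBUTES:
                self.found[f"{name} attribute"] += 1


def find_presentational_markup(html):
    """Return a mapping from each obsolete feature found to its count."""
    finder = PresentationalMarkupFinder()
    finder.feed(html)
    finder.close()
    return dict(finder.found)


print(find_presentational_markup('<center><font color="red">Sale</font></center>'))
# {'<center>': 1, '<font>': 1}
```

Each hit is a candidate for a CSS rule, for example text-align: center in place of <center>.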
Common mistakes
Mistake: Omitting the doctype declaration. Without <!DOCTYPE html>, browsers fall back to quirks mode, which emulates legacy rendering behavior and produces inconsistent layouts across browsers.
Fix: Place <!DOCTYPE html> on the very first line of every HTML document.
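A build step can verify this fix with a simple check. The sketch below assumes a deliberately strict rule: the document must open with the modern <!DOCTYPE html> (case-insensitive, optionally preceded by whitespace or a byte order mark). The specification also permits a comment before the doctype, which this sketch would reject; starts_with_html5_doctype is a hypothetical helper.

```python
import re

# The modern doctype, case-insensitive, optionally preceded by whitespace.
# This deliberately rejects legacy doctypes such as HTML 4.01 Transitional,
# some of which put browsers in quirks or limited-quirks mode.
HTML5_DOCTYPE = re.compile(r"\s*<!doctype\s+html\s*>", re.IGNORECASE)


def starts_with_html5_doctype(html):
    """Return True if the document begins with <!DOCTYPE html>."""
    # Ignore a leading UTF-8 byte order mark, which may precede the doctype.
    return HTML5_DOCTYPE.match(html.lstrip("\ufeff")) is not None


print(starts_with_html5_doctype('<!DOCTYPE html>\n<html lang="en"></html>'))  # True
print(starts_with_html5_doctype("<html><head></head></html>"))  # False
```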
Mistake: Using tables for layout. Tables increase file size, complicate maintenance, and confuse screen readers when used for page layout rather than tabular data.
Fix: Use CSS for layout; reserve <table> elements for actual data presentation.
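Mistake: Missing alt attributes on images. Omitting alt, or leaving it empty on images that convey information, reduces image search visibility and fails accessibility standards.
Fix: Write concise, descriptive alt text for every informative image: <img src="photo.jpg" alt="Red running shoes on white background">. Use an empty alt="" only for purely decorative images, so screen readers skip them.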
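Missing alt attributes are easy to find mechanically. The sketch below, using Python's html.parser, flags only images with no alt attribute at all; it treats alt="" as present, because that is the correct value for decorative images, so it cannot tell whether an empty or vague alt on an informative image is appropriate. That judgment still needs a human. AltChecker and images_missing_alt are hypothetical names.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # alt="" counts as present: it is the correct value for decorative images.
            if "alt" not in attributes:
                self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src of each <img> lacking an alt attribute."""
    checker = AltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


page = ('<img src="photo.jpg" alt="Red running shoes on white background">'
        '<img src="divider.png" alt="">'
        '<img src="chart.png">')
print(images_missing_alt(page))  # ['chart.png']
```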
Mistake: Improper tag nesting. Overlapping tags like <strong><em>text</strong></em> create invalid markup; parsers must silently repair it, and the repaired structure may not match what you intended.
Fix: Close tags in reverse order of opening: <strong><em>text</em></strong>.
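Closing tags in reverse order is exactly what a stack enforces: each end tag must close the element on top of the stack. Below is a minimal sketch with Python's html.parser; NestingChecker and nesting_errors are hypothetical names. It is stricter than HTML itself, which allows some end tags (such as </li> and </p>) to be omitted, so it reports those as unclosed, and it only reports errors rather than reproducing how browsers repair bad nesting.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML syntax.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Tracks open elements on a stack and reports mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of currently open elements, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # In HTML a trailing slash on a non-void element is ignored,
        # so "<div/>" still opens a <div>; treat it as a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # correctly nested: close the innermost element
        elif tag in self.stack:
            self.errors.append(f"</{tag}> closed before </{self.stack[-1]}>")
            # Recover by discarding everything opened after the matching start tag.
            while self.stack.pop() != tag:
                pass
        else:
            self.errors.append(f"</{tag}> has no matching start tag")


def nesting_errors(html):
    """Return mismatched-tag errors, then any elements left unclosed."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"<{tag}> is never closed" for tag in reversed(checker.stack)]
    return checker.errors + unclosed


print(nesting_errors("<strong><em>text</strong></em>"))
# ['</strong> closed before </em>', '</em> has no matching start tag']
print(nesting_errors("<strong><em>text</em></strong>"))  # []
```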
Mistake: Presentational HTML. Using <b> or <i> merely to make text bold or italic, when you mean importance or emphasis, creates ambiguity for non-visual user agents.
Fix: Use <strong> for importance and <em> for stress emphasis, and apply purely visual styling via CSS.
Examples
Basic page structure:
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Page Title</title>
    <meta name="description" content="Page description for search results">
  </head>
  <body>
    <h1>Main Heading</h1>
    <p>Paragraph text with <a href="page.html">a descriptive link</a>.</p>
    <img src="image.jpg" alt="Descriptive alt text">
  </body>
</html>
```
Semantic article markup:
```html
<article>
  <header>
    <h1>Article Title</h1>
  </header>
  <section>
    <p>Content text here</p>
    <img src="diagram.png" alt="Chart showing quarterly growth">
  </section>
  <footer>
    <p>Author information</p>
  </footer>
</article>
```
Properly nested list:
```html
<ul>
  <li>First item</li>
  <li>Second item with <em>emphasized text</em></li>
  <li>Third item</li>
</ul>
```
FAQ
What is the difference between HTML and HTML5?
HTML5 is the major revision of HTML published as a W3C Recommendation on 28 October 2014; the name is still widely used for modern HTML, which the WHATWG Living Standard now defines. It introduced semantic elements like <article> and <nav>, native multimedia support without plugins using <video> and <audio>, and the <canvas> element for graphics. Previous versions relied on presentational markup and required plugins for video content.
Is HTML a programming language?
No. HTML is a markup language, not a programming language. It annotates content structure using tags but does not contain logic, variables, or control flow. You use it to organize and format documents, while programming languages like JavaScript handle computational behavior and interactive functionality.
What is semantic HTML and why does it matter for SEO?
Semantic HTML uses elements that convey meaning about the content, such as <header>, <main>, and <footer>, rather than generic containers like <div>. Search engine spiders use these tags to understand page structure and rate the significance of text for indexing purposes. This practice also improves accessibility for screen readers.
How do I check if my HTML is valid?
Use the W3C Markup Validation Service, or inspect the parsed DOM in your browser's developer tools. Validators check your code against the formal standard, catching problems such as a missing doctype (which triggers quirks mode), unclosed tags, or improper nesting that could interfere with indexing. Valid HTML makes rendering more consistent across browsers.
What is the purpose of the doctype declaration?
The doctype (<!DOCTYPE html>) triggers standards mode in browsers, ensuring they render the page according to modern specifications. Without it, browsers may revert to quirks mode, which emulates older rendering behaviors and can cause layout inconsistencies.
Should I use .html or .htm file extensions?
Either works, but .html is the modern standard. The .htm extension originated from early operating systems that limited file extensions to three characters. Most web servers treat them identically, but consistency helps with URL management.
How does HTML work with CSS and JavaScript?
HTML provides the document structure and content. CSS handles presentation, layout, and styling separately. JavaScript adds interactivity by manipulating the HTML DOM. The three technologies work together: HTML contains the markup, CSS styles it, and JavaScript affects behavior and content dynamically.