HTML is fundamental to the World Wide Web and it is the format used for every website. The term “HTML” stands for hypertext markup language. You might be surprised to learn that HTML isn’t a programming language. In this article, you will find out exactly what HTML is and how it works.

The HTML standard

HTML is what is known as a “standard.” This means that it establishes a common method of writing Web pages, and Web browsers are built to follow it. It doesn’t require a special editor: it is plain text, so it can be written in any text editor, such as Notepad.

The HTML standard can be accessed by anyone. It is owned by a non-profit organization, but there is no charge for a copy of the standard. This easy access was vital for the growth of the World Wide Web. Because anyone is entitled to write a Web page, content for the Web was created much faster than it otherwise would have been.

At the time of writing, there are about 2 billion websites in the world, but only about 400 million of those are active. All of them are written in HTML. While other formats exist for useful and distinct purposes, such as XBRL for digital business reporting, HTML is the standard for Web pages and has been for quite some time.

HTML history

HTML has been around for a long time now. It was created in 1990 by Sir Tim Berners-Lee, the inventor of the World Wide Web. He didn’t have a knighthood back then; he was knighted by Queen Elizabeth II in 2004. In 1990, Berners-Lee was working at CERN, the European Organization for Nuclear Research, which is based in Geneva, Switzerland.

Berners-Lee has a very strong IT pedigree. His father was Conway Berners-Lee, who worked on the team that built the Ferranti Mark 1, the world’s first commercially available stored-program computer. His mother was Mary Lee Woods, one of the world’s first computer programmers.

Berners-Lee went to work at CERN as a consultant in 1980 and, while there, developed a computer-based system for sharing and linking information, based on hypertext. This system was called ENQUIRE and it was a forerunner of the Web. He kept evolving the idea and turned it into a proposal for CERN in 1990. This eventually emerged as the WorldWideWeb project and its pages were written in HTML. The first document describing the language was published as HTML Tags in 1991.

What is hypertext?

The term “hypertext” refers to text that contains links to other documents, like the links that you see on every Web page. A link, which is also called a hyperlink, usually appears as an underlined block of text, and the words are usually colored blue. Blue, underlined text has become the convention for links on the Web. However, links don’t have to be colored blue or underlined.

A link is composed of that visible text, which is called the anchor text, plus a Web page address that is associated with it. In World Wide Web terminology, a Web page address is called a Uniform Resource Locator, or URL.

For example, a hypertext link that displays the text “Chamber of Commerce” is written in an HTML document as:

<a href="https://www.chamberofcommerce.org/">Chamber of Commerce</a>

The href attribute can contain an absolute address, which starts with “https://,” as shown above, or a relative address, which expresses a location as arrived at from the directory that holds the file for the HTML page.
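The difference between the two kinds of address can be sketched as follows. The file and directory names in the relative links are hypothetical and are only there to illustrate the pattern.

```html
<!-- Absolute address: points to the same place from any page on any site -->
<a href="https://www.chamberofcommerce.org/">Chamber of Commerce</a>

<!-- Relative addresses: resolved from the directory that holds the current page.
     contact.html, guides/ and index.html are made-up names for illustration. -->
<a href="contact.html">A page in the same directory</a>
<a href="guides/html.html">A page in a subdirectory</a>
<a href="../index.html">A page one directory up</a>
```

Relative addresses are convenient for links within one site, because they keep working if the whole site is moved to a different domain.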

What is markup?

Markup is a term used in the publishing industry. Marking up a page involves leaving handwritten notes in the margins to give instructions to the printer over typesetting. Graphic designers and proofreaders also “mark up” documents.

This is why HTML is not a programming language. It describes the structure and layout of a Web page rather than giving the computer a sequence of steps to carry out. Its codes are written as tags in angle brackets (“<”, “>”). A pair of tags, together with the content between them, is called an HTML element, and elements can have attributes. Some elements are semantic, which means that they describe the role of the text they contain, such as a heading or a paragraph. Others give formatting instructions, such as bold, italic, and underline.

How does HTML work?

The most important part of making an HTML page accessible to the general public is the Web browser, such as Chrome or Firefox. The Web browser is an interpreter, or a reader, for HTML documents. This is very similar to the way you need Adobe Reader in order to view a PDF file. In both cases, the source file is just a confusing file full of codes. The reader uses those codes as instructions and assembles a visible page from them.

You probably already know that the Web browser doesn’t usually run on the same computer that stores the HTML file. Those files usually have the extension .htm or .html. The computer that stores those files is called a server and the software that manages the distribution of requested files is called a Web server.

Communication between a Web browser and a web server is conducted with another protocol, which is called the Hypertext Transfer Protocol (HTTP). The server sends the code for a requested Web page in a stream – it doesn’t send the file itself, but the contents. When the Web browser receives that HTML document, it renders it into a visible page.

Once the browser has presented the page, it waits. The page remains in the main panel of the browser and the Web browser doesn’t fetch anything else until it is triggered into action. When the user clicks on a link in a page, the associated URL gets copied up into the address bar of the browser and that prompts it to fetch a new page.

Web page contents

Not all of the code that creates the final Web page will arrive in the first Web server response to a request for that page. Just as the HTML document contains links to other pages, it also contains references to other files. Each of these references triggers the browser to go and get the contents of that file and read it into that point of the page.

Examples of content that doesn’t arrive with the initial HTML document are images and graphics, which are stored in separate files. Those widgets that let you like a page on social media are also read in during a later phase of composing the page.

Some Web pages are designed to contain panels, called frames, which are read in from different files. Frames can also include frames and those frames might contain calls to other files, such as images. So, as the Web browser works through the code that it receives for a page, it has to send out requests for many other files, which could all be stored on different servers in different parts of the world.
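As a rough sketch of this, the page below arrives as one HTML document, but the browser has to make two further requests before it can finish composing it. The file path and the example.com address are placeholders, and the sketch assumes an inline frame (iframe), which is the form of frame that HTML5 supports.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Page with separate resources</title>
</head>
<body>
<p>This text arrives with the HTML document itself.</p>

<!-- The image is stored in its own file and fetched with a separate request -->
<img src="pics/smiley_face1.jpg" alt="Be happy" width="500" height="600">

<!-- The frame is filled from another HTML document, possibly on another server -->
<iframe src="https://example.com/panel.html" title="Embedded panel" width="400" height="300"></iframe>
</body>
</html>
```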

Which version of HTML?

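There are many different versions of HTML. For most of its history, the definitive definition of HTML was owned and managed by the World Wide Web Consortium (W3C), which decided when to commission a new version. Since 2019, the specification has been maintained as a continuously updated “Living Standard” by the Web Hypertext Application Technology Working Group (WHATWG), with the W3C’s endorsement.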

The major versions of HTML are HTML 2.0, published in 1995; HTML 3.2, published in 1997; HTML 4.0, published in December 1997 and reissued with minor revisions in April 1998; and HTML 4.01, published in December 1999. HTML5, the latest major version of HTML, was published in 2014.

If you are starting to write a Web page, all the guidance that you encounter will be about HTML5. Most of the Web pages you encounter today will be written in HTML5. You can tell which version of HTML a page has been written in by looking at the first line of the HTML page.

The first line in an HTML5 page is:

<!DOCTYPE html>

There are three types of HTML 4.01. The first line of HTML 4.01 Strict reads:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

The first line of HTML 4.01 Transitional reads:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

The first line of HTML 4.01 Frameset reads:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

Creating an HTML page

This guide is not able to present a full tutorial of HTML. However, you will learn how to read the code and how to structure your first HTML document.

You now know what the first line of your HTML page will be: <!DOCTYPE html>.

An important convention of HTML is that the tags that indicate each element work like brackets. There is an opening tag, which contains the name of the element, and a closing tag, which has the same name preceded by a forward slash (“/”). It is usual to put the opening tag and the closing tag on separate lines at the same indentation, with everything between them on the intervening lines and indented further.

The HTML document itself begins with <html> and ends with </html>. As everything in the HTML page will be within the <html> element, most programmers don’t bother indenting the first level within it.

An HTML document has two sections: a header and a body. Many programmers don’t bother indenting within the first level of these tags either. Placing elements inside other elements is called nesting, and indentation makes the nesting visible. A typical HTML document can require many levels of nesting, so some lines can be very deeply indented. This is why it is a good idea to avoid this initial indentation.

Each programmer quickly decides on their own indentation styling. Usually, it is two, three, or four spaces and that becomes instinctive and forms part of a signature of a programmer.

The header is identified by <head> and </head> tags and the body is indicated by <body> and </body> tags.

So, the outline of your HTML page will look like this:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
</body>
</html>

The HTML header

The top of an HTML page contains the header, which holds meta information about the page. This section is also where variables, constants, and functions are declared. Variables, constants, and functions sound like parts of a program. They are.

Although HTML is not a programming language, an HTML page can include small programs, made up of scripting. These programs are written in JavaScript and inserted into the document. By declaring a JavaScript function in the header, you can use that function in the body section of the page.
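A minimal sketch of this arrangement is shown below. The function name showGreeting and the id greeting are made up for the illustration, and a button click is just one of several ways that the body could call the function.

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Script example</title>
<script>
  // Declared in the header; nothing happens until the body calls it.
  // showGreeting is a hypothetical name chosen for this example.
  function showGreeting() {
    document.getElementById("greeting").textContent = "Hello from JavaScript";
  }
</script>
</head>
<body>
<p id="greeting">Waiting...</p>
<!-- Clicking the button calls the function that was declared in the header -->
<button onclick="showGreeting()">Show greeting</button>
</body>
</html>
```

When the button is clicked, the function replaces the text of the paragraph, which shows a program running inside a page that is otherwise plain markup.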

Although it is possible to write a Web page in HTML by hand in a text editor, there are now many Web page generators. Even word processors, such as Word, can easily generate a page of HTML from a standard page of text. HTML generators insert a lot of code and system declarations in the header of the new page.

One of the first lines that you will usually see in a header is <meta charset="UTF-8">. <meta> is one of the few tags that does not need a closing tag. Two other useful lines to put in are the title of the page and a description.

So, if your HTML page has a header at all, the minimum that it will have is:

<head>
<meta charset="UTF-8">
<title>Information about HTML</title>
<meta name="description" content="Details about HTML, where it comes from, and what HTML codes mean.">
</head>

The format of the two meta tags shown above illustrates that tags can have attributes. Different tags allow different attributes. The format of an attribute is the attribute name, followed by an equals sign, followed by a value in quotes. HTML5 tolerates unquoted values that contain no spaces, but it is safest to quote every value, even a number.

Nothing that is written in the header section will ever be visible on the rendered Web page unless it is contained in a function that is called in the body of the HTML document.

The HTML body

Everything that appears on a Web page is written in the body section of the HTML document. You can take a look at the HTML that creates a Web page by pressing Control-U while looking at that page. This opens a new tab with all of the HTML from start to finish, including the header and the body of the HTML page.

The main text on any page is made up of HTML elements. This is the name for the building blocks of HTML that are implemented with HTML tag syntax. Most elements have attributes.

Here is an example of a widely-used element and its attributes:

<img src="https://blahdeblahblah.com/pics/smiley_face1.jpg" alt="Be happy" width="500" height="600">

This element creates a space for an image and tells the browser to load an image into it. The attributes here are src, alt, width, and height. As has already been explained, the values for attributes are written in quotes. The img tag has a dozen or so attributes of its own. It can also contain what are known as global attributes and event attributes.

The <img> tag is unusual because, like the <meta> tag that you have already seen, it doesn’t require a closing tag.

Inline, block, and empty elements

There are three types of elements:

  • Block, also called “block-level”
  • Inline
  • Empty

Here are more details about each of these element types.

Block elements

A block element is like a paragraph. It surrounds a number of lines that each present another element. So, elements can be nested. A block element can contain other block elements and also inline elements.

Here is an example of an often-used block element:

<ul>
  <li>Skip</li>
  <li>Hop</li>
  <li>Jump</li>
</ul>

The <ul> tag creates an unordered list – such as bullet points.

The unordered list needs to contain items. These are denoted by the <li> tag, which indicates list items. <li> is also a block element.

It is standard practice to show a block element on a line by itself. As you can see in the example, it is common to indent the elements that are bracketed by an outer element.
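Nesting can go several levels deep. The sketch below, which uses an invented “Warm-up” item, places a second list inside a list item and indents each level a little further, so that the structure can be read at a glance.

```html
<ul>
  <li>Warm-up
    <!-- A block element (ul) nested inside another block element (li) -->
    <ul>
      <li>Skip</li>
      <li>Hop</li>
    </ul>
  </li>
  <li>Jump</li>
</ul>
```

A browser would typically show “Skip” and “Hop” as sub-points, indented beneath “Warm-up.”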

Inline elements

An inline element does not occupy a line by itself. Instead, it interjects into the text of another element. Typically, inline elements implement formatting. Take a look at the example below, which shows a block element, containing an inline element.

<p>The border of the skirt was <i>scalloped</i> by a special method of stitching.</p>

In this example, the <p> tag represents the instruction to create a paragraph. The inline element, <i>, creates italic formatting for the enclosed text. An inline element cannot contain a block element, but it can contain another inline element.
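As a short illustration of one inline element inside another, the line below wraps an italic element in a bold element. The wording of the sentence is only an example.

```html
<!-- <b> and <i> are both inline elements, so one may contain the other.
     The inner element is closed before the outer one. -->
<p>The border of the skirt was <b><i>scalloped</i></b> by a special method of stitching.</p>
```

The word “scalloped” would be shown in bold italics, while the rest of the paragraph keeps its normal formatting.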

Empty elements

An empty element is a special type of inline element. It doesn’t have a corresponding closing tag, so it doesn’t contain any other object. Here is an example:

<p>Do not pass go. <br>
Do not collect 100 pounds.</p>

The <br> element creates a line break without adding a space for a new paragraph. It needs to be contained within a block element that contains text, such as the paragraph element (<p>) as shown above.

HTML entities

HTML has a number of reserved characters. This presents a problem if you actually want to use one of those characters. HTML entities resolve this issue.

Consider the following HTML:

<p>You can create bold text by using the <b> inline element. Be careful to close off that tag at the end of the passage that you want to make bold, otherwise, all of the remaining text on the page will be made bold.</p>

The Web browser would treat the <b> in that sentence as a tag, leave it out of the visible text, and show everything after it in bold:

You can create bold text by using the inline element. Be careful to close off that tag at the end of the passage that you want to make bold, otherwise, all of the remaining text on the page will be made bold.

This is where an HTML entity comes in useful. A character entity can be represented by an entity name or an entity number. The format for an entity name begins with “&” and the entity number begins with “&#.” Entity names and entity numbers are terminated by a semicolon (“;”).

For example, the entity name for “<” is &lt; and &#60; is its entity number. So, in order to write the above passage in HTML without creating formatting problems, you would write:

<p>You can create bold text by using the &lt;b&gt; inline element. Be careful to close off that tag at the end of the passage that you want to make bold, otherwise, all of the remaining text on the page will be made bold.</p>

The browser would render that passage as:

You can create bold text by using the <b> inline element. Be careful to close off that tag at the end of the passage that you want to make bold, otherwise, all of the remaining text on the page will be made bold.
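A few other commonly needed entities follow the same pattern. The sketch below shows each one written as an entity name and as an entity number. The selection is illustrative, not a complete list.

```html
<p>Less than: &lt; or &#60;</p>
<p>Greater than: &gt; or &#62;</p>
<p>Ampersand: &amp; or &#38;</p>
<p>Double quote: &quot; or &#34;</p>
<p>Copyright sign: &copy; or &#169;</p>
```

In each line, both forms produce the same character on the rendered page.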

HTML color names

One of the global attributes of HTML is style. A global attribute is an attribute that can be included in most tags. The style attribute allows you to specify a color.

There are a number of pre-defined named colors in the HTML specification. It is also possible to specify a color with a HEX number or with RGB, HSL, RGBA, or HSLA values.

There are 140 HTML color names, a list that is too long to include here. However, you can see the full list of HTML colors on the W3C website.
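As a sketch of these different notations, each paragraph below asks for the same pure blue through the style attribute. The last two add an alpha value of 0.5, chosen arbitrarily here, which makes the text half transparent.

```html
<p style="color: blue;">Named color</p>
<p style="color: #0000ff;">HEX number</p>
<p style="color: rgb(0, 0, 255);">RGB value</p>
<p style="color: hsl(240, 100%, 50%);">HSL value</p>
<p style="color: rgba(0, 0, 255, 0.5);">RGBA value, half transparent</p>
<p style="color: hsla(240, 100%, 50%, 0.5);">HSLA value, half transparent</p>
```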

Cascading style sheets

A cascading style sheet (CSS) is a way to create custom layout formats for your Web page. This system allows you to change the standard colors used for text, change the fonts, specify borders and menu styles, and create consistent livery throughout the site. If you create a CSS, the Web browser will use that instead of its default style palette.

There are three ways to create a CSS:

  • Inline
  • Internal
  • External

The inline method uses the style global attribute at the point that you create a piece of text, such as:

<p style="color:blue;">Look at this blue text.</p>

The internal method is implemented in the header of the HTML page. In this strategy, you set up a series of styles in a similar manner to the creation of a JavaScript function.

<head>
<style>
  body {background-color: lime;}
  h1   {color: yellow;}
  p    {color: blue;}
</style>
</head>

With the above example, the page will have a lime background, every paragraph will have blue text, and every h1 heading will have yellow text.

The third method is the most manageable if you want to create a consistent image across all of your Web pages. This is the external CSS system, which writes all styles to a separate file. This file can then be called in the header section of every page that you create:

<head>
<link rel="stylesheet" href="my_styles.css">
</head>

XHTML

HTML is not the only digital markup language in circulation. It isn’t even the only markup language to come out of the World Wide Web Consortium. The most widely implemented alternative to HTML is XHTML, which is a reformulation of HTML that follows the stricter syntax rules of XML. You can read more about XML and XHTML on other pages of this site.