CGI: Making web applications like it’s the 90s

authors:

~xwindows



Web programming today is a mess: a gazillion frameworks and libraries piled on top of each other, runaway complexity everywhere, and the whole setup teetering closer to a house of cards than ever. You know it has become bad when people start shipping you their computers (a la Docker) just so you can run their web applications…

Do you hate your middleware for pulling in millions of dependencies? Are you done with juggling multiple web programming libraries? Tired of seeing your PHP script break on every single PHP update? Looking for alternatives that can shine even in lowly environments like routers and single-board computers? Would you like something more retro and longer-standing for a change?

If so, welcome to the olden world of CGI programming!

Introduction §

At the most basic level of web serving, when your browser sends a request to the web server, the server checks for a file residing at the requested URI path; if it exists, the server gives that file’s content to your browser, the end. The point of CGI is to extend that with the following idea: if that path points not to a regular file but to an executable program, then instead of serving the program binary to the client, the server runs that program with the request body piped to its standard input, and pipes its standard output back to the client as the response. (Lawyers will say this is a sly oversimplification, but you get the idea.)
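
Seen from the server's side, that idea can be sketched in a few lines of Python. This is only a toy illustration of the plumbing, not how any real web server is implemented; the run_cgi name and its parameters are made up for this example, and a real server does much more (time limits, header parsing, error handling).

```python
import os
import subprocess
import sys


def run_cgi(argv, request_body=b"", extra_env=None):
    """Run a program roughly the way a CGI-capable server would.

    The request body goes to the program's standard input, request
    details go into environment variables, and whatever the program
    writes to standard output comes back as the raw response bytes.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    completed = subprocess.run(
        argv,
        input=request_body,      # request body -> program's stdin
        stdout=subprocess.PIPE,  # program's stdout -> raw response
        env=env,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # A stand-in "CGI program": an inline Python snippet instead of a script file.
    program = [
        sys.executable, "-c",
        "import os, sys\n"
        "sys.stdout.write('Content-Type: text/plain\\n\\n')\n"
        "sys.stdout.write('method=' + os.environ['REQUEST_METHOD'] + '\\n')\n",
    ]
    raw = run_cgi(program, extra_env={"REQUEST_METHOD": "GET"})
    sys.stdout.write(raw.decode("ascii"))
```

Nothing here is specific to Python on the program's side: anything that can read standard input and environment variables and write to standard output would slot in the same way.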

By the way, CGI stands for Common Gateway Interface. A very common follow-up question is why it is called a gateway: in the early 1990s, the main use of this kind of server-executed program was not web applications, but glue logic for accessing an institution’s already-existing in-house information systems, which were previously accessible only as command-line programs run from an on-premises terminal or over a telnet/dial-in shell session.

Such glue programs would accept the request, invoke the on-server information system programs with the correct parameters, and dress up the output a bit before sending the result to the web browser, making them gateways that let users from the web access those in-house information systems. These were, in fact, the main uses that pushed the effort to standardize web servers on one common interface for running such gateway programs; and that’s where the name came from.

Anyway, because CGI uses just standard input/output and some environment variables, you can use virtually any compiled programming language, and any shebang-compatible interpreted programming language, for server-side web development. There is no complicated protocol you need to grok; and if you choose your language wisely, there is no dependency hell to watch out for, no API/ABI breakage to rewrite around, and no upgrade treadmill forced on you. Life was definitely simpler back in the day; and by using CGI, your life could be simple today too.

For these reasons, while offering a limited amount of bling and bang, CGI has stood the test of time as the lowest-common-denominator, programming-language-agnostic, platform-independent scheme for running web applications, from its first standardization at the dawn of the World Wide Web era nearly 3 decades ago to today. And… did you know that the development of PHP was originally made possible by CGI too?

Tilde.club has supported CGI programming in user web space since 17 May 2020. As CGI was originally conceived in a shared institutional Unix server environment, on a tilde we are experiencing it in its natural habitat.

Hello World! §

As simple as it is, everybody has to start somewhere; so the following are example Hello World CGI programs in many of the programming languages that Tilde.club supports. All of them produce an HTTP response with status code 200, the text/plain MIME type, and a simple Hello World text as the response body. Note that every example script here works under the .cgi file extension; other language-specific file extensions that work are noted in each example.

  • Perl (also works with .pl file extension):

      #!/usr/bin/perl
      print "Status: 200\n";
      print "Content-Type: text/plain\n";
      print "\n";
      print "Hello World!\n";

    Note that Perl was the main language of choice back in the heyday of CGI programming.

  • Bourne shell script (also works with .sh file extension):

      #!/bin/sh
      echo "Status: 200"
      echo "Content-Type: text/plain"
      echo
      echo "Hello World!"

  • Python (usable under both 3.x and 2.x, also works with .py file extension):

      #!/usr/bin/python
      print("Status: 200")
      print("Content-Type: text/plain")
      print("")
      print("Hello World!")

  • AWK:

      #!/usr/bin/awk -E
      BEGIN {
          print "Status: 200"
          print "Content-Type: text/plain"
          print
          print "Hello World!"
      }

  • Lua (also works with .lua extension):

      #!/usr/bin/lua
      print("Status: 200")
      print("Content-Type: text/plain")
      print("")
      print("Hello World!")

  • Tcl:

      #!/usr/bin/tclsh
      puts "Status: 200"
      puts "Content-Type: text/plain"
      puts ""
      puts "Hello World!"

  • Common Lisp:

      #!/usr/bin/sbcl --script
      (progn
          (princ "Status: 200") (terpri)
          (princ "Content-Type: text/plain") (terpri)
          (terpri)
          (princ "Hello World!") (terpri)
      )

Pick the language you like, put the script (or executable) in a file anywhere inside the public_html subdirectory of your Tilde.club home directory, with an appropriate file extension; and also make sure that it is world-readable and world-executable (something like chmod o+rx YOURFILE.EXT would do). If you use another language that compiles to a binary executable, world-executable permission alone will suffice.
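
If you would rather check the permission bits from a script than eyeball ls -l output, here is a minimal Python sketch. The helper names (is_world_rx, add_world_rx) and the hello.cgi file name are hypothetical; add_world_rx is meant to have the same effect as chmod o+rx.

```python
import os
import stat
import tempfile

# The "other" read and execute bits, i.e. what chmod o+rx turns on.
WORLD_RX = stat.S_IROTH | stat.S_IXOTH


def is_world_rx(path):
    """Return True if the file is both world-readable and world-executable."""
    mode = os.stat(path).st_mode
    return (mode & WORLD_RX) == WORLD_RX


def add_world_rx(path):
    """Turn on the world-readable and world-executable bits, keeping the rest."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | WORLD_RX)


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real script in public_html.
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.cgi")  # hypothetical file name
        with open(script, "w") as handle:
            handle.write("#!/bin/sh\necho\n")
        os.chmod(script, 0o700)      # owner-only: the server could not run this
        print(is_world_rx(script))
        add_world_rx(script)
        print(is_world_rx(script))
```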

The URL for accessing a CGI program from a web browser is no different from that of a regular file hosted on your Tilde.club web space.

Note that there are no assembly, C, or C++ examples here, and that is intentional: you are supposed to know such languages well (including how to program in them safely and defensively) before even thinking about trying them for this task.

Program Output §

Output of your CGI programs is expected to have two parts:

  1. Lines you print before the first blank line are treated as HTTP response header fields:

    • The only exception is the Status: pseudo-header, which is not output as a real response header; its value is used instead as the HTTP status code of the response.
      • When the Status: pseudo-header is omitted, the HTTP status code of your response is 200.
      • Your program ought NOT to output this as a real HTTP response line (HTTP/1.0 200 OK and suchlike). Doing so is off-spec; and while some servers handle this okay, Tilde.club doesn’t.
    • You MUST output a Content-Type: header; otherwise the server rejects your program’s output and gives an HTTP 502 error to the client instead.
    • A blank line ends the headers section.
    • You should output headers (including the blank line terminating the headers) using the platform’s native line ending, which is LF in the case of Tilde.club and other GNU/Linux hosts; in practice, CR/LF is accepted as well.
  2. What you output after the first blank line is your response body (i.e. content). This part can use any line ending in the case of text, or it can even be binary, as long as it fits the Content-Type: header value you have just printed. An empty response body is allowed as well: just output nothing after that first blank line.
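
The two-part layout can be sketched in Python as a pair of small helpers. Both names (build_cgi_output, split_cgi_output) are invented for this illustration. The second one only approximates what a server does with your output, and it assumes LF line endings throughout the header section.

```python
import sys


def build_cgi_output(body, content_type="text/plain", status=None, extra_headers=()):
    """Assemble CGI output as bytes: header lines, one blank line, then the body."""
    lines = []
    if status is not None:
        lines.append("Status: %d" % status)  # pseudo-header; optional
    lines.append("Content-Type: %s" % content_type)  # required
    for name, value in extra_headers:
        lines.append("%s: %s" % (name, value))
    head = "\n".join(lines) + "\n\n"  # the empty line ends the headers
    return head.encode("ascii") + body


def split_cgi_output(raw):
    """Split raw CGI output into (status, headers, body).

    Assumes LF-only line endings in the header section.
    """
    head, _, body = raw.partition(b"\n\n")
    status = 200  # the default when Status: is omitted
    headers = []
    for line in head.decode("ascii").splitlines():
        name, _, value = line.partition(":")
        if name.lower() == "status":
            status = int(value.split()[0])
        else:
            headers.append((name, value.strip()))
    return status, headers, body


if __name__ == "__main__":
    # Write bytes directly so the body could just as well be binary data.
    sys.stdout.buffer.write(build_cgi_output(b"Hello World!\n", status=200))
```

Building the whole response as bytes before writing it keeps the header section and the body from being interleaved by accident.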

Program Input §

Information from the HTTP request arrives at your CGI program through two different channels:

  1. Request line, request headers, misc request information, and server information: these arrive as environment variables.
  2. Request body: this arrives verbatim as standard input data.

Unless you are processing an HTTP POST or PUT request (which is fairly advanced stuff), you don’t really need to look at the request body at all. So the information of interest is mostly contained in the environment variables:

  • The HTTP request method used is passed to your program as the value of the environment variable REQUEST_METHOD.
  • The part of the request URI after ? is passed to your CGI program as the value of the environment variable QUERY_STRING.
    • This variable is always present. If the request URI has no ?, or there is nothing after the ?, the value is empty.
  • Each request header field’s name is converted to uppercase, has any hyphens replaced with underscores, is prefixed with HTTP_, and is set as an environment variable whose value equals the header value received from the client. For example, a Host: tilde.club header line is converted to an environment variable HTTP_HOST with the value tilde.club.
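
As a rough Python sketch, a program could gather these values as follows. The function names and the shape of the returned dictionary are assumptions made for this example; the hyphen-to-underscore step in header_env_name follows the naming rule in the CGI 1.1 specification.

```python
import os
from urllib.parse import parse_qs


def header_env_name(header_name):
    """Map a request header name to its CGI environment variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def request_info(environ):
    """Collect the request method, parsed query string, and a few headers."""
    # parse_qs returns a dict mapping each key to a list of its values.
    query = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    return {
        "method": environ.get("REQUEST_METHOD", "GET"),
        "query": query,
        "host": environ.get(header_env_name("Host"), ""),
        "user_agent": environ.get(header_env_name("User-Agent"), ""),
    }


if __name__ == "__main__":
    info = request_info(os.environ)
    print("Content-Type: text/plain")
    print("")
    print("method: " + info["method"])
    for key, values in sorted(info["query"].items()):
        print(key + " = " + ", ".join(values))
```

Passing the environment in as a parameter, instead of reading os.environ inside the function, makes it possible to try the function with a plain dictionary before uploading anything.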

The following are environment variables from the CGI 1.1 specification which are set for CGI programs in Tilde.club, in alphabetical order:

CONTENT_LENGTH, CONTENT_TYPE, GATEWAY_INTERFACE, QUERY_STRING, REMOTE_ADDR, REMOTE_PORT, REQUEST_METHOD, SCRIPT_NAME, SERVER_NAME, SERVER_PORT, SERVER_PROTOCOL, SERVER_SOFTWARE

  • You can find out more about what each of these variables means in the original CGI 1.1 specification, linked in the Further Reading section below.

And the following are other environment variables which are not in the CGI 1.1 specification, but are set for CGI programs in Tilde.club:

DOCUMENT_ROOT, DOCUMENT_URI, HTTPS, REDIRECT_STATUS, REQUEST_SCHEME, REQUEST_URI, SCRIPT_FILENAME, SERVER_ADDR

Notes:

  • The REMOTE_HOST variable is always absent, even when the remote host does have a valid reverse-DNS address.
  • The REMOTE_IDENT variable is always absent, even when the remote client connects from a host running an Identd service.
  • While both the HTTPS and REQUEST_SCHEME variables can be used to tell an HTTPS request from a plain old HTTP one, check for the HTTPS value on if you expect your CGI program to be portable to Apache HTTP Server.
    • This can be used to ensure that the correct version of an Atom or RSS feed is served for the protocol in use.
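
A small Python sketch of the portable check follows. The function names are made up for this example, and falling back to SERVER_NAME when HTTP_HOST is missing is an assumption of this sketch, not something the setup prescribes.

```python
def is_https(environ):
    """Return True when the request came in over HTTPS (the HTTPS variable is "on")."""
    return environ.get("HTTPS", "") == "on"


def absolute_url(environ, path):
    """Build an absolute URL using the scheme the client actually used."""
    scheme = "https" if is_https(environ) else "http"
    # Assumption: prefer the Host header, fall back to SERVER_NAME.
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
    return "%s://%s%s" % (scheme, host, path)


if __name__ == "__main__":
    import os
    print("Content-Type: text/plain")
    print("")
    print(absolute_url(os.environ, os.environ.get("SCRIPT_NAME", "/")))
```

A feed generator could use absolute_url for its self links so that the same script serves matching URLs over both protocols.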

Program Execution §

These are the conditions your CGI programs run under:

  • One instance of a CGI program is executed to service one request, and that instance terminates at the end of the response.
  • Multiple instances of a CGI program may run at the same time.
  • Your CGI program starts after the server has read the entire request header from the client (but not the request body, if any); and only when the request URI matches your CGI program, of course.
  • Your CGI program runs inside its own directory (and not some other location like the server binary’s directory).
  • Once your CGI program runs, its standard input is fed with the request body (if any).
  • If you are going to process the request body, you ought to do so before producing any output. (HTTP is a request-response protocol, remember?)
  • Everything your program outputs on the standard error stream goes into the server’s error log.
  • A CGI program that runs for too long without finishing causes the server to return an HTTP 504 Gateway Time-out error to the client instead of the program’s response.
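
The input-before-output ordering and the standard error behavior can be sketched in Python like this. The read_request_body helper is hypothetical, and trusting CONTENT_LENGTH to size the read is an assumption of this sketch; a careful program would also cap the length it is willing to accept.

```python
import os
import sys


def read_request_body(environ, stream):
    """Read exactly CONTENT_LENGTH bytes of request body from a binary stream."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0  # treat a malformed length as "no body"
    if length <= 0:
        return b""
    return stream.read(length)


def main():
    # 1. Consume the request body first...
    body = read_request_body(os.environ, sys.stdin.buffer)
    # 2. ...and only then start producing the response.
    out = sys.stdout.buffer
    out.write(b"Content-Type: text/plain\n\n")
    out.write(b"Received %d bytes\n" % len(body))
    # Diagnostics go to standard error, which lands in the server's error log.
    sys.stderr.write("handled a request with a %d-byte body\n" % len(body))


if __name__ == "__main__":
    main()
```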

Tips §

  • Avoid making your CGI program a time hog; good CGI programs start quickly and finish quickly.
  • Avoid making your CGI program a resource hog; just like everything else you do on Tilde.club shell.
  • Avoid making your CGI program a security hole. For this reason, using C or C++ for a non-trivial CGI program is not recommended unless you actually know your craft.
  • Remember: it costs Tilde.club one program execution to service one HTTP request to a CGI program; use it responsibly and for things that matter.

Setup-Specific Notes §

The following are tidbits specific to the CGI setup used in Tilde.club:

  • If you would like to make a CGI program a directory index, name it index.cgi. (index.sh works too in the case of a shell script.)
  • There is no database daemon of any kind. (If you would like to use SQLite, see below for a caveat about credentials and files.)

And some caveats:

  • There is no support for PATH_INFO environment variable; you can blame Nginx for this one.

    This means you cannot simulate file-like and directory-like URIs (like /~SOMEONE/category.cgi/automobile/ev) under your CGI program; the server will simply return an HTTP 404 error for such URIs even when category.cgi exists and is executable.

  • Avoid leaving files with the following extensions in your web space when you don’t intend for them to be run as CGI:

    • .cgi
    • .pl
    • .sh
    • .py
    • .lua

    This is because, in the current setup, requests to these files are forwarded to the CGI handler anyway, even when their executable bit is not set; while the handler will not actually run such a script, the result is an HTTP 502 error being sent to the client. If you would like to distribute these verbatim as source files, you can work around this by renaming them to add .txt at the end.

  • CGI programs here run under the web server’s credentials: user nginx and group nginx (user ID 994, group ID 990); tread carefully if you need to make your program read/write private files.
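
Regarding the missing PATH_INFO: one possible workaround (a suggestion of this sketch, not something the setup itself prescribes) is to carry the path-like part in the query string, as in /~SOMEONE/category.cgi?automobile/ev, and split it yourself. The function name below is hypothetical.

```python
from urllib.parse import unquote


def virtual_path_segments(environ):
    """Split a path-like QUERY_STRING such as "automobile/ev" into segments.

    Empty segments (from leading, trailing, or doubled slashes) are dropped,
    and percent-escapes are decoded within each segment.
    """
    raw = environ.get("QUERY_STRING", "")
    return [unquote(part) for part in raw.split("/") if part]


if __name__ == "__main__":
    import os
    print("Content-Type: text/plain")
    print("")
    for segment in virtual_path_segments(os.environ):
        print(segment)
```

If the segments are ever used to build file names, remember that a decoded segment can be anything the client sent, including "..", so validate it before touching the filesystem.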

Further Reading §