COP 2344 (Shell Scripting) Project

A common problem when putting content on a web server is that text files often contain characters that have special meaning to a web browser. These include the ampersand (“&”), the less-than symbol (“<”), the greater-than symbol (“>”), and others.

In addition, valid HTML or XHTML documents require some information at the beginning (the document prolog) and some more at the end (the document epilog). (XHTML is a stricter, XML-based reformulation of HTML; today's web browsers understand both formats.)

In this project you will write either a Perl or Python3 script that transforms a plain text file into a valid XHTML file.

Create a Python3 or Perl script that reads text from a file whose name is provided on the command line and writes a valid XHTML document to standard output. The title of the document should be the name of the file.

For example, if a text file named “hello” contains the following text:

Hello, World & Class!
<Good-Bye!>

then the XHTML-encoded output should look like this:

 1.  <?xml version="1.0" encoding="UTF-8"?>
 2.  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 3.      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 4.  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 5.    <head>
 6.      <title>hello</title>
 7.    </head>
 8.    <body>
 9.      <pre>
10.        Hello, World &amp; Class!
11.        &lt;Good-Bye!&gt;
12.      </pre>
13.    </body>
14.  </html>

The indentation of lines 1 to 9 (the required XHTML document prolog) and lines 12 to 14 (the required document epilog) is for readability only and is not required.

Your script must make the following changes to the input:

Change all occurrences of “&” to “&amp;”.
Change all occurrences of “<” to “&lt;”.
Change all occurrences of “>” to “&gt;”.
Add the correct XHTML document header (nine lines), including a correct title with the document name.
Add the correct XHTML document footer (three lines).
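
The three character substitutions above can be sketched in Python as follows. This is only an illustration, not the sample script mentioned in this handout; the function names escape_xhtml and escape_wrong_order are made up for the example. The second function does the same replacements with the ampersand last, to show how the result gets corrupted.

```python
def escape_xhtml(text):
    """Replace the three special characters with their XHTML entities."""
    # The ampersand is handled first, so the "&" that starts "&lt;" and
    # "&gt;" is never escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


def escape_wrong_order(text):
    """Same replacements with the ampersand LAST (shown only as a counterexample)."""
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace("&", "&amp;")  # also hits the "&" of "&lt;" and "&gt;"
    return text


print(escape_xhtml("Hello, World & Class!"))   # Hello, World &amp; Class!
print(escape_xhtml("<Good-Bye!>"))             # &lt;Good-Bye!&gt;
print(escape_wrong_order("<Good-Bye!>"))       # &amp;lt;Good-Bye!&amp;gt;
```

A browser would display the last line as the literal text “&lt;Good-Bye!&gt;” rather than “<Good-Bye!>”.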

The file name given as a command line argument can be accessed as “$ARGV[0]” in Perl. In Python, import sys and then use “sys.argv[1]” (“sys.argv[0]” holds the name of the script itself).
The order in which you make your changes matters: convert the ampersands first. The entities “&lt;” and “&gt;” themselves contain an ampersand, so if “&” is converted last, a “<” ends up as “&amp;lt;” instead of “&lt;”.
A sample Perl script filter.pl can be used as a model for your script. A sample Python3 script filter.py is available too. (A copy of either can be found on YborStudent, in ~wpollock/bin.)
To avoid lots of print statements, you can use a here document in Perl. In Python, just use a triple-quoted string.
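
The notes about the command line argument and triple-quoted strings can be illustrated with the rough Python sketch below. It is not the filter.py sample referred to above. The names PROLOG, EPILOG, wrap, and main are hypothetical, the input file is assumed to be UTF-8 text, and the standard library function html.escape (called with quote=False so that only “&”, “<”, and “>” are changed) stands in for the three substitutions; your own script may be expected to do them by hand.

```python
import html
import sys

# Triple-quoted strings hold the nine-line prolog and three-line epilog,
# so no long series of print statements is needed.
PROLOG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>{title}</title>
  </head>
  <body>
    <pre>
"""

EPILOG = """    </pre>
  </body>
</html>
"""


def wrap(title, body):
    """Surround body (assumed to be escaped already) with prolog and epilog."""
    return PROLOG.format(title=title) + body + EPILOG


def main(argv):
    # argv[0] is the script name; the file name is the first real argument.
    if len(argv) != 2:
        print("usage: " + argv[0] + " FILE", file=sys.stderr)
        return 1
    filename = argv[1]
    with open(filename, encoding="utf-8") as infile:
        body = html.escape(infile.read(), quote=False)
    # The file name is escaped too, in case it contains a special character.
    sys.stdout.write(wrap(html.escape(filename, quote=False), body))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

If the sketch were saved as, say, encode.py, it could be run as “python3 encode.py hello > hello.html”.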

Turn in a copy of your Python or Perl script. A sample text file you can use for testing your script is available on YborStudent.hccfl.edu at ~wpollock/mycat.c.

You can type or send as email to . Please see your syllabus for more information about submitting projects.

COP 2344 (Shell Scripting) Project #4
XHTML Encoding

Due: **by the start of class** on the date shown on the syllabus
