Working with structured data

In this column we look at writing data to files in a structured format and how to read it back in that format.

Python vs. PHP: Choosing your next project's language

Writing To Files

To read files, we have focused on the fgets() function. This function relies on a file descriptor in order to know from which file to read, and which line of a file to read. Likewise, we will use the appropriately named fputs() function to write to a file. The following example shows how easy it is to use fputs().

$s = "The time is ".date('D M d H:i:s T Y')."\n";
$fp = fopen("./","a");
$i = fputs($fp,"--\n".$s);
fputs($fp,"Wrote $i bytes to file\n");
echo "Wrote $i bytes to file\n";

View this script from your PHP-enabled Web server (see the cover CD for information on setting up a PHP-enabled Web server if you do not have one). This script writes three lines of data to a file: a separator, the time and the number of bytes written to the file on the second line (collectively, an 'item'). It also outputs this final line to the Web user.

Notice that the script opens in mode 'a' - that is, opened for writing only, created if it does not exist with a file pointer placed after the last line of the file. This means that when fputs.php is viewed a second time, the latest time will be appended to the end of A useful exercise would be to change this script to place the most recent write to at the beginning of the file.

You may also have noticed that the file pointer is closed with the fclose() function. It is important that this takes place when writing to a file opened with fopen(), since if the file pointer is not closed all data written to the file will be lost.

Reading Structured Data

In order to read back the data from one item at a time, we will need to approach the reading of the file differently than last month. This requires the construction of an algorithm that locates the first line of an item by the "--\n" separator and assumes that the next two lines are the time and bytes written strings (more sophisticated algorithms in later months will not make such assumptions). This kind of algorithm is called a parser (pronounced 'par-zer').

$fp = fopen("./","r");
while(($s = fgets($fp,1024)) && !feof($fp)) {
if($s[0] == '-' && $s[1] == '-') {
if(($time = fgets($fp,1024)) && ($bytes = fgets($fp,1024))) {
echo "<TR>\n<TD>\n$time</TD>\n\n";
echo "<TD>\n$bytes</TD>\n</TR>\n";
} else {
echo "<TR><TD>Data file corrupted!\n</TD></TR>";
} else {
echo "<TR><TD>Data file corrupted!\n</TD></TR>";

Save this as parser1.php in the same directory as the file and request it from your Web server. All the items you have created with fputs.php are returned in an HTML table, which reflects the structure of our data.

parser1.php introduces a few new features of PHP as well as the concept of parsing. To look at this step by step: first, the script opens file for reading and places the file pointer at the beginning of the file. It then enters into a loop reading one line at a time. The conditions for the loop continuing are a) that the call to fgets() returns a true result (that is, it reads from and b) the current file pointer is not at the end of the file.

These two conditions are conjoined by a 'logical AND' - &&. That is, only when both conditions are true is the loop executed (a logical OR, denoted by ||, would have allowed either value to be true for the loop to be executed). See for more information.

Next the script checks if the first and second characters returned by fgets() are dashes ("-"), thereby matching the item separator. The script achieves this by comparing the first byte of the string s ($s[0]) to dash and the second byte ($s[1]) to dash. If both match dash, the script continues to parse the item; otherwise, it assumes the data is corrupted and tells the Web reader this.

If the script has found an item, it reads the time line and the bytes written line. Like before, it only continues if both results are true. The script then outputs the time and bytes written lines intermixed with HTML tags.

Notice how simple it is to embed HTML into your scripts in this way. If you cannot work out how parser1.php is constructing the HTML table, have a look at the HTML source in your Web browser - you will notice that the loop constructs all the table cells.

The final point of interest in parser1.php is the number of bytes it reads from each line. Why 1024? The simple answer is that we do not want to assume that all lines will be short. This assumption would cause chaotic results if the files became corrupted, effectively breaking the data corruption detection.

Join the newsletter!

Error: Please check your email address.
Rocket to Success - Your 10 Tips for Smarter ERP System Selection
Keep up with the latest tech news, reviews and previews by subscribing to the Good Gear Guide newsletter.

Gavin Sherry

PC World
Show Comments


James Cook University - Master of Data Science Online Course

Learn more >


Sansai 6-Outlet Power Board + 4-Port USB Charging Station

Learn more >

Victorinox Werks Professional Executive 17 Laptop Case

Learn more >



Back To Business Guide

Click for more ›

Brand Post

Most Popular Reviews

Latest Articles


PCW Evaluation Team

Louise Coady

Brother MFC-L9570CDW Multifunction Printer

The printer was convenient, produced clear and vibrant images and was very easy to use

Edwina Hargreaves

WD My Cloud Home

I would recommend this device for families and small businesses who want one safe place to store all their important digital content and a way to easily share it with friends, family, business partners, or customers.

Walid Mikhael

Brother QL-820NWB Professional Label Printer

It’s easy to set up, it’s compact and quiet when printing and to top if off, the print quality is excellent. This is hands down the best printer I’ve used for printing labels.

Ben Ramsden

Sharp PN-40TC1 Huddle Board

Brainstorming, innovation, problem solving, and negotiation have all become much more productive and valuable if people can easily collaborate in real time with minimal friction.

Sarah Ieroianni

Brother QL-820NWB Professional Label Printer

The print quality also does not disappoint, it’s clear, bold, doesn’t smudge and the text is perfectly sized.

Ratchada Dunn

Sharp PN-40TC1 Huddle Board

The Huddle Board’s built in program; Sharp Touch Viewing software allows us to easily manipulate and edit our documents (jpegs and PDFs) all at the same time on the dashboard.

Featured Content

Product Launch Showcase

Latest Jobs

Don’t have an account? Sign up here

Don't have an account? Sign up now

Forgot password?