Working with structured data

In this column we look at writing data to files in a structured format and how to read it back in that format.

Python vs. PHP: Choosing your next project's language

Writing To Files

To read files, we have focused on the fgets() function. This function relies on a file descriptor in order to know from which file to read, and which line of a file to read. Likewise, we will use the appropriately named fputs() function to write to a file. The following example shows how easy it is to use fputs().

<HTML>
<BODY>
<?
$s = "The time is ".date('D M d H:i:s T Y')."\n";
$fp = fopen("./fputs.data","a");
$i = fputs($fp,"--\n".$s);
fputs($fp,"Wrote $i bytes to file\n");
echo "Wrote $i bytes to file\n";
fclose($fp);
?>
</BODY>
</HTML>

View this script from your PHP-enabled Web server (see the cover CD for information on setting up a PHP-enabled Web server if you do not have one). This script writes three lines of data to a file: a separator, the time and the number of bytes written to the file on the second line (collectively, an 'item'). It also outputs this final line to the Web user.

Notice that the script opens fputs.data in mode 'a' - that is, opened for writing only, created if it does not exist with a file pointer placed after the last line of the file. This means that when fputs.php is viewed a second time, the latest time will be appended to the end of fputs.data. A useful exercise would be to change this script to place the most recent write to fputs.data at the beginning of the file.

You may also have noticed that the file pointer is closed with the fclose() function. It is important that this takes place when writing to a file opened with fopen(), since if the file pointer is not closed all data written to the file will be lost.

Reading Structured Data

In order to read back the data from fputs.data one item at a time, we will need to approach the reading of the file differently than last month. This requires the construction of an algorithm that locates the first line of an item by the "--\n" separator and assumes that the next two lines are the time and bytes written strings (more sophisticated algorithms in later months will not make such assumptions). This kind of algorithm is called a parser (pronounced 'par-zer').

<HTML>
<BODY>
<TABLE WIDTH=500 CELLPADDING=1>
<?
$fp = fopen("./fputs.data","r");
while(($s = fgets($fp,1024)) && !feof($fp)) {
if($s[0] == '-' && $s[1] == '-') {
if(($time = fgets($fp,1024)) && ($bytes = fgets($fp,1024))) {
echo "<TR>\n<TD>\n$time</TD>\n\n";
echo "<TD>\n$bytes</TD>\n</TR>\n";
} else {
echo "<TR><TD>Data file corrupted!\n</TD></TR>";
}
} else {
echo "<TR><TD>Data file corrupted!\n</TD></TR>";
}
}
?>
</TABLE>
</BODY>
</HTML>

Save this as parser1.php in the same directory as the fputs.data file and request it from your Web server. All the items you have created with fputs.php are returned in an HTML table, which reflects the structure of our data.

parser1.php introduces a few new features of PHP as well as the concept of parsing. To look at this step by step: first, the script opens fputs.data file for reading and places the file pointer at the beginning of the file. It then enters into a loop reading one line at a time. The conditions for the loop continuing are a) that the call to fgets() returns a true result (that is, it reads from fputs.data) and b) the current file pointer is not at the end of the file.

These two conditions are conjoined by a 'logical AND' - &&. That is, only when both conditions are true is the loop executed (a logical OR, denoted by ||, would have allowed either value to be true for the loop to be executed). See www.php.net/manual/en/language.operators.logical.php for more information.

Next the script checks if the first and second characters returned by fgets() are dashes ("-"), thereby matching the item separator. The script achieves this by comparing the first byte of the string s ($s[0]) to dash and the second byte ($s[1]) to dash. If both match dash, the script continues to parse the item; otherwise, it assumes the data is corrupted and tells the Web reader this.

If the script has found an item, it reads the time line and the bytes written line. Like before, it only continues if both results are true. The script then outputs the time and bytes written lines intermixed with HTML tags.

Notice how simple it is to embed HTML into your scripts in this way. If you cannot work out how parser1.php is constructing the HTML table, have a look at the HTML source in your Web browser - you will notice that the loop constructs all the table cells.

The final point of interest in parser1.php is the number of bytes it reads from each line. Why 1024? The simple answer is that we do not want to assume that all lines will be short. This assumption would cause chaotic results if the fputs.data files became corrupted, effectively breaking the data corruption detection.

Join the PC World newsletter!

Error: Please check your email address.

Our Back to Business guide highlights the best products for you to boost your productivity at home, on the road, at the office, or in the classroom.

Keep up with the latest tech news, reviews and previews by subscribing to the Good Gear Guide newsletter.

Gavin Sherry

PC World
Show Comments

Essentials

Microsoft L5V-00027 Sculpt Ergonomic Keyboard Desktop

Learn more >

Lexar® JumpDrive® S57 USB 3.0 flash drive

Learn more >

Mobile

Lexar® JumpDrive® S45 USB 3.0 flash drive 

Learn more >

Exec

Lexar® JumpDrive® C20c USB Type-C flash drive 

Learn more >

Audio-Technica ATH-ANC70 Noise Cancelling Headphones

Learn more >

HD Pan/Tilt Wi-Fi Camera with Night Vision NC450

Learn more >

Lexar® Professional 1800x microSDHC™/microSDXC™ UHS-II cards 

Learn more >

Budget

Back To Business Guide

Click for more ›

Most Popular Reviews

Latest News Articles

Resources

PCW Evaluation Team

Azadeh Williams

HP OfficeJet Pro 8730

A smarter way to print for busy small business owners, combining speedy printing with scanning and copying, making it easier to produce high quality documents and images at a touch of a button.

Andrew Grant

HP OfficeJet Pro 8730

I've had a multifunction printer in the office going on 10 years now. It was a neat bit of kit back in the day -- print, copy, scan, fax -- when printing over WiFi felt a bit like magic. It’s seen better days though and an upgrade’s well overdue. This HP OfficeJet Pro 8730 looks like it ticks all the same boxes: print, copy, scan, and fax. (Really? Does anyone fax anything any more? I guess it's good to know the facility’s there, just in case.) Printing over WiFi is more-or- less standard these days.

Ed Dawson

HP OfficeJet Pro 8730

As a freelance writer who is always on the go, I like my technology to be both efficient and effective so I can do my job well. The HP OfficeJet Pro 8730 Inkjet Printer ticks all the boxes in terms of form factor, performance and user interface.

Michael Hargreaves

Windows 10 for Business / Dell XPS 13

I’d happily recommend this touchscreen laptop and Windows 10 as a great way to get serious work done at a desk or on the road.

Aysha Strobbe

Windows 10 / HP Spectre x360

Ultimately, I think the Windows 10 environment is excellent for me as it caters for so many different uses. The inclusion of the Xbox app is also great for when you need some downtime too!

Mark Escubio

Windows 10 / Lenovo Yoga 910

For me, the Xbox Play Anywhere is a great new feature as it allows you to play your current Xbox games with higher resolutions and better graphics without forking out extra cash for another copy. Although available titles are still scarce, but I’m sure it will grow in time.

Featured Content

Latest Jobs

Don’t have an account? Sign up here

Don't have an account? Sign up now

Forgot password?