Usenix: Dartmouth expanding diff, grep Unix tools

Dartmouth researchers are working on variants of diff and grep that can parse more complex data structures

With some funding from Google and the U.S. Energy Department, a pair of computer scientists at Dartmouth College is updating the venerable grep and diff Unix command-line utilities to handle more complex types of data.

Such updates are needed because "we now tend to have more model-based configuration languages that have meaningful constructs spanning more than one line," said Gabriel Weaver, a Dartmouth graduate student who, along with Dartmouth computer science professor Sean Smith, is creating the variants of grep and diff. Weaver presented the new utilities at a poster session at the Usenix Large Installation System Administration (LISA) conference, being held this week in Boston.

The new programs will allow administrators to extract meaningful data from configuration files, log files and other sources of operational data, the researchers maintain.

Grep and diff are command line-based text analysis tools available in all Linux and Unix distributions. Both are designed to parse documents on a line-by-line basis. Grep offers the ability to search through multiple text files and folders for a specific chunk of text or regular expression. Diff compares two documents and highlights the differences between them.

As with most Unix utilities, the output from either of these programs can be linked, or piped, to other utilities, so they can be incorporated into scripts that automate routine system administration tasks.

The new programs, called Context-Free Grep and Hierarchical Diff, will provide the ability to parse blocks of data rather than single lines. For each new type of data structure, a vendor would provide a pattern library identifying the basic structure of the data, which the software would then use to "extract the constructs of interest from the document," Weaver said.
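To make the pattern-library idea concrete, here is a minimal Python sketch. It is not the researchers' code or design: the `PATTERN_LIBRARY` dictionary, the `extract_constructs` function and the IOS-like sample are all hypothetical, and the sketch assumes a format in which a construct starts on an unindented header line and owns the indented lines that follow.

```python
import re

# Hypothetical "pattern library": one header regex per construct type.
# Assumption: a construct is an unindented header plus its indented lines.
PATTERN_LIBRARY = {
    "interface": re.compile(r"^interface \S+"),
    "router": re.compile(r"^router \S+"),
}

def extract_constructs(text, construct):
    """Return every whole block of the requested construct type."""
    header = PATTERN_LIBRARY[construct]
    blocks, current = [], None
    for line in text.splitlines():
        if line.startswith(" "):
            # Indented line: belongs to the open block, if there is one.
            if current is not None:
                current.append(line)
            continue
        # Any unindented line closes the block that was open.
        if current is not None:
            blocks.append("\n".join(current))
            current = None
        if header.match(line):
            current = [line]
    if current is not None:
        blocks.append("\n".join(current))
    return blocks

# Made-up sample in an IOS-like style, for illustration only.
SAMPLE = """\
hostname edge1
interface Ethernet0
 ip address 10.0.0.1 255.255.255.0
router ospf 1
 network 10.0.0.0 0.0.0.255 area 0
interface Ethernet1
 shutdown
"""

for block in extract_constructs(SAMPLE, "interface"):
    print(block)
    print("--")
```

Asking for "interface" returns the two interface blocks in full, header and body together, where a line-oriented search for the same word would return only the two header lines.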

Such utilities could provide administrators the ability to work with more complex forms of data now being generated by network equipment and infrastructure software. For instance, Cisco's IOS (Internetwork Operating System), which is the company's operating system for its routers and switches, will provide operational data in block-like data structures.

With this data, a tool such as diff "can be too low-level," Weaver said. "Diff doesn't really pay attention to the structure of the language you are trying to tell differences between." He has seen cases where diff reports that 10 changes have been made to a file, when in fact only two changes have been made, and the remaining data has simply been shifted around.
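The effect Weaver describes can be reproduced with a small Python sketch. This is an illustration under stated assumptions, not the Hierarchical Diff prototype: the two configurations are invented, `line_changes` uses the standard library's `difflib` as a stand-in for diff, and `block_changes` is a hypothetical comparison that treats each unindented header and its indented lines as one unordered block.

```python
import difflib

# Invented IOS-like configs: the two blocks swap places, and one
# setting ("shutdown" -> "no shutdown") really changes.
OLD = """\
interface Ethernet0
 ip address 10.0.0.1 255.255.255.0
 no shutdown
interface Ethernet1
 ip address 10.0.1.1 255.255.255.0
 shutdown
"""

NEW = """\
interface Ethernet1
 ip address 10.0.1.1 255.255.255.0
 no shutdown
interface Ethernet0
 ip address 10.0.0.1 255.255.255.0
 no shutdown
"""

def line_changes(old, new):
    """Added and removed lines as a line-based comparison sees them."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [d for d in diff if d.startswith(("+ ", "- "))]

def parse_blocks(text):
    """Map each unindented header to the set of indented lines under it."""
    blocks, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            current = line.strip()
            blocks[current] = set()
        elif current is not None:
            blocks[current].add(line.strip())
    return blocks

def block_changes(old, new):
    """Per-block additions and removals, ignoring where blocks sit."""
    old_b, new_b = parse_blocks(old), parse_blocks(new)
    changes = []
    for header in sorted(old_b.keys() | new_b.keys()):
        before, after = old_b.get(header, set()), new_b.get(header, set())
        changes += [(header, "-", s) for s in sorted(before - after)]
        changes += [(header, "+", s) for s in sorted(after - before)]
    return changes

print(len(line_changes(OLD, NEW)), "line-level changes")
for change in block_changes(OLD, NEW):
    print(change)
```

The block-aware comparison reports only the removal of "shutdown" and the addition of "no shutdown" under interface Ethernet1, while the line-based comparison also counts the relocated lines as additions and deletions. Treating lines within a block as unordered is a simplification that would not suit formats where order matters.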

Grep has issues with data blocks as well. "With regular expressions, you don't really have the ability to extract things that are nested arbitrarily deep," Weaver said.
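A short Python sketch shows the limitation. The configuration syntax and both function names are hypothetical, and the brace-counting scanner is only one simple way to handle nesting, not a description of how Context-Free Grep works. Python's built-in `re` module has no recursive patterns, so the regular-expression attempt stops at the first closing brace.

```python
import re

# Invented brace-delimited config with one level of nesting inside "zone a".
CONFIG = "zone a { pool { range 1 2; } option x; } zone b { option y; }"

def regex_block(text, name):
    """Naive attempt: non-greedy match up to the first closing brace."""
    m = re.search(r"zone " + re.escape(name) + r" \{.*?\}", text)
    return m.group(0) if m else None

def balanced_block(text, name):
    """Track brace depth so the whole block is returned at any nesting depth."""
    start = text.find("zone " + name + " {")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # braces never balanced

print(regex_block(CONFIG, "a"))     # cut off after the inner block closes
print(balanced_block(CONFIG, "a"))  # the complete "zone a" block
```

A greedy pattern fails in the opposite direction, running on to the last closing brace in the input and swallowing "zone b" as well.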

Context-Free Grep is still in the design stage, but should be completed within the next few months. A prototype of Hierarchical Diff has been completed, though the researchers have not posted the code yet.

Google's interest in this technology springs from the company's efforts in cloud computing, where it must automate operations across a wide range of networking gear, Weaver said. The DOE foresees that this sort of software could play a vital role in smart grids, in which millions of energy consuming end-devices would have connectivity of some sort. The software would help "make sense of all the log files and the configurations of the power control networks," Weaver said.

Beyond system administration, the utilities could also be used with non-technical languages. They could be used to parse legal documents, for instance, Weaver suggested.

A number of Usenix attendees praised the idea for its potential usefulness. "You wonder why it hasn't been done before," one said. Another commented that such tools could also be really handy for code repositories such as Git.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
