A new peer-to-peer (P2P) technology has been designed to speed up file transfers by drawing data from files that are similar but not identical. The system promises to make online file sharing up to five times faster, and could also improve the efficiency of software updates and of data transfers to and from portable storage devices.
Traditional P2P file-sharing services work by identifying identical copies of a file in a P2P network and simultaneously downloading different chunks of that file from multiple sources. The speed with which a file can be obtained is generally determined by the number of sources available, which in turn depends on how popular the file is.
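To make the multi-source model concrete, here is a minimal Python sketch of how a client might split a download across sources that all hold identical copies. It assumes fixed-size chunks handed out round-robin and an idealized network in which every source delivers one chunk per round; the function names are hypothetical and not taken from any particular P2P client.

```python
from math import ceil
from typing import Dict, List


def assign_chunks(num_chunks: int, sources: List[str]) -> Dict[str, List[int]]:
    """Hand out chunk indices to sources round-robin (hypothetical scheduler)."""
    if not sources:
        raise ValueError("need at least one source")
    plan: Dict[str, List[int]] = {s: [] for s in sources}
    for i in range(num_chunks):
        # Chunk i goes to source i mod N, so the load is spread evenly.
        plan[sources[i % len(sources)]].append(i)
    return plan


def rounds_needed(num_chunks: int, num_sources: int) -> int:
    """Rounds to fetch every chunk if each source sends one chunk per round."""
    if num_sources < 1:
        raise ValueError("need at least one source")
    return ceil(num_chunks / num_sources)
```

Under this simplified model, 100 chunks take 100 rounds from a single source but only 25 from four sources, which is why rare files with few sources download slowly.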
While popular files can typically be obtained reasonably quickly, rarer files with few identical sources may take quite some time to download. The newly designed technology, called Similarity-Enhanced Transfer (SET), configures P2P networks to find files that are similar, but not necessarily identical, to the desired file and to download the chunks they have in common with it. This is expected to greatly increase the number of potential sources for a download, and hence boost transfer speeds.
No one knows how much similarity exists between the data files stored on computers around the world, but analysis suggests that the types of files most commonly shared are likely to contain many similar elements, researchers say. Many music files, for instance, may differ only in their artist-and-title headers but are otherwise 99 percent similar.
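The following Python sketch illustrates the basic idea of fetching chunks from a similar file. It is a simplification, not SET's actual design: it assumes fixed-size 16 KiB chunks fingerprinted with SHA-256, and the function names and sample data are hypothetical. The two "songs" share the same audio bytes but carry different 128-byte headers, so every chunk except the first can be taken from the other file.

```python
import hashlib
from typing import List, Set

CHUNK_SIZE = 16 * 1024  # hypothetical chunk size, chosen for illustration


def fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """SHA-256 hex digest of each fixed-size chunk (the last may be shorter)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_available_from(wanted: bytes, similar: bytes,
                          chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Indices of chunks in `wanted` whose fingerprint also occurs in `similar`."""
    have: Set[str] = set(fingerprints(similar, chunk_size))
    return [i for i, fp in enumerate(fingerprints(wanted, chunk_size)) if fp in have]


# Hypothetical sample data: identical "audio" body, different same-length headers.
body = bytes(range(256)) * 640          # 163,840 bytes
song_a = b"A" * 128 + body              # header differs only in content,
song_b = b"B" * 128 + body              # not in length
shared = chunks_available_from(song_a, song_b)  # chunks of song_a found in song_b
```

Fixed-size chunks only line up when the differing bytes do not shift the rest of the file; a header that changed length would misalign every chunk after it, which is why deduplication systems commonly choose chunk boundaries based on content instead.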
"In some sense, the promise of P2P has been greater than the reality," said David G. Andersen, an assistant professor of computer science at Carnegie Mellon University. "This [SET] is a technique that I would like people to steal ... it would make P2P transfers faster and more efficient."
Depending on the number of sources readily available, researchers expect SET to speed up transfers by anywhere from 5 to 500 percent. In recent tests, SET improved the P2P transfer time of an MP3 music file by 71 percent. When the system was configured to download data from files that were 47 percent similar, a larger, 55 MB movie trailer was downloaded with a speed increase of about 30 percent.
However, where files are so popular that the available sources already use up all of the receiver's bandwidth, the advantages may be smaller. During the download process, SET continually spends time and bandwidth searching for similar files, which the researchers found could add about 0.5 percent to the download time.
Andersen said that because SET identifies data by "fingerprints" computed with cryptographic hash functions, which are fundamental to secure online transactions, the technique is not likely to increase the risk of corrupted downloads.
"The chances of different data producing the same fingerprint are basically infinitesimal," he said. "These [cryptographic hash] functions underlie important parts of secure online commerce and data encryption."
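Andersen's point can be illustrated with a short Python sketch. It assumes the receiver already holds a trusted list of SHA-256 fingerprints for the file it wants (how that list is obtained is outside the sketch), and the function names and data are hypothetical. A candidate chunk is accepted only if its digest matches, so it makes no difference whether the bytes came from an identical copy or from a merely similar file.

```python
import hashlib
from typing import Dict, List, Optional


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 fingerprint of a chunk."""
    return hashlib.sha256(data).hexdigest()


def assemble(expected: List[str], offers: Dict[int, List[bytes]]) -> Optional[bytes]:
    """Build the file from candidate chunks, keeping only verified ones.

    expected: fingerprint of each chunk of the desired file (assumed trusted).
    offers:   for each chunk index, candidate bytes received from peers,
              whether they hold the identical file or only a similar one.
    Returns the assembled file, or None if any chunk has no matching candidate.
    """
    parts: List[bytes] = []
    for i, fp in enumerate(expected):
        # Take the first candidate whose digest matches; reject everything else.
        match = next((c for c in offers.get(i, []) if sha256_hex(c) == fp), None)
        if match is None:
            return None
        parts.append(match)
    return b"".join(parts)


# Hypothetical example: peers offer a mix of correct and wrong chunks.
target = [b"header-v2", b"shared body"]
expected = [sha256_hex(c) for c in target]
offers = {0: [b"header-v1", b"header-v2"], 1: [b"shared bodY", b"shared body"]}
result = assemble(expected, offers)  # only the matching candidates are used
```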
P2P file sharing is notorious for its use in software, movie and music piracy; however, Andersen stresses that it is also widely used in other applications, such as the distribution of free and open source software.
"As the [P2P] technology matures, I think we'll see it being used for more and more legal uses," he said. "For instance, we're pretty sure based on other people's previous studies that SET will be useful for distributing software such as GNU/Linux; there's often substantial similarity between different releases of the operating system."
Although the researchers hope to implement SET in a service for sharing software or academic papers, Andersen said they have no intention of applying it themselves to movie- or music-sharing services. Current research is focussed instead on extending SET techniques to efficiently locate similar content that is already present on a user's hard drive, to speed data transfers.