Global Lambda Integrated Facility

Teraflow Testbed: High-Performance Flows for Large Distributed Data Archives

Websites

http://www.teraflowtestbed.net/
http://www.ncdm.uic.edu/

Collaborators

Australia: University of Melbourne
China: Chinese Academy of Sciences (CAS), Computer Network Information Center; National Astronomical Observatories
Germany: Max-Planck-Institut für Plasmaphysik, Garching Computing Centre
Japan: University of Tokyo, Institute for Cosmic Ray Research
The Netherlands: SARA Computing and Networking Services; University of Amsterdam
South Korea: Korea Astronomy and Space Science Institute; Korea Institute of Science and Technology Information (KISTI)
United States: University of Illinois at Chicago, National Center for Data Mining; Johns Hopkins University; University of California San Diego (UCSD); NASA Goddard Space Flight Center

With support from StarLight (United States); TransPAC (United States); JGN2 (Japan); KREONet2 (South Korea)

Description

The Teraflow Project is developing data mining middleware to transport, explore and mine high-volume data flows. The project supports several tools, including UDT for high-volume data transport and SOAP* for high-performance web services, as well as applications built over UDT, SOAP*, and related tools in domains such as astronomy, bioinformatics, and sensor networks.

The project also operates the Teraflow Testbed, an international application testbed for exploring, analyzing, integrating and detecting changes in massive, distributed data over wide-area high-performance networks.

The Teraflow Testbed has nodes in Chicago, Kingston, Amsterdam, Geneva, Daejeon, and Tokyo connected by 1 Gbps and 10 Gbps wide-area networks. It is currently used to distribute Sloan Digital Sky Survey data to researchers worldwide, and in experiments to detect changes in high-volume data flows.

Teraflow Demonstration Topology

Teraflow Testbed 2, introduced at SC'06, will soon be extended from Chicago to UCSD/Calit2 over CAVEwave. Teraflow Testbed 2 will support persistent data services for moving large scientific data sets (Sector); persistent data services for real-time analysis of distributed streaming data (Angle); and next-generation distributed storage, data, and integration services. It will have several dedicated paths using fiber from NLR, as well as several shared optical paths to other sites.

Using a subset of this testbed (shown in the demonstration topology above), NCDM won the SC'06 Bandwidth Challenge. Its entry, "Transporting Sloan Digital Sky Survey Data using SECTOR", sustained a disk-to-disk data-transfer rate of 8 Gbps, with a peak rate of 9.18 Gbps, over a shared 10 Gbps routed link between SC'06 (Tampa), UIC and StarLight. StarLight network engineers provided substantial assistance.
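The SC'06 figures can be put in context with a little arithmetic. The sketch below is a minimal illustration, not part of the project's software: it computes the fraction of the 10 Gbps link used at the sustained and peak rates, and how long a data set would take to move at the sustained rate. The helper names and the 10 TB data set size are hypothetical, and decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s) are assumed.

```python
# Hypothetical helpers for illustration only; rates are from the SC'06 entry,
# the 10 TB data set size is an assumption. Decimal units are assumed.

def utilization(rate_gbps: float, link_gbps: float) -> float:
    """Fraction of link capacity used by a flow at rate_gbps."""
    return rate_gbps / link_gbps


def transfer_hours(size_tb: float, rate_gbps: float) -> float:
    """Hours to move size_tb terabytes (10**12 bytes each) at rate_gbps."""
    bits = size_tb * 1e12 * 8            # bytes -> bits
    seconds = bits / (rate_gbps * 1e9)   # Gbps -> bit/s
    return seconds / 3600


if __name__ == "__main__":
    print(f"sustained utilization: {utilization(8.0, 10.0):.0%}")   # 80%
    print(f"peak utilization: {utilization(9.18, 10.0):.1%}")       # 91.8%
    print(f"10 TB at 8 Gbps: {transfer_hours(10, 8.0):.2f} h")      # 2.78 h
```

At the sustained 8 Gbps, 10 TB would take about 2.78 hours under these idealized assumptions; protocol overhead and storage limits would likely lengthen this in practice.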

Supported by NSF OCI-0430781 for the period 1 October 2004 – 30 September 2007. Principal Investigators: Robert Grossman (University of Illinois at Chicago) and Alex Szalay (Johns Hopkins University).