Friday, November 14, 2008

CHEP 2009 - Prague

I've always wanted to go to Prague and it finally looks as if the planets are coming into alignment. I just submitted (3) abstracts for the CHEP conference, which will be held 3/21-27, 2009. CHEP stands for Computing in High Energy and Nuclear Physics and it's an opportunity for folks such as myself to present papers and give lectures on both the mundane and extraordinary components of our own little computing context. This particular conference is held every (2) years, I think, and carries with it a bit more prestige than the twice-yearly HEPiX meetings.
I've always had a romance for Prague -- I imagine it to be a labyrinthine, stony grey web of spires and mystery in the spirit of Kafka. In the 1991 Soderbergh movie, Kafka, Prague was an intensely dark and murderous place. The movie rehashed many of Kafka's own themes: isolation, transformation, paranoia, and institutional oppression. I especially liked when the movie moved from black and white to full color once Kafka (Jeremy Irons) entered the Castle -- with all the appropriate elbows and winks to Dorothy in Oz.
Then I recall reading something about Einstein, Kafka, and Freud hanging out in Prague but I don't know if that's real or just a myth...

Anyway, here are the abstracts -- it took a minimum of effort to write them up -- if they get accepted then I'll craft something good.

Title: dCache Storage Cluster at BNL

Abstract content
Over the last (2) years, the USATLAS Computing Facility at BNL has managed a highly performant, reliable, and cost-effective dCache storage cluster using SunFire x4500/4540 (Thumper/Thor) storage servers. The design of a discrete storage cluster signaled a departure from a model where storage resides locally on a disk-heavy compute farm. The consequent alteration of data flow mandated a dramatic re-construction of the network fabric.
This work will cover all components of our dCache storage cluster (from door to pool) including OS/ZFS file-system configuration, 10GE network tuning, monitoring, and environmentals. Performance metrics will be surveyed within the context of our Solaris 10 production system as well as those rendered during evaluations of OpenSolaris and Linux. Failure modes, bottlenecks, and deficiencies will be examined.
Lastly, we discuss competing architectures under evaluation, scaling limits in our current model, and future technologies that warrant close surveillance.

Presentation type (oral | poster)
Oral

Primary Authors:
PETKUS, Robert (Brookhaven National Laboratory)

Co-authors:
KARASAWA, Mizuki (Brookhaven National Laboratory)
MCCARTHY, John (Brookhaven National Laboratory)
SMITH, Jason (Brookhaven National Laboratory)

Abstract presenters:
PETKUS, Robert

Track classification:
Hardware and Computing Fabrics



Title: Building a Storage Cluster with Gluster

Abstract content
Gluster, a free cluster file-system scalable to several petabytes, is under evaluation at the RHIC/USATLAS Computing Facility. Several production SunFire x4500 (Thumper) NFS servers were dual-purposed as storage bricks and aggregated into a single parallel file-system using TCP/IP as an interconnect. Armed with a paucity of new hardware, the objective was to simultaneously allow traditional NFS client access to discrete systems as well as access to the GlusterFS global namespace without impacting production.
Gluster is elegantly designed and carries an advanced feature set including, but not limited to, automated replication across servers, server striping, fast db backend, and I/O scheduling. GlusterFS exists as a layer above existing file-systems, does not have a single point of failure, supports RDMA, distributes metadata, and is entirely implemented in user space via FUSE.
We will provide a background of Gluster along with its architectural underpinnings, followed by a description of our test-bed, environmentals, and performance characteristics.

Presentation type (oral | poster)
Oral

Primary Authors:
PETKUS, Robert (Brookhaven National Laboratory)

Co-authors:
SMITH, Jason (Brookhaven National Laboratory)

Abstract presenters:
PETKUS, Robert

Track classification:
Hardware and Computing Fabrics


Title: Log Mining with Splunk

Abstract content
Robust, centralized system and application logging services are vital to all computing organizations, regardless of size. For the past year, the RHIC/USATLAS Computing Facility (RACF) has dramatically augmented the utility of logging services with Splunk. Splunk is a powerful application that functions as a log search engine, providing fast, real-time access to data from servers, applications, and network devices. Splunk at the RACF is configured to parse system and application log files, script output, SNMP traps, and alerts, and has been integrated into our Nagios monitoring infrastructure.
This work will detail our central log infrastructure vis-à-vis Splunk, examine lightweight agents and example configurations, consider security, and demonstrate functionality. Distributed Splunk deployments or clusters between institutions will be discussed.

Presentation type (oral | poster)
Oral

Primary Authors:
PETKUS, Robert (Brookhaven National Laboratory)

Co-authors:
SMITH, Jason (Brookhaven National Laboratory)
RIND, Ofer (Brookhaven National Laboratory)

Abstract presenters:
PETKUS, Robert

Track classification:
Software Components, Tools and Databases
