Distributed Monte Carlo Production for DZero
Joel Snow, Langston University
DOE Review, February 2010

Outline
- Introduction
- FNAL SAM
- SAMGrid
- Interoperability with OSG and LCG
- Production System
- Production Results
- Summary

Introduction
- Covers my tenure as production coordinator
- Simulation data (MC) is crucial to physics analysis
- Tevatron luminosity, and hence raw data volume, is at record levels
  - A challenge for analysts and production
- Personnel and computing resources are migrating to LHC experiments
- DZero strategy
  - Increase automation
  - Leverage resources and support

Evolution
- Mature experiment, but nimble
- History of adopting innovative technologies
  - Distributed data handling: SAM
  - Early adopter of the grid for production: SAMGrid
  - Significant investment in these technologies
- Grid technology allows opportunistic usage
  - DZero can mix traditional dedicated and opportunistic resources
- Grid interoperability
  - Leverages resources and support, reducing personnel needs per CPU hour

Sequential data Access via Metadata (SAM)
- Fermilab system, first used by DZero
- The SAM data handling system predates the grid
- A set of servers working together to store and retrieve files and metadata
- Permanent storage and local disk caches
- A database tracks file locations, file metadata, and job processing history
- Delivers files to jobs (using GridFTP over the WAN) and provides job submission capabilities

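As an illustration only, the catalog behaviour described on this slide (a database that tracks file locations and metadata, with delivery preferring a local disk cache) might be sketched in Python as follows. Every name here (`FileCatalog`, `register`, `query`, `locate`, and the file and site names in the usage note) is hypothetical; this is not the SAM schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class FileCatalog:
    """Toy catalog mapping file names to metadata and known locations.

    Hypothetical illustration only; not the SAM schema or API.
    """
    metadata: dict = field(default_factory=dict)   # file name -> metadata dict
    locations: dict = field(default_factory=dict)  # file name -> set of site names

    def register(self, name, site, **meta):
        """Record that `site` holds a copy of `name`, with optional metadata."""
        self.metadata.setdefault(name, {}).update(meta)
        self.locations.setdefault(name, set()).add(site)

    def query(self, **criteria):
        """Return the sorted names of files whose metadata matches every criterion."""
        return sorted(
            name for name, meta in self.metadata.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def locate(self, name, local_site):
        """Prefer a copy at the local site; otherwise fall back to another copy."""
        sites = self.locations.get(name, set())
        if not sites:
            raise KeyError(f"no known location for {name}")
        if local_site in sites:
            return local_site
        return sorted(sites)[0]  # deterministic fallback, chosen for the sketch
```

A job at a site would first call `query(data_tier="simulated")` to select files by metadata, then `locate(name, local_site)` to decide where to fetch each file from. A real system would also weigh transfer cost and cache eviction, which this sketch ignores.
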
SAMGrid
- Fermilab-developed grid, first used by DZero for global MC production in 2004
- SAMGrid = SAM + Job and Information Management (JIM) components
- Provides the user with transparent remote job submission, data processing, and status monitoring
- VDT based (Globus + Condor)
- Logically consists of
  - Multiple execution sites
  - A resource selector
  - Multiple job submission (scheduler) sites
  - Multiple clients (user interfaces) to the submission sites

SAMGrid Interoperability
- As the Open Science Grid (OSG) and LHC Computing Grid (LCG) became operational, it was desirable to leverage these resources for DZero
- FNAL and DZero developed and deployed SAMGrid interoperability with both LCG and OSG resources
- An execution site acts as a forwarding node
  - It packages SAMGrid jobs for OSG/LCG job submission via Condor-G/gLite

Consolidation, Automation, Exploitation
- SAMGrid sites require operational manpower and expert support
- People power and FNAL support are migrating to LHC experiments
- Increase automation: AutoMC
- Reduce the number of SAMGrid sites and increase use of OSG and LCG
  - Comes with support
  - Provides opportunistic job slots

Production System
- MC production gets work from the SAM Request System
- Physics groups' MC requests are parametrized and prioritized as a Python object

Automatic Monte Carlo Request Processing
- Developed the AutoMC system, in use at FNAL
- Handles official DZero MC production at all but 2 sites
- Covers the path from approved request to final data storage
- Easy to use; minimizes manpower needs
- Site independent
  - Deployable for any grid site (SAMGrid, OSG, LCG)
  - Capable of managing many sites
- Handles recovery from common failures
- Integrates with the existing MC request priority protocol

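The combination of priority ordering and automatic recovery listed on this slide can be sketched, under stated assumptions, as a priority queue with bounded retries. This is not the AutoMC implementation: the function name `process_requests`, the `run_job` callback, the `max_retries` parameter, and the convention that a lower number means higher priority are all assumptions made for illustration.

```python
import heapq


def process_requests(requests, run_job, max_retries=2):
    """Process (priority, request_id) pairs, lowest priority number first.

    `run_job` is a caller-supplied function that returns True on success.
    A failed request is re-queued up to `max_retries` times and then
    reported as failed. Hypothetical sketch; not the AutoMC implementation.
    """
    queue = [(priority, request_id, 0) for priority, request_id in requests]
    heapq.heapify(queue)
    done, failed = [], []
    while queue:
        priority, request_id, attempts = heapq.heappop(queue)
        if run_job(request_id):
            done.append(request_id)
        elif attempts < max_retries:
            # Recovery step: put the request back with its attempt count bumped.
            heapq.heappush(queue, (priority, request_id, attempts + 1))
        else:
            failed.append(request_id)
    return done, failed
```

Because a retried request keeps its original priority, it is attempted again before lower-priority work. That is one possible design choice; a production system might instead delay retries or route them to a different site.
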
AutoMC Monitoring
- Running at FNAL and managing production at 32 sites
- http://www-d0.fnal.gov/computing/mcprod/dajd/dajd_status.html

Production System Resources
- MC production uses a variety of dedicated and opportunistic resources on 4 continents
- 1 non-grid site at ccin2p3, Lyon (FR): productive, flexible
- Native SAMGrid sites: FZU (CZ), GridKa (DE), LUHEP (US), USTC (CN), Wuppertal (DE)
- LCG resources: CEs in FR, NL & UK; SAMGrid-LCG infrastructure in FR, UK, NL & DE
- OSG resources: CEs, SEs, and SAMGrid-OSG infrastructure in US & BR

MC Production Results
- Looking back at the last 30 days
- Averaging 2.3M events per day and totaling 68.2M events in 30 days

MC Production Results
- Looking back at the last year (2009/01/01-2010/01/01)
- Averaging 21.8M events per week and totaling 1.1B events in a year
- Plot: cumulative since September 2005

MC Production Results
- Looking back at the last year by production segment
- 52-week averages per week (2009/01/01-2010/01/01)
  - Non-grid: 10.4M, OSG: 7.22M, SAMGrid: 4.15M, LCG: 0.01M

MC Production Results
- Looking back at the last year by production segment
- Plot: cumulative since September 2005
- Pie chart: Production Last Year By Segment (Non-grid, OSG, SAMGrid, LCG), with labels 47.0%, 33.8%, 19.2%, <0.1%
- 52-week totals (2008/01/26-2009/01/26)
  - Non-grid: 335.2M, OSG: 267.0M, SAMGrid: 215.0M, LCG: 0.3M

MC Production Geographic Distribution
- Events last year (2009/01/01-2010/01/01)
  - Europe: 718M (63.7%)
  - N. America: 387M (34.3%)
  - Asia: 18M (1.6%)
  - S. America: 5M (0.4%)

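As a quick arithmetic check, the regional percentages on this slide follow from the quoted event counts. The short Python sketch below recomputes them; the variable names are illustrative, and the counts are the rounded values from the slide, in millions of events.

```python
# Event counts by region, in millions, as quoted on the slide.
events_millions = {"Europe": 718, "N. America": 387, "Asia": 18, "S. America": 5}

total = sum(events_millions.values())

# Share of the total for each region, as a percentage rounded to one decimal.
shares = {
    region: round(100 * count / total, 1)
    for region, count in events_millions.items()
}

for region, share in shares.items():
    print(f"{region}: {events_millions[region]}M events, {share}% of {total}M")
```

Each computed share agrees with the percentage label in the chart, which is what allows the labels to be matched to their regions.
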
MC Production Results
- Looking back at the last 4+ years (2005/09/05-2010/01/30)
- Averaging 12.3M events per week and totaling 2.82B events
- Plot: cumulative since September 2005

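The weekly average on this slide can be cross-checked against the quoted total and date range. The sketch below does so in Python, assuming the total of 2.82B events is spread evenly over the whole interval; the variable names are illustrative.

```python
from datetime import date

# Date range quoted on the slide.
start, end = date(2005, 9, 5), date(2010, 1, 30)
weeks = (end - start).days / 7

# Total production over the range, in millions of events (2.82B).
total_events_millions = 2820

avg_per_week = round(total_events_millions / weeks, 1)
print(f"{weeks:.1f} weeks, {avg_per_week}M events per week")
```

The result agrees with the 12.3M events per week stated on the slide.
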
MC Production Results
- Looking back at the last 4+ years by production segment
- 4+ year averages per week (2005/09/05-2010/01/30)
  - Non-grid: 5.2M, OSG: 3.3M, SAMGrid: 3.6M, LCG: 0.2M

MC Production Results
- Looking back at the last 4+ years by production segment
- Plot: cumulative since September 2005
- 4+ year totals (2005/09/05-2010/01/30)
  - Non-grid: 1.19B (42.3%)
  - OSG: 755M (26.8%)
  - SAMGrid: 824M (29.2%)
  - LCG: 48.0M (1.7%)

Production Results Last 6 Years
Charts: DZero MC production in millions of events and in terabytes of data, 2004-2009, stacked by segment (Non-Grid, SAMGrid, OSG, LCG)

DZero MC Production in Millions of Events
Period                   Total   Non-Grid   SAMGrid   OSG     LCG
2008/12/26-2009/12/26    789.3   316.3      211.7     256.4   5
2007/12/26-2008/12/26    794.8   315.6      213.6     259.7   5.8
2006/12/26-2007/12/26    398.2   109.1      158.1     96.5    34.4
2005/12/26-2006/12/26    348.0   144.4      195.5     0.5     7.6
2004/12/26-2005/12/26    98.1    68.6       29.5      0.0     0.0
2003/12/26-2004/12/26    42.4    41.8       0.6       0.0     0.0

DZero MC Production in Terabytes of Data
Period                   Total   Non-Grid   SAMGrid   OSG     LCG
2008/12/26-2009/12/26    67.4    27.0       18.2      21.8    0.4
2007/12/26-2008/12/26    67.8    26.9       18.4      22.0    0.5
2006/12/26-2007/12/26    31.6    7.3        13.2      8.2     2.9
2005/12/26-2006/12/26    23.0    9.4        13.1      0.0     0.5
2004/12/26-2005/12/26    6.0     4.1        1.9       0.0     0.0
2003/12/26-2004/12/26    1.9     1.9        0.0       0.0     0.0

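To put the six-year trend in one number, the growth between the earliest and latest periods can be computed from the table totals. In the Python sketch below, each period is labelled by the year in which it ends (2004 for 2003/12/26-2004/12/26, 2009 for 2008/12/26-2009/12/26); that labelling and the variable names are conventions chosen here for illustration.

```python
# Yearly totals from the table, keyed by the year in which the period ends.
events_millions = {2004: 42.4, 2009: 789.3}
data_terabytes = {2004: 1.9, 2009: 67.4}

# Ratio of the latest period to the earliest, rounded to one decimal.
event_growth = round(events_millions[2009] / events_millions[2004], 1)
data_growth = round(data_terabytes[2009] / data_terabytes[2004], 1)

print(f"Events grew by a factor of {event_growth}")
print(f"Data volume grew by a factor of {data_growth}")
```
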
LU DZero MC Production
- 2005/09/05-2010/01/30: LUHEP produced 11.0M events and 907 GB of data
- Last year: LUHEP produced 4.9M events and 442 GB of data
- Plot: cumulative since Sept. 2005

Condor Q's at LUHEP
- Plots: SAMGrid and OSG, last month

OU DZero MC Production
- 2005/09/05-2010/01/30: OUHEP produced 160M events and 13.8 TB of data
- Last year: OUHEP produced 72.9M events and 6.5 TB of data
- Plot: cumulative since Sept. 2005

Summary
- DZero's early deployment of grid technology and automation has dramatically increased MC production
  - First deployment of the SAM distributed data handling system
  - Early SAMGrid deployment
  - Use of OSG and LCG resources through interoperability with SAMGrid
  - First opportunistic usage of OSG Storage Elements
  - Automated MC production system
- Anticipate adequate MC through the last analysis