Big Bangs, big TOEs, ICT and ‘doing’ science
The LHC, the LCG, the ‘End of Theory’ and ICT to the rescue
Later this year, most likely within a month or two, the biggest scientific apparatus of all time, and probably the biggest machine of any sort ever constructed, will start to function. It has been under construction for nine years and is now in the final stages of its pre-commissioning testing.
It is an underground ring almost nine kilometres across and 27 kilometres around, a gigantic tunnel built to recreate the conditions that prevailed at the beginning of the universe; it straddles the border between France and Switzerland. CERN’s Large Hadron Collider, the LHC, accelerates bunches of some 100 billion or so protons (count them) to 99.9999991 per cent of the speed of light and collides them head-on with similar bunches travelling in the opposite direction.
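The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it treats the ring as a perfect circle and uses the standard proton rest energy of about 0.938 GeV, neither of which comes from the text above. Under those assumptions the diameter comes out at roughly 8.6 kilometres and the energy per proton at roughly 7 TeV.

```python
import math

RING_CIRCUMFERENCE_KM = 27.0        # from the text
ONE_MINUS_BETA = 9e-9               # 1 - v/c for v = 99.9999991 per cent of c
PROTON_REST_ENERGY_GEV = 0.938272   # assumption: standard value, not from the text


def ring_diameter_km(circumference_km):
    """Diameter of a perfectly circular ring (a simplifying assumption)."""
    return circumference_km / math.pi


def lorentz_factor(one_minus_beta):
    """Relativistic gamma, computed from 1 - beta to limit rounding error."""
    beta = 1.0 - one_minus_beta
    # 1 - beta^2 = (1 - beta) * (1 + beta)
    return 1.0 / math.sqrt(one_minus_beta * (1.0 + beta))


diameter_km = ring_diameter_km(RING_CIRCUMFERENCE_KM)
gamma = lorentz_factor(ONE_MINUS_BETA)
proton_energy_tev = gamma * PROTON_REST_ENERGY_GEV / 1000.0

print(f"diameter: {diameter_km:.1f} km")
print(f"Lorentz factor: {gamma:.0f}")
print(f"energy per proton: {proton_energy_tev:.2f} TeV")
```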
Each bunch of particles in the LHC is only a few centimetres long and thinner than a human hair; each tiny bundle packs the energy of hundreds of speeding cars or of a few freight trains screaming down the track. The particle bundles will smash together some 30 million times per second, producing 600 million direct particle collisions per second. Four giant detectors – one of which is half the size of Notre Dame Cathedral and another of which has more iron than the Eiffel Tower – collect data from these collisions. The detectors have more than 100 million channels to funnel the data from the collisions to the CERN (European Organisation for Nuclear Research) data centre.
The LHC aims at confirming – or not – everything we think we know about particle physics. The LHC will also look for the Higgs boson – if found, the particle will help clarify a number of fundamental questions including how particles acquire mass.
The collider will search for new forces and clues to the nature of the so-far undetectable dark matter and dark energy that apparently account for most of the mass of the universe. In the process, researchers hope to gain insight into such theories as supersymmetry and the existence of extra dimensions in addition to the three we deal with every day.
With luck, the LHC might help prove one of the theories, such as String Theory, that tie together relativity and quantum physics (no one has done it yet) into a TOE (Theory of Everything). A TOE is needed to explain the Big Bang – the beginning of time and the universe.
None of this, though, will happen without ICT – without a super broadband network and a worldwide network of computing facilities manned by scientists from the world’s greatest nuclear research facilities.
The LCG is the LHC Computing Grid. It will filter the data streams from the experiments – I have read estimates of up to ten petabytes per second – and save only the 100 most promising events for analysis; the rest will be discarded. Nevertheless, approximately 15 petabytes (that is 15 thousand terabytes) of data will be recorded, catalogued, managed, distributed and processed each year. It would take a stack of CDs 20 kilometres high to record this much data.
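The stack-of-CDs comparison is easy to check. The sketch below assumes a 700-megabyte CD that is 1.2 millimetres thick and decimal units for the petabyte; none of these figures is given in the text. Under those assumptions the stack comes out at roughly 26 kilometres, the same order of magnitude as the 20 kilometres quoted above.

```python
YEARLY_DATA_BYTES = 15e15    # 15 petabytes, from the text (decimal units assumed)
CD_CAPACITY_BYTES = 700e6    # assumption: a standard 700 MB CD
CD_THICKNESS_MM = 1.2        # assumption: a standard disc, without a case


def cds_needed(n_bytes, cd_capacity_bytes=CD_CAPACITY_BYTES):
    """Number of CDs needed to hold n_bytes (fractional discs allowed)."""
    return n_bytes / cd_capacity_bytes


def stack_height_km(n_cds, thickness_mm=CD_THICKNESS_MM):
    """Height of a stack of bare discs, in kilometres."""
    return n_cds * thickness_mm / 1e6   # 1 km = 1,000,000 mm


n_cds = cds_needed(YEARLY_DATA_BYTES)
height_km = stack_height_km(n_cds)

print(f"{n_cds / 1e6:.1f} million CDs, stack about {height_km:.0f} km high")
```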
The LCG’s task is incredibly resource intensive from the computational, storage and data-com perspectives. It will take the data flowing from each experiment and analyse it to reconstruct the physical properties of each event. The LCG will run the data through simulations to see how well the data matches theoretical predictions and will further analyse the data using sophisticated algorithms to search for patterns scientists can use to derive new theoretical descriptions and explanations.
The LHC will generate much more data than any one computing facility or any single research institution can deal with, so the LCG was designed to distribute the load to computers and scientists around the globe.
The scientists who dreamt up the LHC knew from the beginning that, without ICT capabilities at least as sophisticated as the experiments themselves, the investment would be for naught; the magnitude of the data handling requirements was relatively easy to calculate. So, many years before anyone even imagined YouTube and video pushing the Internet towards petabyte traffic rates, teams of scientists and computing specialists began planning the massive computing and communications facilities needed.
The result is the dedicated, world-spanning LCG network, with communication links about a thousand times faster than the two-megabit connections common in home broadband.
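To put those link speeds in perspective, the sketch below estimates how long one year's worth of data (the 15 petabytes mentioned earlier) would take to cross a single link. It assumes decimal units, a fully utilised link and no protocol overhead, so it is an optimistic bound rather than a description of how the grid actually moves data. Even at a thousand times home-broadband speed, one link would need close to two years.

```python
YEARLY_DATA_BYTES = 15e15               # 15 petabytes, from the text (decimal units assumed)
HOME_LINK_BPS = 2e6                     # two-megabit home broadband, from the text
GRID_LINK_BPS = 1000 * HOME_LINK_BPS    # "about a thousand times faster"
SECONDS_PER_DAY = 86_400


def transfer_days(n_bytes, link_bits_per_second):
    """Days needed to send n_bytes over one fully used link, ignoring overhead."""
    return n_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


home_days = transfer_days(YEARLY_DATA_BYTES, HOME_LINK_BPS)
grid_days = transfer_days(YEARLY_DATA_BYTES, GRID_LINK_BPS)

print(f"home broadband: about {home_days / 365:.0f} years")
print(f"single grid link: about {grid_days:.0f} days")
```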
Although the volume of data was easy to calculate and even cut down to size, it took a leap of faith some 15 years ago, at the dawn of the Internet age, to imagine that a network able to handle all that data would be available in time. When the LHC was first conceived, dial-up modems and T-1 or E-1 links were the rule. Nevertheless, the networks and the equipment are ready. Fibre can carry the traffic and there are even routers, such as Cisco’s CRS-1, which can handle petabytes of data. The ability of fibre to carry the data was relatively predictable, but the foresight that prompted the industry to design and build equipment to handle levels of traffic far above anything expected was not.
The network will serve about five or six thousand users at close to 500 institutions; the network took many years to organise, implement and test. The LCG consists of a series of tiers: Tier-0 at CERN handles the initial input and distributes the data to Tier-1, the primary external research facilities that will handle the bulk of the data analysis and manage the permanent data storage for the rest of the grid. Tier-2 institutions will further analyse the data and run simulations. From there, the grid will extend out to smaller specialised research centres and even to desktop and portable computers.
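The tier structure can be pictured as a simple data model. The sketch below is purely illustrative: the `Tier` class, the even split across sites and the figure of ten Tier-1 sites are hypothetical choices made for the example, not details of the real grid software or its actual site count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    role: str


# Roles paraphrased from the description above.
TIERS = [
    Tier("Tier-0", "initial input at CERN; distributes data to Tier-1"),
    Tier("Tier-1", "bulk of the data analysis; permanent storage for the grid"),
    Tier("Tier-2", "further analysis and simulations"),
]


def share_per_site_pb(yearly_volume_pb, n_tier1_sites):
    """Even split of one year's data across Tier-1 sites (a simplification)."""
    if n_tier1_sites <= 0:
        raise ValueError("need at least one Tier-1 site")
    return yearly_volume_pb / n_tier1_sites


# Hypothetical example: 15 PB per year shared evenly by ten Tier-1 sites.
example_share_pb = share_per_site_pb(15.0, 10)

for tier in TIERS:
    print(f"{tier.name}: {tier.role}")
print(f"share per Tier-1 site: {example_share_pb} PB per year")
```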
This reliance upon massive amounts of data is not unique to the LHC. Computing, the Internet and massive databases have together created new tools to process, analyse and make sense of colossal amounts of data – and the new tools are beginning to change the way we do science.
Chris Anderson is the Editor-in-Chief of Wired, a physicist, and the author of the bestselling book The Long Tail: Why the Future of Business is Selling Less of More. Anderson is one of the most astute and provocative commentators on how science and technology are changing our world. He recently wrote a hotly debated column called The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.
In The End of Theory, he claims that the newfound ability to analyse massive amounts of data is more than a question of quantity – it gives us a new way to do science.
In the past, the sequence imposed by the scientific method was: observe, ponder, formulate a hypothesis and test it to see whether predictions based upon the hypothesis accurately describe the real world or can be proven false. Correlation was not sufficient to prove causation; an unbreakable chain of reasoning had to connect events to prove a cause and effect relationship. In mathematics, a proof had to proceed formally in an unbroken chain from the statement of the problem, step by step, to an irrefutable conclusion. Today, despite both practical and philosophical objections from many mathematicians, problems are increasingly yielding to brute force solutions run on supercomputers instead of elegantly reasoned proofs. These computer-assisted solutions are often so big they cannot be verified manually.
Given the vast quantity of data involved in certain types of problems, it is almost impossible to find patterns and subtle correlations in the data without sophisticated, computationally intensive analysis. This use of computers, in itself, is not scientifically heretical; what smells of heresy is Chris Anderson’s contention that science can be run following Google’s “founding philosophy” that although we “don’t know why this page is better than that one, if the statistics say it is, that’s good enough”. Google-ised science says, “with enough data, you don’t need to start with a model, just crank up the computer, feed it data and check the patterns that pop up to find things that science cannot see”.
Yes, brute force computing can help us separate the data-wheat from the chaff. Still, I am not sure whether Anderson means his ‘end of theory’ claim literally or, more likely, provocatively. Nowadays, if computers generate a few good leads and insights, it is enough. I have no doubt that one day ‘computer-generated science’ will make very important, original contributions, especially as something approaching computer intelligence – not human intelligence, but intelligence nonetheless – evolves. In the meantime, I prefer old-fashioned, well-reasoned science to the Google-like lists with many meaningless ‘hits’ we have all come to know.
It’s not the end of theory…yet.
Our next Connect-World India Issue will be published later this month. This edition of Connect-World will be widely distributed to our reader base and also at shows where we are one of the main media sponsors, such as India Telecom, New Delhi (11-13 December) and Convergence India, New Delhi (19-21 March 2009).
The theme for this issue will be: Seamless networks and seamless business in a seamless world.
Technologically, if not politically, the world is becoming increasingly interconnected, interdependent and interoperable. Few if any big companies are ‘national’ in the old sense; they no longer exist, work, buy supplies and services, and sell within the boundaries of a single nation. Supply chains and processes of all sorts reach into other nations and at times circle the globe, crossing and interacting with one another, first in one nation then in another, in subtle and complex ways.
Nowhere is this more obvious than in India, where broadband connectivity has, seemingly overnight, reinvented the country’s economy and rewritten its future. More than just outsourcing, that is, taking over existing processes on behalf of companies in other parts of the world, India is increasingly sourcing its own processes, its own technologies and products.
The speed, seamless interconnection and interoperability of the world’s networks, both wired and wireless, using a wide variety of transmission media and technologies, are now so common they are rarely noticed by the user, but much of India’s growth depends upon just this. This effortless connectivity makes possible the seamless interoperability of business processes and supply chains spread throughout the world. Today’s technology is inventing a seamless, global work environment that might, one day, lead to a seamless world.
This issue of Connect-World India will explore the influence of information and communication technology upon the transformation of India, and how India is itself transforming the technology and processes and helping create a seamless world.