Some more pictures and information about BlueGene computers at Daresbury.
The first is the BlueGene/L installed c.2005. BlueGene computers originally had a trapezoidal case to improve the circulation of cooling air.
Each Blue Gene/L compute or I/O node had a single ASIC (processor) with associated DRAM memory chips. The ASIC integrated two 700 MHz PowerPC 440 embedded cores, each with a double-pipeline double-precision Floating Point Unit (FPU), a cache sub-system with a built-in DRAM controller, and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L dual-core processor (node) a theoretical peak performance of 5.6 Gflop/s.
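The 5.6 Gflop/s figure follows directly from the clock speed and the FPU layout. A short Python sketch, assuming the usual counting of a fused multiply-add as two flops per pipeline per cycle:

```python
# Back-of-envelope check of the Blue Gene/L per-node peak.
# Each of the 2 PowerPC 440 cores has a double-pipeline FPU;
# each pipeline is assumed to retire one fused multiply-add
# (2 flops) per cycle.
clock_ghz = 0.7           # 700 MHz
cores = 2
pipelines_per_core = 2
flops_per_fma = 2         # multiply + add
peak_gflops = clock_ghz * cores * pipelines_per_core * flops_per_fma
print(peak_gflops)        # 5.6 Gflop/s per node
```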
Compute nodes were packaged two per compute card, with 16 compute cards plus up to 2 I/O nodes per node board. There were 32 node boards per cabinet, giving a total of 1,024 nodes or 2,048 cores. By integrating all essential sub-systems on a single chip and using low-power logic, each compute or I/O node dissipated only about 17 Watts, including the memory. This allowed 1,024 compute nodes, plus additional I/O nodes, to be housed in a single 19-inch rack with sufficient power supply and air cooling.
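The rack-level totals can be checked from the packaging figures above. A minimal sketch (the power figure covers the compute nodes only, excluding fans and power conversion, which the text does not quantify):

```python
# Rack-level totals for Blue Gene/L from the packaging described above.
nodes_per_board = 2 * 16           # 2 compute nodes per card, 16 cards per board
boards_per_rack = 32
compute_nodes = nodes_per_board * boards_per_rack   # 1,024 compute nodes
cores = compute_nodes * 2                           # 2,048 cores
watts_per_node = 17
compute_power_kw = compute_nodes * watts_per_node / 1000
print(compute_nodes, cores, compute_power_kw)       # 1024 2048 17.408
```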
Photos of partly populated BG/L board.
There is a BlueGene/L in Jim Austin's Computer Museum. This one came from EPCC.
The BG/L was replaced by a BG/P c.2007. BlueGene/P was the second-generation system, with quad-core PowerPC 450 processors running at 850 MHz. A compute card had one processor plus 2 or 4 GB of memory and was capable of a peak of 13.6 Gflop/s. As in the BG/L, 32 node boards were installed per rack, totalling 4,096 cores. The system was very efficient, giving 371 Mflop/s per Watt.
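The 13.6 Gflop/s per-node peak again falls out of the clock speed and core count, assuming the same 4 flops per core per cycle (two FMA pipelines) as on the BG/L:

```python
# Sanity check of the Blue Gene/P per-node peak: 4 PowerPC 450 cores
# at 850 MHz, assumed to do 4 flops per core per cycle as on BG/L.
clock_mhz = 850
cores = 4
flops_per_cycle_per_core = 4       # 2 FMA pipelines x 2 flops
peak_gflops = clock_mhz * cores * flops_per_cycle_per_core / 1000
print(peak_gflops)                 # 13.6 Gflop/s per node
```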
Photo of BG/P compute card and partly populated node board.
BlueGene/Q is liquid-cooled, so it has a more conventional case shape. In terms of packing density it is even more radical than the BG/P. The processor is an 18-core ASIC, of which 16 cores are available for running applications. Each core runs 4 threads of execution and the clock speed is 1.6 GHz. Each processor is mounted on a compute card along with 16 GB of memory.
32 compute cards are mounted on a so-called "drawer" or node board, and there are 32 node boards per rack, giving 16,384 cores per rack. Thus our 7-rack system installed in 2012 had 114,688 cores in total. The full configuration of the system is rather complex and is explained here.
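The core counts above can be reproduced from the packaging figures:

```python
# Core counts implied by the BG/Q packaging described above.
cards_per_board = 32        # compute cards per node board ("drawer")
boards_per_rack = 32
app_cores_per_card = 16     # 16 of the 18 cores run applications
cores_per_rack = cards_per_board * boards_per_rack * app_cores_per_card
print(cores_per_rack)       # 16384 cores per rack
print(7 * cores_per_rack)   # 114688 cores in the 7-rack system
```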
The one at Daresbury is the 7-rack system known as Blue Joule installed in the Hartree Centre in 2012. It was the 13th fastest computer in the world at the time of installation. The following figures show how it fared as other faster systems were installed around the globe. For full information see TOP500.
Date        Rank  System                   Cores    Rmax (TFlop/s)  Rpeak (TFlop/s)  Power (kW)
June 2012   13    BQC 16C 1.60GHz, Custom  114,688  1,252.2         1,468.0          575
June 2013   18    BQC 16C 1.60GHz, Custom  114,688  1,252.2         1,468.0          575
June 2014   23    BQC 16C 1.60GHz, Custom  131,072  1,431.1         1,677.7          657
June 2015   41    BQC 16C 1.60GHz, Custom  131,072  1,431.1         1,677.7          657
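The TOP500 figures are internally consistent. Assuming the BG/Q A2 core does 8 flops per cycle (quad FMA FPU), the quoted Rpeak follows from the core count and clock speed, and the Rmax and power columns give the system's Linpack efficiency:

```python
# Cross-checks on the TOP500 figures for the 7-rack configuration.
cores = 114688
clock_ghz = 1.6
flops_per_cycle = 8                 # quad FPU with FMA per core (assumption)
rpeak_tflops = cores * clock_ghz * flops_per_cycle / 1000

rmax_gflops = 1252.2 * 1000
power_watts = 575 * 1000
efficiency = rmax_gflops / power_watts   # Gflop/s per Watt

print(round(rpeak_tflops, 1), round(efficiency, 2))   # 1468.0 2.18
```

The ~2.18 Gflop/s per Watt compares with the 371 Mflop/s per Watt quoted for the BG/P above, a roughly six-fold improvement in energy efficiency.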
The first picture shows the IBM installation team, happy with progress, and illustrates the internal complexity of the 114-thousand-core system with its network cables and liquid-cooling pipes.
Here's the completed system.
This is a compute card from an IBM Blue Gene/Q (specifically the BG/Q at Daresbury in early 2012) showing the PowerPC A2-based processor with 18 cores running at 1.6 GHz. A Blue Gene/Q system is made up of these cards, 32 per node board and 1,024 per rack. This does not count the I/O boards, which use a similar design and hold 8 compute cards per rack. The second photo shows it with the aluminium heat sink and clamp in place.
See also Wikipedia article.