addressing contention on multicore processors

Friday, April 23rd, 2010

While current multicore architectures give each core its own small private caches at the early levels, the larger caches, the bus, the memory controller, and other components of the memory subsystem are shared among multiple cores. When multiple processes or threads run in tandem, they can contend for these shared resources. This contention occurs when the working sets of neighboring processes or threads exceed the size of the private caches and rely on the shared memory resources. It can significantly degrade application performance, counteracting the parallelism that multicore processors are meant to provide. When an application suffers a performance degradation due to contention for shared resources with an application on a separate processing core, we call this cross-core performance interference.

Research insight: Contention for shared resources is severely limiting our ability to realize the promised parallelism of multicore architectures.

The goals: To fully understand the nature of contention as it occurs in current commodity multicore processors, and to exploit this understanding to devise innovative approaches and mechanisms for mitigating its negative impact. This negative impact has implications for system performance, utilization, throughput, quality of service, etc.

Impact: Making significant progress in this direction will enable the continuation of Moore’s law by allocating transistors to implement higher degrees of parallel processing (at the very least for multi-programmed workloads).

My related publications:

  1. “Directly Characterizing Cross Core Interference Through Contention Synthesis” at HiPEAC 2011
  2. “Contention Aware Execution: Online Contention Detection and Response” at CGO 2010
  3. “Synthesizing Contention” at WBIA 2009 (workshop @ MICRO 2009)

practical online application restructuring

Thursday, April 22nd, 2010

The online restructuring of native applications enables an entire class of application optimizations that can only be performed dynamically, as they require information that is only available at runtime. However, restructuring native application code layout dynamically is highly complex, and traditionally the benefit has not justified this cost enough to motivate practical and commercial adoption. The difficulty of applying such optimizations to natively compiled binary applications can be attributed to two factors: a lack of source-level information in binary-to-binary restructuring, and the added complexity and overhead of online monitoring and code rewriting.

Research insight: The ‘rewriting’ doesn’t have to occur dynamically to enable dynamic restructuring. Why not identify the dynamic situations you wish to accommodate and statically specialize a number of instances (or versions) of the application code of interest for these situations? Then online we simply need to identify which situation (or scenario) we are in and switch instances accordingly.
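To make the idea concrete, here is a minimal sketch in Python rather than native code. Everything in it is an assumption made for illustration: the two routine versions, the scenario names, the `miss_rate` input (a stand-in for a hardware performance counter reading), and the threshold. It is not the framework from the publication below, only the shape of "specialize ahead of time, select at runtime".

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[int]]


def sum_rows_plain(matrix: Matrix) -> int:
    """Hypothetical version prepared for the 'quiet' scenario."""
    return sum(sum(row) for row in matrix)


def sum_rows_blocked(matrix: Matrix, block: int = 2) -> int:
    """Hypothetical version prepared for the 'contended' scenario.

    It computes the same result but visits the rows in small blocks.
    """
    total = 0
    for start in range(0, len(matrix), block):
        for row in matrix[start:start + block]:
            total += sum(row)
    return total


# All versions exist before the program runs; nothing is rewritten online.
VERSIONS: Dict[str, Callable[[Matrix], int]] = {
    "quiet": sum_rows_plain,
    "contended": sum_rows_blocked,
}


def classify_scenario(miss_rate: float, threshold: float = 0.1) -> str:
    """Map a monitored value to a scenario name (threshold is an assumption)."""
    return "contended" if miss_rate > threshold else "quiet"


def run(matrix: Matrix, miss_rate: float) -> Tuple[str, int]:
    """Identify the current scenario, then switch to the matching version."""
    scenario = classify_scenario(miss_rate)
    return scenario, VERSIONS[scenario](matrix)
```

The only work left for runtime is the classification and a table lookup. In a native setting the table entries would be separately compiled or laid-out copies of the same code.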

The goals: We must understand the implications and trade-offs of specializing at varying granularities. We must also demonstrate the unique capabilities of this “scenario based optimization” (SBO) and its practical applications.

Impact: I expect SBO to one day be a commonly used optimization technique. The only challenge currently is the lack of standardized semantics for performance monitoring hardware and its ABI (application binary interface), and for how each layer of the software stack is to interface with it. As our community further establishes these semantics, SBO can be easily adopted in numerous application domains.

My related publications:

  1. “Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations” at CGO 2009

general online optimization analyses for a new microarchitectural environment

Wednesday, April 21st, 2010

Online and dynamic optimization for native binary applications is not yet popular commercially; however, it is proving to be one of the most promising research directions for continuing to realize performance optimization opportunities at the binary level. Online optimizers use information that is only available dynamically to predict future application behavior and microarchitectural events, and exploit these predictions to allow the executing application to adapt to its execution environment, or to allow the environment to adapt to the application. However, the required online analyses impose overhead on the application, and traditionally, in many cases the overhead outweighs the benefits of the optimizations themselves. As a result, effectively achieving online optimization has proved quite challenging, especially at the binary level, which is why traditional dynamic binary optimizers often limit themselves to performing only the least costly, lightweight online analyses.

Research insight: A new microarchitectural environment is emerging, and traditional online binary optimization techniques may no longer be relevant. In this new environment we have sophisticated multicore capabilities and lightweight hardware performance monitoring capabilities, and by leveraging these capabilities intelligently, the cost of online monitoring and analysis begins to melt away.
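One way to picture this is a helper thread that takes the analysis off the application's critical path. The sketch below is only a structural illustration under stated assumptions: the function names are hypothetical, the "samples" are plain numbers standing in for hardware performance counter readings, and the "analysis" is just a running mean. Python threads also do not run bytecode in parallel in the standard interpreter, so this shows the division of labor, not a measured speedup.

```python
import queue
import threading
from typing import List, Optional


def analysis_worker(samples: "queue.Queue[Optional[float]]",
                    results: List[float]) -> None:
    """Helper thread: consume samples and record a running mean after each."""
    count, total = 0, 0.0
    while True:
        sample = samples.get()
        if sample is None:  # sentinel: the application has finished
            break
        count += 1
        total += sample
        results.append(total / count)


def run_with_helper(workload: List[int]) -> List[float]:
    """Run the 'application' loop while a helper thread does the analysis."""
    samples: "queue.Queue[Optional[float]]" = queue.Queue()
    results: List[float] = []
    helper = threading.Thread(target=analysis_worker, args=(samples, results))
    helper.start()
    for value in workload:
        # The application only pays for an enqueue; the analysis itself
        # happens on the helper thread.
        samples.put(float(value))
    samples.put(None)  # tell the helper to stop
    helper.join()
    return results
```

For example, `run_with_helper([2, 4, 6])` returns the running means `[2.0, 3.0, 4.0]`. The application loop never computes them itself, which is the property that matters here.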

The goals: Demonstrate the opportunity to unobtrusively perform sophisticated and much more beneficial online analyses for dynamic optimizations.

Impact: Showing that the complexity of harnessing binary-level online optimization can be reduced, and demonstrating the benefits of leveraging the latest microarchitectural advances to re-approach online optimization problems, should provide evidence of the commercial and practical viability of online optimization for native application binaries.

My related publications:

  1. “A Reactive Unobtrusive Prefetcher for Multicore and Manycore Architectures” at SHCMP 2008 (workshop @ ISCA 2008)
  2. “MATS: Multicore Adaptive Trace Selection” at STMCS 2008 (workshop @ CGO 2008)