This talk is about runtime systems, focusing on the proto-runtime approach to modularizing the runtime systems of parallel languages. I will show how any parallel runtime can be decomposed into three modules. The first is a machine-specific module that exports an abstraction of the hardware. The other two are language-specific: they "plug in" to the first and interact with it through its interface. In effect, they customize the machine-specific module by adding the behavior of the language's synchronization constructs and by controlling which core each piece of work executes on. The interface provided by the hardware abstraction simplifies the task of defining synchronization-construct behavior, because that code can be written with sequential thinking. The machine-specific module is reused by every language, which further reduces the effort of implementing a language. The work of tuning the machine-specific module to exploit low-level hardware details is inherited by all of the languages, freeing language implementers from that work. The arrangement results in very low overhead for language constructs. I will present experimental results that show a large improvement over native OS threads. I will also cover the "tie-point" theory that explains how such a modularization is possible.
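
To make the three-module split concrete, here is a minimal, single-threaded sketch in Python. It is an illustration only and not the actual proto-runtime interface: the names ProtoRuntime, MutexLanguage, handle_request, assign, and make_ready are all hypothetical, and the "cores" are just integer labels rather than real hardware. The point it tries to show is that the two language-specific plugins (construct behavior and work-to-core assignment) can be ordinary sequential code, because the machine-specific module calls them one request at a time.

```python
from collections import deque


class Task:
    """A unit of work; its body is a generator that yields construct requests."""
    def __init__(self, ident, name, body):
        self.ident, self.name, self.body = ident, name, body


class ProtoRuntime:
    """Stand-in for the machine-specific module (hypothetical interface)."""
    def __init__(self, num_cores, request_handler, assigner):
        self.num_cores = num_cores
        self.request_handler = request_handler  # language plugin 1: construct behavior
        self.assigner = assigner                # language plugin 2: work-to-core choice
        self.ready = deque()
        self.trace = []                         # (core, task name, request) records

    def make_ready(self, task):
        self.ready.append(task)

    def run(self):
        # Simulated here as one loop; plugins are invoked one at a time,
        # so they need no locking of their own.
        while self.ready:
            task = self.ready.popleft()
            core = self.assigner(task, self.num_cores)
            try:
                request = next(task.body)       # run the task up to its next request
            except StopIteration:
                continue                        # task finished
            self.trace.append((core, task.name, request))
            self.request_handler(request, task, self)


class MutexLanguage:
    """Language-specific plugins for a toy language with a single mutex."""
    def __init__(self):
        self.owner = None
        self.waiters = deque()

    def handle_request(self, request, task, runtime):
        if request == "acquire":
            if self.owner is None:
                self.owner = task
                runtime.make_ready(task)        # lock granted, task may continue
            else:
                self.waiters.append(task)       # task stays suspended
        elif request == "release":
            self.owner = self.waiters.popleft() if self.waiters else None
            if self.owner is not None:
                runtime.make_ready(self.owner)  # hand the lock to the next waiter
            runtime.make_ready(task)
        else:
            runtime.make_ready(task)            # plain work step

    def assign(self, task, num_cores):
        return task.ident % num_cores           # trivial placement policy


def worker(log, name):
    yield "acquire"
    log.append(name + " enters")
    yield "work"
    log.append(name + " leaves")
    yield "release"


def run_demo():
    log = []
    lang = MutexLanguage()
    rt = ProtoRuntime(2, lang.handle_request, lang.assign)
    for ident, name in enumerate(["A", "B"]):
        rt.make_ready(Task(ident, name, worker(log, name)))
    rt.run()
    return log, rt.trace
```

Calling run_demo() returns the order in which the two tasks entered and left the critical section, plus a trace of which core label each request was assigned to. Supporting a different language in this sketch would mean replacing MutexLanguage while leaving ProtoRuntime untouched, which mirrors the reuse argument above.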