Speakers:

Gaël Thomas on 09/22/11 - VMKit: A Substrate for Managed Runtime Environments
Abstract: Managed Runtime Environments (MREs), such as the JVM and the CLI, form an attractive environment for program execution by providing portability and safety, via the use of a bytecode language and automatic memory management, as well as good performance, via just-in-time (JIT) compilation. Nevertheless, developing a fully featured MRE, including, e.g., a garbage collector and a JIT compiler, is a herculean task. As a result, new languages cannot easily take advantage of the benefits of MREs, and it is difficult to experiment with extensions of existing MRE-based languages. This talk describes and evaluates VMKit, a first attempt to build a common substrate that eases the development of high-level MREs. We have successfully used VMKit to build two MREs: a Java Virtual Machine and a Common Language Runtime. An extensive study of this infrastructure assesses the ease of implementing new MREs or MRE extensions and evaluates the resulting performance. In particular, it took one person only one month to develop a Common Language Runtime using VMKit. VMKit furthermore has performance comparable to the well-established open-source MREs Cacao, Apache Harmony, and Mono, and is 1.2 to 3 times slower than JikesRVM on most of the DaCapo benchmarks.
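For concreteness, the split that a common substrate aims at can be pictured roughly as follows. This is a hypothetical sketch in Java; the interface names are ours, not VMKit's actual API (the real system builds on components such as LLVM's JIT):

```java
// Hypothetical sketch of a substrate/MRE split (illustrative only; these
// names are ours, not VMKit's actual API).
final class TypeInfo {}       // language-specific type metadata (stub)
final class MethodInfo {}     // language-specific method metadata (stub)

interface CompiledCode {
    Object invoke(Object... args);
}

// Services every MRE needs and the substrate provides once: a GC-managed
// heap and a JIT compiler.
interface Substrate {
    Object allocate(int size, TypeInfo type);
    CompiledCode compile(byte[] bytecode, MethodInfo method);
}

// What each high-level MRE (a JVM, a CLR, ...) must supply: the pieces the
// substrate cannot know, such as where an object's references live.
interface HighLevelMRE {
    Iterable<Object> traceReferences(Object obj); // GC tracing support
}
```

Under this kind of split, building a new MRE means writing only the language-specific half, which is consistent with the one-person-month CLR result reported in the abstract.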

Bio: Gaël Thomas is an associate professor at Pierre and Marie Curie University (UPMC) in Paris, France, where he joined the faculty in 2006. He is a member of the Regal team, a joint research team of Inria and UPMC that investigates large-scale distributed systems. His main research interests include operating systems, managed runtime environments, large-scale distributed massively multiplayer games, and multicore programming. He received his PhD from UPMC in 2005 and subsequently performed postdoctoral research as a member of the Adele Team at Joseph Fourier University (Université Grenoble I).
Radha Jagadeesan on 09/20/11 - Operational Semantics for Relaxed Memory Models
Abstract: For shared-memory imperative programs, Sequential Consistency (SC) is the first memory model that we learn in school. SC enforces a global total order on memory operations that includes the program order of each individual thread in the program. Since modern multicore architectures do not implement SC faithfully and SC disables several compiler optimizations, there has been extensive research into relaxed memory models. The Java Memory Model (JMM) is a tremendous technical accomplishment in this design space. However, the nonstandard technical development of the JMM proves problematic for the use of traditional PL techniques such as simulation and subject reduction. This talk describes our efforts to remedy this problem. We describe an operational semantics for the JMM that captures its full expressivity. The JMM's account of the interaction of locks and data races does not validate roach-motel reordering and various peephole optimizations; our operational model differs from the JMM in these cases and does validate these optimizations. Time permitting, we will briefly address the informal English specifications of memory effects in the API documentation of java.util.concurrent. We use type annotations to integrate memory effects into the signatures of components and use our operational semantics to support type-checking methods that validate implementations against these enriched interfaces.
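A standard litmus test makes the gap between SC and relaxed models concrete. In the racy Java program below (a classic illustration, not an example from the talk), SC forbids the final outcome r1 == 0 && r2 == 0, while the JMM, and real multicore hardware, allow it:

```java
// Classic store-buffering litmus test (illustrative; not from the talk).
// x and y are plain (non-volatile) fields, so the program is racy and
// neither thread's two accesses need to stay in program order.
public class StoreBuffering {
    static int x = 0, y = 0;
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join();  t2.join();
        // Under SC at least one thread must see the other's write, so
        // "r1=0 r2=0" is impossible; under the JMM it is a legal outcome.
        System.out.println("r1=" + r1 + " r2=" + r2);
    }
}
```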

Bio: Radha Jagadeesan received his bachelor’s degree from the Indian Institute of Technology, Kanpur in 1987 and his doctorate from Cornell University in 1991. His research interests are in the general area of programming languages and systems, with specific interests in the applications of logical methods and semantics to these areas.
Thomas W. Reps on 12/13/10 - There's Plenty of Room at the Bottom: Analyzing and Verifying Machine Code
Abstract: Computers do not execute source-code programs; they execute machine-code programs that are generated from source code. Consequently, some of the elements relevant to understanding a program's capabilities and potential flaws may not be visible in its source code. The differences in outlook between source code and machine code can be due to layout choices made by the compiler or optimizer, or because transformations have been applied subsequent to compilation. The talk will discuss the obstacles that stand in the way of using static, dynamic, and symbolic analysis to understand and verify properties of machine-code programs. Compared with analysis of source code, the challenge is to drop all assumptions about having certain kinds of information available and also to address new kinds of behaviors (arithmetic on addresses, jumps to "hidden" instructions starting at positions that are out of registration with the instruction boundaries of a given reading of an instruction stream, self-modifying code, etc.). In addition to describing the challenges, the talk will also describe what can be done about them.
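The "out of registration" behavior is easiest to see with a concrete byte sequence. The sketch below was constructed for this summary (it is not an example from the talk); the x86 decodings in the comments are hand-checked:

```java
// Illustration of "hidden" instructions out of registration with the
// instruction boundaries of one reading of the byte stream.
public class HiddenInstructions {
    public static void main(String[] args) {
        byte[] code = { (byte) 0xB8, 0x01, (byte) 0xC3,
                        (byte) 0x90, (byte) 0x90, (byte) 0x90 };
        // Decoding from offset 0:
        //   B8 01 C3 90 90   mov eax, 0x9090C301   (one 5-byte instruction)
        //   90               nop
        // Decoding from offset 1 -- the reading a computed jump could reach:
        //   01 C3            add ebx, eax
        //   90 90 90         nop; nop; nop
        // A disassembler that walks only from offset 0 never sees the add.
        System.out.println("same " + code.length + " bytes, two valid decodings");
    }
}
```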

Bio: Thomas W. Reps is Professor of Computer Science in the Computer Sciences Department of the University of Wisconsin, which he joined in 1985. Reps is the author or co-author of four books and more than one hundred fifty papers describing his research. His work has concerned a wide variety of topics, including program slicing, dataflow analysis, pointer analysis, model checking, computer security, code instrumentation, language-based program-development environments, the use of program profiling in software testing, software renovation, incremental algorithms, and attribute grammars.

His collaboration with Professor Tim Teitelbaum at Cornell University from 1978 to 1985 led to the creation of two systems, the Cornell Program Synthesizer and the Synthesizer Generator, that explored how to build interactive programming tools that incorporate knowledge about the programming language being supported. The systems that they created were similar to modern program-development environments, such as Microsoft Visual Studio and Eclipse, but pre-dated them by more than two decades. Reps is President of GrammaTech, Inc., which he and Teitelbaum founded in 1988 to commercialize this work.
Edward Lee on 12/6/10 - Computing Needs Time
Abstract: Cyber-Physical Systems (CPS) are integrations of computation and physical processes. Embedded computers and networks monitor and control the physical processes, usually with feedback loops where physical processes affect computations and vice versa. The prevailing abstractions used in computing, however, do not mesh well with the physical world. Most critically, software systems speak about the passage of time only very indirectly and in non-compositional ways. This talk examines the obstacles in software technologies that are impeding progress, and in particular raises the question of whether today's computing and networking technologies provide an adequate foundation for CPS. It argues that it will not be sufficient to improve design processes, raise the level of abstraction, or verify (formally or otherwise) designs that are built on today's abstractions. To realize the full potential of CPS, we will have to rebuild software abstractions. These abstractions will have to embrace physical dynamics and computation in a unified way. This talk will discuss research challenges and potential solutions.
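As one small illustration of how indirectly mainstream software speaks about time (our example, not the speaker's): Java's Thread.sleep guarantees only a minimum delay, so a naive periodic loop accumulates drift, and neither idiom below lets a program state, or a tool check, a hard timing requirement:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustration (ours): there is no way here to *specify* "run every 100 ms";
// both idioms only approximate it, with no checkable timing semantics.
public class TimingDrift {
    public static void main(String[] args) throws InterruptedException {
        // Naive periodic loop: sleep() promises only a minimum delay, and
        // loop-body time is added on top, so phase error accumulates.
        long start = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(100);
            System.out.printf("tick %d at %d ms%n",
                    i, (System.nanoTime() - start) / 1_000_000);
        }
        // scheduleAtFixedRate compensates for drift on average, but still
        // promises nothing about the worst-case timing of any single tick.
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(() -> System.out.println("fixed-rate tick"),
                0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        exec.shutdown();
    }
}
```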

Bio: Edward A. Lee is the Robert S. Pepper Distinguished Professor and former chair of the Electrical Engineering and Computer Sciences (EECS) department at U.C. Berkeley. His research interests center on design, modeling, and simulation of embedded, real-time computational systems. He is a director of Chess, the Berkeley Center for Hybrid and Embedded Software Systems, and is the director of the Berkeley Ptolemy project. He is co-author of five books and numerous papers. His bachelor's degree (B.S.) is from Yale University (1979), his master's (S.M.) from MIT (1981), and his Ph.D. from U.C. Berkeley (1986). From 1979 to 1982 he was a member of technical staff at Bell Telephone Laboratories in Holmdel, New Jersey, in the Advanced Data Communications Laboratory. He is a co-founder of BDTI, Inc., where he is currently a Senior Technical Advisor, and has consulted for a number of other companies. He is a Fellow of the IEEE, was an NSF Presidential Young Investigator, and won the 1997 Frederick Emmons Terman Award for Engineering Education.
Kai Li on 11/3/10 - Building a Commercial Deduplication Storage System
Abstract: We all hate tapes, but they have been used to store backup and archive data in data centers for several decades. In 2001, Data Domain set its mission to replace tape libraries by developing deduplication storage system products. Since 2003, Data Domain has launched several deduplication storage appliances and data-replication ecosystems that let data centers retire their tape libraries. These products can reduce storage footprints, WAN bandwidth requirements, and power consumption by an order of magnitude. To date, the company has deployed over 30,000 systems to about 5,000 data centers, and deduplication storage has now become the new standard for online data protection.

In this talk, I will give a retrospective overview of developing a commercial deduplication storage system for data centers, and a summary of the lessons we have learned in building and deploying a commercially successful deduplication storage product line.
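The core idea behind deduplication can be sketched in a few lines (our toy version; Data Domain's production pipeline adds content-defined chunk boundaries, an efficient on-disk fingerprint index, and local compression): split the stream into chunks, fingerprint each chunk with a collision-resistant hash, and store a chunk's bytes only the first time that fingerprint appears. Backup workloads repeat most chunks week after week, which is where the order-of-magnitude reductions come from.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Toy fixed-size-chunk deduplication (our sketch, not Data Domain's design).
public class ToyDedup {
    private static final int CHUNK = 8 * 1024;
    private final Map<String, byte[]> store = new HashMap<>(); // fingerprint -> bytes
    private final MessageDigest sha;

    public ToyDedup() throws NoSuchAlgorithmException {
        sha = MessageDigest.getInstance("SHA-256");
    }

    /** Writes `data`, storing each distinct chunk only once; returns the
     *  list of fingerprints (the "recipe") needed to reconstruct the stream. */
    public String[] write(byte[] data) {
        int n = (data.length + CHUNK - 1) / CHUNK;
        String[] recipe = new String[n];
        for (int i = 0; i < n; i++) {
            byte[] chunk = Arrays.copyOfRange(data, i * CHUNK,
                    Math.min((i + 1) * CHUNK, data.length));
            String fp = Base64.getEncoder().encodeToString(sha.digest(chunk));
            store.putIfAbsent(fp, chunk); // a duplicate chunk costs only an entry
            recipe[i] = fp;
        }
        return recipe;
    }
}
```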

Bio: Kai Li is the Paul M. Wythes '55, P'86 and Marcia R. Wythes P'86 Professor at Princeton University, where he has been a faculty member since 1986. Before that, he received his PhD degree from Yale University, his M.S. degree from the University of Science and Technology of China, Chinese Academy of Sciences, and his B.S. degree from Jilin University. He has led several influential projects, such as the Shared Virtual Memory (SVM) project, which proposed and prototyped the initial Distributed Shared Memory for a cluster of computers; the Scalable High-performance Really Inexpensive MultiProcessor (SHRIMP) project, which developed fast protected user-level communication mechanisms for clusters; the Scalable Display Wall project for large-scale data visualization; and the Content-Aware Search System project for large feature-rich datasets. He was elected an ACM Fellow in 1998. In 2001, he co-founded Data Domain, Inc., where he has since served as CTO and consulting Chief Scientist.
Gernot Heiser on 10/13/10 - The Road to Trustworthy Systems
Abstract: Computer systems are routinely deployed in life- and mission- critical situations, yet in most cases their security, safety or dependability cannot be assured to the degree warranted by the application. In other words, trusted computer systems are rarely really trustworthy.

We believe that this is highly unsatisfactory, and have embarked on a large research program aimed at bringing reality in line with expectations. This talk describes NICTA's research agenda for achieving true trustworthiness in systems. The first phase has been concluded, with the world's first formal proof of functional correctness of a complete OS microkernel. The second phase, in progress, aims at making dependability guarantees for complete real-world systems comprising millions of lines of code.

Bio: Gernot Heiser holds the John Lions Chair of Operating Systems at the University of New South Wales (UNSW), and leads the Trustworthy Embedded Systems (ERTOS) group at NICTA, Australia's National Centre of Excellence for ICT Research. He joined NICTA at its creation in 2002, and before that was a full-time member of academic staff at UNSW from 1991. His past work includes the Mungi single-address-space operating system, several unbroken records in IPC performance, and the best-ever reported performance for user-level device drivers.
In 2006, Gernot and a number of his students founded Open Kernel Labs, now the market leader in secure operating-system and virtualization technology for mobile wireless devices. The company's OKL4 operating system, a descendant of the L4 kernels developed by his group at UNSW and NICTA, is deployed in more than 750 million mobile phones. This includes the Motorola Evoke, the first (and to date only) mobile phone running a high-level OS (Linux) and a modem stack on the same processor core.
In a former life, Gernot developed semiconductor device simulators and models of device physics for such simulators, and pioneered the use of three-dimensional device simulation for the characterization and optimization of high-performance silicon solar cells.
Matt Might on 10/6/10 - Challenges in the Static Analysis of Dynamically Typed, Higher-Order Languages
Abstract: Static analysis holds the key to faster, safer, more robust programs. A static analyzer is a program that can predict and bound the behavior of other programs at compile time. Compilers and software engineers can use these predictions and bounds to optimize a program's computation, demonstrate the absence of security flaws, compare computed invariants against a specification, or even rule out dependencies that would otherwise block parallelization.

This talk will explore the challenges in the static analysis of dynamically typed, higher-order languages (e.g., Scheme, JavaScript, Python). The first part of this talk will be a tutorial on the small-step analysis methodology; and the second part will explore recently developed techniques within the small-step framework that can improve precision, speed and reasoning power.
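The central difficulty can be stated in a few lines of code: in a higher-order program, the call targets are themselves values flowing through the program, so control flow and data flow must be resolved together. The illustration below is ours, written in Java rather than Scheme:

```java
import java.util.function.Function;

// Illustration (ours): "what does f call?" depends on which function
// values flow to f -- control flow depends on data flow, and vice versa.
public class HigherOrderFlow {
    static int apply(Function<Integer, Integer> f, int x) {
        return f.apply(x); // static target unknown: inc? dbl? something else?
    }

    public static void main(String[] args) {
        Function<Integer, Integer> inc = n -> n + 1;
        Function<Integer, Integer> dbl = n -> n * 2;
        // An analysis must discover that both lambdas reach f.apply above
        // before it can bound apply's behavior at all.
        System.out.println(apply(inc, 5)); // 6
        System.out.println(apply(dbl, 5)); // 10
    }
}
```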

Bio: Matt Might is an assistant professor in the School of Computing at the University of Utah, where he leads the U Combinator program analysis research group. His primary research interest is static analysis of higher-order programs with applications to performance, parallelism, security and correctness. Dr. Might's other interests include language design, compiler architecture and functional programming. He received his Ph.D. in Computer Science from Georgia Tech in 2007.
Emina Torlak on 9/10/10 - MemSAT: Checking Axiomatic Specifications of Memory Models
Abstract: Memory models are hard to reason about due to their complexity: they need to strike a balance between ease of programming and allowing compiler and hardware optimizations. This talk presents an automated tool, MemSAT, that helps in debugging and reasoning about memory models. It takes as input an axiomatic specification of a memory model, as well as a multi-threaded test program containing assertions, and outputs a trace of the program for which the assertions are satisfied, if one can be found. MemSAT is fully automatic and is based on a SAT solver. If the tool cannot find a trace, it outputs a minimal subset of the constraints that are unsatisfiable.

We used MemSAT to check several existing memory models against their published test cases, including the current Java Memory Model by Manson et al., and a revised version of it by Sevcik and Aspinall. We found subtle discrepancies between what was expected and the actual results of test programs.
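The test cases in question have the shape sketched below (our illustration of the format, not a test from the paper): a few threads, shared fields, and a question about a final outcome, which a tool like MemSAT answers from the memory-model axioms rather than by running the code:

```java
// Shape of a memory-model litmus test (our illustration). Question posed
// to the tool: is there an execution allowed by the axioms in which the
// reader sees y == 1 but the stale x == 0?
public class MessagePassing {
    static int x = 0; // plain field
    static int y = 0; // making this volatile changes the answer

    public static void main(String[] args) throws InterruptedException {
        int[] r = new int[2];
        Thread writer = new Thread(() -> { x = 42; y = 1; });
        Thread reader = new Thread(() -> { r[0] = y; r[1] = x; });
        writer.start(); reader.start();
        writer.join();  reader.join();
        // Under the JMM with plain fields, r[0] == 1 && r[1] == 0 is a
        // legal outcome; with y volatile it is forbidden.
        assert !(r[0] == 1 && r[1] == 0) : "stale read observed";
    }
}
```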

Bio: Emina Torlak received her Ph.D., M.Eng. and B.Sc. in Computer Science from MIT. She is currently a member of the programming languages and software engineering department at IBM's T. J. Watson Research Center, working on SAT-based analysis of memory models and on programming tools based on symbolic execution. Her previous work focused on the development of a scalable SAT-based constraint solver with applications to design analysis, code checking, test-case generation, and declarative configuration.