Towards a Test Framework for Networked Embedded Systems

Jonas Fonseca
Department of Computer Science, University of Copenhagen

Supervisor: Philippe Bonnet
Department of Computer Science, University of Copenhagen

Keywords: Testing; Embedded systems; Testbed

Conclusion

Networked embedded systems have a tremendous potential for changing the future landscape of computing. One obstacle to reaching this goal, we believe, is that testing techniques for networked embedded systems are immature and do not accurately capture the needs characterizing these systems. In academia, efforts have been made, but none has successfully approached the issues on a broader scale. In this thesis, we have addressed the problem by exploring the research space for testing networked embedded systems. In this process, we have analyzed and identified key challenges involved in testing networked embedded systems using a problem-solving approach. Based on our findings, a test framework has been proposed, which we think addresses the fundamental problems and issues. Finally, we have developed a partial implementation of the test framework to evaluate our concepts.

Discussion

If we accept that different test suites find different errors, it must necessarily follow that a combination of test suites is preferable. Today there is a great deal of focus on testing.
Testing makes code and the associated development agile and free to explore. In order for testbeds and testing in general to be successful in facilitating the future requirements of research, they need to be both versatile and specialized. These conflicting goals suggest that we need to approach this topic from different angles and perspectives. We believe this has been accomplished in this thesis.

As an initial step to bring a more systematic testing approach to the research field, we have proposed a candidate fault model for networked embedded systems. We think it can serve as a starting point towards more clearly identifying and understanding the origin of faults in networked embedded systems. Moreover, we argue that it must be a key consideration in order to achieve more systematic testing. A challenge in this task has been to strike a good balance between generalizing and becoming too abstract. We believe this challenge can only be met through involvement from the research community.

Testing complex networked embedded systems requires detailed knowledge about the state of both the individual systems and their interaction with their surroundings. This has led us to explore model-based testing approaches, which we argue have the potential to allow test frameworks to more fully embrace the specific challenges of networked embedded systems. We see this as an important step towards more scalable, expressive and effective testing techniques for networked embedded systems. Specifically, our findings suggest that more advanced types of dynamic and automatic testing become feasible, because the use of models makes test cases less brittle by reducing dependencies on specific sequences of events and behavior. One thing to keep in mind is that some of the same limitations that exist for simulation also apply to model-based testing, namely that the results are directly related to the accuracy and level of detail of the underlying models.
Many possibilities are also available in the use of artifacts. When regarding testing as merely a search problem, the more we know about the various components of the system under test, the more we are able to reduce the search space. From this perspective, artifacts can play a vital role in making testing more efficient. This is especially true for automated testing, where existing methods of randomization and other evolutionary strategies require a lot of resources with only little yield in return. This is a particularly hard and challenging area to approach, because it involves crossing the boundaries of several development processes and practices.

We conclude the discussion by following up on the research questions posed in the introduction:

Is it possible to implement a robust test framework for networked embedded systems?
The evaluation shows that a test framework is realizable. It has also shown that the robustness depends to a high degree on the underlying testbed infrastructure.

What challenges must be faced when designing a test framework for networked embedded systems?
We believe that the analysis given in Chapter provides a good overview of the issues that must be addressed when creating tools for testing embedded systems. FIXME: Conclude whether the design goals were reached.

How do we approach the problem of efficient and systematic testing?
With the proposed fault model, we have started the work towards identifying common fault causes. Furthermore, our findings related to the use of artifacts suggest that they can serve as a valuable basis for evaluating various test criteria, such as coverage and adequacy. Together, we believe that this encourages the development of more efficient and systematic testing.

Can dynamic testing techniques be applied to extend the set of tested features?
The analysis of dynamic testing has explored several techniques and shows that they have considerable potential for dynamically extending tests.
Lessons Learnt

Testing is a difficult subject to approach because it necessarily needs to impose certain workflows onto developers in order to be systematic and effective. While the task presents itself as straightforward, there are many practical problems and technical challenges that need to be considered. Furthermore, solutions must deal with several conflicting goals. The lack of interest and success in the TinyOS community in working towards a more unified and general testbed and test framework is a testament to this.

Many attempts have been made to create testbeds for networked embedded systems, and each has addressed specific parts of the problem area; however, no significant outcome has resulted. The cause might be that the practical nature of testing makes it a less interesting topic for research. Indeed, many problems are concerned with basic system architecture and infrastructure on a level that requires long-term and financial commitment from institutions and organizations in the research community. From this perspective, successfully ensuring that future research is properly accommodated in terms of its requirements for testing infrastructure demands a high degree of political involvement and maneuvering.

We have also been surprised to find a general reluctance in the networked embedded systems community to look beyond the self-defined boundaries of the research field for inspiration. This may well be founded in its original decision to shed previous research in the quest for redefining computing and embracing the vision of pervasive systems. Our findings suggest that there is a lot of potential in more openly adopting proven and workable approaches from research in areas such as aspect-oriented programming.
Future Work

During the process of designing and implementing the test framework, we have addressed some of the challenges that have been identified; however, there is a great deal of work to be done to realize the concepts and assess the overall utility of the described approach.

[Complete and integrate the test framework with existing tools] Work on job and scheduling support must be completed, after which the test framework needs to be integrated with existing tools. Due to the big community surrounding sensor network research, many tools and development frameworks exist. The success of the test framework depends on being present and offered where development is being done.

[Extend the fault model] The fault model presented in Section needs to be extended and adjusted. We propose to conduct a survey that investigates different types of systems, both scientific and commercial, in order to identify faults based on practical experience. Furthermore, a thorough examination of possible test criteria needs to be done for each of the faults in the model.

[Explore dynamic testing techniques] Work also remains in exploring how knowledge from different sources, such as source code and runtime artifacts, can be integrated with the test framework. This has the potential to provide crucial information, such as test coverage, and facilitate effective dynamic testing. We also believe that model-based testing and how it is best applied to networked embedded systems are important areas of exploration.

[Embrace diversity via testbed federations] The ability of the test framework to facilitate future requirements of research relies on the underlying testbed infrastructure's ability to embrace diversity of both applications and platforms. To reconcile these conflicting goals of versatility and specialization, we think that testbed federation is the only choice. This poses new problems, such as resource peering and integration across institutional boundaries, which must be explored.
[Moving beyond test frameworks to application frameworks] With many of the practical and technical tasks being addressed for providing a solid infrastructure for testing, an interesting topic is to investigate methods for reusing components between test and testbed frameworks and the application frameworks used in deployments. This has the potential both to reduce problems when migrating applications from an initial development infrastructure to the final real-world deployment and to allow reuse of, for example, monitoring tools.

Evaluation and Testing

To evaluate the implementation, this chapter will consider various parts of the system and test their functionality. Each test is founded in the goals and desirable characteristics listed in the introduction, on which a test case based on a small real-world example is described together with expected behavior and results. Jobs are then created for the test cases and the jobs are run on an installation of Re·Schedule on the DIKU Testbed. Problems and unexpected behavior are commented on.

Limitations and Assumptions

Only functional tests are presented in this chapter. Consequently, the user interface itself will not be evaluated. Other parts of the implementation are not tested because they are hard or impossible to test. For example, the parts of the system affected by known problems and issues, as listed in the implementation chapter, are not considered. It is assumed that all test cases have access to all motes in the testbed. Furthermore, it is assumed that no external sources interfere with the tests.

Test Cases

Each test case will give a small introduction of the overall purpose and an explanation of the approach, such as the algorithm, and the different artifacts that it uses. Following the presentation is a precise specification of the goal of the test case and the expected results. Finally, the results of the test are given along with an evaluation comparing the expected and actual results.
All the test cases presented below have been built using the TinyOS 1.x version for the motes, and the applications are based on code developed for a course in networked embedded systems at DIKU. They are ordered so that simpler tests are evaluated first.

Listing Mote Information

The first test case will evaluate the basic functionality of the test framework's client API concerned with mote data. It simply lists all available information about each mote and the host to which it is attached.

Goal and Expected Results

The goal of this test case is to verify that mote information from the database is represented correctly; specifically, that the relational view sent over the wire via the Axis web service is mapped to the more object-oriented view of the client API. The expected result is that mote data is printed for all motes in the testbed.

Results and Evaluation

The result is as expected.

Programming and Starting a Mote

This test case will evaluate the basic functionality of the test framework API concerned with controlling motes. It programs a mote and starts it. To confirm that the programming was successful, the mote program simply prints something to the mote console.

Goal and Expected Results

The goal is to test that motes can be controlled, and more specifically be programmed, and that completion events are signaled properly. The expected result is to see the output of the mote program.

Results and Evaluation

The result is as expected.

Small Publish-Subscribe Experiment

This test case is based on a small publish-subscribe experiment, where data in the form of imaginary sensor readings is published by a mote and needs to be routed through the network. The same mote program is loaded on all motes; however, to simulate an interesting topology, the program contains code for ignoring packets from motes that are not configured as neighbors.
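The neighbor filtering described above can be illustrated with a minimal sketch. It is written in Python for readability only; the actual mote program targets TinyOS 1.x and is not shown in the text. All names here (`Packet`, `Mote`, `broadcast`) are hypothetical, and the three-mote line topology is an assumed example.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    source: int    # id of the sending mote
    payload: str   # imaginary sensor reading


@dataclass
class Mote:
    mote_id: int
    neighbors: frozenset                       # ids this mote accepts packets from
    received: list = field(default_factory=list)

    def on_receive(self, packet):
        """Accept the packet only if the sender is a configured neighbor."""
        if packet.source not in self.neighbors:
            return False                       # dropped, as if out of radio range
        self.received.append(packet)
        return True


def broadcast(sender, motes, payload):
    """Offer a packet to every other mote; return the ids that accepted it."""
    packet = Packet(sender.mote_id, payload)
    return [m.mote_id for m in motes
            if m is not sender and m.on_receive(packet)]


# Assumed line topology 1 - 2 - 3: mote 3 physically hears mote 1,
# but ignores it because mote 1 is not among its configured neighbors.
motes = [
    Mote(1, frozenset({2})),
    Mote(2, frozenset({1, 3})),
    Mote(3, frozenset({2})),
]
accepted = broadcast(motes[0], motes, "temp=21")
```

With this configuration only mote 2 accepts a packet from mote 1, so a reading published by mote 1 has to be forwarded by mote 2 before it can reach mote 3.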
Goal and Expected Results

The goal is to test that the test framework supports a small real-world experiment resembling what a student on a networked embedded systems course might run. The expected result is to see the motes printing messages to their console to inform about the messages they send and receive.

Results and Evaluation

The result is as expected.

Monitoring Health and Mapping Topology

The final test case is to build a small monitoring job, which can be used to assess the overall health of the testbed and help to map the topology of the testbed. The job uses a single mote program, which initially executes some on-mote self tests and then listens for network activity. On request, the mote program can be instructed to send a sequence of packets. The general structure of the job is to:

1. Get control of as many motes as possible.
2. Program each mote. To detect problems, set a timeout of 10 seconds, after which the mote is marked as defunct and disregarded for the rest of the job.
3. Iterate over each mote and start the mote program.
4. Capture results from the self test via the mote console and log them.
5. Request each mote to send a sequence of 10 packets, one per second.
6. Log the result of motes reporting that they received a packet.

Goal and Expected Results

The goal is to use all the functionality provided by the implementation: first, to run a job; second, to use multiple motes; third, to exercise advanced control of the job; and fourth, to use the mote console for on-mote control. We expect that the application can be used to build a topology map of the testbed with a certain degree of precision. Furthermore, we expect the topology to show that all motes are connected to each other.

Results and Evaluation

The results from running this test case show that it is possible to build a small health monitoring job, which can be used to map the topology of the testbed. Figure shows the topology which was mapped during the test.
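The general structure of the monitoring job listed above can be sketched roughly as follows. This is a synchronous Python simulation under stated assumptions, not the roughly 200-line asynchronous JavaScript script used in the evaluation. `SimulatedMote` and `run_health_job` are hypothetical names, and the one-second pacing between packets is omitted.

```python
PROGRAM_TIMEOUT_S = 10   # from the text: slower motes are marked defunct
PACKETS_PER_MOTE = 10    # from the text: each mote sends 10 packets


class SimulatedMote:
    """Hypothetical stand-in for a testbed mote; not the real client API."""

    def __init__(self, mote_id, programming_time_s, radio_ok, hears):
        self.mote_id = mote_id
        self.programming_time_s = programming_time_s
        self.radio_ok = radio_ok
        self.hears = set(hears)   # ids of motes whose packets this mote receives

    def program(self, timeout_s):
        """Return True if programming completes within the timeout."""
        return self.programming_time_s <= timeout_s

    def self_test(self):
        """Return the (simulated) result of the on-mote self test."""
        return {"radio": self.radio_ok}


def run_health_job(motes):
    """Program motes, log self tests, and map who receives whose packets."""
    log, alive = [], []
    for mote in motes:                                  # step 2
        if mote.program(PROGRAM_TIMEOUT_S):
            alive.append(mote)
        else:
            log.append((mote.mote_id, "defunct"))
    for mote in alive:                                  # steps 3 and 4
        log.append((mote.mote_id, "selftest", mote.self_test()))
    topology = {mote.mote_id: set() for mote in alive}
    for sender in alive:                                # steps 5 and 6
        if not sender.radio_ok:
            continue                                    # broken radio sends nothing
        for _ in range(PACKETS_PER_MOTE):
            for receiver in alive:
                if receiver is not sender and sender.mote_id in receiver.hears:
                    topology[sender.mote_id].add(receiver.mote_id)
    return topology, log


motes = [
    SimulatedMote(1, 2, True, {2}),
    SimulatedMote(2, 3, True, {1, 3}),
    SimulatedMote(3, 60, True, {2}),    # too slow to program: defunct
    SimulatedMote(4, 1, False, {1}),    # radio cannot transmit
]
topology, log = run_health_job(motes)
```

In this simulated run, mote 3 is dropped by the timeout and mote 4 appears in the map without outgoing links. Against a real testbed each step would be driven by asynchronous completion events rather than plain function calls.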
Unexpectedly, not all motes are connected, which suggests that some of the radios are not capable of transmitting.

There were a few obstacles during the development of the test case, which demanded considerable extra effort and increased the complexity of the test script. Most notably, it was necessary to work around motes that we were not able to program. This was done using timeouts. Secondly, the desire to execute certain control structures sequentially was complicated by the asynchronous interface. Finally, in terms of functionality, this was the most advanced test case. Considering that the script ended up being around 200 lines of JavaScript code, this suggests that the test framework is quite expressive.

Summary of Results

We have shown that the implemented test framework provides the basic functionality for creating and running both simple and advanced experiments. Most of the scripts are small, which shows that the interface is expressive. However, the asynchronous programming model turned out to make some of the test cases very complex. Furthermore, there are several recurring patterns throughout the scripts, which suggests that they should be made part of the API.

The experience from this evaluation has shown that a lot of care needs to be taken when developing test cases for the test framework. As mentioned, it was necessary to use timeouts to gracefully handle non-responding motes. This was disruptive and had a very negative effect on the overall impression of the usability of the testbed. It suggests that it is important to be able to continuously and automatically monitor the health of the testbed itself, so that these issues do not degrade the quality of the service provided by the testbed.

Implementation Overview

``Unreliability we can deal with in software'' (Google)

Based on the analysis and proposed design, a partial implementation of the proposed test framework has been developed.
The main goal of the implementation has been to prove the concept of the underlying ideas and to serve as a basis for future work. To achieve this, the implementation focuses on extending the Re·Mote Testbed Framework to provide clients greater control of experiments. The argument for this choice is that it is still unclear how exactly support for job scheduling and resource management is best provided. Moreover, by focusing on extending the client side of the framework, the implementation can capitalize on the stability of the existing infrastructure, because fewer changes to the server side are necessary.

Since the test framework is developed as an extension of the Re·Mote Testbed Framework, this chapter will first present important parts of the Re·Mote Testbed Framework. The rest of the chapter looks at various parts of the implementation with regard to the decisions that have been taken. Specifically, details on the implementation of the client side, such as job control, are given. Finally, some of the known problems and limitations of the implementation are presented to highlight places for improvement and future work.

The Re·Mote Testbed Framework

While the Re·Mote Testbed Framework was designed with modularity in mind, not all components are easy to extend. Before looking at the implementation details, it is therefore necessary to first get a better overview of the framework in order to figure out which components need to be changed and to address some of the issues that had to be faced.

Testbed Framework Overview

The Re·Mote Testbed Framework consists of four major components, each with well-defined interfaces and responsibilities. A database provides a shared repository for keeping track of the core testbed configuration and session-level state information.
This data is used by several web services responsible for authorizing users, granting access to motes, and acquiring mote status information, as well as by the mote control infrastructure (MCI), which manages the testbed back-channel. The mote host infrastructure consists of a mote control server to which clients connect, and a series of mote control hosts to which motes are connected. The relation between the different components and entities of the framework is illustrated in Figure .

The framework makes use of many different technologies to accomplish modularity. Java was chosen as the language for the client code, because it is very portable and has good support for graphical user interfaces. Furthermore, many tools and frameworks exist for Java that make it attractive, one of which is Axis, which is used in the Re·Mote Testbed Framework for providing the web services and generating client code for accessing them.

The mote control infrastructure is the only component that has been designed and implemented from scratch. This has enabled this critical component of the testbed framework to better address issues such as efficiency, security and scalability. For example, the mote control infrastructure is written in C++, since it allows greater access to the low-level devices necessary for accessing the physical motes. Moreover, the architecture of the mote control infrastructure is based on a central server, through which all mote control directives pass. This avoids exposing the physical infrastructure of the back-channel and concentrates the low-level access in a single entity. As a consequence, custom protocols, which account for the centralized architecture, have been created with C++ and Java libraries for carrying mote control data.

Extending the Re·Mote Testbed Framework

From the overview given above, it can be seen that there is a clear and natural boundary between the server and client side.
In short, clients access the services provided by a testbed through the web services and the MCI front-end server. The implementation thus needs to extend these services. Part of this task is to create a control layer, so that the underlying access to the web services or the MCI server becomes transparent for the client.

The only considerable functional change to the existing testbed framework is related to providing clients a method to infer the order of mote events. This has been done by adding timestamp information to the mote control messages sent by the mote hosts. Accurate time tracking thus relies on the system clocks of the mote hosts being in sync. One way to achieve this is by using the Network Time Protocol; however, how it is achieved is outside the scope of the framework.

To enable better integration and reuse of the different components, work has also gone into packaging all components, except the mote control infrastructure, in a Maven repository. Part of this task has been to split the components into modules from which artifacts, such as deployable web applications and extensible client libraries, can be assembled. The motivation behind this has been to enable seamless building of the framework and facilitate future work on the framework. Using Maven further consolidates the choice of Java-based technologies and makes it possible to integrate all the Java code from the existing framework.

The rest of this chapter will solely present the implementation of client-specific parts of the test framework.

Control Layer

A central part of the test framework is the control layer. Its main purpose is to provide transparent access to testbed resources through a uniform interface that enables control of all aspects related to motes. A general goal behind the implementation of the control layer has been to make it portable across different application frameworks. This means that the resulting library is platform independent.
The measure of platform independence has been to allow it to compile as part of an application using Google Web Toolkit, a framework for developing rich web applications in Java. For this reason the library depends mostly on self-defined or ubiquitous Java types.

Control Library Overview

The library has been written with modularization in mind and consists of three main modules:

[Application programming interface (API)] Defines the interface through which clients can interact with a testbed and the motes it makes available. For example, commands are provided for starting, stopping and programming motes as well as reading from and writing to the console of a mote.

[Core] A complete runtime that keeps track of the active session as well as the motes in the underlying testbed in order to provide the above API. It also has utilities for maintaining various other state information. To increase reusability of the core module, it uses a platform adaptor for accessing application-specific resources, such as thread scheduling, and has been kept extensible by providing basic building blocks.

[Service provider interface (SPI)] Provides a service manager and a low-level ``API'' for plugging in modules that connect to a testbed and serve the actual requests. Several service interfaces are provided, each of which is modeled after the web services and the commands in the mote control protocol.

To give an idea of how the different modules of the client library are related, Figure illustrates an example of which modules are active at runtime. In the example, the core module uses the service manager in the SPI module and an application-provided platform adaptor to provide the API. Two SPI modules are used for interacting with the testbed.
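To make the relationship between the three modules more concrete, the following is a minimal sketch of the layering. It is written in Python for brevity, whereas the actual library is in Java. Every class and method name below is an assumption made for illustration and does not reflect the library's real interfaces.

```python
from abc import ABC, abstractmethod


class MoteControlService(ABC):
    """SPI: hypothetical low-level interface that a transport module implements."""

    @abstractmethod
    def send_command(self, mote_id, command):
        ...


class PlatformAdaptor(ABC):
    """Application-provided hooks, for example how callbacks are scheduled."""

    @abstractmethod
    def schedule(self, task):
        ...


class ServiceManager:
    """SPI: registry through which the core looks up pluggable services."""

    def __init__(self):
        self._services = {}

    def register(self, interface, implementation):
        self._services[interface] = implementation

    def lookup(self, interface):
        return self._services[interface]


class Mote:
    """API: the object client code interacts with."""

    def __init__(self, mote_id, core):
        self.mote_id = mote_id
        self._core = core

    def start(self, on_done):
        self._core.execute(self.mote_id, "start", on_done)


class Core:
    """Core: ties the API to whichever SPI modules and adaptor are plugged in."""

    def __init__(self, services, platform):
        self._services = services
        self._platform = platform

    def mote(self, mote_id):
        return Mote(mote_id, self)

    def execute(self, mote_id, command, on_done):
        service = self._services.lookup(MoteControlService)
        result = service.send_command(mote_id, command)
        # Completion is signaled through the application's own scheduler.
        self._platform.schedule(lambda: on_done(result))


class LoopbackService(MoteControlService):
    """In-memory SPI module used here instead of sockets or web services."""

    def send_command(self, mote_id, command):
        return f"mote {mote_id}: {command} ok"


class ImmediateAdaptor(PlatformAdaptor):
    """Runs callbacks right away; a GUI client might queue them instead."""

    def schedule(self, task):
        task()


manager = ServiceManager()
manager.register(MoteControlService, LoopbackService())
core = Core(manager, ImmediateAdaptor())
results = []
core.mote(7).start(results.append)
```

Swapping `LoopbackService` for a module that talks to the testbed, or `ImmediateAdaptor` for one that posts to an event loop, would leave the client-facing `Mote` code unchanged, which is the property the modular design aims for.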
Application Integration

Since the control library is envisioned to be part of a larger entity, such as a stand-alone application or the control component of a testbed job, it is very important that the library can be seamlessly integrated with the underlying application. It is therefore relevant to go over some of the choices concerning this problem.

The library is not directly concerned with session creation, including authentication. The main reason is similar to the reason for using a platform adaptor: applications need to be able to tailor certain parts of the library runtime to make best use of the available resources. For example, a web application might have a separate login page, making authentication unnecessary, while other clients might automatically authenticate sessions based on a configuration file.

Furthermore, the library depends on a platform adaptor and SPI modules so that the underlying communication and application interaction can easily be changed. This means that it is possible to use one set of SPI modules, which uses sockets and web services, for a client that connects via the Internet, and another set of SPI modules, which directly accesses the central data and communicates with the server, for a client that is part of a job being executed on a testbed server.

Command Line Interface

To provide a basic tool for using the client library and experimenting with the ideas of the test framework, an application in the form of a command line interface has been created. The main goal behind the decision to only offer a command line interface has been to allow the implementation to focus on prototyping the core functionality and be less concerned with usability issues. In the following, an overview of the functionality the tool offers, as well as other work related to the application, will be presented.

Functionality Overview

The functionality supported by the command line tool is made accessible via several subcommands.
They can be divided into interrogative subcommands and manipulating subcommands. The former provide the ability to query testbed resources, such as listing the available motes. They also allow testbed authentication credentials to be printed and tested, which can be useful for new users when creating a profile for a testbed.

The rest of the commands allow a user to interact with the motes in a testbed and run experiments. For very simple experiments, there is a command for executing a program on a mote, which first programs the mote and then starts it. For more complex experiments, there is a command for running scripts, which can extend the command line interface. The scripts have access to arguments given on the command line and to a session, which gives access to motes in the testbed. Several scripts written in JavaScript have been developed, some of which reimplement the commands mentioned above. Finally, there is a command for running jobs. It takes a job jar file as an argument and executes the contained control code.

Application Libraries

As part of the command line tool, two application-oriented libraries have been created. The goal of this task has been to ease the creation of new tools by offering reusable components. In turn, it has helped to reduce some of the complexity of the created application.

The first library provides core functionality common to all applications. It implements a platform adaptor, which can be used together with the client library. Furthermore, it has support for reading user profiles from configuration files and automatically authenticating sessions. Finally, it contains a factory for creating scripting engines and setting up the script environment and bindings to core objects common to job experiments, such as the session.

The other library is specifically for handling jobs. In the implementation, jobs are defined as a jar file, since this format is very well supported by the Java standard library.
Part of the library is concerned with loading jobs and providing access to the job description and control code. The library also has a simple job manager to interface with the job control code, as well as support for control code in the form of scripts.

Known Issues and Limitations

Although the implemented tool has been found useful, it should be regarded as a prototype. Below, various known problems and limitations are described:

[Profiles are mandatory] The command line interface requires that the user specifies or creates a default profile properties file. The motivation is that certain testbed properties, such as the URL of the web services, need to be known upfront to simplify access to the testbed.

[Session creation and scripts] When executing scripts and jobs, the command line interface will always connect to the testbed before execution is started. The reason is that the authentication process is highly application specific and difficult to expose to scripts and jobs.

[Exiting from scripts] Management of the execution life-cycle from scripts and jobs is limited. For example, scripts are not able to hook into the client runtime and receive error notifications.

[Printing to standard out] Due to how informational messages are handled internally, all errors will be printed to standard out instead of standard error. Furthermore, it can be problematic to mix printing from scripts with the printing utilities exposed by the command line interface.

The Re·Schedule Test Framework

To address the issues we have raised in the previous chapters, we present a rudimentary design for a test framework called Re·Schedule. It is inspired by concepts from the aspect-oriented programming paradigm and adheres to the guidelines and recommendations presented in Section . The design is mainly concerned with the configuration and control of experiments; however, we will also briefly consider how non-interactive experiments are scheduled and run.
While the Re·Schedule Test Framework is designed with the infrastructure of the Re·Mote Testbed Framework in mind, we will not consider specific details related to this association.

Aspect-Oriented Testing

To help develop a philosophy and guide the underlying approach in the design, we will first introduce the concept of aspect-oriented programming as a source of inspiration. Next, we apply the concepts to testing of networked embedded systems.

The Aspect-Oriented Programming Paradigm

Aspect-oriented programming is a programming paradigm that has been developed to address some of the difficulties with clearly expressing certain design decisions using the abstractions and composition mechanisms available in general purpose languages. These design decisions tend to cross-cut the system's basic functionality, whereby they are hard to modularize and end up being scattered throughout the code base, leading to entangled code. Examples of such cross-cutting concerns are logging, security, and error handling.

The primary goal of the aspect-oriented programming paradigm is to provide a mechanism for modularizing these cross-cutting concerns into separate programming units called aspects, whereby they become isolated and possible to reuse. The more formal definition of when something is an aspect depends on whether or not ``it can be cleanly encapsulated in a generalized procedure'', be it an object, a method, etc.

To illustrate the philosophy of the relationship between normal programming units and aspects, we will use an analogy paraphrased from :

Imagine a mythical world inhabited by dragons and hunchbacks. The hunchbacks all dwell in houses with glass ceilings and work most of their day, only communicating by sending messages to each other. Being hunched over, the hunchbacks are unaware of the existence of dragons, which fly around above them and observe their behavior.
The dragons, curious by nature, regularly keep track of the mail correspondence without interfering with it. Occasionally, the dragons leap into action and, for example, repaint one of the houses. The hunchbacks will notice these changes, but continue with their everyday tasks, oblivious to the existence of the dragons. In the analogy, the dragons represent aspects, which are capable of augmenting the behavior of the rest of the system, represented by the hunchbacks, without the need or means to change the underlying static model of the system. In other words, aspects offer the ability to inject new behavior into an existing system dynamically, transparently and with almost surgical precision. Applying Aspect-Oriented Programming Approaches In order to apply the concepts given above, we will first reiterate the main challenges we want to address. A distributed application can be viewed as consisting of multiple state-machines, each of which works autonomously. Keeping track of each state is difficult and cumbersome. Furthermore, there clearly exist various parameters of interest to testing that cross-cut the application and are not tied to any single mote or component. Finally, as we have pointed out, certain aspects of experiments cannot be clearly expressed, or are outright cumbersome to express, in a general purpose language, which can lead to tests becoming brittle. The basic idea behind applying concepts from aspect-oriented programming to networked embedded systems comes from the observation that the sort of low-intrusion approach of programming by difference, which is proposed in the paradigm, is somewhat similar to our desire to observe and modify state for the system as a whole. In this regard, the use of aspects offers an additional level of abstraction, where the characteristics of networked embedded systems can be built into the model itself.
More concretely, aspects in the form of observable behavior act as points where cross-cutting invariants can be inspected by intercepting messages. These observations allow us to trace the application's control and data flow and reason about what is going on. Besides passive aspects that simply track the state, aspects can augment the system behavior by injecting messages or sensor data into an experiment. Consequently, the application of concepts from the aspect-oriented programming paradigm enables a more implicit method of reducing the complexity of testing invariants in the system under test. To exemplify how the fundamental units of the aspect-oriented programming paradigm can be mapped to networked embedded systems, consider an experiment testing an application for detecting river flooding. This application is concerned with measuring weather conditions such as humidity, temperature and the amount of rain in order to provide input for computing a prognosis. Based on changes in sensor data, the application will change the frequency of the data acquisition to increase the accuracy. In the example, each of the motes may be considered as components. Similarly, the various tiers and neighborhoods of motes can be viewed as components in which information is gathered and shared. Aspects, on the other hand, are tied to the behaviors of the system, such as the sensor for measuring the rain amount causing the monitoring of the water level in the river to start updating more frequently. The aspects might be defined in terms of a certain message being sent, a certain topology being established, or other observable properties. An example of the basic setup of experiments in terms of topology and connectivity is illustrated in Figure . As shown, the majority of motes in the network take active part in the experiment. Selected motes are assigned the role of ``dragon'' motes, which together form an overlay network that passively observes and may augment the experiment.
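To make the mapping more tangible, the following is a minimal sketch of a passive ``dragon'' observer for the river flooding example. It is an illustration under stated assumptions and not part of the Re·Schedule design: the names `Message`, `Aspect`, `DragonObserver`, `FloodInvariant` and `check_trace`, the message kinds `"rain"` and `"water_level"`, and the threshold values are all hypothetical. The invariant being checked is that once heavy rain has been reported, water-level updates must arrive at most `max_gap` seconds apart.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    time: float   # seconds since the start of the experiment
    source: str   # identifier of the sending mote
    kind: str     # hypothetical message kinds: "rain", "water_level"
    value: float


@dataclass
class Aspect:
    """Pairs a pointcut (which messages to intercept) with advice (what to do)."""
    pointcut: Callable[[Message], bool]
    advice: Callable[[Message], None]


class DragonObserver:
    """Passively dispatches overheard messages to the registered aspects."""

    def __init__(self) -> None:
        self.aspects: List[Aspect] = []

    def register(self, aspect: Aspect) -> None:
        self.aspects.append(aspect)

    def overhear(self, msg: Message) -> None:
        for aspect in self.aspects:
            if aspect.pointcut(msg):
                aspect.advice(msg)


class FloodInvariant:
    """After heavy rain, water-level updates must be at most max_gap apart."""

    def __init__(self, rain_threshold: float, max_gap: float) -> None:
        self.rain_threshold = rain_threshold
        self.max_gap = max_gap
        self.reference: Optional[float] = None  # time of alert or last update
        self.violations: List[Tuple[float, float]] = []

    def on_rain(self, msg: Message) -> None:
        # Enter the alert state the first time heavy rain is reported.
        if msg.value >= self.rain_threshold and self.reference is None:
            self.reference = msg.time

    def on_level(self, msg: Message) -> None:
        # Only judge update gaps once the alert state has been entered.
        if self.reference is not None:
            gap = msg.time - self.reference
            if gap > self.max_gap:
                self.violations.append((msg.time, gap))
            self.reference = msg.time


def check_trace(trace, rain_threshold=10.0, max_gap=10.0):
    """Replay a message trace through the observer and return violations."""
    invariant = FloodInvariant(rain_threshold, max_gap)
    dragon = DragonObserver()
    dragon.register(Aspect(lambda m: m.kind == "rain", invariant.on_rain))
    dragon.register(Aspect(lambda m: m.kind == "water_level", invariant.on_level))
    for msg in trace:
        dragon.overhear(msg)
    return invariant.violations
```

Note that the sketch only reasons about what can be overheard: a mote that stops reporting altogether is not flagged until the next water-level message arrives.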
The Case for Aspect-Oriented Testing We believe that the application of aspect-oriented concepts can serve as a tool to decompose networked embedded systems into testable behavioral patterns. Furthermore, the ideas naturally allow us to account for characteristics of networked embedded systems when testing. As an example, systems capitalize on locality to improve performance. This allows the amount of tracked system state to be reduced by minimizing the experiment to only consider a specific ``locality'' of the system under test and replaying or simulating the other surrounding parts. As with aspect-oriented programming, there are, however, certain properties to keep in mind. First of all, aspects, while powerful, cannot or should not incorrectly change system state invariants. In other words, no messages can be injected that violate basic state invariants and assumptions. Second, the concept of aspects presented above is based on the assumption that we can reason about the state of the system as a whole based on behavior. This might not always be the case if the system is treated as a black-box and the system relies on unpredictable behavior. Finally, it relies on the ability to track communication and requires that the application is ``networked'', since anything that cannot be gathered from tracking the communication cannot be detected. This is a general problem in event-driven systems and can make it hard to detect certain types of mote failures, for example when duty-cycling is extreme. To conclude, several limitations exist in terms of the type of applications that can be tested using this approach. While testing inspired by aspect-oriented concepts does not give the level of detail that on-board debugging provides, it has a holistic approach as mentioned in Recommendation on page . Furthermore, it has the following interesting properties: [Low level of intrusion] By definition we are passively observing aspects of the system by tracking the communication.
This means that no changes are required for the embedded system in compliance with Recommendation on page . [Reproducing and simplifying experiments is made straightforward] In order to reproduce an experiment, we simply use the overlay network to inject messages from a trace of a previous experiment. The task of automatically simplifying a test case, as mentioned in Recommendation on page , is a matter of replaying only a subset of events. [Can scale to large experiments] The overlay network only consists of a subset of motes, which makes this approach very scalable, as expressed in Recommendation on page , potentially only restricted by the IO capabilities of the underlying backchannel. [Abstract system modeling] The expressiveness and high abstraction level make modeling of systems possible. This facilitates automated test generation as mentioned in Recommendation on page . Design Overview To accommodate the many challenges, the functionality of the test framework is split into four major layers illustrated in Figure . This design captures how the functionality is gradually extended from low-level and basic services to increasingly more advanced functionality. From an architectural point of view, this allows each layer to confine and encapsulate related functionality and abstractions. Furthermore, the separation enables each layer to be developed independently of the others and provides users interested in integrating the test framework into applications and tools with multiple entry points, each with a different level of abstraction. The four layers are, from the bottom and up: [Service layer] Which provides very basic access to state information and the different physical entities in the testbed, most importantly the motes. In addition, the service layer can also function as an adapter layer that enables the test framework to run on top of different testbed frameworks.
[Resource layer] Where physical entities are mapped to abstract resources with the intention of hiding information related to the underlying testbed infrastructure. Considerations for the design of this layer are presented in Section . [Control layer] Which provides the means to control and configure motes and other entities. The design of this layer is presented in Section . [Modeling layer] That extends the underlying layers to facilitate the modeling of systems and their behaviors using high-level abstractions. Limitations Being a framework, it is very important that the approach is generic enough to support different workflows and testing requirements. Consequently, the design has the following scope and limitations: [Service layer] For the purpose of the work presented in the rest of this thesis, we will not address how the service layer should be designed, but simply define it to be the one offered by the Re·Mote Testbed Framework as presented in Section . [Modeling layer] While this layer in many ways plays a vital role in enabling the test framework to support holistic and automated testing, a design of the modeling layer is considered outside the scope of this thesis. Our main argument is that defining a design for the modeling layer is a considerable task that must address issues that are still unclear and require further research. [Non-interactive Experiments] We will mainly focus on the design for supporting interactive experiments, from the perspective that it in itself provides a complete and usable test framework. While allowing users to run experiments locally is valuable and important, a next step is to extend the test framework to handle non-interactive experiments, which are run on a testbed server. [User management policies] The test framework is, like the Re·Mote Testbed Framework, not concerned with how user accounts are managed.
Specifically for this area of concern there are many site-specific choices to be made as to who has access to what resources provided by the testbed. Furthermore, it is difficult to make assumptions regarding user management because of the level of trust that is involved. [Resource management policies] Similarly to user management, it is hard to define a resource management scheme that considers all the different concerns that apply to a testbed. To enforce policies they must be integrated into the service and resource layers. Rather than defining specific parameters according to which resource utilization should be optimized, we will simply suggest different alternatives when appropriate. The Resource Layer The main purpose of the resource layer is to decouple experiments and their resource requirements from the physical entities in the testbed. The main goal is to make experiments less brittle, because they are no longer tied to infrastructure-specific details of the testbed. Moreover, an experiment might not even be tied to a specific testbed. This sort of generalization is very powerful and seeks to separate the responsibility between users and testbed administrators for the benefit of both. First, the underlying testbed can be changed and upgraded without breaking experiments. Second, if multiple testbeds are available, users can with little or no changes rerun experiments on different testbeds. Furthermore, the resource layer allows decisions related to resource arbitration to be moved and captured in a central place. By concentrating the decisions, it becomes more straightforward to enforce policies and manage and optimize the resource utilization. Abstractions The resource layer introduces two main abstractions: resources and resource claims. What classifies as a resource depends on the policies and the infrastructure of the testbeds.
We will use a general definition, which states that resources are units that can be resolved to either a physical entity or a property related to a set of physical entities. Examples include motes, radio channels, sensors and topology. The resources used by an experiment are described via a set of resource claims. These resource claims act as constraints, which must be met in order for the experiment to be executed. The process of checking and resolving resource claims into physical entities and properties is called resource mapping. Claims can express hard constraints, such as ``all or nothing'', or soft constraints that capture more loosely defined mappings, for example ``as many as possible'' and ``5 to 10''. Finally, resource claims can also be relative, giving the ability to express similar or identical mappings necessary for reproducing an experiment: ``same as last time'' or using a recorded seed for mapping resources. To avoid limitations in terms of the type of resources covered by a resource claim, they need to be both extensible and expressive. To achieve this, resource claims are expressed as plain text strings using combinations of key/value pairs. While a small domain-specific language could be used, we will simply use a restricted form of natural language where each line represents a claim, for example: Requirements The resource layer depends upon the service layer's ability to provide updated information that enables optimal mapping of resources. This information includes knowledge about the connectivity of each mote, which can be used to form topologies. In turn, the resource layer provides the control and modeling layers with the ability to map resource claims to physical entities in the testbed. Resource Classification and Mapping Resource mapping belongs to the class of NP-complete problemsCITE. Finding an optimal solution is thus inherently very computationally intensive.
It is therefore crucial to apply methods that can help the search for a 'good enough solution'. Since part of the mapping is related to policies, we provide the design for a general resource mapper. To optimize the resource mapping, a preliminary classification of all resources is first constructed. This way it is possible to quickly reduce the number of parameters that need to be considered for each claim. The classification divides resources into groups, each of which has a unique feature that the actual resource mapping needs to consider. Given a resource classification, satisfiability of resource claims is a matter of checking if each claim belongs to a predefined class. For example, a classification can be ``mote where platform is dig582-2 and has light sensor''. The use of classifications does not solve the complete problem of mapping resources. For example, a high-level constraint such as topology is not easily classified. To address this issue, we reduce topology to the simpler problem of neighborhood: ``who is the neighbor for mote X''. Using this formulation, topology is straightforward to classify. On top of the neighborhood information, claims related to topology can be mapped. Since the physical resources in the testbed can change over time, the classification can become invalid. The resource mapper therefore periodically updates its resource classifications. When this happens, all cached preliminary classification results are flushed and recalculated. The Control Layer The main purpose of the control layer is to facilitate advanced experiments with timed events, such as simulated mote failure and injection of sensor data and messages. The goal is to allow configuration and control of experiments that are able to express scenarios interesting for stimulating the system under test. This involves processing of input to the experiment, e.g.
in the form of resource claims or sensor readings that must be injected, and processing of output from a running experiment. From the output, the control layer should enable results to be collected to evaluate if the test fails or succeeds and for post-mortem analysis. Abstractions The basic abstractions of the control layer are the concepts of jobs and job descriptions. Jobs are a generalization of the different possible types of experiments that are supported by the testbed. We will use a general definition that defines a job as a self-contained unit of work, which is indifferent to whether it is submitted to run interactively or non-interactively. In short, no changes are required for moving a job from an interactive to a non-interactive platform. By this definition, a job can express one or more test cases, a test suite, or even some administrative work, such as testbed health monitoring and topology mapping required by the resource layer. Jobs are defined via job descriptions, which serve as a formal specification of configuration requirements and job control directives. The main form of configuration requirements is resource claims. Job control directives can be given in the form of code or scripts. Requirements The control layer requires that the resource layer maps claims to a list of physical entities encapsulated by the resource abstraction. Alternatively, the control layer can access physical entities directly if this is a requirement. The layer provides functionality to maintain state information related to an experiment and interact with the claimed resources. Programming Model and Resource Access To make the control layer portable and allow it to be integrated into applications and tools, its basic supported programming model is event-driven and non-blocking. On top of this minimal model, threaded and blocking models can be created to provide simpler control code.
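As a rough illustration of this layering, the sketch below shows a minimal event-driven, non-blocking core and a blocking-style helper built on top of it. It is not the interface of the control layer; the names `ControlLoop`, `on`, `post`, `step` and `wait_for`, as well as the event kinds used in the test, are hypothetical and chosen only for this example.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Handler = Callable[[Any], None]


class ControlLoop:
    """Minimal non-blocking core: events are queued and dispatched one by one."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self._queue: Deque[Tuple[str, Any]] = deque()

    def on(self, kind: str, handler: Handler) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def off(self, kind: str, handler: Handler) -> None:
        self._handlers[kind].remove(handler)

    def post(self, kind: str, payload: Any = None) -> None:
        self._queue.append((kind, payload))

    def step(self) -> bool:
        """Dispatch one queued event; return False if nothing was pending."""
        if not self._queue:
            return False
        kind, payload = self._queue.popleft()
        for handler in list(self._handlers.get(kind, [])):
            handler(payload)
        return True


def wait_for(loop: ControlLoop, kind: str, max_steps: int = 1000) -> Any:
    """Blocking-style helper: pump the loop until an event of `kind` is seen.

    Returns the event payload, or None if the queue runs dry first.
    """
    seen: List[Any] = []
    handler = seen.append
    loop.on(kind, handler)
    try:
        for _ in range(max_steps):
            if seen or not loop.step():
                break
    finally:
        loop.off(kind, handler)
    return seen[0] if seen else None
```

The design choice illustrated here is that control code written against `wait_for` reads sequentially, while tools that embed the control layer can keep calling `step` from their own main loop.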
Furthermore, this also makes it possible to create simplified control directives, for example based on an event schedule. To allow the control layer to access resources via both the service layer and the resource layer, it uses adapters, which translate the abstractions used by the respective layers. Dynamic Testing Sometimes ``pi = 3.14'' is (a) infinitely faster than the ``correct'' answer and (b) the difference between the ``correct'' and the ``wrong'' answer is meaningless (Linus Torvalds) Paradigms are the sources of systems. From them come goals, information flows, feedbacks, stocks, flows. (Places to Intervene in a System by Donella H. Meadows) Testing is about decomposing systems and applications into units whose behavior it is possible to reason about. This is challenging for highly distributed systems because behaviors are emergent and phenomena cross-cut the system across multiple motes. In this respect, the problem with normal testing is that it does not scale in terms of tracking events and behaviors in a complex system. As a result, developing and maintaining test suites with sufficient coverage is a very laborious task. To approach this problem, it is interesting to evaluate different methods for dynamically testing systems by extending existing test suites and automatically generating tests. The following sections will explore this topic in terms of detective measures, such as random testing and model-based testing, and corrective measures concerned with isolating software defects. Random Testing For any system of considerable size there is a necessity to deal with irregularities in software. In this respect, randomized testing can be a useful method to find behavioral corner cases for which the developers have not accounted or simply check how the system accounts for unexpected situations. We will explore random testing in the context of random event schedules.
Random Event Schedules The basic concept behind random event schedules is to introduce test drivers, which generate random events or behaviors during a test, and observe how the system responds. To give an idea of how these test drivers can extend existing test cases, assume that a fairly generic test suite already exists. This test suite is initially run unaltered to get a reference of expected output. Next, the test suite is rerun with one or more test drivers, each of which injects some sort of randomized events. The events are created using the controls of the test framework or secondary motes that inject messages. What kind of test drivers can be applied of course depends on the system under test and the test suite. However, we propose several options: [Message replaying] This test driver will randomly pick a message or a sequence of messages and ``replay'' them. The replaying can be done via motes, which are not directly part of the experiment. The main objective of this test driver is to test how applications, which allow routing through multiple paths or use flooding strategies to disseminate information, handle messages arriving multiple times. [Radio noise] Radio communication is not reliable, which means that strategies for handling noisy channels need to be applied. This test driver randomly introduces noise during an experiment with the goal of disrupting radio communication. This can be done either by sending bursts of messages with random bytes or by sending garbage messages. [Topology change] As we have pointed out, networked embedded systems need to be able to self-organize in case of failures. This test driver will randomly pick a mote and exclude it from the experiment by turning it off. Depending on the density of the used topology this can be repeated multiple times to the extent where the test driver can cause a partitioning of the network. The objective is to test resilience to topology changes in the routing layer.
As a variation, motes can randomly be added to the topology, as is the case during the initial deployment of a system. This can be used for testing synchronization protocols. Using Random Testing One of the benefits of using random testing techniques is that they are relatively easy to implement and require little work and instrumentation of the code. Furthermore, it can be applied at different testing levels. As an example, it has been applied to harness testing of interrupt handling in embedded systems, where it was found that interrupt handling code is often a source of defects because assumptions do not account for spontaneous interrupts, for example when triggered by hardware flaws or an external electrical discharge. When performing randomized testing, it is important to save enough information to allow the tested scenario to be replayed and reproduced, if it should later be necessary. This can be done, for example, by recording all events that happen during a test, or by using seeds for generating event schedules. The latter requires that the test driver's event generator uses a deterministic algorithm. One problem with using random testing is to account for the case that a test no longer only fails or succeeds, but gives an ``unresolved'' result. Using the above examples of test drivers, a test of a system may be unresolved if a mote failure results in a partitioned network or shuts down a mote that has a special role in the experiment, such as being a gateway. While random testing can effectively elicit defects that are otherwise hard to find, it potentially lacks in terms of efficiency, because searching for defects by randomly altering test cases requires a lot of test executions. Some improvement can be introduced by using heuristics to make more directed random testing, for example by defining rules that can be used to limit which event schedules are considered.
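The role of seeds and of the ``unresolved'' outcome can be sketched as follows. This is a minimal illustration under stated assumptions, not the test framework's implementation: the `Event` tuple, the driver names in `DRIVERS`, and the functions `generate_schedule` and `classify` are hypothetical, and the only rule used for an unresolved result is that the gateway mote was turned off by a topology change.

```python
import random
from typing import List, NamedTuple, Sequence


class Event(NamedTuple):
    time: float    # seconds into the experiment
    driver: str    # which test driver produces the event
    target: str    # mote the event is applied to


# Hypothetical names for the three test drivers proposed in the text.
DRIVERS = ("message_replay", "radio_noise", "topology_change")


def generate_schedule(seed: int, motes: Sequence[str],
                      n_events: int, duration: float) -> List[Event]:
    """Derive an event schedule deterministically from a recorded seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    events = [
        Event(round(rng.uniform(0.0, duration), 3),
              rng.choice(DRIVERS),
              rng.choice(motes))
        for _ in range(n_events)
    ]
    return sorted(events)  # order by time


def classify(test_passed: bool, schedule: Sequence[Event], gateway: str) -> str:
    """Map a raw test result to 'pass', 'fail' or 'unresolved'."""
    gateway_lost = any(
        e.driver == "topology_change" and e.target == gateway
        for e in schedule
    )
    if gateway_lost:
        return "unresolved"  # the scenario invalidated the test's assumptions
    return "pass" if test_passed else "fail"
```

Storing the seed together with the schedule parameters is enough to regenerate the same schedule with the same interpreter; since the sequences produced by helper methods such as `choice` may differ between Python versions, recording the generated events as well is the safer option for long-term reproduction.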
However, as we shall see, other and more efficient techniques exist for dynamically extending tests. Model-Based Testing A more directed approach to testing is to use model-based testing techniques for generating or extending test suites. In some sense, all testing is necessarily model-based in that any test case is based on the tester's mental model of the system. Furthermore, we believe that modeling is an integral part of developing networked embedded systems, and something that a developer is likely to do anyway as part of the specification or design process, be it explicit or implicit. From this perspective, model-based testing is simply a method to make these activities more formal. The general idea is to use detailed models, which accurately describe the intended behavior of the system under test, to create an execution trace complete with input and expected output. From these traces, test cases are then automatically generated using selection criteria. The result is derived by running the tests and comparing with the expected output. An overview of the whole process is illustrated in Figure . In the following we will explore the two key elements behind the effectiveness of model-based testing, namely the model of the system and the test generation algorithm. Models and Abstraction Levels The model may be extracted from software artifacts or explicitly defined using a modeling language, such as UML, from a specification. However, the modeling language must address different issues than the language used to program the systemCITE. For this reason, it may be desirable to use a language that is more data-driven than languages such as C. An important requirement is that the model is formal and rich in detail in order to allow a great amount of automation.
A general problem in terms of modeling language is to achieve a good abstraction level, which allows it to focus on system details, such as data, functionality, communication, timing and security, but also allows it to accommodate the characteristics of the software artifact's domain or platform. These trade-offs mean that each model is limited in what it can and cannot describe and may lack certain information about the test environment and the system under test. The level of abstraction has several costs. First, details that are not encoded in the model cannot be tested on the grounds of this model. Second, the encoding used for input and expected output derived from the model may be different from the results of real executions. This means that it may be necessary to perform a translation between the two encodings, a process called concretization. Finally, for complex systems the model itself can become quite complex and require model checking to ensure the accuracy of the model. Generation Algorithm and Selection Criteria The main benefit of model-based testing is the automatic creation of test suites. This makes the choice of test generation algorithm very important. First, different test levels require different test case generation strategies. As an example, model-based unit testing might have to only consider partial models and focus on certain input parameters, while model-based testing of whole systems can focus on operational profiles or more functional specifications. Second, the test generator uses the model to automatically find valid paths through the model. In this sense, test case generation is a search problem, where different approaches can be applied, such as heuristic search or symbolic execution. A challenge here is to deal with the possible problem of state explosion. This underlines the need to be able to identify the paths that are of most interest or importance.
In other words, a very important part of using model-based testing is the selection criteria, which, similarly to test case specifications, help in deriving test cases that exhibit a desired set of characteristics. One possibility is to use concrete criteria, which define specific states of the model that are considered interesting. This can be extended using structural selection, such as coverage criteria based on control and data flow analysis of the application, to ensure that many parts of the system are tested. For model-based testing to be effective, the generation algorithm also has to consider generating test cases for unintended behaviors. For this it is necessary to increase the scope of the selection criteria to analyze and identify additional test cases for each transition condition. This can be achieved by employing more random metrics, such as stochastic selection, where probabilistic distribution criteria can ensure that a wide range of different inputs, containing both valid and invalid values, is tested. Finally, test adequacy criteria can be used to measure the quality of a test suite as a stopping criterion for the test generation process. From Models to Test Cases In Practice To give a concrete example of how model-based testing can be applied, we will limit the scope and focus on presenting a state-based approach. As the specification for the system we will use the state graph in Figure , which describes a simple application that uses a sense-and-send strategy to disseminate data. The goal is to be able to generate tests, which at runtime can reveal state transition faults, corrupt states and sneak paths. The first task is to describe the system under test in terms of state transition diagrams. This can be hard due to the concurrency of the application, where routing can be done while sensing is in progress. One simplification here could be to only consider input and output transmitted via the radio.
Consequently, each transition defines a state change based on radio communication. Next, each state transition is annotated with predicates and constraints, which evaluate the context in the model. The predicates are boolean expressions that must evaluate to true for the transition to be valid. Constraint fields allow the number of paths produced to be limited during test generation. During the transformation, conditions guarding transitions are evaluated. A third annotation in the form of procedural code may also be added, which can be used to inspect the context during the test execution. With a complete model, we are now ready to generate the test cases. For the purpose of this example we will use transition tree-based generation, with which the state model is transformed into a transition tree as illustrated in Figure . In the tree, each path from the root to a leaf node is a test requirement for testing the behavior and each leaf node will be a test case template. A basic naïve algorithm, which does not use any advanced selection criteria, is: (1) The initial tree is the initial state of the state model. (2) Add a node for each valid transition. (3) If the node is a final state or already occurs in the tree, the node is marked terminal. This eliminates duplicate transitions. (4) Continue until all leaf nodes are marked terminal. When the algorithm terminates, each test case is derived by concatenating all the test information fields for the path that is transitioned. Using Model-Based Testing Model-based testing has been the subject of multiple studies and the findings suggest that the technique can be very efficient, saving up to 80% of the cost of testing, and that it is most beneficial when applied to whole-system testing. Furthermore, model-based testing makes test suites less brittle, which is especially effective for systems that change frequently, because tests can be rapidly regenerated after the model has been updated.
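To illustrate how cheaply tests can be regenerated, the naïve transition tree algorithm from the practical example can be sketched in a few lines. The sketch makes several assumptions: the model is a plain dictionary from state to outgoing `(event, next_state)` pairs, predicates and constraint fields are left out, and the three-state `SENSE_AND_SEND` model is a hypothetical stand-in, since the state graph from the figure is not reproduced here.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

Transition = Tuple[str, str, str]            # (source, event, target)
Model = Dict[str, List[Tuple[str, str]]]     # state -> [(event, next_state)]


def transition_tree_paths(model: Model, initial: str,
                          finals: FrozenSet[str] = frozenset()) -> List[List[Transition]]:
    """Expand the state model breadth-first into a transition tree and
    return every root-to-leaf path as a list of transitions."""
    seen = {initial}                         # states already in the tree
    queue = deque([(initial, [])])           # non-terminal nodes to expand
    paths: List[List[Transition]] = []
    while queue:
        state, path = queue.popleft()
        outgoing = model.get(state, [])
        if not outgoing and path:
            paths.append(path)               # dead end: treat as a leaf
        for event, target in outgoing:
            new_path = path + [(state, event, target)]
            if target in finals or target in seen:
                paths.append(new_path)       # terminal node: stop expanding
            else:
                seen.add(target)
                queue.append((target, new_path))
    return paths


# Hypothetical model of a small sense-and-send application.
SENSE_AND_SEND: Model = {
    "IDLE": [("timer", "SENSING"), ("recv", "SENDING")],
    "SENSING": [("done", "SENDING")],
    "SENDING": [("ack", "IDLE"), ("timeout", "SENDING")],
}

if __name__ == "__main__":
    for path in transition_tree_paths(SENSE_AND_SEND, "IDLE"):
        print(" -> ".join(event for _, event, _ in path))
```

For the hypothetical model, the three resulting paths together exercise every transition at least once. Changing the model and rerunning the function is all that is needed to obtain an updated set of test case templates.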
The greatest benefits of using model-based techniques lie in automating a very time consuming task and increasing the test coverage. It should be noted that while there is potential for automatically generated tests to cover more possible inputs with fewer test casesCITE, the automation provided by model-based testing does not necessarily lead to greater coverageCITE. For example, it is not necessarily the case that the coverage increases simply because significantly more tests can be run. Developing good selection criteria plays an important role in this respect if model-based testing is to perform better than purely randomly generated tests. While the technique offers a lot of automation, the initial step of creating the model still requires a substantial effort. This can be difficult if the specification makes it hard to determine the constraints of the system. The effort also requires skill and knowledge of the modeling language, development process, and tool chain, which can hinder its acceptance and usability. For a given system under test, both the models of the system and its environment are needed for analysis. Consequently, in the use of models there is also a hidden assumption that the test environment can somehow be modeled as well. The extreme dynamics of networked embedded systems might be an obstacle in this respect. It may therefore be more appropriate to evaluate the models at runtime and account for these dynamics. This, in turn, may limit the expressiveness of the models that can be employed, in order for the evaluation and processing of models in real-time to be scalable. As a final remark, there remain many open questions that need further research in order to answer whether or not model-based testing is applicable for testing large networked embedded systems. For example, how much of the process can be automated and what code artifacts should be considered during test generation?
It is also unclear how systems are best modeled in terms of detail and expressiveness. Model-based testing, however, has many promising properties and can be applied in practice, even if various simplifications and assumptions are necessary.

Isolating Software Defects

While the above sections have explored detective measures that use dynamic testing to find failures, the task of diagnosing the failure using corrective measures remains. Based on results from tracing and other methods of instrumenting the system, the error causing the failure needs to be isolated and localized. This can be a very time-consuming task, and it is therefore interesting to explore whether dynamic testing techniques can assist.

Finding Defect Causes

Once we have established that the software has a defect, we can begin to look for its cause. To find a defect, we need to reason backwards, starting with the failure, a task which is very resource intensive. As already mentioned in Section , this task is made harder by the fact that there may not exist a clear link between the defect that causes an infection of the system state and the resulting failure. In the most abstract sense, it is a search in time and space, with time being the execution time at which the infection takes place, and space being the state that is affected.

Part of the task of backtracking is to first establish a test case which allows us to reliably reproduce the failure. This can be difficult for non-deterministic programs and long-running programs, where the system state depends on many different events, since it may require control over all possible input sources. With a starting point for triggering the defect, the next step is to further narrow down the test case to exclude unrelated events. In other words, the test case should be simplified so that the defect becomes clearer and easier to comprehend and communicate.
With good programming style, part of the problem of finding the defect becomes more straightforward, since well-divided and nicely compartmentalized modules are easier to reason about. Finally, we want to be able to automate the test case so that we can reproduce the defect more easily and reliably.

The process of identifying and tracing faults clearly shows that there is a lot of work involved in finding defect causes. We are therefore interested in exploring how test frameworks can help to reproduce failures more reliably and make the search for a minimal test case easier.

Isolating Failure-inducing Event Schedules

The main problem in localizing the origin of a failure is that in networked embedded systems it is necessary to examine the states across multiple motes. For long-running applications, the number of states can grow very large and become unmanageable to reason about. Furthermore, reproducing failures for such long-running experiments is not scalable. Consequently, to help identify possible failure causes, a method for simplifying the failure scenarios to the smallest possible sequence of states and events is needed.

One method for achieving this is automatic delta debugging, an algorithm for systematically producing a minimal set of failure-inducing circumstances. The method can be applied to a broad range of circumstances, such as program input, changes to the program code, or program executions; in the following, however, we will consider using it for the isolation of failure-inducing event schedules. In other words, given a sequence of events, the goal is to isolate the minimal set of events that are critical for producing the failure. The basic idea is to systematically narrow down the difference between a passing and a failing run by gradually simplifying the event schedule.
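The narrowing process can be sketched in Python along the lines of the simplifying (ddmin-style) variant of delta debugging. This is an illustration under stated assumptions, not the implementation used in this work: the names `ddmin` and `split`, the three outcome constants, and the `test` callback (assumed to replay a candidate schedule and report the result) are hypothetical. Unresolved outcomes are simply treated as "not failing", so inconsistent schedules are never kept.

```python
PASS, FAIL, UNRESOLVED = "pass", "fail", "unresolved"


def split(events, n):
    """Split events into n contiguous chunks of nearly equal size."""
    size, extra = divmod(len(events), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(events[start:end])
        start = end
    return chunks


def ddmin(events, test):
    """Reduce a failing event schedule to a smaller one that still fails."""
    if test(events) != FAIL:
        raise ValueError("the initial schedule must reproduce the failure")
    n = 2  # current granularity: number of chunks
    while len(events) >= 2:
        chunks = split(events, n)
        complements = [
            [e for j, chunk in enumerate(chunks) if j != i for e in chunk]
            for i in range(n)
        ]
        # Try each chunk on its own first, then each schedule with one
        # chunk sliced away; restart coarser after every success.
        candidates = [(c, 2) for c in chunks]
        candidates += [(c, max(n - 1, 2)) for c in complements]
        reduced = False
        for candidate, next_n in candidates:
            if test(candidate) == FAIL:
                events, n, reduced = candidate, next_n, True
                break
        if not reduced:
            if n >= len(events):
                break  # finest granularity reached, nothing more to remove
            n = min(len(events), 2 * n)  # refine the slicing granularity
    return events
```

The `test` callback is where a test framework would plug in the replay of a schedule on the testbed or in a simulator. Because every candidate requires a full test run, caching outcomes for schedules that have already been tried would be a natural refinement.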
An example of the process is illustrated in Figure , where the initial test runs start by slicing away large subsequences of events, and later runs refine the simplification process to use a finer slicing granularity. At some point, the algorithm will reach a point where the simplification process no longer yields a smaller set. As suggested, the efficiency of the whole process is controlled by the narrowing strategy that is applied in each step. Apart from the approach illustrated in Figure of removing events from failing runs, another strategy is to start with the empty sequence and add events until the sequence no longer passes. Finally, the two approaches can be combined so that the search for the minimal set adds or removes events depending on the result, whereby it can adapt the search strategy to quickly focus on simplifying specific subsequences.

While this rather naïve search approach can be quite effective in finding minimal sets, it has problems similar to those of random testing. The simplification process may produce circumstances that are inconsistent, where the test result is neither pass nor fail, but unresolved. One way to deal with unresolved results is to include information about properties of the events, such as the relation between events, e.g. event A always follows event B. This information could be extracted from the state or behavioral models described above. Apart from reducing the problem of inconsistent sequences, it can also help to make the search for a minimal set more efficient.

Isolating Failure-inducing Code

While finding a minimal test case is a big step forward, the defect still has to be located in the source code. This problem is somewhat similar to the above, in that given the program and the minimal test case we want to isolate which part of the source code is causing the defect.
However, while the above approach of slicing and adding works well for simplifying causality chains of events, it is less applicable to source code. In short, it will simply result in considering too many inconsistent programs. A better approach is to use information about the history of changes applied to the software being tested. This information can, for example, be extracted if the project uses a version control system to keep track of the software and the changes made to it.

As stated, the delta debugging algorithm can also be used for this; we will therefore look at another but similar approach called automated bisecting, offered by the git version control system. The tool works by doing a binary search of the history of the source code, based on initial knowledge of the last known good version and of a version that contains the defect (usually the current version). For each version being considered, a script is executed, which can compile the source code and run the test case using the resulting program. Based on the exit code of the script, the version being tested is either skipped (e.g. if the test result was inconsistent) or marked ``good'' or ``bad'', after which the bisecting continues. The process ends when the failure-inducing version has been found.

Because the method relies on history, the approach is limited to finding failure-inducing code with respect to regressions, where a previous version worked as expected and a later version broke. Furthermore, the quality of the result is limited by the size of the changes between each version. In other words, for this method to be effective the code needs to be bisectable, meaning each change must be small and well-defined.

Using Software Defect Localization

A prerequisite for using delta debugging and history-based bisection is first of all the ability to reliably reproduce the failure. Furthermore, both methods require a lot of automation of the test process.
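To illustrate the kind of automation that history-based bisection relies on, the following Python sketch shows a script that could be handed to `git bisect run`. The exit codes follow git's documented convention (0 marks the version good, 125 skips it, and any other code from 1 to 127 marks it bad). The function names, the choice to skip versions that fail to build, and the example build and test commands are assumptions made for illustration.

```python
import shlex
import subprocess
import sys

# Exit codes interpreted by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def run_command(command):
    """Run a command line and return its exit status."""
    return subprocess.run(shlex.split(command)).returncode


def bisect_step(build_cmd, test_cmd, run=run_command):
    """Classify the currently checked-out version."""
    if run(build_cmd) != 0:
        return SKIP  # assumption: a version that cannot be built is untestable
    if run(test_cmd) == 0:
        return GOOD  # the test case passes on this version
    return BAD       # the test case reproduces the failure


# Only act as a script when both commands are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(bisect_step(sys.argv[1], sys.argv[2]))
```

Assuming the script is saved as `bisect_step.py` (a hypothetical name), a session could start with `git bisect start`, `git bisect bad`, and `git bisect good <last-known-good-revision>`, followed by `git bisect run python3 bisect_step.py "make" "./run_test.sh"`, where the build and test commands are placeholders for whatever the project uses.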
In terms of resource utilization, they both rely on running a potentially large number of tests; however, the better the knowledge about the structure of the circumstances, be it events or source code, the fewer tests will be required. To exemplify, if the defect is known to reside in the source code of a specific file, only changes involving this file need to be considered.

While some of the most time-consuming parts of localizing defects are automated, the methods still leave manual work for developers. Often, a failure is not the result of a single cause, but rather the outcome of a causality chain of effects and causes. Thus, given an initial and simplified event schedule derived from the above method, the programmer still needs to reason about the remainder of the causality chain in order to identify the defect.

The Case for Dynamic Testing

We believe there is great potential in using dynamic testing techniques to help developers. The dynamic testing techniques that have been presented are very different and can be applied to different parts of the development process. If we accept that different test suites find different errors, it must necessarily follow that a combination of test suites is preferable. This underlines one of the strengths of dynamic testing techniques: they are able to reuse and extend tests. While some of them require a substantial investment of time and development effort, they also promise to provide a richer and more scalable framework for testing networked embedded systems. Finally, several techniques, such as the use of randomized event schedules, can be applied with little effort, which makes them a good starting point for increasing the coverage of test suites.

We conclude this chapter by providing two additional recommendations to those given in Section .

The test framework should provide abstractions that allow automated test generation. The success of any test suite depends on its coverage. A good test suite might ensure that no crucial failure will go unnoticed in a very critical part of the code, but if the coverage is incomplete, the infectious behavior of defects can spread from parts with less coverage and threaten the correctness of the critical and well-tested components. Because achieving coverage is difficult and time-consuming, it is worth investigating ways to automate parts of the test creation process.

The test framework should support techniques for simplifying test cases in terms of events. A natural extension to automated testing is the ability to automatically isolate failure-inducing events in test cases.

Testing Networked Embedded Systems

In this chapter, we will explore testing of networked embedded systems in terms of several major topics. First, the key challenges related to testing of embedded and networked systems are presented using a problem solving approach. This is followed by a look at some of the work flows specifically involved in the use of testbeds. We will then define a fault model, which recognizes the typical sources of faults in networked embedded systems. Finally, we summarize the findings by providing guidelines and recommendations that may be used in the design of a test framework.

Key Challenges

Embedded systems are characterized by their limited hardware capabilities. These limitations make it difficult or even impossible to test certain features of the system. It is therefore necessary to clearly identify these challenges in order to design a robust test framework for embedded systems.

Non-deterministic Hardware

One of the biggest challenges is dealing with non-deterministic hardware. The platforms used in networked embedded systems are cheap, which leads to behaviors that are unpredictable. In other words, it is necessary to consider how to deal with the irregularities caused by non-deterministic hardware to prevent them from disrupting experiments.

If we look at the causes, some may be due to the hardware being broken.
Hardware quirks can also cause problems that may be perceived as if they were caused by non-deterministic hardware. An example from the DIKU Testbed is the mote radio's inability to send messages of odd length. Other causes can be attributed to interference from external sources. Examples include sensors that are sensitive to static electricity.

It is very important to reduce the impact of non-deterministic hardware, as it can be very disruptive from the user's perspective. More importantly, it can seriously affect the ability to reliably reproduce experiments, whereby a fundamental property of the test framework is violated. To avoid these problems, the health of the hardware needs to be monitored continuously. The monitoring can involve tasks such as running small applications that perform an on-board self-test on each mote. In case of problems, the affected motes must be excluded from any future experiments.

Testable Units

The dependency on cross-component optimization and integration makes it challenging to find well-defined boundaries between the components making up the system. The main problem is that the resource constraints may not always permit programs to be structured into modular and testable components. As a result, it may be necessary to test bigger units or to create a lot of extra stubs to replace core components, with the result of completely changing the system behavior. This is to some extent less of a problem when using TinyOS and nesC, which allow replaceable components to be created. However, while components can easily be replaced, whereby compilation of debug code can effectively be disabled, certain components cross-cut the component graph and are thus difficult to replace.

In order for testing to help locate defects, tests need to be able to target specific parts or units of the system.
While existing techniques address this problem through the use of stubs, we still see potential in more thoroughly investigating what kind of structure and granularity is desirable for the specific case of networked embedded systems.

Observing the System Under Test

Another big challenge when testing embedded systems is the limited ability to observe the state of the system under test. Tracking the state of a running system is often only possible via debuggers, and only through external devices, such as in-circuit emulators (ICEs). These devices can be very expensive, raising the cost considerably if a complete system of multiple motes needs to be equipped. Furthermore, ICEs can be slow and unpredictable due to the way they emulate certain parts of the underlying hardware.

Tracing using probes is a more straightforward and proven method of observing embedded systems. The probes can either be software probes, which increment a counter or record information in a buffer, or an oscilloscope or similar measuring tool attached to a part of the electrical circuit. Using the latter method, the test code can toggle the level of, for example, an unused pin on the microprocessor to signal a change in the state.

While tracing offers one of the most rigorous methods of tracking the state of an embedded system, it can potentially generate a lot of test output. In some circumstances, such as long-lived experiments, the amount of data may exceed what can be processed or stored on-board during execution, and the data has to be transferred to an external system via the mote's serial console for post-mortem analysis. Limitations in the throughput and reliability of the serial console may require that data is compressed and sent with checksums.

Detecting Failures

A similar challenge lies in using the observations of the system under test to determine if the system behaves correctly.
This leads us to the question: how do we detect failures? Or, more interestingly, how do we detect correct behavior? To answer these questions we must first understand how failures occur.

As already mentioned, system engineering is a highly iterative process. Consequently, decisions and choices are made before the whole system is clearly understood, and they evolve over the course of a project. These decisions and choices end up in the resulting system as assumptions and invariants, some of which are faulty from conception or will become faulty because external assumptions change. Each such assumption is a potential defect; the defect may or may not cause a failure, depending on whether or not it is stimulated. If a defect is stimulated under the right circumstances, it can cause an infected state in the system. Once an infected state is present and is not corrected or otherwise masked by subsequent events, it can propagate and spread to other parts of the system. In a distributed system, this may lead to the infection spreading across the boundary of a single system. At some point the infection will have become so severe that it causes a failure, whereby the defect becomes visible to the surroundings.

From the above, it should be clear that there is not always a clear link between the source of a fault and how the fault manifests itself. A potential failure may stay hidden for a long time before it can finally be detected. In systems with a very low duty cycle, it may take days to get the system into a state where the failure shows itself. Furthermore, how a failure presents itself in the output of the experiment might not always be clear. In this sense, for certain failures, you have to know what you are looking for in order to be able to detect them. To give an example, a deadlock in an application using a watchdog timer might only be visible by the fact that motes restart periodically.
The most straightforward approach is to detect failures by looking for anomalies. This can be done by comparing the traces and output from the experiment with the output of an expected behavior. The approach can be further extended by creating an abstract representation of the expected output, either by looking at multiple runs or by explicitly creating such an abstraction. It is still unclear how such a representation is best described and whether it is feasible to automatically derive it from test output.

It remains, however, that effective testing depends on the ability to detect failures as early and as reliably as possible. To accomplish this, good test evaluation techniques need to be developed.

Instrumentation and Use of Artifacts

Certain properties of the system under test are hard to infer only from information gathered via methods such as tracing. It may therefore be necessary to instrument the system with software components that can collect the required information. There are several areas where this can be useful during the development phase.

The biggest potential is in terms of gathering runtime artifacts. Here instrumentation can be used to get different types of statistics and detailed information about the system under test. This can give valuable information about the performance of the system, e.g. duty-cycling. It can also play an important role in increasing test efficiency by assisting developers in determining the adequacy and coverage of test suites.

We have already mentioned the use of static analysis, which capitalizes on compile-time artifacts in the form of call graphs to infer the control flow and data flow of the program, whereby potential software defects can be identified. It is interesting to explore if the use of compile-time artifacts can be combined with code generators to automatically create the components for instrumentation.
This can reduce the amount of manual work involved in gathering runtime artifacts with a satisfying level of detail.

It is important to use instrumentation carefully. The main reason is that timing is crucial for embedded systems, and applications are event-driven and may have soft real-time requirements. If the method is not used wisely, instrumentation can potentially have an effect on the behavior of the system being observed. This can lead to surprising differences in behavior when the instrumentation is disabled, affecting the ability to reliably reproduce experiments.

While there are problems with using instrumentation of embedded systems, there are still valid uses to explore. For example, it can enable fine-grained control of experiments, because on-mote components have access to the input and output of the system under test. One possible use is for injecting sensor data into an experiment using a predefined model of a phenomenon, which can make experiments both more realistic and reproducible.

Tracking Communication

The ability to reason about networked embedded systems depends highly on the potential to track the communication. In terms of networked embedded systems, the research community initially sought to start afresh by not basing their work on existing network abstractions. The main argument was that the existing networking approaches were not applicable to these new constrained systems and that it was necessary to open up the research space to new ideas. This has led to a lot of exploration of how to compose a good network stack for networked embedded systems, but has also resulted in no consolidation. In other words, the communication in terms of protocols and messages is highly application specific and thus not easy to handle generically in a test framework.

The first challenge is to decompose communication by parsing the fields in the messages used by the system under test.
The next challenge is the ability to interpret the meaning of each message, such as whether it is for control or data. By accomplishing this, the test framework will be able to move beyond passive listening and inject messages into the system under test to stimulate it.

One possibility, employed in MoteLab to solve the first challenge, is to explicitly associate message parsers with a test. Hereby, messages can be decomposed by the test framework and inserted, field by field, into a database for later retrieval. Furthermore, nesC has support for generating such parsers from annotated source code. With the advance of more consistent use of network standards and protocols in networked embedded systems, such as 6LoWPAN and IEEE 802.15.4, some of these challenges will no longer need to be considered.

The communication strategies in networked embedded systems also pose a challenge. As mentioned, timeliness is of great importance for the overall usability of the test framework. The conservative energy budget and the resulting limitations on radio usage mean that strategies such as delay-tolerant networking are used, which results in experiments requiring a long time to run and taking up valuable resources.

A solution is to investigate whether it is possible to speed up tests without affecting the validity of the results. Imagine a coordination protocol where motes wake up every 30 seconds and synchronize with a master. Would it be possible to reduce this wake-up period to only 15 seconds without breaking the general assumptions of the system as a whole? Similarly, in a scenario where events are being replayed, could the running time of this test preamble be reduced by speeding up the replaying of events or even leaving out certain unrelated events?

Application and Platform Diversity

As the research field has progressed, much of the technology related to system architecture and the networking layer has matured [CITE: interoperability, standards, IPv6].
This has shifted focus towards some of the more inter-disciplinary scientific challenges. Herein lies a great opportunity but also a great challenge for testbeds. On the one hand, there is a potential to accommodate the requirements of realistic applications; on the other hand, this comes at the cost of reducing the diversity of applications that can be supported. In other words, there is a tradeoff when choosing which mote platforms and sensor boards to deploy. This is one of the reasons why several different kinds of testbeds targeting different applications are needed; some mobile and customizable to specific application needs, others larger and more permanent for experimenting with sensor networking primitives at a significant scale.

Another fundamental problem is platform diversity. Many different platforms exist, each of which has specific interesting properties that make it suitable for a specific application or research area. For example, the testbed must offer the newer platforms to facilitate research in emerging technologies. Furthermore, for each platform different alternatives may also exist in terms of what sensors they provide. This means that it is necessary to account for both different platforms and platform-specific properties in the way experiments are configured and executed.

Given the multitude of platforms and sensor configurations, it is unlikely that any single testbed will be able to meet every demand for a specific platform or a specific sensor. Dealing with these issues makes testbeds very expensive and difficult to maintain. An interesting solution, which has been explored in the context of PlanetLab [CITE], is federation of testbeds. While it raises questions about how resources are exchanged, there is much to be gained in terms of greater cooperation and sharing of experiences among testbeds.
Topologies

In order to support a wide spectrum of scenarios, the test framework must facilitate the configuration of many different topologies. For example, an experiment might need to test against a specific category of topology, such as a star topology, using different levels of density ranging from a very sparse network to a more strongly connected network.

One of the challenges related to topology and configuration is that writing a test that configures a specific topology is difficult and, more importantly, becomes repetitive and tedious when the application needs to be tested against different topologies. Furthermore, manually configuring a topology requires knowledge of the testbed infrastructure, which, due to its potentially dynamic nature, can cause experiments to become brittle.

The simplest solution is to use on-mote components to configure the topology by simply filtering mote messages. However, this may require recompilation of the mote program. Another solution is to configure the topology by picking motes to include and motes to exclude, given that the testbed is big enough. However, for densely deployed testbeds this is not a feasible strategy, because it will result in a very small topology area.

Another concern related to topology is the utilization of testbed resources. Certain experiments might require a topology area covering the whole testbed, but without the need for using all motes in the testbed. For this case it is worth exploring if utilization can be increased by separating the communication of experiments onto different radio channels. While this type of ``multi-tasking'' is desirable, it requires the cooperation of experiments, since confinement to a specific radio channel is hard to enforce.

Dynamic Environments and Infrastructure

Similarly to non-deterministic hardware, there is also a big challenge in making environmental effects have as little impact as possible on the system under test.
For example, noise from surrounding wireless networks can interfere with an experiment and affect it in multiple ways. While this may have positive implications in that it can uncover new defects in the system under test, it will nonetheless make it impossible to later replay the original scenario, reproduce the defect, and thus verify a possible fix. Consequently, it may be necessary to monitor not only the experiment itself but also the environment, in order to later be able to reason about a test run. It may also be appropriate to make it possible to adjust test cases at runtime to accommodate environmental interference.

In terms of testing, timing is challenging, since multiple nodes in the system must first be brought to a desirable state during the test preamble. Then a series of events needs to be triggered reliably across the system to force the behavior or state that must be tested. During the test, information must be gathered to keep track of the multiple states. This poses several challenges in terms of the robustness, precision, and scalability of the infrastructure used for testing.

Simulation provides a solution for these and many of the other challenges mentioned above. Among the advantages is the ability to test very large networks, potentially limited only by the available computing power and the accuracy of the underlying models used in the simulation. It also has the benefit of giving more control over the experiments, for example by trading accuracy for efficiency and timeliness or by testing against different radio propagation models. Finally, simulation also gives the unique opportunity of experimenting with sensors for which hardware does not yet exist.

While simulation can be used to evaluate algorithms and understand the causes of behavior observed in the real world, it should not be used for absolute evaluation without also considering whether the scenario being modeled has a basis in the real world.
Furthermore, compared to simulation, only a testbed using real hardware can provide a solid and thorough understanding of the real-world technical challenges of networked embedded systems, such as resource limitations, communication loss, and energy constraints. A compromise is to use a hybrid solution in which experiments run on both physical and simulated motes [Kensei], whereby experiments can be scaled appropriately, while still providing realistic output from the physical motes. This depends on the ability to exchange parameters between the physical and the simulated world.

It remains that dynamic and potentially non-deterministic environments are a source of problems when it comes to ensuring the reliable execution of experiments.

Work Flows

With the challenges in mind, we will next look at some different work flows associated with the use of testbeds.

Work Flow for Experiments

The main goal of testbeds is to overcome many of the time-consuming tasks associated with executing experiments on real motes by providing a ready-to-use and automated facility. The main task that helps to accelerate the execution of experiments is the simplified data collection. The most important eliminated tasks are manual reprogramming of motes, deployment of motes, and collection of motes after the experiment. This leaves the following basic work flow for experiments:

1. Create an experiment by selecting motes and assigning binaries to run on them.
2. Schedule the experiment to run.
3. Collect data from the testbed after the experiment has run and analyze it.

There are many benefits to this model. First of all, it makes experiments more scalable, since all the work of running an experiment goes into the creation and configuration of the experiment. Another advantage is that once an experiment has been created, it can be rerun multiple times, whereby the initial overhead of creating and configuring it is amortized.
Automation can also help to increase the general utilization of the testbed, because the resource usage of experiments can be estimated in advance, which allows more effective scheduling and even the possibility of running multiple experiments in parallel.

Interactive Usage

One of the problems with batch mode experiments is that they require many decisions to be taken upfront. This makes the work flow less ideal during the initial stages of a development process, where there is more demand for experimenting and the development cycle of programming and running the program is quicker. While some testbeds allow semi-automated experiments where users can control a running experiment, there is still a need for supporting interactive work flows, because they allow users to incrementally configure and derive the desired scenario. Interactive use is also preferable when learning an API, such as when students need to familiarize themselves with a radio stack at the start of a course.

The basic work flow for interactive usage is very similar to the one above, except for a few extra steps. However, some steps can be skipped for subsequent experiments during the same session.

1. Select and take control over motes.
2. Program motes by assigning a binary to them.
3. Instrument data collection.
4. Run the experiment by starting and stopping motes in the desired order.
5. Collect and analyze data.

One problem with interactive usage is that it can lead to usage patterns that are not optimal. Experience from the DIKU Testbed shows that users keep themselves logged in even for small tests, potentially depleting mote resources. There is of course also a challenge in offering both interactive and batch mode experiments, in that the sharing of testbed resources is not as well-defined. One possible solution for offering mixed work flows is to require resource reservations for all experiments.
Use Cases

To further guide the design choices, it is useful to also establish some general use cases for testbeds. We will focus on two use cases in the following: one where a researcher is experimenting with finding an efficient routing method and another where a student is working on an assignment for a course in networked embedded systems.

The Student

For the student, it is the first experience in using a testbed. It requires some time to figure out how to get the first experiment working, but after this initial obstacle the student spends some more time becoming familiar with the testbed by gradually extending the experiment. After this first experience, the student has a good idea of how to integrate the use of the testbed into the work on the assignment and has, furthermore, developed a fairly advanced experiment, which can serve as a good basis for future work.

The Researcher

The researcher has a lot of experience with using the testbed and also knows some of the problems and limitations from running previous experiments on the testbed. Since most of the routing components have been developed, the project is entering a phase where the experiments have a larger scale. However, the researcher is still interested in running smaller experiments when tuning specific parts of the behavior, because the large-scale experiments take longer to run and are usually run automatically during the night.

The chosen use cases show that the test framework must be versatile enough to support both types of users. This clearly underlines that there is a large gap between getting new users started and supporting advanced users' needs.

Supporting Multiple Work Flows

Rather than trying to bridge the gap between the requirements of different work flows, we believe that the test framework should instead strive towards exposing different layers of functionality, which enable users to choose the level of control they want.
We see several stages of an experiment that could be improved by such functionality. First and foremost is the ability to control the motes themselves when creating experiments. While different work flows again have different requirements, even simple experiments need some sort of control, for example to program and start a mote. Second, the ability to store data collected during an experiment in a database can help to make the data more accessible and uniform. Finally, there is post-mortem analysis of experiments, for example when testing for regressions or trying to isolate a defect.

Furthermore, we believe it is unlikely that one single tool will suit all users, and instead propose that the test framework is made extensible. This also reflects our belief that the success of the test framework very much depends on its ability to be integrated into the tool chain developers are already using and are familiar with. It will also enable the testbed to be made available in many different ways, such as via both web and desktop applications. Finally, it will allow future work to rethink how users can interact with the testbed.

Running Non-interactive Experiments

Certain work flows require that non-interactive experiments can be run on a testbed server, because this allows better utilization of testbed resources and can help users to better manage long-running experiments. However, it also introduces many new challenges. Most important are the many different security considerations that must be addressed. Because experiments can potentially execute arbitrary control code, they must run in a sandbox. Similarly, several experiments may run simultaneously, which can degrade access to server resources. To accommodate this, the system should employ protective measures and actively monitor the running experiments, so that it can intervene if their resource usage is deemed harmful to the rest of the system.
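The monitoring part of this requirement can be sketched as a simple policy check. The Python example below assumes that the server periodically obtains a usage sample per experiment and compares it against limits set by the testbed administrator. The names Limits, Usage and experiments_to_stop, as well as the chosen resources, are hypothetical. Sandboxing itself and the way samples are measured are outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical per-experiment limits chosen by the testbed operator."""
    cpu_seconds: float
    memory_mb: float
    disk_mb: float


@dataclass
class Usage:
    """One usage sample for a running experiment."""
    cpu_seconds: float
    memory_mb: float
    disk_mb: float


RESOURCES = ("cpu_seconds", "memory_mb", "disk_mb")


def exceeded(usage: Usage, limits: Limits) -> list:
    """Return the names of the resources whose usage is above the limit."""
    return [name for name in RESOURCES
            if getattr(usage, name) > getattr(limits, name)]


def experiments_to_stop(samples: dict, limits: Limits) -> dict:
    """Map experiment id to its exceeded resources, for offenders only."""
    report = {}
    for experiment_id, usage in samples.items():
        over = exceeded(usage, limits)
        if over:
            report[experiment_id] = over
    return report


limits = Limits(cpu_seconds=60, memory_mb=256, disk_mb=100)
samples = {
    "exp-1": Usage(cpu_seconds=10, memory_mb=100, disk_mb=5),
    "exp-2": Usage(cpu_seconds=75, memory_mb=100, disk_mb=120),
}
print(experiments_to_stop(samples, limits))
```

Whether an offending experiment is suspended, terminated, or merely reported to its owner is a policy decision that the sketch deliberately leaves open.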
A similar challenge is related to how the output generated by experiments is handled. When running locally, the output can be logged into a database that the user controls, or stored in files. However, when running non-interactively, the experiment needs to abide by the rules governing the server, such as disk quota and other storage policies. Finally, a service must enable the user to fetch and manage the output from past experiments.

Another issue is how to deal with unpredictability in the execution of experiments. Certain failures may require human intervention, for example in the event that a crucial part of the experiment, such as a gateway mote, is not able to run as expected, or the user account's disk quota is depleted. Where the user can simply decide to restart an interactive experiment, it is less straightforward how certain failure scenarios are best handled for experiments running on a server.

Finally, the testbed needs to provide a service for submitting and managing experiments. This service must consider whether experiments are stored on the server, so they can easily be rescheduled to run, and whether they are regularly deleted. Together with the above challenges, this clearly shows that there are many policy-related issues with non-interactive experiments, which must be addressed.

Fault Models

To capture some of the challenges, we will shift focus to how we can move towards a more systematic testing approach. Given the unique challenges, networked embedded systems introduce new types of faults, which are characteristic of these systems. One way to identify them is by defining a fault model. The aim of a fault model is to create a taxonomy of the different types of faults by identifying common errors that are likely to occur in an implementation. This way potential sources of faults are highlighted and grouped into problem areas, which can serve as guidelines during the test specification phase.
The Nature of Faults and Failures

Before we can define a fault model, we first need to examine the different sources of faults and failures. From Section , it should be clear that there is not always a clear link between the source of a fault and how the fault manifests itself. The right chain of events must occur, and this depends on many different circumstances. However, it is possible to narrow down the potential sources to basic entities of the system. We identify the following four potential sources of faults in networked embedded systems:

1. The fault resides in the hardware. These faults occur due to the use of low-cost hardware platforms and extreme environmental conditions.
2. The fault resides in the mote software. This source of faults occurs due to defects in the software, for example in the form of buffer overflows, when static allocations are incorrectly accessed.
3. The fault is an emergent property created by interaction between the hardware and mote software. Such faults occur when assumptions in software do not match those of the hardware. Example sources include race conditions and drivers that do not account for reentrancy in interrupt code.
4. The fault is an emergent property created by interactions between motes and their environment. These faults depend on a particular order of events, be it communication or physical phenomena. They occur due to incorrect assumptions regarding the dynamic environment.

A Fault Model for Networked Embedded Systems

Based on the work presented above and the study of related research, we propose the following candidate fault model for networked embedded systems.

[Incorrect system assumption and simplification] In order to accommodate resource constraints, applications need to simplify and make assumptions about the environment. This may be in terms of local properties, such as the ability to store sensor readings for longer periods of time.
If assumptions are oversimplified or the system lacks degraded modes, which kick in in case of failure, the application as a whole can come under pressure and end up collapsing.

[Failure to meet concurrency requirements] Networked embedded systems are integrated into environments of extreme dynamics. This makes it essential to tune concurrency to meet the soft real-time requirements of the system.

[Failure to self-organize] In order for networked embedded systems to be scalable, they must be able to organize themselves. Part of this is the ability to form and continuously manage topologies that offer the level of routing quality demanded by the application. As an example, applications need to ensure that proper measures exist to cope with topology changes.

[Incorrect expectations for quality of service] The success of the system is determined by its ability to live up to the level of quality of the services that it is expected to provide. Specifically, a sensor-based application needs to match the capabilities of the system with the level of detail and precision with which to observe the phenomenon of interest. To accomplish this, the quality requirements may employ lossy modes based on how collected data will be used.

Benefits of Fault Models

We consider the above fault model a candidate, because more work is necessary to extend the model and more clearly identify common patterns. For example, a definitive model should also provide criteria for how to test against each fault. We also propose a study of whether or not faults that are a result of emergent properties require more test effort.

The main motivation behind defining a fault model is the idea that, to be effective, any systematic testing strategy must be based on some notion of the type of faults that can occur in an implementation. Furthermore, a fault model can aid in developing tools and determining coverage strategies specifically tailored to the system under test.
Consequently, we argue that systematic testing of networked embedded systems must be based on fault models, which reflect and account for the specific system characteristics.

Guidelines and Recommendations

To summarize the findings from the above analysis, we will provide a set of general guidelines and recommendations for how to design a test framework that best supports testing of networked embedded systems.

The test framework must be based on a holistic approach to testing and address the issue of testing complete systems. While testing of units is important and helps to increase the confidence in the overall system, it can be argued that it is just a specialization of the general case, namely testing a system of motes. Furthermore, creating a unit test system, e.g. for TinyOS, is fairly straightforward.

The test framework must not rely on instrumentation of motes. First of all, it is problematic to depend on what features are compiled into the software deployed in the testbed. Second, by requiring instrumentation, it is likely that certain results will not be realistic. Instrumentation may be offered as an opt-in, for example for determining test coverage of applications or to account for limitations in both the testbed and motes.

The test framework must be able to express and execute complex experiments of considerable scale. The effectiveness of the tests depends on whether or not they are able to model advanced scenarios in a scalable manner.

The test framework should encourage extensibility through open APIs. To fully embrace new ideas, it is necessary to provide developers with the ability to extend the test framework and integrate it into their own applications and tools.

Information from software and runtime artifacts should be included in the test framework. There is a great need to include information from many different sources to increase test efficiency and thus the usability of testbeds.
This depends on a higher degree of integration of the test framework into the overall development process.

Background

This chapter presents background knowledge on why and how testing is performed. We will also provide an introduction to networked embedded systems and the specific challenges they face. Finally, related work will be discussed and the findings summarized.

The Nature of Testing

Testing in the context of software development is a somewhat blurry concept, which can mean many different things and apply to many different situations. To better understand what testing is, it is necessary to first take a general look at testing from different perspectives and to investigate the different techniques and approaches to testing. This will help to establish areas of interest, and provide an understanding of the benefits it has and of how it affects the decisions made by people involved in software development.

Testing as a Software Development Practice

Developing software is a complex task that requires an enormous amount of careful planning and thoughtful management to be successful. This has increased focus on the importance of choosing a development methodology to help guide and drive the process. Regardless of these more political concerns of the development process, software development remains a very iterative and experimental process, much like scientific research, where decisions often need to be taken while the system as a whole is not clearly understood, and where there is a constant need to go back to previous steps or phases of the process in order to reevaluate them and revise decisions. This underlines the need to continuously observe and measure the software to guide and control the process.
From this perspective, testing is one of the most straightforward and effective methods with which to perform this measurement, because it provides a metric on which to evaluate correctness, quality, adherence to standards, performance, etc. against expected criteria. In this respect, testing does not necessarily have to involve running any code, but can be a matter of analyzing the code, for example using static checking to detect potential race conditions. However, since software eventually is meant to be run, testing that exercises the software by executing it can answer questions about how the software will behave under certain conditions.

Types of testing

A test framework distinguishes between a test suite and a test case. A test suite is focused on testing a specific part of the system using several test cases; however, the test cases might test different aspects. Given a system and a test case, a test run usually consists of the following steps:

[Preamble] where the system being tested is brought into the initial state required by the test case. Part of this step is to construct a credible environment, which accounts for reproducible behaviors.

[Body] where the actual test is performed.

[Postamble] where the system state is restored, for example by closing it down. The main objective is to bring the system into a state where another test can be run.

It is often desirable to confine tests to focus on a specific part of the system. This allows testing to be performed incrementally in a bottom-up fashion, where tests of small individual parts are performed first and then gradually extended to more complete parts of the system. This sort of incremental testing, going from individual components to complete systems, helps to limit the scope and makes it easier to locate the origin of an error. To achieve this form of granularity, the test usually needs to emulate surrounding components with which the part being tested interfaces.
This requires that the system is modular and has been designed with testability in mind.

In the following, two views on testing are presented. They each try to classify tests and can serve as a way to develop a good test strategy. The broadest way to classify tests is in terms of what knowledge exists of the software being tested. This classification to some extent reflects who is responsible for making the tests and in what phase of the development it is done. The classifications are:

[White-box testing] is where complete knowledge of the software being tested is available. This allows a more fine-grained form of testing where a specific code path can be exercised. Usually this form of testing is done upstream by developers, since they know the code, and it is done throughout the lifetime of the software.

[Black-box testing] is where no knowledge about the software is considered when creating the tests. This testing is usually done towards the end of a release cycle and performed downstream by end users or people in charge of quality assurance. However, black-box testing can also be done in the form of randomized test generation.

[Grey-box testing] is where partial knowledge about the software is used when testing. This covers use cases such as testing of libraries where only the application programming interface (API) is known, which can be the case when evaluating different providers of the same library or interface. This form of testing is done by developers, but can also be done by people in charge of integration of different software components.

Orthogonal to the above classification is the more concrete classification of the types of tests in terms of what they are trying to accomplish.

[Functional testing] This form of testing, also sometimes called acceptance testing, is meant to capture the expected behavior of the system in a more story- and scenario-based fashion. It usually deals with only the complete system.
As such, this testing allows customers to define parts of the specification in terms of use cases.

[Performance testing] The goal of performance testing is to observe the behavior of the system when it is under heavy load. An example is a web server that receives many incoming requests. This form of testing can give valuable information about the performance of complex systems, for which it is otherwise hard to predict anything.

[Parallel testing] This testing method is meant to document differences between two similar systems. It can be used when replacing or rewriting parts of the system to check when the switch can be completed. This method can use unit testing, functional testing or both to evaluate the differences.

[Regression testing] While not strictly a method in itself, regression testing is an essential part of testing, since regressions can sneak in at any time during development. Regression tests take the form of any of the above methods, depending on whether the performance, parts of an API or the whole system must be tested for regressions. As the name implies, regressions are detected by comparing against either previous test results or expected criteria. This means that different types of strategies can be employed if knowledge exists of how the system changes. For example, if only one module changes, only tests related to this module need to be run.

[Fuzz testing] Where the above methods use very systematic approaches, fuzz testing tries to achieve the unexpected by randomizing the input. This sort of harness testing can sometimes be desirable to make sure the system is robust and behaves correctly, e.g. by not crashing, even when given nonsensical input.

By keeping the methods described above in mind when designing tests, it is possible to develop a good test strategy or framework. It requires that the specification of the system is taken into account.
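As an illustration of the fuzz testing method listed above, the following Python sketch feeds random byte strings to a small packet parser and records every input that makes it fail in an unexpected way. The parser parse_packet and its packet layout are hypothetical and exist only to give the fuzzer a target. The only assumption about the system under test is that rejecting malformed input with a ValueError counts as correct behavior.

```python
import random


def parse_packet(data: bytes) -> dict:
    """Hypothetical parser: 1 byte type, 1 byte length, then the payload."""
    if len(data) < 2:
        raise ValueError("packet too short")
    kind, length = data[0], data[1]
    payload = data[2:]
    if len(payload) != length:
        raise ValueError("length field does not match payload")
    return {"type": kind, "payload": payload}


def fuzz(target, runs: int, seed: int, max_len: int = 16) -> list:
    """Feed random byte strings to target; return the inputs that crashed it."""
    rng = random.Random(seed)  # fixed seed so that a failure can be replayed
    crashes = []
    for _ in range(runs):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # rejecting malformed input is the expected behavior
        except Exception as error:  # anything else is a robustness defect
            crashes.append((data, repr(error)))
    return crashes


print(len(fuzz(parse_packet, runs=1000, seed=1)))
```

Seeding the random generator keeps the randomized input reproducible, so a crashing input found in one run can be replayed later, for example as part of a regression test.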
Dynamic testing: Part of this thesis will explore using various methods to randomize or test unexpected scenarios by injecting events such as node failures.

Testing Evaluation

How do we evaluate the software using tests? Over the years, different approaches to testing software have emerged to embrace the different needs of software development. Knowing when one approach is preferable to another can have a great impact on the final result, since different approaches find different defects.

It can be hard to figure out exactly what to test and how. To quote ``testing is a bet''. It may or may not be necessary to test every single part of the system. First of all, you need to know, to some extent, what you are looking for in order to design good tests. To make testing effective, you might have to make various assumptions or accept some limitations, so as not to end up with a complex and very time-consuming test design.

During the development process, changes in focus and goals can have a big impact on how testing is performed. The evaluation is therefore something that might need to be continuously addressed and updated in each iteration.

Another problem lies in the fact that evaluations are sometimes based on brittle criteria. This is, for example, the case in some types of usability testing, such as think aloud [REFERENCE] or expert evaluation. However, usually there is some kind of user or use case to be targeted, which will govern how the evaluation process will take place.

Something about test case specifications

Three such test selection criteria:

[Functional] Bound to the intended functionality of the system in terms of application-specific scenarios.

[Structural] Concerned with artifacts that are executed (covered) during a run of the system or executable model:
Control flow (statements, code conditions, state machine transitions)
Data flow (definition and use of variables)

[Stochastic] Random testing and testing on the basis of existing user profiles.
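The structural criterion can be illustrated with transition coverage of a state machine. The sketch below, in Python, assumes an executable model given as a table from (state, event) to next state and measures how many of its transitions a set of test cases exercises. The model MODEL of a mote radio and the function names are hypothetical and chosen for illustration only.

```python
# Hypothetical model of a mote radio: (state, event) -> next state.
MODEL = {
    ("sleep", "timer"): "listen",
    ("listen", "packet"): "receive",
    ("listen", "timeout"): "sleep",
    ("receive", "done"): "sleep",
}


def covered_transitions(model: dict, initial: str, test_cases: list) -> set:
    """Return the (state, event) pairs exercised by the given test cases."""
    covered = set()
    for events in test_cases:
        state = initial  # every test case starts from the initial state
        for event in events:
            key = (state, event)
            if key not in model:
                break  # event not allowed in this state; stop this test case
            covered.add(key)
            state = model[key]
    return covered


def transition_coverage(model: dict, initial: str, test_cases: list) -> float:
    """Fraction of the model's transitions covered by the test cases."""
    return len(covered_transitions(model, initial, test_cases)) / len(model)


suite = [["timer", "timeout"], ["timer", "packet", "done"]]
print(transition_coverage(MODEL, "sleep", suite))
```

A coverage figure of this kind says which parts of the model were exercised, not whether the observed behavior was correct, so it complements rather than replaces the functional criterion.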
To summarize, the nature of testing depends a lot on the project and the development methodology used. It provides a useful tool for evaluating and managing the development process. Clear goals are necessary to tailor testing so that it gives the most benefit. Finally, problems may arise when goals change during the development process or when evaluation based on subjective criteria is not carefully taken into consideration.

Networked Embedded Systems

Networked embedded systems have emerged as part of the movement behind creating a platform for pervasive computing. The goal is to instrument the physical world with pervasive networks of small embedded systems or motes, which are able to sense, process and interact with their surrounding environment. This poses a unique set of challenges that must be overcome and which have influenced the approach and characteristics that distinguish these types of systems.

Application Areas

To better understand these characteristics and challenges, let us first look at some of the application areas where networked embedded systems have been used. From the start, inter-disciplinary scientific projects have developed and deployed these systems, because they enable close yet inconspicuous observation of physical phenomena, especially for the class of applications concerned with habitat and environmental monitoring.

A good representative of the many applications in this domain is the project for habitat monitoring of birds on Great Duck Island. The monitoring ran for a few months while the birds were nesting and had as a requirement that it should be as inconspicuous as possible in order not to disrupt the behavior of the birds being studied. The deployed system featured a multi-tiered architecture with small battery-powered sensor motes placed in the underground nests of the birds, a transit network of more powerful motes above ground, and a base station for collecting the sensor data.
The project showed that networked embedded systems, while still in the research stage, had a lot of promise in shedding new light on areas where it is difficult to gather information.

Other examples include applications for tracking long-term animal migration in Africa and efficient parking management in urban areas. Networked embedded systems have also been applied commercially to areas such as tracking warehouse inventory and providing detailed information related to logistics. There is also great potential in the area of reducing energy consumption, where networked embedded systems can provide insights into the efficiency of manufacturing processes and help to reduce waste.

Common Characteristics and Systems Challenges

While each application area has its own requirements and restrictions, the systems all share some of the same characteristics. They are all deeply distributed systems consisting of resource-constrained motes, which have to operate in extremely dynamic environments. The dominating factors are the requirements on the form factor, the cost of hardware, and the lifetime of the application.

The deployed motes generally consist of small, low-cost, energy-efficient hardware platforms, which are then equipped with components for acquiring and storing data depending on the needs of the application. Scaling down each mote makes possible dense instrumentation of the phenomena under observation and the creation of systems of immense scale. This poses the challenge of creating autonomous systems which can operate with limited human access.

While some systems rely on harvesting energy, they are most commonly battery powered. As a result, dealing with the energy constraints poses the most dominating challenge in the design of the overall system. The reduced energy budget requires that the motes use efficient, low duty cycles, where they power down unused hardware and spend most of their time with the main processing unit in ultra-low energy sleep mode.
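The effect of duty cycling on lifetime can be illustrated with a small back-of-the-envelope calculation, sketched here in Python. The battery capacity and the current draws are illustrative assumptions and do not describe any particular mote platform. The model also ignores battery self-discharge, voltage drop and the cost of switching between modes.

```python
def average_current_ma(duty_cycle: float, active_ma: float, sleep_ma: float) -> float:
    """Time-weighted average current for a mote that is active a fraction
    duty_cycle of the time and asleep for the rest."""
    return duty_cycle * active_ma + (1.0 - duty_cycle) * sleep_ma


def lifetime_days(capacity_mah: float, duty_cycle: float,
                  active_ma: float, sleep_ma: float) -> float:
    """Idealized lifetime: battery capacity divided by average current."""
    hours = capacity_mah / average_current_ma(duty_cycle, active_ma, sleep_ma)
    return hours / 24.0


# Assumed figures: 2000 mAh battery, 20 mA when active, 0.01 mA when asleep.
for duty_cycle in (1.0, 0.1, 0.01):
    days = lifetime_days(2000.0, duty_cycle, 20.0, 0.01)
    print(f"duty cycle {duty_cycle:>5.0%}: {days:8.1f} days")
```

Under these assumptions an always-on mote lasts about four days, while a 1% duty cycle extends the lifetime to a little over a year, which is why the time spent in sleep mode dominates the design.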
The single most expensive operation in these networks is communication, which is provided via short-distance radios. To cover larger areas, the systems rely on forming ad-hoc networks and use multi-hop and location-based routing schemes to efficiently propagate information to its destination. To minimize communication needs, in-network processing and topology-aware aggregation must be applied. Furthermore, the limited uptime of motes combined with the unreliable radio medium results in high latency of data propagation, which means that communication must be fault- and delay-tolerant.

Using strategies such as ``sense and send'' and ``store and forward'', applications are highly driven by interactions with the environment and often have soft real-time requirements. This means they have to achieve a high level of concurrency, especially on platforms relying on cooperative scheduling. Lastly, the immense scale and limited access mean that it is difficult to retask and update applications once they have been deployed.

Programming Models and TinyOS

In terms of software, applications are deeply tied to the hardware. Furthermore, while a high diversity of platforms offers a lot of choice in terms of customizing the hardware, the software/hardware boundary is often not well-defined and there is a lack of general abstractions. The main reason is that resource constraints force the use of optimizations and specialization in favor of abstractions in order to tailor applications to run efficiently. Reevaluation of algorithmic techniques and use of approximations are often necessary to accommodate uncertainty and reduce the overall execution footprint.

To overcome some of these challenges, work on developing a more appropriate programming model has been undertaken. One of the most successful programming models for networked embedded systems is the model developed for TinyOS, a small operating system-like layer.
To support this model, the language nesC (networked embedded system C) has been developed as a dialect of C with a small set of domain-specific constructs. The most important language construct is the support for decomposing systems into components, which expose functionality via interfaces. Applications are then composed by wiring components together, whereby it becomes easy to replace components and thus customize the system to the individual platforms. Furthermore, by making it possible to define components as thin layers that wrap hardware components, the problem of defining a software/hardware boundary is overcome.

The TinyOS programming model also provides an abstraction in the form of a simple two-level execution model to support the highly event-driven nature of applications. Using cooperative scheduling, long-running tasks, which need to process data from the network or sensors, can be posted to run in FIFO order. Events in the form of interrupts from hardware preempt the execution of tasks and allow management of hardware.

The force behind nesC is that it allows whole-program compilation, which brings two major advantages. First, applications are preprocessed and assembled into one single code artifact using compile-time bindings based on the component wiring. This significantly reduces the cost of the component abstraction and allows very aggressive optimizations. Second, using whole-program compilation the compiler is able to do static checking of the complete control flow of the program with knowledge of the execution context for the different parts of the code. This helps to reduce defects by detecting problems, such as race conditions when variables are accessed from both normal context and interrupt context.

Related Work

Inspiration for the work presented in this thesis has come both from research in the field of networked embedded systems and sensor networks and from research on distributed systems in general.
In the following, a brief overview of related work is presented.

MoteLab was one of the first testbeds developed for wireless sensor networks. Using a networked backchannel of permanently deployed and powered nodes connected to a central server, remote reprogramming and monitoring of the nodes are available through a web interface, where users can schedule jobs and collect data from test runs logged to a database. In addition, users can access nodes in real-time to use custom programs for monitoring and injecting data into the running application. One of the founding ideas of MoteLab is the use of active messages, which can be compared to an extensible packet format. By defining parsers or packet decomposers, messages received from the mote consoles can be logged into a database together with a timestamp, allowing the tester to use them to later recapture and reason about the states of the individual motes.

While feature-rich, the MoteLab design choices put significant limitations on what kind of experiments it can handle and on the important aspect of resource management. Some of these limitations are addressed in Mirage, a microeconomic resource allocation scheme for sensornet testbeds. Chun et al. found that flexible sharing of a sensornet testbed in both space and time is necessary and argue for an auction-based reservation system, where users compete for resources, to address the problem of resource contention. By placing bids, users contribute to maximizing the testbed usage, which makes the system more flexible and better at coping with low and high resource demands compared to a strictly quota-based system with fixed-valued time slots. However, they also note that such a system gives users an incentive to try to game the system during heavy demand, and suggest that strategy-proof or hard-to-manipulate auction mechanisms need to be considered to avoid pathological gaming of the system.
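To give an impression of what a strategy-proof mechanism can look like, the sketch below shows a sealed-bid second-price auction for a single testbed slot in Python. This is a textbook mechanism used here purely for illustration. It is not the combinatorial auction used by Mirage, and the function name and the bid values are hypothetical.

```python
def second_price_auction(bids: dict):
    """Sealed-bid second-price auction for one slot.

    bids maps a user name to that user's bid. The highest bidder wins but
    pays the second-highest bid, so the price paid does not depend on the
    winner's own bid. Returns (winner, price), or None if there are no bids.
    """
    if not bids:
        return None
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    # Assumption of this sketch: a single bidder gets the slot for free.
    price = ranked[1][1] if len(ranked) > 1 else 0
    return winner, price


print(second_price_auction({"alice": 10, "bob": 7, "carol": 3}))
```

Because the winner's payment is set by the other bids, overstating or understating one's own valuation cannot lower the price paid, which removes the most direct incentive to game the bidding. Allocating bundles of motes over several time slots, as a real testbed requires, is considerably harder.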
Abstract resource specification is another important method to extend the flexibility and usage maximization of a testbed. Mirage uses a combinatorial approach, where requests are made for collections of nodes with certain characteristics, such as platform and topology, as opposed to specific physical nodes. This allows users to express their preferences more accurately, and enables the system to find the best mapping and schedule multiple specifications simultaneously.

A more elaborate system for mapping of testbed resources is given in . Although done in the context of Emulab, the approach of using virtual equivalence classes and simulated annealing to reduce and efficiently solve the mapping problem can also be applied to wireless sensor network testbeds. Ricci et al. also address the problem of minimizing inter-experiment effects by considering node localization and using load-balancing, something that is not addressed in the work on wireless sensor network testbeds.

The Case of AJAX: Selenium

To illustrate how to cope with some of the challenges of testing distributed systems, we will take a look at Selenium, a system for testing web applications. These types of applications use a client-server architecture, where content is pushed from the server to the client for display. The architecture is generally loosely coupled in that no or only few assumptions are made by the server and client side about the other side.

One of the challenges of developing web applications is the historical lack of standards on the web and the diversity of browser platforms. With the introduction of Asynchronous JavaScript and XML (AJAX), a technique for making more interactive and rich web applications, web applications have become more advanced, to the extent that they are pushing the limits of what is possible. Consequently, there is a great need for testing whether web applications work uniformly across different browsers.
The basic idea behind the test framework in Selenium is to use existing web-based technologies to make it possible to run across multiple platforms. First, a remote control server is used to orchestrate the tests by launching a browser instance and acting as a proxy between the browser and the web application running on the server. By using a web proxy it is possible to inspect both what data the client requests and what data the server sends. Second, the browser's own capability of executing JavaScript code is used for embedding a test automation engine on the client side. This method gives full access to inspecting the document content and the browser state as well as controlling the browser's behavior, such as clicking on links. To further extend the system, Selenium provides different means to create tests. A Firefox plugin called Selenium IDE allows tests to be recorded by tracking the user's actions and then replayed. For more complex tests, Selenium also has the notion of test drivers, which are external programs that can control the whole test process. This way Selenium can be integrated into whatever test framework the project is using for its other tests. While this architecture is very powerful and easy to use, one downside, which the developers behind Selenium point out, is that tests are brittle and may easily break. This can happen if the formatting of the document content is changed. Another problem is that the framework does not scale well when the size of the test suite increases, because the remote control mechanism is slow and has limitations in terms of how much parallelization is possible. An extension to Selenium called Selenium Grid (http://selenium-grid.seleniumhq.org/how_it_works.html) addresses the latter issue by allowing tests to be distributed and run on a grid.

Summary

There are clearly unique challenges to overcome when developing networked embedded systems.
Many of them are related to different parts of the design space, which means that it is very unlikely that any single approach will be successful. Consequently, the challenges need to be addressed from many different angles. TinyOS and nesC have shown that one angle, which plays an important role, is the tool chain. We believe that these efforts must be extended. By now it should be clear that designing and writing tests is not always easy and straightforward. There are many issues and trade-offs to consider if the time spent writing tests is to be as effective as possible. From the findings presented above, the following features or desirable characteristics of tests and test frameworks will serve to guide later evaluation and design choices. For test cases and test suites we identify the following characteristics:

[Help to locate the defect] The granularity of the unit being tested should be small enough that locating the part of the system in which the defect resides does not take valuable resources. One way to achieve this is to partition the test suite into logical parts, each of which focuses on testing a particular part of the system. Furthermore, the test suite should be arranged so that it uses incremental testing, aggregating results by starting with smaller units and then gradually increasing the scope.

[Reliable and reproducible] It needs to be possible to reliably reproduce test failures. This means that the test should account for all the different aspects of the test environment, such as the architecture and platform on which the system runs. Another important aspect is to account for specific timing requirements. This can be especially tricky for performance and randomized tests, where part of the test environment cannot easily be controlled. In both cases, this sort of behavior needs to be documented in the test results, along with information critical to later attempts to rerun the test or perform post-mortem analysis.
[Coverage] A test suite without coverage provides no more than a false sense of security. Achieving coverage is time-consuming and requires careful planning and thought about different use cases and failure scenarios. However, in the end coverage can also act as a feedback loop regarding parts of the system or interface that could be considered corner cases, and thus where the expected behavior is either not documented or not even defined. One of the biggest obstacles to achieving good coverage is that it requires a lot of knowledge of the system being tested. To get around this, tools can be used to measure the coverage by profiling the code or instrumenting it with probes to get information about which code paths are tested.

[Timeliness and responsiveness] For testing to act as a useful tool and something developers take seriously, it needs to be fast and responsive. The time to test the part of the system of interest should not be greater than a few minutes, to keep the ideas fresh in the mind of the developer. This is somewhat contradictory to the goal of having coverage. For this reason, it may be necessary to have several test suites for large systems, where a minimal and fast test suite covering the most basic functionality can be run in each smaller development cycle, and a test suite with full coverage is run automatically and periodically on a server.

[Maintainable] Tests, like all other software, need to be maintained as the surrounding system evolves. This suggests that as patterns emerge, tests may need refactoring and their own set of abstractions. Furthermore, tests should avoid making assumptions about the test environment that can make them brittle.

For test frameworks, the following characteristics have been found valuable:

[Simple to write] The test framework must make it simple to write tests.

[Automation] Another crucial feature to lower the barrier to testing is automation.
Making the testing process as automated as possible is essential to ensure that the test suite is run frequently. For this reason, automation is often integrated into the release cycle so that test suites are run on a server after an update or once a day. Automation becomes especially important when deadlines approach and the stress level rises.

[Generic and versatile] Being a framework, it is very important that it provides a generic and generally usable interface for writing tests in order to support many different work and test models. The most important issue here has to do with allowing tests to be controlled. A versatile test framework should also provide support for reducing the amount of work required for common idioms and testing tasks. Part of this has to do with providing an easy way to set up and tear down test cases via preambles and postambles, which removes the need for a lot of boilerplate code. A similar part has to do with checking the gathered test data.

[Highlight errors and breakage] Spotting problems related to errors and breakage is the main reason why testing is done; therefore, this should be a top priority. This means that it should not require any manual work and that it should be easy to see both which test suite and which specific test failed. Furthermore, if possible, information about the breakage should be given in terms of how the test result differs from the expected result.

Introduction

Over the last decade, networked embedded systems have moved from an emerging concept to a mature and powerful technology. Driven by the vision of pervasive computing, they are taking part in shaping the future of computing by breaking down the barriers of what was previously thought possible in the areas of data acquisition and distribution. With the advance of increasingly powerful hardware platforms and standards, networked embedded systems are today being applied in areas ranging from personal health care, to intelligent buildings, to cross-continental tracking systems.
A key objective characterizing pervasive systems is that they disappear into the environment and work autonomously without human involvement. Herein lies a big challenge in terms of meeting the important need for testing the systems with respect to correctness, performance, and other parameters. This thesis argues that to fully achieve this, there is a great need for rethinking parts of the tool chain to provide researchers and developers with a better understanding of pervasive systems. The main hypothesis is that current testing practices are inadequate at capturing the nature and behavior of pervasive applications and that there is a need for new approaches, which more actively embrace testing techniques that facilitate aggregation and allow application flows to be decomposed into behavioral patterns.

Context

To further elaborate on the field of testing, networked embedded systems, and testbeds, this section provides an overview of the context in which this thesis takes place. The aim is to develop an effective and systematic testing strategy that both reflects the types of faults generally found in networked embedded systems and has the measures necessary to properly decompose applications into testable behaviors. Verifying correct implementation by analysing requirements and developing test cases is not effective for understanding the overall complex behavior of software.

Testing

This brings up an important subject: the goals of testing. What do we want to accomplish by testing our software? There seems to be a consensus that one of the most important goals of testing is to build confidence. To quote David Gries: ``We should run test cases not to look for bugs, but to increase our confidence in a program we are quite sure is correct; finding an error should be the exception rather than the rule.'' Furthermore, this confidence is incremental in that if there is no confidence in the smaller parts, then there is none in the whole.
This suggests that testing is most of all about making sense of the code and creating a solid foundation for future development. Testing can also be seen as a form of specification or a method to validate the software against a functional specification of expected behavior. From this perspective, testing is a way to measure the difference between the current state of the software and the expected behavior written in the specification. This is valuable when releasing, since testing helps to make quality assurance more straightforward. Testing also provides an opportunity to take a step back and look at the system under test from a different angle, be it from an end user's perspective or, in library development, from the API user's perspective. This brings up an important point, namely that testing or testability of software needs to be considered during the design process. For example, modular design and the use of dependency injection make it possible to test modules and subsystems separately by using mock objects. Looking at testing from this perspective, testing also becomes a way to ensure not only a robust system, but also a more modular and reusable system. Over time, as the software evolves and changes its shape, this helps to keep track of and control the changes by locating regressions. With a good test suite, changes become easier to make and reason about because it helps to quickly find regressions. Furthermore, from the perspective of economics, problems found early are easier and cheaper to fix. In this way, effective testing keeps software alive longer by reducing the cost of developing and, maybe even more importantly, maintaining software over its lifetime. In short, the investment in testing early on pays off in many of the later stages. As networked embedded systems become more widespread and mainstream, failure will become not just an expensive inconvenience, but something that can put our way of life at risk.
This clearly underlines the demand for creating testing tools that assist in finding problems and catching defects as early as possible, to lower the cost of developing systems and to prevent fatal errors. Software defects are an inherent part of the programs we produce. Testing has received a lot of interest and attention in the last decade. Some of this can be attributed to the extreme programming and agile development movements, which had as a mantra to write tests for everything and succeeded in making testing and, more importantly, writing tests painfully easy. High-quality software is one of the primary development goals of any project, regardless of the programming technology being employed. Key questions related to quality include: How do we adequately test programs? How do we know that our testing and quality objectives have been attained? Can we determine when we have tested enough? These questions cannot be addressed using traditional unit or integration testing techniques. One of the reasons is that much of the behavior depends on the context in which the system operates and on its surroundings. Developing new systems is a complex task that requires many different skills, and there are ever-increasing demands on the development process to deliver better-quality systems earlier and to reduce the cost of maintenance. We have come to expect that software defects will be part of the systems we develop. It can be argued that it is impossible to thoroughly test a system of any real size and, as Dijkstra remarked, testing can only show the presence of defects, not their absence.

Networked Embedded Systems

Networked embedded systems face cascading and conflicting goals: form factor, lifetime, yield, and unattended period. For networked embedded systems, different challenges have made it difficult to provide a framework supporting the various needs for testing.
Testbeds

One of the big challenges with developing networked embedded systems is that they are inherently hard to test due to their scale, lack of reliability, and limitations in terms of hardware and communication. The main motivation behind the work on wireless sensor network testbeds is the search for understanding the technical challenges of wireless sensor networks at scale. This has given rise to a number of different testbed frameworks, some with widely different goals. Testbeds have been used as a means to test distributed systems on real hardware. Furthermore, compared to simulation, only a testbed using real hardware can provide a solid and thorough understanding of the real-world technical challenges of networked embedded systems, such as resource limitations, communication loss, and energy constraints. This is a very important point, since an essential aspect of networked embedded systems is that they are physically coupled to the environment, and therefore effective experimental facilities must encompass realistic applications. At the University of Copenhagen's Department of Computer Science, work has been done to develop a complete framework called the Re·Mote Testbed Framework. The project is an attempt to create an open framework of loosely connected components, which can be tailored to the needs and requirements of institutions. One of the underlying principles and design goals of the framework is to be as platform agnostic and independent as possible. This recognizes the fact that a great number of different platforms are being used in networked embedded systems. To some extent, it is also independent of the system software running on the motes, so that testbeds can customize the framework to support the software they need. Currently, Re·Mote allows interactive usage and does not support reservation.
The design should therefore contain an analysis of which mote properties should be included in mote selection and how frequently used network topologies can be chosen. Historically, the Re·Mote Testbed Framework has been tied to interactive use provided by a rich client application. The introduction of jobs is a step towards making the framework more versatile in terms of supported usage models. Following the framework's underlying design goal of platform independence, it is therefore important to decide what scenarios and use cases to consider in the analysis of jobs. The introduction of jobs will not remove the need for interactive use. However, it can help to significantly improve the experience by automating parts of the session. In this respect, a job can provide support for saving and restoring sessions. By recording information on how the user interacts with the client, a job can be created which replicates the behavior. The framework has been deployed in the DIKU Testbed and used during master-level courses at DIKU. All the central server-side components run on a single commodity PC, while the mote host software is installed on workstations with attached motes placed around DistLab at DIKU.

Research Questions and Goals

As mentioned, the underlying hypothesis is that current testing techniques for networked embedded systems are immature and do not accurately capture the needs of experiments. Consequently, the focus of this thesis is to take an exploratory approach and look beyond existing research efforts in the networked embedded system community. With this broad scope, we seek to identify novel concepts and better understand the research space encompassing the nature of networked embedded system testing. In addition, we are also interested in gathering practical experience to establish how to move forward.
These concerns are captured by the following project-specific learning goals, which guide the work presented in this thesis:

Give a comprehensive analysis of the requirements and technical problems of testing (applications developed for) deeply embedded networked systems.
Based on theory and similar projects, evaluate methods for building a robust test scheduling framework.
Design and implement a test scheduling framework.
Evaluate different methods for dynamically configuring tests with the aim of extending the set of tested features.

During the work, the interest and focus have changed as more knowledge about the specific challenges became clear. This has led us to revise and refine the goals to focus less on scheduling of tests and more on test frameworks from a broad perspective. Consequently, to formalize the project-specific learning goals and describe the problems faced in this thesis, we pose the following set of research questions:

Is it possible to implement a robust test framework for networked embedded systems?
What challenges must be faced when designing a test framework for networked embedded systems?
How do we approach the problem of efficient and systematic testing?
Can dynamic testing techniques be applied to extend the set of tested features?

To further formalize why research on test frameworks for networked embedded systems is interesting, we note that no common testbed platform or framework for testing has emerged from the research communities involved in TinyOS. While efforts have been made to work towards this, testing of networked embedded systems remains insufficient.

Design Goals

The main design goals for the test framework are:

[Holistic approach to testing] A main concern is that the framework has a holistic approach, both in theory and practice, which allows testing of complete systems.
[Powerful control of experiments] To facilitate more advanced experiments with timed events, such as simulated mote failure or injection of data and packets, an interface for controlling experiments is required.

[Versatile and extensible] Being a framework, it is very important that it is both versatile and extensible so it can support a wide range of different work flows and test approaches.

[Job creation and configuration] Since the intended users of the system are both experienced researchers and students taking their first course on sensor networks, the system should allow powerful configuration while remaining easy to use. For very complex jobs, dynamic configuration should be supported by providing tools for topology management and job control.

[Testbed resource management] The scheduler for the framework will involve a resource manager with the aim of configuring the resource utilization of the testbed. Currently, Re·Mote allows interactive usage and does not support reservation. The design should therefore contain an analysis of which mote properties should be included in mote selection and how frequently used network topologies can be chosen.

Approach

An important task is to first understand the basic concepts of testing and the context surrounding networked embedded system testing. To accomplish this, we have done a literature study of related work. The next task is to investigate and identify key challenges and explore the topic of dynamic testing techniques. Here we have taken a problem-solving approach, which groups related challenges and analyzes possible solutions in terms of benefits and weaknesses. Findings from both of these exercises are summarized, and a set of general guidelines is derived from the key challenges. Based on the guidelines, a design of a test framework is presented.
Since a functional proof of concept is important to show that this is more than just an academic exercise, a partial implementation is constructed based on the design. The system is constructed using a bottom-up approach, which focuses on the construction of a preliminary client capable of running test experiments.

Contributions

The work presented in this thesis contributes to the networked embedded system community in several ways. The key contributions are:

A thorough overview of the landscape of networked embedded system testing.
A candidate fault model, which identifies common sources of faults in networked embedded systems.
An exploration of concepts and methods to dynamically increase test coverage and isolation of faults.
A design for applying these concepts to test frameworks for networked embedded systems.
A partial proof-of-concept implementation of the test framework.

Audience

Students of computer science, and people interested in testing of distributed systems, will find this document rich in details and discussions within this field. Additionally, embedded system developers will be interested in the discussion of the design of the test framework. It is assumed that the reader of the document is familiar with the basics of computer science and hardware and, furthermore, understands the fundamentals of distributed systems. No knowledge of testing or networked embedded systems is required.

Document Structure

This document is structured as follows. It first provides more background information and then takes a closer look at testing in the context of networked embedded systems. This is followed by an exploration of dynamic testing techniques. The design of the test framework is then presented, followed by an overview of the implementation of the proposed system. Finally, the implementation is tested and evaluated, followed by a conclusion.
Acknowledgments

The initial work on the Re·Mote Testbed Framework was done by Jan Flora and Esben Zeuthen at Copenhagen University's Department of Computer Science under the supervision of Philippe Bonnet. The Re·Mote Testbed Framework has been developed in the context of Embedded WiSeNts and has further been developed in the context of the CRUISE IST. Thanks to Rostislav Špinar for the collaboration and sharing of ideas.

Estelle, je t'aime

Abstract

English

This thesis argues that current testing practices for networked embedded systems are inadequate at capturing the nature and behavior of pervasive applications. There is a need for new approaches, which facilitate decomposition of these systems into testable behavioral patterns. Key challenges and work flows related to testing networked embedded systems are identified and a candidate fault model is derived. Based on these findings, a design of a test framework inspired by the aspect-oriented programming paradigm is proposed. Finally, a partial implementation of the test framework is developed and evaluated.

Dansk

Dette speciale argumenterer for at eksisterende teknikker til test af forbundne indlejrede systemer ikke gør det muligt at udtrykke den basale natur og adfærd i altomsiggribende programmer. Der er brug for nye tilgange, der kan hjælpe med at nedbryde disse systemer til testbare adfærdsmønstre. Væsentlige udfordringer og arbejdsgange relateret til forbundne indlejrede systemer identificeres og en mulig fejlmodel udledes. Baseret på disse fund, foreslås et design af et testsystem inspireret af det aspect-orienterede programmerings paradigme. Endeligt, udvikles og evalueres en partiel implementation af testsystemet.

Le silence éternel de ces espaces infinis m'effraie. --Blaise Pascal (Pensées)