author: Shereen Bellamy date: 22-04-2023 ...

Efficient office assistant robot system: autonomous navigation and controlling based on ROS

Slam and navigation of electric power intelligent inspection robot based on ROS

Implementation and verification of a virtual testing system based on ROS and unity for computer vision algorithms

Introduction

In recent years, robotics research and development have made significant strides, fueled by advancements in hardware, software, and artificial intelligence. Among the numerous tools and platforms employed in the field of robotics, the Robot Operating System (ROS) has emerged as a highly effective and widely adopted solution for developing and deploying robotic systems. As an open-source framework, ROS offers a standardized communication infrastructure.

The system comprises a collection of libraries, tools, and packages designed for constructing and testing robot applications, as well as a community of developers who voluntarily contribute to the ongoing development and evolution of the system.
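The communication infrastructure ROS standardizes is a publish/subscribe model: independent nodes exchange messages over named topics. The pattern can be illustrated with a minimal pure-Python toy broker; this is a sketch of the concept only, not the actual rospy API, and the topic name and message fields are invented for illustration.

```python
from collections import defaultdict

class TopicBroker:
    """Toy in-process message broker illustrating ROS-style publish/subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A node registers a callback for a named topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber on the topic receives the message.
        for callback in self._subscribers[topic]:
            callback(message)

broker = TopicBroker()
received = []
broker.subscribe("/cmd_vel", received.append)                # a motor-driver "node"
broker.publish("/cmd_vel", {"linear": 0.2, "angular": 0.0})  # a planner "node"
```

In real ROS, a master process (or DDS discovery in ROS 2) matches publishers to subscribers across processes and machines; the toy above collapses that into one in-process dictionary.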

This survey paper aims to present an overview of the current state of research in robotics, with a particular emphasis on ROS as the primary platform for developing robotic systems. To this end, this paper concentrates on three recent studies that exemplify the diverse range of opportunities presented by ROS-based robotics research: “Efficient office assistant robot system: autonomous navigation and controlling based on ROS,” “Slam and navigation of electric power intelligent inspection robot based on ROS,” and “Implementation and verification of a virtual testing system based on ROS and unity for computer vision algorithms.”

Each paper offers a distinct perspective on the application of ROS in robotics research, encompassing a range of use cases that demonstrate ROS's potential in areas such as autonomous navigation, inspection and monitoring, and computer vision.

By analyzing and synthesizing these papers, this survey seeks to provide insights into the latest trends, challenges, and opportunities in ROS-based robotics research, as well as to identify avenues for future research and development.

The paper is structured into the following sections: background, approach, cross-cutting themes, synergistic approaches, strengths and weaknesses, and conclusion. The background section presents an overview of the current state of robotics research and a higher-level analysis of ROS's capabilities. Subsequently, the approach section explores existing and potential strategies within this domain. This is followed by a section that highlights common themes found across the papers, and then by the synergistic approaches section, which considers how the surveyed approaches could be combined into a single system. Next, the strengths and weaknesses section assesses the existing merits and areas for improvement within the observed research. Lastly, the conclusion evaluates the contributions of this paper to the domain of research.

Background

The background section offers a comprehensive overview of the current state of robotics research, along with a higher-level analysis of the capabilities of ROS.

The State of Robotics Research

Initially, the field of robotics was primarily focused on robotic arms and straightforward factory automation tasks, such as packaging. However, the high cost of computation, limited understanding of robot control, and the slow evolution of sensors posed significant barriers to progress.

Today, robots can be found in various settings, including major grocery store chains, hotels, and universities, performing a wide array of tasks such as room service, parking assistance, and deli customer service. The burgeoning artificial intelligence research industry has further facilitated the scalability and adaptability of robotic implementations. Robots are autonomous systems comprising numerous interactive software and hardware components. To enable seamless communication between these components, middleware solutions like ROS have been developed, which connect components through message passing between nodes and expose them through tools such as graphical user interfaces (GUIs). Prior to ROS, research had already been conducted on robot control architectures and frameworks.

The Power of ROS and SLAM

Moreover, alongside ROS, a compatible process known as SLAM (Simultaneous Localization and Mapping) has gained popularity in the field of robotic mapping.

Simulators also play a crucial role in advancing robotics research by offering a platform to evaluate the efficacy of novel algorithms. As numerous research avenues are concurrently exploring new algorithms, it is beneficial to assess their robustness in controlled environments.

The robustness of algorithms can vary in aspects such as sensitivity to outdoor conditions or light sensitivity.

Approach

The objective of the subsequent section is to examine both existing and potential strategies within the realm of robotics development.

Existing Approaches

Industries have exhibited an increasing willingness to integrate robots into their workflows. However, as new robots and artificial intelligence algorithms emerge, both industry and consumers bear the responsibility of embracing early adoption to further develop these technologies.

According to a 2012 survey from Automation World, 65.1% of respondents adopted industrial-only robots, while 34.9% adopted collaborative robots (cobots). The survey indicates that more than half of the industry has engaged in the adoption of fully industrial robots, such as a robotic machine that twists caps onto water bottles for production. Tasks like water bottle capping, when performed by a robot, can significantly enhance speed and efficiency compared to human workers. Nonetheless, this has led to concerns among skeptics who believe that robots will replace human jobs and are thus reluctant to adopt them. This viewpoint is fueled by an Oxford University statistic estimating that 47% of US jobs are at risk of computerization.

However, the process of computerization can involve collaboration, as exemplified by cobots. Although fewer industries have adopted cobots thus far, providing further assurances through research opportunities and virtual testing will aid in improving human-computer interaction with robots.

Paper 1

Paper 1 is titled, “Efficient office assistant robot system: autonomous navigation and controlling based on ROS” by Wanniarachchi et al.

Paper 1's central theme revolves around the design of a robot with autonomous navigation capabilities, made possible through the integration of SLAM and ROS Indigo. This product-focused paper details the communication of its various components to achieve its mapping objective. The researchers effectively introduced the hardware first, providing an overview of the product built and tested. While SLAM is mentioned in the literature survey, its role remains unclear in the context of the methods described. Paper 2 addresses this issue and offers clarification. Moreover, the office assistant robot system's diagram does not clearly show the location of SLAM, although ROS Indigo is visibly positioned within the OS block above the GUI program, indicating that it runs processes within Ubuntu. The mention of the map running on an rviz turtlebot node within the ROS block could also visually convey SLAM's role. Paper 1 complements Paper 2 by presenting an additional use case employing SLAM and ROS. However, due to its brief analysis, the inclusion of a testing dimension, as seen in Paper 3, may prove beneficial. Since this study takes place in an office setting, future extensions could involve gathering additional metrics from employees interacting with the robot. Although the primary goal is for humans to navigate and coexist with the robot on the office floor, it is unclear how the five individuals from the experiment interacted with the robot beyond clicking their names on the GUI. While the GUI is an essential feature for aligning the prefixed coordinates with the user, it also limits any measurable human activity resulting from that interaction.

The proposed study involves a mobile robot designed for delivering documents or small packages between employees in an unstructured indoor office environment. The robot system consists of three wirelessly communicating elements: an isolated processing unit, a mobile robot structure, and a robot controlling unit. This robot serves as an agent to enhance societal efficiency in task completion.

Previous work by Safdar et al. has explored the development of a ROS-based Mapping, Localization, and Autonomous Navigation system using Pioneer 3-DX. The current research builds upon their findings by implementing the system in an office assistant robot. This novel approach employs Turtlebot software for additional pre-built mapping functions to create a map of an unstructured room. The SLAM-implemented Turtlebot functions of interest include the bringup, gmapping, keyboard_teleop, and Rviz nodes for generating the environment's map on a PC. The environment was captured using RGB-D images from the vision sensor mounted on the robot.

GPS accuracy limitations present an opportunity for SLAM to offer improved services. Originally proposed by Smith, Self, and Cheeseman in 1988, SLAM (Simultaneous Localization and Mapping) performs real-time localization and map construction, enhancing robot navigation and autonomous functionality in ROS thanks to readily available packages like Gmapping, Hector_SLAM, move_base, AMCL, etc.

The robot serves as an autonomous navigation system supported by SLAM, with a vision sensor acting as eyes, a mobile robot base composed of controllable actuators as legs or wheels, and a processing unit functioning as a human brain. SLAM enables a roaming robot to construct an environment map while deducing its location within that environment.
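The mapping half of SLAM can be illustrated with a toy one-dimensional occupancy grid updated from a single forward-facing range reading taken at a known pose; a real system such as Gmapping additionally has to estimate the pose itself from the same sensor data. All names and numbers here are illustrative.

```python
def integrate_reading(grid, pose, reading, max_range=5):
    """Update a 1-D occupancy grid from one forward range reading taken at
    a known pose. This is only the mapping half of SLAM; a real system
    must also estimate `pose` itself from the same sensor data."""
    for d in range(1, min(reading, max_range)):
        grid[pose + d] = 0           # cells the beam passed through: free
    if reading <= max_range:
        grid[pose + reading] = 1     # cell where the beam hit: occupied

grid = [-1] * 12                     # -1 unknown, 0 free, 1 occupied
integrate_reading(grid, pose=2, reading=4)   # robot at cell 2, wall 4 cells ahead
```

Repeating this update from many poses, while simultaneously correcting those poses against the partially built map, is what lets a roaming robot both map the environment and deduce its location within it.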

Upon map creation, the selected Turtlebot functions serve as nodes placed on the SLAM-generated map. The system was tested with five users located along the sides of the unstructured office, with five predefined coordinates determining the robot's destination points. The study's results indicate that the robot achieves approximately 98.4% location goal accuracy in both bright and dark lighting conditions.

Additionally, the robot controlling system forms part of the overall robot design. To initiate robot navigation, the user must initiate a pose estimation, enabling the robot to determine its position on the map.

The findings from this paper are significant due to the project's scalability and efficiency. If robots can effectively navigate within an unstructured office environment, there is a clear potential for extending this project to other settings, such as schools and hotels, as long as the terrain permits. Future work is suggested to incorporate voice command-related enhancements.

Overall, the study showcases the power of integrating SLAM and ROS Indigo in designing a robot with autonomous navigation capabilities. It highlights the need for further research and development, with a focus on enhancing the human-robot interaction and exploring new environments for potential applications. By understanding and addressing the limitations and challenges identified in this paper, researchers can continue to advance the field of robotics and improve the overall functionality of these systems in a variety of settings.

Paper 2

Paper 2 is titled “Slam and navigation of electric power intelligent inspection robot based on ROS,” by Lu et al.

Overall, Paper 2 supports the claims from Paper 1, but with a different use case focused on navigating power stations. The general theme revolves around software architecture and robot control systems, unlike Paper 1's stronger emphasis on the product. However, both papers share the theme of utilizing SLAM and ROS for robot navigation. With the addition of an inspection task, the researchers aimed to emphasize the ability to monitor the robot's navigation trajectory. Instead of following Paper 1's discussion structure, the authors focused on the layers of the robot software system they designed: the user, decision, and execution layers. Figure 5 in their paper refers to these layers as "application," "decision," and "implementation," but maintaining consistency in naming would be helpful. Nevertheless, it is clear that the layers work together via ROS nodes to handle the robot's movement. Both Paper 1 and Paper 2 emphasized the Gmapping and move_base functions.

This paper was inspired by the rising demand for electricity and rapid power system development in China, which has led to growth in the corresponding power transmission and distribution systems, traditionally inspected manually. The overall design of a power inspection robot consists of the robot hardware platform, the selection of key equipment (such as the operating platform of the software system and the Lidar sensor), and consideration of the software architecture and design, including the user layer, decision layer, and execution layer. SLAM's Gmapping algorithm is then used for mapping and navigation.

The researchers' contribution builds upon the positioning and navigation function of an existing inspection robot. They developed a set of inspection robot platforms capable of relocating freely for electric power system inspection. A Raspberry Pi running ROS provided navigation and positioning, while velocity and direction information was sent over a serial port to the STM32-based motion controller. Once the robot's motion was established, a laser radar detected obstacles to avoid. Ultimately, trajectory planning on the Gmapping-built map enabled the robot to move to a specified location.

The power inspection robot was tested in a power station setting using a predetermined track. Trajectory metrics were recorded and analyzed to assess the motion control program. Environmental constraints included low temperatures, dust, and heat dissipation requirements. Consequently, the researchers chose the Advantech UNO-2184G industrial computer, which has a working temperature range of -40 to 60 °C and an IP40 protection level.

Map information was constructed using the Gmapping and Hector_SLAM algorithms, which process returned Lidar data. Monte Carlo localization was used for positioning, while the A* algorithm found the optimal path through move_base navigation. A suggestion for future research is to add a robot arm to carry out additional, relevant operations for a power station, such as machine tending.
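The A* planning step named above can be sketched with a generic textbook implementation on a small occupancy grid; this is a standard grid A*, not the actual move_base source, and the grid values are invented for illustration.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid where cells equal to 1 are obstacles.
    Returns the list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    heuristic = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan
    frontier = [(heuristic(start), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nxt = (nr, nc)
                heapq.heappush(
                    frontier,
                    (cost + 1 + heuristic(nxt), cost + 1, nxt, path + [nxt]),
                )
    return None

grid = [[0, 0, 0],
        [1, 1, 0],     # a wall with a gap on the right
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))   # must detour through the gap
```

move_base runs a search like this over the global costmap, then hands the resulting path to a local planner that turns it into velocity commands.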

Paper 3

Paper 3 is titled “Implementation and verification of a virtual testing system based on ROS and Unity for computer vision algorithms” by Zhang et al.

Paper 3 is the first of the three papers that focuses on virtual environments and testing. Taxonomically, this paper belongs to the testing category due to its primary theme of verifying computer vision algorithms using a simulator. Given that the previous papers used Gmapping, which is only suitable for indoor environments, it seems fitting that this paper discusses simulations. Similar to Paper 1, Paper 3 creates a GUI for users to interact with; unlike Paper 1's GUI, however, it has more robust graphics and color. The researchers of Paper 3 might consider adding user interaction reporting as an additional testing approach. Conversely, the testing system that Paper 3 verified would be worthwhile for Papers 1 and 2 to explore in order to strengthen the verification of their own experiments.

This study focused on implementing and verifying a virtual test system for computer vision called URCV. Computer vision, a field of artificial intelligence, is used to derive information from digital images, videos, and other visual inputs. Examples of applications include traffic flow analysis, road condition monitoring, and cancer detection. In automated driving, this scenario generates the need for the system to "see" the surrounding environment. With that input, mechanical driving behavior can be generated through learning and decision-making. However, traditional testing methods for this protocol, based on real-world elements, provide only single samples, and a single sample offers a lower chance of approximating the ground truth needed for decision-making. ROS and the game engine Unity were selected to build URCV. For verification, the focus was on the feasibility of the system and the influence of virtual elements during the testing of the monocular ORB_SLAM2 algorithm from OpenSlam, an open-source repository exposing projects that cover the SLAM process. This included the rendering path of Unity's RGB camera, texture accuracy, and the illumination model. To achieve this, a simulator was built due to its real-time focus and ability to scale algorithms.

URCV is unique compared to existing simulations because it was structured to accommodate all computer vision algorithms. Previously, simulators like Gazebo and Unreal CV were designed to support only a select few. Unity was used to provide 3D virtual indoor, urban, and suburban scenes to test the computer vision algorithms. Accordingly, virtual sensors acquire information from the scenes, such as the RGB image details, sensor positioning information, etc. These details are then transmitted to ROS under the control of the ROS Connector for subscription. From there, ROS communicates with a URCV incentive container to receive, store, and visualize or republish the information. Similar to Paper 2, a user interface was designed to aid human-computer interaction. In this case, the user interface was designed to help users set and view the virtual scene parameters and sensors. However, unlike previous ROS use cases, this paper focuses on ROS#'s ability to subscribe nodes to the information. ROS# was used due to its compatibility with Unity, which uses the programming language C#. The choice of ROS was to accommodate point-to-point design and multiple programming languages.
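ROS# talks to ROS through the rosbridge protocol, in which operations such as subscribing and publishing are JSON envelopes sent over a WebSocket. A sketch of those envelopes follows; the `op`, `topic`, `type`, and `msg` fields are the protocol's, but the specific topic names and message contents here are hypothetical.

```python
import json

def make_subscribe(topic, msg_type):
    """Build a rosbridge 'subscribe' operation: ask ROS to forward a topic."""
    return json.dumps({"op": "subscribe", "topic": topic, "type": msg_type})

def make_publish(topic, msg):
    """Build a rosbridge 'publish' operation: the envelope a Unity-side
    publisher (e.g. an image publisher) would send to ROS."""
    return json.dumps({"op": "publish", "topic": topic, "msg": msg})

# Hypothetical traffic between a Unity camera publisher and a joystick subscriber:
pub = make_publish("/unity/rgb/image", {"width": 640, "height": 480})
sub = make_subscribe("/joy", "sensor_msgs/Joy")
```

Because the envelope is plain JSON, any language with a WebSocket client can join the graph; this is what lets Unity's C# scripts interoperate with ROS nodes written in Python or C++.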

In robotics, common types of robot control include point-to-point, continuous path, and controlled path. As mentioned earlier, this testing system was designed to support point-to-point design. Under point-to-point control, the robot moves between discrete taught positions, which are recorded in the memory of the robot control system; the path taken between those positions is not explicitly controlled. Examples of robot tasks in this category include spot welding, hole drilling, and assembly operations.
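Point-to-point control can be sketched as a teach-and-replay loop, in which taught positions are stored in the controller's memory and later visited in sequence. This is a hypothetical class for illustration, not code from the paper.

```python
class PointToPointController:
    """Teach-and-replay sketch of point-to-point control: taught positions
    live in the control system's memory; the path between them is not
    explicitly controlled."""

    def __init__(self):
        self.taught_points = []

    def teach(self, point):
        # Record a position in the controller's memory.
        self.taught_points.append(point)

    def replay(self):
        # Return the stored goals in the order they were taught.
        return list(self.taught_points)

controller = PointToPointController()
for point in [(0.0, 0.0), (1.5, 0.0), (1.5, 2.0)]:   # e.g. spot-weld locations
    controller.teach(point)
visited = controller.replay()
```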

The analysis was carried out to verify the impact of virtual scene elements by focusing on the feasibility verification of URCV, the effect of different rendering paths of the RGB camera, the effect of texture accuracy change, and the effect of different illumination models on the monocular algorithm test. In this paper, there is a reemergence of the light versus dark discussion, which emerged in Paper 1. However, from a testing perspective, a new dimension of shadowing is added to the analysis.

Feasibility was measured by comparing URCV's performance with that of the TUM Test. The TUM dataset contains color and depth images from a Microsoft Kinect sensor and the associated ground truth. In computer vision, ground truth can be defined as information known to be real or true. Ultimately, the experiment showed that the monocular ORB_SLAM2 algorithm successfully ran with the URCV testing system, with a small error gap compared to TUM. Regarding the analysis of different RGB rendering paths, those observed included Forward, Deferred, and Legacy Vertex Lit. Ultimately, the researchers reported that edges and shadows had the greatest degrees of influence. The analysis of texture accuracy change focused on the Forward RGB camera section, with an emphasis on ground and wood textures. For this test, the textures were purposely altered so that wood had the texture of the ground, and the ground had the texture of wood. It was found that increased texture did not significantly impact accuracy. The final test focused on different lighting models, and it was determined that the illumination models had little significant effect on the results of the computer vision algorithm. To add a deeper realism dimension to support the ground truth, the researchers suggest extending the paper by adding a focus on more virtual sensor types, such as depth sensors and lidar.
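Comparisons against the TUM benchmark are typically quantified with the absolute trajectory error (ATE), the root-mean-square error between matched estimated and ground-truth positions. Below is a minimal 2-D version that assumes the trajectories are already time-associated and expressed in the same frame; the full TUM evaluation also performs a rigid-body alignment first, and the sample trajectories are invented.

```python
import math

def ate_rmse(estimated, ground_truth):
    """Root-mean-square error between matched trajectory positions.

    Assumes the two trajectories are already associated by timestamp and
    expressed in the same frame; the full TUM tooling aligns them first.
    """
    assert len(estimated) == len(ground_truth)
    squared = [
        (ex - gx) ** 2 + (ey - gy) ** 2
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))

gt  = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]    # ground-truth positions
est = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1)]   # estimated positions
error = ate_rmse(est, gt)                      # ~0.1 m of error
```

A "small error gap compared to TUM" amounts to this number staying close for the virtual system and the real benchmark on comparable sequences.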

The results of this paper are novel in part because of their comparison with the TUM data. By comparing the new results to existing ones, readers can see what has improved.

This paper was interesting because its verification focus led to the most extensive evaluation section of the three papers. Its claim of broad applicability encompasses the use cases of the previous papers, and its testing stack, ROS and SLAM, is similar to theirs as well.

Suggested Approaches

All of the papers would benefit from a discussion on memory handling, which was not substantially addressed in any of them. For example, in Paper 1, if a robot will be in the workplace office every day, it would be beneficial to know how long the robot can retain a learned location. This information would allow the reader or consumer to determine whether the office needs one robot per floor or if one is sufficient. Additionally, a conversation about memory would have been helpful in all the papers when referring to error handling. The papers were written from a perspective of showcasing accuracy, but in a corner case, does the robot have a checkpoint that lets it return to its last known location? This is especially relevant in Paper 3, where the researchers claim to approximate the ground truth for realism. Adding an evaluation or extension that includes a memory perspective would provide a worthwhile, realistic prediction of how these products will work alongside users.
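The checkpoint idea raised above could be as simple as persisting the learned locations and the last known pose to disk so they survive a restart. The following is a hypothetical sketch, not a mechanism from any of the surveyed papers; the file layout and field names are invented.

```python
import json
import os
import tempfile

def save_checkpoint(path, locations, last_pose):
    """Persist learned delivery locations and the last known pose so the
    robot can recover them after a restart (hypothetical scheme)."""
    with open(path, "w") as f:
        json.dump({"locations": locations, "last_pose": last_pose}, f)

def load_checkpoint(path):
    # Fall back to an empty memory if no checkpoint exists yet.
    if not os.path.exists(path):
        return {"locations": {}, "last_pose": None}
    with open(path) as f:
        return json.load(f)

ckpt = os.path.join(tempfile.mkdtemp(), "robot_state.json")
save_checkpoint(ckpt, {"alice": [2.0, 3.5]}, last_pose=[0.0, 0.0])
state = load_checkpoint(ckpt)   # what the robot "remembers" after a reboot
```

Evaluating how much such a memory holds, and how stale it can become before re-mapping is needed, is exactly the kind of metric the surveyed papers leave unreported.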

Similarly, as Paper 3 focuses on computer vision algorithms, it would be relevant to run a similar study testing foundation models. With the rising interest in ChatGPT, artificial intelligence research has shifted sharply toward this framework. Foundation models are a framework for building AI systems in which one model can be adapted to multiple applications. If the virtual testing system added a new component that suggested additional computer vision models to advance efficiency, it would enlarge the contribution of URCV.

In terms of systematic framework, each paper had its own approach to mapping their system.

For Paper 1, the software mapping was displayed in a structure similar to their hardware diagram. The structure for the office assistant robot system comprised a Primesense 3D Vision Sensor, iRobot Create 2 Programmable Robot, and a mini Laptop. Correspondingly, the office assistant robot system overview shows how the GUI communicated with the Central Location PC at an IP address (holding the operating system and information node), along with the Robot (composed of Primesense) and two additional Turtlebot nodes to communicate information in ROS. Overall, the diagram shows how information from the sensors and nodes is translated to the iRobot base through a point-to-point wireless connection.

For Paper 2, as mentioned earlier, there is a clear focus on software architecture, which is clearly communicated through the diagrams. Starting with a description of the decision layer system block diagram, the application layer, decision layer, and implementation layer all have their encompassed responsibilities. The process begins at the application layer, which is responsible for target location. Once the system has a target location, it sets a goal. However, to reach the goal, that information needs to be sent to the decision layer for path planning. The decision layer holds the steps for path planning, positioning, map building, and odometer data. Once the path planning stage is reached, the decision layer gathers information from odometer data, map building, and positioning to choose the best possible path for execution. It is at the decision layer stage that any additional information regarding proximity algorithms comes into effect to proceed accordingly. Finally, once a path is decided, the options for the path are located in the implementation layer for deployment. In this implementation layer, there are two options: motor controller and auxiliary controller to control the robot's movement.
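The three-layer flow described above can be caricatured as three functions handing a goal down the stack. This is a deliberately simplified sketch; the function and target names are illustrative, not from the paper.

```python
def application_layer(target, known_targets):
    # Application layer: resolve an inspection target into a goal location.
    return known_targets[target]

def decision_layer(start, goal):
    # Decision layer: fuse positioning, map building, and odometer data into
    # a path; reduced here to a direct two-point path for illustration.
    return [start, goal] if start != goal else [start]

def implementation_layer(path):
    # Implementation layer: turn the planned path into motor-controller commands.
    return [("drive_to", point) for point in path[1:]]

goal = application_layer("transformer_A", {"transformer_A": (4, 2)})
path = decision_layer((0, 0), goal)
commands = implementation_layer(path)
```

The value of the layering is that each stage can be replaced independently: a richer planner in the decision layer, or an auxiliary controller in the implementation layer, without touching the application layer.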

More specifically, for the mapping function, Paper 2 also shared the robot mapping system framework. The framework begins with the data to be mapped, such as Lidar data and odometer data. That information is then sent to ROS gmapping for processing the robot's positioning and navigation. To calculate a final position, a cycle takes place, composed of SLAM gmapping, motion modeling, scan matching, proposal distribution, weight computation, resampling, and then updating the field at ROS mapping.
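The cycle listed above is the predict-weight-resample loop of a particle filter; gmapping is a Rao-Blackwellized variant that also refines the proposal distribution with scan matching. A toy one-dimensional version of the loop, for illustration only, looks like this:

```python
import random

random.seed(0)  # deterministic for the example

def particle_filter_step(particles, control, measurement, noise=0.5):
    """One predict-weight-resample cycle of a toy 1-D particle filter,
    mirroring the motion-model / weight-computation / resampling stages
    of the gmapping pipeline (illustrative only)."""
    # Motion model: move every particle by the control input, with noise.
    moved = [p + control + random.gauss(0, noise) for p in particles]
    # Weight computation: particles close to the measurement score higher.
    weights = [1.0 / (1.0 + abs(p - measurement)) for p in moved]
    # Resampling: draw a new particle set in proportion to the weights.
    return random.choices(moved, weights=weights, k=len(moved))

particles = [0.0] * 100
for control, measurement in [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]:
    particles = particle_filter_step(particles, control, measurement)
estimate = sum(particles) / len(particles)   # pose estimate near 3.0
```

In gmapping, each particle additionally carries its own map hypothesis, which is why the resampling step simultaneously settles both the trajectory and the map.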

Similar to Paper 1, they discuss mapping, but the researchers highlight the underlying processes of how data goes through the system, beyond positionality.

Lastly, for Paper 3, the focus is on relationships, similar to Paper 1. Paper 3 presents a diagram depicting the structure of their virtual testing system, URCV. The figure shows the relationship between the game engine Unity, ROS, and the plugin ROS#. The diagram addresses the responsibilities of Unity, which include the scene, sensor, scripts, and interface for the virtual environment. Within the scene subsection, attributes include geometry, material, and lighting; those three attributes reappear in the interface subsection, which repeats the parameters of the box above it. The authors could have improved the design by using a floating box or removing the repeated scene and sensor attributes. Adjacent to the interface, another box holds the functions that help publish the scripts, like ImagePublish, PoseStampedPublish, and JoySubscriber, all held together by the ROS Connector. Since Unity 3D is used for the GUI, a C# script is necessary to connect it to ROS for operations using the ROS# plugin. At that point, the Unity GUI information can be connected to the ROS processes, like the incentive container and the computer vision algorithm to be tested.

Paper 3 could be improved by showing a diagram of the ORB_SLAM2 algorithm, similar to Paper 2. Currently, the reader can only see the output of the visuals once the ORB_SLAM2 visuals are applied. This would give the reader a better understanding of the inner workings of the algorithm and its role in the virtual testing system.

Cross Cutting Themes

Paper 2 also focuses on enhancing navigation for robots with SLAM and ROS Melodic. This paper bears similarity to Paper 1 due to their choice of building upon ROS with the aim of reducing development costs. However, Paper 2 centers more on the discussion of hierarchical control used to establish clear coupling and strong expansibility.


In terms of complementary approaches, both Paper 1 and Paper 2 emphasize the use of Gmapping and the move_base functions.

The theme of robot mapping algorithms assisted by ROS continues in Paper 3, but this time from the perspective of a virtual testing system. Papers 1 and 2 analyze use cases where robots are employed in industrial settings, which translates to future financial impact. Testing robot systems is therefore an important aspect, as reliability correlates directly with a robot's financial earning potential.

Synergistic Approaches

To combine the various survey approaches from the three papers into one comprehensive system, we can consider the following steps:

Unified software architecture: Create a unified software architecture that combines the best practices from Papers 1 and 2, focusing on a hierarchical control system with clear coupling and strong expansibility. This could involve integrating the user, decision, and execution layers, as well as maintaining consistency in layer naming and organization.

Combined mapping and navigation: Leverage the strengths of both Gmapping and the move_base functions from Papers 1 and 2 for improved mapping and navigation capabilities. Utilizing SLAM and ROS as a foundation for navigation will ensure seamless integration of these components in the combined system.

Virtual testing system: Incorporate the virtual testing system from Paper 3 into the combined system to allow for efficient testing and validation of the robot's performance in various scenarios. This would enable developers to identify and address any potential issues before deploying the robot in real-world environments.

Error handling and memory management: Integrate discussions on memory handling and error management from all three papers. This could involve addressing the capacity of robot memory, error handling in corner cases, and adding checkpoints to allow the robot to return to its last location if needed.

Expanding algorithm capabilities: Investigate the possibility of integrating additional computer vision models and foundation model frameworks, such as ChatGPT, to enhance the system's overall capabilities and efficiency. This could lead to the development of new features or improvements to existing functionalities.

Scalability and adaptability: Design the combined system to be scalable and adaptable to various use cases, such as office settings (Paper 1) and power stations (Paper 2). This would involve creating a modular system that can be easily customized to meet the requirements of different industries and applications.

Holistic evaluation and validation: Develop a comprehensive evaluation and validation process for the combined system, ensuring its performance and reliability across different use cases and environments. This could involve utilizing metrics related to accuracy, efficiency, and realism, as well as incorporating user feedback and real-world testing.

By combining the various survey approaches into one cohesive system, it is possible to create a more robust and versatile solution for robot navigation and control in a wide range of applications and environments.

Strengths and Weaknesses

This section evaluates the strengths and weaknesses of the observed research, highlighting both existing strengths and areas for potential growth.

In Paper 1, the discussion is somewhat limited, with the primary focus on the methodology. The researchers provided a thorough account of the Turtlebot functions used and their roles within the robot system. Additionally, they detailed the functions' generation upon startup. However, future improvements could include a more in-depth explanation of the robot's position generation. By doing so, the authors could better evaluate the suitability of the chosen Ubuntu operating system for the robot. Given that ROS is designed to complement Linux-based operating systems, it would be intriguing to assess the robustness of ROS, as claimed in the introduction.

For Paper 2, although the paper addresses existing problems in China, its position in relation to existing work is unclear. Consequently, the novelty of this study's contribution is not strongly evident. Nonetheless, there is a trend in robot design across the first two papers, where SLAM is employed to enhance the robots. Despite the differing contexts, both cases demonstrate high accuracy and low error. Regarding the analysis design, it is notable that Paper 1 uses a predetermined coordinate, while Paper 2 opts for a predetermined track. This distinction likely arises from the unstructured office environment in Paper 1, where the robot assists office processes by interacting with employees at their desks.

Conversely, Paper 2's choice of an inspection track is influenced by the unique requirements of a power station setting. In this context, the robot replaces a human worker, and its track is crucial for monitoring performance.

Conclusion

In conclusion, all three papers offer additional opportunities to further contribute to the field of robotics.

The resulting products from these contributions can be implemented in various settings such as offices, power plants, or virtual environments. However, providing further clarification, such as detailing the process of ORB_SLAM2 in Paper 3 and justifying the rationale for a brief evaluation in Paper 2, would assist future users or researchers in promoting robot adoption.

While ROS serves as a valuable middleware, its numerous versions and sunsetted releases can present challenges in onboarding and adapting to the latest iteration. The techniques employed varied across the papers; nonetheless, this overview of ROS, SLAM, and testing methods emphasizes the overarching usability and challenges in the field.