-
BonnBot-I Plus: A Bio-diversity Aware Precise Weed Management Robotic Platform
Authors:
Alireza Ahmadi,
Michael Halstead,
Claus Smitt,
Chris McCool
Abstract:
In this article, we focus on the critical tasks of plant protection in arable farms, addressing a modern challenge in agriculture: integrating ecological considerations into the operational strategy of precision weeding robots like \bbot. This article presents the recent advancements in weed management algorithms and the real-world performance of \bbot\ at the University of Bonn's Klein-Altendorf…
▽ More
In this article, we focus on the critical tasks of plant protection in arable farms, addressing a modern challenge in agriculture: integrating ecological considerations into the operational strategy of precision weeding robots like \bbot. This article presents the recent advancements in weed management algorithms and the real-world performance of \bbot\ at the University of Bonn's Klein-Altendorf campus. We present a novel Rolling-view observation model for the BonnBot-Is weed monitoring section which leads to an average absolute weeding performance enhancement of $3.4\%$. Furthermore, for the first time, we show how precision weeding robots could consider bio-diversity-aware concerns in challenging weeding scenarios. We carried out comprehensive weeding experiments in sugar-beet fields, covering both weed-only and mixed crop-weed situations, and introduced a new dataset compatible with precision weeding. Our real-field experiments revealed that our weeding approach is capable of handling diverse weed distributions, with a minimal loss of only $11.66\%$ attributable to intervention planning and $14.7\%$ to vision system limitations highlighting required improvements of the vision system.
△ Less
Submitted 4 July, 2024; v1 submitted 15 May, 2024;
originally announced May 2024.
-
PAg-NeRF: Towards fast and efficient end-to-end panoptic 3D representations for agricultural robotics
Authors:
Claus Smitt,
Michael Halstead,
Patrick Zimmer,
Thomas Läbe,
Esra Guclu,
Cyrill Stachniss,
Chris McCool
Abstract:
Precise scene understanding is key for most robot monitoring and intervention tasks in agriculture. In this work we present PAg-NeRF which is a novel NeRF-based system that enables 3D panoptic scene understanding. Our representation is trained using an image sequence with noisy robot odometry poses and automatic panoptic predictions with inconsistent IDs between frames. Despite this noisy input, o…
▽ More
Precise scene understanding is key for most robot monitoring and intervention tasks in agriculture. In this work we present PAg-NeRF which is a novel NeRF-based system that enables 3D panoptic scene understanding. Our representation is trained using an image sequence with noisy robot odometry poses and automatic panoptic predictions with inconsistent IDs between frames. Despite this noisy input, our system is able to output scene geometry, photo-realistic renders and 3D consistent panoptic representations with consistent instance IDs. We evaluate this novel system in a very challenging horticultural scenario and in doing so demonstrate an end-to-end trainable system that can make use of noisy robot poses rather than precise poses that have to be pre-calculated. Compared to a baseline approach the peak signal to noise ratio is improved from 21.34dB to 23.37dB while the panoptic quality improves from 56.65% to 70.08%. Furthermore, our approach is faster and can be tuned to improve inference time by more than a factor of 2 while being memory efficient with approximately 12 times fewer parameters.
△ Less
Submitted 11 September, 2023;
originally announced September 2023.
-
BonnBot-I: A Precise Weed Management and Crop Monitoring Platform
Authors:
Alireza Ahmadi,
Michael Halstead,
Chris McCool
Abstract:
Cultivation and weeding are two of the primary tasks performed by farmers today. A recent challenge for weeding is the desire to reduce herbicide and pesticide treatments while maintaining crop quality and quantity. In this paper we introduce BonnBot-I a precise weed management platform which can also performs field monitoring. Driven by crop monitoring approaches which can accurately locate and c…
▽ More
Cultivation and weeding are two of the primary tasks performed by farmers today. A recent challenge for weeding is the desire to reduce herbicide and pesticide treatments while maintaining crop quality and quantity. In this paper we introduce BonnBot-I a precise weed management platform which can also performs field monitoring. Driven by crop monitoring approaches which can accurately locate and classify plants (weed and crop) we further improve their performance by fusing the platform available GNSS and wheel odometry. This improves tracking accuracy of our crop monitoring approach from a normalized average error of 8.3% to 3.5%, evaluated on a new publicly available corn dataset. We also present a novel arrangement of weeding tools mounted on linear actuators evaluated in simulated environments. We replicate weed distributions from a real field, using the results from our monitoring approach, and show the validity of our work-space division techniques which require significantly less movement (a 50% reduction) to achieve similar results. Overall, BonnBot-I is a significant step forward in precise weed management with a novel method of selectively spraying and controlling weeds in an arable field
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Panoptic Mapping with Fruit Completion and Pose Estimation for Horticultural Robots
Authors:
Yue Pan,
Federico Magistri,
Thomas Läbe,
Elias Marks,
Claus Smitt,
Chris McCool,
Jens Behley,
Cyrill Stachniss
Abstract:
Monitoring plants and fruits at high resolution play a key role in the future of agriculture. Accurate 3D information can pave the way to a diverse number of robotic applications in agriculture ranging from autonomous harvesting to precise yield estimation. Obtaining such 3D information is non-trivial as agricultural environments are often repetitive and cluttered, and one has to account for the p…
▽ More
Monitoring plants and fruits at high resolution play a key role in the future of agriculture. Accurate 3D information can pave the way to a diverse number of robotic applications in agriculture ranging from autonomous harvesting to precise yield estimation. Obtaining such 3D information is non-trivial as agricultural environments are often repetitive and cluttered, and one has to account for the partial observability of fruit and plants. In this paper, we address the problem of jointly estimating complete 3D shapes of fruit and their pose in a 3D multi-resolution map built by a mobile robot. To this end, we propose an online multi-resolution panoptic mapping system where regions of interest are represented with a higher resolution. We exploit data to learn a general fruit shape representation that we use at inference time together with an occlusion-aware differentiable rendering pipeline to complete partial fruit observations and estimate the 7 DoF pose of each fruit in the map. The experiments presented in this paper evaluated both in the controlled environment and in a commercial greenhouse, show that our novel algorithm yields higher completion and pose estimation accuracy than existing methods, with an improvement of 41% in completion accuracy and 52% in pose estimation accuracy while keeping a low inference time of 0.6s in average. Codes are available at: https://github.com/PRBonn/HortiMapping.
△ Less
Submitted 22 August, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Panoptic One-Click Segmentation: Applied to Agricultural Data
Authors:
Patrick Zimmer,
Michael Halstead,
Chris McCool
Abstract:
In weed control, precision agriculture can help to greatly reduce the use of herbicides, resulting in both economical and ecological benefits. A key element is the ability to locate and segment all the plants from image data. Modern instance segmentation techniques can achieve this, however, training such systems requires large amounts of hand-labelled data which is expensive and laborious to obta…
▽ More
In weed control, precision agriculture can help to greatly reduce the use of herbicides, resulting in both economical and ecological benefits. A key element is the ability to locate and segment all the plants from image data. Modern instance segmentation techniques can achieve this, however, training such systems requires large amounts of hand-labelled data which is expensive and laborious to obtain. Weakly supervised training can help to greatly reduce labelling efforts and costs. We propose panoptic one-click segmentation, an efficient and accurate offline tool to produce pseudo-labels from click inputs which reduces labelling effort. Our approach jointly estimates the pixel-wise location of all N objects in the scene, compared to traditional approaches which iterate independently through all N objects; this greatly reduces training time. Using just 10% of the data to train our panoptic one-click segmentation approach yields 68.1% and 68.8% mean object intersection over union (IoU) on challenging sugar beet and corn image data respectively, providing comparable performance to traditional one-click approaches while being approximately 12 times faster to train. We demonstrate the applicability of our system by generating pseudo-labels from clicks on the remaining 90% of the data. These pseudo-labels are then used to train Mask R-CNN, in a semi-supervised manner, improving the absolute performance (of mean foreground IoU) by 9.4 and 7.9 points for sugar beet and corn data respectively. Finally, we show that our technique can recover missed clicks during annotation outlining a further benefit over traditional approaches.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Explicitly incorporating spatial information to recurrent networks for agriculture
Authors:
Claus Smitt,
Michael Halstead,
Alireza Ahmadi,
Chris McCool
Abstract:
In agriculture, the majority of vision systems perform still image classification. Yet, recent work has highlighted the potential of spatial and temporal cues as a rich source of information to improve the classification performance. In this paper, we propose novel approaches to explicitly capture both spatial and temporal information to improve the classification of deep convolutional neural netw…
▽ More
In agriculture, the majority of vision systems perform still image classification. Yet, recent work has highlighted the potential of spatial and temporal cues as a rich source of information to improve the classification performance. In this paper, we propose novel approaches to explicitly capture both spatial and temporal information to improve the classification of deep convolutional neural networks. We leverage available RGB-D images and robot odometry to perform inter-frame feature map spatial registration. This information is then fused within recurrent deep learnt models, to improve their accuracy and robustness. We demonstrate that this can considerably improve the classification performance with our best performing spatial-temporal model (ST-Atte) achieving absolute performance improvements for intersection-over-union (IoU[%]) of 4.7 for crop-weed segmentation and 2.6 for fruit (sweet pepper) segmentation. Furthermore, we show that these approaches are robust to variable framerates and odometry errors, which are frequently observed in real-world applications.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
Towards Autonomous Visual Navigation in Arable Fields
Authors:
Alireza Ahmadi,
Michael Halstead,
Chris McCool
Abstract:
Autonomous navigation of a robot in agricultural fields is essential for every task from crop monitoring to weed management and fertilizer application. Many current approaches rely on accurate GPS, however, such technology is expensive and also prone to failure (e.g. through lack of coverage). As such, autonomous navigation through sensors that can interpret their environment (such as cameras) is…
▽ More
Autonomous navigation of a robot in agricultural fields is essential for every task from crop monitoring to weed management and fertilizer application. Many current approaches rely on accurate GPS, however, such technology is expensive and also prone to failure (e.g. through lack of coverage). As such, autonomous navigation through sensors that can interpret their environment (such as cameras) is important to achieve the goal of autonomy in agriculture. In this paper, we introduce a purely vision-based navigation scheme that is able to reliably guide the robot through row-crop fields without manual intervention. Independent of any global localization or mapping, this approach is able to accurately follow the crop-rows and switch between the rows, only using onboard cameras. With the help of a novel crop-row detection and a novel crop-row switching technique, our navigation scheme can be deployed in a wide range of fields with different canopy types in various growth stages with limited parameter tuning, creating a crop agnostic navigation approach. We have extensively evaluated our approach in three different fields under various illumination conditions using our agricultural robotic platform (BonnBot-I). For navigation, our approach is evaluated on five crop types and achieves an average navigation accuracy of 3.82cm relative to manual teleoperation.
△ Less
Submitted 1 July, 2022; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Combining Local and Global Viewpoint Planning for Fruit Coverage
Authors:
Tobias Zaenker,
Chris Lehnert,
Chris McCool,
Maren Bennewitz
Abstract:
Obtaining 3D sensor data of complete plants or plant parts (e.g., the crop or fruit) is difficult due to their complex structure and a high degree of occlusion. However, especially for the estimation of the position and size of fruits, it is necessary to avoid occlusions as much as possible and acquire sensor information of the relevant parts. Global viewpoint planners exist that suggest a series…
▽ More
Obtaining 3D sensor data of complete plants or plant parts (e.g., the crop or fruit) is difficult due to their complex structure and a high degree of occlusion. However, especially for the estimation of the position and size of fruits, it is necessary to avoid occlusions as much as possible and acquire sensor information of the relevant parts. Global viewpoint planners exist that suggest a series of viewpoints to cover the regions of interest up to a certain degree, but they usually prioritize global coverage and do not emphasize the avoidance of local occlusions. On the other hand, there are approaches that aim at avoiding local occlusions, but they cannot be used in larger environments since they only reach a local maximum of coverage. In this paper, we therefore propose to combine a local, gradient-based method with global viewpoint planning to enable local occlusion avoidance while still being able to cover large areas. Our simulated experiments with a robotic arm equipped with a camera array as well as an RGB-D camera show that this combination leads to a significantly increased coverage of the regions of interest compared to just applying global coverage planning.
△ Less
Submitted 18 August, 2021;
originally announced August 2021.
-
Virtual Temporal Samples for Recurrent Neural Networks: applied to semantic segmentation in agriculture
Authors:
Alireza Ahmadi,
Michael Halstead,
Chris McCool
Abstract:
This paper explores the potential for performing temporal semantic segmentation in the context of agricultural robotics without temporally labelled data. We achieve this by proposing to generate virtual temporal samples from labelled still images. By exploiting the relatively static scene and assuming that the robot (camera) moves we are able to generate virtually labelled temporal sequences with…
▽ More
This paper explores the potential for performing temporal semantic segmentation in the context of agricultural robotics without temporally labelled data. We achieve this by proposing to generate virtual temporal samples from labelled still images. By exploiting the relatively static scene and assuming that the robot (camera) moves we are able to generate virtually labelled temporal sequences with no extra annotation effort. Normally, to train a recurrent neural network (RNN), labelled samples from a video (temporal) sequence are required which is laborious and has stymied work in this direction. By generating virtual temporal samples, we demonstrate that it is possible to train a lightweight RNN to perform semantic segmentation on two challenging agricultural datasets. Our results show that by training a temporal semantic segmenter using virtual samples we can increase the performance by an absolute amount of $4.6$ and $4.9$ on sweet pepper and sugar beet datasets, respectively. This indicates that our virtual data augmentation technique is able to accurately classify agricultural images temporally without the use of complicated synthetic data generation techniques nor with the overhead of labelling large amounts of temporal sequences.
△ Less
Submitted 5 September, 2021; v1 submitted 18 June, 2021;
originally announced June 2021.
-
Viewpoint Planning for Fruit Size and Position Estimation
Authors:
Tobias Zaenker,
Claus Smitt,
Chris McCool,
Maren Bennewitz
Abstract:
Modern agricultural applications require knowledge about the position and size of fruits on plants. However, occlusions from leaves typically make obtaining this information difficult. We present a novel viewpoint planning approach that builds up an octree of plants with labeled regions of interest (ROIs), i.e., fruits. Our method uses this octree to sample viewpoint candidates that increase the i…
▽ More
Modern agricultural applications require knowledge about the position and size of fruits on plants. However, occlusions from leaves typically make obtaining this information difficult. We present a novel viewpoint planning approach that builds up an octree of plants with labeled regions of interest (ROIs), i.e., fruits. Our method uses this octree to sample viewpoint candidates that increase the information around the fruit regions and evaluates them using a heuristic utility function that takes into account the expected information gain. Our system automatically switches between ROI targeted sampling and exploration sampling, which considers general frontier voxels, depending on the estimated utility. When the plants have been sufficiently covered with the RGB-D sensor, our system clusters the ROI voxels and estimates the position and size of the detected fruits. We evaluated our approach in simulated scenarios and compared the resulting fruit estimations with the ground truth. The results demonstrate that our combined approach outperforms a sampling method that does not explicitly consider the ROIs to generate viewpoints in terms of the number of discovered ROI cells. Furthermore, we show the real-world applicability by testing our framework on a robotic arm equipped with an RGB-D camera installed on an automated pipe-rail trolley in a capsicum glasshouse.
△ Less
Submitted 18 August, 2021; v1 submitted 31 October, 2020;
originally announced November 2020.
-
PATHoBot: A Robot for Glasshouse Crop Phenotyping and Intervention
Authors:
Claus Smitt,
Michael Halstead,
Tobias Zaenker,
Maren Bennewitz,
Chris McCool
Abstract:
We present PATHoBot an autonomous crop surveying and intervention robot for glasshouse environments. The aim of this platform is to autonomously gather high quality data and also estimate key phenotypic parameters. To achieve this we retro-fit an off-the-shelf pipe-rail trolley with an array of multi-modal cameras, navigation sensors and a robotic arm for close surveying tasks and intervention. In…
▽ More
We present PATHoBot an autonomous crop surveying and intervention robot for glasshouse environments. The aim of this platform is to autonomously gather high quality data and also estimate key phenotypic parameters. To achieve this we retro-fit an off-the-shelf pipe-rail trolley with an array of multi-modal cameras, navigation sensors and a robotic arm for close surveying tasks and intervention. In this paper we describe PATHoBot design choices made to ensure proper operation in a commercial glasshouse environment. As a surveying platform we collect a number of datasets which include both sweet pepper and tomatoes. We show how PATHoBot enables novel surveillance approaches by first improving our previous work on fruit counting by incorporating wheel odometry and depth information. We find that by introducing re-projection and depth information we are able to achieve an absolute improvement of 20 points over the baseline technique in an "in the wild" situation. Finally, we present a 3D mapping case study, further showcasing PATHoBot's crop surveying capabilities.
△ Less
Submitted 26 March, 2021; v1 submitted 30 October, 2020;
originally announced October 2020.
-
A Sweet Pepper Harvesting Robot for Protected Cropping Environments
Authors:
Chris Lehnert,
Chris McCool,
Inkyu Sa,
Tristan Perez
Abstract:
Using robots to harvest sweet peppers in protected cropping environments has remained unsolved despite considerable effort by the research community over several decades. In this paper, we present the robotic harvester, Harvey, designed for sweet peppers in protected cropping environments that achieved a 76.5% success rate (within a modified scenario) which improves upon our prior work which achie…
▽ More
Using robots to harvest sweet peppers in protected cropping environments has remained unsolved despite considerable effort by the research community over several decades. In this paper, we present the robotic harvester, Harvey, designed for sweet peppers in protected cropping environments that achieved a 76.5% success rate (within a modified scenario) which improves upon our prior work which achieved 58% and related sweet pepper harvesting work which achieved 33\%. This improvement was primarily achieved through the introduction of a novel peduncle segmentation system using an efficient deep convolutional neural network, in conjunction with 3D post-filtering to detect the critical cutting location. We benchmark the peduncle segmentation against prior art demonstrating a considerable improvement in performance with an F_1 score of 0.564 compared to 0.302. The robotic harvester uses a perception pipeline to detect a target sweet pepper and an appropriate grasp and cutting pose used to determine the trajectory of a multi-modal harvesting tool to grasp the sweet pepper and cut it from the plant. A novel decoupling mechanism enables the gripping and cutting operations to be performed independently. We perform an in-depth analysis of the full robotic harvesting system to highlight bottlenecks and failure points that future work could address.
△ Less
Submitted 28 October, 2018;
originally announced October 2018.
-
3D Move to See: Multi-perspective visual servoing for improving object views with semantic segmentation
Authors:
Chris Lehnert,
Dorian Tsai,
Anders Eriksson,
Chris McCool
Abstract:
In this paper, we present a new approach to visual servoing for robotics, referred to as 3D Move to See (3DMTS), based on the principle of finding the next best view using a 3D camera array and a robotic manipulator to obtain multiple samples of the scene from different perspectives. The method uses semantic vision and an objective function applied to each perspective to sample a gradient represen…
▽ More
In this paper, we present a new approach to visual servoing for robotics, referred to as 3D Move to See (3DMTS), based on the principle of finding the next best view using a 3D camera array and a robotic manipulator to obtain multiple samples of the scene from different perspectives. The method uses semantic vision and an objective function applied to each perspective to sample a gradient representing the direction of the next best view. The method is demonstrated within simulation and on a real robotic platform containing a custom 3D camera array for the challenging scenario of robotic harvesting in a highly occluded and unstructured environment. It was shown on a real robotic platform that by moving the end effector using the gradient of an objective function leads to a locally optimal view of the object of interest, even amongst occlusions. The overall performance of the 3DMTS method obtained a mean increase in target size by 29.3% compared to a baseline method using a single RGB-D camera, which obtained 9.17%. The results demonstrate qualitatively and quantitatively that the 3DMTS method performed better in most scenarios, and yielded three times the target size compared to the baseline method. The increased target size in the final view will improve the detection of key features of the object of interest for further manipulation, such as grasping and harvesting.
△ Less
Submitted 20 September, 2018;
originally announced September 2018.
-
A Rapidly Deployable Classification System using Visual Data for the Application of Precision Weed Management
Authors:
David Hall,
Feras Dayoub,
Tristan Perez,
Chris McCool
Abstract:
In this work we demonstrate a rapidly deployable weed classification system that uses visual data to enable autonomous precision weeding without making prior assumptions about which weed species are present in a given field. Previous work in this area relies on having prior knowledge of the weed species present in the field. This assumption cannot always hold true for every field, and thus limits…
▽ More
In this work we demonstrate a rapidly deployable weed classification system that uses visual data to enable autonomous precision weeding without making prior assumptions about which weed species are present in a given field. Previous work in this area relies on having prior knowledge of the weed species present in the field. This assumption cannot always hold true for every field, and thus limits the use of weed classification systems based on this assumption. In this work, we obviate this assumption and introduce a rapidly deployable approach able to operate on any field without any weed species assumptions prior to deployment. We present a three stage pipeline for the implementation of our weed classification system consisting of initial field surveillance, offline processing and selective labelling, and automated precision weeding. The key characteristic of our approach is the combination of plant clustering and selective labelling which is what enables our system to operate without prior weed species knowledge. Testing using field data we are able to label 12.3 times fewer images than traditional full labelling whilst reducing classification accuracy by only 14%.
△ Less
Submitted 26 April, 2018; v1 submitted 25 January, 2018;
originally announced January 2018.
-
Fruit Quantity and Quality Estimation using a Robotic Vision System
Authors:
M. Halstead,
C. McCool,
S. Denman,
T. Perez,
C. Fookes
Abstract:
Accurate localisation of crop remains highly challenging in unstructured environments such as farms. Many of the developed systems still rely on the use of hand selected features for crop identification and often neglect the estimation of crop quantity and quality, which is key to assigning labor during farming processes. To alleviate these limitations we present a robotic vision system that can a…
▽ More
Accurate localisation of crop remains highly challenging in unstructured environments such as farms. Many of the developed systems still rely on the use of hand selected features for crop identification and often neglect the estimation of crop quantity and quality, which is key to assigning labor during farming processes. To alleviate these limitations we present a robotic vision system that can accurately estimate the quantity and quality of sweet pepper (Capsicum annuum L), a key horticultural crop. This system consists of three parts: detection, quality estimation, and tracking. Efficient detection is achieved using the FasterRCNN framework. Quality is then estimated in the same framework by learning a parallel layer which we show experimentally results in superior performance than treating quality as extra classes in the traditional Faster-RCNN framework. Evaluation of these two techniques outlines the improved performance of the parallel layer, where we achieve an F1 score of 77.3 for the parallel technique yet only 72.5 for the best scoring (red) of the multi-class implementation. To track the crop we present a tracking via detection approach, which uses the FasterRCNN with parallel layers, that is also a vision-only solution. This approach is cheap to implement as it only requires a camera and in experiments across 2 days we show that our proposed system can accurately estimate the number of sweet pepper present, within 4.1% of the ground truth.
△ Less
Submitted 17 January, 2018;
originally announced January 2018.
-
In-Field Peduncle Detection of Sweet Peppers for Robotic Harvesting: a comparative study
Authors:
Chris Lehnert,
Chris McCool,
Tristan Perez
Abstract:
Robotic harvesting of crops has the potential to disrupt current agricultural practices. A key element to enabling robotic harvesting is to safely remove the crop from the plant which often involves locating and cutting the peduncle, the part of the crop that attaches it to the main stem of the plant.
In this paper we present a comparative study of two methods for performing peduncle detection.…
▽ More
Robotic harvesting of crops has the potential to disrupt current agricultural practices. A key element to enabling robotic harvesting is to safely remove the crop from the plant which often involves locating and cutting the peduncle, the part of the crop that attaches it to the main stem of the plant.
In this paper we present a comparative study of two methods for performing peduncle detection. The first method is based on classic colour and geometric features obtained from the scene with a support vector machine classifier, referred to as PFH-SVM. The second method is an efficient deep neural network approach, MiniInception, that is able to be deployed on a robotic platform. In both cases we employ a secondary filtering process that enforces reasonable assumptions about the crop structure, such as the proximity of the peduncle to the crop. Our tests are conducted on Harvey, a sweet pepper harvesting robot, and is evaluated in a greenhouse using two varieties of sweet pepper, Ducati and Mercuno. We demonstrate that the MiniInception method achieves impressive accuracy and considerably outperforms the PFH-SVM approach achieving an F1 score of 0.564 and 0.302 respectively.
△ Less
Submitted 29 September, 2017;
originally announced September 2017.
-
Lessons Learnt from Field Trials of a Robotic Sweet Pepper Harvester
Authors:
Christopher Lehnert,
Christopher McCool,
Tristan Perez
Abstract:
In this paper, we present the lessons learnt during the development of a new robotic harvester (Harvey) that can autonomously harvest sweet pepper (capsicum) in protected cropping environments. Robotic harvesting offers an attractive potential solution to reducing labour costs while enabling more regular and selective harvesting, optimising crop quality, scheduling and therefore profit. Our approa…
▽ More
In this paper, we present the lessons learnt during the development of a new robotic harvester (Harvey) that can autonomously harvest sweet pepper (capsicum) in protected cropping environments. Robotic harvesting offers an attractive potential solution to reducing labour costs while enabling more regular and selective harvesting, optimising crop quality, scheduling and therefore profit. Our approach combines effective vision algorithms with a novel end-effector design to enable successful harvesting of sweet peppers. We demonstrate a simple and effective vision-based algorithm for crop detection, a grasp selection method, and a novel end-effector design for harvesting. To reduce the complexity of motion planning and to minimise occlusions we focus on picking sweet peppers in a protected cropping environment where plants are grown on planar trellis structures. Initial field trials in protected cropping environments, with two cultivars, demonstrate the efficacy of this approach. The results show that the robot harvester can successfully detect, grasp, and detach crop from the plant within a real protected cropping system. The novel contributions of this work have resulted in significant and encouraging improvements in sweet pepper picking success rates compared with the state-of-the-art. Future work will look at detecting sweet pepper peduncles and improving the total harvesting cycle time for each sweet pepper. The methods presented in this paper provide steps towards the goal of fully autonomous and reliable crop picking systems that will revolutionise the horticulture industry by reducing labour costs, maximising the quality of produce, and ultimately improving the sustainability of farming enterprises.
△ Less
Submitted 19 June, 2017;
originally announced June 2017.
-
Autonomous Sweet Pepper Harvesting for Protected Cropping Systems
Authors:
Chris Lehnert,
Andrew English,
Chris McCool,
Adam Tow,
Tristan Perez
Abstract:
In this letter, we present a new robotic harvester (Harvey) that can autonomously harvest sweet pepper in protected cropping environments. Our approach combines effective vision algorithms with a novel end-effector design to enable successful harvesting of sweet peppers. Initial field trials in protected cropping environments, with two cultivar, demonstrate the efficacy of this approach achieving…
▽ More
In this letter, we present a new robotic harvester (Harvey) that can autonomously harvest sweet pepper in protected cropping environments. Our approach combines effective vision algorithms with a novel end-effector design to enable successful harvesting of sweet peppers. Initial field trials in protected cropping environments, with two cultivar, demonstrate the efficacy of this approach achieving a 46% success rate for unmodified crop, and 58% for modified crop. Furthermore, for the more favourable cultivar we were also able to detach 90% of sweet peppers, indicating that improvements in the grasping success rate would result in greatly improved harvesting performance.
△ Less
Submitted 6 June, 2017;
originally announced June 2017.
-
Towards Unsupervised Weed Scouting for Agricultural Robotics
Authors:
David Hall,
Feras Dayoub,
Jason Kulk,
Chris McCool
Abstract:
Weed scouting is an important part of modern integrated weed management but can be time consuming and sparse when performed manually. Automated weed scouting and weed destruction has typically been performed using classification systems able to classify a set group of species known a priori. This greatly limits deployability as classification systems must be retrained for any field with a differen…
▽ More
Weed scouting is an important part of modern integrated weed management but can be time consuming and sparse when performed manually. Automated weed scouting and weed destruction has typically been performed using classification systems able to classify a set group of species known a priori. This greatly limits deployability as classification systems must be retrained for any field with a different set of weed species present within them. In order to overcome this limitation, this paper works towards developing a clustering approach to weed scouting which can be utilized in any field without the need for prior species knowledge. We demonstrate our system using challenging data collected in the field from an agricultural robotics platform. We show that considerable improvements can be made by (i) learning low-dimensional (bottleneck) features using a deep convolutional neural network to represent plants in general and (ii) tying views of the same area (plant) together. Deploying this algorithm on in-field data collected by AgBotII, we are able to successfully cluster cotton plants from grasses without prior knowledge or training for the specific plants in the field.
△ Less
Submitted 25 February, 2017; v1 submitted 4 February, 2017;
originally announced February 2017.
-
Peduncle Detection of Sweet Pepper for Autonomous Crop Harvesting - Combined Colour and 3D Information
Authors:
Inkyu Sa,
Chris Lehnert,
Andrew English,
Chris McCool,
Feras Dayoub,
Ben Upcroft,
Tristan Perez
Abstract:
This paper presents a 3D visual detection method for the challenging task of detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting the peduncle cleanly is one of the most difficult stages of the harvesting process, where the peduncle is the part of the crop that attaches it to the main stem of the plant. Accurate peduncle detection in 3D space is therefore a vital step in re…
▽ More
This paper presents a 3D visual detection method for the challenging task of detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting the peduncle cleanly is one of the most difficult stages of the harvesting process, where the peduncle is the part of the crop that attaches it to the main stem of the plant. Accurate peduncle detection in 3D space is therefore a vital step in reliable autonomous harvesting of sweet peppers, as this can lead to precise cutting while avoiding damage to the surrounding plant. This paper makes use of both colour and geometry information acquired from an RGB-D sensor and utilises a supervised-learning approach for the peduncle detection task. The performance of the proposed method is demonstrated and evaluated using qualitative and quantitative results (the Area-Under-the-Curve (AUC) of the detection precision-recall curve). We are able to achieve an AUC of 0.71 for peduncle detection on field-grown sweet peppers. We release a set of manually annotated 3D sweet pepper and peduncle images to assist the research community in performing further research on this topic.
△ Less
Submitted 30 January, 2017;
originally announced January 2017.
-
The ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to Foster Reproducible Research
Authors:
Jürgen Leitner,
Adam W. Tow,
Jake E. Dean,
Niko Suenderhauf,
Joseph W. Durham,
Matthew Cooper,
Markus Eich,
Christopher Lehnert,
Ruben Mangels,
Christopher McCool,
Peter Kujala,
Lachlan Nicholson,
Trung Pham,
James Sergeant,
Liao Wu,
Fangyi Zhang,
Ben Upcroft,
Peter Corke
Abstract:
Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficul…
▽ More
Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of \emph{complete} robotic systems -- including perception and manipulation -- instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.
△ Less
Submitted 14 December, 2016; v1 submitted 16 September, 2016;
originally announced September 2016.
-
Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification
Authors:
ZongYuan Ge,
Chris McCool,
Conrad Sanderson,
Peng Wang,
Lingqiao Liu,
Ian Reid,
Peter Corke
Abstract:
Fine-grained classification is a relatively new field that has concentrated on using information from a single image, while ignoring the enormous potential of using video data to improve classification. In this work we present the novel task of video-based fine-grained object classification, propose a corresponding new video dataset, and perform a systematic study of several recent deep convolutio…
▽ More
Fine-grained classification is a relatively new field that has concentrated on using information from a single image, while ignoring the enormous potential of using video data to improve classification. In this work we present the novel task of video-based fine-grained object classification, propose a corresponding new video dataset, and perform a systematic study of several recent deep convolutional neural network (DCNN) based approaches, which we specifically adapt to the task. We evaluate three-dimensional DCNNs, two-stream DCNNs, and bilinear DCNNs. Two forms of the two-stream approach are used, where spatial and temporal data from two independent DCNNs are fused either via early fusion (combination of the fully-connected layers) and late fusion (concatenation of the softmax outputs of the DCNNs). For bilinear DCNNs, information from the convolutional layers of the spatial and temporal DCNNs is combined via local co-occurrences. We then fuse the bilinear DCNN and early fusion of the two-stream approach to combine the spatial and temporal information at the local and global level (Spatio-Temporal Co-occurrence). Using the new and challenging video dataset of birds, classification performance is improved from 23.1% (using single images) to 41.1% when using the Spatio-Temporal Co-occurrence system. Incorporating automatically detected bounding box location further improves the classification accuracy to 53.6%.
△ Less
Submitted 24 October, 2016; v1 submitted 1 August, 2016;
originally announced August 2016.
-
Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors
Authors:
Johanna Carvajal,
Chris McCool,
Brian Lovell,
Conrad Sanderson
Abstract:
We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from eac…
▽ More
We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlapping temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from each window are represented as a Fisher vector, which captures first and second order statistics. Instead of directly classifying each Fisher vector, it is converted into a vector of class probabilities. The final classification decision for each frame is then obtained by integrating the class probabilities at the frame level, which exploits the overlapping of the temporal windows. Experiments were performed on two datasets: s-KTH (a stitched version of the KTH dataset to simulate multi-actions), and the challenging CMU-MMAC dataset. On s-KTH, the proposed approach achieves an accuracy of 85.0%, significantly outperforming two recent approaches based on GMMs and HMMs which obtained 78.3% and 71.2%, respectively. On CMU-MMAC, the proposed approach achieves an accuracy of 40.9%, outperforming the GMM and HMM approaches which obtained 33.7% and 38.4%, respectively. Furthermore, the proposed system is on average 40 times faster than the GMM based approach.
△ Less
Submitted 4 October, 2016; v1 submitted 4 February, 2016;
originally announced February 2016.
-
Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions
Authors:
Johanna Carvajal,
Arnold Wiliem,
Chris McCool,
Brian Lovell,
Conrad Sanderson
Abstract:
We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against tr…
▽ More
We present a comparative evaluation of various techniques for action recognition while keeping as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models and Fisher vectors (FVs). We evaluate these action recognition techniques under ideal conditions, as well as their sensitivity in more challenging conditions (variations in scale and translation). Despite recent advancements for handling manifolds, manifold based techniques obtain the lowest performance and their kernel representations are more unstable in the presence of challenging conditions. The FV approach obtains the highest accuracy under ideal conditions. Moreover, FV best deals with moderate scale and translation changes.
△ Less
Submitted 4 October, 2016; v1 submitted 4 February, 2016;
originally announced February 2016.
-
Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks
Authors:
ZongYuan Ge,
Alex Bewley,
Christopher McCool,
Ben Upcroft,
Peter Corke,
Conrad Sanderson
Abstract:
We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert D…
▽ More
We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert DCNN for each subset. The output from each of the K DCNNs is combined to form a single classification decision. In contrast to previous techniques, we provide a formulation to perform joint end-to-end training of the K DCNNs simultaneously. Extensive experiments, on three datasets using two network structures (AlexNet and GoogLeNet), show that the proposed MixDCNN system consistently outperforms other methods. It provides a relative improvement of 12.7% and achieves state-of-the-art results on two datasets.
△ Less
Submitted 30 November, 2015;
originally announced November 2015.
-
Subset Feature Learning for Fine-Grained Category Classification
Authors:
Zongyuan Ge,
Christopher Mccool,
Conrad Sanderson,
Peter Corke
Abstract:
Fine-grained categorisation has been a challenging problem due to small inter-class variation, large intra-class variation and low number of training images. We propose a learning system which first clusters visually similar classes and then learns deep convolutional neural network features specific to each subset. Experiments on the popular fine-grained Caltech-UCSD bird dataset show that the pro…
▽ More
Fine-grained categorisation has been a challenging problem due to small inter-class variation, large intra-class variation and low number of training images. We propose a learning system which first clusters visually similar classes and then learns deep convolutional neural network features specific to each subset. Experiments on the popular fine-grained Caltech-UCSD bird dataset show that the proposed method outperforms recent fine-grained categorisation methods under the most difficult setting: no bounding boxes are presented at test time. It achieves a mean accuracy of 77.5%, compared to the previous best performance of 73.2%. We also show that progressive transfer learning allows us to first learn domain-generic features (for bird classification) which can then be adapted to specific set of bird classes, yielding improvements in accuracy.
△ Less
Submitted 9 May, 2015;
originally announced May 2015.
-
Modelling Local Deep Convolutional Neural Network Features to Improve Fine-Grained Image Classification
Authors:
ZongYuan Ge,
Chris McCool,
Conrad Sanderson,
Peter Corke
Abstract:
We propose a local modelling approach using deep convolutional neural networks (CNNs) for fine-grained image classification. Recently, deep CNNs trained from large datasets have considerably improved the performance of object recognition. However, to date there has been limited work using these deep CNNs as local feature extractors. This partly stems from CNNs having internal representations which…
▽ More
We propose a local modelling approach using deep convolutional neural networks (CNNs) for fine-grained image classification. Recently, deep CNNs trained from large datasets have considerably improved the performance of object recognition. However, to date there has been limited work using these deep CNNs as local feature extractors. This partly stems from CNNs having internal representations which are high dimensional, thereby making such representations difficult to model using stochastic models. To overcome this issue, we propose to reduce the dimensionality of one of the internal fully connected layers, in conjunction with layer-restricted retraining to avoid retraining the entire network. The distribution of low-dimensional features obtained from the modified layer is then modelled using a Gaussian mixture model. Comparative experiments show that considerable performance improvements can be achieved on the challenging Fish and UEC FOOD-100 datasets.
△ Less
Submitted 26 February, 2015;
originally announced February 2015.
-
Multi-Action Recognition via Stochastic Modelling of Optical Flow and Gradients
Authors:
Johanna Carvajal,
Conrad Sanderson,
Chris McCool,
Brian C. Lovell
Abstract:
In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action using a Gaussian mixture using robust low-dimensional action features. Segmentation is achieved by performing classification on overlapping temporal windows, which are then merged to produce the final result. This approach is considerably less…
▽ More
In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action using a Gaussian mixture using robust low-dimensional action features. Segmentation is achieved by performing classification on overlapping temporal windows, which are then merged to produce the final result. This approach is considerably less complicated than previous methods which use dynamic programming or computationally expensive hidden Markov models (HMMs). Initial experiments on a stitched version of the KTH dataset show that the proposed approach achieves an accuracy of 78.3%, outperforming a recent HMM-based approach which obtained 71.2%.
△ Less
Submitted 5 February, 2015;
originally announced February 2015.
-
Bags of Affine Subspaces for Robust Object Tracking
Authors:
Sareh Shirazi,
Conrad Sanderson,
Chris McCool,
Mehrtash T. Harandi
Abstract:
We propose an adaptive tracking algorithm where the object is modelled as a continuously updated bag of affine subspaces, with each subspace constructed from the object's appearance over several consecutive frames. In contrast to linear subspaces, affine subspaces explicitly model the origin of subspaces. Furthermore, instead of using a brittle point-to-subspace distance during the search for the…
▽ More
We propose an adaptive tracking algorithm where the object is modelled as a continuously updated bag of affine subspaces, with each subspace constructed from the object's appearance over several consecutive frames. In contrast to linear subspaces, affine subspaces explicitly model the origin of subspaces. Furthermore, instead of using a brittle point-to-subspace distance during the search for the object in a new frame, we propose to use a subspace-to-subspace distance by representing candidate image areas also as affine subspaces. Distances between subspaces are then obtained by exploiting the non-Euclidean geometry of Grassmann manifolds. Experiments on challenging videos (containing object occlusions, deformations, as well as variations in pose and illumination) indicate that the proposed method achieves higher tracking accuracy than several recent discriminative trackers.
△ Less
Submitted 5 February, 2016; v1 submitted 11 August, 2014;
originally announced August 2014.
-
Summarisation of Short-Term and Long-Term Videos using Texture and Colour
Authors:
Johanna Carvajal,
Chris McCool,
Conrad Sanderson
Abstract:
We present a novel approach to video summarisation that makes use of a Bag-of-visual-Textures (BoT) approach. Two systems are proposed, one based solely on the BoT approach and another which exploits both colour information and BoT features. On 50 short-term videos from the Open Video Project we show that our BoT and fusion systems both achieve state-of-the-art performance, obtaining an average F-…
▽ More
We present a novel approach to video summarisation that makes use of a Bag-of-visual-Textures (BoT) approach. Two systems are proposed, one based solely on the BoT approach and another which exploits both colour information and BoT features. On 50 short-term videos from the Open Video Project we show that our BoT and fusion systems both achieve state-of-the-art performance, obtaining an average F-measure of 0.83 and 0.86 respectively, a relative improvement of 9% and 13% when compared to the previous state-of-the-art. When applied to a new underwater surveillance dataset containing 33 long-term videos, the proposed system reduces the amount of footage by a factor of 27, with only minor degradation in the information content. This order of magnitude reduction in video data represents significant savings in terms of time and potential labour cost when manually reviewing such footage.
△ Less
Submitted 3 March, 2014;
originally announced March 2014.