Machine learning analyses thousands of plant specimens, leaves nothing out

The value of machine learning has been demonstrated by scientists from the University of NSW and the Botanic Gardens of Sydney who trained and applied an artificial intelligence (AI) algorithm to measure the leaves of around 3,800 plant specimens in a matter of minutes.

The same task would take a human researcher several years to complete.

“Measuring leaves is not a complex task, but it is labour intensive, and traditional methods become impractical at scale. Machine learning approaches could dramatically accelerate the process of extracting and compiling these data,” states a paper, published in the American Journal of Botany.

The researchers focused on the relationship between leaf size and climate, using machine learning technology to analyse digital plant specimens.

“Within each genus, leaf size was positively associated with temperature and rainfall, consistent with previous observations. However, within species, the associations between leaf size and environmental variables were weaker,” the paper finds.

The researchers trained their machine learning model using high-resolution images of Ficus and Syzygium plant specimens. These were drawn from a collection of more than one million digital images created by the Botanic Gardens of Sydney, which scanned plant specimens at the National Herbarium of New South Wales.

Preserved specimen of Syzygium floribundum from the National Herbarium of New South Wales / from the Atlas of Living Australia

Associate Professor Will Cornwell, a researcher at the School of BEES and a member of UNSW Data Science Hub, says the Royal Botanic Gardens “embarked on this really gigantic effort to scan every image in their collection”.

“They got this gigantic machine from the Netherlands. And they ran every single herbarium sheet from the last 250 years through this scanning machine […] every plant group from Australia is in there.”

Together with Dr Jason Bragg at the Botanic Gardens of Sydney and UNSW researcher Brendan Wilde, Cornwell created an algorithm that could be automated to detect and measure the size of leaves. 

The researchers started by training their machine learning model with specimens of Syzygium, a genus generally known as lillipillies, brush cherries or satinash, and Ficus, a genus of about 850 species of woody trees, shrubs and vines.

These species were selected partly because of their simple leaf structures, which the researchers thought would be easier for the machine to learn.

First, the researchers trained the machine learning model using a relatively small set (35) of Syzygium images.

To identify a leaf in each image, the machine learning model, a convolutional neural network (a type of model widely used in computer vision), is trained on human-labelled examples. In this case, Wilde drew outlines around leaves in many digital images and fed those into the system.

The model then “learns what is a leaf, and then it’s able to find leaves very reliably on new images,” Cornwell says.

To undertake its measurements, the machine learning algorithm essentially draws an outline around each leaf, counting the pixels within the outline.

Once the model had been trained, the team applied it to measuring leaf area, length and width in digital images from 1,227 Syzygium and 2,595 Ficus specimens.
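As a rough illustration of the pixel-counting idea described above, the sketch below turns a leaf outline, represented as a grid of 0s and 1s, into an area, a length and a width. It is not the researchers' code: the function name `measure_leaf`, the toy mask and the `pixels_per_cm` scan resolution are all assumptions made for the example, and length and width are taken simply from the bounding box of the outline.

```python
# Hypothetical sketch only: the mask, the function name and the scan
# resolution are illustrative assumptions, not details from the study.

def measure_leaf(mask, pixels_per_cm):
    """Return (area_cm2, length_cm, width_cm) for one leaf mask.

    mask is a list of rows; each value is 1 if the pixel lies inside the
    leaf outline and 0 otherwise.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return 0.0, 0.0, 0.0  # no leaf pixels found

    # Area: count the pixels inside the outline, then convert to cm^2.
    pixel_count = sum(sum(row) for row in mask)
    area_cm2 = pixel_count / pixels_per_cm ** 2

    # Length and width: sides of the bounding box around the leaf pixels.
    # This assumes the leaf is roughly aligned with the image axes.
    height_px = max(rows) - min(rows) + 1
    width_px = max(cols) - min(cols) + 1
    length_cm = max(height_px, width_px) / pixels_per_cm
    width_cm = min(height_px, width_px) / pixels_per_cm
    return area_cm2, length_cm, width_cm


# A tiny made-up "leaf" of 16 pixels, scanned at 2 pixels per centimetre.
toy_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
area, length, width = measure_leaf(toy_mask, pixels_per_cm=2)
print(area, length, width)
```

In practice the mask would come from the trained model's output for each detected leaf, and the conversion factor from the known resolution of the herbarium scans.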

To check for accuracy, 50 images from each of the sets were validated manually by Wilde.
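The article does not say how the manual check was scored. One simple way to compare hand measurements with model output, shown here purely as an assumed example, is the mean absolute percentage error across the validated images. The function name and the numbers below are invented for illustration.

```python
# Hypothetical sketch: the metric and the sample values are assumptions,
# not the validation procedure or data used in the study.

def mean_abs_pct_error(manual, predicted):
    """Average of |predicted - manual| / manual, expressed as a percentage."""
    if len(manual) != len(predicted) or not manual:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - m) / m for m, p in zip(manual, predicted))
    return 100 * total / len(manual)


# Made-up leaf areas (cm^2) for three images: hand-measured vs model output.
manual_cm2 = [10.0, 20.0, 40.0]
model_cm2 = [11.0, 19.0, 40.0]
error_pct = mean_abs_pct_error(manual_cm2, model_cm2)
print(f"mean absolute error: {error_pct:.1f}%")
```

A low error on a random sample of images gives some confidence that the remaining, unchecked measurements are of similar quality.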

Cornwell says that even though machine learning made measuring thousands of leaves much quicker, the accuracy and usefulness of the model’s output rely heavily on human work, from the people who collected, labelled and scanned the samples to Wilde’s careful work in training and verifying the machine learning model.

“That human process, the humans using machine learning [are] really, really important,” Cornwell says.

Now that the researchers have developed a leaf size model, they are hoping to work with more complicated leaf shapes and analyse more specimens in the National Herbarium of New South Wales.

They are also interested to see whether the model can work using less controlled images, like photos of plants taken by citizen scientists on various models of mobile phone and in non-standardised ways.
