Auto machine learning tools to distinguish between two killer whale ecotypes

Marine Mammal Science (IF 1.9; Zone 3, Biology; JCR Q2, Marine & Freshwater Biology). Pub Date: 2024-08-29. DOI: 10.1111/mms.13175
Mohamed E. Ismail, Ivan D. Fedutin, Erich Hoyt, Tatiana V. Ivkovich, Olga A. Filatova
Marine Mammal Science, 41(1). https://onlinelibrary.wiley.com/doi/10.1111/mms.13175

Abstract

The killer whale, although considered a single species, comprises various ecotypes (genetically and ecologically distinct populations) that each focus on a specific type of prey (Ford et al., 1998, 2000; Pitman et al., 2011; Pitman & Ensor, 2003; Saulitis et al., 2000). In the northwestern Pacific, killer whales comprise two ecotypes: residents, or R-type (fish-eaters), and transients, also called Bigg's killer whales, or T-type (mammal-eaters) (Filatova et al., 2018, 2019; Ismail et al., 2023). These ecotypes are frequently found in the same areas, but they do not interact socially and are reproductively isolated (Filatova, Borisova, et al., 2015; Foote et al., 2011; Morin et al., 2010). This isolation is linked to significant differences in their morphology (Baird & Stacey, 1988; Kotik et al., 2023), ecology (Bigg, 1987), behavior (Morton, 1990), acoustic communication (Deecke et al., 2005; Filatova, Fedutin, et al., 2015; Foote & Nystuen, 2008), social structure (Baird & Dill, 1996), diet (Borisova et al., 2020; Filatova et al., 2023; Herman et al., 2005), and other aspects. The genetic distinction between the ecotypes has been described for both the eastern and western North Pacific (Filatova, Borisova, et al., 2015; Hoelzel et al., 2007; Morin et al., 2010; Parsons et al., 2013), but the morphological variation has been studied mostly in the eastern North Pacific (Baird & Stacey, 1988; Emmons et al., 2019; Kotik et al., 2023; Perrin et al., 2009). Based on these differences, a recent paper proposed recognizing the ecotypes as different species (Morin et al., 2024).

Despite these differences, Russian fisheries institutes have refused to recognize the existence of two separate ecotypes and the need to assess them separately. For example, Boltnev (2017) claimed that ecotypes are an artifact of research methods or even a figment of the imagination of the scientists who described this phenomenon. For this reason, VNIRO (Russian Federal Research Institute of Fisheries and Oceanography) still estimates the abundance of both killer whale ecotypes as a single population. This is partly because morphological differences between ecotypes are not immediately obvious to a non-specialist observing whales at sea. Unfortunately, to date, no automated technique can readily identify these two ecotypes in photographs without the time-consuming process of digitizing fin contours.

Machine learning (ML), a subfield of artificial intelligence, and convolutional neural networks (CNNs) in particular, are often the preferred choice for image processing applications. ML has proven successful in various tasks (Krizhevsky et al., 2017), such as image classification (He et al., 2016), image segmentation (Long et al., 2015), and object recognition (Redmon et al., 2016). Deep learning neural networks have been used for photo-identification of various species of marine mammals, including right whales (Eubalaena spp.; Bogucki et al., 2019), humpback whales (Megaptera novaeangliae; Cheeseman et al., 2023; Wang et al., 2020), common dolphins (Delphinus delphis; Bouma et al., 2018), blue whales (Balaenoptera musculus; Ramos-Arredondo et al., 2020), killer whales (Orcinus orca; Bergler et al., 2021), and common bottlenose dolphins (Tursiops truncatus; Thompson et al., 2019). However, the most common application has been individual recognition rather than ecotype identification.

The distinctive features of resident (R-type) and transient (T-type) killer whales are the shapes of the dorsal fin and saddle patch: transient killer whales have wider and more triangular dorsal fins and large closed saddle patches, while residents have more rounded fin tips and highly variable saddle patch shapes (Ford et al., 2000). Emmons et al. (2019) aimed to discriminate between eastern North Pacific ecotypes using elliptical Fourier analysis. However, this algorithm requires time-consuming image preprocessing, which makes the method impractical. Machine learning algorithms, by contrast, show considerable promise for identification purposes. Even so, creating an efficient machine learning model using traditional methods has proven to be a formidable challenge because of the complexity of the algorithms and of the architecture of deep learning CNNs (Rawat & Wang, 2017). The development of auto machine learning (AutoML) technology has lowered this obstacle: AutoML simplifies model creation and improves accuracy (Borkowski et al., 2019). Our study aims to use AutoML technology to differentiate between the western North Pacific killer whale ecotypes using raster images obtained through field surveys. This emphasizes the presence of two different ecotypes of killer whales in the western North Pacific Ocean. Using AutoML to detect these differences provides objective validation that the observed differences are real and not subjective.

Data were obtained in the northwestern Pacific Ocean from 2000 to 2022 using different types of surveys. These ranged from vessel-based cetacean surveys conducted along the coast of Eastern Kamchatka, the Kuril Islands, Sakhalin Island, the Commander Islands, Chukotka, and in the Okhotsk Sea, to camp-based observations with daily small-boat surveys performed during the summer months in the Avacha Gulf and off the Commander Islands and Chukotka (Figure 1).

The photographs were taken of known groups that were part of a long-term study by the Far East Russia Orca Project (FEROP), with the ecotype of these groups determined through morphology, behavior, and observations of hunting of specific prey, and confirmed by genetic analysis (Filatova, Borisova, et al., 2015). For more details on ecotype identification, see Filatova et al. (2019). The photographs were graded for the analysis according to their quality, evaluated through clarity, angle, and the whale's distance from the camera. Photographs had to be in focus and well lit without shadows, with the subject aligned with the camera's plane. Any photographs containing obstructions that could obscure the part of interest, such as reflections, distracting backgrounds, or water splashes, were excluded. Although all photographs contained both the dorsal fin and saddle patch, in many photographs only part of the saddle patch was visible. Photographs with more than one animal in the frame and photographs of the same animal taken less than one second apart were excluded.

After the selection process, photographs were cropped to center the subject using ACDSee Photo Studio Ultimate and edited in Photoshop CC to adjust brightness and contrast before being converted to grayscale to minimize color distractions. The analysis focused exclusively on photographs of adult females and “others,” explicitly excluding adult males, calves, and juveniles. The term “others” refers to individuals without the distinctive elongated dorsal fin of adult or subadult males. We did not analyze male photographs because the number of good-quality images of T-type males was too small for inclusion in the analysis. In total, 1,084 images (542 for each ecotype) were used for the analysis. The R-type photographs represented 250 individuals and the T-type photographs 197 individuals.
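
The exclusion rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' workflow: the `Photo` record, its field names, and the `select_photos` helper are hypothetical, quality grading is assumed to have been done manually beforehand, and the sketch assumes that the first frame of a burst is kept while later frames of the same animal taken less than one second after the last kept frame are dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    # Hypothetical record; field names are assumptions for illustration.
    photo_id: str
    animal_id: str      # individual shown in the frame
    timestamp: float    # capture time in seconds
    n_animals: int      # number of animals visible in the frame
    good_quality: bool  # manual grade: in focus, well lit, unobstructed


def select_photos(photos, min_gap_s=1.0):
    """Keep graded single-animal photos and drop near-duplicate frames."""
    last_kept = {}  # animal_id -> timestamp of the last photo kept
    kept = []
    for p in sorted(photos, key=lambda p: (p.animal_id, p.timestamp)):
        if not p.good_quality or p.n_animals != 1:
            continue  # fails the quality grade or shows several animals
        prev = last_kept.get(p.animal_id)
        if prev is not None and p.timestamp - prev < min_gap_s:
            continue  # same animal, less than min_gap_s after the last kept frame
        last_kept[p.animal_id] = p.timestamp
        kept.append(p)
    return kept
```

Whether both frames of a near-duplicate pair or only the later one were discarded is not stated in the text; the sketch takes the second reading.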

Almost the same procedures were implemented on the Google Cloud AutoML platform, with some variations unique to the platform. Google Cloud AutoML is offered through a service called Vertex AI, which allows users to train and deploy ML models. Vertex AI is a paid online service, but new users receive a US$300 credit to use free of charge during the first 3 months. The default data set split settings were accepted: 80% of the photographs from each ecotype were used for training, 10% for validation, and the remaining 10% for testing the model. Model optimization for higher accuracy was selected, along with the default training budget of 8 node hours.
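
For readers who want to reproduce an 80/10/10 split per ecotype outside the platform, a minimal sketch follows. The platform performs its own split internally, so this is not its implementation; the function name `stratified_split`, the rounding rule, and the fixed seed are assumptions made for illustration.

```python
import random


def stratified_split(items_by_class, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split each class separately into train/validation/test subsets."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    split = {"train": [], "validation": [], "test": []}
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = round(fractions[0] * len(shuffled))
        n_val = round(fractions[1] * len(shuffled))
        # Whatever remains after train and validation goes to the test set.
        split["train"] += [(x, label) for x in shuffled[:n_train]]
        split["validation"] += [(x, label) for x in shuffled[n_train:n_train + n_val]]
        split["test"] += [(x, label) for x in shuffled[n_train + n_val:]]
    return split
```

With 542 photographs per ecotype, this rounding gives 434 training, 54 validation, and 54 test photographs per class.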

Additionally, to ensure that the performance of our model was not an artifact and that its ability to differentiate between the dorsal fins of R-type and T-type whales was genuinely based on morphological differences between them, we trained another model using randomized groups. Photographs from both ecotypes were mixed and then randomly divided into two separate groups (group 1 and group 2), ensuring that each group contained an equal number of photographs from both ecotypes. These randomized groups were then uploaded to both platforms, and models were trained in the same way. The same photographs and the same numbers of photographs were used.
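
The randomized control can be sketched as follows. It is an illustrative reading of the description above, not the authors' script: the function name `randomized_groups` and the seed are hypothetical, and the sketch assumes that "an equal number of photographs from both ecotypes" means each group receives half of the R-type and half of the T-type photographs.

```python
import random


def randomized_groups(r_photos, t_photos, seed=0):
    """Build two groups whose labels carry no ecotype information."""
    rng = random.Random(seed)
    r, t = list(r_photos), list(t_photos)
    rng.shuffle(r)
    rng.shuffle(t)
    half_r, half_t = len(r) // 2, len(t) // 2
    # Each group gets half of each ecotype, so group membership is
    # unrelated to morphology by construction.
    group1 = r[:half_r] + t[:half_t]
    group2 = r[half_r:] + t[half_t:]
    rng.shuffle(group1)
    rng.shuffle(group2)
    return group1, group2
```

A classifier trained on such groups should perform near chance; if it did much better, that would suggest the original result depended on something other than ecotype.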

The model in Edge Impulse was trained using 80% (434) of the photographs for each ecotype, while the remaining 20% (108) were set aside for testing. After the training stage, the model achieved an accuracy of 90.8% (Table 1a, Figure 2a). Upon testing, it correctly identified 91.7% of the R-type photographs and 94.4% of the T-type photographs, for an overall test accuracy of 93.06%. The model was unable to identify 5.6% of the R-type photographs and 2.8% of the T-type photographs, which were classified as uncertain. This could be attributed to several factors, such as photographs containing features that are not distinctly characteristic of a single class but are shared between classes, or simply to the quality of the photographs (Table 1a, Figure 2b). With the randomized groups, the model had difficulty distinguishing between the two groups: it achieved an accuracy of 51.1% during the training stage and 7.48% during model testing (Table 1b, Figure 2c,d).
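
The reported test percentages are consistent with simple counts out of 108 test photographs per ecotype. The sketch below back-calculates such counts; they are an assumption inferred from the percentages in the text, not values taken from Table 1, and the function name `summarize` is hypothetical.

```python
def summarize(correct, uncertain, n_test):
    """Per-class and overall percentages from per-class counts."""
    per_class = {k: 100 * v / n_test for k, v in correct.items()}
    pct_uncertain = {k: 100 * v / n_test for k, v in uncertain.items()}
    # Photographs that were neither correct nor uncertain were misclassified.
    misclassified = {k: n_test - correct[k] - uncertain[k] for k in correct}
    overall = 100 * sum(correct.values()) / (n_test * len(correct))
    return per_class, pct_uncertain, misclassified, overall


# Assumed counts, chosen to reproduce the reported percentages.
per_class, pct_uncertain, misclassified, overall = summarize(
    correct={"R": 99, "T": 102},
    uncertain={"R": 6, "T": 3},
    n_test=108,
)
```

Under these assumed counts, 99/108 and 102/108 round to 91.7% and 94.4%, 201/216 rounds to 93.06%, and three photographs of each ecotype would have been assigned to the wrong class.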

On Google Cloud Platform, 10% of the photographs were used as a validation set to tune the model during training, ensuring its readiness for the testing step. Once training was complete, the model was evaluated on the test set (10%) to provide the final evaluation metrics. The model achieved an average precision of 98.17%, measured as the area under the precision-recall curve (AuPRC; Figure 3a). Precision is the proportion of photographs assigned to a class (R-type or T-type) that truly belong to that class. Recall is the proportion of photographs of a given class that the model correctly assigns to that class. The AuPRC indicates how well the model balances precision and recall across different confidence thresholds. A high AuPRC means that the model classifies killer whale photographs into R-type and T-type both completely and precisely (Figure 3a,b). The confusion matrix in Figure 4 shows where misclassifications occur and how frequently the model predicts the correct class. Moreover, the randomized groups on the Google Cloud AutoML platform showed that the model was unable to distinguish between the two groups (Figures 3c,d, 4b). These outcomes confirm the reliability of our model.
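
To make these metrics concrete, the following sketch computes precision, recall, and a step-wise average precision for one class from a list of true labels and confidence scores. It uses a common textbook definition of average precision (the mean of the precision values at the rank of each true positive); the platform's exact computation may differ, and the function names and toy data are hypothetical.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores, positive):
    """Mean of precision@k over the ranks k at which a positive occurs."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(t == positive for t in y_true)
    hits, total = 0, 0.0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == positive:
            hits += 1
            total += hits / k  # precision among the top-k ranked photos
    return total / n_pos if n_pos else 0.0


# Toy example: scores are the model's confidence that a photo is R-type.
y_true = ["R", "R", "T", "R", "T"]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
ap_r = average_precision(y_true, scores, positive="R")
```

In the toy example the three R-type photographs are ranked 1st, 2nd, and 4th, so the average precision is (1 + 1 + 0.75) / 3.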

The accuracy of the models on both platforms, Edge Impulse and Google Cloud AutoML Vertex AI, was high, indicating that they performed well. Despite the relatively small data set (1,084 photographs: 542 for each ecotype), which did not include adult males, calves, or juveniles, the trained models were able to learn the patterns and features that differentiate R-type photographs from T-type photographs, enabling us to classify them. This study builds on earlier findings (Wäldchen & Mäder, 2018; Weinstein, 2018) that machine learning provides a powerful tool for image classification to differentiate ecotypes, species, and even subspecies. In contrast, a study that employed elliptical Fourier analysis to differentiate between the killer whale ecotypes achieved only 70% accuracy for dorsal fin contours and 58% accuracy for saddle patch contours (Emmons et al., 2019). This illustrates the challenges that non-machine-learning approaches face in differentiating morphological features with high accuracy. Various machine learning models were able to classify species of vector mosquitoes, even though there was significant interspecies similarity and intraspecies variation (Park et al., 2020). Notably, deep learning models achieved high classification accuracy by using morphological characteristics similar to those used by human experts in the classification process. Various studies have employed the same approach to distinguish between different types of organisms, including birds, insects, fish, plants, and other invertebrates (see Table 2). Using artificial intelligence (AI) to detect these differences serves as an impartial validation that the differences observed by researchers are not just assumptions or subjective interpretations, and indicates that AI has the potential to achieve similar or higher accuracy.

Both Edge Impulse and Google Cloud AutoML effectively supported image classification in this study and presented the results adequately, with minor differences that did not affect the overall outcome. However, Google Cloud AutoML and similar platforms, such as Amazon Web Services (AWS) and Microsoft Azure, are free only within certain limits, and exceeding these limits incurs charges, whereas the Edge Impulse platform is completely free, without any limits.

Many authors have suggested that R-type and T-type killer whales should be considered different species or subspecies (Baird & Stacey, 1988; Morin et al., 2010, 2024; Reeves et al., 2004). The two ecotypes are socially and genetically isolated (Filatova, Borisova, et al., 2015; Hoelzel et al., 2002, 2007; Miller et al., 2010; Morin et al., 2024; Riesch et al., 2012). The present study supports the differentiation of the ecotypes by confirming stable morphological differences between them.

Our results have high practical value for killer whale management in the western North Pacific because they emphasize the existence of two separate ecotypes there, which has been denied by Russian fisheries institutes. One of the arguments for this position has been the lack of reliable morphological differences that would allow the ecotypes to be distinguished visually without observing their behavior or performing genetic analysis of biopsy samples. Our work clearly demonstrates that these differences exist and can be used by machine learning tools, meaning that they are real, objective, and reliable. The current study used only high-quality photographs because machine learning models are limited in their ability to handle images of varying quality, which can affect the accuracy of classification and pattern recognition. Future studies will aim to include a wider range of photo qualities and photographs of adult males to develop a more robust and generalizable model. This will enable inexperienced observers, such as fisheries inspectors or coastguard officers, to use pretrained neural networks to identify the ecotype of killer whales.

To conclude, this study introduces novel machine learning techniques that are quick and affordable for classifying killer whale ecotypes based on morphometric variables. It also demonstrates that artificial intelligence can easily distinguish between the Russian R-type and T-type killer whales, despite previous claims that they are indistinguishable (Boltnev, 2017). By presenting these findings, we aim to narrow the gap between conservation science and machine learning and to encourage the adoption of similar approaches in the study of other species.

Mohamed Elsayed Ismail: Formal analysis; investigation; resources; visualization; writing – original draft; writing – review and editing. Ivan D. Fedutin: Resources. Erich Hoyt: Funding acquisition; writing – original draft; writing – review and editing. Tatiana V. Ivkovich: Resources. Olga A. Filatova: Conceptualization; funding acquisition; supervision; writing – original draft; writing – review and editing.
