Agriculture and Deep Learning: Improving Soil and Crop Yields


Introduction

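For many Indians, agriculture is not just a job; it is a way of life. It is how they make a living, and it contributes substantially to India's economy. Soil type, which is determined by the relative proportions of clay, sand, and silt particles, is important for selecting suitable crops and for identifying weed growth. This article explores the potential of deep learning in agriculture and explains why soil type and weed detection matter for India.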

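Deep learning is an emerging technology that is useful across many fields. It has been widely applied to smart agriculture at various scales, including field monitoring, field operations, robotics, prediction of soil, water, and climate conditions, and landscape-level monitoring of land and crop types. We can feed soil photos into a deep learning architecture, guide it to learn to detect features, and then use it to classify the soil.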

In this blog, we will discuss the importance of soil in agriculture and classify soil types using machine learning and deep learning models.

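Learning Objectives

You will understand the importance of soil in agriculture.
You will learn how machine learning algorithms classify soil types.
You will implement a deep learning model to classify soil types in an agricultural setting.
You will explore the concept of multi-stacking ensemble learning to improve the accuracy of our predictions.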

This article was published as a part of the Data Science Blogathon.

Role of Soil in Agriculture

Soil is the foundation of agriculture. It is a system formed from organic matter derived from plants and animals, together with minerals, gases, liquids, and other substances.

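India's economy depends heavily on agriculture. Soil is important for crops, and because it is fertile, it also supports the growth of weeds.

Moisture and temperature are physical variables that affect the formation of pores and particles in the soil, which in turn affects root growth, water infiltration, and the speed of plant emergence.

Soil consists mainly of sand and clay particles. Among the soil particles commonly found at the survey site, clay is abundant. Clay particles near the surface are associated with a rich supply of nutrients. Peat and loam are almost absent. Clay-type soils have spaces between their particles in which water is retained.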

Dataset

Kaggle link

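Feature extraction is one of the main steps in building a good deep learning model. It is important to identify the features that a machine learning algorithm is likely to need. We will use the mahotas library to extract Haralick features, which capture the spatial and texture information of an image.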
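To make the idea of a texture feature more concrete, here is a minimal NumPy sketch that builds a gray-level co-occurrence matrix (GLCM) for horizontally adjacent pixels and derives a contrast value from it. Haralick features are statistics of this kind of matrix. This is a simplified illustration and not how mahotas computes its features; the function names `glcm_horizontal` and `glcm_contrast` and the tiny 4-level images are assumptions made for the example.

```python
import numpy as np

def glcm_horizontal(img, levels):
    """Count how often gray level i sits immediately left of gray level j."""
    glcm = np.zeros((levels, levels), dtype=np.float64)
    left = img[:, :-1].ravel()
    right = img[:, 1:].ravel()
    np.add.at(glcm, (left, right), 1)   # accumulate pair counts
    return glcm / glcm.sum()            # normalize to joint probabilities

def glcm_contrast(p):
    """Contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))

# Two toy images with gray levels 0..3 (made up for illustration)
flat = np.array([[1, 1, 1, 1],
                 [1, 1, 1, 1]])
stripes = np.array([[0, 3, 0, 3],
                    [0, 3, 0, 3]])

contrast_flat = glcm_contrast(glcm_horizontal(flat, levels=4))
contrast_stripes = glcm_contrast(glcm_horizontal(stripes, levels=4))
print("Contrast (flat):", contrast_flat)
print("Contrast (stripes):", contrast_stripes)
```

A uniform patch gives zero contrast, while a patch with sharp alternating stripes gives a high value, which is the kind of difference a classifier can use to separate smooth soil from coarse soil.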

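We will use the skimage library to convert each image to grayscale and extract Histogram of Oriented Gradients (HOG) features, which are useful for object detection. Finally, we concatenate the feature values into a single array and use them with the machine learning and deep learning algorithms.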

import mahotas as mh
from skimage import color, feature, io
import numpy as np

# Function to extract features from an image
def extract_features(image_path):
    img = io.imread(image_path)
    gray_img = color.rgb2gray(img)  # Converting image to grayscale
    
    # Converting the grayscale image to integer type
    gray_img_int = (gray_img * 255).astype(np.uint8)
    
    # Extracting Haralick features using mahotas
    haralick_features = mh.features.haralick(gray_img_int).mean(axis=0)
    
    # Extracting Histogram of Gradients (HOG) features
    hog_features, _ = feature.hog(gray_img, visualize=True)
    
    # Printing the first few elements of each feature array
    print("Haralick Features:", haralick_features[:5])
    print("HOG Features:", hog_features[:5])
    
    # Concatenating the features into a single array
    all_features = np.concatenate((haralick_features, hog_features))
    
    return all_features

image_path = '/kaggle/input/soil-classification-dataset/Soil-Dataset/Yellow Soil/20.jpg'
features = extract_features(image_path)
print("Extracted Features:", features)
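To build some intuition for the HOG features extracted above, the following sketch computes a magnitude-weighted histogram of gradient orientations for a single image cell using only NumPy. It leaves out the block normalization that a full HOG implementation performs, so it is an illustration of the core idea rather than a replacement for `feature.hog`. The function name `orientation_histogram`, the 9 bins, and the 8x8 cell are choices made for this example.

```python
import numpy as np

def orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees),
    where each pixel votes with its gradient magnitude."""
    gy, gx = np.gradient(cell.astype(np.float64))  # row and column gradients
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0),
                           weights=magnitude)
    return hist

# An 8x8 cell whose brightness increases from left to right,
# so every gradient points along the horizontal axis
cell = np.tile(np.arange(8, dtype=np.float64), (8, 1))
hist = orientation_histogram(cell)
print("Orientation histogram:", hist)
```

All of the gradient energy falls into the first bin here because the brightness changes in only one direction. Real soil images spread their energy across several bins, and that distribution is what the HOG descriptor records.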

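Machine Learning Algorithms in Soil Classification

Now, let's build a machine learning model using the soil images obtained from Kaggle.

First, we import all the libraries and define a function named extract_features to extract features from an image. The images are then loaded and processed, which includes conversion to grayscale, and the features are extracted. After extracting features for every image, the labels are encoded using LabelEncoder.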

import os
import numpy as np
import mahotas as mh
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
from skimage import color, feature, io

# Function to extract features from an image
def extract_features(image_path):
    img = io.imread(image_path)
    gray_img = color.rgb2gray(img)  # Converting image to grayscale
    gray_img_int = (gray_img * 255).astype(np.uint8)
    haralick_features = mh.features.haralick(gray_img_int).mean(axis=0)
    hog_features, _ = feature.hog(gray_img, visualize=True)
    hog_features_flat = hog_features.flatten()  # Flattening the HOG features
    # Keeping only as many HOG values as there are Haralick features, so that
    # every image yields a feature vector of the same fixed length
    hog_features_flat = hog_features_flat[:haralick_features.shape[0]]
    return np.concatenate((haralick_features, hog_features_flat))

data_dir = "/kaggle/input/soil-classification-dataset/Soil-Dataset"

image_paths = []
labels = []

class_indices = {'Black Soil': 0, 'Cinder Soil': 1, 'Laterite Soil': 2, 
'Peat Soil': 3, 'Yellow Soil': 4}

for soil_class, class_index in class_indices.items():
    class_dir = os.path.join(data_dir, soil_class)
    class_images = [os.path.join(class_dir, image) for image in os.listdir(class_dir)]
    image_paths.extend(class_images)
    labels.extend([class_index] * len(class_images))

# Extracting features from images
X = [extract_features(image_path) for image_path in image_paths]

# Encoding labels
le = LabelEncoder()
y = le.fit_transform(labels)

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initializing and training a Random Forest Classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

# Making predictions
y_pred_rf = rf_classifier.predict(X_test)

# Evaluating the Random Forest model
accuracy_rf = accuracy_score(y_test, y_pred_rf)
report_rf = classification_report(y_test, y_pred_rf)

print("Random Forest Classifier:")
print("Accuracy:", accuracy_rf)
print("Classification Report:\n", report_rf)
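Overall accuracy can hide the fact that one soil class is recognized much worse than the others, which is why the classification report above is worth reading alongside it. The sketch below shows, with NumPy only, how a confusion matrix yields both the overall accuracy and the per-class recall. The labels are hypothetical and are not results from the soil dataset; `confusion_matrix_counts` is a helper name introduced for this example.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical labels for 3 classes, for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix_counts(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()                # correct / total
recall_per_class = np.diag(cm) / cm.sum(axis=1)   # correct / true count per class

print("Confusion matrix:\n", cm)
print("Accuracy:", accuracy)
print("Recall per class:", recall_per_class)
```

In this toy case the overall accuracy looks reasonable, yet only half of the class 0 samples are recovered, which is the kind of detail the per-class rows of a classification report expose.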

Deep Neural Networks

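A deep neural network works on the basis of computational units called neurons, and its capacity depends on how many of them it has. Each neuron accepts inputs and produces an output. Deep networks are used to improve accuracy and make better predictions, whereas classical machine learning algorithms rely on interpreting the data and making decisions based on it.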
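As a minimal illustration of a neuron accepting inputs and producing an output, the sketch below computes one dense layer with two neurons in NumPy: a weighted sum of the inputs plus a bias, followed by a ReLU activation. The weights, bias, and input values are arbitrary numbers chosen for the example, and `dense_relu` is a hypothetical helper name.

```python
import numpy as np

def dense_relu(x, weights, bias):
    """One dense layer: weighted sum of inputs plus bias, then ReLU."""
    return np.maximum(0.0, x @ weights + bias)

x = np.array([1.0, 2.0, 3.0])          # three input values
weights = np.array([[0.5, -1.0],
                    [0.25, 0.0],
                    [-0.5, 1.0]])       # 3 inputs feeding 2 neurons
bias = np.array([0.0, -1.0])

output = dense_relu(x, weights, bias)
print("Layer output:", output)
```

The first neuron's weighted sum is negative, so ReLU clips it to zero, while the second neuron passes its positive value through. Stacking many such layers is what makes a network deep.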


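Now, let's build a model defined with Keras's Sequential API. The model has a Conv2D convolutional layer, a MaxPooling2D layer, a Flatten layer, and Dense layers.

Finally, we compile the model with the Adam optimizer and categorical cross-entropy loss.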
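To show what the categorical cross-entropy loss measures, here is a small NumPy sketch of a softmax output followed by the loss for a one-hot target. It is a simplified stand-in for what Keras computes internally, not its actual implementation; the logits, the clipping constant `eps`, and the helper names are assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_crossentropy(y_true_one_hot, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.mean(np.sum(y_true_one_hot * np.log(y_prob), axis=-1)))

# Made-up scores for one image over 5 soil classes
logits = np.array([[2.0, 0.0, 0.0, 0.0, 0.0]])
probs = softmax(logits)

target_right = np.array([[1, 0, 0, 0, 0]])   # true class is the favoured one
target_wrong = np.array([[0, 0, 0, 1, 0]])   # true class is a different one

loss_right = categorical_crossentropy(target_right, probs)
loss_wrong = categorical_crossentropy(target_wrong, probs)
print("Loss when the favoured class is correct:", loss_right)
print("Loss when the favoured class is wrong:", loss_wrong)
```

The loss is small when the network puts most of its probability on the true class and grows when it favours a wrong one, which is the signal the optimizer uses to adjust the weights.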

import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

data_dir = "/kaggle/input/soil-classification-dataset/Soil-Dataset"

# Setting up data generators
batch_size = 32
image_size = (224, 224)

# Using image_dataset_from_directory to load and preprocess the images
train_dataset = image_dataset_from_directory(
    data_dir,
    labels='inferred',
    label_mode='categorical',
    validation_split=0.2,
    subset='training',
    seed=42,
    image_size=image_size,
    batch_size=batch_size,
)

validation_dataset = image_dataset_from_directory(
    data_dir,
    labels='inferred',
    label_mode='categorical',
    validation_split=0.2,
    subset='validation',
    seed=42,
    image_size=image_size,
    batch_size=batch_size,
)

# Displaying the class indices
print("Class indices:", train_dataset.class_names)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(len(train_dataset.class_names), activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Training the model
epochs = 10
history = model.fit(train_dataset, epochs=epochs, validation_data=validation_dataset)
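It can help to check how large the model above actually is. The short calculation below follows the layer shapes by hand, assuming the Keras defaults of unpadded ("valid") convolution with stride 1 and a 2x2 pooling window with stride 2, and assuming 5 soil classes as in the dataset folders. The helper `conv2d_output` is introduced only for this example.

```python
def conv2d_output(size, kernel, stride=1):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

height = width = 224
channels_in, filters, kernel = 3, 32, 3

conv_size = conv2d_output(height, kernel)                        # 222
conv_params = kernel * kernel * channels_in * filters + filters  # weights + biases
pooled_size = conv_size // 2                                     # 111 after 2x2 pooling
flat_units = pooled_size * pooled_size * filters                 # length after Flatten
dense1_params = flat_units * 64 + 64
dense2_params = 64 * 5 + 5                                       # 5 soil classes
total_params = conv_params + dense1_params + dense2_params

print("Conv output:", (conv_size, conv_size, filters))
print("Flattened units:", flat_units)
print("Total parameters:", total_params)
```

Almost all of the roughly 25 million parameters sit in the first Dense layer, because Flatten is applied after only one pooling step. Adding more convolution and pooling stages before Flatten is one way to shrink the model, though whether that helps on this dataset would need to be tested.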

import numpy as np
from tensorflow.keras.preprocessing import image

# Function to load and preprocess an image for prediction
def load_and_preprocess_image(img_path):
    img = image.load_img(img_path, target_size=image_size)
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    # No division by 255 here: the model was trained on raw 0-255 pixel values
    return img_array

image_path = '/kaggle/input/soil-classification-dataset/Soil-Dataset/Peat Soil/13.jpg'
new_image = load_and_preprocess_image(image_path)

# Making predictions
predictions = model.predict(new_image)
predicted_class = np.argmax(predictions[0])

# Getting the class label based on the class indices
class_labels = {0: 'Black Soil', 1: 'Cinder Soil', 2: 'Laterite Soil',
 3: 'Peat Soil', 4: 'Yellow Soil'}
predicted_label = class_labels[predicted_class]

# Displaying the prediction
print("Predicted Class:", predicted_class)
print("Predicted Label:", predicted_label)

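In the original run, the predicted class was 0, that is, Black Soil. Note that the example image is taken from the Peat Soil folder, so this prediction does not match the folder label and should be checked against the true label rather than read as a correct classification.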
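The hard-coded class_labels dictionary relies on the classes being indexed in alphanumeric order of the subfolder names, which is how image_dataset_from_directory orders inferred labels. A sketch of deriving the mapping instead of typing it by hand is shown below, together with the model's confidence for the chosen class. The probability vector is hypothetical and not a real model output, and `decode_prediction` is a helper name introduced for this example.

```python
import numpy as np

# Subfolder names of the dataset, sorted to mimic the inferred label order
class_names = sorted(['Yellow Soil', 'Peat Soil', 'Black Soil',
                      'Laterite Soil', 'Cinder Soil'])

def decode_prediction(probabilities, class_names):
    """Return (index, label, confidence) for the most likely class."""
    index = int(np.argmax(probabilities))
    return index, class_names[index], float(probabilities[index])

# Hypothetical softmax output for a single image (not a real model result)
probabilities = np.array([0.05, 0.10, 0.05, 0.70, 0.10])
index, label, confidence = decode_prediction(probabilities, class_names)

print("Predicted Class:", index)
print("Predicted Label:", label)
print("Confidence:", confidence)
```

In the training script, train_dataset.class_names already holds this ordered list, so it can be passed in directly instead of rebuilding it.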

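Proposed Multi-Stacking Ensemble Learning Model Architecture

The StackingClassifier is initialized with base_classifiers and a LogisticRegression meta-classifier as the final estimator. The final estimator combines the outputs of the base classifiers to make the final prediction. After training and prediction, we compute the accuracy.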

base_classifiers = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
    ('svm', SVC(kernel='rbf', C=1.0, probability=True)),
    ('nb', GaussianNB())
]

# Initializing the stacking classifier with a logistic regression meta-classifier
stacking_classifier = StackingClassifier(estimators=base_classifiers, 
final_estimator=LogisticRegression())

# Training the stacking classifier
stacking_classifier.fit(X_train, y_train)

# Making predictions with Stacking Classifier
y_pred_stacking = stacking_classifier.predict(X_test)

# Evaluating the Stacking Classifier model
accuracy_stacking = accuracy_score(y_test, y_pred_stacking)
report_stacking = classification_report(y_test, y_pred_stacking)

print("\nStacking Classifier:")
print("Accuracy:", accuracy_stacking)
print("Classification Report:\n", report_stacking)
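Conceptually, combining the outputs of the base classifiers means that their predictions become the input features of the final estimator. The NumPy sketch below shows that step in isolation: class probabilities from two base models are placed side by side to form the meta-features. The probabilities are made-up numbers, and the sketch leaves out the cross-validated fitting that a full stacking implementation uses to produce them. For comparison it also computes a plain average of the probabilities (soft voting), which is a simpler technique than stacking and is not what the code above does.

```python
import numpy as np

# Hypothetical class probabilities from two base models
# for 3 samples and 3 classes (made-up numbers)
proba_model_a = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.6, 0.3],
                          [0.3, 0.3, 0.4]])
proba_model_b = np.array([[0.6, 0.3, 0.1],
                          [0.2, 0.2, 0.6],
                          [0.1, 0.2, 0.7]])

# Meta-features: one row per sample, base-model outputs side by side.
# A final estimator such as logistic regression would be trained on these.
meta_features = np.hstack([proba_model_a, proba_model_b])

# Simple baseline for comparison: average the probabilities (soft voting)
soft_vote = np.argmax((proba_model_a + proba_model_b) / 2, axis=1)

print("Meta-feature matrix shape:", meta_features.shape)
print("Soft-voting predictions:", soft_vote)
```

The difference is that soft voting weights every base model equally, while a trained final estimator can learn which base model to trust for which class.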

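Conclusion

Soil is an important factor in growing a good crop, and it is important to know which soil type a particular crop needs. Classifying soil types therefore matters. Because classifying soil by hand is time-consuming, deep learning models make the task much easier. Many machine learning and deep learning models can be applied to this problem. Choosing the best one depends on the quality and quantity of the data in the dataset and on the problem at hand. Another way to choose is to evaluate each algorithm, for example by measuring accuracy, that is, how often it classifies the soil correctly. Finally, we implemented a multi-stacking ensemble model that combines several models with the aim of building a better one.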