Preprocessing CSV Data before Applying a Machine Learning Model

The raw CSV data we receive may contain many independent variables, which can cause a model to overfit the training data; attribute subset selection can be applied as a data reduction technique to address this. Often the raw data also has features that aren't scaled, which prevents the model from achieving optimal accuracy. There are many reasons to understand data preprocessing, so let's begin…

Below is one of the functions my team and I developed to help reduce your effort.

2. Feature Scaling: One of the most important data preprocessing steps is feature scaling. It can be done in various ways, such as MinMaxScaler, MaxAbsScaler, RobustScaler, QuantileTransformer, or PowerTransformer, depending on the model and the data.

We created a function that can perform all of these feature scaling and normalization techniques; a sketch of such a helper follows the summary below.

StandardScaler standardizes features to zero mean and unit variance. MinMaxScaler scales the data to a range you can specify, by default between 0 and 1. MaxAbsScaler scales the data between -1 and 1 by dividing by the maximum absolute value, and RobustScaler, which centers on the median and scales by the interquartile range, suits data with many outliers. There are other scaling methods like PowerTransformer and QuantileTransformer.
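The original helper isn't reproduced here, so what follows is a minimal sketch of such a dispatcher, assuming scikit-learn; the function name `scale_features` and its `method` parameter are illustrative, not the authors' original code.

```python
import pandas as pd
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, MaxAbsScaler,
    RobustScaler, QuantileTransformer, PowerTransformer,
)

def scale_features(df, method="standard", feature_range=(0, 1)):
    """Scale all numeric columns of df with the chosen method.

    Hypothetical helper for illustration only.
    """
    scalers = {
        "standard": StandardScaler(),                         # zero mean, unit variance
        "minmax": MinMaxScaler(feature_range=feature_range),  # defaults to [0, 1]
        "maxabs": MaxAbsScaler(),                             # [-1, 1] via max absolute value
        "robust": RobustScaler(),                             # median/IQR, outlier-resistant
        "quantile": QuantileTransformer(output_distribution="normal"),
        "power": PowerTransformer(),                          # Yeo-Johnson by default
    }
    scaler = scalers[method]
    scaled = scaler.fit_transform(df)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)
```

A call like `scale_features(df, method="robust")` would then swap in RobustScaler without changing any surrounding code.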

3. Data Encoding: Categorical data cannot be read directly by a machine learning model and has to be encoded via a technique such as ordinal encoding or one-hot encoding.
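As a quick illustration of the two encoding routes, here is a sketch using scikit-learn; the toy columns `size` and `color` are made up for demonstration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({"size": ["small", "large", "medium"],
                   "color": ["red", "blue", "red"]})

# Ordinal encoding: each category maps to an integer (implies an order).
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["size_encoded"] = ordinal.fit_transform(df[["size"]]).ravel()

# One-hot encoding: one binary column per category (no implied order).
# Note: the parameter was named `sparse` before scikit-learn 1.2.
onehot = OneHotEncoder(sparse_output=False)
encoded = onehot.fit_transform(df[["color"]])
df[onehot.get_feature_names_out(["color"])] = encoded
```

Ordinal encoding suits ordered categories like `size`; one-hot encoding avoids imposing a false order on nominal categories like `color`.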

4. Discretization: Sometimes we need to convert continuous data to categorical data, for which we use discretization. Personally, I haven't often encountered the need for this technique, but we developed a function in case readers find it useful.
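A minimal sketch of discretization with scikit-learn's KBinsDiscretizer; the sample values, bin count, and strategy are arbitrary choices for illustration, not the authors' settings.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

ages = np.array([[6], [12], [20], [35], [48], [67]])

# Bin continuous values into 3 ordinal categories of equal width.
discretizer = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
age_bins = discretizer.fit_transform(ages)
print(age_bins.ravel())  # [0. 0. 0. 1. 2. 2.]
```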

5. Feature Selection: Data with many features can cause overfitting. Feature selection addresses this by reducing the feature set to only those features most relevant to the output variable.

This function supports several feature selection methods.

VarianceThreshold removes features whose variance falls below a threshold, independent of the output variable, whereas univariate methods such as SelectKBest, SelectPercentile, SelectFpr, SelectFdr, and SelectFwe score each feature against the output variable; GenericUnivariateSelect wraps them behind a single generalized interface.
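Since the authors' function isn't shown, here is a sketch of what such a selector dispatcher could look like with scikit-learn; `select_features` and its keyword arguments are illustrative assumptions.

```python
from sklearn.feature_selection import (
    VarianceThreshold, SelectKBest, SelectPercentile,
    SelectFpr, SelectFdr, SelectFwe, GenericUnivariateSelect, f_classif,
)

def select_features(X, y=None, method="kbest", **kwargs):
    """Dispatch to a scikit-learn selector. Hypothetical helper."""
    selectors = {
        "variance": VarianceThreshold(kwargs.get("threshold", 0.0)),
        "kbest": SelectKBest(f_classif, k=kwargs.get("k", 10)),
        "percentile": SelectPercentile(f_classif, percentile=kwargs.get("percentile", 10)),
        "fpr": SelectFpr(f_classif, alpha=kwargs.get("alpha", 0.05)),
        "fdr": SelectFdr(f_classif, alpha=kwargs.get("alpha", 0.05)),
        "fwe": SelectFwe(f_classif, alpha=kwargs.get("alpha", 0.05)),
        "generic": GenericUnivariateSelect(f_classif,
                                           mode=kwargs.get("mode", "k_best"),
                                           param=kwargs.get("param", 10)),
    }
    selector = selectors[method]
    # VarianceThreshold is unsupervised; the univariate methods need y.
    if method == "variance":
        return selector.fit_transform(X)
    return selector.fit_transform(X, y)
```

For classification targets `f_classif` is a reasonable default score function; regression problems would swap in `f_regression` or another scorer.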
