R Split Continuous Variable Into Bins
R has various data types, and the vector is the most common one. Merging and splitting are common operations in any programming language, and today we will see how to split vectors and data frames into groups in R.
split in R
The split() function in R divides a vector or data frame into the groups defined by a factor and returns a list whose elements hold the values of each group. The unsplit() function does the reverse: it reassembles such a list into the original vector or data frame.
Syntax
split(x, f, drop = FALSE, ...)
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)
The second form is the default S3 method, which adds the sep and lex.order arguments.
Parameters
x is a vector or data frame to be divided into groups.
f is a factor, in the sense that as.factor(f) defines the grouping, or a list of such factors, in which case their interaction is used for the grouping.
drop is a logical argument indicating whether levels that do not occur should be dropped.
sep is a character string used as the separator; it is passed to interaction() when f is a list.
lex.order is a logical argument that is passed to interaction() when f is a list.
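The effect of drop is easiest to see with a factor that has a level no element uses. The following is a minimal sketch; the vector scores and the factor grp are made-up data for illustration only.

```r
# Hypothetical data: four scores tagged with a factor whose level "c" is unused
scores <- c(10, 20, 30, 40)
grp <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))

# drop = FALSE (the default) keeps an empty element for the unused level "c"
kept <- split(scores, f = grp)
names(kept)
length(kept$c)   # 0, because no score belongs to "c"

# drop = TRUE discards levels that have no observations
dropped <- split(scores, f = grp, drop = TRUE)
names(dropped)
```

With the default, the result has three elements, one of them empty; with drop = TRUE only the "a" and "b" groups remain.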
Example
Suppose you have a named vector, where the name of each element corresponds to the group to which the element belongs.
Hence, you can split the vector into one vector per group by passing its names, obtained with the names() function, to the argument f.
Let's define a named vector using the c() function.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv
Output
x y x x y 
3 5 1 4 3 
To divide into groups, use the split() function. We will divide the data into the x and y groups.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv
data <- split(rv, f = names(rv))
data
Output
x y x x y 
3 5 1 4 3 
$x
x x x 
3 1 4 

$y
y y 
5 3 
You can see that our vector is divided by its groups defined by the names.
You can also pass f a character vector that gives the group of each element, or pass a factor object directly.
rv <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv
data <- split(rv, f = factor(rv))
data
Output
[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
$Mando1
[1] "Mando1" "Mando1" "Mando1"

$Mando2
[1] "Mando2" "Mando2"
Split data in Multiple groups in R
To split the data into multiple groups, pass a list of grouping vectors to the argument f.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv1
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
rv2
data <- split(rv, f = list(rv1, rv2))
data
Output
[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
[1] "DarkTrooper1" "DarkTrooper2" "DarkTrooper2" "DarkTrooper1" "DarkTrooper1"
$Mando1.DarkTrooper1
x x 
3 4 

$Mando2.DarkTrooper1
y 
3 

$Mando1.DarkTrooper2
x 
1 

$Mando2.DarkTrooper2
y 
5 
You can see that, by default, the group names join the interacting levels with a dot, and the output contains every possible combination of levels, even those with no observations. (In this example all four combinations occur, so none of the groups is empty.)
However, you can customize that with the sep and drop arguments, respectively. See the following code.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
data <- split(rv, f = list(rv1, rv2), drop = TRUE, sep = ": ")
data
Output
$`Mando1: DarkTrooper1`
x x 
3 4 

$`Mando2: DarkTrooper1`
y 
3 

$`Mando1: DarkTrooper2`
x 
1 

$`Mando2: DarkTrooper2`
y 
5 
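Because every combination is populated in the vectors above, drop = TRUE changes nothing visible there. The sketch below uses made-up vectors (vals, team, and level are hypothetical names) in which one combination never occurs, so the difference shows up.

```r
# Hypothetical data: team "B" never appears together with level "high"
vals  <- c(1, 2, 3, 4)
team  <- c("A", "A", "B", "A")
level <- c("low", "high", "low", "high")

# Default: all 2 x 2 combinations are returned, including the empty one
all_groups <- split(vals, f = list(team, level))
length(all_groups)               # 4
length(all_groups[["B.high"]])   # 0, the empty combination is kept

# drop = TRUE removes the empty combination; sep changes the name separator
used_groups <- split(vals, f = list(team, level), drop = TRUE, sep = "_")
length(used_groups)              # 3
names(used_groups)
```

Here the default call returns four elements, one of them of length zero, while the second call returns only the three combinations that actually occur, named with an underscore instead of a dot.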
Splitting the data frame in R
To split a data frame in R, use the split() function. You can split a data set into subsets based on one or more variables representing groups of the data. R comes with some built-in data sets, one of which we will use in this example.
Let's use the built-in R dataset called ToothGrowth.
data("ToothGrowth")
head(ToothGrowth)
Output
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5
The head() function returns the first six rows of the dataset.
You can use the split() function to split the data frame into groups based on the len variable.
data("ToothGrowth")
df <- head(ToothGrowth)
data <- split(df, f = df$len)
data
Output
$`4.2`
  len supp dose
1 4.2   VC  0.5

$`5.8`
  len supp dose
4 5.8   VC  0.5

$`6.4`
  len supp dose
5 6.4   VC  0.5

$`7.3`
  len supp dose
3 7.3   VC  0.5

$`10`
  len supp dose
6  10   VC  0.5

$`11.5`
   len supp dose
2 11.5   VC  0.5
You can see from the output that the data frame is divided into six subsets, one for each distinct value of len. Because len is a continuous measurement and all six values differ, every subset contains a single row; splitting is more informative when f is a grouping variable such as supp or dose.
If you want to divide a data frame based on several columns or groups, pass a list as the value of f. For example, see the following code snippet.
split(df, f = list(df$len, df$dose))
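Splitting on grouping columns usually gives more useful subsets than splitting on a measurement. Here is a minimal sketch on the full ToothGrowth data set, treating supp and dose as the grouping variables; the object names by_supp and by_supp_dose are only illustrative.

```r
data("ToothGrowth")

# One grouping variable: one subset per supplement type
by_supp <- split(ToothGrowth, f = ToothGrowth$supp)
sapply(by_supp, nrow)

# Two grouping variables: one subset per supp/dose combination
by_supp_dose <- split(ToothGrowth, f = list(ToothGrowth$supp, ToothGrowth$dose))
sapply(by_supp_dose, nrow)

# Each list element is a data frame, so per-group summaries are one sapply() away
sapply(by_supp_dose, function(d) mean(d$len))
```

Since the result is an ordinary list of data frames, functions such as lapply() and sapply() can be applied to every group in turn.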
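To recover the original data frame from the output of split(), use the unsplit() function. Its syntax is unsplit(value, f, drop = FALSE), where value is the list returned by split() and f is the same grouping that was used for the split.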
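A round trip through split() and unsplit() can be sketched as follows. The vector vals and the grouping vector grp are made-up data, and the names parts, restored, pieces, and rebuilt are only illustrative.

```r
data("ToothGrowth")

# Vector round trip: split by a grouping vector, then reassemble
vals <- c(3, 5, 1, 4, 3)
grp  <- c("x", "y", "x", "x", "y")
parts <- split(vals, f = grp)
restored <- unsplit(parts, f = grp)
identical(restored, vals)   # TRUE

# Data frame round trip: the same grouping must be passed to both calls
pieces  <- split(ToothGrowth, f = ToothGrowth$supp)
rebuilt <- unsplit(pieces, f = ToothGrowth$supp)
all(rebuilt$len == ToothGrowth$len)   # rows are back in their original order
```

Note that unsplit() needs the original grouping in f, not just the list of groups, because the list alone does not record where each element came from.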
Conclusion
To split a vector or data frame in R, use the split() function. To recover the original vector or data frame from the split, use the unsplit() function.
Krunal Lathiya is an Information Technology Engineer by education and web developer by profession. He has worked with many back-end platforms, including Node.js, PHP, and Python. In addition, Krunal has excellent knowledge of Data Science and Machine Learning, and he is an expert in R Language. Krunal has written many programming blogs, which showcases his vast expertise in this field.
Source: https://r-lang.com/r-split-function-how-to-split-vector-and-data-frame/