R Split Continuous Variable Into Bins

R has several data types, and the vector is the most common one. Splitting and merging data are routine operations in any programming language, and this article shows how to split vectors and data frames into groups in R.

split in R

split() is a built-in R function that divides a vector or data frame into the groups defined by a factor (or a list of factors). It takes the data and the grouping as arguments and returns a list with one element per group.

The unsplit() function reverses the effect of split(). The value returned by split() is a list of vectors (or data frames) containing the values for each group.

Syntax

    split(x, f, drop = FALSE, ...)

    split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)

Parameters

x is a vector or data frame to be divided into groups.

f is a 'factor' in the sense that as.factor(f) defines the grouping, or a list of such factors, in which case their interaction is used for the grouping.

drop is a logical argument indicating whether levels that do not occur should be dropped.

sep is a character string used as the separator, passed to interaction() when f is a list.

lex.order is a logical argument passed to interaction() when f is a list.
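The effect of drop is easiest to see with a factor that has an unused level. The following is a minimal sketch; the vector scores and the factor grp are made up for illustration and do not appear elsewhere in this article.

```r
# Hypothetical data: three values grouped by a factor whose level "b" never occurs
scores <- c(10, 20, 30)
grp <- factor(c("a", "a", "c"), levels = c("a", "b", "c"))

kept <- split(scores, f = grp)                  # default: drop = FALSE
dropped <- split(scores, f = grp, drop = TRUE)  # unused levels removed

names(kept)     # "a" "b" "c" (the empty group "b" is kept)
names(dropped)  # "a" "c"
```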

Example

Suppose you have a named vector, where the name of each element corresponds to the group to which the element belongs.

You can split such a vector into one vector per group by passing the names of the vector, obtained with the names() function, to the f argument.

Let's define a named vector using the c() function.

    rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
    rv

Output

    x y x x y
    3 5 1 4 3

To divide into groups, use the split() function. We will divide the data into the x and y groups.

    rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
    rv
    data <- split(rv, f = names(rv))
    data

Output

    x y x x y
    3 5 1 4 3

    $x
    x x x
    3 1 4

    $y
    y y
    5 3

You can see that the vector is split into the groups defined by its names.
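Since the result is an ordinary list, each group can be accessed by name or summarized with the usual list tools. Here is a small sketch that reuses the rv vector from above; the use of sapply() with sum() is just one illustrative choice, not something the example requires.

```r
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
data <- split(rv, f = names(rv))

# Access a single group by its name
data$x

# Apply a function to every group; here, the sum of each group's elements
group_sums <- sapply(data, sum)
group_sums   # x = 8, y = 8
```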

You can also pass a character vector, or a factor object directly, to f to indicate the group of each element.

    rv <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
    rv
    data <- split(rv, f = factor(rv))
    data

Output

    [1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"

    $Mando1
    [1] "Mando1" "Mando1" "Mando1"

    $Mando2
    [1] "Mando2" "Mando2"

Split data into multiple groups in R

To split the data by more than one grouping variable, pass a list to the f argument.

    rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
    rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
    rv1
    rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
    rv2
    data <- split(rv, f = list(rv1, rv2))
    data

Output

    [1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
    [1] "DarkTrooper1" "DarkTrooper2" "DarkTrooper2" "DarkTrooper1" "DarkTrooper1"

    $Mando1.DarkTrooper1
    x x
    3 4

    $Mando2.DarkTrooper1
    y
    3

    $Mando1.DarkTrooper2
    x
    1

    $Mando2.DarkTrooper2
    y
    5

By default, the names of the interacting groups are joined with a dot, and the output contains every possible combination of groups, including any that have no observations (in this example all four combinations occur, so none is empty).

You can change both behaviors with the sep and drop arguments, respectively. See the following code.

    rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
    rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
    rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
    data <- split(rv, f = list(rv1, rv2), drop = TRUE, sep = ": ")
    data

Output

    $`Mando1: DarkTrooper1`
    x x
    3 4

    $`Mando2: DarkTrooper1`
    y
    3

    $`Mando1: DarkTrooper2`
    x
    1

    $`Mando2: DarkTrooper2`
    y
    5
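The lex.order argument from the parameter list is not exercised by the examples in this article. As a hedged sketch of its effect, reusing the same rv, rv1 and rv2 vectors: with lex.order = TRUE the groups come out with the first factor varying slowest, instead of fastest as in the default ordering.

```r
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")

# Same groups in both cases; only the order of the list elements differs
default_order <- names(split(rv, f = list(rv1, rv2)))
lex_order <- names(split(rv, f = list(rv1, rv2), lex.order = TRUE))

default_order
lex_order
```

The group contents are unchanged, so this argument only matters when the order of the resulting list is significant, for example when printing or plotting groups in sequence.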

Splitting the data frame in R

To split a data frame in R, use the split() function. You can split a data set into subsets based on one or more variables that represent groups of the data. R ships with several built-in datasets, one of which we will use in this example.

Let's use the built-in dataset called ToothGrowth.

    data("ToothGrowth")
    head(ToothGrowth)

Output

       len supp dose
    1  4.2   VC  0.5
    2 11.5   VC  0.5
    3  7.3   VC  0.5
    4  5.8   VC  0.5
    5  6.4   VC  0.5
    6 10.0   VC  0.5

The head() function returns the first six rows of the dataset.

You can use the split() function to split the data frame into groups based on the len variable.

    data("ToothGrowth")
    df <- head(ToothGrowth)
    data <- split(df, f = df$len)
    data

Output

    $`4.2`
      len supp dose
    1 4.2   VC  0.5

    $`5.8`
      len supp dose
    4 5.8   VC  0.5

    $`6.4`
      len supp dose
    5 6.4   VC  0.5

    $`7.3`
      len supp dose
    3 7.3   VC  0.5

    $`10`
      len supp dose
    6  10   VC  0.5

    $`11.5`
       len supp dose
    2 11.5   VC  0.5

You can see from the output that the data frame is divided into six subsets, one for each distinct value of len. Because len is a continuous variable and all six values differ, every row ends up in its own group, named after its len value and listed in increasing order.
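Splitting on the raw values of a continuous variable gives one group per distinct value, which is rarely what you want. One way to get coarser groups is to bin the variable with cut() before calling split(). The sketch below is an illustration under stated assumptions: the break points 0, 10, 20 and 40 and the name len_bin are arbitrary choices, not part of the original example.

```r
data("ToothGrowth")

# Bin the continuous len column into three intervals (break points are illustrative)
len_bin <- cut(ToothGrowth$len, breaks = c(0, 10, 20, 40))

# Split the full data frame by bin; each list element is a data frame
groups <- split(ToothGrowth, f = len_bin)

names(groups)         # the interval labels produced by cut()
sapply(groups, nrow)  # number of rows that fall into each bin
```

Values outside the break points would become NA in len_bin and be left out of every group, so the breaks should cover the full range of the variable.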

If you want to divide a data frame based on more than one column or group, pass a list as the value of f. For example, see the following code snippet.

                          split(df, f = list(df$len, df$dose))                      
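With only six rows, a multi-column split is hard to read. As a hedged sketch, the same idea can be applied to the whole dataset using its two grouping columns; the choice of supp and dose, and of sep = "_" to keep the group names unambiguous next to the decimal point in dose, is an assumption made for illustration.

```r
data("ToothGrowth")

# One subset per combination of supplement type and dose
groups <- split(ToothGrowth,
                f = list(ToothGrowth$supp, ToothGrowth$dose),
                sep = "_")

names(groups)         # one name per supp/dose combination, e.g. "OJ_0.5"
sapply(groups, nrow)  # number of rows in each subset
```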

To recover the original data frame from the result of split(), use the unsplit() function. Its syntax is unsplit(value, f, drop = FALSE), where value is the list returned by split() and f is the same grouping that was used to create it.
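A minimal round-trip sketch, assuming the same six-row df used earlier (the names parts and restored are introduced here for illustration):

```r
data("ToothGrowth")
df <- head(ToothGrowth)

# Split by len, then reassemble with the same grouping
parts <- split(df, f = df$len)
restored <- unsplit(parts, f = df$len)

restored

# Check that the rows are back in their original order
all(restored$len == df$len)
```

The grouping passed to unsplit() must match the one passed to split(); otherwise the rows cannot be put back in their original positions.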

Conclusion

To split a vector or data frame in R, use the split() function. To reassemble the split vector or data frame, use the unsplit() function.



Source: https://r-lang.com/r-split-function-how-to-split-vector-and-data-frame/
