Overriding R built-in functions I don't like to make them easier to use

2012-01-08

Personally, I was never happy with rank, sort, and names<-, so I tried overriding them.
abicky/R_funcs - GitHub

rank

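rank can only return ranks for values arranged in ascending order. To get ranks for descending order, you have to negate the values before passing them to rank if they are numeric, or convert them to numbers with xtfrm first for any other type.
sort and order both have a decreasing argument, so rank should have one too!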
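
For reference, the workaround described above can be sketched with base R alone. The vectors x and s are made up for illustration.

```r
# Numeric input: negate the values before ranking.
x <- c(3, 1, 2)
rank(-x)         # 1 3 2

# Other types: convert to numbers with xtfrm() first, then negate.
s <- c("b", "a", "c")
rank(-xtfrm(s))  # 2 3 1
```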

> source("rank.R")
> rank(1:5)
[1] 1 2 3 4 5
> rank(1:5, decreasing = TRUE)
[1] 5 4 3 2 1
> rank(letters[1:5])
[1] 1 2 3 4 5
> rank(letters[1:5], decreasing = TRUE)
[1] 5 4 3 2 1

Checking the quick documentation with my homemade usage object shows that, as you can see, all I did was add a decreasing argument.

> usage(rank)
Description:

     return the ranks of the values

Usage:

     rank(x, na.last = TRUE,
          ties.method = c("average", "first", "random", "max", "min"),
          decreasing = FALSE)

Arguments:

       x: same as rank function (See also: base::rank)

  na.last: same as rank function (See also: base::rank)

ties.method: same as rank function (See also: base::rank)

decreasing: logical, whether or not the values should be ranked in decreasing order
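
A wrapper with this signature could be sketched roughly as follows. This is not necessarily how rank.R in the repository is written; the name rank_dec is hypothetical and chosen so that base::rank is not masked, and the approach simply reuses the xtfrm trick described earlier.

```r
# Hypothetical wrapper; named rank_dec here to avoid masking base::rank.
rank_dec <- function(x, na.last = TRUE,
                     ties.method = c("average", "first", "random", "max", "min"),
                     decreasing = FALSE) {
  ties.method <- match.arg(ties.method)
  if (decreasing) {
    # xtfrm() maps x to a numeric vector that sorts in the same order,
    # so negating it reverses the ranking.
    x <- -xtfrm(x)
  }
  base::rank(x, na.last = na.last, ties.method = ties.method)
}

rank_dec(1:5, decreasing = TRUE)           # 5 4 3 2 1
rank_dec(letters[1:5], decreasing = TRUE)  # 5 4 3 2 1
```

Note that in this sketch xtfrm drops the names of non-numeric input, so a fuller version would want to restore them.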

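sort

When you pass a data frame to sort, the data frame is handed to order, and order returns a vector that looks roughly like what you would get after unlisting the data frame.
sort then indexes the data frame with that vector (which, for a data frame with multiple rows, is longer than the number of columns), so naturally it fails with an “undefined columns selected” error.
The usual way to sort a data frame is, I think, to write df[with(df, order(field1, field2, …)), ] every time, which is tedious.
It would be nice to sort more easily!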
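
For comparison, here is what the conventional base R idiom looks like on the built-in iris data set. The variable name sorted is just for illustration.

```r
# Sort iris by Species (ascending), then by Sepal.Length (descending).
# with() lets order() refer to the columns by name.
sorted <- iris[with(iris, order(Species, -Sepal.Length)), ]
head(sorted)
```

The minus sign only works for numeric columns; a factor or character key would first need xtfrm, which is part of what makes the idiom tedious.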

> source("sort.data.frame.R")
> head(sort(iris))  # with no extra arguments, sorts in ascending order by all fields
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
        4.3         3.0          1.1         0.1  setosa
         4.4         2.9          1.4         0.2  setosa
        4.4         3.0          1.3         0.2  setosa
        4.4         3.2          1.3         0.2  setosa
        4.5         2.3          1.3         0.3  setosa
         4.6         3.1          1.5         0.2  setosa
> head(sort(iris, decreasing = TRUE))  # descending order works too
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
        7.9         3.8          6.4         2.0 virginica
        7.7         3.8          6.7         2.2 virginica
        7.7         3.0          6.1         2.3 virginica
        7.7         2.8          6.7         2.0 virginica
        7.7         2.6          6.9         2.3 virginica
        7.6         3.0          6.6         2.1 virginica
> head(sort(iris, order.by = c(Species, -Sepal.Length)))  # prefix a field with a minus sign to sort it in reverse order
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
        5.8         4.0          1.2         0.2  setosa
        5.7         4.4          1.5         0.4  setosa
        5.7         3.8          1.7         0.3  setosa
        5.5         4.2          1.4         0.2  setosa
        5.5         3.5          1.3         0.2  setosa
         5.4         3.9          1.7         0.4  setosa

Here is the quick documentation.

> usage(sort.data.frame)
Description:

     Sorting Data Frames

Usage:

     sort(x, decreasing = FALSE, na.last = TRUE, order.by)

Arguments:

       x: a data frame

decreasing: same as sort function (See also: base::sort)

 na.last: same as sort function (See also: base::sort)

order.by: expression, indicating columns to order a data frame by the columns

Examples:
     # sort in ascending order by all columns
     sort(iris)

     # sort in decreasing order by all columns
     sort(iris, decreasing = TRUE)

     # sort in ascending order by Species
     # and sort in decreasing order by Sepal.Length
     sort(iris, order.by = c(Species, -Sepal.Length))
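
One way such an interface might be implemented is sketched below. This is an assumption for illustration, not the code in sort.data.frame.R: the function name sort_df is hypothetical (chosen to avoid touching the sort generic), and the handling of order.by via substitute is my own guess at a reasonable approach.

```r
# Hypothetical sketch; named sort_df so that base::sort is left alone.
sort_df <- function(x, decreasing = FALSE, na.last = TRUE, order.by) {
  env <- parent.frame()
  if (missing(order.by)) {
    # No order.by: use every column as a sort key, left to right.
    keys <- lapply(as.list(x), xtfrm)
  } else {
    # Capture c(Species, -Sepal.Length) unevaluated and split it into terms.
    expr <- substitute(order.by)
    terms <- if (is.call(expr) && identical(expr[[1]], as.name("c"))) {
      as.list(expr)[-1]
    } else {
      list(expr)
    }
    keys <- lapply(terms, function(term) {
      # A unary minus marks a column to be sorted in reverse;
      # xtfrm() makes the negation work for factors and strings too.
      if (is.call(term) && length(term) == 2 &&
          identical(term[[1]], as.name("-"))) {
        -xtfrm(eval(term[[2]], x, env))
      } else {
        xtfrm(eval(term, x, env))
      }
    })
  }
  idx <- do.call(order, c(unname(keys),
                          list(decreasing = decreasing, na.last = na.last)))
  x[idx, , drop = FALSE]
}

head(sort_df(iris, order.by = c(Species, -Sepal.Length)))
```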

names

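With names<-, when you want to rename only specific elements, accessing them by index means you have to fix the code every time the order of the elements changes.
That hurts maintainability!
So how about changing it like this?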

> source("names.R")
> x <- 1:5; names(x) <- letters[1:5]  # without the extra argument, same as the built-in
> x
a b c d e 
1 2 3 4 5 
> names(x, "b") <- "hoge"  # rename b to hoge
> x
   a hoge    c    d    e 
   1    2    3    4    5 
> names(x, c("c", "e")) <- c("fuga", "piyo")  # rename c to fuga and e to piyo
> x
   a hoge fuga    d piyo 
   1    2    3    4    5 

Here is the quick documentation.

> usage(`names<-`)
Description:

     Set the names of an object.

Usage:

     names(x) <- value
     names(x, name) <- value

Arguments:

       x: an R object.

    name: a character vector of names to be updated

   value: a character vector of new names
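
A replacement function with this shape could be sketched as below. It is only an illustration under my own assumptions, not the contents of names.R: the name rename<- is hypothetical (so that the built-in names<- stays untouched), and raising an error for unknown names is a design choice of this sketch.

```r
# Hypothetical replacement function; called rename<- here so that
# the built-in names<- is not masked.
`rename<-` <- function(x, name, value) {
  if (missing(name)) {
    # No name given: behave like the built-in names(x) <- value.
    names(x) <- value
    return(x)
  }
  # Look up the positions by name instead of hard-coding indices.
  idx <- match(name, names(x))
  if (anyNA(idx)) {
    stop("unknown name(s): ", paste(name[is.na(idx)], collapse = ", "))
  }
  names(x)[idx] <- value
  x
}

x <- 1:5
rename(x) <- letters[1:5]
rename(x, "b") <- "hoge"
rename(x, c("c", "e")) <- c("fuga", "piyo")
names(x)  # "a" "hoge" "fuga" "d" "piyo"
```

Because R rewrites rename(x, "b") <- "hoge" into a call to `rename<-`(x, "b", value = "hoge"), the extra argument slots in between x and value without any special machinery.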

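That's it: I overrode the functions I don't like!
Defining functions with the same names as built-ins invites confusion, so it is not a good idea,
but personally I wish the built-ins supported at least this much!!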