Notes on bioinformatics and data mining by G. Corey Shan
Merge columns by shared string
Imagine there is a demo data frame as follows, and we need to combine the columns that hold the same tag. What should we do?
More specifically, if we want to convert binary_var_dat into gene_mat, what steps should we take?
Here is my solution:
Split each column name by its delimiter;
Replace each column name with its split string (the shared tag);
Merge columns with the same column name.
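The three steps above can be sketched in base R roughly like this. The real binary_var_dat is not shown here, so the toy columns (TP53_mut, TP53_del, KRAS_mut), the "_" delimiter, the tag-first naming, and the merge rule (a gene is 1 if any of its source columns is 1) are all hypothetical assumptions, not the original data or method.

```r
# Hypothetical toy input: rows are samples, columns are "<gene>_<event>".
# The "_" delimiter and gene-first naming are assumptions for illustration.
binary_var_dat <- data.frame(
  TP53_mut = c(1, 0, 0),
  TP53_del = c(0, 0, 1),
  KRAS_mut = c(0, 1, 0),
  row.names = c("s1", "s2", "s3")
)

# Step 1: split each column name by its delimiter and keep the first piece (the tag).
tags <- vapply(strsplit(colnames(binary_var_dat), "_", fixed = TRUE),
               `[`, character(1), 1)

# Step 2: replace the column names with the tags; duplicated names are allowed.
colnames(binary_var_dat) <- tags

# Step 3: merge columns that share a name. Assumed rule: logical OR over the
# binary columns (use rowSums() alone if you want counts instead).
unique_tags <- unique(tags)
gene_mat <- sapply(unique_tags, function(tag) {
  as.integer(rowSums(binary_var_dat[, tags == tag, drop = FALSE]) > 0)
})
rownames(gene_mat) <- rownames(binary_var_dat)
print(gene_mat)
```

Logical indexing with tags == tag is used instead of name-based indexing because, with duplicated names, indexing by a name string would only pick up the first matching column.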
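To be honest, this problem cost me almost half an hour. I spent so long because I assumed that an R data frame must have unique row names and unique column names. Surprisingly, the naming rules are more flexible than that: row names must be unique, but column names may be duplicated. (The data.frame() constructor makes names unique by default through check.names = TRUE, so duplicated column names appear when you assign them afterwards, for example with colnames<-, or when you pass check.names = FALSE.)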
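A quick check of both rules in base R; the object name demo_df and its values are arbitrary, made up only for this illustration.

```r
# Arbitrary toy data frame (values are illustrative only).
demo_df <- data.frame(a = 1:2, b = 3:4)

# Column names: duplicates are accepted when assigned after creation.
colnames(demo_df) <- c("x", "x")
print(colnames(demo_df))

# Row names: duplicates are rejected with an error. suppressWarnings() hides
# the accompanying warning so only the error is captured.
row_err <- suppressWarnings(tryCatch({
  rownames(demo_df) <- c("r", "r")
  NULL
}, error = function(e) e))
print(conditionMessage(row_err))

# The constructor itself de-duplicates names unless check.names = FALSE.
print(names(data.frame(a = 1, a = 2)))                       # "a" "a.1"
print(names(data.frame(a = 1, a = 2, check.names = FALSE)))  # "a" "a"
```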
If you ever need duplicated row names, transpose the data frame: t() returns a matrix, and a matrix accepts duplicated row and column names. I like this design philosophy.
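A minimal sketch of the transpose route, using hypothetical gene columns: after t() the shared tags become duplicated row names, and base R's rowsum() can then add up rows that share a name. Turning the sums into 0/1 with > 0 (an OR over binary values) is an assumed merge rule, not necessarily the one used for gene_mat.

```r
# Hypothetical samples x columns data with a duplicated gene tag.
wide <- as.data.frame(matrix(c(1, 0,  1, 0,  0, 1), nrow = 2))
colnames(wide) <- c("TP53", "TP53", "KRAS")
rownames(wide) <- c("s1", "s2")

# t() on a data frame returns a matrix, so the tags become duplicated row names.
long <- t(wide)
print(is.matrix(long))
print(rownames(long))

# rowsum() sums rows sharing a group label; reorder = FALSE keeps first-seen order.
summed <- rowsum(long, group = rownames(long), reorder = FALSE)

# Assumed merge rule: OR over binary values, then transpose back to samples x genes.
merged <- t(summed > 0) * 1L
print(merged)
```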