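How to split a string on special characters in R?

Text data is rarely tidy, and one of the most basic problems with it is that several values arrive packed into a single string, separated by a special character such as a hyphen, colon, or slash. The strsplit function handles this: it takes the string and a pattern describing the separator, and returns the pieces. Because the pattern is treated as a regular expression by default, the examples that follow wrap each separator in square brackets (for example "[*]") so that it is matched literally.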
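
As a minimal sketch of why the separator needs this care: strsplit interprets its split argument as a regular expression unless fixed = TRUE is passed, so a character such as | has a special meaning when written bare. The string s and the result names below are made up for illustration and are not part of the original examples.

```r
# Hypothetical sample string, not taken from the article.
s <- "a|b|c"

# Character class: "|" inside [] is matched literally.
by_class <- strsplit(s, "[|]")[[1]]

# fixed = TRUE: the split argument is treated as plain text, not a regex.
by_fixed <- strsplit(s, "|", fixed = TRUE)[[1]]

# Escaping the metacharacter also works (note the doubled backslash in R).
by_escape <- strsplit(s, "\\|")[[1]]

print(by_class)
```

All three calls return the same pieces. When the separator is a single literal character, fixed = TRUE is often the simplest choice.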

Example

x1<-"A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z"
x1

Output

[1] "A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z"

Example

strsplit(x1,"[-]")

Output

[[1]]
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
[20] "T" "U" "V" "W" "X" "Y" "Z"
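
The [[1]] in the output above shows that strsplit returns a list, even for a single input string. A small sketch of how the character vector inside it might be pulled out follows; the names pieces and letters_vec are illustrative only.

```r
# Same string as x1 in the article, rebuilt here so the block runs on its own.
x1 <- paste(LETTERS, collapse = "-")

pieces <- strsplit(x1, "[-]")   # a list of length 1
letters_vec <- pieces[[1]]      # the character vector stored in that list

# unlist(pieces) gives the same vector when there is a single input string.
length(letters_vec)
```

Indexing with [[1]] keeps the intent explicit; unlist is convenient when the list structure is not needed at all.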

Example

x2<-"AK:AL:AR:AS:AZ:CA:CO:CT:DC:DE:FL:GA:GU:HI:IA:ID:IL:IN:KS:KY:LA:MA:MD:ME:MI:MN:MO:MP:MS:MT:NC:ND:NE:NH:NJ:NM:NV:NY:OH:OK:OR:PA:PR:RI:SC:SD:TN:TX:UM:UT:VA:VI:VT:WA:WI:WV:WY"
x2

Output

[1] "AK:AL:AR:AS:AZ:CA:CO:CT:DC:DE:FL:GA:GU:HI:IA:ID:IL:IN:KS:KY:LA:MA:MD:ME:MI:MN:MO:MP:MS:MT:NC:ND:NE:NH:NJ:NM:NV:NY:OH:OK:OR:PA:PR:RI:SC:SD:TN:TX:UM:UT:VA:VI:VT:WA:WI:WV:WY"

Example

strsplit(x2,"[:]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"

Example

x3<-"AK/AL/AR/AS/AZ/CA/CO/CT/DC/DE/FL/GA/GU/HI/IA/ID/IL/IN/KS/KY/LA/MA/MD/ME/MI/MN/MO/MP/MS/MT/NC/ND/NE/NH/NJ/NM/NV/NY/OH/OK/OR/PA/PR/RI/SC/SD/TN/TX/UM/UT/VA/VI/VT/WA/WI/WV/WY"
x3

Output

[1] "AK/AL/AR/AS/AZ/CA/CO/CT/DC/DE/FL/GA/GU/HI/IA/ID/IL/IN/KS/KY/LA/MA/MD/ME/MI/MN/MO/MP/MS/MT/NC/ND/NE/NH/NJ/NM/NV/NY/OH/OK/OR/PA/PR/RI/SC/SD/TN/TX/UM/UT/VA/VI/VT/WA/WI/WV/WY"

Example

strsplit(x3,"[/]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"

Example

x4<-"AK~AL~AR~AS~AZ~CA~CO~CT~DC~DE~FL~GA~GU~HI~IA~ID~IL~IN~KS~KY~LA~MA~MD~ME~MI~MN~MO~MP~MS~MT~NC~ND~NE~NH~NJ~NM~NV~NY~OH~OK~OR~PA~PR~RI~SC~SD~TN~TX~UM~UT~VA~VI~VT~WA~WI~WV~WY"
x4

Output

[1] "AK~AL~AR~AS~AZ~CA~CO~CT~DC~DE~FL~GA~GU~HI~IA~ID~IL~IN~KS~KY~LA~MA~MD~ME~MI~MN~MO~MP~MS~MT~NC~ND~NE~NH~NJ~NM~NV~NY~OH~OK~OR~PA~PR~RI~SC~SD~TN~TX~UM~UT~VA~VI~VT~WA~WI~WV~WY"

Example

strsplit(x4,"[~]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"

Example

x5<-"AK*AL*AR*AS*AZ*CA*CO*CT*DC*DE*FL*GA*GU*HI*IA*ID*IL*IN*KS*KY*LA*MA*MD*ME*MI*MN*MO*MP*MS*MT*NC*ND*NE*NH*NJ*NM*NV*NY*OH*OK*OR*PA*PR*RI*SC*SD*TN*TX*UM*UT*VA*VI*VT*WA*WI*WV*WY"
x5

Output

[1] "AK*AL*AR*AS*AZ*CA*CO*CT*DC*DE*FL*GA*GU*HI*IA*ID*IL*IN*KS*KY*LA*MA*MD*ME*MI*MN*MO*MP*MS*MT*NC*ND*NE*NH*NJ*NM*NV*NY*OH*OK*OR*PA*PR*RI*SC*SD*TN*TX*UM*UT*VA*VI*VT*WA*WI*WV*WY"

Example

strsplit(x5,"[*]")

Output

[[1]]
 [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MP" "MS" "MT"
[31] "NC" "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC"
[46] "SD" "TN" "TX" "UM" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"
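
The separators used in the examples above (-, :, /, ~ and *) can also be listed together in one character class, which may help when a string mixes several of them. This is only a sketch: the string mixed is hypothetical and does not appear in the original examples.

```r
# Hypothetical string mixing the separators used in the examples above.
mixed <- "AK-AL:AR/AS~AZ*CA"

# One character class listing every separator. The hyphen goes first
# so that it is read as a literal "-" rather than as a range.
codes <- strsplit(mixed, "[-:/~*]")[[1]]
print(codes)
```

Each character inside the brackets is treated as an alternative separator, so the string is cut wherever any one of them occurs.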

Example

x6<-c("AK*AL*AR*AS*AZ*CA","CO*CT*DC*DE*FL*GA","GU*HI*IA*ID*IL*IN*KS","KY*LA*MA*MD*ME*MI","MN*MO*MP*MS*MT*NC","ND*NE*NH*NJ*NM*NV","NY*OH*OK*OR*PA*PR","RI*SC*SD*TN*TX*UM","UT*VA*VI*VT","WA*WI*WV*WY")
x6

Output

 [1] "AK*AL*AR*AS*AZ*CA"    "CO*CT*DC*DE*FL*GA"    "GU*HI*IA*ID*IL*IN*KS"
 [4] "KY*LA*MA*MD*ME*MI"    "MN*MO*MP*MS*MT*NC"    "ND*NE*NH*NJ*NM*NV"
 [7] "NY*OH*OK*OR*PA*PR"    "RI*SC*SD*TN*TX*UM"    "UT*VA*VI*VT"
[10] "WA*WI*WV*WY"

Example

strsplit(x6,"[*]")

Output

[[1]]
[1] "AK" "AL" "AR" "AS" "AZ" "CA"

[[2]]
[1] "CO" "CT" "DC" "DE" "FL" "GA"

[[3]]
[1] "GU" "HI" "IA" "ID" "IL" "IN" "KS"

[[4]]
[1] "KY" "LA" "MA" "MD" "ME" "MI"

[[5]]
[1] "MN" "MO" "MP" "MS" "MT" "NC"

[[6]]
[1] "ND" "NE" "NH" "NJ" "NM" "NV"

[[7]]
[1] "NY" "OH" "OK" "OR" "PA" "PR"

[[8]]
[1] "RI" "SC" "SD" "TN" "TX" "UM"

[[9]]
[1] "UT" "VA" "VI" "VT"

[[10]]
[1] "WA" "WI" "WV" "WY"
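
When the input is a vector, strsplit returns one list element per input string, as the output above shows. The following sketch suggests a few ways such a list might be used afterwards; the names parts, n_codes, first_codes and all_codes are illustrative assumptions.

```r
# Same vector as x6 in the article, repeated so the block runs on its own.
x6 <- c("AK*AL*AR*AS*AZ*CA", "CO*CT*DC*DE*FL*GA", "GU*HI*IA*ID*IL*IN*KS",
        "KY*LA*MA*MD*ME*MI", "MN*MO*MP*MS*MT*NC", "ND*NE*NH*NJ*NM*NV",
        "NY*OH*OK*OR*PA*PR", "RI*SC*SD*TN*TX*UM", "UT*VA*VI*VT",
        "WA*WI*WV*WY")

parts <- strsplit(x6, "[*]")   # list with one character vector per input

# Number of pieces found in each input string.
n_codes <- lengths(parts)

# First piece of each input string, returned as a character vector.
first_codes <- vapply(parts, function(p) p[1], character(1))

# All pieces flattened into one character vector.
all_codes <- unlist(parts)

print(n_codes)
```

vapply is used here rather than sapply because it states the expected result type up front, which makes surprises less likely if an input string turns out to be empty.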