Splitting Alpha-Numeric Codes

- 5 mins

This past week I was introduced to an interesting problem: splitting an alpha-numeric code into its letters and its digits. For instance, 'AA11' => c('AA', '11'). Below is a sloppy but functional implementation.

Starting Out

Turn an alpha-numeric code into a vector of two strings (the alpha part and the numeric part).

str_test <- 'QQ1'
c(substr(str_test, 1, 2), substr(str_test, 3, 3))

This solves the problem for this one string, but the character positions are hard-coded. We need to generalize the solution in order to apply it to an entire data set.

Part 1 - String Splitting Logic

Let’s figure out the logic first.

  1. Find where the numeric portion starts. Heads up, '\\d' is a regular expression (regex) that matches a single digit, and str_locate returns the start and end position of the first match.
  2. Get the alpha portion and the numeric portion.

# 1
start_digits <- stringr::str_locate(str_test, '\\d')[[1]]
# 2
alpha <- substr(str_test, 1, start_digits - 1)
nums <- substr(str_test, start_digits, stringr::str_length(str_test))

c(alpha, nums)
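
If stringr is not available, the same two steps can be sketched in base R. This is a swapped-in alternative rather than the code used in this post: it assumes regexpr() for locating the first digit and nchar() for the string length.

```r
# Base R sketch of the same two steps (an alternative to stringr, not the post's code).
str_test <- "QQ1"

# Step 1: regexpr() gives the position of the first match, here the first digit.
start_digits <- as.integer(regexpr("[0-9]", str_test))

# Step 2: everything before that position is alpha, the rest is numeric.
alpha <- substr(str_test, 1, start_digits - 1)
nums  <- substr(str_test, start_digits, nchar(str_test))

c(alpha, nums)
```

For 'QQ1' the first digit sits at position 3, so the pieces are 'QQ' and '1'.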

Now Encapsulate in a Function

With the logic figured out, we can use it to create a function.

alphanum_split <- function(str) {
  start_digits <- stringr::str_locate(str, "\\d")[[1]]
  alpha <- substr(str, 1, start_digits - 1)
  nums <- substr(str, start_digits, stringr::str_length(str))

  return(c(alpha, nums))
}

# test
alphanum_split(str_test)
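
The function assumes every code contains at least one digit. If the real data might include codes without digits, a guarded variant could look like the sketch below. The name alphanum_split_checked is hypothetical, and the sketch uses base R's regexpr() in place of stringr::str_locate() so it runs on its own.

```r
# Hypothetical variant of alphanum_split() that handles codes without digits.
# Uses base R's regexpr() instead of stringr::str_locate().
# Assumes `str` is a single, non-missing string.
alphanum_split_checked <- function(str) {
  start_digits <- as.integer(regexpr("[0-9]", str))

  # regexpr() returns -1 when the pattern is not found.
  if (start_digits == -1L) {
    return(c(str, NA_character_))
  }

  c(substr(str, 1, start_digits - 1), substr(str, start_digits, nchar(str)))
}

alphanum_split_checked("AA11")
alphanum_split_checked("QQ")
```

Returning NA for the missing numeric part is just one possible choice. Stopping with an error would be equally reasonable if a code without digits means the data is bad.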

Use Function on Vector

Now that we know the function works, let’s use it on a vector of alphanumeric strings. This uses lapply to loop over all the strings in the vector and return a list with one two-element vector per string.

strs <- c("A1", "AA1", "AA11", "AAA111", "AAA1", "AAA11")
lapply(strs, alphanum_split)
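
As an aside, the whole vector can also be split in one pass with a regular expression and sub(). This is a different technique from the function above, sketched under the assumption that every code is one or more letters followed by one or more digits. The names pattern and split_codes are made up for the example.

```r
strs <- c("A1", "AA1", "AA11", "AAA111", "AAA1", "AAA11")

# Assumes every code is letters followed by digits, e.g. "AA11".
# Group 1 captures the letters, group 2 captures the digits.
pattern <- "^([A-Za-z]+)([0-9]+)$"

split_codes <- data.frame(
  alpha = sub(pattern, "\\1", strs),
  nums  = sub(pattern, "\\2", strs),
  stringsAsFactors = FALSE
)

split_codes
```

One caveat: sub() returns a string unchanged when the pattern does not match, so codes in any other shape would need to be checked separately.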

Part 2 - Convert Alpha to Numbers

An additional portion of this problem is converting the characters into numbers. The original data is a grid with each grid element labeled as ‘A1, A2, …’. The goal is to instead label each row with Cartesian coordinates. In this case x coordinates are the digits and y coordinates are the characters. The first alpha code in the ordering maps to 1. This bit is weird, and likely unique to this particular problem.

  1. First create a vector of all your alpha codes. If I remember correctly QQQ = 1. I’ve only included some of the alpha codes. These must be in order because we use the index of each value in the vector as its coordinate.
  2. Replace the characters with numbers.

ex_grid <- lapply(c('QQQ1', 'PPP1', 'QQQ2'), alphanum_split)
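# 1
alphas <- c('QQQ', 'PPP', 'OOO', 'NNN')

# 2
# match() returns the position of each alpha code in alphas.
# The result is stored as a string because each element of ex_grid is a character vector.
for (i in seq_along(ex_grid)) {
  ex_grid[[i]][1] <- match(ex_grid[[i]][1], alphas)
}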

ex_grid
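
The lookup itself is just match(), which is vectorized, so it can be tried outside the loop. A small sketch, where codes is a made-up name holding the alpha portions of the three example strings:

```r
# Order defines the coordinate; this is the partial list used in the post.
alphas <- c("QQQ", "PPP", "OOO", "NNN")

# Alpha portions of 'QQQ1', 'PPP1', 'QQQ2'.
codes <- c("QQQ", "PPP", "QQQ")

# Position of each code within alphas.
y_index <- match(codes, alphas)
y_index

# A code that is not in alphas comes back as NA.
match("ZZZ", alphas)
```

Since unknown codes become NA, it is worth checking for missing values after the conversion when the list of alpha codes is incomplete.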

Part 3 - Convert the List to X and Y Columns

This is a little sloppy but it works right now. The next step is to put each element of the list into X and Y columns: the second value of each pair is x (the digits) and the first is y (the converted characters).

x <- c()
y <- c()

for (i in seq_along(ex_grid)) {
  x <- c(x, ex_grid[[i]][2])
  y <- c(y, ex_grid[[i]][1])
}

# Both vectors are still character at this point, so convert them to integers.
data.frame(x = as.integer(x), y = as.integer(y))
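
The loop can also be avoided by stacking the pairs into a matrix first. This is a sketch rather than a replacement for the code above. The ex_grid values are written out by hand to mirror the state after Part 2, and grid_mat and coords are made-up names.

```r
# State of ex_grid after Part 2, written out by hand so the sketch runs on its own.
ex_grid <- list(c("1", "1"), c("2", "1"), c("1", "2"))

# Stack the pairs into a 3 x 2 character matrix: column 1 is y, column 2 is x.
grid_mat <- do.call(rbind, ex_grid)

coords <- data.frame(
  x = as.integer(grid_mat[, 2]),
  y = as.integer(grid_mat[, 1])
)

coords
```

This avoids growing x and y one element at a time, which matters more as the data set gets larger.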

So, this is one way to solve this interesting problem. It is a little sloppy but it should work.
