mysql – I need to specify some thing like %Home% in the query such that it will return all the rows containing that substring in the specified columns

There are a few ways to accomplish this. There’s the slowest way:

WHERE (page_visited LIKE '%Homepage%') OR (visitor_id LIKE '%Homepage%')

There’s the slow and expensive way:

WHERE CONCAT(page_visited, '|', visitor_id) LIKE '%Homepage%'

Note: The pipe (|) is there to prevent a match being the result of a concatenation.

There’s the inefficient way:

WHERE LOCATE('Homepage', page_visited) + LOCATE('Homepage', visitor_id) > 0

There’s also the individual-lookup way:

SELECT * FROM events
 WHERE page_visited LIKE '%Homepage%'
 UNION ALL
SELECT * FROM events
 WHERE visitor_id LIKE '%Homepage%'

More information would need to be known about the table being queried as well as the columns (Is visitor_id a typo?), the goal of the result set, and how often the query is called in order to recommend any of the more complete solutions.

java – Rabin karp substring search

Below is the Rabin Karp implementation. I am not so sure about it, and there is no where that I can check whether this code really works or not. It will most helpful for extra set of eyes if whether this gives correct output or not.
The code finds the substring (in code it is, pattern) from the string given (in code it is, text). If the substring is found it will return true, otherwise false.

public class RabinKarpSubstring {

    // Check if text contains pattern.
    public boolean isSubstring (String pattern, String text) {
        if (pattern.length() > text.length()) return false;

        int prime = 31;
        int MOD = 1000000009;
        long hashPattern = 0;
        long() primePower = new long(text.length());
        long() hashText = new long(text.length() + 1);
        
        primePower(0) = 1;
        for (int i = 1; i < text.length(); i++) {
            primePower(i) = (primePower(i - 1) * prime) % MOD;
        }

        for (int i = 0; i < text.length(); i++) {
            hashText(i + 1) = (hashText(i) + ((text.charAt(i) - 'a' + 1) * primePower(i))) % MOD;
        }

        for (int i = 0; i < pattern.length(); i++) {
            hashPattern = (hashPattern + ((pattern.charAt(i) - 'a' + 1) * primePower(i))) % MOD;
        }

        for (int i = 0; i < (text.length() - pattern.length() + 1); i++) {
            long currHash = (hashText(i + pattern.length()) + MOD - hashText(i)) % MOD;
            if (currHash == (hashPattern * primePower(i) % MOD)) {
                return true;
            }
        }

        return false;
    }

    public static void main(String() args) {
        System.out.println(new RabinKarpSubstring().isSubstring("aab", "aaaab"));
    }
}

Please do share if anymore information is to added. Thank you.

strings – Replace n or all occurrences of substring in C

I wrote this code as an answer to a question. But I’d like you to have a look at it. This post is basically a copy of the answer I posted

The code does no error checking. It assumes that the output buffer exists and is big enough and that src, orig and new are valid strings. The headers needed:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>
#include <stdint.h>

The main replace function

// Replace the first occurrence of orig in src with new and write result to dest
// Return pointer to first character after the end of first match in src and
// NULL if no match
const char *replace(char *dest, const char *src,
            const char *orig, const char *new) {
    char *ptr = strstr(src, orig); // First match

    // If no match, we want dest to contain a copy of src
    if(!ptr) {
        strcpy(dest, src);
        return NULL;
    }

    const ptrdiff_t offset = ptr - src; // Calculate offset
    const int origlen = strlen(orig);
    
    strncpy(dest, src, offset); // Copy everything before match
    strcpy(&dest(offset), new); // Copy replacement
    strcpy(&dest(offset + strlen(new)), &src(offset + origlen)); // Copy rest

    return src + offset + origlen;
}

Then we just add a function to handle N occurrences

// Replace maximum n occurrences. Stops when no more matches.
// Returns number of replacements
size_t replaceN(char *dest, const char *src, const char *orig, 
                const char *new, size_t n) {
    size_t ret = 0;

    // Maybe an unnecessary optimization to avoid multiple calls in 
    // loop, but it also adds clarity
    const int newlen = strlen(new);
    const int origlen = strlen(orig);
    
    do {
        const char *ptr = replace(dest, src, orig, new); // Replace

        if(!ptr) return ret; // Quit if no more matches

        // Length of the part of src before first match
        const ptrdiff_t offset = ptr - src;
    
        src = ptr; // Move src past what we have already copied.
    
        ret++;

        dest += offset - origlen + newlen; // Advance pointer to dest to the end

    } while(n > ret);

    return ret;
}

And a simple wrapper for all occurrences. Note that it’s safe to use SIZE_MAX here, because there will never be more occurrences than that.

// Replace all. Returns the number of replacements, because why not?
size_t replaceAll(char *dest, const char *src, const char *orig, const char *new) {
    return replaceN(dest, src, orig, new, SIZE_MAX); 
}

It’s easy to write a wrapper for the allocation. We borrow some code from https://stackoverflow.com/q/9052490/6699433

size_t countOccurrences(const char *str, const char *substr) {
    size_t count = 0;

    while((str = strstr(str, substr)) != NULL) {
       count++;
       str++; // We're standing at the match, so we need to advance
    }

    return count;
}

Then some code to calculate size and allocate a buffer

// Allocate a buffer big enough to hold src with n replacements 
// of orig to new
char *allocateBuffer(const char *src, const char *orig, 
                     const char *new, size_t n) {
    return malloc(strlen(src) + 
                  n * (strlen(new) - strlen(orig)) +
                  1 // Remember the zero terminator
                 );
}

And the two final functions

// Allocates a buffer and replaces max n occurrences of orig with new and
// writes it to the allocated buffer.
// Returns the buffer and NULL if allocation failed
char *replaceNAndAllocate(const char *src, const char *orig, 
                          const char *new, size_t n) {
    const size_t count = countOccurrences(src, orig);

    n = n < count ? n : count; // Min of n and count

    char *buf = allocateBuffer(src, orig, new, n);

    if(!buf) return NULL;

    replaceN(buf, src, orig, new, n);

    return buf;
}

// Allocates a buffer and replaces all occurrences of orig with new and
// writes it to the allocated buffer.
// Returns the buffer and NULL if allocation failed
char *replaceAllAndAllocate(const char *src, const char *orig, const char *new) {
    return replaceNAndAllocate(src, orig, new, SIZE_MAX);
}

And finally, a simple main with a test with multiple occurrences and with original string being a substring of replacement string:

int main(void) {
    char src() = "!!!asdf!!!asdf!!!asdf!!!asdf!!!";
    char orig() = "asdf";
    char new() = "asdfaaaaaaaaasdf";
    
    puts(replaceAllAndAllocate(src, orig, new));
}

No warnings with -Wall -Wextra -pedantic and the output is:

$ ./a.out 
!!!asdfaaaaaaaaasdf!!!asdfaaaaaaaaasdf!!!asdfaaaaaaaaasdf!!!asdfaaaaaaaaasdf!!!

string – Common substring problem with punctuation

I have two texts, the first one is without any punctuation while the second one can have any of these characters in it:

 ("."(period), ","(comma), ":"(colon), ";"(semicolon)...)

I have to find the longest common substring between the two texts (by ignoring any kind of punctuation) but in the output the result must be with the original punctuation of the second text. For instance:

Text1: “I heard about the new game and it is so good I guess”

Text2: “Well, so far so good, I guess.”

If we ignore the punctuation of the second text, the most common substring would be: “so good I guess” and the output should be (“so good, I guess.”), notice that any word can appear more than once, as it happens for “so” in text n.2

I removed the punctuation from the second text and I found the common substring but I can’t find a way to map my result (“so good I guess”) to the original one (“so good, I guess.”) in the second text.

Any help will be highly appreciated. Thank you all in advance!

python – Finding the longest repeating substring with suffix arrays

I found an implementation of suffix arrays here (not sure if it’s the most efficient so if there’s any better I’d be glad to know).

For a given string:

data = '1034408'*10

It generates the following suffix array:

suffix_array = (64, 57, 50, 43, 36, 29, 22, 15, 8, 1, 68, 61, 54, 47, 40, 33, 26, 19, 12, 5, 63, 56, 49, 42, 35, 28, 21, 14, 7, 0, 65, 58, 51, 44, 37, 30, 23, 16, 9, 2, 67, 60, 53, 46, 39, 32, 25, 18, 11, 4, 66, 59, 52, 45, 38, 31, 24, 17, 10, 3, 69, 62, 55, 48, 41, 34, 27, 20, 13, 6)

Having:

len(suffix_array) = 70
max(suffix_array) = 69
suffix_array.index(max(suffix_array)) = 60

Is a suffix tree better suited for this task?

How to check whether substring exists in string using strstr() function in C

This must be a simple problem about char data type and pointers.

void main() {

    const char* a;
    char character = 65;
    a = &character;

    printf("%c n", character); // PRINTS 'A' AS EXPECTED

    if (strstr("ABC", a)) { 
        printf("found n");
    }
    else {
        printf("not foundn"); // goes into else
    }
}

I don’t understand why it doesn’t go into first if statement.

python – Merge two dataframes using multiple substring matches

substring_ref = pd.DataFrame({"keyword_1":("Amazon Web Service -", "Virtual Apps"),  "keyword_2": ("","Service")})

sales = pd.DataFrame({"product_name":("Amazon Web Service - Automation", "Web", "Citrix Virtual Apps Premium OnPrem", "Citrix Virtual Apps Premium Service"), "Value":(10, 90, 500, 340)})

I have two dataframes that I want to merge based on either keyword_1 (if only one keyword is present) OR keyword_1 and keyword_2 both being present in the product_name field of the sales dataframe.

The end result should be this:

keyword_1 keyword_2 product_name Value
Amazon Web Service – Amazon Web Service – Automation 10
Web 90
Citrix Virtual Apps Premium OnPrem 500
Virtual Apps Service Citrix Virtual Apps Premium Service 340

Leetcode 5 Longest Palindromic Substring (python)

Why do I get wrong answer using this code? I try to find all possible substrings and find the longest one after storing them into a list. Thank you for your help!

class Solution:
    def longestPalindrome(self, s: str) -> str:
        
            length  = len(s)
            #get all possible substrings
            combinations = (s(i:j) for i in range(length) for j in range(i+1, length+1))

            #print(combinations)

            rev = s(::-1)
            rev_combinations = (rev(i:j) for i in range(length) for j in range(i+1, length+1))

            #print(rev_combinations)

            pan_l = ()
            for i, c in enumerate(combinations)):
                if combinations(i) == rev_combinations(i):
                    pan_l.append(combinations(i))
            
            if pan_l:
                y = max(pan_l, key=len)
                return y
            else:
                return s(0)
    
            
        
        
        

Automata made by the concatenation of the substring of regular languages

Given:
$$
Comp(L, R) = {text{all possible ways to write: };; a(0..n) . b(0 …m) . a(n + 1 … p) . b(m + 1 … q) … | a in L, b in R }
$$

For example if $A = {xx}$ and $B = {yy}$ the result should be:
$$Comp(A, B) = {aab, aba, baa}$$

Prove that given $A, B$ regular languages $Comp(A, B)$ is regular.

I have tried by assuming that $A$ and $B$ are regular i have a DFA thta accepts A and another one that accepts $B$ and connect every state of A to every state of B via an $epsilon$ transition but it seems that i can skip part of A.