Python – Why do I get the error "expected string or bytes-like object" when I call "re.sub" in my code?

I have a CSV file with 10 columns. My project is to classify the reviews in my file as good or bad using NLP. When I tokenize the column that stores the reviews (the review text column) with the re.sub method, the error "expected string or bytes-like object" is thrown.

I have attached my CSV file and the code that I tried in Jupyter Notebook.
Please help me with the completion of this project.

This is my data file

This is my code so far; the error is on the line with 're'.

import numpy as np
import pandas as pd
import nltk
import matplotlib

dataset = pd.read_csv("C:/Users/a/Downloads/data.tsv", delimiter = "\t", quoting = 1)
dataset.head()

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
corpus = []
for i in range(0, 1000):
  review = re.sub('[^a-zA-Z]', ' ', dataset['Review Text'][i])
  review = review.lower()
  review = review.split()
  ps = PorterStemmer()
  review = [ps.stem(word) for word in review if not word in 
  set(stopwords.words('english'))]
  review = ' '.join(review)
  corpus.append(review)

How do I correct my mistake and continue from here? The next steps I want to do are vectorization, training and classification.
Please help me.
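
One likely cause, assuming the review column contains empty cells: pandas reads missing values as NaN, which is a float, and re.sub raises this TypeError for anything that is not a string. The sketch below reproduces the error without pandas and shows one way to guard against it. The helper name clean_review and the sample list reviews are hypothetical and only serve the illustration.

```python
import re

def clean_review(value):
    """Keep letters only, lowercase, and collapse whitespace.

    Non-string values (for example NaN from an empty cell) become ''.
    """
    if not isinstance(value, str):
        return ""
    letters_only = re.sub(r"[^a-zA-Z]", " ", value)
    return " ".join(letters_only.lower().split())

# Hypothetical sample data: the middle entry mimics an empty CSV cell.
reviews = ["Great dress, fits well!", float("nan"), "Too small :("]

# Passing the float straight to re.sub reproduces the error from the question.
try:
    re.sub(r"[^a-zA-Z]", " ", reviews[1])
except TypeError as exc:
    error_message = str(exc)

corpus = [clean_review(r) for r in reviews]
```

With a real DataFrame, the same guard could be applied per row, or the missing rows could be dropped or filled before the loop; which of these is appropriate depends on how the labels line up with the reviews.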

Performance – Program in Java to generate a random alphanumeric string

This code generates a pseudo-random alphanumeric string of a given length.
I would welcome suggestions on how to make it more random. Have I also violated any conventions, missed edge cases, or the like?
Is there any way to make it faster?

public class Test
{
    public static String getRandomAlphaNum(int length)
    {
        String charstring = "abcdefghijklmnopqrstuvwxyz0123456789";
        String randalphanum = "";
        double randroll;
        char randchar;
        for (double i = 0; i < length; i++)
        {
            randroll = Math.random();
            randchar = '@';
            for (int j = 1; j <= 36; j++)
            {
                if (randroll <= (1.0 / 36.0 * j))
                {
                    randchar = charstring.charAt(j - 1);
                    break;
                }
            }
            randalphanum += randchar;
        }
        return randalphanum;
    }
}
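
On the "more random" and "faster" points, one common idea is to pick an index into the alphabet directly instead of scanning 36 thresholds per character, and to draw from a cryptographically strong source. The sketch below shows that idea in Python; the names ALPHABET and random_alphanum are hypothetical. In Java, the rough equivalents would be java.security.SecureRandom with nextInt(36) and a StringBuilder instead of repeated string concatenation.

```python
import secrets
import string

# Same 36 characters as charstring in the question.
ALPHABET = string.ascii_lowercase + string.digits

def random_alphanum(length):
    """Return a random lowercase alphanumeric string of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # secrets.choice picks one element uniformly using the OS entropy source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Rejecting negative lengths explicitly covers an edge case that the loop in the question silently maps to an empty string.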

Python string contains a newline symbol (\\n). How can you use regex to replace \\n with \n?

I have a single string

test = " \\n this \\nis\\n\\nto learn \\n\\n\\n regex \\n in \\n\\n python."

I used re.sub to find \\n and replace it with \n, but it does not work

check = re.sub("\\n", "\n", test)

Expected result:

this 
is 
to 
learn
regex
in 
python

There is another way to do the same thing, but when I compare the time taken, regex wins.

Other option:

# I need to loop through every single word in the string, plus it doesn't change 
  if I have data like this "\\nthis\\n\\nis" 

check = test.replace("\\n", "\n")
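
A minimal sketch of the replacement, assuming the string really holds the two characters backslash and n rather than real newlines. In a regex, a literal backslash has to be escaped, so the pattern is written as the raw string r"\\n". The non-raw "\\n" from the question reaches the regex engine as \n, which matches a real newline instead.

```python
import re

# Each "\\n" in this literal is two characters: a backslash and the letter n.
test = " \\n this \\nis\\n\\nto learn \\n\\n\\n regex \\n in \\n\\n python."

# r"\\n" is the regex for a literal backslash followed by 'n'.
check = re.sub(r"\\n", "\n", test)

# str.replace needs no regex escaping and replaces every occurrence in one call.
same = test.replace("\\n", "\n")
```

Both calls produce the same string here, and neither requires looping over words; str.replace handles all occurrences on its own.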

Java – Compare string with a number

I have a small question: how do I compare a String with a number?
I have the following:

String weight = st.nextToken();
        if(weight >= 15) {

        }

But it gives me this error:

The operator >= is undefined for the argument type(s) String, int

I do not know how to compare them, although I know that the token is a number.
(I'm reading from a delimited file, so I'm using StringTokenizer).
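
The error says that the comparison needs a number on both sides, so the token has to be parsed before it is compared. In Java that is typically done with Integer.parseInt(weight) (or Double.parseDouble for decimals), which throws NumberFormatException if the token is not numeric. The same parse-then-compare idea is sketched below in Python; the helper name weight_at_least is hypothetical.

```python
def weight_at_least(token, threshold=15):
    """Parse a text token as an integer and compare it with threshold.

    Returns False when the token is not a valid integer.
    """
    try:
        weight = int(token.strip())
    except ValueError:
        return False
    return weight >= threshold
```

Whether a malformed token should count as False or abort the run is a design choice that depends on how trustworthy the delimited file is.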

C++ – LZ77 compression (even longest string match)

I am implementing the LZ77 compression algorithm. To compress any file type, I take its binary representation and read it as chars (since 1 char equals 1 byte, afaik) into a std::string. The current version of the program compresses and decompresses files (.txt, .bmp, etc.) just fine: the size of the decompressed file in bytes matches the size of the original uncompressed file. Still, I began to wonder whether using the byte representation instead of bits is optimal at all:

  1. Is it optimal to use chars (bytes) instead of single bits? Is there any possible loss of bits?

Is there also a way to compare the file size in bits instead of bytes? (Forgive me for the stupid questions.)


And now to the actual code. Here is brief information about how LZ77 handles compression:
Sliding window consists of 2 buffers
Example of compression

Here are 2 main functions: compress and findLongestMatch:

  • compress moves char data between 2 buffers and stores coded tuples ⟨Offset, length, nextchar⟩
  • findLongestMatch finds the longest match of lookheadBuffer in
    history buffer
  1. Is there a more elegant and efficient way to search for the longest match?

(Also, theoretically the algorithm should search from right to left, but does that make any difference in complexity? Even if the offset is actually longer, it is still an int and still 4 bytes; I convert every int into 4 chars (bytes) before saving it to the output binary file.)


LZ77::Triplet LZ77::slidingWindow::findLongestPrefix()
{
    // Minimal tuple (if no match >1 is found)
    Triplet n(0, 0, lookheadBuffer[0]);

    size_t lookCurrLen = lookheadBuffer.length() - 1;
    size_t histCurrLen = historyBuffer.length();

    // Increasing the substring (match) length on every iteration 
    for (size_t i = 1; i <= std::min(lookCurrLen, histCurrLen); i++)
    {
        // Getting the substring
        std::string s = lookheadBuffer.substr(0, i);

        size_t pos = historyBuffer.find(s);
        if (pos == std::string::npos)
            break;

        if ((historyBuffer.compare(histCurrLen - i, i, s) == 0) && (lookheadBuffer[0] == lookheadBuffer[i]))
            pos = histCurrLen - i;

        // If the match ends at the end of historyBuffer, check whether it
        // repeats right after the current longest substring in lookheadBuffer
        int extra = 0;
        if (histCurrLen == pos + i)
        {
            // Check for full following repeats
            while ((lookCurrLen >= i + extra + i) && (lookheadBuffer.compare(i + extra, i, s) == 0))
                extra += i;

            // Check for partial following repeats
            int extraextra = i - 1;
            while (extraextra > 0)
            {
                if ((lookCurrLen >= i + extra + extraextra) && (lookheadBuffer.compare(i + extra, extraextra, s, 0, extraextra) == 0))
                    break;
                extraextra--;
            }

            extra += extraextra;
        }

        // Compare the lengths of matches
        if (n.length <= i + extra)
            n = Triplet(histCurrLen - pos, i + extra, lookheadBuffer[i + extra]);
    }

    return n;
}

void LZ77::compress()
{
    do
    {
        if ((window.lookheadBuffer.length() < window.lookBufferMax) && (byteDataString.length() != 0))
        {
            int len = window.lookBufferMax - window.lookheadBuffer.length();
            window.lookheadBuffer.append(byteDataString, 0, len);
            byteDataString.erase(0, len);
        }

        LZ77::Triplet tiplet = window.findLongestPrefix();

        // Move the used part of lookheadBuffer to historyBuffer
        window.historyBuffer.append(window.lookheadBuffer, 0, tiplet.length + 1);
        window.lookheadBuffer.erase(0, tiplet.length + 1);

        // If historyBuffer's size exceeds max, delete oldest part 
        if (window.historyBuffer.length() > window.histBufferMax)
            window.historyBuffer.erase(0, window.historyBuffer.length() - window.histBufferMax);

        encoded.push_back(tiplet);

    } while (window.lookheadBuffer.length());
} 

Helper functions:

int intFromBytes(std::istream& is)
{
    char bytes[4];
    for (int i = 0; i < 4; ++i)
        is.get(bytes[i]);

    int integer;
    std::memcpy(&integer, &bytes, 4);
    return integer;
}


void intToBytes(std::ostream& os, int value)
{
    char bytes[4];
    std::memcpy(&bytes, &value, 4);
    os.write(bytes, 4);
}

struct Triplet
{
    int offset;
    int length;
    char next;
};
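
For comparison with findLongestPrefix, here is a small Python sketch of a brute-force longest-match search. It tries every start position in the history buffer and lets the match run over into the lookahead buffer, which covers the "repeats" case without separate handling. The function name find_longest_match and the tuple layout are assumptions made for this illustration, not the question's API.

```python
def find_longest_match(history, lookahead):
    """Return (offset, length, next_char) for the longest prefix of
    lookahead that starts inside history.

    The match may continue past the end of history into lookahead.
    lookahead must be non-empty; one character is kept for next_char.
    """
    best_offset, best_length = 0, 0
    max_length = len(lookahead) - 1  # reserve one character for next_char
    window = history + lookahead     # lets a match overlap the lookahead
    for start in range(len(history)):
        length = 0
        while length < max_length and window[start + length] == lookahead[length]:
            length += 1
        # ">=" prefers the match closest to the end of history (smaller offset)
        if length > 0 and length >= best_length:
            best_offset, best_length = len(history) - start, length
    return best_offset, best_length, lookahead[best_length]
```

This runs in time proportional to the history length times the lookahead length. Scanning direction only changes which of several equally long matches is found, not the worst-case cost. Practical implementations often speed up the search with hash chains or similar indexes over the history buffer.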

How do I convert URL to String in Swift?

In my code, I try to read the URL of an image that I have just saved successfully to Firebase Storage. An error occurs at the statement let url1 = url as! String.
I need to convert the URL to a String, but the cast does not work.
The print statement works fine: I see the correct URL in the log when printing (url?.absoluteURL as Any).

   let armazenamento = Storage.storage().reference()
    let imagens = armazenamento.child("imagens")
    if let imagemSelecionada=imagem.image {
        if let imagemDados = imagemSelecionada.jpegData(compressionQuality: 0.1) {
            imagens.child("\(self.idImagem).jpg").putData(imagemDados, metadata: nil, completion: {(metaDados, erro) in
                if erro == nil {
                    print("Sucess upload")
                    imagens.child("\(self.idImagem).jpg").downloadURL(completion: { (url, erro) in
                        if(erro == nil)
                        {
                            print(url?.absoluteURL as Any )
                            let url1 = url as! String
                            self.performSegue(withIdentifier: "selecionarUsuarioSegue", sender: url1)
                        }else{
                            print(erro!)
                        }
                    })
                }
            })
        }
    }

Reductions – Decide if the language of the Turing machine contains an a or a b string

During the school exercises we worked on decision problems and there was one that I do not really understand. We received solutions and explanations for this exercise, but I need more guidance.

The problem is:

Prove that
$L = \{\langle M \rangle \mid M \text{ is a Turing machine}, |L(M) \cap \{a, b\}| = 1\} \notin REC$

Proof idea:

  1. Reduction from the halting problem (HP)
  2. The machine $M_x$ (the output of the reduction function) checks whether its input is $a$ and stores this information
  3. If $\langle M_h, W_h \rangle$ (the instance of HP) does not have the required structure, $M_x$ rejects
  4. $M_x$ simulates $M_h$ on $W_h$; if $M_h$ accepts, $M_x$ accepts if its input was $a$ and otherwise rejects; if $M_h$ loops, $M_x$ loops as well

then

  • $L(M) = \emptyset$ if $\langle M_h, W_h \rangle$ has an invalid structure or $M_h$ loops on the word $W_h$
  • $L(M) = \{a\}$ if $\langle M_h, W_h \rangle$ has the right structure and $M_h$ halts

How I see it:

  • $M_x$ stores information about its input
    • if $M_h$ halts, $M_x$ checks the stored information about its input: if it was $a$ or $b$, it accepts, otherwise it rejects

then

  • $L(M) = \emptyset$ if $\langle M_h, W_h \rangle$ has an invalid structure or $M_h$ loops on the word $W_h$
  • $L(M) = \{a\}$ if $\langle M_h, W_h \rangle$ has the right structure and $M_h$ halts with input $a$
  • $L(M) = \{b\}$ if $\langle M_h, W_h \rangle$ has the right structure and $M_h$ halts with input $b$

I do not understand what happened to $b$.

Converting – fasm: convert a hexadecimal byte into a binary string

This fasm x64 (Linux) code seems to be very crude and repetitive, but it does the job. How could I do this task in a more idiomatic way?

format ELF64 executable 3

segment readable executable

entry $

prompt_user:
mov edx,prompt_len  
lea rsi,[prompt] ;<------ String
mov edi,1       ; STDOUT
mov eax,1       ; sys_write
syscall

get_user_input:
mov rax, 0 ; sys_read
mov rdi, 0 ; STDIN
lea rsi, [bit_string] ; <----- defined memory location
mov rdx, 2 ; <---- # of chars to read
syscall

;DEBUG CODE:
;lea rax, [bit_string]
;mov [rax], byte 'A'
;mov [rax+1], byte 'B'
lea rax, [bit_string]
cmp [rax], byte '0'
je load_0
cmp [rax], byte '1'
je load_1
cmp [rax], byte '2'
je load_2
cmp [rax], byte '3'
je load_3
cmp [rax], byte '4'
je load_4
cmp [rax], byte '5'
je load_5
cmp [rax], byte '6'
je load_6
cmp [rax], byte '7'
je load_7
cmp [rax], byte '8'
je load_8
cmp [rax], byte '9'
je load_9
cmp [rax], byte 'A'
je load_A
cmp [rax], byte 'B'
je load_B
cmp [rax], byte 'C'
je load_C
cmp [rax], byte 'D'
je load_D
cmp [rax], byte 'E'
je load_E
cmp [rax], byte 'F'
je load_F
load_0:
lea rsi, [n0]
jmp print_nibble
load_1:
lea rsi, [n1]
jmp print_nibble
load_2:
lea rsi, [n2]
jmp print_nibble
load_3:
lea rsi, [n3]
jmp print_nibble
load_4:
lea rsi, [n4]
jmp print_nibble
load_5:
lea rsi, [n5]
jmp print_nibble
load_6:
lea rsi, [n6]
jmp print_nibble
load_7:
lea rsi, [n7]
jmp print_nibble
load_8:
lea rsi, [n8]
jmp print_nibble
load_9:
lea rsi, [n9]
jmp print_nibble
load_A:
lea rsi, [nA]
jmp print_nibble
load_B:
lea rsi, [nB]
jmp print_nibble
load_C:
lea rsi, [nC]
jmp print_nibble
load_D:
lea rsi, [nD]
jmp print_nibble
load_E:
lea rsi, [nE]
jmp print_nibble
load_F:
lea rsi, [nF]
jmp print_nibble

nibble_two:
lea rax, [bit_string]
cmp [rax+1], byte '0'
je load_0
cmp [rax+1], byte '1'
je load_1
cmp [rax+1], byte '2'
je load_2
cmp [rax+1], byte '3'
je load_3
cmp [rax+1], byte '4'
je load_4
cmp [rax+1], byte '5'
je load_5
cmp [rax+1], byte '6'
je load_6
cmp [rax+1], byte '7'
je load_7
cmp [rax+1], byte '8'
je load_8
cmp [rax+1], byte '9'
je load_9
cmp [rax+1], byte 'A'
je load_A
cmp [rax+1], byte 'B'
je load_B
cmp [rax+1], byte 'C'
je load_C
cmp [rax+1], byte 'D'
je load_D
cmp [rax+1], byte 'E'
je load_E
cmp [rax+1], byte 'F'
je load_F

print_intro:
mov edx,intro_len   
lea rsi,[intro] 
mov edi,1       ; STDOUT
mov eax,1       ; sys_write
syscall
ret

print_nibble:
cmp [position], 0 ; If we're on the first iteration, print the intro
jne skip_intro
push rsi
call print_intro
pop rsi

skip_intro:
mov rax, 1
cmp al, [position]
jl go_out
mov edx, 4  

mov edi,1       ; STDOUT
mov eax,1       ; sys_write
syscall
add [position], 1
jmp nibble_two

go_out:
mov edx,1   

lea rsi,[nl] 
mov edi,1       ; STDOUT
mov eax,1       ; sys_write
syscall

xor edi,edi     ; exit code 0
mov eax,60      ; sys_exit
syscall

segment readable writeable
prompt db 'Enter a byte in the format: F6', 0xA
prompt_len = $ - prompt
nl db 0xA
intro db 'In binary: ', 0
intro_len = $ - intro
bit_string rb 2
position db 0
n0 db '0000', 0
n1 db '0001', 0
n2 db '0010', 0
n3 db '0011', 0
n4 db '0100', 0
n5 db '0101', 0
n6 db '0110', 0
n7 db '0111', 0
n8 db '1000', 0
n9 db '1001', 0
nA db '1010', 0
nB db '1011', 0
nC db '1100', 0
nD db '1101', 0
nE db '1110', 0
nF db '1111', 0
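
One way to avoid the long compare-and-jump chains is to turn each hex character into its 4-bit value arithmetically and then emit the bits with shifts and masks. The Python sketch below illustrates only that logic; the function name hex_byte_to_binary is hypothetical. In assembly, the same steps would roughly map to subtracting the character offset to get the digit value, followed by a shift-and-test loop over the bits.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_byte_to_binary(text):
    """Convert a hex string such as 'F6' into a binary string such as '11110110'."""
    value = 0
    for ch in text.upper():
        digit = HEX_DIGITS.index(ch)   # raises ValueError on a non-hex character
        value = (value << 4) | digit   # append the nibble to the running value
    bits = []
    # Walk from the most significant bit down to bit 0.
    for shift in range(len(text) * 4 - 1, -1, -1):
        bits.append("1" if (value >> shift) & 1 else "0")
    return "".join(bits)
```

This replaces the sixteen string constants and both comparison chains with one loop per stage, and it also makes invalid input an explicit error instead of falling through to load_0.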