python 3.x – Multi-class classification on a large dataset: the data cannot be split, and normal classification cannot be processed

I have a huge (550 MB) Lending Club loan dataset here and need to predict the credit grade (A, B, C, D, E, F, G). The Dask DataFrame looks like this:

    Unnamed: 0  Unnamed: 0.1    loan_amnt   funded_amnt funded_amnt_inv term    int_rate    installment annual_inc  issue_d ... addr_state_SD   addr_state_TN   addr_state_TX   addr_state_UT   addr_state_VA   addr_state_VT   addr_state_WA   addr_state_WI   addr_state_WV   addr_state_WY
0   41131   931434  24000   24000   24000.0 0   8.49    757.51  80000.0 2015    ... 0   0   0   0   0   0   1   0   0   0
1   41132   942549  6000    6000    6000.0  0   11.22   197.06  52000.0 2015    ... 0   0   0   0   0   0   0   0   0   0
2   41135   931619  8000    8000    8000.0  0   9.80    257.39  55000.0 2015    ... 0   0   0   0   0   0   0   0   0   0
3   41136   935204  19975   19975   19975.0 1   12.88   453.27  92000.0 2015    ... 0   0   0   0   0   0   0   0   0   0
...

It is therefore a multi-class classification problem. However, when I try to load the data with pandas, it seems to freeze. When I create a Dask DataFrame instead, I cannot use train_test_split, because I get this:

NotImplementedError: 'DataFrame.iloc' only supports selecting columns. It must be used like 'df.iloc[:, column_indexer]'.

So how can I train and evaluate on even a small portion of this dataset?

Here is my code:

import dask.dataframe as dd

# predicting model
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.constraints import maxnorm
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    # input layer; input_dim (number of feature columns) is assumed to be defined elsewhere
    model.add(Dense(100, input_dim=input_dim, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))

    # hidden layer
    model.add(Dense(60, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))

    # output layer; output_dim (number of classes) is assumed to be defined elsewhere
    model.add(Dense(output_dim, activation='softmax'))
    # compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

from dask.distributed import Client

# load dataset
X = result.loc[:, result.columns != 'TARGET']
Y = result['TARGET']
import sklearn.model_selection
from sklearn.model_selection import train_test_split

from sklearn.externals import joblib
from sklearn.externals.joblib import parallel_backend

client = Client()

with joblib.parallel_backend('dask'):
    print("Before train test split")
    train_X, test_X, train_y, test_y = train_test_split(X, result['TARGET'], test_size = 0.2, random_state = 0)
    print("before one hot encoder 1")
    train_y = pd.get_dummies(train_y)
    print("before one hot encoder 1")
    test_y = pd.get_dummies(test_y)
    print("Before Keras Classifier")
    estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=1)
    kfold = KFold(n_splits=10, shuffle=True)
    results = cross_val_score(estimator, X, pd.get_dummies(Y), cv=kfold)
    print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Here is the complete error message:

C:\Users\antoi\AppData\Roaming\Python\Python36\site-packages\sklearn\externals\joblib\__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
Before train test split
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
 in 
      6 with joblib.parallel_backend('dask'):
      7     print("Before train test split")
----> 8     train_X, test_X, train_y, test_y = train_test_split(X, result['TARGET'], test_size = 0.2, random_state = 0)
      9     print("before one hot encoder 1")
     10     train_y = pd.get_dummies(train_y)

~\AppData\Roaming\Python\Python36\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2122 
   2123     return list(chain.from_iterable((safe_indexing(a, train),
-> 2124                                      safe_indexing(a, test)) for a in arrays))
   2125 
   2126 

~\AppData\Roaming\Python\Python36\site-packages\sklearn\model_selection\_split.py in <genexpr>(.0)
   2122 
   2123     return list(chain.from_iterable((safe_indexing(a, train),
-> 2124                                      safe_indexing(a, test)) for a in arrays))
   2125 
   2126 

~\AppData\Roaming\Python\Python36\site-packages\sklearn\utils\__init__.py in safe_indexing(X, indices)
    206         # Pandas Dataframes and Series
    207         try:
--> 208             return X.iloc[indices]
    209         except ValueError:
    210             # Cython typed memoryviews internally used in pandas do not support

~\AppData\Roaming\Python\Python36\site-packages\dask\dataframe\indexing.py in __getitem__(self, key)
     52         )
     53         if not isinstance(key, tuple):
---> 54             raise NotImplementedError(msg)
     55 
     56         if len(key) > 2:

NotImplementedError: 'DataFrame.iloc' only supports selecting columns. It must be used like 'df.iloc[:, column_indexer]'.

The only idea I have left is to process the file in chunks, along these lines:

for chunk in pd.read_csv(..., chunksize=...):
    do_processing()
    train_algorithm()
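
For what it's worth, here is a minimal sketch of that chunked approach. The CSV path and grade list are hypothetical, it assumes the target column is called TARGET as above, and it assumes input_dim / output_dim are already set for baseline_model(); the point is only that the model can be updated one chunk at a time instead of loading the whole 550 MB file.

# Sketch: incremental training on chunks (CSV_PATH and GRADES are assumptions).
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

CSV_PATH = "loans.csv"                       # assumption: path to the 550 MB CSV
GRADES = ["A", "B", "C", "D", "E", "F", "G"]

encoder = LabelEncoder().fit(GRADES)
model = baseline_model()                     # the function defined above; input_dim/output_dim must be set

for chunk in pd.read_csv(CSV_PATH, chunksize=10000):
    # One-hot encode the grade labels of this chunk.
    y = np_utils.to_categorical(encoder.transform(chunk["TARGET"]), num_classes=len(GRADES))
    X_chunk = chunk.drop(columns=["TARGET"]).values
    # Each call to fit() continues from the current weights, so the model
    # is updated chunk by chunk without holding the full dataset in memory.
    model.fit(X_chunk, y, epochs=1, batch_size=32, verbose=1)

Alternatively, if splitting the Dask DataFrame is the only blocker, dask_ml (if installed) ships its own train_test_split in dask_ml.model_selection that accepts Dask objects, which avoids the iloc limitation raised above.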

"Not applicable" for simple Cisco IOS OVAL definition processed in the Joval tool

I'm trying to run a line test against a Cisco IOS device. I use SSH and enable Privileged EXEC mode (Joval can even do this automatically). Here is my line_object:

show virtual-service version installed

And here is my line_state:

show virtual-service version installed
csr_mgmt

I know that my Cisco returns:


          
show virtual-service version installed
Virtual service csr_mgmt installed version:
 Name : csr_mgmt
 Version : 1.2.1

This should evaluate to "true" because "csr_mgmt" is present (I also tried regular expressions, including just ".*"). However, the result of my test is "Not evaluated" and the definition comes back "Not applicable". I get the same behavior on several platforms (Joval and another proprietary solution). I have also reviewed the OVALRepo from MITRE, and their definitions are configured the same way. Is there something I have missed, or should this be correct?

MS SQL Server returns 500 errors when too many API requests are processed

We have API requests that insert data into MS SQL Server. After a certain number of API requests have been processed, the server starts returning 500 errors.

When we restart SQL Server, API requests are processed and inserted into the database.
We have tried:

DBCC DROPCLEANBUFFERS
DBCC FLUSHPROCINDB
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE

Even when these commands do not clear the cache, a restart does. Is there any command we can use here other than a restart?

Design Pattern – How Are Dynamic Product Configuration Rules Processed?

I need to find a software architecture to support object configuration and validation in a dynamic environment.

Let's say I want to configure a bike for which many physical components are available: components such as tires, frames, gears, brakes, etc. Each of these components has its own attributes, for example:

  • tires
  • gears
    • material
    • number of speeds
  • frame

There are also physical and conceptual rules that define how components can be combined together to make a bicycle.
At the moment, I think truth tables are a reasonable approach to this problem, since they let me specify how components can be related. For example:

|-----------------------------------------|---------------|
|                    Input                | Output        |
| Tires(profile) | Gears (numberOfSpeeds) | Frame (size)  |
|----------------|------------------------|---------------|
| 21 mm          | 4 speeds               | 21"           |
| 22 mm          | 6 speeds               | 22"           |
| 22 mm          | 8 speeds               | 23"           |
|----------------|------------------------|---------------|

This means that when the user sends profile = 22 mm and numberOfSpeeds = 6 speeds to the rule engine, the engine determines the frame size = 22".
However, if a new attribute is added to the truth table to support a particular configuration, how can I keep supporting all previous conditions without changing them, while still handling the specific case?

For example, there is a new configuration that is specific to the gear material:

|------------------------------------------------------------|---------------|
|                               Input                        | Output        |
| Tires(profile) | Gears (numberOfSpeeds) | Gears (material) | Frame (size)  |
|----------------|------------------------|------------------|---------------|
| 21 mm          | 4 speeds               |                  | 21"           |
| 22 mm          | 6 speeds               |                  | 22"           |
| 22 mm          | 8 speeds               |                  | 23"           |
| 22 mm          | 8 speeds               | Carbon           | 24"           |
|----------------|------------------------|------------------|---------------|

How can I guarantee that the previous rules still execute as before, without redesigning them (for example by adding specific values or new rows for the gear material to the existing rules for frames smaller than 24")?
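
One way to get that guarantee is to treat each row as a partial condition that only constrains the attributes it mentions, and to let the most specific matching row win. Here is a minimal sketch of that idea in Python (the class and attribute names are purely illustrative, not from the question):

class Rule:
    def __init__(self, conditions, output):
        self.conditions = conditions    # attribute name -> required value
        self.output = output            # e.g. {"frame.size": '22"'}

    def matches(self, config):
        # A rule only constrains the attributes it mentions, so attributes
        # added later (such as gears.material) are ignored by older rules.
        return all(config.get(k) == v for k, v in self.conditions.items())

RULES = [
    Rule({"tires.profile": "21 mm", "gears.numberOfSpeeds": 4}, {"frame.size": '21"'}),
    Rule({"tires.profile": "22 mm", "gears.numberOfSpeeds": 6}, {"frame.size": '22"'}),
    Rule({"tires.profile": "22 mm", "gears.numberOfSpeeds": 8}, {"frame.size": '23"'}),
    # New, more specific rule; the rows above stay untouched.
    Rule({"tires.profile": "22 mm", "gears.numberOfSpeeds": 8, "gears.material": "Carbon"},
         {"frame.size": '24"'}),
]

def resolve(config):
    # Among all matching rules, the one with the most conditions wins,
    # so the carbon-specific row overrides the generic 8-speed row.
    matching = [r for r in RULES if r.matches(config)]
    return max(matching, key=lambda r: len(r.conditions)).output if matching else {}

print(resolve({"tires.profile": "22 mm", "gears.numberOfSpeeds": 6}))
# {'frame.size': '22"'}
print(resolve({"tires.profile": "22 mm", "gears.numberOfSpeeds": 8, "gears.material": "Carbon"}))
# {'frame.size': '24"'}

This is essentially the specificity ordering used by decision tables and many production rule engines: old rows never have to be edited when a new attribute appears, because they simply do not mention it.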

litecoin – Which file / method handles the block reward?

I've managed to build an altcoin (from the Litecoin sources), but my coin requires changes to the way block rewards are created. I looked through the source code to find out where block rewards are handled and which method does the work. I found this (in which line of which file is the block reward determined?), but that seems to be outdated, since I do not see a main.cpp file in the Litecoin source.

Does anyone know the modern files / methods that handle it? I've already managed to follow the steps to create my Genesis block, and everything seems to work. I am now in the process of making changes. I want to find the method that deals with building a block with a block reward. I will change how the block reward is made.

While I can see that chainparams.cpp allows the reward amount to be changed, I'm looking to build a new reward system, so I need to look at how coins are created out of nothing. My coin will not mint coins when the block is created, so I need to change the way this method works.

Are settings on the camera for tone, color and flash exposure applied, and possibly processed in between, on a Canon DSLR?

Post-processing – Dataset of RAW images together with their processed final output

I'm trying to learn how to process RAW images into great final images, and my attempts keep failing. I think it would be really interesting and instructive to see what other people have done. In particular, I would love to find a collection of RAW images plus the final image created from each RAW file, to better understand the process. Does anyone know of such a public dataset?

Plotting – How can data be grouped and processed as needed?

I have a list of data describing the coordinates (x, y) of points in the plane. For the same x value, several points may be present. If the number of points for an x value is odd, simply ignore the first or the last point. I want to know the average difference between the y values for the same x coordinate.
Part of the data is

data = {{0.1, 1.21109}, {0.1, 1.16829}, {0.1, 1.21109}, {0.1, 1.16829}, {0.1,
1.21109}, {0.1, 1.16829}, {0.1, 1.21109}, {0.1, 1.16829}, {0.1,
1.21109}, {0.1, 1.16829}, {0.1, 1.21109}, {0.1, 1.16829}, {0.1,
1.21109}, {0.1, 1.16829}, {0.1, 1.21109}, {0.1, 1.16829}, {0.15,
1.17271}, {0.15, 1.20571}, {0.15, 1.17271}, {0.15, 1.20571}, {0.15,
1.17271}, {0.15, 1.20571}, {0.15, 1.17271}, {0.15, 1.20571}, {0.15,
1.17271}, {0.15, 1.20571}, {0.15, 1.17271}, {0.15, 1.20571}, {0.15,
1.17271}, {0.15, 1.20571}, {0.15, 1.17271}, {0.15, 1.20571}, {0.15,
1.17271}, {0.15, 1.20571}, {0.15, 1.17271}, {0.15, 1.20571}, {0.15,
1.17271}, {0.15, 1.20571}, {0.15, 1.17271}, {0.15, 1.20571}, {0.2,
1.17552}, {0.2, 1.20246}, {0.2, 1.17552}, {0.2, 1.20246}, {0.2,
1.17552}, {0.2, 1.20246}, {0.2, 1.17552}, {0.2, 1.20246}, {0.2,
1.17552}, {0.2, 1.20246}};

(Plot of the data: https://i.stack.imgur.com/ZWbNp.jpg)

Any ideas?
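
To make the intended computation concrete, here is a minimal sketch (written in Python purely for illustration, and assuming that "difference" means the gap between consecutive y values at the same x, dropping one point when the count is odd):

from collections import defaultdict

# A few points in the same shape as the data above.
data = [(0.1, 1.21109), (0.1, 1.16829), (0.1, 1.21109), (0.1, 1.16829),
        (0.15, 1.17271), (0.15, 1.20571), (0.2, 1.17552), (0.2, 1.20246)]

# Group the y values by their x coordinate.
groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

# For each x, pair consecutive points (ignoring the last one if the count
# is odd) and report the average absolute difference.
for x, ys in sorted(groups.items()):
    if len(ys) % 2:
        ys = ys[:-1]
    diffs = [abs(a - b) for a, b in zip(ys[0::2], ys[1::2])]
    if diffs:
        print(x, sum(diffs) / len(diffs))

(In Mathematica itself, the grouping step corresponds to GatherBy[data, First].)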

python – Manage Data: Unprocessed data, processed data, and results from processed data

I've created a Python process that automates part of the workflow for handling support tickets in Salesforce. With Selenium I extract all open support tickets and save them to an archive. From this list I do a few things (finding duplicate account records and fetching additional account information) and save the results in two different directories: one for support tickets of non-duplicate accounts and another for potential duplicate records. For each of these directories I continue processing and save the results in another pair of directories.

Here's an example of my directory structure:

ticket_processor
|-- ticket_archive
|   |-- all_tickets_06_17_2019.pkl
|   |-- all_tickets_06_18_2019.pkl
|   |-- all_tickets_06_19_2019.pkl
|   `-- all_tickets_06_20_2019.pkl
|-- non_duplicates
|   |-- approved
|   |   |-- processing_06_20_2019.pkl
|   |   |-- processing_06_21_2019.pkl
|   |   `-- processing_06_22_2019.pkl
|   |-- failures
|   |   |-- failed_06_20_2019.pkl
|   |   `-- failed_06_22_2019.pkl
|   `-- unprocessed
|       |-- staged_for_approval_06_19_2019.pkl
|       `-- staged_for_approval_06_21_2019.pkl
`-- potential_duplicates
    |-- potential_duplicates_06_20_2019.pkl
    |-- potential_duplicates_06_21_2019.pkl
    `-- potential_duplicates_06_22_2019.pkl

I'd like to know whether there are software paradigms for managing these ever-growing directories, i.e. for determining the current state of the pipeline and which subset of the data is being processed.

My current approach is to create a TotalTicketScanner class that simply extracts all ticket numbers (the unique ID of each support ticket) and builds a set of ticket numbers for each directory. That way I can make sure a ticket is not already somewhere in this pipeline, so I do not duplicate my effort.
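
A minimal sketch of that scanner idea, assuming (hypothetically) that each .pkl file holds an iterable of records with a ticket-number field and that the root directory is named ticket_processor as in the tree above:

import pickle
from pathlib import Path

def ticket_numbers_by_directory(root, id_field="ticket_number"):
    """Map each subdirectory name to the set of ticket numbers stored in its .pkl files."""
    seen = {}
    for pkl in Path(root).rglob("*.pkl"):
        with open(pkl, "rb") as fh:
            records = pickle.load(fh)
        ids = {record[id_field] for record in records}
        seen.setdefault(pkl.parent.name, set()).update(ids)
    return seen

def is_new_ticket(ticket_number, seen):
    """True if the ticket appears nowhere in the pipeline yet."""
    return all(ticket_number not in ids for ids in seen.values())

seen = ticket_numbers_by_directory("ticket_processor")

Keeping such an index in one place also makes the current state easy to query: the set of directories a ticket number appears in tells you how far it has moved through the pipeline.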

I realize this problem is not well defined, and I would be glad if someone pointed me to information about problems of this kind.

Thank you very much!

Refusal of entry – ESTA refused, visa still being processed

To travel under the Visa Waiver Program and arrive by plane, you need an ESTA. Without one you cannot board a plane to the USA.

Technically, you could fly to Canada or Mexico and drive across the border to the US. However, if your ESTA has been rejected, it means that the US has a reason why they may not want you in the country. Depending on the circumstances, there is a high probability that you will be rejected at the border.