Algorithms – Shuffle a collection that is too large for memory

I have a set that is too large to hold in memory, but I have a function that I can use to index a value within the set. I'm curious whether there is a standard way to shuffle this set. A pseudo-shuffle is acceptable and expected.

I imagine a function that is initialized with some parameters, including the size of the set to be shuffled, that I can call repeatedly, each time getting a unique, pseudorandom index that I can use to generate the corresponding value in my set. I also imagine that this function is periodic and repeats itself after going through all the values once (so there is a mod relationship).
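For concreteness, this is the kind of interface I have in mind. The sketch below is only one assumed way to get that behaviour (a small Feistel network over the index bits plus cycle-walking, with an arbitrary hash-based round function), and the names are made up, but it is initialized once with the set size and then maps each index 0..n-1 to a unique pseudorandom index in the same range:

import hashlib

def make_index_permutation(n, key):
    # Bijection on range(n): a 4-round Feistel network permutes the index bits,
    # and results outside [0, n) are re-permuted ("cycle-walking"), which keeps
    # the mapping one-to-one. `key` selects one of many possible shuffles.
    half = (max(1, (n - 1).bit_length()) + 1) // 2
    mask = (1 << half) - 1

    def round_fn(r, value):
        data = f"{key}:{r}:{value}".encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & mask

    def permute(i):
        left, right = i >> half, i & mask
        for r in range(4):
            left, right = right, left ^ round_fn(r, right)
        return (left << half) | right

    def shuffled_index(i):
        j = permute(i)
        while j >= n:  # walk the cycle until we land back inside [0, n)
            j = permute(j)
        return j

    return shuffled_index

# Usage: visit a set of 10**9 items in pseudorandom order without storing it.
# shuffle = make_index_permutation(10**9, key="seed-1")
# shuffle(0), shuffle(1), ... yield each index in [0, 10**9) exactly once.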

Thanks a lot.

Terminology: "Sufficiently large absolute constant"

I am currently reading the paper "Random matrices: The distribution of the smallest singular values" by Terence Tao and Van Vu and came across a term that I don't fully (rigorously) understand.

In Theorem 1.3, the authors state that $\mathbb{E}(|\xi|^{C_0}) < \infty$ for some sufficiently large absolute constant $C_0 > 0$. What does "sufficiently large absolute constant" mean? I googled it, but I couldn't find a definition.

postgresql – Fill a table with X rows of default values with one large insert query? Or can it be done somehow by default?

I have a table, let's say calendar

Table Calendar --> idCal (int) , numberOfFields (int)

Each calendar has a number of fields assigned to it:

Table Field --> idField(int), textField(text), idCal (int) *fk

Now, every time a user registers, they are assigned a calendar. Once numberOfFields is filled in, I select this value and generate an insert query similar to the following:

INSERT INTO Field (textField, idCal) VALUES ('', idOfTheGeneratedCalendar), ('', idOfTheGeneratedCalendar), ...

until I have a number of rows corresponding to numberOfFields from the Calendar table. Each idField (int) starts at 0 and is auto-incremented up to the number of fields.

I do this for every user. The point is… is there a better way to do this than building large insert queries, each with around 3000 values, in a for loop? Should I be concerned?
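One alternative I'm considering (a minimal sketch, assuming the psycopg2 driver and the table layout above): a single INSERT ... SELECT with generate_series lets the server create the rows, so the client never has to build a 3000-value statement. Would something like this be preferable, or does it have its own problems?

import psycopg2  # assumes the psycopg2 driver is available

def fill_calendar_fields(conn, id_cal):
    # Insert numberOfFields empty rows for one calendar in a single statement.
    # The row count is read from Calendar and expanded server-side with
    # generate_series, so no large VALUES list is built on the client.
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO Field (textField, idCal)
            SELECT '', c.idCal
            FROM Calendar AS c, generate_series(1, c.numberOfFields)
            WHERE c.idCal = %s
            """,
            (id_cal,),
        )
    conn.commit()

# usage (connection parameters are placeholders):
# conn = psycopg2.connect(dbname="mydb", user="me")
# fill_calendar_fields(conn, id_cal=42)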

bash – The hard drive is full, but I cannot find any large files

I ran a bash script that created a 14G file in the tmp directory. I deleted it, but I can't find the directory or file that is taking up the space.

My output for df -h


Filesystem      Size  Used Avail Use% Mounted on
udev            474M     0  474M   0% /dev
tmpfs            99M   11M   88M  11% /run
/dev/vda1        25G   25G     0 100% /
tmpfs           491M     0  491M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           491M     0  491M   0% /sys/fs/cgroup
/dev/vda15      105M  3.9M  101M   4% /boot/efi
/dev/loop0       90M   90M     0 100% /snap/core/7917
/dev/loop1       55M   55M     0 100% /snap/lxd/12211
/dev/loop2       94M   94M     0 100% /snap/core/8935
/dev/loop3       68M   68M     0 100% /snap/lxd/14194
tmpfs            99M     0   99M   0% /run/user/0
/dev/loop4       55M   55M     0 100% /snap/core18/1705
/dev/loop5       49M   49M     0 100% /snap/gtk-common-themes/1474
/dev/loop6      153M  153M     0 100% /snap/chromium/1071
tmpfs            99M     0   99M   0% /run/user/1000

My output for du -sh in the directory /

du: cannot access './proc/19935/task/19935/fd/4': No such file or directory
du: cannot access './proc/19935/task/19935/fdinfo/4': No such file or directory
du: cannot access './proc/19935/fd/3': No such file or directory
du: cannot access './proc/19935/fdinfo/3': No such file or directory
4.7G    .

I can't install ncdu or any other tools because the hard drive is full. Yet du -sh only sums to 4.7G while df reports all 25G as used, so where is the rest of the space?
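Since df and du disagree by roughly 20G, my current guess is that some process still holds the deleted 14G file open. This is a small check I put together (a sketch; it assumes a Linux-style /proc layout and only looks at file descriptors, so it may miss other causes):

import os

# List open file descriptors that point at deleted files; such files still
# consume space counted by df but are invisible to du.
for pid in filter(str.isdigit, os.listdir("/proc")):
    fd_dir = "/proc/%s/fd" % pid
    try:
        for fd in os.listdir(fd_dir):
            path = os.path.join(fd_dir, fd)
            target = os.readlink(path)
            if target.endswith("(deleted)"):
                size_gb = os.stat(path).st_size / 1024**3
                print(pid, target, "%.1f GiB" % size_gb)
    except (PermissionError, FileNotFoundError):
        continue  # process exited or is not accessible; skip it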

Algorithms – How can you scrape a large number of websites?

I have a large number of URLs (approx. 150-300). They are all about the same thing. I'm familiar with scraping static and dynamic web content (I'm using Python for the job), but I have absolutely no idea how to deal with this problem. Should I write a script for each site? Is there a sensible regex-based approach to find the data I'm looking for (the data is just houses in a certain price range)? Can I create a single script to scrape them all, and if so, how?
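To make the question concrete, this is the rough shape of the single script I'm imagining (a sketch assuming the requests and beautifulsoup4 packages; the URL list, the .price selector, the naive price parsing, and the cost range are placeholders that would have to be adapted per site):

import re
import requests
from bs4 import BeautifulSoup

URLS = ["https://example.com/listings/1", "https://example.com/listings/2"]  # placeholders
DIGITS = re.compile(r"[^\d]")

def extract_prices(html):
    # Pull candidate prices out of one page; the ".price" selector and the
    # digits-only parsing are placeholders to be adapted to each site's markup.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.select(".price"):
        digits = DIGITS.sub("", tag.get_text())
        if digits:
            yield int(digits)

results = []
for url in URLS:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    for price in extract_prices(response.text):
        if 100_000 <= price <= 300_000:  # example cost range
            results.append((url, price))

print(results)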

Architecture – structuring large and expandable projects

TL;DR

I want to create a library (I think that's the right term) for my own reinforcement learning environments (envs for short). Most envs are based on self-implemented games, written either in pure Python or in C++ with Python bindings. What would be the best way to structure this project so that it is easy to extend, easy to maintain, and makes the most sense? I want to be able to reuse code, e.g. a general board class shared by all my board game implementations (e.g. chess, Go, Gomoku). I plan to make it cross-platform with the help of CMake and might even dare to package it as a Conda package.

On my first search, I found that this layout (and its variations) is popular, and decided to rely on it.

My original plan was to structure my library into projects, create a repo for each project, and include one project in another as a Git submodule. For the environment of the game 2048, the structure would look as follows (CMakeLists omitted):

(The env vector is based on the env-2048, which is based on the game-2048 that uses the general board class.)

general-board
├── external
│   └── Catch2/
├── include
│   └── general-board
│       └── file.h
├── src
│   └── file.cpp
└── tests
    └── tests.cpp

game-2048
├── app
│   └── manual_game.cpp
├── external
│   ├── Catch2
│   └── general-board
├── include
│   └── game-2048
│       └── file.h
├── src
│   └── file.cpp
└── tests
    └── tests.cpp

env-2048
├── external
│   ├── Catch2/
│   └── game-2048/
├── include
│   └── env-2048
│       └── file.h
├── src
│   └── file.cpp
└── tests
    └── tests.cpp 

env-vector <---- this would be on the top, bundling the envs together
├── external
│   ├── Catch2/
│   ├── env-2048/ 
│   ├── env-chess/ <---- another board game
│   └── env-go/ <---- another board game
├── include
│   └── env-vector
│       └── file.h
├── python
│   └── pybind11_magic_here
├── src
│   └── file.cpp
└── tests
    └── tests.cpp

After some implementation I became worried that the number of submodules and the amount of redundancy would grow too large. With this structure, the top-level project would include the general-board project n times (where n is the number of games based on a board), and Catch2 would be included even more often. That seems fishy and error-prone.

My second idea was to create one large project and include everything in it in a "flat" way rather than the "nested" way shown above. It would look like this:

(line ending with '/' depicts a folder)
environments_all_in_one
│
├── external
│   └── Catch2/
├── include
│   └── environments_all_in_one
│       └── **not_even_sure_what_to_put_here**      
├── python
│   └── pybind11_magic_here
├── src
│   ├── env_vector
│   ├── envs
│   │   ├── env-2048/
│   │   ├── env-chess/
│   │   └── env-go/
│   ├── games
│   │   ├── game-2048/
│   │   ├── game-chess/
│   │   └── game-go/
│   └── general-board
│       ├── board_abc/
│       ├── board_array/
│       └── board_vector/
└── tests
    └── tests.cpp

This way, no code would be present more than once, which definitely helps transparency. However, since I have no experience with this, I have to ask:

Is there a better way to do this?

Permalinks – How can I bulk-edit the image URL in the featured image and in the product description?

I tried a few (free) plugins, but none of them updated the URL in the product image or in the product descriptions.

How can I update the URLs in bulk instead of editing them individually? If there is a plugin (preferably free) that can edit the image URL, please let me know.

Thanks a lot

Java – Does iterating over a large data set in a map on every click affect performance?

I need to get values from a map by the adapter position that I get from a RecyclerView.

As you can see, every time I click on an album cover, I create a new array of album objects.

int i = 0;
final Album[] albums = new Album[albumMap.size()];
for (Map.Entry e : albumMap.entrySet()) {
    albums[i++] = (Album) e.getValue();
}

Then I get the album like this: String selectedAlbum = albums[position].getAlbum();

But what if someone has more than 10,000 albums on their device? Every time an album is clicked, a new array of Album objects is created and iterated over just to get the album name.

Would this affect performance if there were many albums?

TL;DR: Is this code bad?

Complete code

@Override
public void onClickAlbum(int position, Map albumMap) {
    if (getActivity() != null) {
        // Copy the map values into an array so the clicked position can be looked up.
        int i = 0;
        final Album[] albums = new Album[albumMap.size()];
        for (Map.Entry e : albumMap.entrySet()) {
            albums[i++] = (Album) e.getValue();
        }
        String selectedAlbum = albums[position].getAlbum();
        Main.getInstance().setSongsFilteredBy(SongsLibrary.getInstance().getSongsByAlbum(selectedAlbum));
        Intent intent = new Intent(getActivity(), ListSongsActivity.class);
        intent.putExtra("title", selectedAlbum);
        startActivity(intent);
        Toast.makeText(getActivity(), "test: " + selectedAlbum, Toast.LENGTH_SHORT).show();
    }
}