performance – What can cause higher CPU time and duration for a given set of queries in trace(s) run on two separate environments?

I’m troubleshooting a performance issue in a SQL Server DR environment for a customer. They are running queries that consistently take longer in their environment than in our QA environment. We captured traces in both environments with the same parameters/filters, on the same version of SQL Server (2016 SP2) and the exact same database. Both environments pick the same execution plan(s) for the queries in question, and the number of reads/writes is close in both, yet the total duration of the process in question and the CPU time logged in the trace are significantly higher in the customer environment: duration of all processes was around 18 seconds in our QA environment versus over 80 seconds at the customer, and our CPU time was close to 10 seconds while theirs was also over 80 seconds. Also worth mentioning, both environments are currently configured with MAXDOP 1.

The customer has less memory (~100 GB vs 120 GB) and slower disks (10k HDD vs SSD) than our QA environment, but more CPUs. Both environments are dedicated to this activity and should have little to no external load that would account for the difference. I don’t have all the details on the CPU architecture they are using; I’m waiting on some of that information now. The customer has confirmed they have excluded SQL Server and the data/log files from their virus scanning. Obviously there could be any number of issues in the hardware configuration.

I’m currently waiting to see a recent snapshot of their wait stats and system DMVs; the data we originally received didn’t appear to show any major CPU, memory, or disk latency pressure. I recently asked them to check whether the Windows power setting is in High performance or Balanced mode, though I’m not certain whether CPU throttling alone would have the impact we’re seeing.
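For reference, a wait-stats snapshot can be captured with something like the following hedged sketch (pyodbc; the server name, driver, and the wait-type filter are placeholders). sys.dm_os_wait_stats is cumulative since the last restart or clear, so capturing it before and after the workload in each environment and comparing the deltas gives a like-for-like picture; a high share of signal_wait_time_ms relative to wait_time_ms points at CPU scheduling pressure. The active Windows power plan can be confirmed with powercfg /getactivescheme.

    # Hedged sketch: snapshot the top waits on an instance (connection details are placeholders).
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=customer-sql;"
        "DATABASE=master;Trusted_Connection=yes"
    )
    cur = conn.cursor()
    cur.execute("""
        SELECT TOP (20)
               wait_type,
               waiting_tasks_count,
               wait_time_ms,
               signal_wait_time_ms  -- time spent waiting for a CPU after the resource became available
        FROM sys.dm_os_wait_stats
        WHERE wait_type NOT LIKE 'SLEEP%'  -- crude filter for idle waits; extend as needed
        ORDER BY wait_time_ms DESC;
    """)
    for row in cur.fetchall():
        print(row.wait_type, row.waiting_tasks_count, row.wait_time_ms, row.signal_wait_time_ms)
    conn.close()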

My question is: what factors can affect CPU time and, ultimately, total duration? Is CPU time, as shown in a SQL trace, based primarily on the speed of the processors, or are there other factors I should be taking into consideration? The fact that both environments generate the same query plans, with all other things being as close to equal as possible, makes me think it’s related to the hardware SQL Server is installed on.

active directory – How to force Group Policy Management Console to send LDAP queries via port 636

There is an article stating that port 636 can be used by GPMC for secure communications.

Active Directory Certificate Services was installed successfully, and the certificate from the DC was exported to the workstation. Both the LDP.exe tool and ADSIEdit on the workstation are able to connect via port 636.

However, a traffic analyzer shows that GPMC still uses port 389 for LDAP connections.

How can I force Group Policy Management Console to establish a secure connection via port 636?

Thanks in advance!

html – Multiple queries in one view Django

Hi, I need some help with a small problem. I am new to Django and have not mastered it properly yet. I am trying to render a view as a PDF; to give you some context, I will explain what I need. It is an application to manage properties, and each client can obviously have multiple properties. The problem is that when listing the properties of a client, it only shows me one record.

Here is my view:

    # Assumed imports for this view (WeasyPrint for the PDF rendering; the model import path is a guess).
    from django.conf import settings
    from django.http import HttpResponse
    from django.template.loader import render_to_string
    from weasyprint import CSS, HTML

    from .models import Client, Farm


    def pdf_generation(request, *args, **kwargs):
        pk = kwargs.get('pk')
        client = Client.objects.get(pk=pk)
        farm = Farm.objects.filter(pk=pk)
        context = {
            'client': client,
            'farm': farm,
        }
        html_string = render_to_string('client_pdf.html', context)
        html = HTML(string=html_string, base_url=request.build_absolute_uri())
        pdf = html.write_pdf(
            stylesheets=[CSS(settings.STATIC_ROOT + '/css/pdf.css')])
        response = HttpResponse(pdf, content_type='application/pdf')
        response['Content-Disposition'] = 'inline; filename="export.pdf"'
        return response

Here is my URL:

    path('export/<int:pk>', views.pdf_generation, name='client_export_id'),

Here is my HTML code for the client:

        <tbody>
            <tr>
                <th scope="row">{{ client.id }}</th>
                <th scope="row">{{ client.name }}</th>
                <th scope="row">{{ client.nif }}</th>
                <th scope="row">{{ client.technical }}</th>
                <th scope="row">{{ client.phone }}</th>
                <th scope="row">{{ client.record|date:"j F, Y" }}</th>
            </tr>
        </tbody>

And finally, here is my HTML code for the farm:

        <tbody>
            {% for obj in farm %}
            <tr>
                <th scope="row">{{ obj.id }}</th>
                <th scope="row">{{ obj.manager }}</th>
                <th scope="row">{{ obj.farm }}</th>
                <th scope="row">{{ obj.town }}</th>
                <th scope="row">{{ obj.place }}</th>
                <th scope="row">{{ obj.production }}</th>
            </tr>
            {% endfor %}
        </tbody>
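As an aside on why only one farm record shows up: in the view, Farm.objects.filter(pk=pk) matches the Farm whose own primary key equals the client’s pk, not the farms belonging to that client. Assuming Farm has a ForeignKey to Client (the field name client below is an assumption), the usual pattern would be along these lines:

    # Hypothetical sketch: assumes the Farm model declares something like
    #   client = models.ForeignKey(Client, on_delete=models.CASCADE)
    client = Client.objects.get(pk=pk)
    farm = Farm.objects.filter(client=client)  # every farm belonging to this client
    # or, via the default reverse relation:
    # farm = client.farm_set.all()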

postgresql – Best way to solve a lot of queries stuck because of no index

I found this table has 800,000 rows and does not have any indexes.

Is there any safer and more efficient way to solve this problem?

Only really pre-emptive work:

  • Proper design work up-front, so there are no large tables without indexes and no common queries that are not well supported by the existing indexes. This might be out of your hands if you don’t work directly with the development team for the application, but as a DBA you could monitor your databases for potentially worrying structures (i.e. a table with no keys/indexes, or regularly slow-running queries if you log those, …). Recheck after application updates, in case a structure migration has failed silently and left undefined any indexes that the developers have added.

  • Load testing on the application to make sure no such issues are likely (you often can’t rule them out entirely, but you can certainly minimise the risk). Again, this may be out of your hands.

but I think this approach is risky

A long index build, especially if done as an offline operation that holds up all other access to the table, could be problematic, but when this happens adding the index is the only solution. Perhaps go for an online index build if possible – that will take longer but won’t completely block part of the application while it does the job.
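For PostgreSQL specifically, the online option is CREATE INDEX CONCURRENTLY. Below is a hedged sketch using psycopg2 (table name, column name, and connection details are placeholders); note that CONCURRENTLY refuses to run inside a transaction block, so the connection must be in autocommit mode. If a concurrent build fails, it leaves an invalid index behind that has to be dropped before retrying.

    # Hedged sketch: build the missing index without taking a long blocking lock on the table.
    # Table name, column name, and connection details are placeholders.
    import psycopg2

    conn = psycopg2.connect("dbname=appdb user=dba")
    conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction block
    with conn.cursor() as cur:
        cur.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_big_table_lookup_col "
            "ON big_table (lookup_col);"
        )
    conn.close()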

Also, if you have a dev copy of the system, test the new indexing changes there before applying them to production, to avoid making a change that takes ages to apply and delivers little of the benefit you need.

microservices – Where to place an in-memory cache to handle repetitive bursts of database queries from several downstream sources, all within a few milliseconds span

I’m working on a Java service that runs on Google Cloud Platform and utilizes a MySQL database via Cloud SQL. The database stores simple relationships between users, accounts they belong to, and groupings of accounts. Being an “accounts” service, naturally there are many downstreams. Downstream service A may, for example, hit several other upstream services B, C, and D, which in turn might call other services E and F, but because so much is tied to accounts (checking permissions, getting user preferences, sending emails), every service from A to F ends up hitting my service with identical, repetitive calls. In other words, a single call to some endpoint might result in 10 queries to get a user’s accounts, even though that information obviously doesn’t change over a few milliseconds.

So where is it appropriate to place a cache?

  1. Should downstream service owners be responsible for implementing a cache? I don’t think so, because why should they know about my service’s data, like what can be cached and for how long.

  2. Should I put an in-memory cache in my service, like Google’s Common CacheLoader, in front of my DAO? But, does this really provide anything over MySQL’s caching? (Admittedly I don’t know anything about how databases cache, but I’m sure that they do.)

  3. Should I put an in-memory cache in the Java client? We use gRPC so we have generated clients that all those services A, B, C, D, E, F use already. Putting a cache in the client means they can skip making outgoing calls but only if the service has made this call before and the data can have a long-enough TTL to be useful, e.g. an account’s group is permanent. So, yea, that’s not helping at all with the “bursts,” not to mention the caches living in different zone instances. (I haven’t customized a generated gRPC client yet, but I assume there’s a way.)

I’m leaning toward #2 but my understanding of databases is weak, and I don’t know how to collect the data I need to justify the effort. I feel like what I need to know is: How often do “bursts” of identical queries occur, how are these bursts processed by MySQL (esp. given caching), and what’s the bottom-line effect on downstream performance as a result, if any at all?
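For concreteness, option 2 amounts to a small read-through cache with a short TTL sitting in front of the DAO, so that identical calls arriving within a burst are served from memory instead of each reaching MySQL. The sketch below only illustrates the idea (Python with the cachetools library for brevity, since the real service is Java; AccountsDao and fetch_accounts_for_user are made-up names). In the Java service, the equivalent would be a Guava LoadingCache built with expireAfterWrite, which is what the CacheLoader mentioned in option 2 plugs into.

    # Illustration only: a short-TTL read-through cache in front of the DAO.
    # AccountsDao is a hypothetical stand-in for the real data-access layer.
    from cachetools import TTLCache, cached

    class AccountsDao:
        def fetch_accounts_for_user(self, user_id):
            # Placeholder for the real SELECT against MySQL.
            return ["account-1", "account-2"]

    dao = AccountsDao()

    # A TTL of a couple of seconds is enough to absorb a burst of identical calls
    # without serving meaningfully stale account data.
    @cached(cache=TTLCache(maxsize=10_000, ttl=2))
    def get_accounts_for_user(user_id):
        return dao.fetch_accounts_for_user(user_id)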

I feel experience may answer this question better than finding those metrics myself.

Asking myself, “Why do I want to do this, given no evidence of any bottleneck?” Well, (1) it just seems wrong that there are so many duplicate queries, (2) it adds a lot of noise to our logs, and (3) I don’t want to wait until we scale to find out that it’s a deep issue.

database – Extracting user field values from dynamic SQL queries

Aim

I have successfully written a fairly long dynamic SQL query; however, I am struggling with a seemingly simple part at the end.

Although I am able to successfully extract mail and name from the users table, when I try to extract field_first_name it returns the error below.

The users table has a column with the machine name: field_first_name

Code

    $database = \Drupal::service('database');

    $select = $database->select('flagging', 'f');
    $select->fields('f', array('uid', 'entity_id'));
    $select->leftJoin('node__field_start_datetime', 'nfds', 'nfds.entity_id = f.entity_id');
    $select->fields('nfds', array('field_start_datetime_value'));
    $select->leftJoin('node_field_data', 'nfd', 'nfd.nid = f.entity_id');
    $select->fields('nfd', array('title'));
    $select->leftJoin('users_field_data', 'ufd', 'ufd.uid = f.uid');
    // TODO extract first name
    $select->fields('ufd', array('mail', 'name', 'field_first_name'));

    $executed = $select->execute();
    $results = $executed->fetchAll(\PDO::FETCH_ASSOC);

    $username = $result['name'];
    $email = $result['mail'];
    $first_name = $result['field_first_name'];

Error

Drupal\Core\Database\DatabaseExceptionWrapper: SQLSTATE[42S22]: Column not found: 1054 Unknown column 'ufd.field_first_name' in 'field list': SELECT f.uid AS uid, f.entity_id AS entity_id, nfds.field_start_datetime_value AS field_start_datetime_value, nfd.title AS title, ufd.mail AS mail, ufd.name AS name, ufd.field_first_name AS field_first_name FROM {flagging} f LEFT OUTER JOIN {node__field_start_datetime} nfds ON nfds.entity_id = f.entity_id LEFT OUTER JOIN {node_field_data} nfd ON nfd.nid = f.entity_id LEFT OUTER JOIN {users_field_data} ufd ON ufd.uid = f.uid; Array ( ) in event_notification_cron() (line 63 of /app/modules/custom/event_notification/event_notification.module).

postgresql – How can I find the most resource-intensive queries that have ever run in my database and use a specific table?

In my database I want to check whether an idea I have for improving a table schema works. Therefore, I want to look at the heaviest SELECT queries ever run against our database that use the specific table I want to change.

My idea is to check the speed and database load of our heaviest queries, then make the schema changes (and possibly refactor the queries as well) and rerun them, so I can benchmark my idea with real data.

Hence I have proof that my changes actually improve search speed.

So how can I find which queries have run on a PostgreSQL RDS instance and how heavy they are?
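On RDS, the usual source for this is the pg_stat_statements extension (enable it via the parameter group and run CREATE EXTENSION pg_stat_statements in the database). It keeps aggregate timings per normalized statement since the statistics were last reset, rather than a literal history of every query ever run. Below is a hedged sketch of pulling the heaviest recorded statements that mention a given table (connection details and my_table are placeholders; on PostgreSQL versions before 13 the columns are total_time and mean_time instead of total_exec_time and mean_exec_time).

    # Hedged sketch: the most expensive recorded statements touching a given table.
    import psycopg2

    conn = psycopg2.connect("host=myinstance.rds.amazonaws.com dbname=mydb user=myuser password=secret")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT calls, total_exec_time, mean_exec_time, query
            FROM pg_stat_statements
            WHERE query ILIKE %s
            ORDER BY total_exec_time DESC
            LIMIT 20;
        """, ('%my_table%',))
        for calls, total_ms, mean_ms, query in cur.fetchall():
            print(f"{calls} calls, {total_ms:.1f} ms total, {mean_ms:.2f} ms avg: {query[:80]}")
    conn.close()

These are aggregates rather than individual executions; for benchmarking the schema change itself, running EXPLAIN (ANALYZE, BUFFERS) on the candidate queries before and after gives the per-run numbers.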

Is there an online preprocessing algorithm for Range Minimum Queries (RMQ)?

Is there a linear-time online version of the RMQ preprocessing algorithm? That is, an algorithm that allows the data structure to be updated in O(1) (worst-case or amortized) time per element when additional elements are appended to the end of the input array, while still allowing arbitrary queries to be answered in constant time?

I am aware of the sliding queries algorithm, which is online by nature but restricts the type of allowed queries. I’ve also seen this more general question, which is still unanswered at the time of writing this post.

locking – Insert queries creating locked objects in Oracle

In my asynchronous Spring Boot application I started noticing that threads were not returning from the save() method – they were not dead and were still running, but hanging/blocked during the save to the database.

To investigate this, I issued a query to see if there were any locked objects, and indeed there were as many locked objects as there were blocked threads in my application. Also, the queries on the blocked objects were ‘update’ queries, which was even more puzzling since the only thing the application does is insert new data – no deletes or updates.

Any ideas what could be causing this? Any suggestions how I could investigate this further?
Thank you.
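One way to dig further, shown here as a hedged sketch (python-oracledb is used only as a convenient way to run the check; connection details are placeholders): list the sessions that report a blocker in v$session, which shows what each hanging INSERT is actually waiting on and which session is holding it up. The event column (for example ‘enq: TX - row lock contention’) usually narrows down the kind of lock involved.

    # Hedged sketch: who is blocked, what they are waiting on, and who is blocking them.
    # Connection details are placeholders; requires SELECT privilege on v$session.
    import oracledb

    conn = oracledb.connect(user="app_monitor", password="secret", dsn="dbhost/ORCLPDB1")
    with conn.cursor() as cur:
        cur.execute("""
            SELECT sid, serial#, blocking_session, event, seconds_in_wait, sql_id
            FROM v$session
            WHERE blocking_session IS NOT NULL
        """)
        for sid, serial, blocker, event, wait_s, sql_id in cur:
            print(f"session {sid},{serial} blocked by {blocker}: {event} for {wait_s}s (sql_id={sql_id})")
    conn.close()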

collision detection – Optimizing a quadtree for circles and circular queries

I’m developing a 2D ant simulation in JavaScript. I’d like to implement a quadtree to store the positions of ants and other markers. I approximate all of these entities as circles, and I’m only querying for circles that fall within the radius of a query circle.

Most of the things I’ve read about quadtrees implement rectangular entities/queries very efficiently. Is there a more efficient way of implementing a quadtree given all entities/queries are circular? Otherwise I would plan to store each circle as a square and perform a circular collision check on top of each query result.
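That bounding-square-plus-circle-check plan is a common and cheap approach: the quadtree itself only ever deals with axis-aligned boxes, and the exact test on the candidates is a single squared-distance comparison per entity. A minimal sketch of the query side (Python for brevity, though the simulation is JavaScript; quadtree.query_rect and the entity attributes x, y, r are assumed names):

    # Hedged sketch: broad-phase rectangle query, then a narrow-phase circle-vs-circle filter.
    # quadtree.query_rect(x, y, w, h) and entity attributes .x/.y/.r are assumed names.
    def circles_overlap(x1, y1, r1, x2, y2, r2):
        # Compare squared centre distance with the squared radius sum to avoid a sqrt.
        dx, dy = x2 - x1, y2 - y1
        rsum = r1 + r2
        return dx * dx + dy * dy <= rsum * rsum

    def query_circle(quadtree, cx, cy, r):
        # Broad phase: query with the circle's bounding square.
        candidates = quadtree.query_rect(cx - r, cy - r, 2 * r, 2 * r)
        # Narrow phase: keep only entities whose circles actually intersect the query circle.
        return [e for e in candidates if circles_overlap(cx, cy, r, e.x, e.y, e.r)]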