amazon s3 – assets hosted on S3 behind Cloudfront used by several domains, the access-control-allow-origin does not vary

I have the following Terraform configuration:

# Bucket to put backendphp assets (images/css/js)
#
resource "aws_s3_bucket" "assets" {
  bucket = local.workspace["assets_domain_name"]
  acl    = "public-read"

  cors_rule {
    allowed_headers = ["*"]
    allowed_methods = ["GET"]
    allowed_origins = [
      "https://app.example.com",
      "https://admin.example.com",
      "https://backoffice.example.com",
    ]
    expose_headers  = ["ETag"]
    max_age_seconds = 3000
  }

}


# pre-existing policy defined by AWS
data "aws_cloudfront_origin_request_policy" "this" {
  name = "Managed-CORS-S3Origin"
  # not compatible with tags
}

# pre-existing policy defined by AWS
data "aws_cloudfront_cache_policy" "this" {
  name = "Managed-CachingOptimized"
  # not compatible with tags
}

# Cloudfront distribution for "http" to "https" redirections for 'assets' subdomains
resource "aws_cloudfront_distribution" "assets_distribution" {
  origin {
    custom_origin_config {
      http_port              = "80"
      https_port             = "443"
      origin_protocol_policy = "http-only"
      origin_ssl_protocols   = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }

    domain_name = aws_s3_bucket.assets.bucket_regional_domain_name
    origin_id   = aws_s3_bucket.assets.bucket
  }

  enabled             = true
  default_root_object = "index.html"

  default_cache_behavior {
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = aws_s3_bucket.assets.bucket
    min_ttl                = 0

    origin_request_policy_id = data.aws_cloudfront_origin_request_policy.this.id
    cache_policy_id          = data.aws_cloudfront_cache_policy.this.id
  }


  aliases = [
    aws_s3_bucket.assets.bucket,
  ]



  viewer_certificate {
    acm_certificate_arn = aws_acm_certificate.assets_certificate.arn
    ssl_support_method  = "sni-only"
  }

  depends_on = [
    aws_acm_certificate.assets_certificate,
  ]
}

The goal is to store the assets once and make them usable by the different “frontends” of my web apps.

Because I use Subresource Integrity, i.e. <link rel="stylesheet" href="https://assets.example.info/build/tailwind.784de744.css" crossorigin="anonymous" integrity="sha384-K5W1t5mSLgPoYODxKuVqxYbaCfZko17QXhZn2cJKeIBgTpmpKoNLRZn+msahlR81"> , the browser performs a CORS check when it loads the asset.

  • If I use the S3 bucket directly, it works: the CSS is loaded on all domains.
  • If I go through CloudFront and put * as the allowed origin on the S3 bucket, it also works.

HOWEVER

  • If I put the list of specific, whitelisted allowed origins
  • and first visit app.example.com, the CSS is loaded correctly.
  • When I then visit admin.example.com, the CSS is not loaded, because CloudFront has cached “Access-Control-Allow-Origin: https://app.example.com”.

It seems to be the error described in this other question,

but that one was solved by whitelisting the Origin header so that CloudFront uses it to generate its cache key.

HOWEVER, I already do that by using the Managed-CORS-S3Origin policy.

Is there something I’m missing here, or has something changed since?
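
The symptom can be reproduced with a toy model. This is only a sketch, assuming the cache key contains just the request path while the Origin header is merely forwarded to the origin. `TinyCdn`, `fake_s3` and the `key_on_origin` flag are hypothetical names made up for illustration; none of them is an AWS API.

```python
from typing import Callable, Dict, Optional, Tuple

ALLOWED_ORIGINS = {
    "https://app.example.com",
    "https://admin.example.com",
    "https://backoffice.example.com",
}


def fake_s3(path: str, origin: Optional[str]) -> Dict[str, str]:
    """Hypothetical origin: echoes the Origin header back only if it is whitelisted."""
    headers = {"Content-Type": "text/css"}
    if origin in ALLOWED_ORIGINS:
        headers["Access-Control-Allow-Origin"] = origin
        headers["Vary"] = "Origin"
    return headers


class TinyCdn:
    """Toy cache in front of an origin; not a model of any real CDN API."""

    def __init__(
        self,
        fetch: Callable[[str, Optional[str]], Dict[str, str]],
        key_on_origin: bool,
    ) -> None:
        self.fetch = fetch
        self.key_on_origin = key_on_origin
        self.store: Dict[Tuple[str, Optional[str]], Dict[str, str]] = {}

    def get(self, path: str, origin: Optional[str]) -> Dict[str, str]:
        # The Origin header is always forwarded, but it only separates
        # cache entries when key_on_origin is True.
        key = (path, origin if self.key_on_origin else None)
        if key not in self.store:
            self.store[key] = self.fetch(path, origin)
        return self.store[key]


path = "/build/tailwind.784de744.css"

# Cache key = path only: the second site receives the first site's header.
shared = TinyCdn(fake_s3, key_on_origin=False)
shared.get(path, "https://app.example.com")
stale = shared.get(path, "https://admin.example.com")
print(stale["Access-Control-Allow-Origin"])  # https://app.example.com

# Cache key = path + Origin: each site gets its own entry.
split = TinyCdn(fake_s3, key_on_origin=True)
split.get(path, "https://app.example.com")
fresh = split.get(path, "https://admin.example.com")
print(fresh["Access-Control-Allow-Origin"])  # https://admin.example.com
```

If the model matches the real setup, the thing to check is whether the cache policy, and not only the origin request policy, lists Origin among the headers that make up the cache key.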

Developers Focus: Hosted or Cloud? | Forum Promotion

directory – WordPress hosted on subfolder of domain causing customizer to constantly refresh and is requesting jquery from root domain

So I have WordPress installed in a subfolder of my domain: https://example.com/blog

To do this I am reverse proxying from nginx (main app) to Apache (WordPress server), and everything is going great.

One weird problem I noticed, though, is that the Customizer keeps refreshing constantly when I open it. It looks like it's trying to request jQuery from the root https://example.com even though:

Site Address (URL)
WordPress Address (URL)

are both set to: https://example.com/blog

Any ideas?

ubuntu – PiVPN installation prevents hosted DNS from receiving requests

I had a working setup running PiHole with Unbound as the upstream DNS. I had UFW set up such that only my home IP could make requests to PiHole (i.e. only packets from my home IP are accepted on port 53). This setup worked.

After installing PiVPN (WireGuard), devices on my home network can no longer resolve DNS queries. This is confirmed by the PiHole log not showing any incoming queries. However, tcpdump shows that these packets do reach the server.

Unbound listens on 5353. PiHole listens on 53. Apache is also running on this server on 80 and 443. I use UFW as my firewall.

I tried ufw allow 53 to allow all traffic to PiHole; that did not fix the issue, and neither did ufw disable. I also tried removing DNSMASQ_LISTENING=local from PiHole's setupVars.conf (since added back), to no avail.

Here’s what I get from pivpn debug

::: Generating Debug Output
::::        PiVPN debug      ::::
=============================================
::::        Latest commit        ::::
commit d7771c251418fa443869397d46f93c5b0c197558
Author: 4s3ti <4s3ti@protonmail.com>
Date:   Sat Feb 6 23:04:11 2021 +0100

    Merge branch test into master
    
    fixes #1234
    ci/cd fixes and improvements
=============================================
::::        Installation settings        ::::
PLAT=Ubuntu
OSCN=focal
USING_UFW=1
IPv4dev=ens3
install_user=pivpn
install_home=/home/pivpn
VPN=wireguard
pivpnPORT=20
pivpnDNS1=10.6.0.1
pivpnDNS2=
pivpnHOST=REDACTED
pivpnPROTO=udp
pivpnDEV=wg0
pivpnNET=10.6.0.0
subnetClass=24
ALLOWED_IPS="0.0.0.0/0, ::0/0"
UNATTUPG=1
INSTALLED_PACKAGES=(wireguard-tools qrencode)
=============================================
::::  Server configuration shown below   ::::
[Interface]
PrivateKey = server_priv
Address = 10.6.0.1/24
ListenPort = 20
=============================================
::::  Client configuration shown below   ::::
::: There are no clients yet
=============================================
::::    Recursive list of files in   ::::
::::    /etc/wireguard shown below    ::::
/etc/wireguard:
configs
keys
wg0.conf

/etc/wireguard/configs:
clients.txt

/etc/wireguard/keys:
server_priv
server_pub
=============================================
::::        Self check       ::::
:: [OK] IP forwarding is enabled
:: [OK] Ufw is enabled
:: [OK] Iptables MASQUERADE rule set
:: [OK] Ufw input rule set
:: [OK] Ufw forwarding rule set
:: [OK] WireGuard is running
:: [OK] WireGuard is enabled (it will automatically start on reboot)
:: [OK] WireGuard is listening on port 20/udp
=============================================
:::: Having trouble connecting? Take a look at the FAQ:
:::: https://github.com/pivpn/pivpn/wiki/FAQ
=============================================
:::: WARNING: This script should have automatically masked sensitive       ::::
:::: information, however, still make sure that PrivateKey, PublicKey      ::::
:::: and PresharedKey are masked before reporting an issue. An example key ::::
:::: that you should NOT see in this log looks like this:                  ::::
:::: YIAoJVsdIeyvXfGGDDadHh6AxsMRymZTnnzZoAb9cxRe                          ::::
=============================================
::::        Debug complete       ::::
::: 
::: Debug output completed above.
::: Copy saved to /tmp/debug.log
::: 

Output of sudo iptables -vnL

Chain INPUT (policy DROP 5 packets, 250 bytes)
 pkts bytes target     prot opt in     out     source               destination         
  183 15110 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
 1113  272K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     esp  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     ah   --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 8 limit: up to 5/sec burst 5 mode srcip
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 123
    3   148 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22 ctstate NEW
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            172.31.141.238       udp dpt:53
 1800  121K ufw-before-logging-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
 1800  121K ufw-before-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
   66  3676 ufw-after-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
   66  3676 ufw-after-logging-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
   66  3676 ufw-reject-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
   66  3676 ufw-track-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       10.0.8.0/24          10.0.8.0/24         
    0     0 DROP       all  --  *      *       10.0.8.0/24          169.254.0.0/16      
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:445
    0     0 DROP       udp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport ports 137,138
    0     0 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport ports 137,139
    0     0 ACCEPT     all  --  *      *       10.0.8.0/24          0.0.0.0/0            ctstate NEW policy match dir in pol none
    0     0 ufw-before-logging-forward  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ufw-before-forward  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ufw-after-forward  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ufw-after-logging-forward  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ufw-reject-forward  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ufw-track-forward  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            10.0.8.0/24          owner GID match 15000
    0     0 DROP       all  --  *      *       0.0.0.0/0            169.254.0.0/16       owner GID match 15000
 1541  306K ufw-before-logging-output  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
 1541  306K ufw-before-output  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
  166 11698 ufw-after-output  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
  166 11698 ufw-after-logging-output  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
  166 11698 ufw-reject-output  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
  166 11698 ufw-track-output  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-after-forward (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-after-input (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ufw-skip-to-policy-input  udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:137
    0     0 ufw-skip-to-policy-input  udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:138
    0     0 ufw-skip-to-policy-input  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:139
    0     0 ufw-skip-to-policy-input  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:445
    0     0 ufw-skip-to-policy-input  udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ufw-skip-to-policy-input  udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:68
    0     0 ufw-skip-to-policy-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type BROADCAST

Chain ufw-after-logging-forward (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 3/min burst 10 LOG flags 0 level 4 prefix "[UFW BLOCK] "

Chain ufw-after-logging-input (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   48  2928 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 3/min burst 10 LOG flags 0 level 4 prefix "[UFW BLOCK] "

Chain ufw-after-logging-output (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-after-output (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-before-forward (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 3
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 11
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 12
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 8
    0     0 ufw-user-forward  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-before-input (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
    5   208 ufw-logging-deny  all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID
    5   208 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 3
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 11
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 12
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 8
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp spt:67 dpt:68
 1795  121K ufw-not-local  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            224.0.0.251          udp dpt:5353
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            239.255.255.250      udp dpt:1900
 1795  121K ufw-user-input  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-before-logging-forward (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-before-logging-input (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-before-logging-output (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-before-output (1 references)
 pkts bytes target     prot opt in     out     source               destination         
  183 15110 ACCEPT     all  --  *      lo      0.0.0.0/0            0.0.0.0/0           
 1192  279K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
  166 11698 ufw-user-output  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-logging-allow (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 3/min burst 10 LOG flags 0 level 4 prefix "[UFW ALLOW] "

Chain ufw-logging-deny (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    5   208 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID limit: avg 3/min burst 10
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 3/min burst 10 LOG flags 0 level 4 prefix "[UFW BLOCK] "

Chain ufw-not-local (1 references)
 pkts bytes target     prot opt in     out     source               destination         
 1795  121K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type MULTICAST
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type BROADCAST
    0     0 ufw-logging-deny  all  --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 3/min burst 10
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-reject-forward (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-reject-input (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-reject-output (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-skip-to-policy-forward (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-skip-to-policy-input (7 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-skip-to-policy-output (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-track-forward (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-track-input (1 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-track-output (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   21  1260 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW
  145 10438 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW

Chain ufw-user-forward (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  wg0    ens3    10.6.0.0/24          0.0.0.0/0           

Chain ufw-user-input (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:20
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22 /* 'dapp_OpenSSH' */
    8   500 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 80,443 /* 'dapp_Apache%20Full' */
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:21
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:25565
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:25565
    0     0 ACCEPT     tcp  --  *      *       HOME_IP       0.0.0.0/0            tcp dpt:53
 1698  115K ACCEPT     udp  --  *      *       HOME_IP       0.0.0.0/0            udp dpt:53

Chain ufw-user-limit (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0            limit: avg 3/min burst 5 LOG flags 0 level 4 prefix "[UFW LIMIT BLOCK] "
    0     0 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable

Chain ufw-user-limit-accept (0 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ufw-user-logging-forward (0 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-user-logging-input (0 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-user-logging-output (0 references)
 pkts bytes target     prot opt in     out     source               destination         

Chain ufw-user-output (1 references)
 pkts bytes target     prot opt in     out     source               destination 

Selling – Self Hosted Movie Streaming site for sale | Proxies-free

Domain Name: moviewiz.cc
Registered With: NameSilo, LLC
Expires: 2021-11-28
Price: $500
Methods of Payment: Crypto
Proof of ownership: https://ibb.co/NZGLmMR

Hi all, we have decided to sell our movie streaming site, moviewiz.cc. All movies and TV shows are self-hosted; for processing videos we use fire drive player, and for hosting all files we use a Hetzner dedicated server. It uses the Dooplay theme with heavy customization. Total storage usage is around 11TB. Currently, we don’t have traffic or earnings. Please let me know what you think.

How to know if Google Sheets IMPORTDATA, IMPORTFEED, IMPORTHTML or IMPORTXML functions are able to get data from a resource hosted on a website?

If the content is added dynamically (by using JavaScript), it can’t be imported by using Google Sheets built-in functions. Also, if the website’s webmaster has taken certain measures, these functions will not be able to import the data.


To check if the content is added dynamically, using Chrome,

  1. Open the URL of the source data.
  2. Press F12 to open Chrome Developer Tools
  3. Press Control+Shift+P to open the Command Menu.
  4. Start typing javascript, select Disable JavaScript, and then press Enter to run the command. JavaScript is now disabled.

JavaScript will remain disabled in this tab so long as you have DevTools open.

Reload the page to see whether the content that you want to import is shown. If it is, it can be imported by using Google Sheets built-in functions; otherwise it can’t, although it might be possible to get it by other web scraping means.
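
The same check can be approximated outside the browser. The sketch below assumes that fetching the page source without running any JavaScript is a fair stand-in for the "JavaScript disabled" view; `fetch_raw_html`, `appears_in_static_html` and the `static-check-sketch` user agent are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from typing import List
from urllib.request import Request, urlopen


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def fetch_raw_html(url: str) -> str:
    """Download the page source as served; no JavaScript is executed."""
    request = Request(url, headers={"User-Agent": "static-check-sketch"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def appears_in_static_html(html: str, expected_text: str) -> bool:
    """True if the text is present in the markup itself, not only inside a script."""
    collector = _TextCollector()
    collector.feed(html)
    return expected_text in " ".join(collector.parts)


static_page = "<html><body><table><tr><td>42.50</td></tr></table></body></html>"
dynamic_page = (
    "<html><body><div id='app'></div>"
    "<script>render('42.50')</script></body></html>"
)
print(appears_in_static_html(static_page, "42.50"))   # True
print(appears_in_static_html(dynamic_page, "42.50"))  # False
```

For a real page, `appears_in_static_html(fetch_raw_html(url), "some value you expect")` gives a quick hint, though a server may still answer a script differently than it answers Google.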

According to Wikipedia,

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.

Webmasters can use a robots.txt file to block access to the website. In that case the result will be #N/A Could not fetch url.

The webpage could also be designed to return a custom message instead of the data.
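
Whether a robots.txt file blocks a given URL can be checked with Python's standard library. This is a small sketch; the user agent string `ExampleFetcher` is a placeholder, since the agent that Google Sheets actually sends is not stated here, and `is_fetch_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def is_fetch_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Apply the rules of an already downloaded robots.txt to one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(is_fetch_allowed(sample_robots, "ExampleFetcher",
                       "https://example.com/private/data.csv"))  # False
print(is_fetch_allowed(sample_robots, "ExampleFetcher",
                       "https://example.com/public/data.csv"))   # True
```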


IMPORTDATA, IMPORTFEED, IMPORTHTML and IMPORTXML are able to get content from resources hosted on websites that are:

  • Publicly available. This means that the resource doesn’t require authorization or being logged in to any service to access it.
  • The content is “static”. This means that if you open the resource using the view source code option of modern web browsers, it will be displayed as plain text.
    • NOTE: Chrome’s Inspect tool shows the parsed DOM; in other words, the actual structure/content of the web page, which could be dynamically modified by JavaScript code or browser extensions/plugins.
  • The content has the appropriate structure.
    • IMPORTDATA works with structured content such as CSV or TSV, regardless of the file extension of the resource.
    • IMPORTFEED works with marked-up content such as ATOM/RSS.
    • IMPORTHTML works with content marked up as HTML that includes properly marked-up lists or tables.
    • IMPORTXML works with content marked up as XML or any of its variants, like XHTML.
  • Google servers are not blocked by means of robots.txt or the user agent.

On W3C Markup Validator there are several tools to check whether the resources have been properly marked up.

Regarding CSV, check out Are there known services to validate CSV files

It’s worth noting that the spreadsheet

  • should have enough room for the imported content; Google Sheets has a limit of 5 million cells per spreadsheet, according to this post a limit of 18278 columns, and a limit of 50 thousand characters of cell content, whether as a value or a formula.
  • doesn’t handle large in-cell content well; the “limit” depends on the user’s screen size and resolution, as it’s now possible to zoom in/out.

Related

The following question is about a different result, #N/A Could not fetch url

http – Is there a reliable way to get get the fingerprint of a file hosted online, without fully downloading it?

Background

Tertiary to this question, I have been building my own imageboard that prevents (for example) duplicate images from being downloaded again and again on behalf of the client. I do this by keeping all files in a database, keyed by a hash of the file. The client sees the hash and first checks its own database to see whether the file has already been downloaded before actually making a request. Similarly, on my server I prevent duplicate uploads by having the client send me the hash first.

I am expanding a more general-purpose networking library for downloading files from the web and, to my dismay, I discovered that not all servers will supply me with some sort of hash.

Question

In an effort to de-duplicate downloads, and to resume partial downloads whose URL has changed, is there a way to reliably fingerprint a file from its headers and URL?

As an example, here is a plain HEAD request:

QVariant reply->header( QNetworkRequest::ContentLengthHeader )
int
44374

QUrl url
scheme()   : https
userName() : NULL
password() : NULL
host()     : i.imgur.com
port()     : -1
path()     : /oEdf6Rl.png
fragment() : NULL
query()    : NULL
QNetworkReply* reply
Connection                   : keep-alive
Content-Length               : 44374
Last-Modified                : Sun, 21 Feb 2021 15:14:36 GMT
ETag                         : "83c16cca4ee371145485130383104315"
Content-Type                 : image/png
cache-control                : public, max-age=31536000
Accept-Ranges                : bytes
Date                         : Fri, 26 Feb 2021 04:14:22 GMT
Age                          : 392375
X-Served-By                  : cache-bwi5134-BWI, cache-yul12821-YUL
X-Cache                      : HIT, HIT
X-Cache-Hits                 : 1, 2
X-Timer                      : S1614312862.217094,VS0,VE0
Strict-Transport-Security    : max-age=300
Access-Control-Allow-Methods : GET, OPTIONS
Access-Control-Allow-Origin  : *
Server                       : cat factory 1.0
X-Content-Type-Options       : nosniff
NoError Unknown error

The only things that seem static here are the MIME type and the file size. One thing I would be willing to do is a partial download of certain byte ranges (I have found that most servers send the Accept-Ranges header) and, from there, create a hash of the corresponding byte array and fingerprint the file that way.

However, I am skeptical that this would work reliably; I am especially concerned about something like two image frames that are nearly identical but are, in fact, not.

Am I pursuing a lost cause here? Or is there a reasonable way to fingerprint a file hosted on the web, without having to fully download it?

I’d like to do this with any file larger than 1 MB, because I have an exceptionally slow connection at times. Thanks.
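
To make the two ideas above concrete, here is a rough sketch in Python (used for brevity, not the Qt API from the dump). It assumes the response headers are available as a plain mapping; `header_fingerprint`, `sample_ranges` and `range_header` are hypothetical helper names, and the chunk size and sample count are arbitrary choices.

```python
import hashlib
from typing import List, Mapping, Optional, Tuple


def header_fingerprint(headers: Mapping[str, str]) -> Optional[str]:
    """Hash the validators a server sent; None when only size/type are known."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    fields = [
        (name, lowered[name])
        for name in ("etag", "content-length", "last-modified", "content-type")
        if name in lowered
    ]
    if not any(name in ("etag", "last-modified") for name, _ in fields):
        return None  # size and MIME type alone collide far too easily
    digest = hashlib.sha256()
    for name, value in fields:
        digest.update(f"{name}={value}\n".encode("utf-8"))
    return digest.hexdigest()


def sample_ranges(size: int, chunk: int = 4096, count: int = 4) -> List[Tuple[int, int]]:
    """Pick evenly spaced inclusive byte ranges, from the first byte towards the last."""
    if size <= 0:
        return []
    if size <= chunk * count:
        return [(0, size - 1)]  # small file: just fetch all of it
    step = (size - chunk) // (count - 1)
    return [(i * step, i * step + chunk - 1) for i in range(count)]


def range_header(ranges: List[Tuple[int, int]]) -> str:
    """Format the ranges as the value of an HTTP Range request header."""
    return "bytes=" + ",".join(f"{start}-{end}" for start, end in ranges)


imgur_headers = {
    "Content-Length": "44374",
    "Last-Modified": "Sun, 21 Feb 2021 15:14:36 GMT",
    "ETag": '"83c16cca4ee371145485130383104315"',
    "Content-Type": "image/png",
}
fingerprint = header_fingerprint(imgur_headers)
ranges = sample_ranges(44374)
print(fingerprint)
print(range_header(ranges))
```

Neither value is a true content hash. An ETag is opaque and chosen by the server, so the same file served from another host may carry a different one, and a sampled hash cannot tell apart two files that differ only outside the sampled bytes. Not every server honours a multi-range request either, so one request per range may be needed.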