Python – 1) My spider gives me all the results on one line in a CSV file, and 2) I can't get Scrapy to scrape information from the list of links I just scraped

First, if I use extract_first, Scrapy gives me only the first element of every page; if I do it the way shown below, it returns all the content I want, but on a single line.

Second, I can't get Scrapy to follow the links I've just scraped and retrieve information from those pages; it returns an empty CSV file.

from scrapy import Spider
from companies.items import CompaniesItem
import re

class CompanySpider(Spider):
    name = "company"
    allowed_domains = ['http://startup.miami',]
    # Defining the list of pages to be scraped
    start_urls = ["http://startup.miami/category/startups/page/" + str(1*i) + "/" for i in range(0, 10)]

    def parse(self, response):
        rows = response.xpath('//*[@id="datafetch"]')

        for row in rows:
            link = row.xpath('.//h2/a/@href').extract()

            name = row.xpath('.//header/h2/a/text()').extract()

    def parse_detail_page(self, response):
        first_element_page = response.xpath('.//p/a/@href').extract()
        first_element_page = 'http://startup.miami' + first_element_page
        yield Request(url=first_element_page, callback=self.parse_detail_page)

        item = CompaniesItem()
        item['link'] = link
        item['name'] = name

        yield item
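For what it's worth, a minimal sketch of how both symptoms are usually fixed: extract_first() per row keeps each CSV field a scalar instead of dumping a whole list on one line, and following each scraped link with a Request whose callback receives the partial item lets the detail pages be scraped at all. The row/detail selectors and the extra first_link field are assumptions, not taken from the post:

from scrapy import Spider, Request
from companies.items import CompaniesItem


class CompanySpider(Spider):
    name = "company"
    allowed_domains = ['startup.miami']  # domain only, no scheme
    start_urls = ["http://startup.miami/category/startups/page/%d/" % i
                  for i in range(1, 11)]

    def parse(self, response):
        # One item per row; extract_first() yields a scalar per field,
        # so the CSV exporter writes one line per company.
        for row in response.xpath('//*[@id="datafetch"]//header'):
            item = CompaniesItem()
            item['link'] = row.xpath('.//h2/a/@href').extract_first()
            item['name'] = row.xpath('.//h2/a/text()').extract_first()
            if item['link']:
                # Follow the scraped link, carrying the partial item along.
                yield Request(response.urljoin(item['link']),
                              callback=self.parse_detail_page,
                              meta={'item': item})

    def parse_detail_page(self, response):
        item = response.meta['item']
        # Hypothetical detail field; adjust the selector to the real page.
        item['first_link'] = response.xpath('.//p/a/@href').extract_first()
        yield item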

Problem with the output of a Scrapy spider in a Python script

I want to use the output of a spider inside a Python script. To accomplish this, I wrote the following code based on another thread.

The problem I'm facing is that the spider_results() function only returns a list of the last item repeated, instead of a list with all the scraped items. When I run the same spider manually with the scrapy crawl command, I get the desired output. The output of the script, the manual JSON output, and the spider itself are below.

What's wrong with my code?

from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from circus.spiders.circus import MySpider

from scrapy.signalmanager import dispatcher


def spider_results():
    results = []

    def crawler_results(signal, sender, item, response, spider):
        results.append(item)

    dispatcher.connect(crawler_results, signal=signals.item_passed)

    process = CrawlerProcess(get_project_settings())
    process.crawl(MySpider)
    process.start()  # the script will block here until the crawling is finished
    return results


if __name__ == '__main__':
    print(spider_results())

Script output:

[{'away_odds': 1.44,
  'away_team': 'Los Angeles Dodgers',
  'event_time': datetime.datetime(2019, 6, 8, 2, 15),
  'home_odds': 2.85,
  'home_team': 'San Francisco Giants',
  'last_update': datetime.datetime(2019, 6, 6, 20, 58, 41, 655497),
  'league': 'MLB'}, {'away_odds': 1.44,
  'away_team': 'Los Angeles Dodgers',
  'event_time': datetime.datetime(2019, 6, 8, 2, 15),
  'home_odds': 2.85,
  'home_team': 'San Francisco Giants',
  'last_update': datetime.datetime(2019, 6, 6, 20, 58, 41, 655497),
  'league': 'MLB'}, {'away_odds': 1.44,
  'away_team': 'Los Angeles Dodgers',
  'event_time': datetime.datetime(2019, 6, 8, 2, 15),
  'home_odds': 2.85,
  'home_team': 'San Francisco Giants',
  'last_update': datetime.datetime(2019, 6, 6, 20, 58, 41, 655497),
  'league': 'MLB'}]

JSON output with scrapy crawl:

[
{"home_team": "Los Angeles Angels", "away_team": "Seattle Mariners", "event_time": "2019-06-08 02:07:00", "home_odds": 1.58, "away_odds": 2.4, "last_update": "2019-06-06 20:48:16", "league": "MLB"},
{"home_team": "San Diego Padres", "away_team": "Washington Nationals", "event_time": "2019-06-08 02:10:00", "home_odds": 1.87, "away_odds": 1.97, "last_update": "2019-06-06 20:48:16", "league": "MLB"},
{"home_team": "San Francisco Giants", "away_team": "Los Angeles Dodgers", "event_time": "2019-06-08 02:15:00", "home_odds": 2.85, "away_odds": 1.44, "last_update": "2019-06-06 20:48:16", "league": "MLB"}
]

MySpider:

from scrapy.spiders import Spider
from ..items import MatchItem
import json
import datetime
import dateutil.parser

class MySpider(Spider):
    name = 'first_spider'

    start_urls = ["https://websiteXYZ.com"]

    def parse(self, response):
        item = MatchItem()

        timestamp = datetime.datetime.utcnow()

        response_json = json.loads(response.body)

        for event in response_json["el"]:
            for team in event["epl"]:
                if team["so"] == 1: item["home_team"] = team["pn"]
                if team["so"] == 2: item["away_team"] = team["pn"]

            for market in event["ml"]:
                if market["mn"] == "Match result":
                    item["event_time"] = dateutil.parser.parse(market["dd"]).replace(tzinfo=None)
                    for outcome in market["msl"]:
                        if outcome["mst"] == "1": item["home_odds"] = outcome["msp"]
                        if outcome["mst"] == "X": item["draw_odds"] = outcome["msp"]
                        if outcome["mst"] == "2": item["away_odds"] = outcome["msp"]

                if market["mn"] == 'Moneyline':
                    item["event_time"] = dateutil.parser.parse(market["dd"]).replace(tzinfo=None)
                    for outcome in market["msl"]:
                        if outcome["mst"] == "1": item["home_odds"] = outcome["msp"]
                        #if outcome["mst"] == "X": item["draw_odds"] = outcome["msp"]
                        if outcome["mst"] == "2": item["away_odds"] = outcome["msp"]

            item["last_update"] = timestamp
            item["league"] = event["scn"]

            yield item
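A note for anyone hitting the same thing: the script output strongly suggests an aliasing bug rather than a signals problem. parse() creates a single MatchItem and mutates it on every loop iteration, so every append in the signal handler stores a reference to the same object, and the finished list shows the last state repeated. scrapy crawl looks correct because the feed exporter serializes each item at the moment it is yielded. Moving item = MatchItem() inside the for-event loop should fix it; the standalone sketch below reproduces the effect with plain dicts and fake data (no Scrapy needed):

# Demonstration of the shared-item bug, using dicts in place of MatchItem.
results = []

def parse_shared(events):
    item = {}                  # created once, mutated every iteration
    for event in events:
        item["league"] = event
        results.append(item)   # appends a reference to the same object

def parse_fresh(events):
    for event in events:
        item = {}              # a fresh object per event
        item["league"] = event
        results.append(item)

parse_shared(["NBA", "NHL", "MLB"])
print(results)  # [{'league': 'MLB'}, {'league': 'MLB'}, {'league': 'MLB'}]

results.clear()
parse_fresh(["NBA", "NHL", "MLB"])
print(results)  # [{'league': 'NBA'}, {'league': 'NHL'}, {'league': 'MLB'}]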

Python – Problem with CrawlSpider and Scrapy

I'm trying to extract information from the posts of this blog.

from scrapy.item import Field, Item
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
import scrapy

class misitems(Item):
    titulo = Field()
    contenido = Field()

class MySpider(CrawlSpider):
    name = "blog"
    allowed_domains = ["elblogdelnarco.com"]
    start_urls = ["https://elblogdelnarco.com/2019/06/21/la-vez-que-sicarios-del-cjng-de-el-mencho-presumieron-su-poder-en-calles-de-ciudad-de-mexico-video/"]

    rules = (
        Rule(LinkExtractor(restrict_xpaths=("//a[@class='next page-numbers']/@href",))),
        Rule(LinkExtractor(restrict_xpaths=("//h2[@class='title front-view-title']/a/@href",)), callback='parse_item'),
    )

    def parse_item(self, response):
        item = ItemLoader(misitems(), response)
        item.add_xpath("titulo", "//h1[@class='title single-title entry-title']/text()")
        item.add_xpath("contenido", "(//div[@class='thecontent']/p/b)[1]/text()")
        yield item.load_item()
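If it helps: restrict_xpaths in LinkExtractor expects expressions that select regions containing &lt;a&gt; elements; pointing it at /@href attribute nodes usually produces no links at all, which would explain rules that never fire. A self-contained sketch with only that changed (the blog-root start URL and follow=True on the pagination rule are assumptions about the intent):

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BlogSpider(CrawlSpider):
    name = "blog_fixed"
    allowed_domains = ["elblogdelnarco.com"]
    start_urls = ["https://elblogdelnarco.com/"]

    rules = (
        # Pagination: select the <a> element; LinkExtractor reads href itself.
        Rule(LinkExtractor(restrict_xpaths=("//a[@class='next page-numbers']",)),
             follow=True),
        # Post titles: same idea, then send each post page to parse_item.
        Rule(LinkExtractor(restrict_xpaths=("//h2[@class='title front-view-title']/a",)),
             callback='parse_item'),
    )

    def parse_item(self, response):
        yield {
            "titulo": response.xpath("//h1[@class='title single-title entry-title']/text()").get(),
            "contenido": response.xpath("(//div[@class='thecontent']/p/b)[1]/text()").get(),
        }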

Python 3.x – How do I make loops in Scrapy?

I'm scraping the Dmoz website. I want to turn the repetition into loops: the for loop I use in every callback has to be pasted again and again into each function, even though its functionality is the same. The second thing I want to loop over is the yield response.follow calls, because when I scrape more pages I have to write them again and again. Is there a way to solve these two problems? I have tried several times, but failed.

        # Save and call another page
        yield response.follow(self.about_page, self.parse_about, meta={'items': items})
        yield response.follow(self.editor, self.parse_editor, meta={'items': items})

    def parse_about(self, response):
        # do your stuff on the second page
        items = response.meta['items']
        names = {'name1': 'headings',
                 'name2': 'paragraphs',
                 'name3': 'projects',
                 'name4': 'About Dmoz',
                 'name5': 'Languages',
                 'name6': 'You can make a difference',
                 'name7': 'Further Information',
                 }

        finder = {'find1': 'h2::text, #mainContent h1::text',
                  'find2': 'p::text',
                  'find3': 'li~li+li b a::text, li:nth-child(1) b a::text',
                  'find4': '.nav ul a::text, li:nth-child(2) b a::text',
                  'find5': 'nav~.nav a::text',
                  'find6': 'dd::text, #about-contrib::text',
                  'find7': 'li::text, #about-more-info a::text',
                  }

        for name, find in zip(names.values(), finder.values()):
            items[name] = response.css(find).extract()
        yield items
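One way to answer both questions at once, sketched rather than definitive: keep a single generic callback and drive it from a list of (url, selectors) pairs, so the shared for loop and the response.follow call each exist exactly once. The URLs and selector sets below are placeholders standing in for about_page, editor, and the dicts above:

import scrapy


class DmozSpider(scrapy.Spider):
    name = 'dmoz'
    start_urls = ['http://dmoz-odp.org/']  # placeholder mirror of dmoz.org

    # One entry per page to visit: (relative url, output-field -> CSS selector).
    pages = [
        ('docs/en/about.html', {
            'headings': 'h2::text, #mainContent h1::text',
            'paragraphs': 'p::text',
        }),
        ('docs/en/help/become_an_editor.html', {
            'headings': 'h2::text',
            'paragraphs': 'p::text',
        }),
    ]

    def parse(self, response):
        # One follow loop instead of one response.follow line per page.
        for url, selectors in self.pages:
            yield response.follow(url, self.parse_page,
                                  meta={'selectors': selectors})

    def parse_page(self, response):
        # The shared extraction loop lives in exactly one callback now.
        items = {'page': response.url}
        for name, css in response.meta['selectors'].items():
            items[name] = response.css(css).extract()
        yield items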

Python – Scrapy: saving a Django model with foreign-key objects

I'm trying to save a Django model together with its eventprice_set, but I get the error AttributeError: eventprice_set.

I've tried the same code directly with the models and it works, so I can't figure out what the problem is here with the scraping.

My event models:

class Event(models.Model):
    name = models.CharField(max_length=259, blank=False, null=False)
    place = models.ForeignKey(Place, on_delete=models.CASCADE, blank=False, null=False, verbose_name='hosted by')
    event_cover = models.ImageField(upload_to='event-covers', blank=False, null=True)
    description = models.TextField(blank=False, null=False)
    start_date = models.DateTimeField(blank=False, null=False)
    end_date = models.DateTimeField(blank=True, null=True)
    pub_date = models.DateTimeField(default=timezone.now)


class EventPrice(models.Model):
    event = models.ForeignKey(Event, on_delete=models.CASCADE)
    price = models.fields.DecimalField(blank=False, null=True, decimal_places=2, max_digits=5)

Scrapy item code:

    def parse_khidi(self, response):
        item = EventItem(
            name=response.css('h3.eltdf-single-product-title::text').get(),
            place=Place.objects.get(name='KHIDI'),
            event_cover=response.css('img.wp-post-image').attrib['src'],
            description=str(response.css('div[id=tab-description] p::text').get()),
            start_date=self.get_start_date(response.css('h3.eltdf-single-product-title::text').get()),
            end_date=self.get_end_date(response.css('h3.eltdf-single-product-title::text').get())
        ).eventprice_set.create(price=Decimal(response.css('p.price span::text').get()))

        yield item
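A likely explanation, offered as a reading of the traceback rather than a certainty: EventItem is a Scrapy item, not a saved Django model instance, so it has no eventprice_set reverse manager; that attribute only exists on an Event object that has been saved to the database. One common arrangement is to save the Event in a pipeline and attach the price there. A sketch, assuming the item carries a hypothetical price field and that the models import path is adjusted to the real app:

from decimal import Decimal

from events.models import Event  # hypothetical app path; adjust to yours


class DjangoSavePipeline:
    """Save the Event first, then create the related EventPrice."""

    def process_item(self, item, spider):
        data = dict(item)
        price = data.pop('price', None)        # hypothetical item field
        event = Event.objects.create(**data)   # now a real, saved model row
        if price is not None:
            # eventprice_set exists here because `event` is saved.
            event.eventprice_set.create(price=Decimal(price))
        return item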

Web Scraping – Python / Jupyter / Scrapy

I'm trying to learn web scraping by working on this page to get started:
http://books.toscrape.com/catalogue/page-1.html

I have to retrieve a list of the books on this page, with title and price, into a data frame. For this I create a JSON file with the name and the price of each book. I have the code below, but I don't handle CSS selectors very well, and the JSON file ends up saving nothing.

class SimpleBookSpider(scrapy.Spider):
    name = "simplebooks"
    start_urls = [
        'http://books.toscrape.com/catalogue/page-1.html',
    ]
    custom_settings = {
        'LOG_LEVEL': logging.WARNING,
        'ITEM_PIPELINES': {'__main__.JsonWriterPipeline': 1},
        'FILE_NAME': "quotes.jl"
    }

    def parse(self, response):
        results = response.css('div.product_pod')
        for quote in results:
            yield {
                'title': product_pod.css('h3 a::attr(tittle)').extract_first(),
                'price': product_pod.css('div div.product_price p.price_color::text').extract_first(),
            }
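Two details in parse() look like the likely cause of the empty JSON: the loop binds each selector to quote but the body references product_pod (a NameError that kills the callback), and ::attr(tittle) misspells the title attribute. A minimal corrected sketch, with the same selectors otherwise:

import scrapy


class SimpleBookSpider(scrapy.Spider):
    name = "simplebooks"
    start_urls = ['http://books.toscrape.com/catalogue/page-1.html']

    def parse(self, response):
        # Use the loop variable itself, and spell the attribute 'title'.
        for product_pod in response.css('div.product_pod'):
            yield {
                'title': product_pod.css('h3 a::attr(title)').extract_first(),
                'price': product_pod.css('div.product_price p.price_color::text').extract_first(),
            }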

Thanks in advance

Python – Scrapy partially extracts the URL

My Scrapy spider works in every respect except one: extracting the URL by XPath @href does not return the full URL. It is always missing the last part, from the "?..." character that introduces the GET variables of the URL.
Does anyone know if Scrapy has a character limit or something similar?
This is the first time this has happened to me; the rest of the spiders I run on other sites work perfectly.
Thanks for the answers.
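For whoever finds this later: Scrapy imposes no length limit on extracted attributes, so a truncated href usually means the query string is added client-side by JavaScript and was never in the HTML that Scrapy downloaded. A quick way to check, with a placeholder URL standing in for the real site:

# In a terminal (placeholder URL):
#   scrapy shell "https://example.com/listing"
#
# Then inspect what the downloaded HTML actually contains:
#   >>> response.xpath('//a/@href').getall()
#
# If the "?..." part is missing here too, the browser adds it after page
# load, and the spider needs the API or JS source of that parameter instead.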

Python – Cron for Scrapy

I'm trying to run a Scrapy spider from crontab, and I can't get it to work. I have confirmed that cron itself works well with some Linux commands; all of that runs fine.
When I issue the command to run the spider, though, it does not run. I captured the log it returns, and it says it cannot find scrapy.
I tried it directly and via an .sh file, with the following crontab line:

* * * * * cd /home/pedro/Documents/environments/basic/basic/ && scrapy crawl

The spider writes its results directly to a configured CSV. It works when I launch the spider by hand with the scrapy crawl instruction, and if I run the cron command above in the console without the asterisks, it works too.
I have consulted hundreds of places and cannot find the reason.
Can someone help me?
Thanks in advance
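A common cause, offered as an educated guess: cron runs with a minimal PATH, so a scrapy executable installed in a virtualenv (or in ~/.local/bin) is found in an interactive shell but not under cron, which matches "it works without the asterisks". Calling the executable by its absolute path usually fixes it; run which scrapy in the working shell to find the real path. The virtualenv path and spider name below are illustrative, not taken from the question:

# crontab -e  (illustrative paths; `which scrapy` gives the real one)
* * * * * cd /home/pedro/Documents/environments/basic/basic/ && /home/pedro/Documents/environments/basic/bin/scrapy crawl myspider >> /tmp/scrapy-cron.log 2>&1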