Lines Matching refs:links

6 """Downloads web pages with fillable forms after parsing through a set of links.
59 # lists, giving more weight to the links that contain a link clue.
84 domain: only links with this domain will be retrieved.
92 # Http links without clues from LINK_CLUES.
94 # Http links that contain a clue from LINK_CLUES.
96 # Https links that do not contain any clues from LINK_CLUES.
98 # Https links that contain a clue from LINK_CLUES.
118 link: the url that is inserted to the appropriate links list.
147 the url links. If it is a registration page, it saves it in a file as
212 # Indicates page is not a registration page and links must be parsed.
297 Downloads the originally-specified site url, parses it and gets the links.
317 needed for session cookies. It keeps track of 'visited links' and
318 'links to visit' of the site. To do this it uses the links discovered from
328 """Init crawler URL, links lists, logger, and creates a cookie temp file.
354 # Http links that contain a clue from LINK_CLUES.
356 # Http links that do not contain any clue from LINK_CLUES.
358 # Https links that contain a clue from LINK_CLUES.
360 # Https links that do not contain any clue from LINK_CLUES.
362 # All links downloaded and parsed so far.
402 currently set to 30 URLs. These URLs are taken from the links lists, which
414 does not contain any secure links, then csl and sl lists will be of 0 length
417 Since 30 URLs can be handled concurrently, the number of links taken from
418 other lists can be increased. This means that we can take 24 links from the
420 than 24 links, e.g. there are only 21 links, then only 9 links may be taken
434 # If some links within the list have fewer items than needed, the missing
435 # links will be taken by the following priority: csl, cgl, sl, gl.
450 for no_of_links, links in [
456 if not links:
458 url = links.pop(0)
509 """Appends new links discovered by each retriever to the appropriate lists.
511 Links are copied to the links list of the crawler object, which holds all
512 the links found from all retrievers that the crawler object created. The
540 Creates a Retriever object and calls its run method to get the first links,
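
The matched comments describe four link lists, split by scheme (http or https) and by whether the URL contains a clue from LINK_CLUES, and a batch of up to 30 URLs drawn from them in the priority order csl, cgl, sl, gl. The sketch below illustrates that idea only. The clue values, the per-list quotas, the reading of the list names (csl = https with clue, cgl = http with clue, sl = https without clue, gl = http without clue), and the helper names `classify_link` and `take_batch` are all assumptions for illustration; none of them are taken from the original file.

```python
from urllib.parse import urlparse

# Hypothetical clue values; the real LINK_CLUES is not shown in the listing.
LINK_CLUES = ("signup", "register", "join")
MAX_BATCH = 30  # the listing mentions 30 URLs handled concurrently
PRIORITY = ("csl", "cgl", "sl", "gl")  # priority order named in the listing


def classify_link(url):
    """Return the assumed list name for a URL: csl, cgl, sl or gl."""
    secure = urlparse(url).scheme == "https"
    has_clue = any(clue in url.lower() for clue in LINK_CLUES)
    if secure:
        return "csl" if has_clue else "sl"
    return "cgl" if has_clue else "gl"


def take_batch(lists, quotas, max_batch=MAX_BATCH):
    """Pop up to max_batch URLs: quotas first, then fill gaps by priority."""
    batch = []
    # First pass: take each list's quota, or fewer if the list is short.
    for name in PRIORITY:
        room = max_batch - len(batch)
        for _ in range(min(quotas[name], len(lists[name]), room)):
            batch.append(lists[name].pop(0))
    # Second pass: cover any shortfall from what remains, in priority order.
    for name in PRIORITY:
        while lists[name] and len(batch) < max_batch:
            batch.append(lists[name].pop(0))
    return batch


# Example: a site with no https links, 21 clue links and 20 plain links.
lists = {
    "csl": [],
    "cgl": ["http://example.com/signup/%d" % i for i in range(21)],
    "sl": [],
    "gl": ["http://example.com/page/%d" % i for i in range(20)],
}
quotas = {"csl": 12, "cgl": 12, "sl": 3, "gl": 3}  # hypothetical quotas
batch = take_batch(lists, quotas)
```

With these assumed quotas, the empty secure lists leave a shortfall that is filled first from the 21 clue links and then with 9 plain links, which is the kind of redistribution the comments around lines 414 to 420 describe.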