industryterm:web crawler

  • Web Scraping With Google Sheets
    https://hackernoon.com/web-scraping-with-google-sheets-20d0dce323cc?source=rss----3a8144eabfe3-

    Web scraping and using various APIs are great ways to collect #data from websites and applications that can later be used in data #analytics. There is a company called HiQ that is well known for web scraping. HiQ crawls various “Public” websites to collect data and provide analytics to companies about their employees. They help companies find top talent, using data from sites like LinkedIn and other public sources to feed their algorithms. However, they ran into legal trouble when LinkedIn sent them a cease-and-desist letter and put technical measures in place to slow down HiQ’s web crawlers. HiQ subsequently sued LinkedIn and won a preliminary injunction! The judge said that as long as the data was public, it was OK to scrape. Web scraping typically requires a complex (...)

    #web-scraping #programming

  • What Edward Snowden Leaked Was Nothing Compared to What He Didn’t | The Nation
    http://www.thenation.com/article/178467/what-snowden-leaked-was-nothing-compared-what-he-didnt#

    Here, at least, is a place to start: intelligence officials have weighed in with an estimate of just how many secret files National Security Agency contractor Edward Snowden took with him when he headed for Hong Kong last June. Brace yourself: 1.7 million. At least, that is the number they claim he or his web crawler accessed before he left town. Let’s assume for a moment that it’s accurate and add a caveat. Whatever he had with him on those thumb drives when he left the agency, Edward Snowden did not take all the NSA’s classified documents. Not by a long shot. He only downloaded a portion of them. We don’t have any idea what percentage, but presumably millions of NSA secret documents did not get the Snowden treatment.

    Such figures should stagger us, and what he did take will undoubtedly occupy journalists for months or years more (and historians long after that). Keep this in mind, however: the NSA is only one of seventeen intelligence outfits in what is called the US Intelligence Community. Some of the others are as large and well funded, and all of them generate their own troves of secret documents, undoubtedly stretching into the many millions.

    And keep something else in mind: that’s just intelligence agencies. If you’re thinking about the full sweep of our national security state (NSS), you also have to include places like the Department of Homeland Security, the Energy Department (responsible for the US nuclear arsenal), and the Pentagon. In other words, we’re talking about the kind of secret documentation that an army of journalists, researchers, and historians wouldn’t have a hope of getting through, not in a century.

    #Snowden #NSA #surveillance

  • Snowden Used Low-Cost Tool to Best N.S.A.
    http://www.nytimes.com/2014/02/09/us/snowden-used-low-cost-tool-to-best-nsa.html

    Using “web crawler” software designed to search, index and back up a website, Mr. Snowden “scraped data out of our systems” while he went about his day job, according to a senior intelligence official. “We do not believe this was an individual sitting at a machine and downloading this much material in sequence,” the official said. The process, he added, was “quite automated.”

    (...)

    .... from his first days working as a contractor inside the N.S.A.’s aging underground Oahu facility for Dell, the computer maker, and then at a modern office building on the island for Booz Allen Hamilton, the technology consulting firm that sells and operates computer security services used by the government, Mr. Snowden learned something critical about the N.S.A.’s culture: While the organization built enormously high electronic barriers to keep out foreign invaders, it had rudimentary protections against insiders.

    (...)

    Investigators have yet to answer the question of whether Mr. Snowden happened into an ill-defended outpost of the N.S.A. or sought a job there because he knew it had yet to install the security upgrades that might have stopped him.

    Agency officials insist that if Mr. Snowden had been working from N.S.A. headquarters at Fort Meade, Md., which was equipped with monitors designed to detect when a huge volume of data was being accessed and downloaded, he almost certainly would have been caught. But because he worked at an agency outpost [Oahu, Hawaii] that had not yet been upgraded with modern security measures, his copying of what the agency’s newly appointed No. 2 officer, Rick Ledgett, recently called “the keys to the kingdom” raised few alarms.