industryterm:search engines

  • How To Boost Your Business With Residential Proxies : 5 Real-life Use Cases
    https://hackernoon.com/how-to-boost-your-business-with-residential-proxies-5-real-life-use-case

    Image credit: Unsplash. Masking your IP address can be useful in a range of situations, from accessing blocked content to bypassing anti-bot systems implemented by search engines and other online services. Here are several ways of organizing proxies:
    – Residential #proxy — IP addresses assigned to homeowners by an ISP are called residential, and they are flagged as such in regional internet registries. Residential proxy services like Infatica use these addresses because requests sent through them are indistinguishable from those generated by regular users.
    – Data center proxy — Such proxies are not connected to an ISP; their addresses are assigned by hosting providers that have purchased large pools of IP addresses.
    – Shared proxy — In this case a proxy can be used by (...)
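    As a sketch of how these proxy pools are selected in practice, the snippet below builds a requests-style proxies mapping for a chosen pool. The host and port are placeholders, not real provider endpoints:

```python
# Hypothetical example: build a requests-style proxies mapping for a given
# proxy pool. "proxy.example.com" stands in for a provider's gateway.
def proxy_config(kind: str, host: str, port: int) -> dict:
    """Return a proxies dict usable with HTTP client libraries."""
    if kind not in {"residential", "datacenter", "shared"}:
        raise ValueError(f"unknown proxy kind: {kind}")
    url = f"http://{host}:{port}"
    # Route both plain and TLS traffic through the same gateway.
    return {"http": url, "https": url}
```

    A client would then pass this mapping to its HTTP library; which pool to pick depends on how aggressively the target site filters data-center address ranges.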

    #privacy #seo #private-proxies #residential-proxies

  • 8 Common Link Building Myths That Are Holding Back Your Website
    https://hackernoon.com/8-common-link-building-myths-that-are-holding-back-your-website-173307ab

    Link Building Myths. The myth is, after all, a never-ending story, and the same holds true for gossip: there is no end to the rumors doing the rounds, far beyond the school grounds. The mind-boggling question of whether Google changed something, or whether it was merely an algorithm update, leads to endless brainstorming sessions around the globe. As long as the search engines veil their algorithms in secrecy, the industry will remain rife with gossip, myths, and spam. And guess what that results in: it encourages businesses to follow the wrong strategies, seriously damage their backlink profiles, or suffer something more severe, like a spiteful Google penalty that is quite hard to recover from. The fact is that everything included under #seo and backlink building (...)

    #search-engine-optimizatio #link-building #link-building-myths #marketing

  • Body politics: The old and new public health risks of networked health misinformation
    https://points.datasociety.net/body-politics-the-old-and-new-public-health-risks-of-networked-h

    There are clear parallels between the tactics used to spread health disinformation and political content. For instance, in 2018, researchers found that large networks of bots and trolls were spreading anti-vaccination rhetoric to sow confusion online and amplify the appearance of an anti-vaccination community. The anti-vaccination tweets often referenced conspiracy theories, and some accounts almost singularly focused on the U.S. government. As a result, real-life users and orchestrated networks of bots are engaged in a feedback loop. Recently, political public figures have used their platform to amplify vaccination misinformation, such as tweeting that measles can help fight cancer. There is a long history of people using influence to sway public opinion about vaccines—particularly among celebrities.

    These are symptoms of a larger societal crisis: disinformation campaigns aimed to undermine social institutions.

    The search and recommendation algorithms that underpin our information retrieval systems are other modern tools mediating access to health information. When a user enters an inquiry into a search engine, they receive curated results. As so many people rely on search engines for health information, they are another important mechanism that is susceptible to manipulation. For instance, the websites of some crisis pregnancy centers—which are designed to look and sound like those of clinics that provide abortion care, but instead give misleading information about the negative effects of abortion to visitors—are optimized results for Google searches often made by women seeking abortion information.

    Similarly, recommendation systems on popular social media platforms, particularly Facebook and YouTube, create easy entry points for problematic content. For example, a mother joining a generic parenting group on Facebook may subsequently receive recommendations for anti-vaxx groups. Bots, search engine optimization, and gaming of recommendation systems are foundational tools used by various actors to influence public health discourse and skew public debates — often blurring the line between medical mistrust and larger political ideologies and agendas.

    #Information_médicale #Santé_publique #Vaccination #Complotisme #Médias_sociaux #Algorithmes

  • How #ai Can Be Improved By #blockchain
    https://hackernoon.com/how-ai-can-be-improved-by-blockchain-591a2fb7095e?source=rss----3a8144ea

    Quite some time ago, artificial intelligence (AI) moved beyond being a fantastical fictional element or a limited gaming feature. Today AI can be found all across the board, from scientific experiments to everyday things like search engines and our favorite social media. But how can this new #technology, which operates invisibly in almost every home, change our lives — and more specifically, how can it make our lives better? AI market today: Engineers create AIs to allow computers to solve problems themselves; essentially, the programs modify their own code in response to the new difficulties they face. Computers beating world champions at chess and other games is old news, and for some time it seemed that AI development had stagnated there, until unmanned vehicles, created by corporations like Google and Uber, (...)

    #future #artificial-intelligence

  • Private Mossad for Hire
    Inside an effort to influence American elections, starting with one small-town race.
    February 18 & 25, 2019
    By Adam Entous and Ronan Farrow

    https://www.newyorker.com/magazine/2019/02/18/private-mossad-for-hire

    (...) Psy-Group had more success pitching an operation, code-named Project Butterfly, to wealthy Jewish-American donors. The operation targeted what Psy-Group described as “anti-Israel” activists on American college campuses who supported the Boycott, Divestment, Sanctions movement, known as B.D.S. Supporters of B.D.S. see the movement as a way to use nonviolent protest to pressure Israel about its treatment of the Palestinians; detractors say that B.D.S. wrongly singles out Israel as a human-rights offender. B.D.S. is anathema to many ardent supporters of the Israeli government.

    In early meetings with donors, in New York, Burstien said that the key to mounting an effective anti-B.D.S. campaign was to make it look as though Israel, and the Jewish-American community, had nothing to do with the effort. The goal of Butterfly, according to a 2017 company document, was to “destabilize and disrupt anti-Israel movements from within.” Psy-Group operatives scoured the Internet, social-media accounts, and the “deep” Web—areas of the Internet not indexed by search engines like Google—for derogatory information about B.D.S. activists. If a student claimed to be a pious Muslim, for example, Psy-Group operatives would look for photographs of him engaging in behavior unacceptable to many pious Muslims, such as drinking alcohol or having an affair. Psy-Group would then release the information online using avatars and Web sites that couldn’t be traced back to the company or its donors.

    Project Butterfly launched in February, 2016, and Psy-Group asked donors for $2.5 million for operations in 2017. Supporters were told that they were “investing in Israel’s future.” In some cases, a former company employee said, donors asked Psy-Group to target B.D.S. activists at universities where their sons and daughters studied.
    The project would focus on as many as ten college campuses. According to an update sent to donors in May, 2017, Psy-Group conducted two “tours of the main theatre of action,” and met with the campaign’s outside “partners,” which it did not name. Psy-Group employees had recently travelled to Washington to visit officials at a think tank called the Foundation for Defense of Democracies, which had shared some of its research on the B.D.S. movement. In a follow-up meeting, which was attended by Burstien, Psy-Group provided F.D.D. with a confidential memo describing how it had compiled dossiers on nine activists, including a lecturer at the University of California, Berkeley. In the memo, Psy-Group asked the foundation for guidance on identifying future targets. According to an F.D.D. official, the foundation “did not end up contracting with them, and their research did little to advance our own.”

    Burstien recruited Ram Ben-Barak, a former deputy director of Mossad, to help with the project. As the director general of Israel’s Ministry of Strategic Affairs, from 2014 to 2016, Ben-Barak had drawn up a plan for the state to combat the B.D.S. movement, but it was never implemented. Ben-Barak was enthusiastic about Butterfly. He said that the fight against B.D.S. was like “a war.” In the case of B.D.S. activists, he said, “you don’t kill them but you do have to deal with them in other ways.” (...)

    #BDS

  • Querying the #blockchain: Why The Graph Might Become One of the Important Protocols of the Web3…
    https://hackernoon.com/querying-the-blockchain-why-the-graph-might-become-one-of-the-important-

    Querying the Blockchain: Why The Graph Might Become One of the Important Protocols of the Web3 Stack. Data access has played a prominent role in every technology trend in the history of software. Data access technologies such as databases, search engines, or query APIs are so ubiquitous that we barely think about them when architecting software solutions. As Web 3.0 (decentralized applications powered by blockchain technologies) evolves, infrastructure blocks such as data access will become more relevant. However, solving data access on the blockchain has proven to be a very challenging endeavor that forces developers to spend significant amounts of time writing infrastructure code. Among the Web3 data access solutions in the market, The Graph Protocol is the one that I particularly like (...)

    #invector-labs #ethereum #cryptocurrency #blockchain-technology

  • #searchpedia : A List of 250+ Search Engines
    https://hackernoon.com/searchpedia-a-list-of-250-search-engines-40198146adfc?source=rss----3a81

    An Exhaustive List of All Search Engines from the Dawn of the Internet. Since the dawn of the Internet era, we have been flooded with an ocean of information, but without a good search engine, this ocean is useless. Search engines have come a long way: we have seen a lot of them, some came and went, and some remain to this day. Here is an incomplete but big list of search engines; if you find something wrong or missing, shoot your suggestions in the comments. We have categorized the search engines according to their use cases. Enjoy! All-Purpose Search Engines. Google: Well, you probably used it to get to this article; the world’s most popular search engine. Visit: http://www.google.com. Bing Search: Microsoft’s entry into the burgeoning search engine market. Better late than (...)

    #alternative-search-engine #list-of-search-engines #search-engines #all-search-engines

  • #google Search — How A Master’s Thesis Became An Idea Worth $70 Billion
    https://hackernoon.com/google-search-how-a-masters-thesis-became-an-idea-worth-70-billion-c4c38

    Google Search — How A Master’s Thesis Became An Idea Worth $70 Billion. What most of you might know is that the Google Search you currently use began as a Master’s thesis that Larry Page and Sergey Brin worked on back in 1996, one that revolutionized the way people looked at search engines. What most do not know, however, is that their initial idea was not to rank websites but to rank annotations on websites. “One idea Page presented to Winograd, a collaboration with Brin, seemed more promising than the others: creating a system where people could make annotations and comments on websites. But the more Page thought about annotation, the messier it got. How would you figure out who gets to comment or whose comment would be the one you’d see first? For that, he says, “We needed a (...)

    #google-search #google-thesis #google-founders-stanford #google-origin-story

  • Media Manipulation, Strategic Amplification, and Responsible Journalism, by danah boyd
    https://points.datasociety.net/media-manipulation-strategic-amplification-and-responsible-journ

    Media manipulators have developed a three-part strategy that relies on how the current media ecosystem is structured:

    1. Create spectacle, using social media to get news media coverage.
    2. Frame the spectacle through phrases that drive new audiences to find your frames through search engines.
    3. Become a “digital martyr” to help radicalize others.

    (…) Using search to your advantage relies on what Michael Golebiewski at Bing calls a “data void.” When people search for a phrase that does not have natural informative results, it’s easy for manipulators to control the results.

    (...) YouTube is a disaster. First, there’s a lot less content on YouTube, which means that problematic content surfaces to the top faster. Second, YouTube isn’t simply a search engine; it’s also a recommendation engine that encourages people to view more videos and go on a journey. This is great for music discovery, but not so great when manipulators game the recommendation system to create pathways to extremist content.

    (…) In addition to playing with algorithmic systems, media manipulators exploit a psychological process known as “apophenia.” By creating connections between random ideas, manipulators warp the cultural imaginary. By inviting people to see artificial patterns, they engage potential recruits to see reality in their terms.

  • What worries me about AI – François Chollet – Medium
    https://medium.com/@francois.chollet/what-worries-me-about-ai-ed9df072b704

    This data, in theory, allows the entities that collect it to build extremely accurate psychological profiles of both individuals and groups. Your opinions and behavior can be cross-correlated with those of thousands of similar people, achieving an uncanny understanding of what makes you tick — probably more predictive than what you yourself could achieve through mere introspection (for instance, Facebook “likes” enable algorithms to assess your personality better than your own friends could). This data makes it possible to predict a few days in advance when you will start a new relationship (and with whom), and when you will end your current one. Or who is at risk of suicide. Or which side you will ultimately vote for in an election, even while you’re still feeling undecided. And it’s not just individual-level profiling power — large groups can be even more predictable, as aggregating data points erases randomness and individual outliers.
    Digital information consumption as a psychological control vector

    Passive data collection is not where it ends. Increasingly, social network services are in control of what information we consume. What we see in our newsfeeds has become algorithmically “curated”. Opaque social media algorithms get to decide, to an ever-increasing extent, which political articles we read, which movie trailers we see, who we keep in touch with, and whose feedback we receive on the opinions we express.

    In short, social network companies can simultaneously measure everything about us, and control the information we consume. And that’s an accelerating trend. When you have access to both perception and action, you’re looking at an AI problem. You can start establishing an optimization loop for human behavior, in which you observe the current state of your targets and keep tuning what information you feed them, until you start observing the opinions and behaviors you wanted to see. A large subset of the field of AI — in particular “reinforcement learning” — is about developing algorithms to solve such optimization problems as efficiently as possible, to close the loop and achieve full control of the target at hand — in this case, us. By moving our lives to the digital realm, we become vulnerable to that which rules it — AI algorithms.
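    The observe-act-update loop described above can be sketched as a toy epsilon-greedy bandit. Everything here is illustrative — the item names, the simulated engagement numbers, and the loop itself are a simplification of reinforcement learning, not any real platform’s system:

```python
import random

def optimize_feed(user_response, items, rounds=500, eps=0.1, seed=0):
    """Toy feedback loop: keep showing content, observe engagement,
    and drift toward whatever the simulated user responds to most."""
    rng = random.Random(seed)
    counts = {i: 0 for i in items}
    values = {i: 0.0 for i in items}   # estimated engagement per item type
    for _ in range(rounds):
        if rng.random() < eps:
            item = rng.choice(items)               # explore: try random content
        else:
            item = max(items, key=values.get)      # exploit: best engagement so far
        reward = user_response(item)               # "perception": observed reaction
        counts[item] += 1
        values[item] += (reward - values[item]) / counts[item]  # running mean
    return max(items, key=values.get)

# A simulated user who engages most with outrage content: the loop
# discovers and then amplifies exactly that.
engagement = {"news": 0.2, "outrage": 0.8, "sports": 0.1}
favored = optimize_feed(lambda item: engagement[item], list(engagement))
```

    Even this crude loop converges on the content that maximizes the reward signal, which is the essay’s point: whoever defines the reward defines the behavior being optimized.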

    From an information security perspective, you would call these vulnerabilities: known exploits that can be used to take over a system. In the case of the human minds, these vulnerabilities never get patched, they are just the way we work. They’re in our DNA. The human mind is a static, vulnerable system that will come increasingly under attack from ever-smarter AI algorithms that will simultaneously have a complete view of everything we do and believe, and complete control of the information we consume.

    The issue is not AI itself. The issue is control.

    Instead of letting newsfeed algorithms manipulate the user to achieve opaque goals, such as swaying their political opinions or maximally wasting their time, we should put the user in charge of the goals that the algorithms optimize for. We are talking, after all, about your news, your worldview, your friends, your life — the impact that technology has on you should naturally be placed under your own control. Information management algorithms should not be a mysterious force inflicted on us to serve ends that run opposite to our own interests; instead, they should be a tool in our hands. A tool that we can use for our own purposes, say, for education and personal growth instead of entertainment.

    Here’s an idea — any algorithmic newsfeed with significant adoption should:

    Transparently convey what objectives the feed algorithm is currently optimizing for, and how these objectives are affecting your information diet.
    Give you intuitive tools to set these goals yourself. For instance, it should be possible for you to configure your newsfeed to maximize learning and personal growth — in specific directions.
    Feature an always-visible measure of how much time you are spending on the feed.
    Feature tools to stay in control of how much time you’re spending on the feed — such as a daily time target, past which the algorithm will seek to get you off the feed.

    Augmenting ourselves with AI while retaining control

    We should build AI to serve humans, not to manipulate them for profit or political gain.

    You may be thinking: since a search engine is still an AI layer between us and the information we consume, could it bias its results to attempt to manipulate us? Yes, that risk is latent in every information-management algorithm. But in stark contrast with social networks, market incentives in this case are actually aligned with users’ needs, pushing search engines to be as relevant and objective as possible. If they fail to be maximally useful, there’s essentially no friction for users to move to a competing product. And importantly, a search engine would have a considerably smaller psychological attack surface than a social newsfeed. The threat we’ve profiled in this post requires most of the following to be present in a product:

    Both perception and action: not only should the product be in control of the information it shows you (news and social updates), it should also be able to “perceive” your current mental states via “likes”, chat messages, and status updates. Without both perception and action, no reinforcement learning loop can be established. A read-only feed would only be dangerous as a potential avenue for classical propaganda.
    Centrality to our lives: the product should be a major source of information for at least a subset of its users, and typical users should be spending several hours per day on it. A feed that is auxiliary and specialized (such as Amazon’s product recommendations) would not be a serious threat.
    A social component, enabling a far broader and more effective array of psychological control vectors (in particular social reinforcement). An impersonal newsfeed has only a fraction of the leverage over our minds.
    Business incentives set towards manipulating users and making users spend more time on the product.

    Most AI-driven information-management products don’t meet these requirements. Social networks, on the other hand, are a frightening combination of risk factors.
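    The four requirements above can be read as a rough rubric. A minimal sketch — the equal weighting and the factor names are my own illustration, not the author’s:

```python
# Hypothetical scoring of a product against the four risk factors listed above.
RISK_FACTORS = (
    "perception_and_action",   # both observes you and controls what you see
    "centrality",              # major information source, hours per day
    "social_component",        # social reinforcement vectors available
    "engagement_incentives",   # business model rewards time-on-product
)

def risk_score(product: dict) -> float:
    """Fraction of the four risk factors a product exhibits (0.0 to 1.0)."""
    return sum(bool(product.get(f)) for f in RISK_FACTORS) / len(RISK_FACTORS)
```

    Under this toy rubric a social network ticks every box, while a specialized recommender such as a shopping feed ticks few of them, matching the essay’s conclusion.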

    #Intelligence_artificielle #Manipulation #Médias_sociaux

    • This is made all the easier by the fact that the human mind is highly vulnerable to simple patterns of social manipulation. Consider, for instance, the following vectors of attack:

      Identity reinforcement: this is an old trick that has been leveraged since the very first ads in history, and it still works just as well as it did the first time. It consists of associating a given view with markers that you identify with (or wish you did), thus making you automatically side with the target view. In the context of AI-optimized social media consumption, a control algorithm could make sure that you only see content (whether news stories or posts from your friends) where the views it wants you to hold co-occur with your own identity markers, and inversely for views the algorithm wants you to move away from.
      Negative social reinforcement: if you make a post expressing a view that the control algorithm doesn’t want you to hold, the system can choose to only show your post to people who hold the opposite view (maybe acquaintances, maybe strangers, maybe bots), and who will harshly criticize it. Repeated many times, such social backlash is likely to make you move away from your initial views.
      Positive social reinforcement: if you make a post expressing a view that the control algorithm wants to spread, it can choose to only show it to people who will “like” it (it could even be bots). This will reinforce your belief and put you under the impression that you are part of a supportive majority.
      Sampling bias: the algorithm may also be more likely to show you posts from your friends (or the media at large) that support the views it wants you to hold. Placed in such an information bubble, you will be under the impression that these views have much broader support than they do in reality.
      Argument personalization: the algorithm may observe that exposure to certain pieces of content, among people with a psychological profile close to yours, has resulted in the sort of view shift it seeks. It may then serve you with content that is expected to be maximally effective for someone with your particular views and life experience. In the long run, the algorithm may even be able to generate such maximally-effective content from scratch, specifically for you.
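      The sampling-bias vector above can be made concrete with a small expected-value calculation. The weights are illustrative, not measured from any real platform:

```python
def perceived_support(posts, target_view, w_agree=0.9, w_other=0.1):
    """Expected share of a biased feed that agrees with target_view.

    posts: list of stance labels; w_agree / w_other: the (hypothetical)
    probabilities that the curator shows agreeing vs. disagreeing posts.
    Deterministic expectation, no random sampling.
    """
    agree = sum(1 for p in posts if p == target_view)
    other = len(posts) - agree
    shown_agree = w_agree * agree
    shown_other = w_other * other
    total = shown_agree + shown_other
    return shown_agree / total if total else 0.0
```

      With a view held by only 30% of authors, these weights make it look like a roughly 80% majority in the feed: the information bubble the paragraph describes, produced by reweighting alone.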


  • Quantifying Biases in Online Information Exposure | Center for Complex Networks and Systems Research, Indiana University
    https://arxiv.org/abs/1807.06958
    https://arxiv.org/pdf/1807.06958.pdf

    Our consumption of online #information is mediated by filtering, ranking, and recommendation algorithms that introduce unintentional biases as they attempt to deliver relevant and engaging content. It has been suggested that our reliance on online technologies such as search engines and social media may limit exposure to diverse points of view and make us vulnerable to manipulation by disinformation. In this paper, we mine a massive dataset of Web traffic to quantify two kinds of bias: (i) homogeneity bias, which is the tendency to consume content from a narrow set of information sources, and (ii) popularity bias, which is the selective exposure to content from top sites. Our analysis reveals different bias levels across several widely used Web platforms. Search exposes users to a diverse set of sources, while social media traffic tends to exhibit high popularity and homogeneity #bias. When we focus our analysis on traffic to news sites, we find higher levels of popularity bias, with smaller differences across applications. Overall, our results quantify the extent to which our choices of online systems confine us inside “social bubbles.”
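    As a simplified illustration of the two biases the abstract defines (the paper’s exact estimators differ), one could compute them over a log of visited sources like this:

```python
from collections import Counter
from math import log2

def homogeneity_bias(visited_sources):
    """1 minus normalized Shannon entropy of the source distribution:
    0.0 = visits spread evenly across sources, 1.0 = a single source."""
    counts = Counter(visited_sources)
    n = sum(counts.values())
    if len(counts) < 2:
        return 1.0
    h = -sum((c / n) * log2(c / n) for c in counts.values())
    return 1.0 - h / log2(len(counts))

def popularity_bias(visited_sources, top_sites):
    """Fraction of visits going to a fixed set of top sites."""
    visits = list(visited_sources)
    if not visits:
        return 0.0
    return sum(s in top_sites for s in visits) / len(visits)
```

    On such toy metrics, the paper’s finding would show up as social-media click logs scoring high on both measures while search-referred traffic scores lower.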

    #personnalisation #médias_sociaux #algorithme via @pomeranian99

  • Token Sale: How to Protect Yourself Against Scams and Hackers
    https://hackernoon.com/token-sale-how-to-protect-yourself-against-scams-and-hackers-a85edaa5e37

    Due to the booming cryptocurrency market, many genuine projects are being impersonated by scammers and hackers looking to take advantage of unsuspecting investors. Here, we take a look at some of the most common scams and how to protect yourself against them. Brought to you by Namahe. 1. Make sure you are on the right website. Phishing is becoming more common as scammers grow increasingly sophisticated and arm themselves with new tools to fool unsuspecting investors. One of the major ways scammers now target #ico investors is by cloning the official website. Often, these scammers will also purchase paid advertising on Google and other search engines to increase their search ranking. As such, be sure to double-check the URL when clicking through from a search engine or anywhere (...)
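    The “double-check the URL” advice can be sketched as a similarity test against the official domain. The threshold is illustrative, and a real phishing detector would need far more signals (TLS details, homograph checks, domain age, and so on):

```python
import difflib
from urllib.parse import urlparse

def looks_like_phish(clicked_url: str, official_domain: str,
                     threshold: float = 0.75) -> bool:
    """Flag domains that are close to, but not equal to, the official one."""
    domain = urlparse(clicked_url).netloc.lower().removeprefix("www.")
    if domain == official_domain:
        return False  # exact match: the real site
    ratio = difflib.SequenceMatcher(None, domain, official_domain).ratio()
    return ratio >= threshold  # close-but-not-equal: likely a clone
```

    For example, `examp1e.com` (digit one for the letter l) scores highly against `example.com` and is flagged, while an unrelated domain is not.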

    #common-token-scams #weekly-sponsor #against-token-scams #blockchain

  • Top 10 #seo Tips for #ecommerce Websites — 2018
    https://hackernoon.com/top-10-seo-tips-for-ecommerce-websites-2018-6e926b484032?source=rss----3

    If you own an ecommerce website and want to optimize it for SEO, well, this article is your guide. If your customers can’t find you online, how will you make a sale? Getting visibility on search engines and ranking higher for searches is crucial for making sales. SEO for an ecommerce website is not the same as SEO for other types of websites. We’ll help you reach your potential customers while improving revenue as well as your brand visibility. If you have an ecommerce store, then you’ll agree that: you have tons of pages; many auto-generated URLs are added to your website every day; there might be many pages with the same content on your website; and you can’t go and optimize each and every page, because it would become an unending task. Not to worry: we are here with some amazing tips that will help (...)
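    The auto-generated-URL problem above is commonly tackled by canonicalizing URLs so that duplicate variants collapse onto one address. A minimal sketch — the list of tracking parameters to strip is hypothetical; a real store would tune its own:

```python
from urllib.parse import urlparse, urlencode, parse_qsl, urlunparse

# Illustrative set of parameters that create duplicate URLs without
# changing page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "sessionid"}

def canonical_url(url: str) -> str:
    """Drop tracking params and fragments, and sort the remaining query
    parameters, so equivalent URLs compare equal."""
    parts = urlparse(url)
    query = sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS
    )
    return urlunparse(parts._replace(query=urlencode(query), fragment=""))
```

    The canonical form is what you would emit in a `rel="canonical"` tag, so search engines index one page instead of many near-duplicates.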

    #eccommerce-seo #seo-tips #ecommerce-seo

  • How the use of #blockchain can improve the quality of the #internet?
    https://hackernoon.com/how-the-use-of-blockchain-can-improve-the-quality-of-the-internet-a71a6d

    Most of us consider the Internet one of the daily commodities we cannot imagine life without. In recent years, scrolling through social media channels, using various search engines and news websites to collect information, and binge-watching favourite TV programmes on Netflix have become things we perceive much like brushing our teeth or drinking coffee in the morning. Our lives have become digitalized to an enormous extent, and that is definitely an amazing development. However, the ongoing Internet revolution is not entirely bright, as some enthusiasts may claim. The global Internet structure is currently affected by a number of significant flaws that are already serious obstacles to the further development of this phenomenon. The (...)

  • Google’s true origin partly lies in CIA and NSA research grants for mass surveillance — Quartz
    https://qz.com/1145669/googles-true-origin-partly-lies-in-cia-and-nsa-research-grants-for-mass-surveill
    https://qzprod.files.wordpress.com/2017/08/rts18wdq-e1502123358903.jpg?quality=80&strip=all&w=1600

    The title is a bit clickbait, but the information is interesting, though sometimes elliptical.

    It is written by Jeff Nesbit, former director of legislative and public affairs at the National Science Foundation, someone who should know what he is talking about.

    In the mid 1990s, the intelligence community in America began to realize that they had an opportunity. The supercomputing community was just beginning to migrate from university settings into the private sector, led by investments from a place that would come to be known as Silicon Valley.

    The intelligence community wanted to shape Silicon Valley’s supercomputing efforts at their inception so they would be useful for both military and homeland security purposes. A digital revolution was underway: one that would transform the world of data gathering and how we make sense of massive amounts of information. Could this supercomputing network, which would become capable of storing terabytes of information, make intelligent sense of the digital trail that human beings leave behind?

    Intelligence-gathering may have been their world, but the Central Intelligence Agency (CIA) and the National Security Agency (NSA) had come to realize that their future was likely to be profoundly shaped outside the government. It was at a time when military and intelligence budgets within the Clinton administration were in jeopardy, and the private sector had vast resources at their disposal. If the intelligence community wanted to conduct mass surveillance for national security purposes, it would require cooperation between the government and the emerging supercomputing companies.

    Silicon Valley was no different. By the mid 1990s, the intelligence community was seeding funding to the most promising supercomputing efforts across academia, guiding the creation of efforts to make massive amounts of information useful for both the private sector as well as the intelligence community.

    They funded these computer scientists through an unclassified, highly compartmentalized program that was managed for the CIA and the NSA by large military and intelligence contractors. It was called the Massive Digital Data Systems (MDDS) project.
    The Massive Digital Data Systems (MDDS) project

    MDDS was introduced to several dozen leading computer scientists at Stanford, CalTech, MIT, Carnegie Mellon, Harvard, and others in a white paper that described what the CIA, NSA, DARPA, and other agencies hoped to achieve. The research would largely be funded and managed by unclassified science agencies like NSF, which would allow the architecture to be scaled up in the private sector if it managed to achieve what the intelligence community hoped for.

    “Not only are activities becoming more complex, but changing demands require that the IC [Intelligence Community] process different types as well as larger volumes of data,” the intelligence community said in its 1993 MDDS white paper. “Consequently, the IC is taking a proactive role in stimulating research in the efficient management of massive databases and ensuring that IC requirements can be incorporated or adapted into commercial products. Because the challenges are not unique to any one agency, the Community Management Staff (CMS) has commissioned a Massive Digital Data Systems [MDDS] Working Group to address the needs and to identify and evaluate possible solutions.”

    In 1995, one of the first and most promising MDDS grants went to a computer-science research team at Stanford University with a decade-long history of working with NSF and DARPA grants. The primary objective of this grant was “query optimization of very complex queries that are described using the ‘query flocks’ approach.” A second grant—the DARPA-NSF grant most closely associated with Google’s origin—was part of a coordinated effort to build a massive digital library using the internet as its backbone. Both grants funded research by two graduate students who were making rapid advances in web-page ranking, as well as tracking (and making sense of) user queries: future Google cofounders Sergey Brin and Larry Page.

    The research by Brin and Page under these grants became the heart of Google: people using search functions to find precisely what they wanted inside a very large data set. The intelligence community, however, saw a slightly different benefit in their research: Could the network be organized so efficiently that individual users could be uniquely identified and tracked?

    The grants allowed Brin and Page to do their work and contributed to their breakthroughs in web-page ranking and tracking user queries. Brin didn’t work for the intelligence community—or for anyone else. Google had not yet been incorporated. He was just a Stanford researcher taking advantage of the grant provided by the NSA and CIA through the unclassified MDDS program.
    Left out of Google’s story

    The MDDS research effort has never been part of Google’s origin story, even though the principal investigator for the MDDS grant specifically named Google as directly resulting from their research: “Its core technology, which allows it to find pages far more accurately than other search engines, was partially supported by this grant,” he wrote. In a published research paper that includes some of Brin’s pivotal work, the authors also reference the NSF grant that was created by the MDDS program.

    Instead, every Google creation story mentions just one federal grant: the NSF/DARPA “digital libraries” grant, which was designed to allow Stanford researchers to search the entire World Wide Web stored on the university’s servers at the time. “The development of the Google algorithms was carried on a variety of computers, mainly provided by the NSF-DARPA-NASA-funded Digital Library project at Stanford,” Stanford’s Infolab says of its origin, for example. NSF likewise references only the digital libraries grant, not the MDDS grant, in its own history of Google’s origin. In the famous research paper, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” which describes the creation of Google, Brin and Page thanked the NSF and DARPA for their digital library grant to Stanford. But the grant from the intelligence community’s MDDS program—specifically designed for the breakthrough that Google was built upon—has faded into obscurity.

    Google has said in the past that it was not funded or created by the CIA. For instance, when stories circulated in 2006 that Google had received funding from the intelligence community for years to assist in counter-terrorism efforts, the company told Wired magazine founder John Battelle, “The statements related to Google are completely untrue.”

    Did the CIA directly fund the work of Brin and Page, and therefore create Google? No. But were Brin and Page researching precisely what the NSA, the CIA, and the intelligence community hoped for, assisted by their grants? Absolutely.

    In this way, the collaboration between the intelligence community and big, commercial science and tech companies has been wildly successful. When national security agencies need to identify and track people and groups, they know where to turn – and do so frequently. That was the goal in the beginning. It has succeeded perhaps more than anyone could have imagined at the time.

  • Your Data is Being Manipulated – Data & Society : Points
    https://points.datasociety.net/your-data-is-being-manipulated-a7e31a83577b

    Fast forward to 2003, when the sitting Pennsylvania senator Rick Santorum publicly compared homosexuality to bestiality and pedophilia. Needless to say, the LGBT community was outraged. Journalist Dan Savage called on his readers to find a way to “memorialize the scandal.” One of his fans created a website to associate Santorum’s name with anal sex. To the senator’s horror, countless members of the public jumped in to link to that website in an effort to influence search engines. This form of crowdsourced SEO is commonly referred to as “Google bombing,” and it’s a form of media manipulation intended to mess with data and the information landscape.

    At this moment, AI is at the center of every business conversation. Companies, governments, and researchers are obsessed with data. Not surprisingly, so are adversarial actors. We are currently seeing an evolution in how data is being manipulated. If we believe that data can and should be used to inform people and fuel technology, we need to start building the infrastructure necessary to limit the corruption and abuse of that data — and grapple with how biased and problematic data might work its way into technology and, through that, into the foundations of our society.

    Like search engines, social media introduced a whole new target for manipulation. This attracted all sorts of people, from social media marketers to state actors. Messing with Twitter’s trending topics or Facebook’s news feed became a hobby for many. For $5, anyone could easily buy followers, likes, and comments on almost every major site. The economic and political incentives are obvious, but alongside these powerful actors, there are also a whole host of people with less-than-obvious intentions coordinating attacks on these systems.

    The goal with a story like that isn’t to convince journalists that it’s true, but to get them to foolishly use their amplification channels to negate it. This produces a “Boomerang effect,” whereby those who don’t trust the media believe that there must be merit to the conspiracy, prompting some to “self-investigate.”

    Consider, for example, the role of reddit and Twitter data as training data. Computer scientists have long pulled from the very generous APIs of these companies to train all sorts of models, trying to understand natural language, develop metadata around links, and track social patterns. They’ve trained models to detect depression, rank news, and engage in conversation. Ignoring the fact that this data is not representative in the first place, most engineers who use these APIs believe that it’s possible to clean the data and remove all problematic content. I can promise you it’s not.

    I’m watching countless actors experimenting with ways to mess with public data with an eye on major companies’ systems. They are trying to fly below the radar. If you don’t have a structure in place for strategically grappling with how those with an agenda might try to route around your best laid plans, you’re vulnerable. This isn’t about accidental or natural content. It’s not even about culturally biased data. This is about strategically gamified content injected into systems by people who are trying to guess what you’ll do.

    If you are building data-driven systems, you need to start thinking about how that data can be corrupted, by whom, and for what purpose.

    The article is so interesting that one has to be careful not to copy it here in its entirety ;-)

    #danah_boyd #Machine_learning #médias_sociaux #data #fake_news

  • Algoliterary Encounter
    http://constantvzw.org/site/Algoliterary-Encounter.html

    In the framework of Saison Numérique the Maison du Livre opens its space for #Algolit for three days in a row. The group presents lectures, workshops and a small #Exhibition about the narrative perspective of neural networks. Neural networks are self-learning algorithms based on statistics. They often function as opaque 'black box' algorithms, even as they shape applications used daily on a worldwide scale, like search engines on the web, translation machines, advertising profiling, (...)

    Algolit / #Workshop, #Lecture, Exhibition, #Hybrid_languages, #Literature, #Algorithm

  • About us - knoema.com

    http://knoema.fr

    I'm not entirely convinced by the visual effectiveness of some of the infographics on offer, but the site and its topics are interesting. One to follow.

    Knoema’s search engine, unlike search engines such as Lucene or ElasticSearch, is designed for data, with the specifics of data in mind. The biggest challenge for traditional engines dealing with data is that they were designed to handle vast amounts of text and the contextual information that comes with it.

    Data, by contrast, consists mostly of numbers, with very little metadata available. Knoema solved that problem and is powerful enough to search through billions of time series in a fraction of a second and deliver highly relevant results.
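    The contrast with text search can be illustrated with a toy inverted index over time-series metadata. This is purely illustrative and has nothing to do with Knoema's actual engine; all series names, fields, and IDs below are invented for the example.

```python
from collections import defaultdict

# A text engine indexes documents by their words; a data engine must index
# time series by their sparse metadata (name, region, unit), because the
# values themselves are just numbers. All names here are made up.
series = {
    "gdp-fr": {"name": "GDP", "region": "France", "unit": "USD", "values": [2.6, 2.7]},
    "gdp-de": {"name": "GDP", "region": "Germany", "unit": "USD", "values": [3.8, 4.0]},
    "pop-fr": {"name": "Population", "region": "France", "unit": "persons", "values": [67.0, 67.8]},
}

# Inverted index: lowercased metadata token -> set of series IDs.
index = defaultdict(set)
for sid, meta in series.items():
    for field in ("name", "region", "unit"):
        index[meta[field].lower()].add(sid)

def search(*terms):
    """Return IDs of series whose metadata matches every term (AND query)."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []

print(search("gdp", "france"))  # → ['gdp-fr']
```

    Because the index covers only a handful of metadata tokens per series rather than full text, the same structure scales to very large collections; relevance ranking is of course the hard part this sketch omits.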

    Over the years Knoema implemented hundreds of data portals for its customers. Every data portal provides its own, unique collection of data and insights with a set of data-driven tools for its users. Those data portals may vary from open portals accessible to everyone on the Internet (such as Open Data for Africa), to private portals for companies with a mix of private and public data available to their users.

    In the process of implementing solutions for its customers, Knoema created the largest collection of public data and statistics on the Internet, featuring about 2.5 billion time series (as of February 2017) from thousands of sources. This collection of data grows every day and is maintained by our highly experienced data team, which has unique expertise in dealing with data from a diverse set of sources around the world.

    #infographie #visualisation #cartographie #statistiques

    • The collection and aggregation of data from so many sources is impressive. Building on that, the interactive dashboards are fairly standard, I'd say. Free access to their database is a nice touch, nice but limited: no export other than as images, no copy-and-paste, and so on.

      As for the visualizations, those in the dashboards are conventional, and nearly all of those highlighted in the topic descriptions share the same flaw of cramming in far too much. On the one you picked, there are three axes of analysis (year, vehicle type, country) for the chosen variable, which is at least one too many, short of designing a purpose-built super chart. And honestly, checkerboards as sub-criteria within the stacked bars (since vehicle type already provides the first stacking criterion) really don't work at all…

      The same goes for this one…


      Better to show the interactive table instead (with sorting on each variable, for example).

  • Publishing with Apache Kafka at The New York Times
    https://www.confluent.io/blog/publishing-apache-kafka-new-york-times

    At The New York Times we have a number of different systems that are used for producing content. We have several Content Management Systems, and we use third-party data and wire stories. Furthermore, given 161 years of journalism and 21 years of publishing content online, we have huge archives of content that still need to be available online, that need to be searchable, and that generally need to be available to different services and applications.

    These are all sources of what we call published content. This is content that has been written, edited, and that is considered ready for public consumption.

    On the other side we have a wide range of services and applications that need access to this published content — there are search engines, personalization services, feed generators, as well as all the different front-end applications, like the website and the native apps. Whenever an asset is published, it should be made available to all these systems with very low latency — this is news, after all — and without data loss.

    This article describes a new approach we developed for solving this problem, based on a log-based architecture powered by Apache Kafka™. We call it the Publishing Pipeline. The focus of the article will be on back-end systems. Specifically, we will cover how Kafka is used for storing all the articles ever published by The New York Times, and how Kafka and the Streams API are used to feed published content in real time to the various applications and systems that make it available to our readers. The new architecture is summarized in the diagram below, and we will deep-dive into the architecture in the remainder of this article.
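    The core idea, rebuilding state by replaying an append-only log with last-write-wins semantics, can be sketched without a running Kafka cluster. This is a minimal illustration of log-compaction semantics under that assumption, not the Times' actual pipeline code; the asset IDs and helper names are invented.

```python
from collections import OrderedDict

log = []  # append-only log of (asset_id, content) records, like a Kafka topic

def publish(asset_id, content):
    """Append a new or updated asset to the log."""
    log.append((asset_id, content))

def replay(records):
    """Rebuild current state from the log: the last write per key wins,
    which is what a log-compacted Kafka topic ultimately retains."""
    state = OrderedDict()
    for asset_id, content in records:
        state[asset_id] = content
    return state

publish("article-1", "draft headline")
publish("article-2", "second story")
publish("article-1", "corrected headline")  # a later edit supersedes the draft

print(replay(log)["article-1"])  # → corrected headline
```

    Any consumer (a search indexer, a feed generator, a front end) can then build its own view simply by replaying the log from offset zero, which is what makes a log a natural fit for serving both a 161-year archive and breaking news from the same source of truth.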

  • If SoundCloud Disappears, What Happens to Its Music Culture? - The New York Times
    https://www.nytimes.com/2017/08/01/magazine/if-soundcloud-disappears-what-happens-to-its-music-culture.html?_r=0

    After the layoffs, the technology blog TechCrunch published a report claiming that SoundCloud had enough money to finance itself for only 80 days. Though the company disputed the report, the possibility that SoundCloud might disappear sent a shock through the web. Data hoarders began trying to download the bulk of the service’s public archive in order to preserve it. Musicians like deadmau5, a Canadian electronic-music producer, tossed out suggestions on Twitter for how the company could save the service. Chance the Rapper tweeted: ‘‘I’m working on the SoundCloud thing.’’

    Since its start in 2008, SoundCloud has been a digital space for diverse music cultures to flourish, far beyond the influence of mainstream label trends. For lesser-known artists, it has been a place where you can attract the attention of fans and the record industry without having to work the usual channels. There is now a huge roster of successful artists who first emerged on SoundCloud, including the R.&B. singer Kehlani, the electronic musician Ta-Ha, the pop musician Dylan Brady and the rapper Lil Yachty, to name just a few.

    The death of SoundCloud, then, would mean more than the sunsetting of a service: It could mean the erasure of a decade of internet sound culture, says Jace Clayton, a musician and the author of ‘‘Uproot: Travels in 21st-Century Music and Digital Culture.’’ He reminded me of an online music service called imeem, which MySpace bought in 2009 in the hope of absorbing its 16 million users into its own platform. But the struggling service shut down, and all the music uploaded and shared to it was lost, including what Clayton recalls to be a very eclectic subset of black Chicago house music. ‘‘What does it mean if someone can delete hundreds and thousands of hours of sound culture overnight?’’ he asked.

    SoundCloud always let me get lost in a warren of music that I’d never heard — or even heard of — before. Once, it was Japanese trap songs. Another time it was Ethiopian jazz music. It somehow manages to evoke some of the most appealing features of offline music culture, like browsing through bins in a record store or catching indie acts at an underground club.

    SoundCloud took a community-first approach to building its business, prioritizing finding artists to post on its service over making deals with music labels to license their music, the approach taken by Spotify. The music industry was still in the process of adapting to a digital ecosystem when SoundCloud emerged; illegal file-sharing was rampant. But when the industry finally began squelching unauthorized distribution of artists’ tracks, SoundCloud was hit hard. D.J.s were also told to take down mixes of songs they didn’t own the rights to, and many of the remixes the site was known for were removed. SoundCloud ‘‘was very much built in the dot-com-era mentality of building an audience and then finding a way to make money,’’ Mark Mulligan, a music-industry analyst, told me. SoundCloud struggled to monetize the service. Artists who paid to be featured on the site balked at having ads run against their music, and when the company introduced its own version of a subscription service, called Go, the response was tepid. How do you persuade people who have been using your services free to start paying $5 or $10 a month?

    For the most part, streaming services feel sterile and devoid of community. Spotify, Tidal and even YouTube to a degree are vast and rich troves of music, but they primarily function as search engines organized by algorithms. You typically have to know what you’re looking for in order to find it. They have tried to remedy that drawback with customized playlists, but still they feel devoid of a human touch. Serendipity is rare.

    By contrast, the most successful online communities, like SoundCloud, have the feel of public spaces, where everyone can contribute to the culture. They feel as if they belong to the community that sustains them. But of course that’s not how it works. In ‘‘Who Owns Culture?,’’ Susan Scafidi writes: ‘‘Community-generated art forms have tremendous economic and social value — yet most source communities have little control over them.’’

    #Musique #SoundCloud #Streaming #Culture_Participative

  • The Geopolitical Economy of the Global Internet Infrastructure on JSTOR
    https://www.jstor.org/stable/10.5325/jinfopoli.7.2017.0228

    A very interesting article that puts states back into the picture in the management of the global internet infrastructure. In effect, a global infrastructure for the deployment of capital (a different approach to geopolitics, drawn from David Harvey).

    According to many observers, economic globalization and the liberalization of telecoms/internet policy have remade the world in the image of the United States. The dominant roles of Amazon, Apple, Facebook, and Google have also led to charges of US internet imperialism. This article, however, argues that while these internet giants dominate some of the most popular internet services, the ownership and control of core elements of the internet infrastructure—submarine cables, internet exchange points, autonomous system numbers, datacenters, and so on—are tilting increasingly toward the EU and BRICS (i.e., Brazil, Russia, India, China, and South Africa) countries and the rest of the world, complicating views of hegemonic US control of the internet and what Susan Strange calls the knowledge structure.

    This article takes a different tack. It argues that while US-based internet giants do dominate some of the middle and top layers of the internet—for example, operating systems (iOS, Windows, Android), search engines (Google), social networks (Facebook), online retailing (Amazon), over-the-top TV (Netflix), browsers (Google Chrome, Apple Safari, Microsoft Internet Explorer), and domain names (ICANN)—they do not rule the hardware, or material infrastructure, upon which the internet and daily life, business, governments, society, and war increasingly depend. In fact, as the article shows, ownership and control of many core elements of the global internet infrastructure—for example, fiber optic submarine cables, content delivery networks (CDNs), autonomous system numbers (ASN), and internet exchange points (IXPs)—are tilting toward the rest of the world, especially Europe and the BRICS (i.e., Brazil, Russia, India, China, and South Africa). This reflects the fact that the United States’ standing in the world is slipping while an ever more multipolar world is arising.

    International internet backbone providers, internet content companies, and CDNs interconnect with local ISPs and at one or more of the nearly 2000 IXPs around the world. The largest IXPs are in New York, London, Amsterdam, Frankfurt, Seattle, Chicago, Moscow, Sao Paulo, Tokyo, and Hong Kong. They are core elements of the internet that switch traffic between all the various networks that comprise the internet system, and help to establish accessible, affordable, fast, and secure internet service.

    In developed markets, internet companies such as Google, Baidu, Facebook, Netflix, Youku, and Yandex use IXPs to interconnect with local ISPs such as Deutsche Telekom in Germany, BT or Virgin Media in Britain, or Comcast in the United States to gain last-mile access to their customers—and vice versa, back up the chain. Indeed, 99 percent of internet traffic handled by peering arrangements among such parties occurs without any money changing hands or a formal contract. Where IXPs do not exist or are rare, as in Africa, or run poorly, as in India, bandwidth is far more expensive. This is a key factor that helps to explain why internet service is so expensive in areas of the world that can least afford it. It is also why the OECD and EU encourage developing countries to make IXPs a cornerstone of economic development and telecoms policy work.

    The network of networks that make up the internet constitute a sprawling, general purpose platform upon which financial markets, business, and trade, as well as diplomacy, spying, national security, and war depend. The world’s largest electronic payments system operator, the Society for Worldwide Interbank Financial Telecommunications’ (SWIFT) secure messaging network carries over 25 million messages a day involving payments that are believed to be worth over $7 trillion USD. Likewise, the world’s biggest foreign currency settlement system, the CLS Bank, executes upward of a million trades a day worth between $1.5 and $2.5 trillion over the global cable systems—although that is down by half from its high point in 2008. As Stephen Malphrus, former chief of staff to the US Federal Reserve Chairman Ben Bernanke, observed, when “communications networks go down, the financial services sector does not grind to a halt, rather it snaps to a halt.”

    Governments and militaries also account for a significant portion of internet traffic. Indeed, 90 to 95 percent of US government traffic, including sensitive diplomatic and military orders, travels over privately owned cables to reach officials in the field. “A major portion of DoD data traveling on undersea cables is unmanned aerial vehicle video,” notes a study done for the Department of Homeland Security by MIT scholar Michael Sechrist. Indeed, the Department of Defense’s entire Global Information Grid shares space in these cables with the general public internet.

    The 3.6 billion people as of early 2016 who use the internet to communicate, share music, ideas and knowledge, browse, upload videos, tweet, blog, organize social events and political protests, watch pornography, read sacred texts, and sell stuff are having the greatest influence on the current phase of internet infrastructure development. Video currently makes up an estimated two-thirds of all internet traffic, and is expected to grow to 80 percent in the next five years, with US firms leading the way. Netflix single-handedly accounts for a third of all internet traffic. YouTube is the second largest source of internet traffic on fixed and mobile networks alike the world over. Altogether, the big five internet giants account for roughly half of all “prime-time” internet traffic, a phrasing that deliberately reflects the fact that internet usage swells and peaks at the same time as the classic prime-time television period, that is, 7 p.m. to 11 p.m.

    On the importance of internet companies' investments in cable projects.

    Several things stand out from this analysis. First, in less than a decade, Google has carved out a very large place for itself through its ownership role in four of the six projects (the SJC, Faster, Unity, and Pacific Cable Light initiatives), while Facebook has stakes in two of them (APG and PLCN) and Microsoft in the PLCN project. This is a relatively new trend and one that should be watched in the years ahead.

    A preliminary view based on the publicly available information is that the US internet companies are important but subordinate players in consortia dominated by state-owned national carriers and a few relatively new competitors. Keen to wrest control of core elements of the internet infrastructure that they perceive to have been excessively dominated by United States interests in the past, Asian governments and private investors have joined forces to change things in their favor. In terms of the geopolitical economy of the internet, there is both a shift toward the Asia-Pacific region and an increased role for national governments.

    Return of the State as Regulator of Concentrated Markets

    In addition to the expanded role of the state as market builder, regulator, and information infrastructure policy maker, many regulators have also rediscovered the reality of significant market concentration in the telecom-internet and media industries. Indeed, the US government has rejected several high-profile telecoms mergers in recent years, such as AT&T’s proposal to take over T-Mobile in 2011, T-Mobile’s bid for Sprint in 2014, and Comcast’s attempt to acquire Time Warner Cable last year. Even the approval of Comcast’s blockbuster takeover of NBC Universal in 2011 and Charter Communications’ acquisition of Time Warner Cable last year came with important strings attached and ongoing conduct regulation designed to constrain the companies’ ability to abuse their dominant market power. The FCC’s landmark 2016 ruling to reclassify broadband internet access as a common carrier further indicated that US regulators have been alert to the realities of market concentration and telecoms-internet access providers’ capacity to abuse that power, and to the need to maintain a vigilant eye to ensure that their practices do not swamp people’s rights to express themselves freely, to maintain control over the collection, retention, use, and disclosure of their personal information, and to access a diverse range of services over the internet. The 28 members of the European Union, along with Norway, India, and Chile, have adopted similar “common carriage/network neutrality/open network” rules to offset the reality that concentration in core elements of these industries is “astonishingly high” on the basis of commonly used indicators (e.g., concentration ratios and the Herfindahl–Hirschman Index).

    These developments indicate a new phase in internet governance and control. In the first phase, circa the 1990s, technical experts and organizations such as the Internet Engineers Task Force played a large role, while the state sat relatively passively on the sidelines. In the second phase, circa the early to mid-2000s, commercial forces surged to the fore, while internet governance revolved around the ICANN and the multi-stakeholder model. Finally, the revelations of mass internet surveillance by many states and ongoing disputes over the multi-stakeholder, “internet freedom” agenda on the one side, versus the national sovereignty, multilateral model where the ITU and UN system would play a larger role in internet governance all indicate that significant moves are afoot where the relationship between states and markets is now in a heightened state of flux.

    Such claims, however, are overdrawn. They rely too heavily on the same old “realist,” “struggle for control” model where conflict between nation-states has loomed large and business interests and communication technologies served mainly as “weapons of politics” and the handmaidens of national interests from the telegraph in the nineteenth century to the internet today. Yet, nation-states and private business interests, then and now, not only compete with one another but also cooperate extensively to cultivate a common global space of economic accumulation. Communication technologies and business interests, moreover, often act independent of the nation-state and via “private structures of cooperation,” that is, cartels and consortia, as the history and contemporary state of the undersea cable networks illustrate. In fact, the internet infrastructure of the twenty-first century, much like that of the industrial information infrastructure of the past 150 years, is still primarily financed, owned, and operated by many multinational consortia, although more than a few submarine communications cables are now owned by a relatively new roster of competitive players, such as Tata, Level 3, Global Cloud Xchange, and so forth. They have arisen mostly in the last 20 years and from new quarters, such as India in the case of Tata, for example.

    #Economie_numérique #Géopolitique #Câbles_sous_marins

  • EU copyright reform is coming. Is your startup ready?
    https://medium.com/silicon-allee/eu-copyright-reform-is-coming-is-your-startup-ready-4be81a5fabf7?source=user

    Last Friday, members of Berlin’s startup community gathered at Silicon Allee for a copyright policy roundtable discussion hosted by Allied for Startups. The event sparked debate and elicited feedback surrounding the European Commission’s complex drafted legislation that would have significant impact on startups in the EU. Our Editor-in-Chief, Julia Neuman, gives you the rundown here — along with all the details you should know about the proposed reform.

    ‘Disruption’ in the startup world isn’t always a good thing — especially when it involves challenging legislation. Over the past five years, as big data and user-generated content began to play an increasing role in our society, startups have worked tirelessly to navigate laws regarding privacy and security in order to go about business as usual. Now, they may soon be adding copyright concerns to their list of potential roadblocks.

    The forthcoming copyright reform proposed by the European Commission severely threatens the success and momentum that startups have gained in the EU, and it’s being introduced under the guise of “a more modern, more European copyright framework.”

    On September 14, 2016, the European Commission tabled its Proposal for a Directive on Copyright in the Digital Single Market (commonly referred to as the “Copyright Directive”) — a piece of draft legislation that would have significant impact on a wide variety of modern copyrighted content. Consequently, it poses a direct threat to startups.

    Members of the startup community are now coming together, unwilling to accept these measures without a fight. On Friday, members of Allied for Startups and Silicon Allee — alongside copyright experts and Berlin-based entrepreneurs and investors — met at Silicon Allee’s new campus in Mitte for a policy roundtable discussion. Additional workshop discussions are taking place this week in Warsaw, Madrid and Paris. The ultimate goal? To get startups’ voices heard in front of policymakers and counter this legislation.
    Sparking conversation at Silicon Allee

    Bird & Bird Copyright Lawyer and IP Professor Martin Senftleben led the roundtable discussions in Berlin, outlining key clauses and offering clarifying commentary. He then invited conversation from guests — which included representatives from content-rich startups such as Fanmiles, Videopath, and Ubermetrics. The result was a well-balanced input of perspectives and testimonials that sparked an increased desire to fight back. The roundtable covered the three main areas affected by the proposed reforms: user-generated content, text and data mining, and the neighboring right for press publishers.
    User-generated content

    The internet has allowed us all to become content creators with an equal opportunity to make our voices heard around the world. With this transition comes evolving personal responsibilities. Whereas in the past, copyright law only concerned a small percentage of society — today it concerns anyone posting to social media, uploading unique content, or founding a company that relies on user-generated content as part of its business model.

    The proposed EU copyright reform shifts copyright burden to content providers, making them liable for user content and forcing them to apply content filtering technology to their platforms. As it stands now, management of copyright infringement is a passive process. Companies are not required to monitor or police user-generated content, instead waiting for infringement notices to initiate relevant takedowns.

    New laws imply that companies would have to constantly police their platforms. As you can imagine, this would quickly rack up operating costs — not to mention deter investors from committing if there’s such an inherently persistent and high legal risk of copyright infringement. Furthermore, filtering technology would not exactly promote the public interest or media plurality, as an efficiency-based filtering system would be more likely to result in overblocking and censoring (even if unintentional). This result runs counter to the expressed aims of the reform.
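    To see why filtering tends toward overblocking, consider a deliberately naive fingerprint filter (purely illustrative, not any vendor's actual technology, with made-up content): exact hashing misses every trivially edited copy, which pushes real systems toward fuzzy matching, and fuzzy matching is exactly where legitimate quotation, parody, and remix get swept up.

```python
import hashlib

# Blocklist of exact SHA-256 fingerprints of protected files (made-up content).
BLOCKLIST = {hashlib.sha256(b"protected-track").hexdigest()}

def is_blocked(upload: bytes) -> bool:
    """Reject an upload whose exact fingerprint is on the blocklist."""
    return hashlib.sha256(upload).hexdigest() in BLOCKLIST

print(is_blocked(b"protected-track"))        # → True: an exact copy is caught
print(is_blocked(b"protected-track-remix"))  # → False: any edit evades the filter
```

    A platform that must guarantee no infringing content slips through cannot stop at exact matches like these, so it errs on the side of blocking near-matches too, which is the censorship-by-efficiency concern raised at the roundtable.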

    “Having this necessity to add filtering technology from the start would kill any innovation for new startups, which is the reason why we’re all here and this economy is booming and creating jobs,” said Fabian Schmidt, Founder of Fanmiles. “The small companies suddenly cannot innovate and compete anymore.”

    Text and data mining

    The proposed reform also blocks startups from using text and data mining technology, preventing the rich data analysis that has added value and yielded deeper insights for growing startups. Copyright law today accounts for lawful access and consultation, but not for the automated process of reading texts and drawing conclusions from them. The scraping and mining of freely available texts could give rise to complex, costly legal problems from the get-go — problems that not even the most prudent founding teams could navigate (unless they work to the benefit of research institutions, which are exempt from the measure).

    What kind of message does this send out to new startups? As with laws dealing with user-generated content, these measures don’t entice entrepreneurs to turn their seeds of ideas into profitable companies. Nor do they get VCs jumping to invest. Data input from mining and scraping suddenly gives rise to a huge legal issue that certainly does not benefit the public interest.

    Senftleben reminded the group in Berlin that these types of legislation normally take several years to implement, and that the proposed policy could have amplified effects down the road as the role of data mining increases. “If this legislation is already limiting now, who knows what kind of text and data mining will be used in ten years and how it will play in,” he said.
    Neighboring right for press publishers

    The third and final point discussed at the roundtable has gathered the most media attention thus far. It’s the “elephant in the room,” unjustly pitting established publishers against startups. The proposed legislation creates an exclusive right for publishers that protects their content for digital use in order “to ensure quality journalism and citizens’ access to information.”

    Sure, this reasoning sounds like a positive contribution to a free and democratic society. But closer examination reveals that these publishers’ outdated and financially unviable business models are being grandfathered in for protection at the expense of more innovative content models.

    It’s not hard to see why this is happening. Publishers have lobbying power, and they are bleeding money in today’s digital climate. “I work a lot with publishers. Their position here in Europe is a little more old school,” said one of the founders present at the discussion. “Their business model and revenues are going down, so they’re going to fight hard.”

    Axel Springer, for example, is lobbying for greater protection; they want a piece of Google’s success. But the most interesting aspect of this measure is that it’s unclear how much value it would add for publishers, who already have rights to digital reproduction from the individual content creators employed under contract with their firms. A freelance journalist contributing to Die Zeit, for example, is already transferring digital reproduction rights to the newspaper just by agreeing to publish.

    The drafted legislation makes it pretty clear that content-aggregating search engines would take a big hit, since they would inevitably have to pay content reproduction fees to publishers. But the interdependent relationship between publishers and online search aggregation services makes this legislation unlikely to generate a meaningful revenue stream for publishers anyway: publishers want compensation for snippets of articles that show up on search engines, and search engines want compensation for bringing attention to those publishers in the first place. In the end, content aggregators would likely just stop using content fragments rather than pay license fees to publishers.

    It’s unclear how the proposed legislation could promote media plurality and freedom; instead, it seems to promote market concentration and monopolization of content publishing, potentially stifling free and open access to information.

    “I know two small aggregators here in Germany that have given up because of this,” said Tobias Schwarz, Coworking Manager at Sankt Oberholz in Berlin.

    What comes next? Turning discussion into action

    What is clear now is that copyright law has the potential to affect anyone. Startups in Europe, especially, are at risk from these new reforms. As players in the European economy, they have not been present in the policy debate so far. Allied for Startups and Silicon Allee are inviting founders, entrepreneurs, and interested members of the tech community to come forward and make their voices heard. They invite contributions to an open letter to the European Parliament which dives into this topic in more detail, explaining how toxic the Copyright Directive is for companies who are trying to stay alive without incurring €60 million in development costs.

    “A lot of startup leaders have their heads down working on their next feature, without realizing policymakers are also creating something that can instantly kill it,” said Silicon Allee co-founder Travis Todd. “But if more startups come to the table and tell others what they learned, they will become more aware of these potential roadblocks and ultimately help change them.”

    To find out more information, participate at the next discussion, or share your ideas and testimonials on this policy discussion, please get in touch! Drop a line to hello@alliedforstartups.org, tweet to @allied4startups, or join the online conversation using #copyright4startups.

  • About OldMapsOnline | OldMapsOnline

    http://www.oldmapsonline.org/about


    The search engine for historical maps

    OldMapsOnline developed out of a love of history and the heritage of old maps. The project began as a collaboration between Klokan Technologies GmbH, Switzerland and The Great Britain Historical GIS Project based at the University of Portsmouth, UK, thanks to funding from JISC. Since January 2013 the project has been improved and maintained by volunteers and the team of Klokan Technologies GmbH in their free time.
    You can contact the project team at info@oldmapsonline.org. For inquiries related to rights for a particular map, contact the relevant institution below!
    Technology

    The website has been created by Klokan Technologies GmbH, specialists in online map publishing and in applications of open-source software. It aims to demonstrate a combination of tools for publishing historical maps with a focus on their easy accessibility for the general public. The core of the retrieval system is the MapRank Search software, developed originally for the Swiss Kartenportal.CH project. The site is optimised for search engines with GeoSEO powered by Linked Data, and it also exposes Structured Data (Schema.org microdata). The technology and expertise from this project can be applied to other projects on request.
    Let’s work together
    We are keen to improve this project and to participate in new research and projects in the field of digital historical cartography. We are open to partnerships on national or European R&D projects. Don’t hesitate to contact us!

    #cartographie_historique #cartes_anciennes

  • Where do DOI clicks come from?
    https://www.crossref.org/blog/where-do-doi-clicks-come-from

    The top 10 referring domains for the period:

    webofknowledge.com
    baidu.com
    serialssolutions.com
    scopus.com
    exlibrisgroup.com
    wikipedia.org
    google.com
    uni-trier.de
    ebsco.com
    google.co.uk

    It’s not surprising to see some of these domains here: for example serialssolutions.com and exlibrisgroup.com are effectively proxies for link resolvers, Baidu and Google are incredibly popular search engines which would show up anywhere. But it is exciting to see Wikipedia ranked amongst these. For more detail look out for the new Chronograph.

    #DOI

  • Theresa May promises a British version of Iran’s Halal Internet / Boing Boing
    http://boingboing.net/2017/05/19/little-england-little-internet.html

    The government now appears to be launching a similarly radical change in the way that social networks and internet companies work. While much of the internet is currently controlled by private businesses like Google and Facebook, Theresa May intends to allow government to decide what is and isn’t published, the manifesto suggests.

    The new rules would include laws that make it harder than ever to access pornographic and other websites. The government will be able to place restrictions on seeing adult content and any exceptions would have to be justified to ministers, the manifesto suggests.

    The manifesto even suggests that the government might stop search engines like Google from directing people to pornographic websites. “We will put a responsibility on industry not to direct users – even unintentionally – to hate speech, pornography, or other sources of harm,” the Conservatives write.

    The laws would also force technology companies to delete anything that a person posted when they were under 18.

    But perhaps most unusually they would be forced to help controversial government schemes like its Prevent strategy, by promoting counter-extremist narratives.

    #Grande_Bretagne #Royaume-Uni #censure #internet