industryterm:natural language processing

  • WordNet — A Lexical Database for English

    « WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing. »

    https://wordnet.princeton.edu #knowledge #graph #thesaurus #text
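
    The synset structure described above can be pictured as a small graph. The following is a minimal sketch with hand-made toy data; the `Synset` class, the `SYNSETS` table and both helper functions are hypothetical names for illustration, not WordNet's actual data or API.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A toy synset: a set of synonymous lemmas plus links to broader concepts."""
    name: str
    lemmas: set
    hypernyms: list = field(default_factory=list)  # names of more general synsets


# Hand-made toy data (an assumption for illustration, not taken from WordNet).
SYNSETS = {
    "entity": Synset("entity", {"entity"}),
    "animal": Synset("animal", {"animal", "beast", "creature"}, ["entity"]),
    "dog": Synset("dog", {"dog", "domestic dog"}, ["animal"]),
    "cat": Synset("cat", {"cat", "true cat"}, ["animal"]),
}


def synonyms(word):
    """Return the other lemmas of every synset that contains `word`."""
    found = set()
    for synset in SYNSETS.values():
        if word in synset.lemmas:
            found |= synset.lemmas
    return found - {word}


def hypernym_path(name):
    """Follow the first hypernym link upward until a root synset is reached."""
    path = [name]
    while SYNSETS[path[-1]].hypernyms:
        path.append(SYNSETS[path[-1]].hypernyms[0])
    return path


print(synonyms("beast"))
print(hypernym_path("dog"))
```

    The real database has many relation types and many senses per word; this sketch only shows the idea of grouping lemmas into concepts and linking concepts to each other.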

  • XDL Framework: Delivering powerful Performance for Large-scale Deep Learning Applications
    https://hackernoon.com/xdl-framework-delivering-powerful-performance-for-large-scale-deep-learn

    The Alibaba tech team open sourced its self-developed deep learning framework that goes where others have failed. Deep learning AI technologies have brought remarkable breakthroughs to fields including speech recognition, computer vision, and natural language processing, with many of these developments benefiting from the prevalence of open source deep learning frameworks like TensorFlow, PyTorch, and MxNet. Nevertheless, efforts to bring deep learning to large-scale, industry-level scenarios like advertising, online recommendation, and search scenarios have largely failed due to the inadequacy of available frameworks. Whereas most open source frameworks are designed for low-dimensional, continuous data such as in images and speech, a majority of Internet applications deal with (...)

    #artificial-intelligence #data-analysis #machine-learning #deep-learning #hackernoon-top-story

  • BPEmb : Subword Embeddings in 275 Languages
    Benjamin Heinzerling and Michael Strube
    https://nlp.h-its.org/bpemb

    What is this?

    BPEmb is a collection of pre-trained subword #embeddings in 275 #languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing.
    tl;dr

    Subwords allow guessing the meaning of unknown / out-of-vocabulary words. E.g., the suffix -shire in Melfordshire indicates a location.
    Byte-Pair Encoding gives a subword segmentation that is often good enough, without requiring tokenization or morphological analysis. In this case the BPE segmentation might be something like melf ord shire.
    Pre-trained byte-pair embeddings work surprisingly well, while requiring no tokenization and being much smaller than alternatives: an 11 MB BPEmb English model matches the results of the 6 GB FastText model in our evaluation.

    #NLP
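
    To make the Byte-Pair Encoding idea above concrete, here is a minimal toy sketch of how merge rules can be learned and then applied to an unseen word. It is not BPEmb's code or API: the function names and the tiny corpus are assumptions for illustration, and the segmentation it produces will generally differ from the "melf ord shire" example, which comes from a model trained on Wikipedia.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of single characters, weighted by its count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split `word` into subwords by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus of place names, chosen so that "shire" and "ford" recur.
corpus = ["yorkshire", "cheshire", "hampshire", "oxford", "bedford", "stafford"]
merges = learn_bpe(corpus, 20)
print(segment("melfordshire", merges))
```

    Because segmentation only replays character merges, any word can be split into known pieces, which is why no tokenizer or morphological analyzer is needed.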

  • What Kagglers are using for Text Classification
    https://hackernoon.com/what-kagglers-are-using-for-text-classification-c695b58b5709?source=rss-

    Advanced NLP techniques for deep learning. With the problem of image classification more or less solved by deep learning, text classification is the next developing theme in deep learning. For those who don’t know, text classification is a common task in natural language processing, which transforms a text sequence of indefinite length into a category. How could you use that? To find the sentiment of a review; to find toxic comments on a platform like Facebook; to find insincere questions on Quora (a current ongoing competition on Kaggle); to find fake reviews on websites; to predict whether a text advert will get clicked; and much more. The whole internet is filled with text, and categorizing that information algorithmically will only give us incremental benefits, to say the least, in the field of AI. Here (...)

    #machine-learning #artificial-intelligence #ai #data-science #deep-learning

  • Text summarizer using deep learning made easy
    https://hackernoon.com/text-summarizer-using-deep-learning-made-easy-490880df6cd?source=rss----

    In this series we will discuss a truly exciting natural language processing topic: using deep learning techniques to summarize text. The code for this series is open source and comes in Jupyter notebook format, so it can run on Google Colab without a powerful GPU. All the data is open source too, and you don’t have to download it: you can connect Google Colab with Google Drive and put your data directly onto Google Drive, without downloading it locally (read this blog to learn more about Google Colab with Google Drive). To summarize text you have 2 main approaches (I truly like how it is explained in this blog). The extractive method, which chooses specific main words from the input to generate the output; this model tends to work, but (...)

    #machine-learning #seq2seq #text-summarization #artificial-intelligence #ai
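
    As a rough illustration of the extractive approach mentioned above, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top ones. It is a simple frequency heuristic, not the deep learning model from the linked series, and it selects whole sentences rather than individual words; the function name and the sample text are assumptions for illustration.

```python
import re
from collections import Counter


def summarize(text, n_sentences=1):
    """Return the `n_sentences` highest-scoring sentences, in their original order."""
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text (no stop-word removal in this sketch).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)


sample = "Wine is red. Wine is good wine. Cheese."
print(summarize(sample, 1))
```

    An abstractive model would instead generate new wording; the extractive sketch can only reuse sentences that already appear in the input.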

  • Summarization With Wine Reviews Using #spacy
    https://hackernoon.com/summarization-with-wine-reviews-using-spacy-b49f18399577?source=rss----3

    “You don’t need a silver fork to eat good food.”

    Introduction: Wine Reviews. In this article, I will explore the Wine Reviews dataset, which contains 130k reviews. At the end of the article, I will try to build a simple text summarizer that summarizes given reviews. The summaries can also be used as review titles. I will use spaCy as the natural language processing library for this project.

    Objective of this project: the objective is to build a model that can create relevant summaries for the reviews in the Wine Reviews dataset. This dataset contains more than 130k reviews and is hosted on Kaggle.

    What is text summarization? Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged (...)

    #nltk #machine-learning #nlp #hackernoon

  • Text Summarization Using #keras Models
    https://hackernoon.com/text-summarization-using-keras-models-366b002408d9?source=rss----3a8144e

    Learn how to summarize text in this article by Rajdeep Dua, who currently leads the developer relations team at Salesforce India, and Manpreet Singh Ghotra, who is currently working at Salesforce developing a machine learning platform/APIs. Text summarization is a method in natural language processing (NLP) for generating a short and precise summary of a reference document. Producing a summary of a large document manually is a very difficult task. Summarization of a text using machine learning techniques is still an active research topic. Before proceeding to discuss text summarization and how we do it, here is a definition of summary. A summary is a text output that is generated from one or more texts that conveys relevant information from the original text in a shorter form. The goal of (...)

    #artificial-intelligence #machine-learning #keras-models #deep-learning

  • Amazon, AI and Medical Records: Do the Benefits Outweigh the Risks? - Knowledge Wharton
    http://knowledge.wharton.upenn.edu/article/amazon-medical-records

    Last month, Amazon unveiled a service based on AI and machine-learning technology that could comb through patient medical records and extract valuable insights. It was seen as a game changer that could alleviate the administrative burden of doctors, introduce new treatments, empower patients and potentially lower health care costs. But it also carries risks to patient data privacy that call for appropriate regulation, according to Wharton and other experts.

    Branded Comprehend Medical, the Amazon Web Services offering aims “to understand and analyze the information that is often trapped in free-form, unstructured medical text, such as hospital admission notes or patient medical histories.” Essentially, it is a natural language processing service that pores through medical text for insights into disease conditions, medications and treatment outcomes from patient notes and other electronic health records.

    The new service is Amazon’s latest foray into the health care sector. In June, the company paid $1 billion to buy online pharmacy PillPack, a Boston-based startup that specializes in packing monthly supplies of medicines to chronically ill patients. In January, Amazon teamed up with Berkshire Hathaway and JPMorgan Chase to form a health care alliance that aims to lower costs and improve the quality of medical care for their employees.

    “Health care, like everything else, is becoming more of an information-based industry, and data is the gold standard — and Amazon knows as well as anyone how to handle and analyze data,” said Robert Field, Wharton lecturer in health care management who is also professor of health management and policy at Drexel University. “It’s a $3.5 trillion industry and 18% of our economy, so who wouldn’t want a piece of that?”

    AI offers “enormous” promise when it comes to bringing in new and improved treatments for patient conditions, such as in the area of radiology, added Hempstead. Machine learning also potentially enables the continual improvement of treatment models, such as identifying people who could participate in clinical trials. Moreover, Amazon’s service could “empower a consumer to be more in charge of their own health and maybe be more active consumer of medical services that might be beneficial to their health,” she said.

    On the flip side, it also could enable insurers to refuse to enroll patients that they might see as too risky, Hempstead said. Insurers are already accessing medical data and using technology in pricing their products for specific markets, and the Amazon service might make it easier for them to have access to such data, she noted.

    #Santé_publique #Données_médicales #Amazon #Intelligence_artificielle

  • The Biggest Misconceptions about Artificial Intelligence
    http://knowledge.wharton.upenn.edu/article/whats-behind-the-hype-about-artificial-intelligence-separat

    Knowledge@Wharton: Interest in artificial intelligence has picked up dramatically in recent times. What is driving this hype? What are some of the biggest prevailing misconceptions about AI and how would you separate the hype from reality?

    Apoorv Saxena: There are multiple factors driving strong interest in AI recently. First is significant gains in dealing with long-standing problems in AI. These are mostly problems of image and speech understanding. For example, now computers are able to transcribe human speech better than humans. Understanding speech has been worked on for almost 20 to 30 years, and only recently have we seen significant gains in that area. The same thing is true of image understanding, and also of specific parts of human language understanding such as translation.

    Such progress has been made possible by applying an old technique called deep learning and running it on highly distributed and scalable computing infrastructure. This combined with availability of large amounts of data to train these algorithms and easy-to-use tools to build AI models, are the major factors driving interest in AI.

    It is natural for people to project the recent successes in specific domains into the future. Some are even projecting the present into domains where deep learning has not been very effective, and that creates a lot of misconception and also hype. AI is still pretty bad in how it learns new concepts and extending that learning to new contexts.

    For example, AI systems still require a tremendous amount of data to train. Humans do not need to look at 40,000 images of cats to identify a cat. A human child can look at two cats and figure out what a cat and a dog is — and to distinguish between them. So today’s AI systems are nowhere close to replicating how the human mind learns. That will be a challenge for the foreseeable future.

    While everything else is clean, the last sentence is striking: « That will be a challenge for the foreseeable future ». The point is not to give up on having computers understand and create concepts, but to allow time to do it tomorrow. In World Without Mind, Franklin Foer writes at length about the determination of Google’s leaders to build a computer that would be an improved human brain. But what about emotions, feelings, and the physical relationship to the world?

    As I mentioned, in narrow domains such as speech recognition AI is now more sophisticated than the best humans, while in more general domains that require reasoning, context understanding and goal seeking, AI can’t even compete with a five-year-old child. I think AI systems have still not figured out how to do unsupervised learning well, or learned how to train on a very limited amount of data, or train without a lot of human intervention. That is going to be the main thing that continues to remain difficult. None of the recent research has shown a lot of progress here.

    Knowledge@Wharton: In addition to machine learning, you also referred a couple of times to deep learning. For many of our readers who are not experts in AI, could you explain how deep learning differs from machine learning? What are some of the biggest breakthroughs in deep learning?

    Saxena: Machine learning is much broader than deep learning. Machine learning is essentially a computer learning patterns from data and using the learned patterns to make predictions on new data. Deep learning is a specific machine learning technique.

    Deep learning is modeled on how human brains supposedly learn and uses neural networks, layered networks of neurons, to learn patterns from data and make predictions. So just as humans use different levels of conceptualization to understand a complex problem, each layer of neurons abstracts out a specific feature or concept in a hierarchical way to understand complex patterns. And the beauty of deep learning is that unlike other machine learning techniques whose prediction performance plateaus when you feed in more training data, deep learning performance continues to improve with more data. Also deep learning has been applied to solve very different sets of problems and shown good performance, which is typically not possible with other techniques. All this makes deep learning special, especially for problems where you could throw in more data and computing power easily.

    Knowledge@Wharton: The other area of AI that gets a lot of attention is natural language processing, often involving intelligent assistants, like Siri from Apple, Alexa from Amazon, or Cortana from Microsoft. How are chatbots evolving, and what is the future of the chatbot?

    Saxena: This is a huge area of investment for all of the big players, as you mentioned. This is generating a lot of interest, for two reasons. It is the most natural way for people to interact with machines, by just talking to them and the machines understanding. This has led to a fundamental shift in how computers and humans interact. Almost everybody believes this will be the next big thing.

    Still, early versions of this technology have been very disappointing. The reason is that natural language understanding or processing is extremely tough. You can’t use just one technique or deep learning model, for example, as you can for image understanding or speech understanding and solve everything. Natural language understanding inherently is different. Understanding natural language or conversation requires huge amounts of human knowledge and background knowledge. Because there’s so much context associated with language, unless you teach your agent all of the human knowledge, it falls short in understanding even basic stuff.

    On competition in the age of vectorialism:

    Knowledge@Wharton: That sounds incredible. Now, a number of big companies are active in AI — especially Google, Microsoft, Amazon, Apple in the U.S., or in China you have Baidu, Alibaba and Tencent. What opportunities exist in AI for startups and smaller companies? How can they add value? How do you see them fitting into the broader AI ecosystem?

    Saxena: I see value for both big and small companies. A lot of the investments by the big players in this space are in building platforms where others can build AI applications. Almost every player in the AI space, including Google, has created platforms on which others can build applications. This is similar to what they did for Android or mobile platforms. Once the platform is built, others can build applications. So clearly that is where the focus is. Clearly there is a big opportunity for startups to build applications using some of the open source tools created by these big players.

    The second area where startups will continue to play is with what we call vertical domains. So a big part of the advances in AI will come through a combination of good algorithms with proprietary data. Even though the Googles of the world and other big players have some of the best engineering talent and also the algorithms, they don’t have data. So for example, a company that has proprietary health care data can build a health care AI startup and compete with the big players. The same thing is true of industries such as finance or retail.

    #Intelligence_artificielle #vectorialisme #deep_learning #Google

  • [1710.10777] Understanding Hidden Memories of Recurrent Neural Networks

    https://arxiv.org/abs/1710.10777

    Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their effectiveness limits further improvements on their architectures. In this paper, we present a visual analytics method for understanding and comparing RNN models for NLP tasks. We propose a technique to explain the function of individual hidden state units based on their expected response to input texts. We then co-cluster hidden state units and words based on the expected response and visualize co-clustering results as memory chips and word clouds to provide more structured knowledge on RNNs’ hidden states. We also propose a glyph-based sequence visualization based on aggregate information to analyze the behavior of an RNN’s hidden state at the sentence-level. The usability and effectiveness of our method are demonstrated through case studies and reviews from domain experts.

    #langues #langage #mots #terminologie #grammaire

  • CASM : Centre for the Analytics of Social media (UK)

    http://www.demos.co.uk/projects/casm

    It produces new political, social and policy insight and understanding through social media research which combines new technologies with the social sciences. Book: The Dark Net.

    Research areas: data dashboards (big data), software development for early detection of emerging events, e-health, security, ethics, privacy, digital democracy, public responses to announcements, speeches and events

    CASM, in collaboration with a wide network of experts and leaders in the field, combines natural language processing, machine learning, statistics, data visuals, grounded theory and ethnography in order to develop large scale social media analysis as a valid instrument of research that is ethical, reliable, and usable.

  • Embedly makes your content more engaging and easier to share | Embedly
    http://embed.ly

    Embed
    Get the world’s most powerful tool for embedding videos, photos, and rich media into websites.

    Extract
    Natural language processing and text analysis to retrieve elements and text from articles for smarter use in websites and apps.
    Use the elements (colors, text, keywords, and entities) that you want from articles. Discard the rest automatically.

    Display
    Easy image processing to resize and optimize images for better, faster display on websites and apps.
    Make the images you use look great, and display quickly, on any screen, every time.
    ...

    Card Generator
    Generate an interactive and responsive Card to post on your site.

    Embed Button
    Make your content easier to share and embed on other sites with the Embed Button.

    The Bookmarklet
    Quickly generate a Card from any web page with the bookmarklet.

    I wonder where oEmbed stands relative to a web service like this.