Welcome to Apache Avro!

http://avro.apache.org

  • The problem of managing schemas - O’Reilly Radar
    http://radar.oreilly.com/2014/11/the-problem-of-managing-schemas.html

    With #CSV and #JSON data, the data has a schema, but the schema isn’t stored with the data. For example, CSV files have columns, and those columns have meaning. They represent IDs, names, phone numbers, etc. Each of these columns also has a data type: they can represent integers, strings, or dates. There are also some constraints involved: you can dictate that some of those columns contain unique values or that others will never contain nulls. All this information exists in the heads of the people managing the data, but it doesn’t exist in the data itself.

    The people who work with the data don’t just know about the schema; they need to use this knowledge when processing and analyzing the data. So the schema we never admitted to having is now coded in Python and Pig, Java and R, and every other application or script written to access the #data.
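
    To make the point above concrete, here is a minimal Python sketch of a schema that lives only in code. The sample data, the column names (id, name, phone) and the helper load_contacts are hypothetical and not taken from the article; the CSV parser itself hands back nothing but strings, so every type and null rule has to be restated by whoever reads the file.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
RAW = "id,name,phone\n1,Ada,555-0100\n2,Grace,\n"


def load_contacts(text):
    """Parse CSV text, applying a schema that exists only in this code."""
    contacts = []
    for row in csv.DictReader(io.StringIO(text)):
        contacts.append({
            "id": int(row["id"]),           # we "know" id is an integer
            "name": row["name"],            # we "know" name is a string
            "phone": row["phone"] or None,  # we "know" empty means missing
        })
    return contacts


contacts = load_contacts(RAW)
```

    Every other script that touches the same file (in Pig, Java, R, ...) would need to repeat these same three assumptions.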

    Solution: Apache #Avro http://avro.apache.org
    cc @lazuly
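
    As a rough illustration of what Avro changes, the sketch below declares the same kind of record as an Avro schema, which is plain JSON. The record name Contact and its fields are hypothetical. The conforms function is a toy check written only for this example and is not part of any Avro library; in real use an Avro library does the validation and writes the schema into the header of the data file. Note that an Avro schema captures field names, types and nullability (a union with "null"), but not uniqueness constraints.

```python
import json

# An Avro record schema is ordinary JSON. Field names are illustrative.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Contact",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "phone", "type": ["null", "string"], "default": null}
  ]
}
"""
schema = json.loads(SCHEMA_JSON)

# Toy mapping from a few Avro primitive type names to Python types.
PY_TYPES = {"null": type(None), "long": int, "string": str}


def conforms(record, schema):
    """Toy check (not an Avro API): does each field match its declared type?"""
    for field in schema["fields"]:
        allowed = field["type"]
        if isinstance(allowed, str):
            allowed = [allowed]  # treat a single type as a one-element union
        if field["name"] not in record:
            return False
        value = record[field["name"]]
        if not any(isinstance(value, PY_TYPES[t]) for t in allowed):
            return False
    return True


ok = conforms({"id": 1, "name": "Ada", "phone": None}, schema)
bad = conforms({"id": "1", "name": "Ada", "phone": None}, schema)
```

    Here the types and the nullable phone field are declared once, next to the data, instead of being re-encoded in each consumer.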