notes on CIWS data feed

CIWS Live display

https://ciws.wx.ll.mit.edu/LiveDisplay

wendell
z...Cvb

Atom data feed

parsed item similar to beautifulsoup; "feedparser seems to be the most popular feed parser library"

doc: https://pythonhosted.org/feedparser/index.html

pypi: https://pypi.org/project/feedparser/

latest: Feb 24, 2020

Atoma: "Atom, RSS and JSON feed parser for Python 3". Atoma is another Python 3 library for feed parsing. Requests is an optional but useful library to install alongside it.

pypi: https://pypi.org/project/atoma/

github: https://github.com/NicolasLM/atoma

last update: Jul 7, 2019

https://www.geeks3d.com/hacklab/20190118/how-to-use-feedparser-and-atoma-to-read-rss-feeds-in-python-3/

Product description

issue tracker: https://cf-trac.llnl.gov/trac
last item dated: 4 years ago

General Cartographic Transformation Package (GCTP)
URL: http://gcmd.nasa.gov/records/USGS-GCTP.html

The GCTP library, named “geolib.a”, contains functions that perform forward and inverse mapping projection conversions.

asdi-db

Cisco AnyConnect Secure Mobility Client

-> connect.cssiinc.com

"Connected to ..."

Server: asdi-db.cssiinc.com
User: wturner@cssiinc.com
Pass: xxxxx

postgis on asdi

postgresql v10 on port 5433 still installed and running:

    "dbname='swim' user='cssiuser' host='asdi-db.cssiinc.com' " + \
    " port='5433' password='REALab'"

yep.

    "dbname='swim' user='postgres' host='asdi-db.cssiinc.com' " + \
    " port='5433' password='425thirdst'"

yep.

    #"dbname='cssitest' user='postgres' host='asdi-db.cssiinc.com' " + \
    #" port='5432' password='cssisuper'"

nope.
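
a minimal sketch of building these connection strings without hard-coding the password, assuming they are passed to psycopg2.connect. `build_dsn`, `connect_swim` and the `ASDI_DB_PASSWORD` environment variable are made-up names for illustration, not part of the existing scripts:

```python
import os


def build_dsn(dbname, user, host, port, password):
    """Build a libpq keyword/value connection string like the ones above."""
    parts = {
        "dbname": dbname,
        "user": user,
        "host": host,
        "port": str(port),
        "password": password,
    }

    # libpq wants single-quoted values with backslash and quote escaped
    def quote(value):
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return " ".join(f"{key}={quote(val)}" for key, val in parts.items())


def connect_swim():
    """Open a connection to the 'swim' database (hypothetical helper)."""
    import psycopg2  # imported here so build_dsn works without it

    # ASDI_DB_PASSWORD is an assumed environment variable, not from the notes
    dsn = build_dsn("swim", "cssiuser", "asdi-db.cssiinc.com", 5433,
                    os.environ["ASDI_DB_PASSWORD"])
    return psycopg2.connect(dsn)
```

keeping the password in the environment means the string can be pasted into notes or committed without the secret in it.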

Install of PostgreSQL 12.3, PostGIS 3.0 on asdi-db

TODO: open postgresql port 5432 to outside
done: turn off v10 on port 5433

-> Server Manager -> Local Server -> Services (middle window) -> "postgresql-x64-10 = PostgreSQL Server 10"

-> Stop Services

seems to have stopped it; need to:
1) migrate existing PostgreSQL v10 tables to v12 (however, nobody cares)
2) uninstall v10

install

will go for V12.3 (which is NOT compatible with MobilityDB, but neither is windows, so there.)

WTurner@ASDI-DB ~
$ mkdir ciws
$ cd ciws/
$ mkdir postgresql
$ cd postgresql/

~/ciws/postgresql

and use windows explorer to copy over installation exe

username: postgres
password: 425thirdst
port: 5432

test/sample spatial db: postgis_30_sample

$ psql -h localhost -U postgres
Password for user postgres:
Type "help" for help.

postgres=# \dt
Did not find any relations.
postgres=# \l
                                                     List of databases
       Name        |  Owner   | Encoding |          Collate           |           Ctype            |   Access privileges
-------------------+----------+----------+----------------------------+----------------------------+-----------------------
 postgis_30_sample | postgres | UTF8     | English_United States.1252 | English_United States.1252 |
 postgres          | postgres | UTF8     | English_United States.1252 | English_United States.1252 |
 template0         | postgres | UTF8     | English_United States.1252 | English_United States.1252 | =c/postgres          +
                   |          |          |                            |                            | postgres=CTc/postgres
 template1         | postgres | UTF8     | English_United States.1252 | English_United States.1252 | =c/postgres          +
                   |          |          |                            |                            | postgres=CTc/postgres
(4 rows)

admin

user        pw           privs
postgres    425thirdst   superuser
ciwsuser    weather      all for ciwsdb
wxreader    readwx       role==readonly

role       privs
readonly   read only

first, create the db:

    psql -h localhost -p 5432 -U postgres
    <admin pw>

    postgres=# create database ciwsdb;
    CREATE DATABASE

login and add extension:

    $ psql -h localhost -p 5432 -U postgres ciwsdb
    Password for user postgres:
    psql (12.2, server 12.3)

    ciwsdb=# CREATE EXTENSION postgis;
    CREATE EXTENSION

and user:

    ciwsdb=# create user ciwsuser with encrypted password 'weather';
    CREATE ROLE
    ciwsdb=# grant all privileges on database ciwsdb to ciwsuser;
    GRANT

and test user access:

    $  psql -h localhost -p 5432 -U ciwsuser ciwsdb
    Password for user ciwsuser:
    psql (12.2, server 12.3)

    ciwsdb=> \dt
                  List of relations
     Schema |      Name       | Type  |  Owner
    --------+-----------------+-------+----------
     public | spatial_ref_sys | table | postgres
    (1 row)

Cygwin

1) updated everything; seems ok.

2) installed Python 3.7

3) consider: remove Python 2.7 ???

python installation maintenance

psycopg2

via cygwin setup:

    install python3.7-pip
    install python3.7-devel
    install libpq-devel
    postgresql-devel (was ALREADY installed)

and here goes:

    $ /usr/bin/pip3.7 install psycopg2
    Collecting psycopg2
      Using cached psycopg2-2.8.5.tar.gz (380 kB)
    Using legacy setup.py install for psycopg2, since package 'wheel' is not installed.
    Installing collected packages: psycopg2
        Running setup.py install for psycopg2 ... done
    Successfully installed psycopg2-2.8.5

Yipee!! import of psycopg2 works in cygwin's python!

some suggestions from: https://stackoverflow.com/questions/7631080/error-installing-psycopg2-in-cygwin?rq=1

pandas, geopandas:

    $ mypip install pandas
    Collecting pandas
      Downloading https://files.pythonhosted.org/packages/
    Installing collected packages: numpy, python-dateutil, pandas
    Successfully installed numpy-1.19.1 pandas-1.1.0 python-dateutil-2.8.1

geopandas:

cygwin:
    gdal
    libgdal-devel
    python-gdal
    libgeos-devel
    libproj5
    libproj-devel
    proj 6.3.1

and prob. others...

    $ /usr/bin/pip3.7 install geopandas

    Successfully built pyproj pandas numpy

    Installing collected packages: attrs, click, cligj, click-plugins, six,
    munch, fiona, pyproj, shapely, pytz, python-dateutil,
    numpy, pandas, geopandas

    Successfully installed attrs-19.3.0 click-7.1.2 click-plugins-1.1.1 cligj-0.5.0
    fiona-1.8.13.post1 geopandas-0.8.1 munch-2.5.0 numpy-1.19.1 pandas-1.1.0
    pyproj-2.6.1.post1 python-dateutil-2.8.1 pytz-2020.1 shapely-1.7.0 six-1.15.0

is taking a long time... (about an hour or more), but FINALLY:

    $ python3.7
    Python 3.7.7 (default, Apr 10 2020, 07:59:19)
    [GCC 9.3.0] on cygwin

    >>> import shapely
    >>> import pandas
    >>> import fiona
    >>> import numpy
    >>> import geopandas
    >>> import psycopg2
    >>>

and to test:

$ psql -h localhost -U ciwsuser ciwsdb
psql (12.2, server 12.3)

ciwsdb=> create table atest ( name text, position geography(Point, 4326));
ciwsdb=> insert into atest values ('angela', ST_GeomFromText('POINT(-71.064544 42.28787)'));
ciwsdb=> select name, st_astext(position) from atest;
  name  |         st_astext
--------+----------------------------
 angela | POINT(-71.064544 42.28787)
(1 row)
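
the same insert from Python might look roughly like this; a sketch only, assuming an open psycopg2 connection to ciwsdb and the atest table above. `point_wkt` and `insert_position` are hypothetical helper names. The point to remember is that WKT puts longitude first:

```python
def point_wkt(lon, lat):
    """Return WKT for a lon/lat point; WKT puts x (longitude) first."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return f"POINT({lon} {lat})"


def insert_position(conn, name, lon, lat):
    """Insert one row into atest through an open psycopg2 connection."""
    with conn.cursor() as cur:
        # parameters are passed separately so psycopg2 does the quoting
        cur.execute(
            "insert into atest values (%s, ST_GeomFromText(%s))",
            (name, point_wkt(lon, lat)),
        )
    conn.commit()
```

the range checks catch a swapped lat/lon only when the latitude lands outside +/-90, so they are a partial safety net at best.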

feedparser

    $ /usr/bin/pip3.7 install feedparser
    Collecting feedparser
      Downloading feedparser-5.2.1.tar.bz2
    Successfully installed feedparser-5.2.1
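
a minimal sketch of pulling entry links out of a feed with feedparser, assuming the feed URL can be fetched without extra authentication (not confirmed for the CIWS feed). `entry_links` and `fetch_links` are made-up names. Key access is used so the helper also works on plain dicts:

```python
def entry_links(parsed):
    """Collect (entry id, href) pairs from a feedparser result."""
    pairs = []
    for entry in parsed.get("entries", []):
        for link in entry.get("links", []):
            pairs.append((entry.get("id"), link.get("href")))
    return pairs


def fetch_links(url):
    """Fetch and parse a feed, then list its entry links."""
    import feedparser  # third-party; imported lazily

    return entry_links(feedparser.parse(url))
```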

port 5432 "alternatives"

1) setup an 'extra' account on webfaction: https://docs.webfaction.com/user-guide/access.html?highlight=additional%20login#additional-users

and setup .ssh keys (somehow?)

and only allow access to ??? directory

2) install paramiko on both

2.a) home laptop:

  c/_cffi_backend.c:15:10: fatal error: ffi.h: No such file or directory


cygwin: have python37-cffi
cygwin install: libffi-devel
build/wheel of PyNaCl has begun... 11:32am

after 32 mins: (pynacl may have gotten installed)
build/temp.cygwin-3.1.6-x86_64-3.7/_openssl.c:575:10: fatal error: openssl/opensslv.h: No such file or directory
  575 | #include
have: openssl
cygwin install: openssl-devel; python37-openssl

    pip3.7 install paramiko
    Installing collected packages: pynacl, bcrypt, paramiko
    Successfully installed bcrypt-3.1.7 paramiko-2.7.1 pynacl-1.4.0

Yipee!

2.b) other machines:
client: done
server: compiling right now
finally, after >12 hours on asdi-db:
Successfully installed bcrypt-3.1.7 cryptography-3.0 paramiko-2.7.1 pynacl-1.4.0

2.c) psycopg2 on home laptop?

Q: cygwin postgresql-client ????
prob. requires: "Psycopg is a C wrapper around the libpq PostgreSQL client library."
already installed: libpq
cygwin install: postgresql-client ; nope
install: postgresql-devel ; nope
already installed: libpq-devel ; <<< this fixed it

3) cc.py works from r machine

4) make sure you can ssh without pw:

ftp help from here: https://medium.com/@keagileageek/paramiko-how-to-ssh-and-file-transfers-with-python-75766179de73

also interesting: https://www.tutorialspoint.com/How-to-copy-a-file-to-a-remote-server-in-Python-using-SCP-or-SSH
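
a sketch of the key-based file copy with paramiko, assuming the ssh keys are already set up and the remote host is already in ~/.ssh/known_hosts. `remote_target` and `upload` are hypothetical names, not what cc.py actually does:

```python
import os
import posixpath


def remote_target(local_path, remote_dir):
    """Remote path for a local file: same basename inside remote_dir."""
    return posixpath.join(remote_dir, os.path.basename(local_path))


def upload(host, user, local_path, remote_dir, key_filename=None):
    """Copy one file over SFTP using key-based auth (no password prompt)."""
    import paramiko  # imported here so remote_target works without it

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # trust hosts already in known_hosts
    try:
        client.connect(host, username=user, key_filename=key_filename)
        sftp = client.open_sftp()
        try:
            sftp.put(local_path, remote_target(local_path, remote_dir))
        finally:
            sftp.close()
    finally:
        client.close()
```

with `key_filename=None` paramiko falls back to the ssh agent and the default key files, which is the "no pw" case.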

vim

something got messed up on gvim startup on asdi-db
to get colors on asdi-db gvim:

    :source C:\cygwin64\WTurner\homepg\dot.vimrc

HOWEVER, then doesn't work, ONLY 'jk' does!!!

the actual feed code

following along from the instructions:

    $ unzip ClientPackageDelivery_CSSI_200816.zip

    $ mkdir libs
    $ cd libs
    $ unzip ../atom-libs.jar
    $ export LIB_DIR=/home/WTurner/ciws/feed/libs

    $ mkdir client
    $ cd client
    $ unzip ../atomClient.jar
    $ export ATOM_TARGET=/home/WTurner/ciws/feed/client

    $ export DOWNLOAD_DIR=/home/WTurner/ciws/feed/files

got past that:

    cd ~/ciws/feed/client/target/classes
    ./m2.sh

then:
java.lang.NoClassDefFoundError: com/sun/syndication/io/XmlReader

nope: d/l jdom from here: http://www.jdom.org/downloads/index.html

DO NOT USE cygwin paths; USE RELATIVE PATHS!!!

and change http: to https: and it worked!!!

the feed data files:

now what?

$ find files -exec file {} \;
files: directory
files/netcdf: directory
files/netcdf/ft_input: directory
files/netcdf/ft_input/downloadCompleted_EchoTopsForecast_1km: ASCII text
files/netcdf/ft_input/downloadCompleted_EchoTop_1km: ASCII text
files/netcdf/ft_input/downloadCompleted_VILForecast_1km: ASCII text
files/netcdf/ft_input/downloadCompleted_VIL_1km: ASCII text
files/netcdf/ft_input/edu.mit.ll.wx.ciws.EchoTop.Netcdf4.1km.20200817T161230Z.nc: Hierarchical Data Format (version 5) data
files/netcdf/ft_input/edu.mit.ll.wx.ciws.EchoTop.Netcdf4.1km.20200817T161500Z.nc: Hierarchical Data Format (version 5) data
files/netcdf/ft_input/edu.mit.ll.wx.ciws.EchoTopsForecast.Netcdf4.1km.20200817T160500Z.nc: Hierarchical Data Format (version 5) data
files/netcdf/ft_input/edu.mit.ll.wx.ciws.EchoTopsForecast.Netcdf4.1km.20200817T161000Z.nc: Hierarchical Data Format (version 5) data
files/netcdf/ft_input/edu.mit.ll.wx.ciws.VIL.Netcdf4.1km.20200817T161230Z.nc: Hierarchical Data Format (version 5) data
files/netcdf/ft_input/edu.mit.ll.wx.ciws.VIL.Netcdf4.1km.20200817T161500Z.nc: Hierarchical Data Format (version 5) data
files/netcdf/ft_input/edu.mit.ll.wx.ciws.VILForecast.Netcdf4.1km.20200817T161000Z.nc: Hierarchical Data Format (version 5) data
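
the file names carry product, format, resolution and a UTC timestamp, so they can be split apart before opening anything. A sketch, with the pattern inferred only from the listing above (other products may be named differently); `parse_ciws_name` is a made-up name:

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the listing above; other products may differ.
_NAME_RE = re.compile(
    r"^edu\.mit\.ll\.wx\.ciws\."
    r"(?P<product>[^.]+)\.(?P<format>[^.]+)\.(?P<resolution>[^.]+)\."
    r"(?P<stamp>\d{8}T\d{6})Z\.nc$"
)


def parse_ciws_name(filename):
    """Split a CIWS NetCDF file name into product, format, resolution, time."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None  # e.g. the downloadCompleted_* marker files
    when = datetime.strptime(match["stamp"], "%Y%m%dT%H%M%S")
    return {
        "product": match["product"],
        "format": match["format"],
        "resolution": match["resolution"],
        "time": when.replace(tzinfo=timezone.utc),  # trailing Z means UTC
    }
```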

here goes...

plain (stock) gridded data:

Loading library to get version: hdf5.dll
error: No such file or directory

cygwin: install hdf5, hdf5-devel

issue: cygwin reports almost all hdf5 items are installed :-(

these are installed:

    ./lib/libhdf5.dll.a
    ./lib/libhdf5.settings
    ./lib/libhdf5_cpp.dll.a
    ./lib/libhdf5_hl.dll.a
    ./lib/libhdf5_hl_cpp.dll.a

Still trying to get either of these to work:

    $ pip3.7 install h5py
    > pip3.7 install --no-binary=h5py h5py

hdf5 under cygwin:

will attempt install of hdf5 from sources...

1) d/l source from here: https://www.hdfgroup.org/downloads/hdf5/source-code/

    ./configure
    # didn't barf...

Post process src/libhdf5.settings
config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing .classes commands
            SUMMARY OF THE HDF5 CONFIGURATION
            =================================

General Information:
-------------------
                   HDF5 Version: 1.12.0
                  Configured on: Tue Aug 18 13:02:13 EDT 2020
                  Configured by: wendell@wwtlaptop
                    Host system: x86_64-unknown-cygwin
              Uname information: CYGWIN_NT-6.3 wwtlaptop 3.1.6(0.340/5/3) 2020-07-09 08:20 x86_64 Cygwin
                       Byte sex: little-endian
             Installation point: /home/wendell/cssi/feed/netcdf/hdf5-1.12.0/hdf5

Compiling Options:
------------------
                     Build Mode: debug
              Debugging Symbols: yes
                        Asserts: yes
                      Profiling: no
             Optimization Level: debug

Linking Options:
----------------
                      Libraries: static, shared
  Statically Linked Executables:
                        LDFLAGS:
                     H5_LDFLAGS:  -no-undefined
                     AM_LDFLAGS:
                Extra libraries: -lz -ldl -lm
                       Archiver: ar
                       AR_FLAGS: cr
                         Ranlib: ranlib

Languages:
----------
                              C: yes
                     C Compiler: /usr/bin/gcc ( gcc (GCC) 9.3.0)
                       CPPFLAGS:
                    H5_CPPFLAGS:   ...
                    AM_CPPFLAGS:
                        C Flags:
                     H5 C Flags:   ...
                     AM C Flags:
               Shared C Library: yes
               Static C Library: yes


                        Fortran: no

                            C++: no

                           Java: no


Features:
---------
                   Parallel HDF5: no
Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                Build HDF5 Tests: yes
                Build HDF5 Tools: yes
                    Threadsafety: no
             Default API mapping: v112
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib)
                             MPE: no
                   Map (H5M) API: no
                      Direct VFD: no
              (Read-Only) S3 VFD: no
            (Read-Only) HDFS VFD: no
                         dmalloc: no
  Packages w/ extra debug output: AC,B2,CX,D,F,HL,I,O,S,ST,T,Z
                     API tracing: yes
            Using memory checker: no
 Memory allocation sanity checks: yes
          Function stack tracing: no
       Strict file format checks: yes
    Optimization instrumentation: no

ok so far,

    make   ; took maybe an hour
    make check

SOME checks FAILED, and there was no hdf5.dll file anywhere!

pip3.7 install h5py worked FINE on rserver!

ANALYSIS:

STATUS:

interesting non-gridded data:

$ pip3.7 install bs4 ; needs parser

have: xml need: lxml

hmm... choices:

a comment: https://stackoverflow.com/questions/4071696/python-beautifulsoup-xml-parsing

I'd recommend using the builtin ElementTree module. BeautifulSoup is meant to handle unwell-formed code like hacked up HTML, whereas XML is well-formed and meant to be read by an XML library. Update: some of my recent reading here suggests lxml as a library built on and enhancing the standard ElementTree.

and someone else said this:

I use Beautiful Soup for parsing XML. From the docs: "Beautiful Soup is a Python library for pulling data out of HTML and XML files." Beautiful Soup will use whichever parser you tell it to, including lxml.

From Sean Gillies: I enjoy using ElementTree. It's standardized in Python since 2.5 as xml.etree.ElementTree. Forgive me for being blunt, but you're using it wrong. I suggest trying the find, findtext, and findall methods when you know the structure of the data. Is Order your root element? If so,...

HOWEVER, elementTree was too hard; switched back to BeautifulSoup
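
for the record, the find / findtext / findall approach suggested above mostly comes down to passing a namespace map. A sketch using only the standard library; the sample feed is made up (not real CIWS output) and `list_entries` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace; forgetting it is the usual reason
# find()/findall() return nothing.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample feed, not real CIWS output.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>example feed</title>
  <entry>
    <id>urn:example:1</id>
    <title>first</title>
    <link href="https://example.com/files/first.nc"/>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title>second</title>
    <link href="https://example.com/files/second.nc"/>
  </entry>
</feed>
"""


def list_entries(xml_text):
    """Return (id, title, href) for each Atom entry in xml_text."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        rows.append((
            entry.findtext("atom:id", default="", namespaces=ATOM),
            entry.findtext("atom:title", default="", namespaces=ATOM),
            link.get("href") if link is not None else None,
        ))
    return rows
```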

    $ pip3.7 install lxml
    Error: Please make sure the libxml2 and libxslt development packages are installed.

cygwin: have libxml2, libxslt;
cygwin: install libxml2-devel, libxslt-devel;

    $ pip3.7 install lxml
    $ ./cc.py

works FINE.

Tuesday, 8/18 and Friday, 8/21

    ~/ciws/feed/myown  (on asdi-db)
        poll_ciws_feed.py
        atom_persist.json
    ilc: ~/cssi/feed/parsing  (on webfaction)
         ./xml2gpkg.py  files/edu.mit.ll.wx.ciws.Standard_VilForecastContours_20200817T210000Z.xml.gz

parsing VIL contours seems ok; will put aside and work on grid

ok, but slow on unix machines (prob. lots slower on windows)

work in progress

ALL pieces are in place!:

1) on asdi-db, in ~/ciws/feed/myown/ :

    ./poll_ciws_feed.py

to collect files

2) manually send one to webfaction:

    $ cd ~/ciws/feed/myown/files
    $ scp edu.mit.ll.wx.ciws.QuantizedVIL.Netcdf4.1km.20200825T025500Z.nc wendell@ilikecarrots.com:cssi/ciws/using_rserver

3) convert nc to csv:

    $ cd ~/cssi/ciws/using_rserver
    $ ./pp_vec_csv.py  2020-08-24 edu.mit.ll.wx.ciws.QuantizedVIL.Netcdf4.1km.20200825T025500Z.nc | \
             grep 2020 >  QuantizedVIL.Netcdf4.1km.20200825T025500Z.csv

4) convert to geopackage

    $ ./hh.py  QuantizedVIL.Netcdf4.1km.20200825T025500Z.csv

CORRECT name of geometry column!!!

5) move back to home laptop

    cd ~/cssi/feed/netcdf/from_rserver
    scp wendell@ilikecarrots.com:cssi/ciws/using_rserver/the_second.gpkg .

6) show in jupyter

7) compare to the official:

https://ciws.wx.ll.mit.edu/LiveDisplay

Yipee!!!
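
for step 1, one way a poller can avoid re-downloading is to keep the entry ids it has already handled in a JSON file. This is a sketch of that idea only; the real layout of atom_persist.json is not shown in these notes, and `load_seen`, `save_seen` and `new_entries` are made-up names:

```python
import json
import os


def load_seen(path):
    """Load the set of entry ids already handled; empty if no file yet."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_seen(path, seen):
    """Write to a temp file, then rename, so a crash leaves no partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(sorted(seen), fh)
    os.replace(tmp, path)


def new_entries(entry_ids, seen):
    """Return ids not seen before, in feed order, and record them in seen."""
    fresh = []
    for eid in entry_ids:
        if eid not in seen:
            seen.add(eid)
            fresh.append(eid)
    return fresh
```

each polling pass would then be: load, filter the feed's ids through `new_entries`, download the fresh ones, save.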

git

$ cd ciws_data_feed/
WTurner@ASDI-DB ~/ciws/feed/for_git/ciws_data_feed

$ git init
Initialized empty Git repository in /home/WTurner/ciws/feed/for_git/ciws_data_feed/.git/

WTurner@ASDI-DB ~/ciws/feed/for_git/ciws_data_feed
$ git add .

$ git remote add origin git@github.com:wendellwt/ciws_data_feed

$ git remote add origin git@github.com:wendellwt/ciws_data_feed.git

gave up on using git.

svn

initial commit (after lots of attempts):

    > svn --username wendell --password c... mkdir  http://repo.ilikecarrots.com/ciws_data_feed -m "initial import"
    Committing transaction...
    Committed revision 200.

    > svn --username wendell --password c... import ciws_data_feed/ http://repo.ilikecarrots.com/ciws_data_feed -m "initial import"
    Adding         ciws_data_feed/atom_persist.json
    Adding         ciws_data_feed/files
    Adding         ciws_data_feed/files/.svnignore
    Adding         ciws_data_feed/poll_ciws.py
    Adding         ciws_data_feed/xml2gpkg.py
    Committing transaction...
    Committed revision 201.

then, to check out:

    > svn --username wendell --password c... checkout http://repo.ilikecarrots.com/ciws_data_feed
    A    ciws_data_feed/atom_persist.json
    A    ciws_data_feed/files
    A    ciws_data_feed/files/.svnignore
    A    ciws_data_feed/poll_ciws.py
    A    ciws_data_feed/xml2gpkg.py
    Checked out revision 202.

PROBLEM:

NEED to ignore:

    files/
    atom_persist.json

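tried this, but it cannot work as written: each propset replaces the whole svn:ignore value (so the second command overwrites the first; both patterns belong in one newline-separated value), and svn:ignore only applies to unversioned items, while files/ and atom_persist.json were already imported in revision 201: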

    $ svn propset svn:ignore files .
    property 'svn:ignore' set on '.'

    $ svn propset svn:ignore atom_persist.json .
    property 'svn:ignore' set on '.'

parsing gml

email sent to gdal mailing list said to use GMLAS
cygwin:

install xerces-3.1, libxerces


GMLAS: https://gdal.org/drivers/vector/gmlas.html#vector-gmlas
Even's company: http://www.spatialys.com
mailing list: https://lists.osgeo.org/mailman/listinfo/gdal-dev

not on work laptop:

01:31:00 files $ ogrinfo -ro GMLAS:edu.mit.ll.wx.ciws.Standard_VilForecastContours_20200817T210000Z.xml
FAILURE:
Unable to open datasource `GMLAS:edu.mit.ll.wx.ciws.Standard_VilForecastContours_20200817T210000Z.xml' with the following drivers.
  -> PCIDSK
  -> netCDF
  -> PDS4
  -> JP2OpenJPEG

possibly useful web pages:

had too many tabs open in chrome; here were some useful ones:

ogrinfo: https://gdal.org/programs/ogrinfo.html#ogrinfo

GMLAS - Geography Markup Language (GML) driven by application schemas: https://gdal.org/drivers/vector/gmlas.html

GMLAS - Mapping examples https://gdal.org/drivers/vector/gmlas_mapping_examples.html

GDAL Virtual File Systems (compressed, network hosted, etc…): https://gdal.org/user/virtual_file_systems.html#virtual-file-systems

Even's overview of VSI ([gdal-dev] Understanding VSI caching): http://osgeo-org.1560.x6.nabble.com/gdal-dev-Understanding-VSI-caching-td5360850.html

run-time (?) config options: https://trac.osgeo.org/gdal/wiki/ConfigOptions

d/l of wxxm schema: http://wxxm.aero/page/documents-0

ucar wxxm page: https://ral.ucar.edu/projects/css-wx/external/wxxm/doc/2.0.0-RC2/xsd/2_0RC2.html

[xsd-users] compiling WXXM schema (2012): https://www.codesynthesis.com/pipermail/xsd-users/2012-June/003699.html

stupid eurocontrol site: https://www.eurocontrol.int/search?keywords=wx&sort_by=search_api_relevance

Geo tips & tricks; A new GDAL virtual file system to read streamed data (e.g. for OGR WFS) (Even R.) 2012: http://erouault.blogspot.com/2012/05/new-gdal-virtual-file-system-to-read.html

Messing around with (City)GML on GDAL 2.2 Jul 25, 2017 https://3d.bk.tudelft.nl/svitalis/citygml/gdal/2017/07/25/messing-around-with-citygml-on-gdal-2.2.html

Yipee, finally:

a doit.sh script that works:

export VSI_CACHE=NO
export CPL_CURL_VERBOSE=TRUE

~/usr/local/bin/ogrinfo.exe  -ro -al \
    -oo XSD=wxxm-2.1.0/schemas/wxxm.xsd,http://schemas.opengis.net/gml/3.2.1/gml.xsd,http://www.opengis.net/om/1.0/gml32,http://www.w3.org/1999/xlink.xsd \
    GMLAS:aaaa.xml

and an alias that works also:

alias aaa='~/usr/local/bin/ogrinfo.exe  -ro -oo XSD=wxxm-2.1.0/schemas/wxxm.xsd,http://schemas.opengis.net/gml/3.2.1/gml.xsd,http://www.opengis.net/om/1.0/gml32,http://www.w3.org/1999/xlink.xsd'

nearest neighbor

PySAL claims to do it, but haven't found an exact ref.

Efficiently grouping a list of coordinates points by location in Python https://stackoverflow.com/questions/24985127/efficiently-grouping-a-list-of-coordinates-points-by-location-in-python

wiki: Connected-component labeling https://en.wikipedia.org/wiki/Connected-component_labeling

There are many implementations to extract connected components. Below are a few for guidance.

cclabel: https://github.com/spwhitt/cclabel/blob/master/cclabel.py

findContours: OpenCV: https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#findcontours

bwconncomp Find connected components in binary image: https://www.mathworks.com/help/images/ref/bwconncomp.html

However, I've implemented this kind of blob detector and they are not that hard to write up if you are looking for a learning experience. If not, then I would go with the most mature library like OpenCV and use their Python API if that's all you need.
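
to back up the "not that hard to write" remark, here is a small flood-fill (breadth-first) version of connected-component labeling on a plain list-of-lists grid. It is a sketch under simple assumptions (4-connectivity, rectangular grid, truthy cells are foreground), not the two-pass algorithm from the wiki page and not tuned for 1 km CIWS grids; `label_components` is a made-up name:

```python
from collections import deque


def label_components(grid):
    """Label 4-connected blobs of truthy cells; returns (labels, count)."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or labels[r][c]:
                continue  # background, or already part of a blob
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                neighbors = ((cr - 1, cc), (cr + 1, cc),
                             (cr, cc - 1), (cr, cc + 1))
                for nr, nc in neighbors:
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and not labels[nr][nc]):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count
```

for real grids scipy or OpenCV will be much faster; this only shows the idea.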

the end.