
Installing Arelle 32 bit on Windows 7

Current download pages may no longer list the old 32-bit version, but if you are running a 32-bit version of Windows you can still download and install it from an archived copy of the website :)
Download it from here
https://web.archive.org/web/20140731125505/http://arelle.org/wordpress/wp-content/uploads/downloads/2013/08/arelle-win-x86-2013-08-15.exe



After installing, you can update from the menu:
Help -> Check for Updates

Start at:
File -> Open Web -> SEC RSS
which loads the recent RSS feed of SEC filings.
Right-click any filing and select 'Filing -> Open Instance Document' to load and display every part of the filing.
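The SEC RSS feed that Arelle fetches is ordinary RSS 2.0, so you can also peek at it outside Arelle. Here is a minimal Python sketch parsing a made-up miniature of the feed (not live data):

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the SEC EDGAR RSS feed Arelle loads; the live
# feed is at https://www.sec.gov/Archives/edgar/xbrlrss.all.xml
sample_rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Latest Filings</title>
    <item><title>ACME CORP (0000123456) (10-Q)</title></item>
    <item><title>EXAMPLE INC (0000654321) (10-K)</title></item>
  </channel>
</rss>"""

root = ET.fromstring(sample_rss)
# each <item> is one filing; its <title> names the registrant and form type
filings = [item.findtext("title") for item in root.iter("item")]
print(filings)
```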

Related:
http://mohiplanet.blogspot.com/2016/02/getting-started-with-open-source-xbrl.html

Getting started with open source xbrl platform Arelle on CentOS

Download:
Download Arelle red hat distribution from http://arelle.org/downloads

http://arelle.org/downloads/16

Install Arelle:
Extract:
  1. tar -zxvf arelle-redhat-x86_64-2014-12-31.tar.gz
Move into directory:
  1. cd arelle-redhat-x86_64-2014-12-31

Run the following to verify the installation and see the help:
  1. ./arelleCmdLine -h

Usage: arelleCmdLine [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -f ENTRYPOINTFILE, --file=ENTRYPOINTFILE
                        FILENAME is an entry point, which may be an XBRL
                        instance, schema, linkbase file, inline XBRL instance,
                        testcase file, testcase index file.  FILENAME may be a
                        local file or a URI to a web located file.
  --username=USERNAME   user name if needed (with password) for web file
                        retrieval
  --password=PASSWORD   password if needed (with user name) for web retrieval
  -i IMPORTFILES, --import=IMPORTFILES
                        FILENAME is a list of files to import to the DTS, such
                        as additional formula or label linkbases.  Multiple
                        file names are separated by a '|' character.
  -d DIFFFILE, --diff=DIFFFILE
                        FILENAME is a second entry point when comparing
                        (diffing) two DTSes producing a versioning report.
  -r VERSREPORTFILE, --report=VERSREPORTFILE
                        FILENAME is the filename to save as the versioning
                        report.
  -v, --validate        Validate the file according to the entry file type.
                        If an XBRL file, it is validated according to XBRL
                        validation 2.1, calculation linkbase validation if
                        either --calcDecimals or --calcPrecision are
                        specified, and SEC EDGAR Filing Manual (if --efm
                        selected) or Global Filer Manual disclosure system
                        validation (if --gfm=XXX selected). If a test suite or
                        testcase, the test case variations are individually so
                        validated. If formulae are present they will be
                        validated and run unless --formula=none is specified.
  --calcDecimals        Specify calculation linkbase validation inferring
                        decimals.
  --calcPrecision       Specify calculation linkbase validation inferring
                        precision.
  --efm                 Select Edgar Filer Manual (U.S. SEC) disclosure system
                        validation (strict).
  --disclosureSystem=DISCLOSURESYSTEMNAME
                        Specify a disclosure system name and select disclosure
                        system validation.  Enter --disclosureSystem=help for
                        list of names or help-verbose for list of names and
                        descriptions.
  --hmrc                Select U.K. HMRC disclosure system validation.
  --utr                 Select validation with respect to Unit Type Registry.
  --utrUrl=UTRURL       Override disclosure systems Unit Type Registry
                        location (URL or file path).
  --infoset             Select validation with respect testcase infosets.
  --labelLang=LABELLANG
                        Language for labels in following file options
                        (override system settings)
  --labelRole=LABELROLE
                        Label role for labels in following file options
                        (instead of standard label)
  --DTS=DTSFILE, --csvDTS=DTSFILE
                        Write DTS tree into FILE (may be .csv or .html)
  --facts=FACTSFILE, --csvFacts=FACTSFILE
                        Write fact list into FILE
  --factListCols=FACTLISTCOLS
                        Columns for fact list file
  --factTable=FACTTABLEFILE, --csvFactTable=FACTTABLEFILE
                        Write fact table into FILE
  --concepts=CONCEPTSFILE, --csvConcepts=CONCEPTSFILE
                        Write concepts into FILE
  --pre=PREFILE, --csvPre=PREFILE
                        Write presentation linkbase into FILE
  --cal=CALFILE, --csvCal=CALFILE
                        Write calculation linkbase into FILE
  --dim=DIMFILE, --csvDim=DIMFILE
                        Write dimensions (of definition) linkbase into FILE
  --formulae=FORMULAEFILE, --htmlFormulae=FORMULAEFILE
                        Write formulae linkbase into FILE
  --viewArcrole=VIEWARCROLE
                        Write linkbase relationships for viewArcrole into
                        viewFile
  --viewFile=VIEWFILE   Write linkbase relationships for viewArcrole into
                        viewFile
  --roleTypes=ROLETYPESFILE
                        Write defined role types into FILE
  --arcroleTypes=ARCROLETYPESFILE
                        Write defined arcrole types into FILE
  --testReport=TESTREPORT, --csvTestReport=TESTREPORT
                        Write test report of validation (of test cases) into
                        FILE
  --testReportCols=TESTREPORTCOLS
                        Columns for test report file
  --rssReport=RSSREPORT
                        Write RSS report into FILE
  --rssReportCols=RSSREPORTCOLS
                        Columns for RSS report file
  --skipDTS             Skip DTS activities (loading, discovery, validation),
                        useful when an instance needs only to be parsed.
  --skipLoading=SKIPLOADING
                        Skip loading discovered or schemaLocated files
                        matching pattern (unix-style file name patterns
                        separated by '|'), useful when not all linkbases are
                        needed.
  --logFile=LOGFILE     Write log messages into file, otherwise they go to
                        standard output.  If file ends in .xml it is xml-
                        formatted, otherwise it is text.
  --logFormat=LOGFORMAT
                        Logging format for messages capture, otherwise default
                        is "[%(messageCode)s] %(message)s - %(file)s".
  --logLevel=LOGLEVEL   Minimum level for messages capture, otherwise the
                        message is ignored.  Current order of levels are
                        debug, info, info-semantic, warning, warning-semantic,
                        warning, assertion-satisfied, inconsistency, error-
                        semantic, assertion-not-satisfied, and error.
  --logLevelFilter=LOGLEVELFILTER
                        Regular expression filter for logLevel.  (E.g., to not
                        match *-semantic levels,
                        logLevelFilter=(?!^.*-semantic$)(.+).
  --logCodeFilter=LOGCODEFILTER
                        Regular expression filter for log message code.
  --parameters=PARAMETERS
                        Specify parameters for formula and validation
                        (name=value[,name=value]).
  --parameterSeparator=PARAMETERSEPARATOR
                        Specify parameters separator string (if other than
                        comma).
  --formula=FORMULAACTION
                        Specify formula action: validate - validate only,
                        without running, run - validate and run, or none -
                        prevent formula validation or running when also
                        specifying -v or --validate.  if this option is not
                        specified, -v or --validate will validate and run
                        formulas if present
  --formulaParamExprResult
                        Specify formula tracing.
  --formulaParamInputValue
                        Specify formula tracing.
  --formulaCallExprSource
                        Specify formula tracing.
  --formulaCallExprCode
                        Specify formula tracing.
  --formulaCallExprEval
                        Specify formula tracing.
  --formulaCallExprResult
                        Specify formula tracing.
  --formulaVarSetExprEval
                        Specify formula tracing.
  --formulaVarSetExprResult
                        Specify formula tracing.
  --formulaVarSetTiming
                        Specify showing times of variable set evaluation.
  --formulaAsserResultCounts
                        Specify formula tracing.
  --formulaSatisfiedAsser
                        Specify formula tracing.
  --formulaUnsatisfiedAsser
                        Specify formula tracing.
  --formulaUnsatisfiedAsserError
                        Specify formula tracing.
  --formulaFormulaRules
                        Specify formula tracing.
  --formulaVarsOrder    Specify formula tracing.
  --formulaVarExpressionSource
                        Specify formula tracing.
  --formulaVarExpressionCode
                        Specify formula tracing.
  --formulaVarExpressionEvaluation
                        Specify formula tracing.
  --formulaVarExpressionResult
                        Specify formula tracing.
  --formulaVarFilterWinnowing
                        Specify formula tracing.
  --formulaVarFiltersResult
                        Specify formula tracing.
  --formulaRunIDs=FORMULARUNIDS
                        Specify formula/assertion IDs to run, separated by a
                        '|' character.
  --uiLang=UILANG       Language for user interface (override system settings,
                        such as program messages).  Does not save setting.
  --proxy=PROXY         Modify and re-save proxy settings configuration.
                        Enter 'system' to use system proxy setting, 'none' to
                        use no proxy, 'http://[user[:password]@]host[:port]'
                        (e.g., http://192.168.1.253, http://example.com:8080,
                        http://joe:secret@example.com:8080),  or 'show' to
                        show current setting, .
  --internetConnectivity=INTERNETCONNECTIVITY
                        Specify internet connectivity: online or offline
  --internetTimeout=INTERNETTIMEOUT
                        Specify internet connection timeout in seconds (0
                        means unlimited).
  --internetRecheck=INTERNETRECHECK
                        Specify rechecking cache files (weekly is default)
  --internetLogDownloads
                        Log info message for downloads to web cache.
  --xdgConfigHome=XDGCONFIGHOME
                        Specify non-standard location for configuration and
                        cache files (overrides environment parameter
                        XDG_CONFIG_HOME).
  --plugins=PLUGINS     Modify plug-in configuration.  Re-save unless 'temp'
                        is in the module list.  Enter 'show' to show current
                        plug-in configuration.  Commands show, and module urls
                        are '|' separated: +url to add plug-in by its url or
                        filename, ~name to reload a plug-in by its name, -name
                        to remove a plug-in by its name, relative URLs are
                        relative to installation plug-in directory,  (e.g.,
                        '+http://arelle.org/files/hello_web.py', '+C:\Program
                        Files\Arelle\examples\plugin\hello_dolly.py' to load,
                        or +../examples/plugin/hello_dolly.py for relative use
                        of examples directory, ~Hello Dolly to reload, -Hello
                        Dolly to remove).  If + is omitted from .py file
                        nothing is saved (same as temp).  Packaged plug-in
                        urls are their directory's url.
  --packages=PACKAGES   Modify taxonomy packages configuration.  Re-save
                        unless 'temp' is in the module list.  Enter 'show' to
                        show current packages configuration.  Commands show,
                        and module urls are '|' separated: +url to add package
                        by its url or filename, ~name to reload package by its
                        name, -name to remove a package by its name, URLs are
                        full absolute paths.  If + is omitted from package
                        file nothing is saved (same as temp).
  --packageManifestName=PACKAGEMANIFESTNAME
                        Provide non-standard archive manifest file name
                        pattern (e.g., *taxonomyPackage.xml).  Uses unix file
                        name pattern matching.  Multiple manifest files are
                        supported in archive (such as oasis catalogs).
                        (Replaces search for either .taxonomyPackage.xml or
                        catalog.xml).
  --abortOnMajorError   Abort process on major error, such as when load is
                        unable to find an entry or discovered file.
  --showEnvironment     Show Arelle's config and cache directory and host OS
                        environment parameters.
  --collectProfileStats
                        Collect profile statistics, such as timing of
                        validation activities and formulae.
  --webserver=WEBSERVER
                        start web server on host:port[:server] for REST and
                        web access, e.g., --webserver locahost:8080, or
                        specify nondefault a server name, such as cherrypy,
                        --webserver locahost:8080:cherrypy. (It is possible to
                        specify options to be defaults for the web server,
                        such as disclosureSystem and validations, but not
                        including file names.)
  --store-to-XBRL-DB=STOREINTOXBRLDB
                        Store into XBRL DB.  Provides connection string: host,
                        port,user,password,database[,timeout[,{postgres|rexste
                        r|rdfDB}]]. Autodetects database type unless 7th
                        parameter is provided.
  --load-from-XBRL-DB=LOADFROMXBRLDB
                        Load from XBRL DB.  Provides connection string: host,p
                        ort,user,password,database[,timeout[,{postgres|rexster
                        |rdfDB}]]. Specifies DB parameters to load and
                        optional file to save XBRL into.
  -a, --about           Show product version, copyright, and license.
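All of these options can be scripted. As a sketch, here is how you might assemble an arelleCmdLine call from Python (the flags come from the help text above; the feed URL and log name are just examples, and the command is only printed, not executed):

```python
# Build an arelleCmdLine invocation from the options documented above.
cmd = [
    "./arelleCmdLine",
    "-f", "https://www.sec.gov/Archives/edgar/xbrlrss.all.xml",
    "-v",                       # validate the entry file
    "--logFile", "arelle.log",  # send messages to a file instead of stdout
]
print(" ".join(cmd))
# To actually run it (requires the Arelle distribution installed above):
#   import subprocess
#   subprocess.run(cmd, check=True)
```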

Make sure you have Python and the pg8000 driver installed; if not, run:
  1. pip install pg8000

Creating the database:

Download the DDL script from https://github.com/Arelle/Arelle/blob/master/arelle/plugin/xbrlDB/xbrlSemanticPostgresDB.ddl (wget needs the raw file URL, not the HTML page):
  1. wget --no-check-certificate https://raw.githubusercontent.com/Arelle/Arelle/master/arelle/plugin/xbrlDB/xbrlSemanticPostgresDB.ddl
  1. su - postgres
  2. createdb sec
Run DDL script on 'sec' database:
  1. psql -h HOST -U USERNAME -d sec -a -f xbrlSemanticPostgresDB.ddl

Add plugins:
see all installed plugins:
  1. ./arelleCmdLine --plugins show
Should produce something like:
[info] Plug-in modules: -
[info] Plug-in: XBRL Database; author: Mark V Systems Limited; version: 0.9; status: enabled; date: 2014-12-09T04:41:53 UTC; description: This plug-in implements the XBRL Public Postgres, Abstract Model and DPM Databases.  ; license Apache-2 (Arelle plug-in), BSD license (pg8000 library). - xbrlDB
Install plugin xbrlDB:
  1. ./arelleCmdLine --plugins +xbrlDB
should print:
[info] Addition of plug-in XBRL Database successful. - xbrlDB
Scrape, parse and populate SEC filing data into DB:
run:
  1. ./arelleCmdLine -f https://www.sec.gov/Archives/edgar/xbrlrss.all.xml -v --store-to-XBRL-DB "HOST,5432,usrname,password,sec,120,pgSemantic"
This downloads the latest 100 SEC filings from the feed and stores the tag-based XBRL data in the database at HOST:5432/sec.
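The connection string is just comma-separated fields in the order given by the help text (host, port, user, password, database, timeout, DB interface). A small sketch that splits it apart (values are placeholders):

```python
# Split the --store-to-XBRL-DB connection string into named fields.
conn = "HOST,5432,username,password,sec,120,pgSemantic"
fields = ("host", "port", "user", "password", "database", "timeout", "product")
params = dict(zip(fields, conn.split(",")))
print(params)
```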

To download, parse, and store all SEC filings for a single month, say January 2016, simply run:
  1. ./arelleCmdLine -f https://www.sec.gov/Archives/edgar/monthly/xbrlrss-2016-01.xml -v --store-to-XBRL-DB "HOST,5432,username,password,sec,120,pgSemantic"
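If you want to sweep a whole year, the monthly feed URLs appear to follow the same pattern as the 2016-01 one above, so they can be generated in a loop (this assumes the pattern holds for other months):

```python
# Generate the monthly EDGAR RSS URLs for one year, following the pattern
# of the xbrlrss-2016-01.xml URL above (pattern assumed for other months).
base = "https://www.sec.gov/Archives/edgar/monthly/xbrlrss-{}-{:02d}.xml"
urls = [base.format(2016, month) for month in range(1, 13)]
print(urls[0])
print(urls[-1])
```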

Querying the database:
Say for a single company with central index key 0001372183: queries joining the aspect, data_point, period, and entity_identifier tables pull out the tag-based financial information. For example, the following lists the distinct period end dates:
  1. select distinct period.end_date from aspect,data_point,period,entity_identifier where aspect.aspect_id=data_point.aspect_id and data_point.report_id=entity_identifier.report_id and period.period_id=data_point.period_id and entity_identifier.identifier='0001372183';
Sample output (from a similar query that also selects the tag name and fact value):
dei_AmendmentFlag,false,2015-12-01
dei_CurrentFiscalYearEndDate,--02-29,2015-12-01
dei_DocumentFiscalPeriodFocus,Q3,2015-12-01
dei_DocumentFiscalYearFocus,2016,2015-12-01
dei_DocumentPeriodEndDate,2015-11-30,2015-12-01
dei_DocumentType,10-Q,2015-12-01
dei_EntityCentralIndexKey,0001372183,2015-12-01
dei_EntityCommonStockSharesOutstanding,5491753,2016-01-19
dei_EntityFilerCategory,Smaller Reporting Company,2015-12-01
dei_EntityRegistrantName,"Monaker Group, Inc.",2015-12-01
dei_TradingSymbol,MKGI,2015-12-01
invest_InvestmentWarrantsExercisePrice,0.05,2012-08-22
invest_InvestmentWarrantsExercisePrice,3,2009-03-01
mkgi_AdditionalExpendituresForCostsAssociatedWithEmploymentWebsite,10000,2015-12-01
mkgi_AdditionalOffsettingRentExpenseMonthly,2500,2015-12-01
mkgi_AdvancesConversionConvertedIntoPromissoryNoteAmount,70000,2011-04-14
mkgi_AdvancesToFormerSubsidiary,75000,2015-12-01
mkgi_AssetsImpairmentChargesShares,0,2015-12-01
mkgi_AssignmentOfPrincipalToNonRelatedParty,225000,2012-02-16
mkgi_CarryingValueOfBusinessAfterAdjustments,7811286,2014-11-01
mkgi_CarryingValueOfBusinessAfterAudit,1556098,2014-11-01
mkgi_CashFromMerger,0,2014-12-01
mkgi_CashFromMerger,56902,2015-12-01
...............................................................
...............................................................
...............................................................
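The output rows are plain CSV (tag, value, period end date). A small sketch loading rows like those above with Python's csv module (the csv module matters because values such as company names can contain commas):

```python
import csv
import io

# Rows copied from the sample output above: tag, value, period end date.
sample = """dei_DocumentType,10-Q,2015-12-01
dei_TradingSymbol,MKGI,2015-12-01
dei_EntityRegistrantName,"Monaker Group, Inc.",2015-12-01"""

rows = [
    {"tag": tag, "value": value, "end_date": end_date}
    for tag, value, end_date in csv.reader(io.StringIO(sample))
]
print(rows[2]["value"])
```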

Full Output File:
https://www.dropbox.com/s/m9finc2x88k75jl/sec-filing-1372183.csv?dl=0
Here is a list of all us-gaap tags with descriptions:
https://github.com/ifanchu/pyXBRL/blob/master/us-gaap/concepts_2014.csv
Cleaning up:
By default Arelle stores every downloaded filing under:
/root/.config/arelle/cache/

So periodically you may want to clean it up:
  1. rm -rf /root/.config/arelle/cache/*

References:
http://arelle.org/documentation/xbrl-database/
http://www.openfiling.info/wp-content/upLoads/data/ArelleUsersManual.pdf
http://arelle.org/wordpress/wp-content/uploads/downloads/2011/09/ComparabilityAndDataMiningUnifiedModel-Paper.pdf

Related:
http://mohiplanet.blogspot.com/2016/02/installing-arelle-32-bit-on-windows-7.html

Getting started with Python, Web Scraping, MS SQL Server, Windows with a web crawler

To get started, install Python 2.7 on Windows 7 with this *.bat script:
http://mohiplanet.blogspot.com/2015/12/install-python-on-windows-7-scriptbat.html

Download SQL Server 2005 :
https://www.microsoft.com/en-us/download/details.aspx?id=21844
SQL Server 2005 Management Studio :
www.microsoft.com/en-us/download/details.aspx?id=8961
If you are more comfortable in a terminal, you can install the command-line client instead of the visual Management Studio:
https://www.microsoft.com/en-us/download/details.aspx?id=36433

Make sure you run the installers with Administrator privileges.

After installation completes, check out the command-line tool:

  1. sqlcmd -S .\SQLEXPRESS
  2. create database some_db
  3. go
  4. use some_db
  5. go
  6. select * from some_table
  7. go
Scraping FEC (Federal Election Commission) filings (getting started with a simple crawler):
Download a sample scraper which downloads all Federal Election Commission electronic filings:
  1. git clone https://github.com/cschnaars/FEC-Scraper/
  2. cd FEC-Scraper
Load the FEC database schema into SQL Server with the bundled script:
  1. sqlcmd -S .\SQLEXPRESS
  2. create database FEC
  3. go
  4. exit
  5. sqlcmd -S .\SQLEXPRESS -i FECScraper.sql

Set up the connection string in both FECScraper.py and FECParser.py as follows:
  1. connstr = 'DRIVER={SQL Server};SERVER=.\SQLEXPRESS;DATABASE=FEC;UID=;PWD=;'
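That connection string is a semicolon-separated list of KEY=VALUE pairs; a small sketch that splits it apart so you can check or tweak individual settings:

```python
# Split the pyodbc-style connection string into key/value pairs so
# individual settings (server, database, credentials) can be inspected.
connstr = r'DRIVER={SQL Server};SERVER=.\SQLEXPRESS;DATABASE=FEC;UID=;PWD=;'
parts = dict(item.split("=", 1) for item in connstr.rstrip(";").split(";"))
print(parts["SERVER"], parts["DATABASE"])
```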


Create the following directories for the crawler:
  1. mkdir C:\Data\
  2. mkdir C:\Data\Python
  3. mkdir C:\Data\Python\FEC
  4. mkdir C:\Data\Python\FEC\Import
  5. mkdir C:\Data\Python\FEC\Review
  6. mkdir C:\Data\Python\FEC\Processed
  7. mkdir C:\Data\Python\FEC\Output
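The same directory tree can be created in one pass from Python; a minimal sketch (the Windows base path is the one the scraper expects, shown commented out so the function stays portable):

```python
import os

def make_fec_dirs(base):
    """Create the crawler's working directories under base, mirroring the
    mkdir commands above."""
    for sub in ("Import", "Review", "Processed", "Output"):
        # exist_ok avoids an error if a directory is already there
        os.makedirs(os.path.join(base, sub), exist_ok=True)

# On Windows the scraper expects:
#   make_fec_dirs(r"C:\Data\Python\FEC")
```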

In case you can't find any data filings, check out this working code:
https://drive.google.com/file/d/0B5hTtesq_tWdZFo3eThQRzY3aEU/view?usp=sharing
Last time I had to change one CSS query from "Form F3" to "F3" in FECScraper.py.

Check a sample committee for downloading specific filings:
Add one committee id to commidappend.txt:
  1. echo C00494393 > commidappend.txt
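Equivalently, you can write the committee IDs from Python; a small sketch that produces the same commidappend.txt (one ID per line):

```python
# Write committee IDs (one per line) to commidappend.txt, as the echo
# command above does for the single sample committee C00494393.
committee_ids = ["C00494393"]  # add more IDs as needed

with open("commidappend.txt", "w") as f:
    for cid in committee_ids:
        f.write(cid + "\n")
```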




--------------------------------------------------------------------------------------------------------------
Doing more on scraping FEC filings:
The latest FEC Scraper Toolbox supports all FEC filing versions from v1 to v8.1:
  1. git clone https://github.com/cschnaars/FEC-Scraper-Toolbox
  2. cd FEC-Scraper-Toolbox
  3. :: make sure you create the following directories
  4. mkdir C:\Data\FEC\Master
  5. mkdir C:\Data\FEC\Master\Archive
  6. mkdir C:\Data\FEC\Reports\ErrorLogs
  7. mkdir C:\Data\FEC\Reports\Hold
  8. mkdir C:\Data\FEC\Reports\Output
  9. mkdir C:\Data\FEC\Reports\Processed
  10. mkdir C:\Data\FEC\Reports\Review
  11. mkdir C:\Data\FEC\Reports\Import
  12. mkdir C:\Data\FEC\Archives\Processed
  13. mkdir C:\Data\FEC\Archives\Import
  14. :: run update_master_files.py, which downloads all committee lists along with
  15. :: tons of other info.
  16. python update_master_files.py
  17. :: run this for downloading daily filings
  18. python download_reports.py
  19. :: run this for parsing and mapping the filing data into the database
  20. python parse_reports.py
  21. :: make sure to run the db sql script first
  22. :: https://drive.google.com/file/d/0B5hTtesq_tWdYUVRSzNCcHlJYjA/view?usp=sharing
  23. :: and that the Import directory has *.fec files, not the downloaded *.zip files
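Regarding that last note, a quick Python sketch to verify the Import directory really contains extracted *.fec files rather than the downloaded *.zip archives:

```python
import os

def split_import_dir(path):
    """Return (.fec files, .zip files) found in path, so you can confirm
    the Import directory holds extracted filings, not archives."""
    names = sorted(os.listdir(path))
    fec = [n for n in names if n.lower().endswith(".fec")]
    zips = [n for n in names if n.lower().endswith(".zip")]
    return fec, zips
```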
If any of the commands above are not installed, please see:
http://mohiplanet.blogspot.com/2015/10/convert-windows-command-prompt-to-linux.html

References:
https://s3.amazonaws.com/NICAR2015/FEC/MiningFECData.pdf

Getting started with web crawling with Ruby 2 on CentOS

The Ruby version that ships with CentOS 6 can cause lots of setup issues when installing crawler packages (gems). The following series of scripts installs a fresh copy of Ruby 2.2.3. The hash marks ('#') and shell heredoc blocks (:<<'END' and END) comment out the explanatory text, so you can select and copy entire blocks, paste them into your terminal, and get the job done :)

# Uninstall previous gems :
  1. gem update --system
  2. gem --version
  3. # 2.1.8
  4. gem uninstall --all
# Remove previous ruby installation :
  1. rm -f /usr/bin/ruby
  2. rm -f /usr/local/bin/ruby
  3. yum remove ruby -y
  4. yum remove rubygems
  5. #update yum
  6. yum update -y
# Install Ruby 2 on CentOS 6 :
  1. cd /opt/
  2. #download
  3. wget --no-check-certificate https://ftp.ruby-lang.org/pub/ruby/ruby-2.2.3.tar.gz
  4. #extract
  5. tar xvzf ruby-2.2.3.tar.gz
  6. #remove backup
  7. rm -f ruby-2.2.3.tar.gz
  8. cd ruby-2.2.3
  9. #build
  10. ./configure
  11. make
  12. make install
  13. #create symlinks
  14. ln -s /opt/ruby-2.2.3/ruby /usr/bin/ruby
  15. ln -s /opt/ruby-2.2.3/ruby /usr/local/bin/ruby
  16. #check ruby version
  17. ruby --version
  18. #should produce something like:
  19. #ruby 2.2.3p173 (2015-08-18 revision 51636) [i686-linux]

#Install updated rubygems:
  1. cd /opt/
  2. #download rubygems 1.8
  3. wget http://production.cf.rubygems.org/rubygems/rubygems-1.8.24.tgz
  4. #extract
  5. tar xvzf rubygems-1.8.24.tgz
  6. #remove backup
  7. rm -f rubygems-1.8.24.tgz
  8. cd rubygems-1.8.24
  9. ruby setup.rb
  10. #check gem version
  11. gem --version
  12. #should produce something like
  13. #1.8.24


  1. #check Ruby REPL version
  2. irb --version
  3. #should produce something like
  4. #irb 0.9.5(05/04/13)

  1. #install a sample crawler package
  2. gem install fech
  3. #see installed crawler package version
  4. gem list fech
  5. #should produce something like this:
  6. #
  7. #*** LOCAL GEMS ***
  8. #
  9. #fech(1.8)
  10. #
# Run the ruby REPL:

  1. irb
  2. #Checkout Helloworld!
  3. puts 'Helloworld'
  4. #run the following lines in the REPL and check out the data crawled
  5. # by the installed FEC crawler package at:
  6. #/tmp/723604.fec
  7. filing = Fech::Filing.new(723604)
  8. filing.download

# See properties and methods of a ruby object:
  1. filing.inspect
  2. #will print all properties of this object
  3. : <<'END'
  4. <Fech::Filing:0x29ba8b4 @filing_id=1029398, @download_dir=\"/tmp\", @translator=nil, @quote_char=\"\\\"\", @csv_parser=Fech::Csv, @resaved=false, @customized=false, @encoding=\"iso-8859-1:utf-8\">"
  5. END
  6. filing.methods.sort
  7. #will print properties + all methods as well
  8. : <<'END'
  9. [:!, :!=, :!~, :<=>, :==, :===, :=~, :__id__, :__send__, :amendment?, :amends, :class, :clone, :custom_file_path, :define_singleton_method, :delimiter, :display, :download, :download_dir, :download_dir=, :dup, :each_row, :each_row_with_index, :enum_for, :eql?, :equal?, :extend, :file_contents, :file_name, :file_path, :filing_id, :filing_id=, :filing_url, :filing_version, :fix_f99_contents, :form_type, :freeze, :frozen?, :hash, :hash_zip, :header, :inspect, :instance_eval, :instance_exec, :instance_of?, :instance_variable_defined?, :instance_variable_get, :instance_variable_set, :instance_variables, :is_a?, :itself, :kind_of?, :map, :map_for, :mappings, :method, :methods, :nil?, :object_id, :parse_filing_version, :parse_row?, :private_methods, :protected_methods, :public_method, :public_methods, :public_send, :readable?, :remove_instance_variable, :resave_f99_contents, :respond_to?, :rows_like, :send, :singleton_class, :singleton_method, :singleton_methods, :summary, :taint, :tainted?, :tap, :to_enum, :to_s, :translate, :translator, :trust, :untaint, :untrust, :untrusted?]
  10. END

Be sure wget is installed before running the scripts above.

A simple crawl script with ruby:


The following sample Ruby script downloads all F3P filings from the FEC website:
  1. require 'fech'
  2. require 'fileutils'
  3. require 'logger'
  4. # 100MB logger
  5. logger = Logger.new('fec-f3p-filings-downloader.log', 10, 102400000)
  6. # download filings from 2001 to 2015 Nov 13
  7. for i in 11850..1032472
  8.   filing = Fech::Filing.new(i)
  9.   logger.info("Downloading... #{i}.fec")
  10.   filing.download
  11.   # check the form type; keep F3P filings, delete everything else
  12.   type = filing.form_type
  13.   if type.include? "F3P"
  14.     logger.info("filing is F3P type")
  15.     logger.info("moving into filings directory...")
  16.     FileUtils.mv("/tmp/#{i}.fec", "/usr/local/fec-f3p-filings/#{i}.fec")
  17.   else
  18.     logger.info("Form type is #{type}")
  19.     logger.info("Deleting... /tmp/#{i}.fec")
  20.     FileUtils.rm("/tmp/#{i}.fec")
  21.   end
  22. end