NetSquared enables social benefit organizations to leverage the tools of the social web.

Technical Specifications for the CorpWatch Mashup

The job here is to screen-scrape the subsidiary pages in the SEC's EDGAR database and translate them into an open format that can be published to the web and fed into the Prefuse visualization software or another API.

The SEC's EDGAR website theoretically has lists of subsidiaries for all publicly traded companies in the USA. They are listed as "Exhibit 21" on forms S-1, S-4, S-11, F-1, F-4, 10, and the annual report filed on Form 10-K. Here are two examples:

http://www.sec.gov/Archives/edgar/data/831001/000119312507038505/dex2101.htm

http://www.sec.gov/Archives/edgar/data/12927/000119312506040952/dex21.htm

The exhibits are not in a standardized format; some are HTML and others are plain text. They would need to be screen-scraped, parsed, saved to a database, made available in an open format (so that others could use the data), and then plugged into the visualization API.
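As a rough illustration of the parsing step, here is a minimal Python sketch that pulls (name, jurisdiction) pairs out of an HTML exhibit. It assumes the subsidiaries sit in table rows with the company name in the first non-empty cell and the jurisdiction in the second; real Exhibit 21 pages vary widely, so this would cover only some filings. The names `TableRowParser`, `extract_subsidiaries`, and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace (including non-breaking spaces) left by the layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_subsidiaries(html):
    """Return (name, jurisdiction) pairs from rows with at least two non-empty cells."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    pairs = []
    for row in parser.rows:
        cells = [c for c in row if c]  # drop spacer cells
        if len(cells) >= 2:
            pairs.append((cells[0], cells[1]))
    return pairs


# Hypothetical markup, loosely modeled on a table-based exhibit.
SAMPLE = """
<table>
  <tr><td>Example Holdings LLC</td><td>Delaware</td></tr>
  <tr><td>Example   Europe
      B.V.</td><td>&nbsp;</td><td>Netherlands</td></tr>
  <tr><td>&nbsp;</td></tr>
</table>
"""

print(extract_subsidiaries(SAMPLE))
```

Plain-text exhibits would need a separate, probably line-based, parser, and header rows would have to be filtered out before the pairs are saved.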

Can this data be GPL'ed? We would like to use copyleft to ensure that others using this data must keep it free as well. To get the data into Prefuse we would have to convert it to either GraphML or this format.
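For the GraphML option, the conversion could look something like the following Python sketch. It assumes the scraped data has already been reduced to (parent, subsidiary) name pairs, and it emits a single `name` attribute per node. The function name `to_graphml` and the example companies are hypothetical, and the output has not been checked against Prefuse's GraphML reader.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def _tag(name):
    """Qualify an element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def to_graphml(relationships):
    """Build a GraphML string from (parent, subsidiary) name pairs."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(_tag("graphml"))
    # Declare the one node attribute used below.
    ET.SubElement(root, _tag("key"), {
        "id": "name", "for": "node",
        "attr.name": "name", "attr.type": "string",
    })
    graph = ET.SubElement(root, _tag("graph"), {"edgedefault": "directed"})

    # First pass: one node per distinct company name, in order of appearance.
    node_ids = {}
    for pair in relationships:
        for name in pair:
            if name not in node_ids:
                node_ids[name] = f"n{len(node_ids)}"
                node = ET.SubElement(graph, _tag("node"), {"id": node_ids[name]})
                ET.SubElement(node, _tag("data"), {"key": "name"}).text = name

    # Second pass: one directed edge from each parent to its subsidiary.
    for parent, child in relationships:
        ET.SubElement(graph, _tag("edge"), {
            "source": node_ids[parent], "target": node_ids[child],
        })

    return ET.tostring(root, encoding="unicode")


print(to_graphml([
    ("Example Corp", "Example Holdings LLC"),
    ("Example Corp", "Example Europe B.V."),
]))
```

Writing all nodes before any edges keeps the file easy to read in a single streaming pass, which may matter for simpler GraphML readers.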

Upon further investigation I found another potential source for this data, although it may not be as useful since it is historic:

ftp://ftp.sec.gov/edgar/Feed/

Comments

Data Sources

ftp://ftp.sec.gov/edgar/Feed/

Looks like a daily download, in XML.

I have been lurking in the group for a few weeks now, after a suggestion
from the people at MetaVid led me here. CorpWatch is one of the
winners of the NetSquared Mashup Challenge, and our proposal is to
gather information from the SEC's EDGAR database to create a
visualization of parent company/subsidiary relationships. Here is the
full proposal: tinyurl.com/2z5f4v

I am still gathering information on the idea. I have had a blessing
from a few of the technologists at Google, but haven't actually nailed
down the chain of tasks required to get the job done. It seems like the
tough part will be designing the screen scrape, but hopefully, if we do
it right, it can be run routinely so that the data stays updated.

Any thoughts on the feasibility of such a project or suggestions for
ways to tackle it would be greatly appreciated.

 

 

__________________________

Submitted by: Bebes
